Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/16329#discussion_r93606051
--- Diff:
examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedTypedAggregation.scala
---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/16024
[MINOR][DOCS] Updates to the Accumulator example in the programming guide.
Fixed typos, AccumulatorV2 in Java
## What changes were proposed in this pull request?
This pull request
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/16024#discussion_r89788527
--- Diff: docs/programming-guide.md ---
@@ -1424,29 +1431,38 @@ accum.value();
// returns 10
{% endhighlight %}
-Programmers can also
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/16024#discussion_r89791975
--- Diff: docs/programming-guide.md ---
@@ -1378,29 +1378,36 @@ res2: Long = 10
While this code used the built-in support for accumulators of
Github user aokolnychyi closed the pull request at:
https://github.com/apache/spark/pull/14050
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/16329#discussion_r97589440
--- Diff:
examples/src/main/java/org/apache/spark/examples/sql/JavaUserDefinedTypedAggregation.java
---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/16329
[SPARK-16046][DOCS] Aggregations in the Spark SQL programming guide
## What changes were proposed in this pull request?
- A separate subsection for Aggregations under “Getting
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/16329#discussion_r93019035
--- Diff:
examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedUntypedAggregation.scala
---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/16329#discussion_r93019316
--- Diff:
examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedUntypedAggregation.scala
---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/16329
@marmbrus I have updated the pull request. The compiled docs can be found
[here](https://aokolnychyi.github.io/spark-docs/sql-programming-guide.html).
I did not manage to build the
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/16329#discussion_r93395745
--- Diff: docs/sql-programming-guide.md ---
@@ -382,6 +382,52 @@ For example:
+## Aggregations
+
+The [built-in DataFrames
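The section is backed by new typed and untyped aggregation examples. As a minimal, hedged sketch of the typed variant (the names mirror the documented `Aggregator` example rather than the exact diff shown above):
```
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Employee(name: String, salary: Long)
case class Average(var sum: Long, var count: Long)

object MyAverage extends Aggregator[Employee, Average, Double] {
  // Zero value for the intermediate buffer.
  def zero: Average = Average(0L, 0L)
  // Fold one input row into the running buffer.
  def reduce(buffer: Average, employee: Employee): Average = {
    buffer.sum += employee.salary
    buffer.count += 1
    buffer
  }
  // Merge partial buffers computed on different partitions.
  def merge(b1: Average, b2: Average): Average =
    Average(b1.sum + b2.sum, b1.count + b2.count)
  // Produce the final result from the merged buffer.
  def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count
  def bufferEncoder: Encoder[Average] = Encoders.product
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}
```
Such an aggregator is applied to a `Dataset[Employee]` via `MyAverage.toColumn`.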
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/19193
@hvanhovell here is a summary of the scenarios I tried:
```
val df = Seq((1, 2), (1, 3), (2, 4), (5, 5)).toDF("a", "b")
val window1 = Window.orderBy
```
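The window specs above are truncated by the archive; a hedged guess at the shape being exercised, since the thread concerns window functions nested inside aggregates:
```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rank, sum}

val spark = SparkSession.builder().appName("sketch").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, 2), (1, 3), (2, 4), (5, 5)).toDF("a", "b")
val window1 = Window.orderBy("b")
// A window function nested directly inside an aggregate -- the pattern
// that produced the StackOverflowError discussed in this thread:
df.select(sum(rank().over(window1)))
```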
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/19193
I checked PostgreSQL(10.3), MySQL(8.0), Hive(2.1.0).
**1. PostgreSQL**
```
postgres=# CREATE TABLE t1 (c1 integer, c2 integer);
postgres=# INSERT INTO t1 VALUES (1, 2
```
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/19193
@hvanhovell @cloud-fan I think it would be safer to be consistent with
other databases and what Spark does for nested aggregate functions. It is
really simple to write a subquery to work around
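A sketch of the subquery workaround being referred to, with a hypothetical table `t(a, b)` and an assumed SparkSession `spark`: the window function is evaluated in an inner query, and the aggregate is applied on top of its result:
```
val result = spark.sql("""
  SELECT a, SUM(rnk) AS sum_rnk
  FROM (
    SELECT a, RANK() OVER (PARTITION BY a ORDER BY b) AS rnk
    FROM t
  ) tmp
  GROUP BY a
""")
```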
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/21473
[SPARK-21896][SQL] Fix StackOverflow caused by window functions inside
aggregate functions
## What changes were proposed in this pull request?
This PR explicitly prohibits window
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/21473#discussion_r192234621
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -1744,11 +1744,14 @@ class Analyzer
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/19193
@cloud-fan @hvanhovell I created PR #21473 that fixes StackOverflow.
Apart from that, I think we might have other potential problems.
**1. Window functions inside WHERE and
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/19193
Let me check other databases and come up with a summary.
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18368
@shaneknapp It seems that the build fails with an exception unrelated to
the PR. Therefore, I will just close this one and open a new one.
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18368
@gatorsmile should be fixed now.
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/18583
[SPARK-21332][SQL] Incorrect result type inferred for some decimal
expressions
## What changes were proposed in this pull request?
This PR changes the direction of expression
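The excerpt cuts off before the concrete expressions, but the PR is about the precision/scale inferred for composite decimal expressions. A hypothetical way to observe the inferred result type (assumes a SparkSession `spark`):
```
// Chained decimal arithmetic exercises the precision/scale inference that
// this PR adjusts; the exact failing expressions are not shown in this excerpt.
val df = spark.sql(
  "SELECT CAST(1 AS DECIMAL(38, 18)) + CAST(1 AS DECIMAL(38, 18)) AS s")
df.printSchema() // shows the DecimalType precision/scale inferred for `s`
```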
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18583
Can we please trigger this one more time?
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/21580
[SPARK-24575][SQL] Prohibit window expressions inside WHERE and HAVING
clauses
## What changes were proposed in this pull request?
As discussed
[before](https://github.com/apache
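A sketch of the query shape this change rejects, plus the supported rewrite, using a hypothetical table `t(a, b)` and an assumed SparkSession `spark`:
```
// A window expression directly inside WHERE is rejected at analysis time
// with an AnalysisException after this change:
spark.sql("SELECT a FROM t WHERE RANK() OVER (ORDER BY b) = 1")

// The supported form computes the window function first, then filters:
spark.sql("""
  SELECT a FROM (
    SELECT a, RANK() OVER (ORDER BY b) AS rnk FROM t
  ) tmp
  WHERE rnk = 1
""")
```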
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
@SimonBin The initial solution handled your case, but it was later decided to
restrict the proposed rule to cross joins only. You can find the reason in this
[comment](https://github.com
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18692#discussion_r152660385
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -152,3 +152,71 @@ object EliminateOuterJoin extends
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18692#discussion_r153066992
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -152,3 +152,99 @@ object EliminateOuterJoin extends
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18692#discussion_r153329031
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -152,3 +152,99 @@ object EliminateOuterJoin extends
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18692#discussion_r153420088
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -152,3 +152,99 @@ object EliminateOuterJoin extends
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18692#discussion_r154164912
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
---
@@ -27,6 +27,8 @@ object ExpressionSet
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/19193
@gatorsmile @cloud-fan could you provide any input?
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/19193#discussion_r156493072
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -1920,7 +1927,34 @@ class Analyzer
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/19193#discussion_r156495899
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -1920,7 +1927,34 @@ class Analyzer
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
Sure, if you guys think it does not give any performance benefits, then
let's revert it.
I also had similar concerns but my understanding was that having an inner
join with
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
I took a look at ``JoinSelection``. It seems we will not get
``BroadcastHashJoin`` or ``ShuffledHashJoin`` if we revert this rule
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
Yeah, correct. So, we should revert then.
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
I am not sure we can infer ``a == b`` if ``a in (0, 2, 3, 4)`` and ``b in
(0, 2, 3, 4)``.
table 'a'
```
a1 a2
1 2
3 3
4 5
```
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18692#discussion_r145498671
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -152,3 +152,79 @@ object EliminateOuterJoin extends
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/19252
@gatorsmile thanks for the feedback. I also covered
``TruncateTableCommand`` with additional tests. However, I see somewhat strange
behavior while creating a test for
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18692#discussion_r144722742
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -152,3 +152,71 @@ object EliminateOuterJoin extends
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/14050
[MINOR][EXAMPLES] Window function examples
## What changes were proposed in this pull request?
An example that explains the usage of window functions.
It shows the difference
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/14119
[SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programming guide and
examples
## What changes were proposed in this pull request?
- Hard-coded Spark SQL sample snippets were moved
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/14119
@liancheng could you please review this PR?
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/14119#discussion_r70173035
--- Diff: docs/sql-programming-guide.md ---
@@ -1380,17 +949,17 @@ metadata.
{% highlight scala %}
-// spark is an existing
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/14119#discussion_r70173058
--- Diff:
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
---
@@ -41,43 +35,47 @@ object HiveFromSpark
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/14119#discussion_r70173131
--- Diff:
examples/src/main/scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala
---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/14119#discussion_r70173180
--- Diff:
examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java
---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/14119
**Summary of the updates**
- `JavaSparkSQL.java` file was removed. I kept it initially since the file
itself was quite old (2+ years) and it was present in your original WIP branch
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
@gatorsmile what is our decision here? Shall we wait until SPARK-21652 is
resolved? In the meantime, I can add some tests and see how the proposed rule
works together with all others.
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18692#discussion_r137343433
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -152,3 +152,71 @@ object EliminateOuterJoin extends
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18692#discussion_r137343500
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -152,3 +152,71 @@ object EliminateOuterJoin extends
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/19193
[WIP][SPARK-21896][SQL] Fix Stack Overflow when window function is nested
inside an aggregate function
## What changes were proposed in this pull request?
This WIP PR contains a
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/19252
[SPARK-21969][SQL] CommandUtils.updateTableStats should call refreshTable
## What changes were proposed in this pull request?
Tables in the catalog cache are not invalidated once their
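A hedged sketch of the scenario (table name hypothetical, SparkSession `spark` assumed): a command updates table statistics in the metastore, but the relation cached in the session catalog can keep reporting the old statistics until the table is refreshed, which is what the fix wires into `CommandUtils.updateTableStats`:
```
spark.sql("CREATE TABLE t (i INT) USING parquet")
spark.sql("INSERT INTO t VALUES (1), (2)")
// Updates the stats in the metastore; without the fix, the cached relation
// may still report the pre-insert statistics:
spark.sql("ANALYZE TABLE t COMPUTE STATISTICS")
spark.sql("DESCRIBE TABLE EXTENDED t").show(truncate = false) // inspect stats
```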
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/18692
[SPARK-21417][SQL] Detect join conditions via filter expressions
## What changes were proposed in this pull request?
This PR adds an optimization rule that infers join conditions
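A hedged sketch of a query shape the rule targets (hypothetical tables `t1(a)` and `t2(b)`, SparkSession `spark` assumed): both sides are constrained to the same constant, so an equi-join condition `t1.a = t2.b` can be inferred from the filters:
```
val df = spark.sql("""
  SELECT * FROM t1 CROSS JOIN t2
  WHERE t1.a = 1 AND t2.b = 1
""")
df.explain(true) // check whether an equi-join condition was inferred
```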
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
@cloud-fan which rule do you mean? `PushPredicateThroughJoin` seems to be
the closest by logic but it has a slightly different purpose and does not cover
this use case. In fact, I used the
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
@gatorsmile thanks for the input. Let me check that I understood everything
correctly. So, I keep it as a separate rule that is applied only if constraint
propagation is enabled. Inside the rule
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/18740
[SPARK-21538][SQL] Attribute resolution inconsistency in the Dataset API
## What changes were proposed in this pull request?
This PR contains a tiny update that removes an attribute
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18740#discussion_r129911780
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
---
@@ -1304,6 +1304,15 @@ class DatasetSuite extends QueryTest with
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
@gatorsmile I took a look at the case above. Indeed, the proposed rule
triggers this issue but only indirectly. In the example above, the optimizer
will never reach a fixed point. Please find
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18692#discussion_r130662925
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -152,3 +152,72 @@ object EliminateOuterJoin extends
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18692
@gatorsmile I updated the rule to cover cross join cases. Regarding the
case with the redundant condition mentioned by you, I opened
[SPARK-21652](https://issues.apache.org/jira/browse/SPARK
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/18909
[MINOR][SQL] Additional test case for CheckCartesianProducts rule
## What changes were proposed in this pull request?
While discovering optimization rules and their test coverage, I
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18909
@gatorsmile I took a look at both PRs.
I quickly scanned PR #14866 and did not find tests for existence joins.
Also, `SQLConf.CROSS_JOINS_ENABLED = true` is checked only for
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18909
@gatorsmile sure, this PR is only about tests, I was just wondering what is
planned regarding cross joins with inequality conditions.
I borrowed several tests from PR #16762 and added
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/18252
[SPARK-17914][SQL] Fix parsing of timestamp strings with nanoseconds
The PR contains a tiny change to fix the way Spark parses string literals
into timestamps. Currently, some timestamps that
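A hedged illustration of the parsing issue (SparkSession `spark` assumed): Spark timestamps carry microsecond precision, so a nanosecond fraction in the literal should be truncated rather than yield a wrong value:
```
spark.sql("SELECT CAST('2016-10-11 20:20:20.123456789' AS TIMESTAMP)").show(false)
// expected after the fix: 2016-10-11 20:20:20.123456
```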
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18252
@ueshin good point, thanks.
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18252#discussion_r121251811
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
---
@@ -32,7 +32,7 @@ import
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/18252#discussion_r121252397
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
---
@@ -399,13 +399,13 @@ object DateTimeUtils
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18252
@wzhfy @rxin @ueshin can someone please merge this?
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/18368
[SPARK-21102][SQL] Make refresh resource command less aggressive in p…
### Idea
This PR adds validation to REFRESH SQL statements. Currently, users can
specify whatever they want as
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/18368
@shaneknapp can we trigger this one more time, please?
Github user aokolnychyi closed the pull request at:
https://github.com/apache/spark/pull/19193
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/19193
Hi @dongjoon-hyun. Yep, I'll close this one.
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/22857
[SPARK-25860][SQL] Replace Literal(null, _) with FalseLiteral whenever
possible
## What changes were proposed in this pull request?
This PR proposes a new optimization rule that
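A hedged sketch of the idea with a hypothetical table `t(a, b)` (SparkSession `spark` assumed): in a WHERE clause, a branch that evaluates to null can never select a row, so the null literal behaves exactly like false, and replacing it unlocks further boolean simplification:
```
spark.sql("""
  SELECT * FROM t
  WHERE IF(a > 0, NULL, b > 1)
""").explain(true)
// the optimized plan should no longer contain the null literal
```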
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/22857
@dbtsai @gatorsmile @cloud-fan could you guys, please, take a look?
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/22857#discussion_r228741800
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -736,3 +736,65 @@ object CombineConcats extends
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/22857#discussion_r228741884
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -736,3 +736,65 @@ object CombineConcats extends
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/22857#discussion_r229133550
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -736,3 +736,65 @@ object CombineConcats extends
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/22857#discussion_r229133793
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
---
@@ -2578,4 +2578,45 @@ class DataFrameSuite extends QueryTest with
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/22857#discussion_r229442843
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -736,3 +736,60 @@ object CombineConcats extends
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/22857#discussion_r229445313
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -736,3 +736,60 @@ object CombineConcats extends
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/22857#discussion_r229445682
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -736,3 +736,60 @@ object CombineConcats extends
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/22857#discussion_r229449194
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
---
@@ -2578,4 +2578,45 @@ class DataFrameSuite extends QueryTest with
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/22857#discussion_r229449496
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -736,3 +736,60 @@ object CombineConcats extends
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/22857#discussion_r229705741
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
---
@@ -2585,4 +2585,45 @@ class DataFrameSuite extends QueryTest with
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/22966
I also think having a performance trend would be useful. I'll be glad to
help with this effort.
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/23079
@rednaxelafx I am glad the rule is getting more adoption. Renaming also makes
sense to me.
Shall we extend `ReplaceNullWithFalseEndToEndSuite` as well
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/23079#discussion_r234467085
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala
---
@@ -298,6 +299,45
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/23079
LGTM as well.
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/23139#discussion_r236467423
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala
---
@@ -0,0 +1,110
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/23139#discussion_r236466962
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala
---
@@ -0,0 +1,110
GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/23171
[SPARK-26205][SQL] Optimize In for bytes, shorts, ints
## What changes were proposed in this pull request?
This PR optimizes `In` expressions for byte, short, integer types. It is a
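A hedged sketch of a query shape that benefits (SparkSession `spark` assumed): an `In` over an integer column with literal values, which codegen can compile down to a JVM switch instead of a linear chain of equality checks:
```
import org.apache.spark.sql.functions.col

val df = spark.range(100).selectExpr("CAST(id AS INT) AS i")
df.filter(col("i").isin(1, 5, 9)).explain()
```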
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/23171
@gatorsmile @cloud-fan @dongjoon-hyun @viirya
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/23171
@cloud-fan, yeah, let's see if this PR is useful.
The original idea wasn't to avoid fixing autoboxing in `InSet`. `In` was
tested on 250 numbers to prove O(1) time complexity on
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/23171
@dbtsai @mgaido91 I think we can come back to this question once
[SPARK-26203](https://issues.apache.org/jira/browse/SPARK-26203) is resolved.
That JIRA will give us enough information about
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/23171
To sum up, I would say the goal of this PR is to make `In` expressions as
efficient as possible for bytes/shorts/ints. Then we can benchmark `In`
vs `InSet` in [SPARK-26203](https
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/23171
As @rxin said, if we introduce a separate expression for the switch-based
approach, then we will need to modify other places. For example,
`DataSourceStrategy$translateFilter`. So, integrating