Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14481
gentle ping @yucai
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/14481
Any update?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14481
Can one of the admins verify this patch?
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/14481
@yucai okay, thanks!
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/14481
@maropu, I have been doing some refactoring recently and will update it soon.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/14481
@hvanhovell What's the status of this? If nobody takes this over, I'll do it.
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/14481
@yucai thanks for posting the benchmarks and the code. One high level
comment would be to start with a properly sorted dataset for the second
benchmark. I would like to know how much time is
Github user chenghao-intel commented on the issue:
https://github.com/apache/spark/pull/14481
@yucai can you please rebase the code?
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/14481
@hvanhovell
**Benchmark Result**
**Summary**
We benchmarked sort aggregate codegen with real customer cases: it yields a
6x improvement when aggregating without keys and a 1.18x improvement when
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/14481
@chenghao-intel Hao, kindly take a look.
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/14481
Generated code example, **not for code review yet**.
```
scala> Seq(("a", "10"), ("b", "1"), ("b", "2"), ("c", "5"), ("c", "3")).
| toDF("k",
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/14481
Generated code example, **not for code review yet**
```
scala> Seq(("a", "3"), ("b", "20"), ("b", "2")).toDF("k",
"v").agg(max("v")).debugCodegen()
Found 2 WholeStageCodegen subtrees.
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/14481
@hvanhovell thanks very much for the advice. Yes, I will post the benchmark
results first.
This is still WIP. I will post the generated code, but kindly do not review
the code details at present; I am
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/14481
@yucai could you post some benchmark results? I would think that the
overall runtime of the sort based aggregation path is dominated by the
preceding exchange and sort operations, and that as a
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/14481
retest this please
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14481
**[Test build #3202 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3202/consoleFull)**
for PR 14481 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14481
**[Test build #3202 has
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3202/consoleFull)**
for PR 14481 at commit
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/14481
Ok to test
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14481
Can one of the admins verify this patch?