Hi all,
We have completed the Spark 3.0 Adaptive Query Execution (AQE) performance tests
on 3TB TPC-DS on a 5-node Cascade Lake cluster. Two queries gain more than
1.5x performance and 37 queries gain more than 1.1x performance with AQE.
No query shows a significant performance degradation.
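In case anyone wants to try this locally, AQE is controlled by spark.sql.adaptive.enabled; below is a minimal sketch of enabling it on a Spark 3.0 session (the app name is illustrative and this is not the exact harness used in the tests above):

import org.apache.spark.sql.SparkSession

// Minimal sketch: enable Adaptive Query Execution on a Spark 3.0 session.
val spark = SparkSession.builder()
  .appName("tpcds-aqe-sketch")                  // illustrative name
  .config("spark.sql.adaptive.enabled", "true") // the AQE switch
  .getOrCreate()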
To do that, we should explicitly document such structured configurations and
their implicit effects, which is currently missing.
I would be more than happy if we documented such implied relationships, *and*
if we were very sure all configurations are structured correctly and coherently.
Until that point, I think i
I'm looking into the case of `spark.dynamicAllocation` and this seems to
support my point.
https://github.com/apache/spark/blob/master/docs/configuration.md#dynamic-allocation
I don't disagree with adding "This requires spark.shuffle.service.enabled
to be set." to the description
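For illustration, a minimal sketch (not taken from the docs; the app name is made up) of the two flags the documentation implies must be enabled together:

import org.apache.spark.sql.SparkSession

// Sketch only: dynamic allocation is documented to require the external
// shuffle service, so the two flags below are expected to be set together.
val spark = SparkSession.builder()
  .appName("dynamic-allocation-sketch")              // illustrative name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")   // the implied prerequisite
  .getOrCreate()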
Sure, adding "[DISCUSS]" is a good practice to label it. I should have done
it, although it might be "redundant" :-) since anyone can give feedback on any
thread in the Spark dev mailing list, and discuss.
This is actually more prevalent given my rough reading of the configuration
files. I would like to see this
I'm sorry if I'm missing something, but this would ideally have been started
as a [DISCUSS] thread, as I haven't seen any reference showing consensus on
this practice.
For me it's just that there are two different practices co-existing in the
codebase, meaning it's closer to the preference of the individual (with
implicit
> I don't plan to document this officially yet
Just to prevent confusion, I meant that I don't yet plan to document the
expectation that we should describe the relationships between configurations
as a code/review guideline in https://spark.apache.org/contributing.html
On Wed, Feb 12, 2020 at 9:57 AM, Hyukjin Kwon wrote:
Hi all,
I happened to review some PRs and noticed that some configurations are
missing necessary information.
To be explicit, I would like to make sure we document the direct
relationships with other configurations
in the documentation. For example,
`spark.sql.adaptive.shuffle.reducePostS
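To make that concrete, here is a rough sketch of the kind of cross-reference I mean, written the way config entries are declared inside Spark's own source tree (the key and wording are hypothetical; only the pattern of naming the related configuration in the doc string matters):

import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical entry: the doc string spells out the other configuration it
// depends on instead of leaving the relationship implicit.
val SOME_ADAPTIVE_FEATURE_ENABLED = ConfigBuilder("spark.sql.adaptive.someFeature.enabled")
  .doc("When true, enables some adaptive feature. This only takes effect when " +
    "spark.sql.adaptive.enabled is also set to true.")
  .booleanConf
  .createWithDefault(false)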
Thanks always..!
Bests,
Takeshi
On Wed, Feb 12, 2020 at 3:28 AM shane knapp ☠ wrote:
> the build queue has been increasing and to help throughput i enabled the
> 'ubuntu-testing' node. i spot-checked a bunch of the spark maven builds,
> and they passed.
>
> i'll keep an eye out for any failure
Hello All,
Could you please let me know what the next step would be for PR
https://github.com/apache/spark/pull/27246? I would like to know if there is
any action item on my side.
Thank you,
Sinisa
Hi, Sean.
Yes. We should keep this minimal.
BTW, for the following question,
> But how much value does that add?
How much value do you think we get from our binary distribution at the
following link?
-
https://www.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
Docker
I compute the difference of two timestamps and compare it with a
constant interval:
Seq(("2019-01-02 12:00:00", "2019-01-02 13:30:00"))
.toDF("start", "end")
.select($"start".cast(TimestampType), $"end".cast(TimestampType))
.select($"start", $"end", ($"end" - $"start").as("diff"))
.whe
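Since the snippet above is cut off in this digest, here is a self-contained sketch of the same idea that sidesteps interval comparison by working with the difference in epoch seconds (the where condition, the one-hour threshold, and the local master are illustrative, not from the original mail):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.TimestampType

val spark = SparkSession.builder().master("local[*]").appName("ts-diff-sketch").getOrCreate()
import spark.implicits._

Seq(("2019-01-02 12:00:00", "2019-01-02 13:30:00"))
  .toDF("start", "end")
  .select($"start".cast(TimestampType), $"end".cast(TimestampType))
  // express the difference in whole seconds rather than as an interval
  .select($"start", $"end", ($"end".cast("long") - $"start".cast("long")).as("diff_seconds"))
  .where($"diff_seconds" > 3600)  // illustrative threshold: more than one hour
  .show()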
the build queue has been increasing and to help throughput i enabled the
'ubuntu-testing' node. i spot-checked a bunch of the spark maven builds,
and they passed.
i'll keep an eye out for any failures caused by the system and either
remove it from the worker pool or fix what i need to.
shane
--
To be clear, this is a convenience 'binary' for end users, not just an
internal packaging to aid the testing framework?
There's nothing wrong with providing an additional official packaging
if we vote on it and it follows all the rules. There is an open
question about how much value it adds vs that
My takeaway from the last time we discussed this was:
1) To be ASF compliant, we needed to only publish images at official
releases
2) There was some ambiguity about whether or not a container image that
included GPL'ed packages (spark images do) might trip over the GPL "viral
propagation" due to i
The problem is that there isn't a consistent number of seconds an interval
represents - as Wenchen mentioned, a month interval isn't a fixed number of
days. If your use case can account for that, maybe you could add the
interval to a fixed reference date and then compare the result.
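A minimal sketch of that workaround, assuming literal intervals and 2000-01-01 as the fixed reference date (both choices are illustrative); note that the result can change with the reference date, which is exactly the month-vs-days caveat:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{expr, lit, to_timestamp}

val spark = SparkSession.builder().master("local[*]").appName("interval-anchor-sketch").getOrCreate()
import spark.implicits._

val ref = to_timestamp(lit("2000-01-01 00:00:00"))  // fixed reference date

// Anchor both intervals at the same reference date and compare the resulting timestamps.
spark.range(1)
  .select(
    (ref + expr("INTERVAL 1 MONTH")).as("one_month"),
    (ref + expr("INTERVAL 30 DAYS")).as("thirty_days"))
  .select($"one_month" > $"thirty_days")  // true here, but depends on the reference date
  .show()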
On Tue, Feb 11
What's your use case for comparing intervals? It's tricky in Spark as there
is only one interval type and you can't really compare one month with 30 days.
On Wed, Feb 12, 2020 at 12:01 AM Enrico Minack wrote:
> Hi Devs,
>
> I would like to know what is the current roadmap of making
> CalendarInterv
Hi Devs,
I would like to know what the current roadmap is for making
CalendarInterval comparable and orderable again (SPARK-29679,
SPARK-29385, #26337).
With #27262, this got reverted but SPARK-30551 does not mention how to
go forward in this matter. I have found SPARK-28494, but this seems t