Oh, looks nice. Thanks for sharing, Dongjoon.
Bests,
Takeshi
On Sat, Dec 7, 2019 at 3:35 AM Dongjoon Hyun wrote:
> Hi, All.
>
> I want to share the following change to the community.
>
> SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
>
> This is merged today and
lol, how did you know I was going to read this email, Sean?
When I manually identified the stale PRs, I used the conditions below:
1. The author has been inactive for over a year. If a PR was simply waiting
for a review, I excluded it from the stale PR list.
2. Ping once and see if there are any updates.
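(For reference, a GitHub search along the lines of `is:pr is:open updated:<2019-01-01` approximates condition 1; the cutoff date shown is only illustrative.)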
Hi Deepak,
Following your suggestion, I put an exclusion of guava in the topmost POM
(directly under the Spark home) as follows.
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>3.2.1</version>
      <exclusions>
        <exclusion>
          <groupId>com.google.guava</groupId>
          ...
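The snippet above is cut off in the archive; the remainder is presumably the
usual `<artifactId>guava</artifactId>` plus the closing `</exclusion>`,
`</exclusions>`, and `</dependency>` tags. One way to check that the exclusion
took effect is:

    mvn dependency:tree -Dincludes=com.google.guava

If it worked, guava should no longer be listed under hadoop-common.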
Hi, All.
I want to share the following change to the community.
SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
This is merged today, and now Spark's `CREATE TABLE` uses Spark's default
datasource instead of the `hive` provider. This is a good and big
improvement for
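To illustrate the change, here is a minimal sketch (mine, not from the
original mail), assuming `spark.sql.sources.default` is left at its `parquet`
default and run from a spark-shell where `spark` is the session:

    // before SPARK-30098 this created a hive text-format table;
    // now it creates a table with the session default datasource (parquet)
    spark.sql("CREATE TABLE t1 (a INT)")

    // the hive provider can still be requested explicitly
    spark.sql("CREATE TABLE t2 (a INT) USING hive")

    // the 'Provider' row should now show parquet for t1
    spark.sql("DESCRIBE TABLE EXTENDED t1").show(truncate = false)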
We used to not be able to close PRs directly, but now we can, so I assume
this is as fine a way as any of doing so, if we want to. I don't think
there's a policy against it or anything.
Hyukjin how have you managed this one in the past?
I don't mind it being automated if the idle time is long and it
That's true, we do use Actions today. I wonder if Apache Infra allows
Actions to close PRs vs. just updating commit statuses. I only ask because
I remember permissions were an issue in the past when discussing tooling
like this.
In any case, I'd be happy to submit a PR adding this in if there are
I think we can add Actions, right? They're used for the newer tests in
GitHub?
I'm OK closing PRs inactive for a 'long time', where that's maybe 6-12
months or something. It's standard practice and doesn't mean they can't be
reopened.
Often the related JIRA should be closed as well, but we have done
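For what it's worth, a minimal sketch of what the automation could look like
with the stock actions/stale action (the thresholds and message below are
placeholders, not anything agreed on):

    name: Close stale PRs
    on:
      schedule:
        - cron: '0 0 * * *'   # run once a day
    jobs:
      stale:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/stale@v1
            with:
              repo-token: ${{ secrets.GITHUB_TOKEN }}
              days-before-stale: 365    # flag PRs with no activity for a year
              days-before-close: 30     # close a month after the ping if still quiet
              stale-pr-message: >
                This PR has been inactive for a long time and will be closed
                soon if there are no updates. Closed PRs can always be reopened.

Whether the Action is allowed to close PRs (rather than just label them) is
exactly the Infra permissions question raised above.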
I agree with Bo's idea that the MapStatus could be a more generalized
concept, not necessarily bound to the BlockManager/Executor.
As I understand it, MapStatus is used to track/record the output data
location of a map task; it is created by the shuffle writer and used by the
shuffle reader for finding
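To make the idea concrete, a rough sketch of a generalized status (the names
are hypothetical, not Spark's actual API):

    // hypothetical generalization: the shuffle output location becomes an
    // abstraction instead of a BlockManagerId tied to a live executor
    sealed trait MapOutputLocation
    case class ExecutorLocation(blockManagerId: String) extends MapOutputLocation // today's model
    case class RemoteStorageLocation(uri: String) extends MapOutputLocation       // e.g. DFS/object store

    // per-map-task record: where the output lives plus the size of each
    // reduce partition, so the shuffle reader knows where and how much to fetch
    case class GeneralizedMapStatus(
        location: MapOutputLocation,
        partitionSizes: Array[Long])

With something like this, a reader would resolve ExecutorLocation through the
BlockManager as today, while a RemoteStorageLocation could be fetched without
the mapper's executor still being alive.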
Yeah, they are very tricky and have to be integrated with Spark's checkpoint
mechanism as well - I guess that's why this mail thread went quiet after
some time.
Along with these questions, there might also be some edge cases we'd have to
deal with in the 2PC approach: suppose a batch got into