Re: FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2019-12-06 Thread Takeshi Yamamuro
Oh, looks nice. Thanks for sharing, Dongjoon. Bests, Takeshi On Sat, Dec 7, 2019 at 3:35 AM Dongjoon Hyun wrote: > Hi, All. > > I want to share the following change to the community. > > SPARK-30098 Use default datasource as provider for CREATE TABLE syntax > > This is merged today and

Re: Closing stale PRs with a GitHub Action

2019-12-06 Thread Hyukjin Kwon
lol, how did you know I was going to read this email, Sean? When I manually identified the stale PRs, I used the conditions below: 1. Author's inactivity for over a year. If a PR was simply waiting for a review, I excluded it from the stale PR list. 2. Ping one time and see if there are any updates
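The first condition above can be sketched as a small check. This is a hypothetical illustration only: the `PullRequest` record, its field names, and the 365-day reading of "over a year" are assumptions, not a GitHub API or the script actually used. The second condition (ping once and wait) is cut off in this archive snippet, so it is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "over a year" of author inactivity is read as more than 365 days.
STALE_AFTER = timedelta(days=365)


@dataclass
class PullRequest:
    """Hypothetical record for illustration; not a GitHub API type."""
    number: int
    last_author_activity: datetime
    waiting_for_review: bool


def is_stale(pr: PullRequest, now: datetime) -> bool:
    """Condition 1: the author has been inactive for over a year.

    PRs that were simply waiting for a review are excluded from the stale list.
    """
    if pr.waiting_for_review:
        return False
    return now - pr.last_author_activity > STALE_AFTER
```

For example, `is_stale(PullRequest(1, datetime(2018, 1, 1), False), datetime(2019, 12, 6))` evaluates to `True`, while the same PR with `waiting_for_review=True` is never flagged.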

Re: Is it feasible to build and run Spark on Windows?

2019-12-06 Thread Ping Liu
Hi Deepak, Following your suggestion, I put an exclusion of guava in the topmost POM (directly under the Spark home) as follows. <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-common</artifactId> <version>3.2.1</version> <exclusions> <exclusion> <groupId>com.google.guava</groupId>

FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2019-12-06 Thread Dongjoon Hyun
Hi, All. I want to share the following change to the community. SPARK-30098 Use default datasource as provider for CREATE TABLE syntax This is merged today and now Spark's `CREATE TABLE` is using Spark's default data sources instead of `hive` provider. This is a good and big improvement for
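The change described above can be pictured with a toy model of provider selection. This is a sketch under stated assumptions, not Spark's parser or catalog code: `resolve_provider` and its `legacy_hive_default` flag are hypothetical names introduced here. The one real name is `spark.sql.sources.default`, Spark's setting for the default data source, whose default value is `parquet`.

```python
from typing import Dict, Optional

# Real Spark setting name; "parquet" is its default value.
DEFAULT_SOURCE_KEY = "spark.sql.sources.default"


def resolve_provider(
    using_clause: Optional[str],
    conf: Dict[str, str],
    legacy_hive_default: bool = False,  # hypothetical flag modeling the old behavior
) -> str:
    """Pick the provider for a CREATE TABLE statement in this toy model."""
    if using_clause:
        # An explicit `USING <provider>` clause always wins.
        return using_clause
    if legacy_hive_default:
        # Behavior before the change: no USING clause meant the `hive` provider.
        return "hive"
    # Behavior after the change: fall back to the default data source.
    return conf.get(DEFAULT_SOURCE_KEY, "parquet")
```

With an empty configuration, `resolve_provider(None, {})` yields `parquet`, whereas `resolve_provider(None, {}, legacy_hive_default=True)` yields `hive`.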

Re: Closing stale PRs with a GitHub Action

2019-12-06 Thread Sean Owen
We used to not be able to close PRs directly, but now we can, so I assume this is as fine a way of doing so, if we want to. I don't think there's a policy against it or anything. Hyukjin, how have you managed this one in the past? I don't mind it being automated if the idle time is long and it

Re: Closing stale PRs with a GitHub Action

2019-12-06 Thread Nicholas Chammas
That's true, we do use Actions today. I wonder if Apache Infra allows Actions to close PRs vs. just updating commit statuses. I only ask because I remember permissions were an issue in the past when discussing tooling like this. In any case, I'd be happy to submit a PR adding this in if there are

Re: Closing stale PRs with a GitHub Action

2019-12-06 Thread Sean Owen
I think we can add Actions, right? They're used for the newer tests in GitHub? I'm OK closing PRs inactive for a 'long time', where that's maybe 6-12 months or something. It's standard practice and doesn't mean it can't be reopened. Often the related JIRA should be closed as well but we have done

Re: Enabling fully disaggregated shuffle on Spark

2019-12-06 Thread Li Hao
Agree with Bo's idea that the MapStatus could be a more generalized concept, not necessarily bound to BlockManager/Executor. As I understand it, the MapStatus is used to track/record the output data location of a map task, created by the shuffle writer, used by the shuffle reader for finding
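One way to picture a MapStatus that is not tied to an executor is to make the location a pluggable value. The sketch below is purely illustrative: every class, field, and the example URI are hypothetical, and none of this is Spark's actual `MapStatus` or `MapOutputTracker` API. The writer side registers where a map task's output lives, and the reader side looks up the locations that hold data for its reduce partition.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class ExecutorLocation:
    """Output served by an executor (the BlockManager-style case)."""
    executor_id: str
    host: str
    port: int


@dataclass(frozen=True)
class RemoteStorageLocation:
    """Output stored outside any executor, e.g. in a remote shuffle store."""
    uri: str


Location = Union[ExecutorLocation, RemoteStorageLocation]


@dataclass(frozen=True)
class MapStatus:
    """Where one map task's output lives, plus bytes per reduce partition."""
    location: Location
    partition_sizes: Tuple[int, ...]


class MapOutputRegistry:
    """Toy registry: shuffle writers register, shuffle readers look up."""

    def __init__(self) -> None:
        self._statuses: Dict[Tuple[int, int], MapStatus] = {}

    def register(self, shuffle_id: int, map_id: int, status: MapStatus) -> None:
        self._statuses[(shuffle_id, map_id)] = status

    def locations_for(self, shuffle_id: int, reduce_id: int) -> List[Tuple[int, Location]]:
        """Return (map_id, location) pairs holding non-empty data for reduce_id."""
        found = [
            (map_id, status.location)
            for (sid, map_id), status in self._statuses.items()
            if sid == shuffle_id and status.partition_sizes[reduce_id] > 0
        ]
        return sorted(found, key=lambda pair: pair[0])
```

Because the reader only sees a `Location`, it can fetch from an executor or from external storage without the registry caring which one produced the entry.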

Re: DataSourceWriter V2 Api questions

2019-12-06 Thread Jungtaek Lim
Yeah, they are very tricky and have to be integrated with Spark's checkpoint mechanism as well - I guess that's why this mail thread went quiet after some time. Along with these questions, there might also be some edge cases which we have to deal with in the 2PC approach: suppose a batch got into