Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-12 Thread Jungtaek Lim
Never mind, forget about the dead code. Turned out that reverting "via manual" can be very easily done - remove config and apply to the tests -> remove field -> remove the changes into docs. It was considered as non-trivial because we only consider about "git revert" but there's no strict rule to

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-12 Thread Russell Spitzer
I think that the dead code approach, while a bit unpalatable and worse than reverting, is probably better than leaving the parameter (even if it is hidden) On Tue, May 12, 2020 at 12:46 PM Ryan Blue wrote: > +1 for the approach Jungtaek suggests. That will avoid needing to support > behavior

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-12 Thread Ryan Blue
+1 for the approach Jungtaek suggests. That will avoid needing to support behavior that is not well understood with minimal changes. On Tue, May 12, 2020 at 1:45 AM Jungtaek Lim wrote: > Before I forget, we'd better not forget to change the doc, as create table > doc looks to represent current

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-12 Thread Jungtaek Lim
Before I forget, we'd better not forget to change the doc, as create table doc looks to represent current syntax which will be incorrect later. On Tue, May 12, 2020 at 5:32 PM Jungtaek Lim wrote: > It's not only for end users, but also for us. Spark itself uses the config > "true" and "false"

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-12 Thread Jungtaek Lim
It's not only for end users, but also for us. Spark itself uses the config "true" and "false" in tests and it still brings confusion. We still have to deal with both situations. I'm wondering how long days it would be needed to revert it cleanly, but if we worry about the amount of code change

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-11 Thread Wenchen Fan
SPARK-30098 was merged about 6 months ago. It's not a clean revert and we may need to spend quite a bit of time to resolve conflicts and fix tests. I don't see why it's still a problem if a feature is disabled and hidden from end-users (it's undocumented, the config is internal). The related code

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-11 Thread Ryan Blue
I'm all for getting the unified syntax into master. The only issue appears to be whether or not to pass the presence of the EXTERNAL keyword through to a catalog in v2. Maybe it's time to start a discuss thread for that issue so we're not stuck for another 6 weeks on it. On Mon, May 11, 2020 at

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-11 Thread Jungtaek Lim
Btw another wondering here is, is it good to retain the flag on master as an intermediate step? Wouldn't it be better for us to start "unified create table syntax" from scratch? On Tue, May 12, 2020 at 6:50 AM Jungtaek Lim wrote: > I'm sorry, but I have to agree with Ryan and Russell. I chose

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-11 Thread Jungtaek Lim
I'm sorry, but I have to agree with Ryan and Russell. I chose the option 1 because it's less worse than option 2, but it doesn't mean I fully agree with option 1. Let's make below things clear if we really go with option 1, otherwise please consider reverting it. * Do you fully indicate about

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-11 Thread Russell Spitzer
I think reverting 30098 is the right decision here if we want to unblock 3.0. We shouldn't ship with features which we know do not function in the way we intend, regardless of how little exposure most users have to them. Even if it's off my default, we should probably work to avoid switches that

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-11 Thread Ryan Blue
I'm all for fixing behavior in master by turning this off as an intermediate step, but I don't think that Spark 3.0 can safely include SPARK-30098. The problem is that SPARK-30098 introduces strange behavior, as Jungtaek pointed out. And that behavior is not fully understood. While working on a

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-11 Thread JackyLee
+1. Agree with Xiao Li and Jungtaek Lim. This seems to be controversial, and can not be done in a short time. It is necessary to choose option 1 to unblock Spark 3.0 and support it in 3.1. -- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-11 Thread Xiao Li
> > 1. Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default, > which effectively revert SPARK-30098. The CREATE TABLE syntax is still > confusing but it's the same as 2.4 > 2. Do not support the v2 CreateTable command if STORE AS/BY or EXTERNAL is > specified. This gives us more

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-10 Thread Jungtaek Lim
Let's focus on how to unblock Spark 3.0.0 for now, as other blockers are getting resolved. I'm in favor of option 1 to avoid bring multiple backward incompatible changes. Unifying create table would bring backward incompatibility (I'd rather say the new syntax should be cleared up ignoring the

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-08 Thread Wenchen Fan
Hi all, I'd like to bring this up again to share the status and get more feedback. Currently, we all agree to unify the CREATE TABLE syntax by merging the native and Hive-style syntaxes. The unified CREATE TABLE syntax will become the native syntax and there is no Hive-style syntax anymore. This

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-25 Thread Jungtaek Lim
Thanks, filed SPARK-31257 . Thanks again for looking into this - I'll take a look whenever I get time sooner. On Thu, Mar 26, 2020 at 8:06 AM Ryan Blue wrote: > Feel free to open another issue, I just used that one since it describes > this and

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-25 Thread Ryan Blue
Feel free to open another issue, I just used that one since it describes this and doesn't appear to be done. On Wed, Mar 25, 2020 at 4:03 PM Jungtaek Lim wrote: > UPDATE: Sorry I just missed the PR ( > https://github.com/apache/spark/pull/28026). I still think it'd be nice > to avoid recycling

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-25 Thread Ryan Blue
Here's a WIP PR with the basic changes: https://github.com/apache/spark/pull/28026 I still need to update tests in that branch and add the conversions to the old Hive plans. But at least you can see how the parser part works and how I'm converting the extra clauses for DSv2. This also enables us

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-25 Thread Jungtaek Lim
UPDATE: Sorry I just missed the PR ( https://github.com/apache/spark/pull/28026). I still think it'd be nice to avoid recycling the JIRA issue which was resolved before. Shall we have a new JIRA issue with linking to SPARK-30098, and set proper priority? On Thu, Mar 26, 2020 at 7:59 AM Jungtaek

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-25 Thread Jungtaek Lim
Would it be better to prioritize this to make sure the change is included in Spark 3.0? (Maybe filing an issue and set as a blocker) Looks like there's consensus that SPARK-30098 brought ambiguous issue which should be fixed (though the consideration of severity seems to be different), and once

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-24 Thread Wenchen Fan
Hi Ryan, It's great to hear that you are cleaning up this long-standing mess. Please let me know if you hit any problems that I can help with. Thanks, Wenchen On Sat, Mar 21, 2020 at 3:16 AM Nicholas Chammas wrote: > On Thu, Mar 19, 2020 at 3:46 AM Wenchen Fan wrote: > >> 2. PARTITIONED BY

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-20 Thread Nicholas Chammas
On Thu, Mar 19, 2020 at 3:46 AM Wenchen Fan wrote: > 2. PARTITIONED BY colTypeList: I think we can support it in the unified > syntax. Just make sure it doesn't appear together with PARTITIONED BY > transformList. > Another side note: Perhaps as part of (or after) unifying the CREATE TABLE

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-19 Thread Jungtaek Lim
Anything would be OK if the create table DDL provides a "clear way" to expect the table provider "before" they run the query. Great news that it doesn't require major rework - looking forward to the PR. Thanks again to jump in and sort this out. - Jungtaek Lim (HeartSaVioR) On Fri, Mar 20, 2020

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-19 Thread Ryan Blue
I have an update to the parser that unifies the CREATE TABLE rules. It took surprisingly little work to get the parser updated to produce CreateTableStatement and CreateTableAsSelectStatement with the Hive info. And the only fields I need to add to those statements were serde: SerdeInfo and

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-19 Thread Wenchen Fan
Big +1 to have one single unified CREATE TABLE syntax. In general, we can say there are 2 ways to specify the table provider: USING clause and ROW FORMAT/STORED AS clause. These 2 ways are mutually exclusive. If none is specified, it implicitly indicates USING defaultSource . I'm fine with a few

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Jungtaek Lim
Thanks Nicholas for the side comment; you'll need to interpret "CREATE TABLE USING HIVE FORMAT" as CREATE TABLE using "HIVE FORMAT", but yes it may add the confusion. Ryan, thanks for the detailed analysis and proposal. That's what I would like to see in discussion thread. I'm open to solutions

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Nicholas Chammas
Side comment: The current docs for CREATE TABLE add to the confusion by describing the Hive-compatible command as "CREATE TABLE USING HIVE FORMAT", but neither

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Ryan Blue
Jungtaek, it sounds like you consider the two rules to be separate syntaxes with their own consistency rules. For example, if I am using the Hive syntax rule, then the PARTITIONED BY clause adds new (partition) columns and requires types for those columns; if I’m using the Spark syntax rule with

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Jungtaek Lim
I'm trying to understand the reason you have been suggesting to keep the real thing unchanged but change doc instead. Could you please elaborate why? End users would blame us when they hit the case their query doesn't work as intended (1) and found the fact it's undocumented (2) and hard to

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Wenchen Fan
The fact that we have 2 CREATE TABLE syntax is already confusing many users. Shall we only document the native syntax? Then users don't need to worry about which rule their query fits and they don't need to spend a lot of time understanding the subtle difference between these 2 syntaxes. On Wed,

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Jungtaek Lim
A bit correction: the example I provided for vice versa is not really a correct case for vice versa. It's actually same case (intended to use rule 2 which is not default) but different result. On Wed, Mar 18, 2020 at 7:22 PM Jungtaek Lim wrote: > My concern is that although we simply think

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Jungtaek Lim
My concern is that although we simply think about the change to mark "USING provider" to be optional in rule 1, but in reality the change is most likely swapping the default rule for CREATE TABLE, which was "rule 2", and now "rule 1" (it would be the happy case of migration doc if the swap happens

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Wenchen Fan
Document-wise, yes, it's confusing as a simple CREATE TABLE fits both native and Hive syntax. I'm fine with some changes to make it less confusing, as long as the user-facing behavior doesn't change. For example, define "ROW FORMAT" or "STORED AS" as mandatory only if the legacy config is false.

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Jungtaek Lim
Thanks for sharing your view. I agree with you it's good for Spark to promote Spark's own CREATE TABLE syntax. The thing is, we still leave Hive CREATE TABLE syntax unchanged - it's being said as "convenience" but I'm not sure I can agree with. I'll quote my comments in SPARK-31136 here again to

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Wenchen Fan
I think the general guideline is to promote Spark's own CREATE TABLE syntax instead of the Hive one. Previously these two rules are mutually exclusive because the native syntax requires the USING clause while the Hive syntax makes ROW FORMAT or STORED AS clause optional. It's a good move to make

[DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-15 Thread Jungtaek Lim
Hi devs, I'd like to initiate discussion and hear the voices for resolving ambiguous parser rule between two "create table"s being brought by SPARK-30098 [1]. Previously, "create table" parser rules were clearly distinguished via "USING provider", which was very intuitive and deterministic. Say,