Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-06-06 Zoltan Ivanfi
Hi Michael, To answer this, I think we should distinguish between the long-term fix and the short-term fix. If I understand the replies correctly, everyone agrees that the desired long-term fix is to have two separate SQL types (TIMESTAMP [WITH|WITHOUT] TIME ZONE). Because of having separate types, ...
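
A minimal java.time sketch of the two semantics in question (an editor's illustration, not code from the thread; the object and variable names are made up):

  import java.time.{Instant, LocalDateTime, ZoneId}

  object TwoTimestampSemantics extends App {
    // TIMESTAMP WITH TIME ZONE-style (instant) semantics: one fixed point on
    // the global timeline, rendered differently in each observer's zone.
    val instant = Instant.parse("2017-06-06T12:00:00Z")
    println(instant.atZone(ZoneId.of("UTC")))                 // 2017-06-06T12:00Z[UTC]
    println(instant.atZone(ZoneId.of("America/Los_Angeles"))) // 2017-06-06T05:00-07:00[America/Los_Angeles]

    // TIMESTAMP WITHOUT TIME ZONE-style (local) semantics: a pure wall-clock
    // reading that stays the same no matter where it is displayed.
    val wallClock = LocalDateTime.parse("2017-06-06T12:00:00")
    println(wallClock)                                        // 2017-06-06T12:00, everywhere
  }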

Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-06-02 Michael Allman
Hi Zoltan, I don't fully understand your proposal for table-specific timestamp type semantics. I think it will be helpful to everyone in this conversation if you can identify the expected behavior for a few concrete scenarios. Suppose we have a Hive metastore table hivelogs with a column named ...

Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-06-02 Zoltan Ivanfi
Hi, We would like to solve the problem of interoperability of existing data, and that is the main use case for having table-level control. Spark should be able to read timestamps written by Impala or Hive and at the same time read back its own data. These have different semantics, so having a ...

Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-06-01 Reynold Xin
Yea, I don't see why this needs to be a per-table config. If the user wants to configure it per table, can't they just declare the data type on a per-table basis, once we have separate types for timestamp w/ tz and w/o tz? On Thu, Jun 1, 2017 at 4:14 PM, Michael Allman wrote: ...

Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-06-01 Michael Allman
I would suggest that making timestamp type behavior configurable and persisted per-table could introduce some real confusion, e.g. in queries involving tables with different timestamp type semantics. I suggest starting with the assumption that timestamp type behavior is a per-session flag that ...
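
For context, SPARK-18350 itself is about a session-local time zone setting (spark.sql.session.timeZone). The per-session semantics flag proposed above does not exist; the sketch below only illustrates what session-scoped behavior looks like, i.e. one stored instant rendering differently as the session zone changes:

  import java.sql.Timestamp
  import java.time.Instant
  import org.apache.spark.sql.SparkSession

  object SessionTimeZoneDemo extends App {
    val spark = SparkSession.builder().master("local[*]").appName("session-tz").getOrCreate()
    import spark.implicits._

    // One fixed instant, stored once.
    val df = Seq(Timestamp.from(Instant.parse("2017-06-01T12:00:00Z"))).toDF("ts")

    // The same stored value is rendered differently as the session zone changes.
    spark.conf.set("spark.sql.session.timeZone", "UTC")
    df.show(false) // 2017-06-01 12:00:00

    spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
    df.show(false) // 2017-06-01 05:00:00
  }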

Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-05-30 Zoltan Ivanfi
Hi, If I remember correctly, the TIMESTAMP type had UTC-normalized local time semantics even before Spark 2, so I can understand that Spark considers it to be the "established" behavior that must not be broken. Unfortunately, this behavior does not provide interoperability with other SQL engines ...
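
A self-contained sketch (an editor's illustration) of what UTC normalization means and why it hurts interoperability: the same wall clock yields different stored epoch offsets depending on the writer's zone, so an engine with local semantics will read back a shifted value:

  import java.time.{LocalDateTime, ZoneId}

  object UtcNormalization extends App {
    // The wall clock the user typed...
    val wallClock = LocalDateTime.parse("2017-05-30T10:00:00")

    // ...is converted to an offset from the UTC epoch using the writer's
    // zone, so the stored number depends on where the data was written.
    val storedFromParis = wallClock.atZone(ZoneId.of("Europe/Paris")).toInstant.toEpochMilli
    val storedFromLA    = wallClock.atZone(ZoneId.of("America/Los_Angeles")).toInstant.toEpochMilli

    // Same wall clock, stored values nine hours apart (UTC+2 vs. UTC-7 in May).
    println((storedFromLA - storedFromParis) / 3600000L) // 9
  }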

Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-05-27 Imran Rashid
I had asked Zoltan to bring this discussion to the dev list because I think it's a question that extends beyond a single JIRA (we can't figure out the semantics of timestamp in Parquet if we don't know the overall goal of the timestamp type), and since it's a design question the entire community ...

Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-05-26 Reynold Xin
That's just my point 4, isn't it? On Fri, May 26, 2017 at 1:07 AM, Ofir Manor wrote: > Reynold, > my point is that Spark should aim to follow the SQL standard instead of > rolling its own type system. > If I understand correctly, the existing implementation is similar to ...

Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-05-25 Ofir Manor
Reynold, my point is that Spark should aim to follow the SQL standard instead of rolling its own type system. If I understand correctly, the existing implementation is similar to the TIMESTAMP WITH LOCAL TIME ZONE data type in Oracle. In addition, there are the standard TIMESTAMP and TIMESTAMP WITH ...
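
A rough java.time analogy for the three flavors mentioned here (an editor's mapping, for illustration only; not an API of Spark or Oracle):

  import java.time.{Instant, LocalDateTime, OffsetDateTime, ZoneId}

  object SqlTimestampFlavors extends App {
    // TIMESTAMP (WITHOUT TIME ZONE): a pure wall clock, no zone attached.
    val plain: LocalDateTime = LocalDateTime.parse("2017-05-25T09:00:00")

    // TIMESTAMP WITH TIME ZONE: a wall clock plus an explicit offset.
    val withTz: OffsetDateTime = OffsetDateTime.parse("2017-05-25T09:00:00+02:00")

    // Oracle-style TIMESTAMP WITH LOCAL TIME ZONE: normalized to an instant
    // on write, rendered in the reader's session zone on display.
    val normalized: Instant = withTz.toInstant
    println(normalized.atZone(ZoneId.of("America/Los_Angeles"))) // 2017-05-25T00:00-07:00[America/Los_Angeles]
  }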

Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-05-25 Zoltan Ivanfi
Hi, Ofir, thanks for your support. My understanding is that many users have the same problem as you do. Reynold, thanks for your reply and sorry for the confusion. My personal e-mail was specifically about your concerns regarding SPARK-12297 and I started this separate thread because this is ...

Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-05-25 Reynold Xin
Zoltan, Thanks for raising this again, although I'm a bit confused, since I've communicated with you a few times on JIRA and in private emails to explain that you have some misunderstanding of the timestamp type in Spark and some of your statements are wrong (e.g. the "except text file" part). Not ...

Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-05-25 Ofir Manor
Hi Zoltan, thanks for bringing this up, this is really important to me! Personally, as a user developing an app on top of Spark and other tools, the current timestamp semantics have been a source of some pain: needing to undo Spark's "auto-correcting" of timestamps. It would be really great if we ...
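
One common shape of that manual undo, sketched with Spark's to_utc_timestamp; the data and the assumed writer zone below are hypothetical:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.to_utc_timestamp

  object UndoAutoCorrection extends App {
    val spark = SparkSession.builder().master("local[*]").appName("undo-shift").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for timestamps that another engine wrote with
    // local semantics and that Spark then reinterprets as instants.
    val df = Seq("2017-05-25 09:30:00").toDF("raw").selectExpr("cast(raw as timestamp) as ts")

    // Shift the value back by hand, assuming the writer's zone is known.
    df.select(to_utc_timestamp($"ts", "America/Los_Angeles").as("ts_undone")).show(false)
  }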

SQL TIMESTAMP semantics vs. SPARK-18350

2017-05-24 Zoltan Ivanfi
Hi, Sorry if you receive this mail twice, it seems that my first attempt did not make it to the list for some reason. I would like to start a discussion about SPARK-18350 before it gets released because it seems to be going in a different ...