Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-09 Thread Marco Gaido
Jörn, could you explain your proposal a bit more, please? We are not
modifying the existing decimal datatype; this is how it works now. If you
check the PR, the only difference is how we compute the result for the
division operation. The discussion about precision and scale is about
whether we should limit them more than we do now. Currently we support any
scale <= precision and any precision in the range (1, 38].
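For readers less familiar with the rules under discussion, here is a rough
Python sketch of the division result-type formula Spark currently derives
from Hive's decimal rules (the capping/precision-loss handling at the
38-digit maximum is omitted); this is an illustration, not the actual
implementation:

```python
from dataclasses import dataclass

# Sketch of the current result-type rule for decimal division, following
# the Hive-derived formulas:
#   scale     = max(6, s1 + p2 + 1)
#   precision = p1 - s1 + s2 + scale
# Capping at the maximum precision of 38 is omitted here.

@dataclass(frozen=True)
class DecimalType:
    precision: int
    scale: int

def division_result_type(d1: DecimalType, d2: DecimalType) -> DecimalType:
    scale = max(6, d1.scale + d2.precision + 1)
    precision = d1.precision - d1.scale + d2.scale + scale
    return DecimalType(precision, scale)
```

For example, decimal(5, 2) / decimal(3, 1) yields scale max(6, 2+3+1) = 6
and precision 5 - 2 + 1 + 6 = 10. The PR adjusts how this result is
computed when negative scales are involved.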

On Wed, Jan 9, 2019 at 09:13, Jörn Franke wrote:

> Maybe it is better to introduce a new datatype that supports negative
> scale; otherwise the migration and testing effort for organizations
> running Spark applications becomes too large. Of course, the current
> decimal type would be kept as it is.


Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-09 Thread Jörn Franke
Maybe it is better to introduce a new datatype that supports negative
scale; otherwise the migration and testing effort for organizations running
Spark applications becomes too large. Of course, the current decimal type
would be kept as it is.

> On Jan 7, 2019, at 15:08, Marco Gaido wrote:
> 
> In general we can say that some data sources allow them, others fail. At
> the moment, we are doing no casting before writing (so we can state this
> in the doc). But since there is an ongoing discussion for DSv2, we can
> maybe add a flag/interface there for "negative scale intolerant" data
> sources and try to cast before writing to them. What do you think about
> this?


Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-09 Thread Marco Gaido
Oracle does the same: "The *scale* must be less than or equal to the
precision." (see
https://docs.oracle.com/javadb/10.6.2.1/ref/rrefsqlj15260.html).
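As a minimal illustration of the constraint stated earlier in the thread
(any scale <= precision, precision at most 38) versus the stricter rule
Oracle documents, here is a sketch assuming the bounds as described in this
discussion:

```python
MAX_PRECISION = 38  # Spark's maximum decimal precision

def is_valid_spark_decimal(precision: int, scale: int) -> bool:
    # Current rule per this thread: the scale may be negative,
    # as long as it does not exceed the precision.
    return 1 <= precision <= MAX_PRECISION and scale <= precision
```

The Oracle/Java DB rule quoted above is the `scale <= precision` part;
forbidding negative scales would add a `scale >= 0` clause on top of it.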

On Wed, Jan 9, 2019 at 05:31, Wenchen Fan wrote:

> Some more thoughts: if we support unlimited negative scale, why can't we
> support unlimited positive scale? E.g. 0.0001 can be decimal(1, 4)
> instead of decimal(4, 4). I think we need more references here: how do
> other databases deal with the decimal type and parse decimal literals?


Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-08 Thread Wenchen Fan
Some more thoughts: if we support unlimited negative scale, why can't we
support unlimited positive scale? E.g. 0.0001 can be decimal(1, 4) instead
of decimal(4, 4). I think we need more references here: how do other
databases deal with the decimal type and parse decimal literals?
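Python's decimal module illustrates the distinction raised here: the
literal 0.0001 carries only one significant digit, so its minimal type is
decimal(1, 4) rather than decimal(4, 4). A small sketch, using SQL's
convention that the scale is the negated exponent:

```python
from decimal import Decimal

def literal_precision_scale(text: str) -> tuple[int, int]:
    # precision = number of significant digits in the unscaled value;
    # scale = digits to the right of the point (negative for exponents > 0).
    t = Decimal(text).as_tuple()
    return (len(t.digits), -t.exponent)

literal_precision_scale("0.0001")  # (1, 4): one digit, scale 4
literal_precision_scale("1E+36")   # (1, -36): a negative-scale literal
```

The same function shows how a literal such as 1E+36 naturally comes out
with a negative scale when parsed this way.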

On Mon, Jan 7, 2019 at 10:36 PM Wenchen Fan  wrote:

> I'm OK with it, i.e. fail the write if there are negative-scale decimals
> (we need to document it though). We can improve it later in data source v2.


Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-07 Thread Wenchen Fan
I'm OK with it, i.e. fail the write if there are negative-scale decimals
(we need to document it though). We can improve it later in data source v2.

On Mon, Jan 7, 2019 at 10:09 PM Marco Gaido  wrote:

> In general we can say that some data sources allow them, others fail. At
> the moment, we are doing no casting before writing (so we can state this
> in the doc). But since there is an ongoing discussion for DSv2, we can
> maybe add a flag/interface there for "negative scale intolerant" data
> sources and try to cast before writing to them. What do you think about
> this?


Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-07 Thread Marco Gaido
In general we can say that some data sources allow them, others fail. At
the moment, we are doing no casting before writing (so we can state this in
the doc). But since there is an ongoing discussion for DSv2, we can maybe
add a flag/interface there for "negative scale intolerant" data sources and
try to cast before writing to them. What do you think about this?

On Mon, Jan 7, 2019 at 15:03, Wenchen Fan wrote:

> AFAIK the Parquet spec says a decimal scale can't be negative. If we want
> to officially support negative-scale decimals, we should clearly define
> the behavior when writing them to Parquet and other data sources. The
> most straightforward way is to fail in this case, but maybe we can do
> something better, like casting decimal(1, -20) to decimal(20, 0) before
> writing.


Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-07 Thread Wenchen Fan
AFAIK the Parquet spec says a decimal scale can't be negative. If we want
to officially support negative-scale decimals, we should clearly define the
behavior when writing them to Parquet and other data sources. The most
straightforward way is to fail in this case, but maybe we can do something
better, like casting decimal(1, -20) to decimal(20, 0) before writing.
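The cast sketched here can be seen with Python's decimal module. Note
that, in general, a decimal(p, s) with negative s needs up to p - s digits
at scale 0: decimal(1, -20) can hold 9 * 10**20, which has 21 digits, so
the target precision has to be chosen accordingly.

```python
from decimal import Decimal, localcontext

def rescale_to_zero(value: Decimal) -> Decimal:
    # Rewrite a negative-scale value at scale 0 before writing it out.
    with localcontext() as ctx:
        ctx.prec = 50  # enough room for the rescaled digits
        return value.quantize(Decimal(1))

v = rescale_to_zero(Decimal("9E+20"))
# v == 900000000000000000000: 21 digits at scale 0
```

This is only an illustration of the rescaling arithmetic, not of any
Spark write path.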

On Mon, Jan 7, 2019 at 9:32 PM Marco Gaido  wrote:

> Hi Wenchen,
>
> thanks for your email. I agree on adding a doc for the decimal type, but
> I am not sure what you mean regarding the behavior when writing: we are
> not performing any automatic casting before writing; if we want to do
> that, I think we need a design for it.
>
> I am not sure it makes sense to set a minimum for it. That would break
> backward compatibility (for a very weird use case), so I wouldn't do that.
>
> Thanks,
> Marco


Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-07 Thread Marco Gaido
Hi Wenchen,

thanks for your email. I agree on adding a doc for the decimal type, but I
am not sure what you mean regarding the behavior when writing: we are not
performing any automatic casting before writing; if we want to do that, I
think we need a design for it.

I am not sure it makes sense to set a minimum for it. That would break
backward compatibility (for a very weird use case), so I wouldn't do that.

Thanks,
Marco

On Mon, Jan 7, 2019 at 05:53, Wenchen Fan wrote:

> I think we need to do this for backward compatibility, and according to
> the discussion in the doc, the SQL standard allows negative scale.
>
> To do this, I think the PR should also include a doc for the decimal
> type, like the definition of precision and scale (this one
> <https://stackoverflow.com/questions/35435691/bigdecimal-precision-and-scale>
> looks pretty good), the result type of decimal operations, and the
> behavior when writing out decimals (e.g. we can cast decimal(1, -20) to
> decimal(20, 0) before writing).
>
> Another question is: shall we set a minimum scale? E.g., shall we allow
> decimal(1, -1000)?


Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-06 Thread Wenchen Fan
I think we need to do this for backward compatibility, and according to
the discussion in the doc, the SQL standard allows negative scale.

To do this, I think the PR should also include a doc for the decimal type,
like the definition of precision and scale (this one
<https://stackoverflow.com/questions/35435691/bigdecimal-precision-and-scale>
looks pretty good), the result type of decimal operations, and the behavior
when writing out decimals (e.g. we can cast decimal(1, -20) to
decimal(20, 0) before writing).

Another question is: shall we set a minimum scale? E.g., shall we allow
decimal(1, -1000)?
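For readers following along, the definition referenced above boils down to
value = unscaled * 10**(-scale), so a negative scale simply shifts the
unscaled digits to the left. A quick sketch with Python's decimal module:

```python
from decimal import Decimal

def make_decimal(unscaled: int, scale: int) -> Decimal:
    # value = unscaled * 10**(-scale); a negative scale shifts left.
    return Decimal(unscaled).scaleb(-scale)

make_decimal(12345, 3)  # Decimal('12.345'): precision 5, scale 3
make_decimal(1, -20)    # Decimal('1E+20'): precision 1, scale -20
make_decimal(1, -1000)  # the extreme case questioned above
```

This also makes the minimum-scale question concrete: nothing in the
representation itself stops the scale from going arbitrarily negative.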

On Thu, Oct 25, 2018 at 9:49 PM Marco Gaido  wrote:

> Hi all,
>
> a bit more than one month ago, I sent a proposal for properly handling
> decimals with negative scales in our operations. This is a long-standing
> problem in our codebase, as we derived our rules from Hive and SQL
> Server, where negative scales are forbidden, while in Spark they are not.
>
> The discussion has been stale for a while now. No more comments on the
> design doc:
> https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit#heading=h.x7062zmkubwm
> .
>
> So I am writing this e-mail to check whether there are more comments on
> it or whether we can go ahead with the PR.
>
> Thanks,
> Marco
>


Re: Decimals with negative scale

2018-12-19 Thread Marco Gaido
That is feasible; the main point is that negative scales were not really
meant to be there in the first place, so forbidding them was simply
overlooked, and the DBs we are drawing our inspiration from for decimals
(mainly SQL Server) do not support them.
Honestly, my opinion on this topic is:
 - let's add support for negative scales in the operations (I have already
a PR out for that, https://github.com/apache/spark/pull/22450);
 - let's reduce our usage of DECIMAL in favor of DOUBLE when parsing
literals, as done by Hive, Presto, DB2, ...; so the number of cases where
we deal with negative scales is anyway small (and we do not have issues
with data sources which don't support them).
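The trade-off behind the second point can be sketched with plain Python
numbers: parsing a literal like 1e36 as DOUBLE always yields a (possibly
approximate) representable value, whereas as DECIMAL with a non-negative
scale it sits near the 38-digit precision cap, and a product like
1e36 * 1e36 exceeds it entirely:

```python
MAX_PRECISION = 38  # Spark's maximum decimal precision

as_double = float("1e36")        # approximate but always representable

digits_1e36 = len(str(10**36))   # 37 digits needed at scale 0
digits_1e72 = len(str(10**72))   # 73 digits: overflows decimal(38, 0),
                                 # but fits decimal(1, -72) if negative
                                 # scales are allowed
```

So parsing large literals as DOUBLE sidesteps most of the situations in
which a negative-scale decimal would arise in the first place.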

Thanks,
Marco


On Tue, Dec 18, 2018 at 19:08, Reynold Xin wrote:

> So why can't we just do validation to fail sources that don't support
> negative scale, if it is not supported? This way, we don't need to break
> backward compatibility in any way, and it becomes a strict improvement.


Re: Decimals with negative scale

2018-12-18 Thread Reynold Xin
So why can't we just do validation to fail sources that don't support
negative scale, if it is not supported? This way, we don't need to break
backward compatibility in any way, and it becomes a strict improvement.
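The suggestion amounts to a per-source check at analysis time. Here is a
hypothetical sketch: the DecimalField shape and the
supports_negative_scale flag are illustrative assumptions for this thread,
not Spark or DSv2 APIs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecimalField:
    # Hypothetical stand-in for a decimal column in a write schema.
    name: str
    precision: int
    scale: int

def validate_write(fields: list[DecimalField],
                   supports_negative_scale: bool) -> None:
    # Hypothetical analysis-time validation: fail the write up front when
    # the target source declares it cannot handle negative-scale decimals.
    if supports_negative_scale:
        return
    for f in fields:
        if f.scale < 0:
            raise ValueError(
                f"column '{f.name}' has type decimal({f.precision}, "
                f"{f.scale}); negative scale is not supported by this "
                "data source"
            )
```

Sources that do support negative scales are untouched, which is why this
reads as a strict improvement over forbidding them everywhere.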

On Tue, Dec 18, 2018 at 8:43 AM, Marco Gaido wrote:

> 
> This is at analysis time.
> 

Re: Decimals with negative scale

2018-12-18 Thread Marco Gaido
This is at analysis time.

On Tue, 18 Dec 2018, 17:32, Reynold Xin wrote:

> Is this an analysis time thing or a runtime thing?
> On Tue, Dec 18, 2018 at 7:45 AM Marco Gaido 
> wrote:
>
>> Hi all,
>>
>> as you may remember, there was a design doc to support operations
>> involving decimals with negative scales. After the discussion in the design
>> doc, now the related PR is blocked because for 3.0 we have another option
>> which we can explore, ie. forbidding negative scales. This is probably a
>> cleaner solution, as most likely we didn't want negative scales, but it is
>> a breaking change: so we wanted to check the opinion of the community.
>>
>> Getting to the topic, here are the two options:
>> *- Forbidding negative scales*
>>   Pros: many sources do not support negative scales (so they can create
>> issues); they were something which was not considered as possible in the
>> initial implementation, so we get to a more stable situation.
>>   Cons: some operations which were supported earlier won't work
>> anymore. E.g., since our max precision is 38, if the scale cannot be
>> negative, 1e36 * 1e36 would cause an overflow, while it currently works
>> fine (producing a decimal with negative scale); it is basically impossible
>> to create a config which controls this behavior.
>>
>> *- Handling negative scales in operations*
>>   Pros: no regressions; we support all the operations we supported on 2.x.
>>   Cons: negative scales can cause issues in other situations, e.g. when
>> saving to a data source which doesn't support them.
>>
>> Looking forward to hearing your thoughts,
>> Thanks.
>> Marco
>>
>>
>>


Decimals with negative scale

2018-12-18 Thread Marco Gaido
Hi all,

as you may remember, there was a design doc to support operations involving
decimals with negative scales. After the discussion in the design doc, now
the related PR is blocked because for 3.0 we have another option which we
can explore, i.e. forbidding negative scales. This is probably a cleaner
solution, as most likely we didn't want negative scales, but it is a
breaking change, so we wanted to check the opinion of the community.

Getting to the topic, here are the two options:
*- Forbidding negative scales*
  Pros: many sources do not support negative scales (so they can create
issues); they were something which was not considered as possible in the
initial implementation, so we get to a more stable situation.
  Cons: some operations which were supported earlier won't work anymore.
E.g., since our max precision is 38, if the scale cannot be negative,
1e36 * 1e36 would cause an overflow, while it currently works fine
(producing a decimal with negative scale); it is basically impossible to
create a config which controls this behavior.

*- Handling negative scales in operations*
  Pros: no regressions; we support all the operations we supported on 2.x.
  Cons: negative scales can cause issues in other situations, e.g. when
saving to a data source which doesn't support them.
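To make the overflow argument above concrete, here is a small hand-written model (a sketch, not Spark code) of the result-type rule Spark applies to decimal multiplication, assuming the rule precision = p1 + p2 + 1, scale = s1 + s2 with a maximum precision of 38:

```python
# Sketch of Spark's decimal multiplication result-type rule; hand-rolled
# model for illustration, assuming precision = p1 + p2 + 1, scale = s1 + s2.
MAX_PRECISION = 38

def multiply_result_type(p1, s1, p2, s2):
    """Return (precision, scale, overflows) for decimal(p1, s1) * decimal(p2, s2)."""
    precision = p1 + p2 + 1
    scale = s1 + s2
    return precision, scale, precision > MAX_PRECISION

# With negative scales allowed, the literal 1e36 is decimal(1, -36):
print(multiply_result_type(1, -36, 1, -36))   # (3, -72, False): fits
# With negative scales forbidden, 1e36 must become decimal(37, 0):
print(multiply_result_type(37, 0, 37, 0))     # (75, 0, True): overflow
```

This is why forbidding negative scales is a breaking change: the same query that types cleanly today would exceed the 38-digit precision cap.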

Looking forward to hearing your thoughts,
Thanks.
Marco


[DISCUSS] Support decimals with negative scale in decimal operation

2018-10-25 Thread Marco Gaido
Hi all,

a bit more than one month ago, I sent a proposal for handling properly
decimals with negative scales in our operations. This is a long standing
problem in our codebase as we derived our rules from Hive and SQLServer
where negative scales are forbidden, while in Spark they are not.

The discussion has been stale for a while now, with no more comments on the
design doc:
https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit#heading=h.x7062zmkubwm

So I am writing this e-mail to check whether there are more comments on it
or whether we can go ahead with the PR.

Thanks,
Marco


Re: SPIP: support decimals with negative scale in decimal operation

2018-09-23 Thread Felix Cheung
DISCUSS thread is good to have...



From: Marco Gaido 
Sent: Friday, September 21, 2018 3:31 AM
To: Wenchen Fan
Cc: dev
Subject: Re: SPIP: support decimals with negative scale in decimal operation

Hi Wenchen,
Thank you for the clarification. I agree that this is more a bug fix than
an improvement. I apologize for the error. Please consider this as a
design doc.

Thanks,
Marco

On Fri, 21 Sep 2018 at 12:04, Wenchen Fan  wrote:
Hi Marco,

Thanks for sending it! The problem is clearly explained in this email, but I 
would not treat it as a SPIP. It proposes a fix for a very tricky bug, and SPIP 
is usually for new features. Others please correct me if I was wrong.

Thanks,
Wenchen

On Fri, Sep 21, 2018 at 5:47 PM Marco Gaido  wrote:
Hi all,

I am writing this e-mail in order to discuss the issue which is reported in 
SPARK-25454 and according to Wenchen's suggestion I prepared a design doc for 
it.

The problem we are facing here is that our rules for decimal operations are
taken from Hive and MS SQL Server, which explicitly don't support decimals
with negative scales. So the rules we currently have are not meant to deal
with negative scales. The issue is that Spark, instead, doesn't forbid
negative scales and - indeed - there are cases in which we produce them
(e.g. a SQL constant like 1e8 would be turned into a decimal(1, -8)).

Having negative scales most likely wasn't really intended. But unfortunately
getting rid of them would be a breaking change, as many operations working
fine currently would not be allowed anymore and would overflow (e.g. select
1e36 * 1). As such, this is something I'd definitely agree on doing, but I
think we can target it only for 3.0.

What we can start doing now, instead, is updating our rules to properly
handle the case when decimal scales are negative. From my investigation, it
turns out that the only operation which has problems with them is Divide.

Here you can find the design doc with all the details: 
https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit?usp=sharing.
 The document is also linked in SPARK-25454. There is also already a PR with 
the change: https://github.com/apache/spark/pull/22450.

Looking forward to hearing your feedback,
Thanks.
Marco


Re: SPIP: support decimals with negative scale in decimal operation

2018-09-21 Thread Marco Gaido
Hi Wenchen,
Thank you for the clarification. I agree that this is more a bug fix than
an improvement. I apologize for the error. Please consider this as a
design doc.

Thanks,
Marco

On Fri, 21 Sep 2018 at 12:04, Wenchen Fan  wrote:

> Hi Marco,
>
> Thanks for sending it! The problem is clearly explained in this email, but
> I would not treat it as a SPIP. It proposes a fix for a very tricky bug,
> and SPIP is usually for new features. Others please correct me if I was
> wrong.
>
> Thanks,
> Wenchen
>
> On Fri, Sep 21, 2018 at 5:47 PM Marco Gaido 
> wrote:
>
>> Hi all,
>>
>> I am writing this e-mail in order to discuss the issue which is reported
>> in SPARK-25454 and according to Wenchen's suggestion I prepared a design
>> doc for it.
>>
>> The problem we are facing here is that our rules for decimal operations
>> are taken from Hive and MS SQL Server, which explicitly don't support
>> decimals with negative scales. So the rules we currently have are not
>> meant to deal with negative scales. The issue is that Spark, instead,
>> doesn't forbid negative scales and - indeed - there are cases in which we
>> produce them (e.g. a SQL constant like 1e8 would be turned into a
>> decimal(1, -8)).
>>
>> Having negative scales most likely wasn't really intended. But
>> unfortunately getting rid of them would be a breaking change, as many
>> operations working fine currently would not be allowed anymore and would
>> overflow (e.g. select 1e36 * 1). As such, this is something I'd
>> definitely agree on doing, but I think we can target it only for 3.0.
>>
>> What we can start doing now, instead, is updating our rules to properly
>> handle the case when decimal scales are negative. From my investigation,
>> it turns out that the only operation which has problems with them is
>> Divide.
>>
>> Here you can find the design doc with all the details:
>> https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit?usp=sharing.
>> The document is also linked in SPARK-25454. There is also already a PR with
>> the change: https://github.com/apache/spark/pull/22450.
>>
>> Looking forward to hearing your feedback,
>> Thanks.
>> Marco
>>
>


Re: SPIP: support decimals with negative scale in decimal operation

2018-09-21 Thread Wenchen Fan
Hi Marco,

Thanks for sending it! The problem is clearly explained in this email, but
I would not treat it as a SPIP. It proposes a fix for a very tricky bug,
and SPIP is usually for new features. Others please correct me if I was
wrong.

Thanks,
Wenchen

On Fri, Sep 21, 2018 at 5:47 PM Marco Gaido  wrote:

> Hi all,
>
> I am writing this e-mail in order to discuss the issue which is reported
> in SPARK-25454 and according to Wenchen's suggestion I prepared a design
> doc for it.
>
> The problem we are facing here is that our rules for decimal operations
> are taken from Hive and MS SQL Server, which explicitly don't support
> decimals with negative scales. So the rules we currently have are not
> meant to deal with negative scales. The issue is that Spark, instead,
> doesn't forbid negative scales and - indeed - there are cases in which we
> produce them (e.g. a SQL constant like 1e8 would be turned into a
> decimal(1, -8)).
>
> Having negative scales most likely wasn't really intended. But
> unfortunately getting rid of them would be a breaking change, as many
> operations working fine currently would not be allowed anymore and would
> overflow (e.g. select 1e36 * 1). As such, this is something I'd
> definitely agree on doing, but I think we can target it only for 3.0.
>
> What we can start doing now, instead, is updating our rules to properly
> handle the case when decimal scales are negative. From my investigation,
> it turns out that the only operation which has problems with them is
> Divide.
>
> Here you can find the design doc with all the details:
> https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit?usp=sharing.
> The document is also linked in SPARK-25454. There is also already a PR with
> the change: https://github.com/apache/spark/pull/22450.
>
> Looking forward to hearing your feedback,
> Thanks.
> Marco
>


SPIP: support decimals with negative scale in decimal operation

2018-09-21 Thread Marco Gaido
Hi all,

I am writing this e-mail in order to discuss the issue which is reported in
SPARK-25454 and according to Wenchen's suggestion I prepared a design doc
for it.

The problem we are facing here is that our rules for decimal operations
are taken from Hive and MS SQL Server, which explicitly don't support
decimals with negative scales. So the rules we currently have are not
meant to deal with negative scales. The issue is that Spark, instead,
doesn't forbid negative scales and - indeed - there are cases in which we
produce them (e.g. a SQL constant like 1e8 would be turned into a
decimal(1, -8)).
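The 1e8 case can be reproduced with Python's decimal module (used here purely as an illustration; Spark itself builds on java.math.BigDecimal, which has the same semantics): the coefficient is the single digit 1 and the exponent is +8, which in SQL terms means precision 1 and scale -8, since SQL scale is the negated exponent.

```python
from decimal import Decimal

# The literal 1e8 parses to coefficient 1 with exponent +8.
d = Decimal("1e8")
sign, digits, exponent = d.as_tuple()
precision = len(digits)   # number of coefficient digits
sql_scale = -exponent     # SQL scale is the negated exponent

print(precision, sql_scale)   # 1 -8, i.e. decimal(1, -8)
```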

Having negative scales most likely wasn't really intended. But
unfortunately getting rid of them would be a breaking change, as many
operations working fine currently would not be allowed anymore and would
overflow (e.g. select 1e36 * 1). As such, this is something I'd
definitely agree on doing, but I think we can target it only for 3.0.

What we can start doing now, instead, is updating our rules to properly
handle the case when decimal scales are negative. From my investigation,
it turns out that the only operation which has problems with them is
Divide.
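For reference, a sketch of the Hive-derived result-type rule for decimal division that Spark uses (scale = max(6, s1 + p2 + 1), precision = p1 - s1 + s2 + scale). Treat the exact formula as an assumption on my part; the authoritative version lives in Spark's DecimalPrecision rule. It shows how a negative dividend scale feeds into the computed type:

```python
# Sketch (not Spark code) of the Hive-derived decimal division result type,
# assuming scale = max(6, s1 + p2 + 1) and precision = p1 - s1 + s2 + scale,
# before any capping to the maximum precision of 38.
def divide_result_type(p1, s1, p2, s2):
    """Result type of decimal(p1, s1) / decimal(p2, s2)."""
    scale = max(6, s1 + p2 + 1)
    precision = p1 - s1 + s2 + scale
    return precision, scale

# Ordinary positive scales:
print(divide_result_type(10, 2, 5, 3))   # (19, 8)
# Dividend 1e8, i.e. decimal(1, -8): the negative s1 enters the formula
# directly, shrinking the computed scale down to the floor of 6.
print(divide_result_type(1, -8, 5, 3))   # (18, 6)
```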

Here you can find the design doc with all the details:
https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit?usp=sharing.
The document is also linked in SPARK-25454. There is also already a PR with
the change: https://github.com/apache/spark/pull/22450.

Looking forward to hearing your feedback,
Thanks.
Marco