Re: Issue in to_timestamp/to_date while handling the quoted literal string

2019-07-25 Thread Suraj Kharage
Thanks for the clarification Brendan, that really helps.

On Wed, Jul 24, 2019 at 6:24 PM Brendan Jurd  wrote:

> Hi Suraj,
>
> I think the documentation is reasonably clear about this behaviour, quote:
>
> " In to_date, to_number, and to_timestamp, literal text and double-quoted
> strings result in skipping the number of characters contained in the
> string; for example "XX" skips two input characters (whether or not they
> are XX)."
>
> I can appreciate that this isn't the behaviour you intuitively expected
> from to_timestamp, and I don't think you'd be the first or the last.  The
> purpose of these functions was never to validate that your input string
> precisely matches the non-coding parts of your format pattern.  For that, I
> think you'd be better served by using regular expressions.
>
> Just as an aside, in the example you gave, the string
> '2019-05-24T23:12:45' will cast directly to timestamp just fine, so it
> isn't the kind of situation to_timestamp was really intended for.  It's
> more for when your input string is in an obscure (or ambiguous) format that
> is known to you in advance.
>
> I hope that helps.
>
> Cheers
> Brendan
>
> On Wed, 24 Jul 2019 at 21:38, Suraj Kharage <
> suraj.khar...@enterprisedb.com> wrote:
>
>> Hi,
>>
>> I noticed the issue in to_timestamp()/to_date() while handling the double
>> quote literal string. If any double quote literal characters found in
>> format, we generate the NODE_TYPE_CHAR in parse format and store that
>> actual character in FormatNode->character. n DCH_from_char, we just
>> increment the input string by length of character for NODE_TYPE_CHAR.
>> We are actually not matching these characters in input string and because
>> of this, date values get changed if quoted literal string is not identical
>> in input and format string.
>>
>> e.g:
>>
>> postgres@78619=#select to_timestamp('2019-05-24T23:12:45',
>> '-mm-dd"TT"hh24:mi:ss');
>>to_timestamp
>> ---
>>  2019-05-24 03:12:45+05:30
>> (1 row)
>>
>>
>> In above example, the quoted string is 'TT', so it just increment the
>> input string by 2 while handling these characters and returned the wrong
>> hour value.
>>
>> My suggestion is to match the exact characters from quoted literal string
>> in input string and if doesn't match then throw an error.
>>
>> Attached is the POC patch which almost works for all scenarios except for
>> whitespace - as a quote character.
>>
>> Suggestions?
>> --
>> --
>>
>> Thanks & Regards,
>> Suraj kharage,
>> EnterpriseDB Corporation,
>> The Postgres Database Company.
>>
>

-- 
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.


Re: Issue in to_timestamp/to_date while handling the quoted literal string

2019-07-24 Thread Brendan Jurd
Hi Suraj,

I think the documentation is reasonably clear about this behaviour, quote:

" In to_date, to_number, and to_timestamp, literal text and double-quoted
strings result in skipping the number of characters contained in the
string; for example "XX" skips two input characters (whether or not they
are XX)."

I can appreciate that this isn't the behaviour you intuitively expected
from to_timestamp, and I don't think you'd be the first or the last.  The
purpose of these functions was never to validate that your input string
precisely matches the non-coding parts of your format pattern.  For that, I
think you'd be better served by using regular expressions.

Just as an aside, in the example you gave, the string '2019-05-24T23:12:45'
will cast directly to timestamp just fine, so it isn't the kind of
situation to_timestamp was really intended for.  It's more for when your
input string is in an obscure (or ambiguous) format that is known to you in
advance.

I hope that helps.

Cheers
Brendan

On Wed, 24 Jul 2019 at 21:38, Suraj Kharage 
wrote:

> Hi,
>
> I noticed the issue in to_timestamp()/to_date() while handling the double
> quote literal string. If any double quote literal characters found in
> format, we generate the NODE_TYPE_CHAR in parse format and store that
> actual character in FormatNode->character. n DCH_from_char, we just
> increment the input string by length of character for NODE_TYPE_CHAR.
> We are actually not matching these characters in input string and because
> of this, date values get changed if quoted literal string is not identical
> in input and format string.
>
> e.g:
>
> postgres@78619=#select to_timestamp('2019-05-24T23:12:45',
> '-mm-dd"TT"hh24:mi:ss');
>to_timestamp
> ---
>  2019-05-24 03:12:45+05:30
> (1 row)
>
>
> In above example, the quoted string is 'TT', so it just increment the
> input string by 2 while handling these characters and returned the wrong
> hour value.
>
> My suggestion is to match the exact characters from quoted literal string
> in input string and if doesn't match then throw an error.
>
> Attached is the POC patch which almost works for all scenarios except for
> whitespace - as a quote character.
>
> Suggestions?
> --
> --
>
> Thanks & Regards,
> Suraj kharage,
> EnterpriseDB Corporation,
> The Postgres Database Company.
>


Issue in to_timestamp/to_date while handling the quoted literal string

2019-07-24 Thread Suraj Kharage
Hi,

I noticed the issue in to_timestamp()/to_date() while handling the double
quote literal string. If any double quote literal characters found in
format, we generate the NODE_TYPE_CHAR in parse format and store that
actual character in FormatNode->character. n DCH_from_char, we just
increment the input string by length of character for NODE_TYPE_CHAR.
We are actually not matching these characters in input string and because
of this, date values get changed if quoted literal string is not identical
in input and format string.

e.g:

postgres@78619=#select to_timestamp('2019-05-24T23:12:45',
'-mm-dd"TT"hh24:mi:ss');
   to_timestamp
---
 2019-05-24 03:12:45+05:30
(1 row)


In above example, the quoted string is 'TT', so it just increment the input
string by 2 while handling these characters and returned the wrong hour
value.

My suggestion is to match the exact characters from quoted literal string
in input string and if doesn't match then throw an error.

Attached is the POC patch which almost works for all scenarios except for
whitespace - as a quote character.

Suggestions?
-- 
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.


to_timestamp_quoted_string_POC.patch
Description: Binary data