[Pharo-dev] Re: P3Converter and DateAndTime>>#readFrom: performance

2020-10-13 Thread Sven Van Caekenberghe
Hi Estaban,

I took some time to properly look at your remarks/suggestions.

You are right, the chronology conversions could be improved and made faster.


I committed the following 
https://github.com/svenvc/P3/commit/dd06eb33d0147060e0b6f44c704e2e298f55c0d4

"As noted/analysed by Esteban A. Maringolo, incoming conversions for 
Timestamp/DateAndTime and Date especially were slow. He suggested to take 
advantage of the known ISO format.

Introduction of P3ISO8601Parser that implements a fast, non-checking parser 
specifically for the PSQL 8601 format.

Use P3ISO8601Parser for the chronology conversions in P3Converter, for a 5x to 
10x speedup.

Note that for this to work, the connection output format must be ISO, the 
default."


I did the following small test to confirm the improvements:

p3client := P3Client url: 'psql://sven@localhost:5432/'.

p3client execute: 'DROP TABLE IF EXISTS table1'.

p3client execute: 'CREATE TABLE table1 (id SERIAL PRIMARY KEY, name TEXT, t1 
TIMESTAMP WITHOUT TIME ZONE DEFAULT CURRENT_TIMESTAMP, t2 TIMESTAMP WITH TIME 
ZONE DEFAULT CURRENT_TIMESTAMP, t0 TIME DEFAULT CURRENT_TIME, d1 DATE DEFAULT 
CURRENT_DATE)'.

1 to: 100 do: [ :each | p3client execute: 'INSERT INTO table1 (name) VALUES 
(''n-' , (each + 999 atRandom) asString , ''')' ].

[ p3client query: 'SELECT * FROM table1' ] bench.

p3client close.

This query (6 columns, 100 rows) went from 68.332 per second to 446.711 per 
second on my machine.


Thanks again for your feedback, it was most useful.

Do these changes work for you ?

Regards,

Sven

> On 12 Oct 2020, at 15:57, Sven Van Caekenberghe  wrote:
> 
> Hi Esteban,
> 
> Thanks for this detailed feedback, this is most interesting.
> 
> I don't have enough time to look at this in detail, I will do so later, but 
> here is some initial feedback.
> 
> Note that PostreSQL supports many different date/time formats and that the 
> user can set preferences per connection, hence some flexibility in parsing is 
> probably needed, and that is one reason why the parsing is implemented the 
> way it is.
> 
> Check section 8.5.2. from 
> https://www.postgresql.org/docs/13/datatype-datetime.html
> 
> Note that all of the following give the same result:
> 
> Date readFrom: 'March 2 2020' readStream.
> Date readFrom: '3/2/2020' readStream.
> Date readFrom: '2020-03-02' readStream.
> 
> Date parsing is probably slow because it has to deal with all these formats.
> 
> Of course, parsing a fixed well-known format is always faster.
> 
> The timezones are tricky too, as preferences can be set per connection, so we 
> have to be careful.
> 
> I will come back to this.
> 
> Sven 
> 
>> On 12 Oct 2020, at 15:04, Esteban Maringolo  wrote:
>> 
>> Hi,
>> 
>> I recently stumbled into some performance issue I had, so I profiled
>> it and got it fixed, but by doing that I also noticed a huge time
>> spent when reading DateAndTimes and Dates (which also uses the former)
>> and the culprit seems to be `Date class>>#readFrom:`.
>> 
>> Digging into this, I noticed that this was called from
>> `DateAndTime>>#readFrom:`, but in particular 90% of the time spent was
>> when instantiating the Date part of it, because it does unnecessary
>> lookup of the index of Month when input stream only have a month
>> index.
>> 
>> What I did then was to change the default behavior of DateAndTime
>> class>>#readFrom:
>> 
>> readFrom: aStream defaultOffset: defaultOffset
>> "Parse and return a new DateAndTime instance from stream,
>> as a Date, an optional Time and an optional TimeZone offset.
>> The time defaults to midnight, the timezone to defaultOffset"
>> "self readFrom: '2013-03-04T23:47:52.876+01:00' readStream"
>> 
>> "From this"
>> date := Date readFrom: aStream.
>> 
>> "To this"
>> date := Date readFrom: (aStream next: 10) pattern: '-mm-dd'.
>> 
>> And also in P3Converter to specify the pattern directly.
>> 
>> P3Converter>>#convertDateFrom: bytes length: length description: description
>> ^Date readFrom: (self asciiStreamFor: bytes length: length) pattern:
>> '-mm-dd'
>> 
>> With these two changes I got a 8x-10x speedup when reading
>> Date/DateAndTime from the database.
>> 
>> Then I dig a little further and in P3Converter added a little check to
>> avoid translating the offset of a DateAndTime when it is not needed.
>> 
>> P3Converter>>#convertDateAndTimeWithoutTimezoneFrom: bytes length:
>> length description: description
>> "TIMESTAMP WITHOUT TIME ZONE (TIMESTAMP) is stored internally in
>> Postgres the way it was inserted, its representation remains constant,
>> with no offset added. We use the timezone of the connection to do the
>> necessary shifting. This assumes that the timezones used during
>> insert/update and query are the same."
>> 
>> | timestamp offset |
>> timestamp := DateAndTime readFrom: (self asciiStreamFor: bytes
>> length: length) defaultOffset: Duration zero.
>> offset := self timezone offsetForTimestamp: timestamp.
>> ^offset isZero "<-- added this check to avoid 

[Pharo-dev] Re: P3Converter and DateAndTime>>#readFrom: performance

2020-10-12 Thread Sven Van Caekenberghe
Hi Esteban,

Thanks for this detailed feedback, this is most interesting.

I don't have enough time to look at this in detail, I will do so later, but 
here is some initial feedback.

Note that PostreSQL supports many different date/time formats and that the user 
can set preferences per connection, hence some flexibility in parsing is 
probably needed, and that is one reason why the parsing is implemented the way 
it is.

Check section 8.5.2. from 
https://www.postgresql.org/docs/13/datatype-datetime.html

Note that all of the following give the same result:

Date readFrom: 'March 2 2020' readStream.
Date readFrom: '3/2/2020' readStream.
Date readFrom: '2020-03-02' readStream.

Date parsing is probably slow because it has to deal with all these formats.

Of course, parsing a fixed well-known format is always faster.

The timezones are tricky too, as preferences can be set per connection, so we 
have to be careful.

I will come back to this.

Sven 

> On 12 Oct 2020, at 15:04, Esteban Maringolo  wrote:
> 
> Hi,
> 
> I recently stumbled into some performance issue I had, so I profiled
> it and got it fixed, but by doing that I also noticed a huge time
> spent when reading DateAndTimes and Dates (which also uses the former)
> and the culprit seems to be `Date class>>#readFrom:`.
> 
> Digging into this, I noticed that this was called from
> `DateAndTime>>#readFrom:`, but in particular 90% of the time spent was
> when instantiating the Date part of it, because it does unnecessary
> lookup of the index of Month when input stream only have a month
> index.
> 
> What I did then was to change the default behavior of DateAndTime
> class>>#readFrom:
> 
> readFrom: aStream defaultOffset: defaultOffset
> "Parse and return a new DateAndTime instance from stream,
> as a Date, an optional Time and an optional TimeZone offset.
> The time defaults to midnight, the timezone to defaultOffset"
> "self readFrom: '2013-03-04T23:47:52.876+01:00' readStream"
> 
> "From this"
> date := Date readFrom: aStream.
> 
> "To this"
> date := Date readFrom: (aStream next: 10) pattern: '-mm-dd'.
> 
> And also in P3Converter to specify the pattern directly.
> 
> P3Converter>>#convertDateFrom: bytes length: length description: description
> ^Date readFrom: (self asciiStreamFor: bytes length: length) pattern:
> '-mm-dd'
> 
> With these two changes I got a 8x-10x speedup when reading
> Date/DateAndTime from the database.
> 
> Then I dig a little further and in P3Converter added a little check to
> avoid translating the offset of a DateAndTime when it is not needed.
> 
> P3Converter>>#convertDateAndTimeWithoutTimezoneFrom: bytes length:
> length description: description
> "TIMESTAMP WITHOUT TIME ZONE (TIMESTAMP) is stored internally in
> Postgres the way it was inserted, its representation remains constant,
> with no offset added. We use the timezone of the connection to do the
> necessary shifting. This assumes that the timezones used during
> insert/update and query are the same."
> 
>  | timestamp offset |
>  timestamp := DateAndTime readFrom: (self asciiStreamFor: bytes
> length: length) defaultOffset: Duration zero.
>  offset := self timezone offsetForTimestamp: timestamp.
>  ^offset isZero "<-- added this check to avoid translation"
>ifTrue: [ timestamp ]
>ifFalse: [ timestamp translateTo: offset ]
> 
> And the speedup went even higher. But I don't know whether this last
> part is correct.
> 
> Of course this seems as a quick and dirty fix, but I don't know
> whether DateAndTime>>readFrom: should read any format or just ISO 8601
> formats nor whether the Date/DateAndTime coming from PostgreSQL are
> always formatted that way. I ran my app's small test suite and haven't
> had any issues, neither during development with these changes applied.
> 
> I think there is a significant slowdown involved in the instantiation
> of Dates that should be fixed, whether this fix is definitive or not
> is why I'm asking here.
> 
> Regards!
> 
> Esteban A. Maringolo