Hi Estaban,
I took some time to properly look at your remarks/suggestions.
You are right, the chronology conversions could be improved and made faster.
I committed the following
https://github.com/svenvc/P3/commit/dd06eb33d0147060e0b6f44c704e2e298f55c0d4
"As noted/analysed by Esteban A. Maringolo, incoming conversions for
Timestamp/DateAndTime and Date especially were slow. He suggested to take
advantage of the known ISO format.
Introduction of P3ISO8601Parser that implements a fast, non-checking parser
specifically for the PSQL 8601 format.
Use P3ISO8601Parser for the chronology conversions in P3Converter, for a 5x to
10x speedup.
Note that for this to work, the connection output format must be ISO, the
default."
I did the following small test to confirm the improvements:
p3client := P3Client url: 'psql://sven@localhost:5432/'.
p3client execute: 'DROP TABLE IF EXISTS table1'.
p3client execute: 'CREATE TABLE table1 (id SERIAL PRIMARY KEY, name TEXT, t1
TIMESTAMP WITHOUT TIME ZONE DEFAULT CURRENT_TIMESTAMP, t2 TIMESTAMP WITH TIME
ZONE DEFAULT CURRENT_TIMESTAMP, t0 TIME DEFAULT CURRENT_TIME, d1 DATE DEFAULT
CURRENT_DATE)'.
1 to: 100 do: [ :each | p3client execute: 'INSERT INTO table1 (name) VALUES
(''n-' , (each + 999 atRandom) asString , ''')' ].
[ p3client query: 'SELECT * FROM table1' ] bench.
p3client close.
This query (6 columns, 100 rows) went from 68.332 per second to 446.711 per
second on my machine.
Thanks again for your feedback, it was most useful.
Do these changes work for you ?
Regards,
Sven
> On 12 Oct 2020, at 15:57, Sven Van Caekenberghe wrote:
>
> Hi Esteban,
>
> Thanks for this detailed feedback, this is most interesting.
>
> I don't have enough time to look at this in detail, I will do so later, but
> here is some initial feedback.
>
> Note that PostreSQL supports many different date/time formats and that the
> user can set preferences per connection, hence some flexibility in parsing is
> probably needed, and that is one reason why the parsing is implemented the
> way it is.
>
> Check section 8.5.2. from
> https://www.postgresql.org/docs/13/datatype-datetime.html
>
> Note that all of the following give the same result:
>
> Date readFrom: 'March 2 2020' readStream.
> Date readFrom: '3/2/2020' readStream.
> Date readFrom: '2020-03-02' readStream.
>
> Date parsing is probably slow because it has to deal with all these formats.
>
> Of course, parsing a fixed well-known format is always faster.
>
> The timezones are tricky too, as preferences can be set per connection, so we
> have to be careful.
>
> I will come back to this.
>
> Sven
>
>> On 12 Oct 2020, at 15:04, Esteban Maringolo wrote:
>>
>> Hi,
>>
>> I recently stumbled into some performance issue I had, so I profiled
>> it and got it fixed, but by doing that I also noticed a huge time
>> spent when reading DateAndTimes and Dates (which also uses the former)
>> and the culprit seems to be `Date class>>#readFrom:`.
>>
>> Digging into this, I noticed that this was called from
>> `DateAndTime>>#readFrom:`, but in particular 90% of the time spent was
>> when instantiating the Date part of it, because it does unnecessary
>> lookup of the index of Month when input stream only have a month
>> index.
>>
>> What I did then was to change the default behavior of DateAndTime
>> class>>#readFrom:
>>
>> readFrom: aStream defaultOffset: defaultOffset
>> "Parse and return a new DateAndTime instance from stream,
>> as a Date, an optional Time and an optional TimeZone offset.
>> The time defaults to midnight, the timezone to defaultOffset"
>> "self readFrom: '2013-03-04T23:47:52.876+01:00' readStream"
>>
>> "From this"
>> date := Date readFrom: aStream.
>>
>> "To this"
>> date := Date readFrom: (aStream next: 10) pattern: '-mm-dd'.
>>
>> And also in P3Converter to specify the pattern directly.
>>
>> P3Converter>>#convertDateFrom: bytes length: length description: description
>> ^Date readFrom: (self asciiStreamFor: bytes length: length) pattern:
>> '-mm-dd'
>>
>> With these two changes I got a 8x-10x speedup when reading
>> Date/DateAndTime from the database.
>>
>> Then I dig a little further and in P3Converter added a little check to
>> avoid translating the offset of a DateAndTime when it is not needed.
>>
>> P3Converter>>#convertDateAndTimeWithoutTimezoneFrom: bytes length:
>> length description: description
>> "TIMESTAMP WITHOUT TIME ZONE (TIMESTAMP) is stored internally in
>> Postgres the way it was inserted, its representation remains constant,
>> with no offset added. We use the timezone of the connection to do the
>> necessary shifting. This assumes that the timezones used during
>> insert/update and query are the same."
>>
>> | timestamp offset |
>> timestamp := DateAndTime readFrom: (self asciiStreamFor: bytes
>> length: length) defaultOffset: Duration zero.
>> offset := self timezone offsetForTimestamp: timestamp.
>> ^offset isZero "<-- added this check to avoid