RE: text fields and performance for ETL

2021-11-04 Thread Grega Jesih
David,

> Can you demonstrate, with actual numbers, using today's implementation, a 
> situation where defining a column as char(3) or varchar(3) instead of text 
> has a significant performance improvement?

Sure I can.

But first, what am I to prove ?  A performance in dataflow from server A to 
server B.

What is that performance change based on ? It is based on a fact that the tool 
that pumps the data may calculate block size and thus work with several
rows as a time instead of a row at a time.

Is this Postgres performance related ? NO. It is model design related.

So why would it then be published ? Because inexperienced programmers take your 
statement that "it's the same performance" from a wrong perspective, so it 
would be fair to note, that the remark "it's the same performance" is meant 
"within any operation inside Postgres database". In the moment we want to take 
this data elsewhere,
the problematic lack of model design comes out.

Now for the demo: I can make a video to see it, but here is a column in context 
that talks about the thing:

https://dba.stackexchange.com/questions/102830/avoiding-row-by-row-fetch-method-when-dealing-with-source-lob-columns

If this contribution from stackexchange isn't enough, let me know.

BR
Grega






From: David G. Johnston 
Sent: Wednesday, November 3, 2021 7:40 PM
To: Grega Jesih 
Cc: Tom Lane ; grega.je...@gmail.com; Pg Docs 

Subject: Re: text fields and performance for ETL

On Wed, Nov 3, 2021 at 11:09 AM Grega Jesih 
mailto:grega.je...@actual-it.si>> wrote:

The new architectures include more and more data exchange among databases.
Now when you deal with bigger data sizes that go from millions to billions, 
this fixed size vs of text - undefined size becomes very  relevant.
Can you demonstrate, with actual numbers, using today's implementation, a 
situation where defining a column as char(3) or varchar(3) instead of text has 
a significant performance improvement?  Without a concrete example to examine 
I'm unable to be convinced to move away from the status quo.

You also need to convince me as to why constraints are an insufficient feature. 
 i.e., why is char(3) better than (check length(val) = 3)?

Even with all that I'd probably still not do anything beyond reviewing a 
proposed patch (i.e, I wouldn't try to write one myself from scratch...I don't 
have authority to commit regardless).

David J.


NOTICE - NOT TO BE REMOVED.
This e-mail and any attachments are confidential and may contain legally 
privileged information and/or copyright material of Actual I.T. or third 
parties. If you are not an authorised recipient of this e-mail, please contact 
Actual I.T. immediately by return email or by telephone or facsimile on the 
above numbers.
You should not read, print, re-transmit, store or act in reliance on this email 
or any attachments and you should destroy all copies of them.


Re: text fields and performance for ETL

2021-11-04 Thread David G. Johnston
On Thursday, November 4, 2021, Grega Jesih  wrote:

>
> So why would it then be published ? Because inexperienced programmers take
> your statement that "it's the same performance" from a wrong perspective,
> so it would be fair to note, that the remark "it's the same performance" is
> meant "within any operation inside Postgres database". In the moment we
> want to take this data elsewhere,
>
> the problematic lack of model design comes out.
>
> If you wish to supply an actual patch for consideration I’d review it.
Absent that the documentation serves the vast majority of readers well
as-is.  We’re entitled to a bit of self-centeredness here, especially when
the broader world is so varied.

David J.