I wrote a summary of lesser-known Postgres type system information

2024-04-20 Thread Guyren Howe
At PostgresConf 2024, I gave a presentation about PostgreSQL types (thanks to 
folks on this list for the assist with that).

I wrote up a summary of it here:

https://lydb.xyz/postgres-types/

Re: Thoughts on user-defined types for talk at Postgres conference?

2024-03-06 Thread Guyren Howe
Thanks for that!

It seems as though a Composite Type having only fixed-length fields could be 
treated as a fixed-length value.
On 6 Mar 2024 at 09:45 -0800, Stephen Frost , wrote:
> Greetings,
>
> * Guyren Howe (guy...@gmail.com) wrote:
> > But what *really* sets Postgres apart from comparable systems is user 
> > defined types. I would like to carefully lay out how to define and use a 
> > user-defined type (I don’t think I have time to dig into doing fancy stuff 
> > with C functions, so just the basic “user defined sum type”), but also any 
> > gotchas.
>
> What I take it you're referring to here are most specifically called
> "Composite Types", as discussed here:
>
> https://www.postgresql.org/docs/current/sql-createtype.html
>
> > And I’d like to finish with some thoughts about when and how to use 
> > user-defined types. My feeling is that this feature is greatly under-used, 
> > mostly because it’s so non-standard. But AFAICT, user-defined types are 
> > fine and other than some ugliness due to SQL (mainly needing parentheses in 
> > some unexpected places), fields in a user defined type work perfectly well 
> > in Postgres’s SQL. I guess you’d need to pull them apart for values 
> > returned to clients, but that isn’t difficult.
>
> Yeah, it's a bit grotty when you're trying to reference them (as you
> note with the comment about the extra parentheses) but it's also a bit
> painful building them up to supply as input (though not impossible).
>
> > So, any gotchas with user defined types? Any thoughts about designing with 
> > them?
>
> The biggest gotcha (which is a bit different from the usability issues,
> which we discuss above) from my perspective is the serialization of
> composite types - we use up a lot more space to store a composite type
> that looks like:
>
> CREATE TYPE twoint AS (a int, b int);
>
> than if we create a table as:
>
> CREATE TABLE t1 (a int, b int);
>
> Let's show this. Following the above two commands, do:
>
> CREATE TABLE t2 (c1 twoint);
>
> INSERT INTO t1 VALUES (1,2);
> INSERT INTO t2 VALUES ('(1,2)');
>
> =*# select pg_column_size(a) from t1;
>  pg_column_size
> ----------------
>               4
> (1 row)
>
> =*# select pg_column_size(b) from t1;
>  pg_column_size
> ----------------
>               4
> (1 row)
>
> =*# select pg_column_size(a) from t2;
>  pg_column_size
> ----------------
>              29
> (1 row)
>
> We can see it for the whole row too:
>
> =*# select pg_column_size(t1.*) from t1;
>  pg_column_size
> ----------------
>              32
> (1 row)
>
> =*# select pg_column_size(t2.*) from t2;
>  pg_column_size
> ----------------
>              53
> (1 row)
>
> That's an additional 21 bytes, which is really quite a lot. What's
> included in those 21 bytes are the length (since it's now a
> variable-length column to PG, unlike the case with the individual int
> columns in the table where we know from the table structure the length
> of them), the type information (typmod if there is one and the OID of
> the composite type), some other info, and then the actual data of the
> two int values inside of the composite type.
>
> I've often wondered if there's a way to reduce this overhead, as I do
> think that plus some improvements on the usability side would go a long
> way to making composite types more appealing to users. Still, they are
> certainly handy in some instances, I just can't recommend heavy use of
> them for large data sets where size is particularly important (such as
> in heavy OLTP environments) due to their overhead.
>
> Thanks!
>
> Stephen


Thoughts on user-defined types for talk at Postgres conference?

2024-03-02 Thread Guyren Howe
I am doing a talk at Postgres Conf about Postgres’s type system. I already 
asked about this and got some great responses:

https://www.postgresql.org/message-id/flat/b82f8886db61a3395e6eab94981258274cdbacd3.camel%40cybertec.at#f13288b72bd2d564ee758c1a273652ad

Those responses discussed mostly gotchas with built-in types. Lots of good 
stuff, thanks, all!

But what *really* sets Postgres apart from comparable systems is user defined 
types. I would like to carefully lay out how to define and use a user-defined 
type (I don’t think I have time to dig into doing fancy stuff with C functions, 
so just the basic “user defined sum type”), but also any gotchas.

And I’d like to finish with some thoughts about when and how to use 
user-defined types. My feeling is that this feature is greatly under-used, 
mostly because it’s so non-standard. But AFAICT, user-defined types are fine 
and other than some ugliness due to SQL (mainly needing parentheses in some 
unexpected places), fields in a user defined type work perfectly well in 
Postgres’s SQL. I guess you’d need to pull them apart for values returned to 
clients, but that isn’t difficult.

So, any gotchas with user defined types? Any thoughts about designing with them?


Re: Content for talk on Postgres Type System at PostgresConf

2024-02-29 Thread Guyren Howe
On 29 Feb 2024, at 14:51, Tom Lane  wrote:
> 
>>> - time with time zone *does* store the time zone, but this isn’t
>>> actually useful and should be avoided (I’m not entirely sure why and the
>>> docs only gesture at the problems without stating them, IIRC)
> 
>> No it doesn't store the time zone.  Nor do the docs say they do.  And
>> clearly point out the issue that evaluating a time zone without both date
>> and time inputs is basically pointless.
> 
> timetz *does* store a time zone, in the sense of storing a numeric
> offset from UTC (i.e., "so many minutes east or west of Greenwich").
> The problem is that in most real-world applications your notion of
> "time zone" probably includes annual DST changes, which timetz can't
> represent.  I don't say the type is completely useless, but its
> usefulness is a lot less than you might guess.

The closest I can come to this in the docs is:

"The appropriate time zone offset is recorded in the time with time zone value 
and is output as stored; it is not adjusted to the active time zone.”
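
That behavior is easy to demonstrate (a quick psql illustration; output 
formatting may vary): the offset given on input is stored and echoed back, 
not rotated to the session's TimeZone setting.

select '12:00:00+05:30'::timetz;
     timetz
----------------
 12:00:00+05:30
(1 row)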

I expect to be submitting some documentation updates as part of this project, 
fwiw.

Re: Content for talk on Postgres Type System at PostgresConf

2024-02-29 Thread Guyren Howe
On 28 Feb 2024, at 17:08, Guyren Howe  wrote:
> 
> I am to talk about Postgres’s type system at PGConf:
> 
> https://postgresconf.org/conferences/2024/program/proposals/postgres-s-type-system
> 
> I picked the issue because I think it’s poorly understood, greatly 
> under-discussed, and an excellent way to empower postgres users.
> 
> I am reasonably conversant with the issue. I’m not looking for others to 
> write the talk for me, but in order to make the best talk I can, I’m asking:
> 
> What would you want to see in such a talk?
> 
> I’m planning on covering:
> 
> - The built-in types that are underused and their advantages (eg inet)
> - domains
> - such things as details of arrays including multidimensional arrays
> - user-defined types, their relationship to tables, and generally how to use 
> them
> 
> I would spend most of the time discussing ways to make effective use of 
> types. Some examples:
> - defining functions of rows so the table can be used kind of like a set of 
> objects (including the dot notation for invoking functions)
> - using UDFs to make code clearer eg if you have an idiosyncratic functional 
> index, define it using a function of the row, so it’s easy to get right when 
> querying
> - using UDFs as a kind of better domain. eg differentiating imperial from 
> metric units by requiring an explicit constructor, not just accepting any old 
> number
> 
> I would mention enumerated types, although I’m inclined to advise that their 
> inflexibility (eg can’t delete or rearrange them) means that a related table 
> is probably better (I’d be delighted to be proved wrong).
> 
> Custom Range Types are an interesting feature, but I can’t think of a good 
> use case. Any good examples?
> 
> Automatic casting is a feature I’m aware of, but I’d be interested in any 
> cool use cases.
> 
> 
> Anything I’m missing? Any existing good discussions of the subject?

Any gotchas about user defined types or arrays? I was only planning on 
discussing simple user defined types, without going into writing the optional 
features in C.

What is worth saying about custom range types? What even would be a good use 
case? A range of strings or ip address ranges, something like that, I suppose?
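
For instance, IP allocations are about the simplest case I can come up with 
(an untested sketch):

CREATE TYPE inetrange AS RANGE (subtype = inet);
SELECT '[10.0.0.1,10.0.0.100]'::inetrange @> '10.0.0.42'::inet;  -- true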

Is there a good case for enumerations versus a fk to a table? I guess if you’re 
sure you won’t need to change the values in the enumeration? Days of the week, 
that sort of thing.
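
By way of illustration (a sketch; the names are made up):

-- an enum is fine if the set of values is truly fixed:
CREATE TYPE weekday AS ENUM ('mon','tue','wed','thu','fri','sat','sun');
-- a referenced table stays flexible: rows can be added, renamed or removed:
CREATE TABLE color (name text PRIMARY KEY);
CREATE TABLE item (id int PRIMARY KEY, color text REFERENCES color (name));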

Re: Content for talk on Postgres Type System at PostgresConf

2024-02-29 Thread Guyren Howe
In case I’m not myself correct on the issues with these types, what are the 
misconceptions, or where might I find them for myself?

My current understanding:

• character is fixed-length, blank-padded. Not sure when you’d want that, but 
it seems clear. Is the name just confusing?
• timestamptz is just converted to a timestamp in UTC. Folks might imagine that 
it stores the time zone but it doesn’t.
• time with time zone *does* store the time zone, but this isn’t actually 
useful and should be avoided (I’m not entirely sure why and the docs only 
gesture at the problems without stating them, IIRC)
• money is a fixed-point decimal value; the number of decimal places is 
locale-determined. I’m not aware of any particular problems with that
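
A couple of quick illustrations of the first two points (the second assumes a 
UTC session time zone):

SELECT octet_length('ab'::char(5));         -- 5: stored blank-padded
SELECT '2024-03-06 12:00-08'::timestamptz;  -- 2024-03-06 20:00:00+00: only the UTC instant is kept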

On 29 Feb 2024 at 01:11 -0800, Laurenz Albe , wrote:
> On Wed, 2024-02-28 at 17:08 -0800, Guyren Howe wrote:
> > I am to talk about Postgres’s type system at PGConf:
> >
> > https://postgresconf.org/conferences/2024/program/proposals/postgres-s-type-system
> >
> > I picked the issue because I think it’s poorly understood, greatly
> > under-discussed, and an excellent way to empower postgres users.
> >
> > What would you want to see in such a talk?
>
> It might be good to explain how "timestamp with time zone" works.
> That's often confusing for beginners, because it is different from
> other databases and arguably deviates from the SQL standard.
>
> Perhaps it would be good to warn people about using data types like
> "character", "time with time zone" and "money".
>
> Yours,
> Laurenz Albe


Content for talk on Postgres Type System at PostgresConf

2024-02-28 Thread Guyren Howe
I am to talk about Postgres’s type system at PGConf:

https://postgresconf.org/conferences/2024/program/proposals/postgres-s-type-system

I picked the issue because I think it’s poorly understood, greatly 
under-discussed, and an excellent way to empower postgres users.

I am reasonably conversant with the issue. I’m not looking for others to write 
the talk for me, but in order to make the best talk I can, I’m asking:

What would you want to see in such a talk?

I’m planning on covering:

- The built-in types that are underused and their advantages (eg inet)
- domains
- such things as details of arrays including multidimensional arrays
- user-defined types, their relationship to tables, and generally how to use 
them
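
For instance, the last three of those in one toy sketch (names invented):

CREATE DOMAIN us_zip AS text CHECK (VALUE ~ '^\d{5}(-\d{4})?$');
CREATE TYPE address AS (street text, city text, zip us_zip);
CREATE TABLE site (id int PRIMARY KEY, addr address, tags text[]);
INSERT INTO site VALUES
  (1, ROW('1 Main St', 'Springfield', '12345')::address, '{hq,retail}');
SELECT (addr).zip, tags[1] FROM site;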

I would spend most of the time discussing ways to make effective use of types. 
Some examples:
- defining functions of rows so the table can be used kind of like a set of 
objects (including the dot notation for invoking functions)
- using UDFs to make code clearer eg if you have an idiosyncratic functional 
index, define it using a function of the row, so it’s easy to get right when 
querying
- using UDFs as a kind of better domain. eg differentiating imperial from 
metric units by requiring an explicit constructor, not just accepting any old 
number
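
A quick sketch of the first of those (hypothetical table):

CREATE TABLE person (first_name text, last_name text);
CREATE FUNCTION full_name(p person) RETURNS text
  LANGUAGE sql IMMUTABLE AS $$ SELECT p.first_name || ' ' || p.last_name $$;
SELECT p.full_name FROM person p;   -- dot notation: reads like a column
SELECT full_name(p) FROM person p;  -- the equivalent functional form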

I would mention enumerated types, although I’m inclined to advise that their 
inflexibility (eg can’t delete or rearrange them) means that a related table is 
probably better (I’d be delighted to be proved wrong).

Custom Range Types are an interesting feature, but I can’t think of a good use 
case. Any good examples?

Automatic casting is a feature I’m aware of, but I’d be interested in any cool 
use cases.


Anything I’m missing? Any existing good discussions of the subject?

Good overview of programming IN Postgres?

2024-01-29 Thread Guyren Howe
Is there a good overview of programming WITHIN Postgres? Not writing code in 
client languages that communicates WITH Postgres, but how to take full 
advantage of Postgres’ unique programming features — the advanced type system, 
the various PL languages, triggers, stored procedures, functions, …

I’m thinking about both how all those things work, and how to think about 
effective architectures given that design palette.

The Postgres documentation is great and is the ultimate reference, but is there 
a good overview for someone who wants to take best advantage of all these 
features?

I’m thinking that if there isn’t such a thing, I might write it. But I don’t 
want to duplicate effort.


Re: Converting sql anywhere to postgres

2023-08-16 Thread Guyren Howe
For some reason, I was thinking the rule could see just the fields from the 
command, but you’re right; a rule won’t work. Sorry.

Guyren G Howe
On Aug 15, 2023 at 23:22 -0700, Russell Rose | Passfield Data Systems 
, wrote:
> I have just had a quick look at rules and I am not sure how it can be done. 
> Rules still use the concept of NEW and OLD. If my original row has 'myfield' 
> set to 'me' then I don't think I can tell the difference between:
>
> Update mytable set afield='something'
> and
> Update mytable set afield='something',myfield='me'
>
> Within the rule I think NEW.myfield will be set to 'me' in both cases. Please 
> can you explain how I can tell the difference between the two update 
> statements
>


Re: Converting sql anywhere to postgres

2023-08-15 Thread Guyren Howe
I’m fairly confident you can do this using a Rule. It would even be fairly 
simple.

Be careful, though: Rules are Postgres’ biggest potential foot gun.

Guyren G Howe
On Aug 15, 2023 at 08:05 -0700, Russell Rose | Passfield Data Systems 
, wrote:
> Hi there
>
> I am trying to convert a SQL Anywhere database to postgres. Within SQL 
> anywhere a field can have a default value of ‘last user’. This means that 
> when you perform an update on a table, if the field is not explicitly set 
> then the current user is used. So for instance if I have a field called 
> mod_user in a table, but when I do an update on the table and do not set 
> mod_user then SQL Anywhere sets the field to current_user. I have tried to 
> replicate this using a postgres trigger in the before update. However, if I 
> do not set the value then it automatically picks up the value that was 
> already in the field. Is there a way to tell the difference between me 
> setting the value to the same as the previous value and postgres 
> automatically picking it up?
>
> If the field myfield contains the word ‘me’. Can I tell the difference 
> between:
> Update table1 set field1=’something’,myfield=’me’
> And
> Update table1 set field1=’something’
>


Re: Large scale reliable software system

2023-06-26 Thread Guyren Howe
Correct. It’s a tragically wrong piece of folk wisdom that’s pretty general 
across web development communities.

> On Jun 26, 2023, at 21:32, Michael Nolan  wrote:
> 
> It's not just Ruby, dumb databases are preferred in projects like
> WordPress, Drupal and Joomla, too.
> 
> Now, if it's because they're used to using MySQL, well maybe that's
> not so hard to understand.  :-)
> 
> On Mon, Jun 26, 2023 at 8:05 PM Guyren Howe  wrote:
>> 
>> This is a reasonable answer, but I want to offer a caveat.
>> 
>> Likely because of the influence of the originator of Ruby on Rails, it is 
>> close to holy writ in the web development community that the database must 
>> be treated as a dumb data bucket and all business logic must be implemented 
>> in the Ruby or Python or whatever back end code.
>> 
>> This heuristic is nearly always mostly wrong.
>> 
>> Guyren G Howe
>> On Jun 26, 2023 at 17:48 -0700, Adrian Klaver , 
>> wrote:
>> 
>> On 6/26/23 16:48, B M wrote:
>> 
>> Dear all,
>> 
>> After greeting,
>> 
>> I taught PostgreSQL myself and developed a small scale
>> experimental software system using PostgreSQL in the back-end.
>> 
>> I would like to know your advice on developing a large scale reliable
>> software system using PostgreSQL in the back-end, through which I can
>> share the storage with the different system users where they login to
>> the system through the web application front-end with different
>> passwords and usernames , save the privacy of each user data, improve
>> overall system security and performance, achieve fast response, make
>> backups and save the stored data from loss. The system will be hosted on
>> a cloud.
>> 
>> 
>> https://www.djangoproject.com/
>> 
>> 
>> Thank you in advance.
>> 
>> 
>> --
>> Adrian Klaver
>> adrian.kla...@aklaver.com
>> 
>> 
>> 



Re: Large scale reliable software system

2023-06-26 Thread Guyren Howe
This is a reasonable answer, but I want to offer a caveat.

Likely because of the influence of the originator of Ruby on Rails, it is close 
to holy writ in the web development community that the database must be treated 
as a dumb data bucket and all business logic must be implemented in the Ruby or 
Python or whatever back end code.

This heuristic is nearly always mostly wrong.

Guyren G Howe
On Jun 26, 2023 at 17:48 -0700, Adrian Klaver , 
wrote:
> On 6/26/23 16:48, B M wrote:
> > Dear all,
> >
> > After greeting,
> >
> > I taught PostgreSQL myself and developed a small scale
> > experimental software system using PostgreSQL in the back-end.
> >
> > I would like to know your advice on developing a large scale reliable
> > software system using PostgreSQL in the back-end, through which I can
> > share the storage with the different system users where they login to
> > the system through the web application front-end with different
> > passwords and usernames , save the privacy of each user data, improve
> > overall system security and performance, achieve fast response, make
> > backups and save the stored data from loss. The system will be hosted on
> > a cloud.
>
> https://www.djangoproject.com/
>
> >
> > Thank you in advance.
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>


Re: Question about where to deploy the business logics for data processing

2023-06-09 Thread Guyren Howe
People change applications and programming languages all the time.

But change the database? Particularly away from Postgres, which is for nearly 
any purpose clearly the best SQL database available?

You have to pick one. Heck, write your triggers and stored procedures in Python 
and you can change to SQL Server, or in Java and you have the option of Oracle.

There is never a good reason to use MySQL. :-)

Guyren G Howe
On Jun 9, 2023 at 13:39 -0700, Ron , wrote:
> You can be sure that banks and academic research projects have different 
> needs.  Heck, your University's class scheduling software has different needs 
> from the research problems that you support.
>
> The bottom line is that putting all of the "business" logic in TypeORM locks 
> you into using an ORM, while putting as much "business" logic in database as 
> stored procedures, triggers, foreign keys, etc... doesn't.  Parts of the 
> application can be in Java, some in JS, C, C++, Rust, Perl, even COBOL.
>
> On the other hand, putting so much logic into the database essentially locks 
> you into that RDBMS.
>
>
> On 6/9/23 13:36, Nim Li wrote:
> > Hello,
> >
> > Thank you so so much for all the feedback so far.  :D
> >
> > About this comment:
> >
> > > "... an application that requires changing the data model does not seem 
> > > to be well designed...don't allow model change by the business logic..."
> >
> > I work in a science research facility.  When researchers start a project, 
> > they don't necessarily get the full picture of what they are hoping to 
> > achieve (yet they may get some ideas about the starting point that allow 
> > them to move forward).  By the time they see 40% of what they have done, 
> > they may start to have a different thought and move towards a different 
> > direction, or in some cases, they may spin it off to something different 
> > after a certain period of time.  Coming with my Agile Development mindset 
> > in the research area, it is common for me to see users changing their 
> > requirements and expectations, with the same buckets for the data.  Yes, 
> > there is quite a lot of work to keep the researchers happy.  ;-)
> >
> > I suppose when there is a specific end-goal to achieve for a project, a more 
> > specific design can be more feasible based on the goal.  But when the 
> > end-goal is not necessarily clear, and/or change-able, I am not exactly clear 
> > how we may draw a black-and-white line to determine whether a design is good 
> > or not (.. and for how long...)
> >
> > I imagine one option may be to put less logic and restrictions on the data 
> > side, which allows the researchers to have more flexibility on their end.  
> > But this may not always be feasible due to the specific protocol of the 
> > study.  Perhaps there may be some other approaches and/or principles to deal 
> > with situations like mine?
> >
> > My major focus is still on getting more opinions about where to implement 
> > the business logic for data processing ...  if you have any thoughts about 
> > the design, I would love to hear your thoughts as well.
> >
> > Thank you so so much for sharing!
> >
> > On Fri, Jun 9, 2023 at 12:35 PM Lorusso Domenico  
> > wrote:
> > > > Uhm, we need to start from 2 concepts:
> > > >
> > > > 1. competence
> > > > 2. Network lag
> > > >
> > > > Competence: usually programmers aren't skilled enough about the 
> > > > architectures and the actual needs of each layer.
> > > > This is a problem, because often programmers try to do something with 
> > > > what they already know (e.g. performing joins in Java).
> > > >
> > > > A correct design requires to identify at least the data logic, the 
> > > > process logic, the business logic and the presentation logic.
> > > >
> > > > One of the most important goals of Data logic is to ensure the 
> > > > correctness of data from many points of view (ensuring all of them is 
> > > > impossible).
> > > >
> > > > That involve:
> > > >
> > > > • audit information
> > > > • bitemporal management
> > > > • strictly definition and verification of data (foreign key, checks, 
> > > > management of compatibility)
> > > > • replicate consistently data for different usage
> > > > • isolate access for actual needs
> > > > • design
> > > >
> > > > So an application that requires changing the data model does not seem 
> > > > to be well designed...
> > > >
> > > > Network lag
> > > > The first problem is latency, I must minimize the passage of data over 
> > > > the network.
> > > > This means, for example, creating a service that allows the caller to 
> > > > choose only the information it needs.
> > > > But it also means, to get all the information needed in a single call, 
> > > > design asynchronous service, use cache data physically near to the 
> > > > frontend or the middle layer.
> > > >
> > > > Based on these 2 concepts I suggest:
> > > >
> > > > • develop the Data logic near or inside the database;
> > > > • design powerful and addictive api;
> > > > • don't allow model change by the 

Re: massive update on gin index

2022-09-14 Thread Guyren Howe
You might consider defining a phone type that includes your “type” information, 
and just having an array of those, if you really want to do something like this.

But a related table instead would be the obvious answer.
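
Something like this, roughly (an untested sketch; I'm assuming the People 
table has an id key):

CREATE TYPE phone AS (number text, kind int);
ALTER TABLE people ADD COLUMN phones phone[];
-- or the related table:
CREATE TABLE people_phones (
  people_id int REFERENCES people (id),  -- assumed key
  number text,
  kind int
);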

> On Sep 14, 2022, at 12:33, Marcos Pegoraro wrote:
> 
> In a table with people's info I have 3 phone numbers, mobile, work and home. 
> But then some have 2 mobiles, some have 2 work numbers, so decided to test it 
> as an array of json. I know I could have another table for that, but I was 
> just testing.
> 
> So my original table had
> Mobile, Work, Home and all of them are btree indexed.
> 
> Then added a jsonb field and updated it with those 3 phone numbers on it
> [{"phone": 2236279878, "type": 1}, {"phone": 22998432631, "type": 2}]
> [{"phone": 22996783278, "type": 2}]
> create index idxPhones on People using gin(Phones)
> 
> If I select using old or new fields, both use the index and Execution Time is 
> similar
> explain analyze select * from People where Phones @> '[{"phone": 
> 2236279878}]';
> explain analyze select * from People where Mobile = 2236279878 or Work = 
> 2236279878 or Home = 2236279878;
>  
> But then I repeated 2 or 3 times that update which stores those 3 phones on 
> json and then my gin index became slow, very very slow, why ?
> 
> select using btree on 3 phone numbers - Execution Time: 0.164 ms
> select using gin on json on first update - Execution Time: 0.220 ms
> select using gin on json after 2 or 3 updates - Execution Time: 11.220 ms
> 
> And that execution time will come back to 0.220 ms only if I recreate the 
> index.
> 
> Then I found gin_pending_list_limit and fastupdate which I think are used to 
> update GIN indexes, but didn't find any examples of either.
> 
> What am I missing ? That gin index needs to have some more options or 
> attributes on it ?
> I know in a day by day use I'll never do that massive update twice but just 
> to understand when will this index be updated ?
> 
> Thanks
> Marcos



Are stored procedures/triggers common in your industry

2022-04-20 Thread Guyren Howe
I’ve really only ever worked in web development. 90+% of web developers regard 
doing anything at all clever in the database with suspicion.

I’m considering working on a book about implementing business logic in 
Postgres, and I’m curious about how common that actually is.


Re: Can we go beyond the standard to make Postgres radically better?

2022-02-13 Thread Guyren Howe

> > The MySQL autocomplete is designed without context filtering. Maybe we can 
> > have this implementation too (as alternative)
> >
> > so using all column names + all table names + aliases.column names (when we 
> > know defined alias)
> >
> > Another idea about column excluding. Any implementation on the client side 
> > is very complex, because you need to parse sql. But maybe we can enhance 
> > SQL with some syntax.
> >
> > SELECT ... and LAST optional clause can be HIDE, OR EXCLUDE NAME or EXCLUDE 
> > TYPE
> >
> > SELECT * FROM postgistab HIDE TYPE LIKE 'wkb%'
> > SELECT * FROM postgistab ORDER BY ... EXCLUDE COLUMN LIKE 'shape%'
> >
> > WITH x AS (SELECT * FROM xx)
> > SELECT * FROM x EXCLUDE COLUMN x1,x2
> >
> > The column excluding should be a separate *last* clause.
> >
> > Also, with this syntax there is less possibility of collision with ANSI SQL
Not against this. Seems somewhere in here might be a nice quality of life 
change.

Still.

I originally suggested that SQL is terrible and we should fearlessly either 
replace it or migrate it toward something better. And the thread winds up with 
a debate about a minor change to a development tool.

I’m back to just having no earthly idea why anyone who finds relations to be a 
productive tool for building a model would think that SQL being the only means 
to do that is Okay.

SMH.


Re: Can we go beyond the standard to make Postgres radically better?

2022-02-10 Thread Guyren Howe
I get all this. Give me a couple million bucks, and I’ll hire some of the 
Postgres devs to build a new database. We could crib some of the low-level code 
from Postgres, but everything above the low level would need to be rewritten.

I was proposing more that we at least provide higher-level, more general, 
orthogonal etc features in the SQL we have now. eg first-class functions could 
be added to SQL reasonably easily.
On Feb 10, 2022, 22:32 -0800, Tom Lane , wrote:
> Raymond Brinzer  writes:
> > Will it be accepted here? I don't know; I'm not an insider, or in a
> > position to say. But it'd be a much better pitch than a pep talk, or
> > speaking in generalities about SQL. And that's coming from someone who
> > actually agrees with you. I'm 100% on board with the idea that something
> > better is (badly) needed. But is the idea, here, really to talk a highly
> > successful project into doing a 180 based on this sort of argument? If
> > only the people writing the code saw the light, they'd go read the Datomic
> > site, and start overhauling PostgreSQL?
>
> Nah, probably not. I mean, not only are we implementing SQL, but
> we're implementing it in C. I used better languages than C back
> in the seventies ... but here we are. Practical acceptance is
> all about infrastructure and compatible tooling, which SQL and C
> both have in spades, while academic designs really don't.
>
> Also, I fear this discussion underestimates the difficulty of
> putting some other query language on top of Postgres. I know
> you'll say "but the Berkeley guys pasted SQL onto a QUEL engine
> back when, so how hard can it be?" In the first place, that
> was done on top of maybe ten years worth of work, but now there's
> another twenty-five years of development agglomerated on top of
> that. So moving things would be more than 3X harder, even if
> you make the very-naive assumption that the difficulty is merely
> linear. In the second place, QUEL and SQL aren't that far apart
> conceptually, and yet we've still had lots of problems that can
> be traced to their incompatibilities. Something that was really
> different from SQL would be a nightmare to embed into PG. I'll
> just point out one example: if you don't like SQL's semantics for
> NULL (which no I don't much like either), changing that would
> probably require touching tens of thousands of lines of code just
> in the PG core, never mind breaking every API used by extensions.
>
> So for better or worse, Postgres is a SQL engine now. If you
> want Datalog or $other_language, you'd be better off starting
> or contributing to some other project.
>
> That's not to say that we can't do stuff around the margins.
> The idea of "select all columns except these", for instance,
> has been discussed quite a bit, and would probably happen if
> we could get consensus on the syntax. But we're not going to
> throw away thirty-five years' worth of work to chase some
> blue-sky ideas.
>
> regards, tom lane


Re: Can we go beyond the standard to make Postgres radically better?

2022-02-10 Thread Guyren Howe
I’m not proposing some crackpot half-baked idea here. There are well-defined 
and researched alternatives to SQL.

The most fully-developed you-can-use-today offering is Datomic, which uses 
Datalog as its query language. If you know Prolog, and how that is kind of 
database-like, Datomic is pretty much a variant of Prolog.

https://www.datomic.com

I don’t use it because it’s closed source.
On Feb 10, 2022, 21:15 -0800, Raymond Brinzer , wrote:
> > On Thu, Feb 10, 2022 at 11:56 PM Guyren Howe  wrote:
> > > I feel like anyone who is defending SQL here isn’t aware of how much 
> > > better the alternatives are, and how bad SQL really is.
> >
> > Have you written a language description we can read and talk about?
> >
> --
> Ray Brinzer


Re: Can we go beyond the standard to make Postgres radically better?

2022-02-10 Thread Guyren Howe
On Feb 10, 2022, at 17:06 , Mladen Gogala  wrote:
> 
>> But SQL is a terrible, no good, very bad language.
> 
> I cannot accept such a religious persecution of SQL without a detailed 
> explanation.
> 

I feel like anyone who is defending SQL here isn’t aware of how much better the 
alternatives are, and how bad SQL really is.

I mean: it’s hard to write, hard to read. It’s hard to generate. But that’s 
just the starting point.

One of the worst things about it that I don’t see much discussed is that it 
imposes assumptions about the storage model that aren’t part of the relational 
model. Like heavyweight, hard to change tables with transactional guarantees 
and such. Don’t get me wrong, those things are great to have available, but I 
don’t need them all the time.

The whole NoSQL movement has been such a tragedy. Having diagnosed a problem 
with SQL databases, they threw out the relational model and very often 
reimplemented a form of SQL when they should have done the opposite. There is 
no reason you can’t have a relational database with an on-demand schema, with 
eventual consistency, with all those fun things that various NoSQL databases 
provide. Those storage models have their place, but the SQL standard says you 
can’t use them.

But the biggest issue is the verbose, terrible, very bad query language. In the 
web development community where I spend my time, it is almost holy writ to 
treat the database as a dumb data bucket, and do everything in the application 
layer (even things like validations, even when that is a provably incorrect 
approach). Why? I think it’s because they’re used to working in a pleasant 
language like Ruby or Python, and they want to do everything there. And who can 
blame them?

But this is bad. Proper relational design can take over much (most!) of the 
design of a typical business app, with significant efficiency gains the result. 
But no *community* is going to choose that when most of the members of the 
community don’t want to learn SQL and who can blame them?

Another issue: everyone thinks “relational” is the same thing as “SQL”. If we 
could get folks to break that association, then relations should be a standard 
feature of high-level programming languages, just as arrays and hashes are.

Heck, give me a functional programming language with a good relational model, 
and I can implement OOP in that relational language without breaking a sweat.

Software *should* be designed around a logical/relational layer with minimal 
occasional forays into Turing completeness where necessary. But no-one is even 
thinking about software like that because relational is SQL and SQL is awful.


Re: Can we go beyond the standard to make Postgres radically better?

2022-02-10 Thread Guyren Howe
When you dig into it, the powerful idea here is the relational algebra, and its 
equivalence to a first-orderish logic.

I put up with SQL so I can use relations, and I love Postgres because it has 
the least bad SQL (by a mile!)

But SQL is a terrible, no good, very bad language.

I don’t really understand why folks who love the relational model aren’t 
perpetually up in arms about SQL being their only option. Much better query 
languages are known and well studied.

https://www.scattered-thoughts.net/writing/against-sql

I’d love to see Postgres offer say Datalog. But until that Halcyon day, we 
could at least aggressively extend SQL to make it less awful.

> On Feb 10, 2022, at 14:38 , Benedict Holland  
> wrote:
> 
> This is a strange post. Why is SQL bad and how do you reconcile that with 
> managing 99%+ of all data? It's so bad that we have systems that plug into 
> sql to query data outside of tables like Athena or Excel. 
> 
> Why are you not using pgadmin4? Yes. Psql as a command line isn't great for 
> humans. It's spectacular for computers though. So we have pgadmin4, which I 
> would take over any other database ui. 
> 
> Do you not want your views to change with underlying base tables changing? Do 
> a fully specified select. It's better programming anyway. Do you want an api? 
> That seems like a bad idea (i would never implement it) but you also have a 
> postgres socket, flask, and sqlalchemy or psycopg2. It would take a few hours 
> to write your own. Again, please don't do that. You will almost surely lose 
> user information like who decided to delete your client data and your api 
> would likely require user privileges to get passed by token (sso would be a 
> nightmare to authenticate) or simply give root privileges to an api. Both are 
> honestly really bad. 
> 
> Now if postgres had the ability to do schema change tracking with rollback... 
> now that would be a victory. But there are sort of 3rd party solutions that 
> sort of work some of the time. It's a hard problem and automated sql 
> generation, particularly automated schema migrations, are really hard to 
> build in general and there are specific things that are damn hard to not 
> break. 
> 
> Thanks,
> Ben
> 
> 
> On Thu, Feb 10, 2022, 4:13 PM Bruce Momjian  wrote:
> On Thu, Feb 10, 2022 at 06:25:45PM +0100, Peter J. Holzer wrote:
> > On 2022-02-10 18:22:29 +0100, Peter J. Holzer wrote:
> > > On 2022-02-09 21:14:39 -0800, Guyren Howe wrote:
> > > >   • SELECT * - b.a_id from a natural join b
> > > >   □ let me describe a select list by removing fields from a 
> > > > relation. In
> > > > the example, I get all fields in the join of  a  and b other 
> > > > than the
> > > > shared key, which I only get once.
> > > 
> > > Natural join already does this.
> > > 
> > > My use case for such a feature are tables which contain one column (or a
> > > small number of columns) which you usually don't want to select: A bytea
> > > column or a very wide text column. In a program I don't mind (in fact I
> > > prefer) listing all the columns explicitely, but exploring a database
> > > interactively with psql typing lots of column names is tedious
> > > (especially since autocomplete doesn't work here).
> > 
> > Forgot to add: I think that the syntax would have to be more explicit.
> > It's too easy to mix up
> > SELECT * - b.a_id FROM ...
> > and
> > SELECT *, - b.a_id FROM ...
> > 
> > Maybe
> > SELECT * EXCEPT b.a_id FROM ...
> 
> Yes, this was proposed on hackers a few months ago and a patch was
> proposed:
> 
> 
> https://www.postgresql.org/message-id/flat/892708.1634233481%40sss.pgh.pa.us#1f17923ad50a1442867162991c54ead9
> 
> The last post was from October of 2021 so you can email the author to
> ask about its status.
> 
> -- 
>   Bruce Momjian  br...@momjian.us
>   https://momjian.us
>   EDB  https://enterprisedb.com
> 
>   If only the physical world exists, free will is an illusion.
> 
> 
> 



Re: Can we go beyond the standard to make Postgres radically better?

2022-02-10 Thread Guyren Howe
I’d like to point out that sum types would be great.

(Sum types let you have any of two or more different types in one value)

For example, I could work around the issues with NULL by defining an 
enumeration type with values like MISSING, UNKNOWN, INVALID, … and then I can 
have a column that is a sum of that type with the type we normally keep in the 
column.
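
Lacking that, the nearest workaround today is something like this (a sketch):

CREATE TYPE absence AS ENUM ('missing', 'unknown', 'invalid');
CREATE TABLE reading (
  value numeric,
  why_absent absence,
  CHECK ((value IS NULL) <> (why_absent IS NULL))  -- exactly one is present
);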

We would need some machinery to declare how aggregates work, but I think that’s 
very much in the spirit of how types work in Postgres now.

In general, I would think we could make SQL a better functional programming 
language by just implementing good fp features like this.
On Feb 10, 2022, 11:09 -0800, Merlin Moncure , wrote:
> On Thu, Feb 10, 2022 at 10:54 AM Merlin Moncure  wrote:
> > > On Wed, Feb 9, 2022 at 11:15 PM Guyren Howe  wrote:
> > > > >
> > > > >
> > > > > • Also nested function definitions, so top-level functions can be 
> > > > > built out of local auxiliary functions.
> > > > > • Other languages
> > > > > • Tutorial D, Datalog, Quell, let’s open this puppy up!
> > > > > • SQL is a terrible, no good, very bad language
> > > > >
> > > > Postgres used to support QUEL way back in the day.  Being blunt, there 
> > > > is zero chance of this happening in core.  A query language translator 
> > > > out of core, preprocessing the language into SQL, might work.   SQL has 
> > > > evolved far beyond what those languages could practically offer.   Main 
> > > > benefit here would be to better support relational division, which 
> > > > bizarrely has yet to arrive in SQL.
> > > >  A portable, low-level API
> >
> > FYI, there was a semi-serious commercial attempt to do this back in 2001, 
> > Dataphor. It's been opensourced.  Wikipedia has a pretty good write up on 
> > it:
> > https://en.wikipedia.org/wiki/Dataphor
> >
> > IMNSHO suggestions like these should travel this technical path; take the 
> > data language you want and translate it into SQL.  Some of these 
> > translations will end up being complicated (read: slow).  Iterating this 
> > kind of thing outside of core would allow for faster development.
> >
> > merlin


Can we go beyond the standard to make Postgres radically better?

2022-02-09 Thread Guyren Howe
Postgres has since the outset gone beyond the SQL standard in many ways: 
types, inheritance, programmability, generality are all well beyond what SQL 
used to mandate and still well beyond the current standard.

There are huge developer benefits available to focusing more on making a great 
relational programming environment, well outside the SQL standard.

Examples of small things Postgres could have:

- SELECT * - b.a_id from a natural join b
  - let me describe a select list by removing fields from a relation. In the
    example, I get all fields in the join of a and b other than the shared
    key, which I only get once.
  - note how this simplifies maintaining views wrt changes in tables
- Let me put the FROM clause first
  - if I can write FROM a join b SELECT a.height, a.name, b.email then an
    editor can give me autocomplete when I’m writing the select clause.
- Hierarchical schemas

Examples of larger things Postgres might have:

- First-class functions.
  - A global-only namespace is dumb. Schemas are only slightly less dumb. The
    obvious way to store and call functions is as values of fields. Let me
    organize them how I choose.
  - Also let me pass around functions as values, let me make new ones and so
    on. Postgres is the best relational programming environment already
    because it has a decent type system and more general features. Let’s
    expand and also simplify that.
  - Also nested function definitions, so top-level functions can be built out
    of local auxiliary functions.
- Other languages
  - Tutorial D, Datalog, Quell, let’s open this puppy up!
  - SQL is a terrible, no good, very bad language
- A portable, low-level API
  - An alternative to SQLite that provides CRUD operations on a Postgres
    database.

I adore Postgres, but this is despite rather than because of SQL. By all means, 
let’s have a close-to-standard subset of features, but let’s fearlessly go 
beyond them when it would be obviously much better.


Re: Match 2 words and more

2021-11-27 Thread Guyren Howe
On Nov 27, 2021, at 16:27 , Shaozhong SHI  wrote:
> 
> select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ ]+[:alpha:]+$';

The simplest thing that does what you says is I think:

select name FROM a_table where "STREET_NAME" ~ 
'^([[:alpha:]]+\s)+[[:alpha:]]+$';


Best options for tracking history of data?

2021-10-26 Thread Guyren Howe
I’m interested in tracking our data over time.

The basic requirement is for debugging and disaster recovery, but I’m familiar 
enough with the issue that I can imagine being able to set things up so I get 
that and much more.

I’ll need something that I can use on eg AWS, probably RDS, although I’ll 
listen to an argument for eg Timescale DB.

I want something easy to set up, rugged and with minimal changes to client 
code. My current needs are simple, but I’ll take “extra features basically for 
free” if I can.

I can’t find a good general discussion of options for this. If I missed one, 
I’ll take a link and offer a thank you.

Otherwise, this seems a useful topic to discuss.

Thanks in advance.


Re: "two time periods with only an endpoint in common do not overlap" ???

2021-10-15 Thread Guyren Howe
I’m a bit confused by this conversation. Open- and closed-ended ranges behave 
as I would expect.

select tsrange('2016-01-10 17:00', '2016-01-11 17:00', '[)') && 
tsrange('2016-01-11 17:00', '2016-01-12 17:00', '[]')
false

select tsrange('2016-01-10 17:00', '2016-01-11 17:00', '[]') && 
tsrange('2016-01-11 17:00', '2016-01-12 17:00', '[]')
true
On Oct 15, 2021, 11:27 -0700, Brian Dunavant , wrote:
> Think of it this way.  When someone says they have a meeting from 1-2 and 
> another from 2-3, do those meetings overlap?  They do not, because we're 
> actually saying the first meeting is from 1:00 through 1:59:59.9.   The 
> Postgres date ranges are the same way.   The starting point is inclusive, but 
> the ending time is exclusive.   So [1:00,2:00), and [2:00,3:00), do not 
> overlap.
>
> > On Fri, Oct 15, 2021 at 2:16 PM Ron  wrote:
> > > On 10/15/21 8:59 AM, Adrian Klaver wrote:
> > > > On 10/15/21 06:52, Ron wrote:
> > > >> On 10/14/21 7:02 PM, Adrian Klaver wrote:
> > > >> [snip]
> > > >>> or the third example in the docs:
> > > >>>
> > > >>> SELECT (DATE '2001-02-16', DATE '2001-12-21') OVERLAPS
> > > >>>    (DATE '2001-10-30', DATE '2002-10-30');
> > > >>> Result: true
> > > >>> SELECT (DATE '2001-02-16', INTERVAL '100 days') OVERLAPS
> > > >>>    (DATE '2001-10-30', DATE '2002-10-30');
> > > >>> Result: false
> > > >>> SELECT (DATE '2001-10-29', DATE '2001-10-30') OVERLAPS
> > > >>>    (DATE '2001-10-30', DATE '2001-10-31');
> > > >>> Result: false
> > > >>
> > > >> Why /don't/ they overlap, given that they share a common date?
> > > >
> > > > Per the docs:
> > > >
> > > > https://www.postgresql.org/docs/current/functions-datetime.html
> > > >
> > > > " Each time period is considered to represent the half-open interval 
> > > > start
> > > > <= time < end, unless start and end are equal in which case it 
> > > > represents
> > > > that single time instant."
> > > >
> > > > Which I read as
> > > >
> > > > (DATE '2001-10-29', DATE '2001-10-30') ends at '2001-10-29'
> > > >
> > > > and
> > > >
> > > > (DATE '2001-10-30', DATE '2001-10-31') starts at DATE '2001-10-30'
> > > >
> > > > so no overlap.
> > >
> > > I was afraid you were going to say that.  It's completely bizarre, but 
> > > seems
> > > to be a "thing" in computer science.
> > >
> > > --
> > > Angular momentum makes the world go 'round.
> > >
> > >


Re: The tragedy of SQL

2021-09-16 Thread Guyren Howe
On Sep 16, 2021, at 7:31 , Merlin Moncure  wrote:
> 
> On Wed, Sep 15, 2021 at 7:31 PM FWS Neil  wrote:
>> On Sep 15, 2021, at 2:44 PM, Merlin Moncure  wrote:
>>> I think you ought to recognize that many people on this list make
>>> money directly from managing that complexity :-).
>> 
>> I did not intend to disparage anyone.  People, including myself, make money 
>> when they provide value and there is certainly value here.
>> 
>> But, I am not sure I understand your inference.  Are you saying (and I am 
>> not implying you are) that PostgreSQL does not progress in line with the 
>> original SQL goals of simplifying data access because people are making 
>> money off of the current complexity?
> 
> Not at all.  I'm saying that this mailing list is stuffed with people
> that work on and with the database, and SQL is in many cases
> (including mine) a passion.   Postgres is a technical marvel so that
> we make money by utilizing it effectively, and rendering, er,
> constructive feedback to its stewards so that it may improve and we
> may all become more efficient.
> 
> Points made upthread suggesting SQL is bad due to being based on sets
> were ironic IMO.  SQL is successful and enduring precisely because you
> can direct operations by describing data relationships rather than
> implement specific operations by hand.  Fluency in that technique
> leads to solution achievement that other technologies cannot approach.
> It rewards artistry and creativity, which I guess rubs some people the
> wrong way since it does not align to factory development strategies.
> 
> No offense taken; I enjoy the debate; this thread is an indulgence,
> being a bit of time off from capricious C suite executives, agile
> methodology zealots, etc. So fire away.

Missing my original point here. The set theory is the _point_. SQL is a 
gargantuan distraction from using it efficiently. 

Imagine if COBOL was the only widely-available programming language with 
functions. You might use it, because functions are really great abstraction for 
programming. That wouldn’t mean that COBOL wasn’t an utterly awful language.

SQL is like that, only for relations, sets and logic.


Re: SQL queries as sets: was The tragedy of SQL

2021-09-14 Thread Guyren Howe
Oh, yeah, wow. Big topic.

My original post in the series is in significant part about how SQL hides this 
sort of thing from you.

A table is a set: a set of true facts, all having the same structure, so you 
can operate on all of them with any operation on the individual rows.

Multiple tables, multiple facts. Sometimes about the same things, which is 
what a join does: the join key is the identifier of the things a bunch of 
statements are all about. A user has a name, and a login, and a creation date. 
When the identifier is the primary key, there is only one such statement. When 
it’s a foreign key, there are multiple statements.

As an aside: it would be perfectly reasonable to have multiple tables with the 
same primary key. Or with the same foreign key. You could have only “foreign 
keys” on a bunch of different tables.

When you do a join, you’re finding all the “x and y” statements that the system 
knows to be true. That’s why a 1:many joins produces duplicate 1-side values in 
its results.

When you do a where, you’re reducing the number of rows by only finding the 
rows with a certain property. The joins and wheres together give you a very 
expressive logic engine.
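
To make that concrete (invented tables):

-- "u is a user named N" AND "u placed order O": the join returns every true
-- conjunction, so a 1:many join repeats the 1-side values; the WHERE keeps
-- only the statements with the given property.
SELECT u.name, o.id
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.total > 100;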

There are two equivalent ways of looking at what the database does: it stores 
logical statements and acts as a logic engine; or it stores sets of tuples and 
gives you set operations on them.

Go read up on the basics of Datalog: it makes all of this beautifully obvious.

My original post was about how different the computer industry would be if only 
we’d made relations as easy as Datalog does. The entire industry would look 
different, just as garbage collection made the entire industry different.
On Sep 14, 2021, 21:55 -0700, Steve Litt , wrote:
> Rich Shepard said on Tue, 14 Sep 2021 05:49:07 -0700 (PDT)
>
> > On Mon, 13 Sep 2021, Guyren Howe wrote:
> >
> > > They are making a decent decision. SQL is a *fucking terrible*
> > > language, which I don’t blame them for not wanting to learn.
> >
> > > > SQL is not the problem. Problem are the devs. I love SQL. I hate
> > > > orms. The problem with databases is people refuse to treat it as
> > > > the entity it is and want to use their beautiful OO system. Problem
> > > > is databases are not OO. We need to recognize that and treat
> > > > databases as databases.
> >
> > Guyren/Hemil,
> >
> > As a non-SQL expert who's used postgres since 1997 I've come to
> > believe the basic issue is that SQL is based on sets, neither
> > procedural or object oriented. Few people think in sets so they try to
> > fit SQL into what they know rather than understand the how sets work.
>
> Rich, could you please elaborate on SQL queries being based on sets? I
> never thought of it that way, and would like to hear your related
> thoughts.
>
> SteveT
>
> Steve Litt
> Spring 2021 featured book: Troubleshooting Techniques of the Successful
> Technologist http://www.troubleshooters.com/techniques
>
>


Re: The tragedy of SQL

2021-09-14 Thread Guyren Howe
Exactly. SQL is the roman numerals of relational databases.
On Sep 14, 2021, 13:08 -0700, Raymond Brinzer , wrote:
> On Tue, Sep 14, 2021 at 3:58 PM Guyren Howe  wrote:
> > You’re confusing SQL with the relational model. Datalog and Quel and 
> > Tutorial D and other database languages and systems can and did provide 
> > those features also.
>
> By analogy: Arabic and Roman numerals both describe the natural
> numbers. Hence, they have the same mathematical properties. Spending
> a little time doing algebra with Roman numerals should convince
> anyone, however, that how you express a concept matters a lot.
>
> --
> Ray Brinzer


Re: The tragedy of SQL

2021-09-14 Thread Guyren Howe
On Sep 14, 2021, 12:51 -0700, Mladen Gogala , wrote:
> Replies in-line
> On 9/14/21 01:51, Guyren Howe wrote:
> > They are making a decent decision. SQL is a *fucking terrible* language, 
> > which I don’t blame them for not wanting to learn.
> Based on what criteria?
Verbosity. Redundancy. Lack of orthogonality. Resemblance to English. The 
standard effectively mandates a particular storage strategy, with heavyweight 
tables that must provide certain CAP theorem guarantees instead of others. 
Rigid storage limitations: less in Postgres than in others, but why can’t I, 
say, nest schemas? Hell, why can’t I nest relations?

https://www.edgedb.com/blog/we-can-do-better-than-sql
> >
> > The whole industry, programming languages, infrastructure, everything would 
> > have developed differently if relations were a natural, pleasurable thing 
> > to use in any programming language. Like an Array, or a Hash.
> > There is nothing natural about either relations or arrays and 
> > hashes/dictionaries. Relations are pretty literal implementation of the 
> > basic set theory. Having a decent understanding of the basic set theory is 
> > a condition for understanding SQL. Now, we can discuss whether a language 
> > implementing a mathematical theory is "good" or "bad", whatever the meaning 
> > of "good" or "bad" in the given context. Historically, SQL is a good fit 
> > for the banking business and accounting and that is why it is still around.
> > Another misconception about SQL is treating it as a general purpose 
> > programming language. SQL is data description language, nothing more, 
> > nothing less. It doesn't need loops, arrays, hashes or subroutines, its 
> > principal purpose is to describe a subset of data. Without SQL you would 
> > have to read all the data and filter the unnecessary stuff yourself. 
> > Furthermore, part of SQL are so called "ACID requirements". Transaction can 
> > only see the data that was committed before the transaction has begun. 
> > Implementing ACID takes a lot of code, that's what MVCC is all about. 
> > However, that too has its origins in accounting. You cannot pay the money 
> > you don't have. And the last thing about SQL is transaction management. 
> > Without relational databases and SQL, you would need a proprietary 
> > transaction manager, just like MongoDB. And MongoDB has a complex 
> > proprietary transaction manager and is losing market share. So, to 
> > recapitulate:
> >  • Declarative subset definition
> >  • ACID consistency
> >  • Transaction management
> >  • Ideal fit for accounting.
> > That is why SQL is still around. And that is why we all live in a yellow 
> > subroutine (this reference is not for the millennials or younger)
You’re confusing SQL with the relational model. Datalog and Quel and Tutorial D 
and other database languages and systems can and did provide those features 
also.


Re: The tragedy of SQL

2021-09-13 Thread Guyren Howe
They are making a decent decision. SQL is a *fucking terrible* language, which 
I don’t blame them for not wanting to learn.

The whole industry, programming languages, infrastructure, everything would 
have developed differently if relations were a natural, pleasurable thing to 
use in any programming language. Like an Array, or a Hash.
On Sep 13, 2021, 22:45 -0700, Hemil Ruparel , wrote:
> SQL is not the problem. Problem are the devs. I love SQL. I hate orms. The 
> problem with databases is people refuse to treat it as the entity it is and 
> want to use their beautiful OO system. Problem is databases are not OO. We 
> need to recognize that and treat databases as databases.


The tragedy of SQL

2021-09-13 Thread Guyren Howe
A fun philosophical discussion.

I am no fan of “worse is better”, and particularly its poster child, SQL.

The world’s economic output would be substantially higher (5%?) if our industry 
had settled on almost anything other than SQL for relational databases.

So much of the design of *almost everything* in our industry is a reaction to 
SQL. ORMs fucking *everywhere* so you don’t have to use SQL. Bad query and 
database design. Inefficient system designs that use ORMs rather than 
relations. NoSQL databases. Countless hours on hours of developer time trying 
to work out how to write something in SQL that would be trivial in, say, 
Datalog.

If I had $5 million to invest in a startup, I would hire as many of the core 
Postgres devs as I could to make a new database with all the sophistication of 
Postgres but based on Datalog (or something similar). (Or maybe add Datalog to 
Postgres). If that could get traction, it would lead in a decade to a 
revolution in productivity in our industry.


Re: Sort question - Fractions, Metric etc

2021-08-14 Thread Guyren Howe
You might define new types for temperature, length, whatever, with suitable 
conversion, operation and creation functions. You’d be able to define how the 
new types participate in indexes, support directly sorting on them, so you 
package up this complexity and forget about it.

Either normalize everything to metric on creation, or have an enumeration or 
boolean flag to indicate whether the value is metric or imperial (choose this 
one if you want values to remember how they were created and thus how they will 
display, and to avoid rounding errors converting back to imperial for display).

Whether this is worth it depends on how many places you use them. But it would 
be a good way to make this complexity idiot-proof if you’ll be using it all over. 
You’d be able to just add and multiply lengths and such without worrying how 
they were specified.

Looks like this might do what you need on cursory examination: 
https://github.com/df7cb/postgresql-unit
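
A minimal sketch of the flag-based approach, for illustration (all names here 
are hypothetical, as is the choice of millimetres as the canonical unit):

CREATE TYPE unit_system AS ENUM ('metric', 'imperial');

CREATE TABLE measurement (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    -- canonical value, always stored in millimetres
    length_mm numeric NOT NULL,
    -- remembers how the value was entered, so display can round-trip
    entered_as unit_system NOT NULL DEFAULT 'metric'
);

-- display helper: converts back to inches only for rows entered as imperial
CREATE FUNCTION display_length(m measurement) RETURNS text
LANGUAGE sql STABLE AS $$
    SELECT CASE m.entered_as
        WHEN 'imperial' THEN round(m.length_mm / 25.4, 3)::text || ' in'
        ELSE m.length_mm::text || ' mm'
    END;
$$;

Sorting is then just ORDER BY length_mm, no matter how each row was entered.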
On Aug 14, 2021, 12:51 -0700, Gogala, Mladen , wrote:
> I would write a stable function converting everything to metric (or
> imperial, depends on your preferences) and sort on the return of the
> function. Since unit conversion functions do not need to modify the
> database and should always return the same values for the same
> arguments, the function can be used within a query (that is the meaning
> of the keyword "STABLE")  and you can use it for sorting stuff
>
> Regards
>
> On 8/14/2021 12:14 PM, Bret Stern wrote:
> > I have a table with metric, imperial, fraction columns.
> >
> > Is there a way to sort correctly using imperial (eg; .125, .375, .437
> > -> 1., 1.125)
> >
> > Couldn't handle it with ORDER BY ASC, DESC args so I added a
> > sort_column and sorted based
> >
> > on those values eg; 1,2,3,4,5,6 indicating the value I need to show in
> > which sequence.
> >
> >
> > Just curious what the pro's do
> >
> > Bret
> >
> >
> --
> Mladen Gogala
> Oracle DBA
> Tel: (347) 321-1217
> Blog: https://dbwhisperer.wordpress.com
>
>


Re: syntax question

2021-06-03 Thread Guyren Howe
I know it would be non-standard, but I would love to see Postgres support the 
likes of nested functions.

Postgres already has lots of non-standard features that make it more like a 
real programming language and considerably more productive.
On Jun 3, 2021, 12:34 -0700, Bruce Momjian , wrote:
> On Thu, Jun 3, 2021 at 03:21:15PM -0400, Tom Lane wrote:
> > Adrian Klaver  writes:
> > > On 6/3/21 12:01 PM, Bruce Momjian wrote:
> > > > On Thu, Jun 3, 2021 at 08:58:03PM +0200, Marc Millas wrote:
> > > > > within a function, I want to create another function.
> >
> > > > You can't create functions inside of functions; same for procedures.
> >
> > > Sure you can:
> >
> > Yeah. The actual problem here is that Marc is expecting variable
> > substitution to occur within a utility (DDL) statement, which it
> > doesn't. The workaround is to build the command as a string and
> > use EXECUTE, as Adrian illustrated:
> >
> > > EXECUTE 'create function ' || bidule || '() RETURNS void language
> > > plpgsql AS $fnc$ BEGIN END; $fnc$ ';
> >
> > This is not terribly well explained in the existing docs. I tried
> > to improve the explanation awhile ago in HEAD:
> >
> > https://www.postgresql.org/docs/devel/plpgsql-statements.html#PLPGSQL-STATEMENTS-GENERAL-SQL
>
> Oh, I thought he wanted to declare a function inside the function that
> could be called only by that function, like private functions in Oracle
> packages can do. Yes, you can create a function that defines a function
> that can be called later. I guess you could also create a function that
> _conditionally_ creates a function that it can call itself too. My
> point is that you can't create a function that has function scope ---
> they all have schema scope.
>
> --
> Bruce Momjian  https://momjian.us
> EDB https://enterprisedb.com
>
> If only the physical world exists, free will is an illusion.
>
>
>


Re: Need help with clarification on stored procedure support in PostGreSQL database

2021-03-08 Thread Guyren Howe
I think that’s burying the lede a bit. Still, unless the “Extending SQL” 
section acquired an Overview subsection, I’m not sure where else you’d put it.
On Mar 8, 2021, 20:41 -0800, Tom Lane , wrote:
> Guyren Howe  writes:
> > This seems like an important consideration. I’ve spent 10 minutes searching 
> > the documentation for PG 11 and can’t find where it is documented. Perhaps 
> > it should be made more prominent?
>
> https://www.postgresql.org/docs/current/xproc.html
>
> regards, tom lane


Re: Need help with clarification on stored procedure support in PostGreSQL database

2021-03-08 Thread Guyren Howe
This seems like an important consideration. I’ve spent 10 minutes searching the 
documentation for PG 11 and can’t find where it is documented. Perhaps it 
should be made more prominent?
On Mar 8, 2021, 16:53 -0800, raf , wrote:
> On Mon, Mar 08, 2021 at 06:31:56PM -0500, Tom Lane  wrote:
>
> > raf  writes:
> > > Is there any important different between a stored procedure and
> > > a stored function that returns void? I've been using functions
> > > for everything but calling them procedures in my head (even though
> > > most of them do return a value or result set).
> >
> > A procedure can commit or rollback a transaction (and then start
> > a new one).
> >
> > regards, tom lane
>
> thanks.
>
> cheers,
> raf
>
>
>


How to determine server's own IP address? inet_server_addr not working

2021-02-21 Thread Guyren Howe
The documentation says that inet_server_addr() does this, but on our servers it 
is returning nothing.

Re: Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea

2021-02-10 Thread Guyren Howe
An interesting option is to make your tenants work through views, and have the 
views filter on a variable that contains the tenant’s id. There would be a bit 
of coding, but it would be the same for every table, so you could automate it 
easily enough.

When you’re done, client software just sets the id at the beginning of the 
session and the database looks as though it contains only that tenant’s data.

This gives a particularly nice security guarantee: if you make the id a UUID 
(i.e. unguessable), then even someone who entirely compromises the client 
application can still only get at the data for one tenant, and then only given 
that tenant’s ID.
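
A minimal sketch of the idea (the setting name, table, and UUID are all 
hypothetical):

-- one view per table; the pattern is identical every time
CREATE VIEW orders_v AS
    SELECT o.*
    FROM orders o
    WHERE o.tenant_id = current_setting('app.tenant_id', true)::uuid;

-- client software runs this once at the start of each session
SELECT set_config('app.tenant_id', 'b0e42fe7-31a5-4894-a12b-2dfbf5f4cfae', false);

With the two-argument form of current_setting, an unset id simply yields no 
rows. You’d grant the application role access to the views only, never the 
base tables.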
On Feb 10, 2021, 18:44 -0800, Rob Sargent , wrote:
>
>
> > On Feb 10, 2021, at 6:45 PM, Tim Cross  wrote:
> >
> >
> > Jagmohan Kaintura  writes:
> >
> > > HI All,
> > >
> > > For a PostgreSQL database storing data for multiple tenants, the approach
> > > decided on was to have
> > > Shared Database (Holding data for all tenants)
> > >  => Data would be segregated on basis of some additional column
> > > (tennatid,different tenants having different tenantId)
> > >   => Data would be accessed through Views on the basis of tenantId
> > > value.
> > >
> > > This is the basic process of most of the customers who are trying to
> > > implement multiple tenants in PostgreSQL, rather than choosing
> > > separate databases for each tenant.
> > >
> > > Now we need to encrypt the data related to a tenantId, so that no one,
> > > even in the Operations group, knows which tenant the data belongs to.
> > > Is there a method in PostgreSQL for encrypting data with different keys
> > > with respect to different values in a single column? Moreover, pgcrypto
> > > will impose a single key on the column.
> > >
> > > Please share your thoughts in which direction i can start analysing this
> > > area for encryption of data specific to a tenant.
> > >
> >
> > The decision to have all tenants in a single database seems rather
> > unusual to me. Isolating one tenant from adversely impacting another
> > would seem complicated and I'm not sure how you would implement a clear
> > security model. Your model has effectively bypassed all the provided PG
> > facilities for isolation of data. Disaster recovery and business
> > continuity planning under this model must be a nightmare!
> >
> > I doubt you can adopt a solution which is solely within the database.
> > How would the database know which key to use for which rows of data? How
> > would you select the data for your tenant views if all that data is
> > encrypted with different keys? How would you manage these keys in a
> > secure manner?
> >
> > With the model you have adopted, I would be looking at performing
> > encryption/decryption at the client level. However, depending on your
> > data types, this could be challenging. this is really a requirement
> > which should have been factored into the initial architecture design.
> > Anything you try to bolt on now is likely to be complex and have
> > significant performance impact and that is assuming you can re-interpret
> > the requirement to make the objective feasible.
> >
> Yeah, I lost that same argument in ~2007, where my push for separation was 
> shouted down with rants on schema maintenance (divergence) and multiple 
> rollouts per update. I hadn’t had any coffee before the 9:00am meeting, so 
> the hotshot from Amazon got his way. Then we tried “veils” (a concoction of 
> view and rule re-writing) and we all know how that went. The company folded 
> before our “next gen” software saw the light of day.
>
> I get the feeling multi-tenancy is, if not the rule these days, at least 
> quite common (on the last of “big iron”?) but it still doesn’t sit well with 
> me.
>
>


Re: Is there a good discussion of optimizations?

2021-01-07 Thread Guyren Howe
On Jan 7, 2021, 13:42 -0800, Florents Tselai , wrote:
> Apologies for the shameless self-promotion :)
>
> Around a year ago I collected my thoughts on this topic. You can read the 
> post here Modern Data Practice and the SQL Tradition .
> It looks like it resonated with a lot of folks in the community. HN 
> Discussion https://news.ycombinator.com/item?id=21482114
>
> I would specifically underline the fact that the newer generation of 
> programmers & data pros (my former self included) don't really appreciate 
> things like triggers and server-side programming. Triggers and DB-side 
> functions are considered something like Assembly code.
>
> Not many neophytes have been shown, with use cases, why writing 2-3 lines of 
> PL/SQL can save you the huge overhead of back and forth and environment 
> setup needed to write the same thing in, say, Pandas.
>
> I would focus on Triggers, Indices on expressions, and time-related 
> functions. Probably on generated columns too. They may be considered a new 
> feature, but the reasoning of building successively columns on top of a few 
> base ones is quite appealing nowadays, especially for ML purposes.
>
> I also wouldn't hesitate to discuss advanced topics. They are usually 
> considered obscure because people don't present useful examples, rather toy 
> and arbitrary ones.
>
> In a recent O'Reilly training, I was skeptical of talking about triggers for 
> optimization but it looks like it was probably the most useful part of my 
> training as students could actually "steal and tweak" my code.

Thanks for this. May I steal some of your examples if they prove useful? I’ll 
credit you of course.

I’m planning on somewhat emphasizing that a relational database is a logic 
engine. Viewed through this lens, a query or view is a “backward” implication 
and a trigger is a “forward” one. This leads to considering triggers (and the 
moral equivalent in external code) as requiring “truth maintenance”, and is a 
great way to think about when the database is the appropriate place for some 
bit of logic.
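
To make the “forward” direction concrete, here is a minimal sketch of a 
truth-maintaining trigger (the orders/order_lines schema is hypothetical, and 
an UPDATE that moves a line between orders would need both totals refreshed):

CREATE FUNCTION maintain_order_total() RETURNS trigger
LANGUAGE plpgsql AS $$
DECLARE
    target bigint := CASE TG_OP WHEN 'DELETE' THEN OLD.order_id
                                ELSE NEW.order_id END;
BEGIN
    -- re-derive the stored total whenever the underlying facts change
    UPDATE orders
    SET total = (SELECT coalesce(sum(amount), 0)
                 FROM order_lines
                 WHERE order_id = target)
    WHERE id = target;
    RETURN NULL;  -- AFTER trigger: the return value is ignored
END;
$$;

CREATE TRIGGER order_total_maintenance
AFTER INSERT OR UPDATE OR DELETE ON order_lines
FOR EACH ROW EXECUTE FUNCTION maintain_order_total();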
> On Thu, Jan 7, 2021 at 11:21 PM Guyren Howe  wrote:
> > On Jan 7, 2021, 13:07 -0800, Kevin Brannen , wrote:
> > > From: Guyren Howe 
> > >
> > > >Most folks, in my experience, who use relational databases don’t really 
> > > >understand the basic theory or even more important the why - the 
> > > >philosophy - of what a relational database is and how to get the most 
> > > >out of them. I see a lot of folks trying to use SQL in an imperative 
> > > >manner - make this temp table, then update it some, then make this other 
> > > >temp table, etc...
> >
> > Actually, I’m mostly going to talk about the relational model, rather than 
> > SQL. Our industry seems to always settle for third-best, and SQL is the 
> > worst of all the examples of this. The world desperately needs a good 
> > relational database based on a better query language — datalog, for example.
> >
> > I put up with SQL so I can use the relational model, and I think that 
> > understanding SQL has to start with that.
> >
> > Anyhow.
> > > An example of this is that we have a report we're trying to write that I'd
> > > like to think can be done in SQL, but I can't think of a way to do it. 
> > > Yet,
> > > if I do the base query and pull the data back into my application, I can 
> > > do
> > > the last bit with 3 lines of Perl very easily. The problem here revolves
> > > around comparing a row of data to the previous row to know whether the 
> > > data
> > > changed "significantly enough" to keep the new row.
> > >
> > > Another example is doing running totals. A couple of years ago I would 
> > > have
> > > said SQL can't do that. Now I know about the OVER clause, something that I
> > > would categorize as somewhat obscure, I can do it as needed.
> >
> > Actually, Window functions might be “advanced”, but are certainly not 
> > obscure. Your example sounds like it’s trivially solved with LAG().
> > > As Michael Lewis pointed out, large datasets can also cause you to choose
> > > not to use SQL in 1 big statement for everything (i.e. advocating the use
> > > of temp tables). In some ways, using a CTE is a type of temp table, or at
> > > least I view it as such. That allows a person to solve a problem in
> > > bite-sized chunks. I will agree that optimization can do it better at
> > > times, but the code also has to be maintained as well – a balancing act.
> >
> > This appears

RE: Is there a good discussion of optimizations?

2021-01-07 Thread Guyren Howe
On Jan 7, 2021, 13:07 -0800, Kevin Brannen , wrote:
> From: Guyren Howe 
>
> >Most folks, in my experience, who use relational databases don’t really 
> >understand the basic theory or even more important the why - the philosophy 
> >- of what a relational database is and how to get the most out of them. I 
> >see a lot of folks trying to use SQL in an imperative manner - make this 
> >temp table, then update it some, then make this other temp table, etc...

Actually, I’m mostly going to talk about the relational model, rather than SQL. 
Our industry seems to always settle for third-best, and SQL is the worst of all 
the examples of this. The world desperately needs a good relational database 
based on a better query language — datalog, for example.

I put up with SQL so I can use the relational model, and I think that 
understanding SQL has to start with that.

Anyhow.
> An example of this is that we have a report we're trying to write that I'd
> like to think can be done in SQL, but I can't think of a way to do it. Yet,
> if I do the base query and pull the data back into my application, I can do
> the last bit with 3 lines of Perl very easily. The problem here revolves
> around comparing a row of data to the previous row to know whether the data
> changed "significantly enough" to keep the new row.
>
> Another example is doing running totals. A couple of years ago I would have
> said SQL can't do that. Now I know about the OVER clause, something that I
> would categorize as somewhat obscure, I can do it as needed.

Actually, Window functions might be “advanced”, but are certainly not obscure. 
Your example sounds like it’s trivially solved with LAG().
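
For instance, the “keep only rows that changed significantly” report is a 
one-window-function query. A sketch, with a hypothetical readings(read_at, 
value) table and a threshold of 0.5:

SELECT *
FROM (
    SELECT r.*,
           lag(value) OVER (ORDER BY read_at) AS prev_value
    FROM readings r
) t
WHERE t.prev_value IS NULL           -- always keep the first row
   OR abs(t.value - t.prev_value) > 0.5;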
> As Michael Lewis pointed out, large datasets can also cause you to choose not
> to use SQL in 1 big statement for everything (i.e. advocating the use of temp
> tables). In some ways, using a CTE is a type of temp table, or at least I
> view it as such. That allows a person to solve a problem in bite-sized chunks.
> I will agree that optimization can do it better at times, but the code also
> has to be maintained as well – a balancing act.

This appears to be good advice with SQL Server, which I’m coming to learn has a 
fairly poor query optimizer. But I would have thought Postgres’s optimizer 
would usually use a temporary table where appropriate.

Curious to hear if that’s wrong.
> I think your idea is a good one, but I do hope you present that SQL can't
> solve everything ... else why do we have plpgsql. :) You’re correct though,
> SQL isn’t used as much as it should be in many places.

An important consideration will be when relational is inappropriate. My biggest 
goal, though, is to get folks to understand how much relations *can* do — far 
too many devs in my experience don’t use the power of SQL because they don’t 
understand it.

Thanks for taking the time to give me this feedback.


Re: Multi-column index vs index on individual columns

2020-12-26 Thread Guyren Howe
In a typical web storefront or catalog app, there are a fairly small number of 
queries that are run often that are more complex and slower. It is perfectly 
appropriate to consider creating compound/partial indexes to speed up those 
queries. Just be aware that indexes trade off slower insert/update time against 
faster search time. I would read use-the-index-luke.com for detailed advice 
about using compound and partial indexes effectively.
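
For instance, given a hot storefront query like this (names hypothetical):

SELECT *
FROM orders
WHERE customer_id = 42 AND status = 'open'
ORDER BY created_at DESC;

a matching compound, partial index might be:

CREATE INDEX orders_open_by_customer
    ON orders (customer_id, created_at DESC)
    WHERE status = 'open';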
On Dec 26, 2020, 12:18 -0800, Shantanu Shekhar , wrote:
> Hi,
>
> I was going through Postgres documentation on multi-column indexes and came 
> across the recommendation at the bottom states: "Multicolumn indexes should 
> be used sparingly. In most situations, an index on a single column is 
> sufficient and saves space and time".
>
> In my experience with typical webapps, the queries are known ahead of time
> and a lot of the time the queries are restricted to a single table (no
> joins). Does this recommendation still apply in that use-case? I was under
> the impression that instead of scanning multiple indexes and then combining
> the results, it may be more performant to simply use a single multi-column
> index.
>
> Thanks,
>
> Shantanu


Is there a good discussion of optimizations?

2020-12-23 Thread Guyren Howe
I’d like to put together a good video and writeup about what the… philosophy 
behind relational databases is.

Most folks, in my experience, who use relational databases don’t really 
understand the basic theory or even more important the why — the philosophy — 
of what a relational database is and how to get the most out of them. I see a 
lot of folks trying to use SQL in an imperative manner — make this temp table, 
then update it some, then make this other temp table, etc… I see this 
particularly among analysts who for some reason often prefer SQL Server. I 
think this is down to afaict SQL Server having an abominable query optimizer. 
Comments on SQL Server’s query optimizer or this culture of temp tables are 
welcome.

Anyway, I’d like to put together something that explains this. I would go into 
Codd’s original insight and how simple yet powerful it is. I’d like to discuss 
how if you stick to SQL, rather than forcing it into this imperative 
straitjacket, the database can work out all the details so you don’t have to 
do the bad things. It would be useful if there is some reasonably accessible 
discussion to go through the sort of query optimizations that Postgres can do 
for you, to convince folks that they can stick to describing what they want, 
not how to get it.

I’d be interested in any comments anyone has about useful things to discuss 
here, and in particular I’m interested in an accessible discussion of the sorts 
of query optimizations Postgres can do for us.


Any interest in adding match_recognize?

2020-10-09 Thread Guyren Howe
I just became aware of the SQL 2016 standard match_recognize feature, which 
allows for rich pattern matching across groups of rows. It’s a great feature, 
currently only supported in Oracle.

I can find no evidence it’s ever been discussed here and there’s no mention of 
it on the PG website.

So: is anyone considering adding this feature?


Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

2020-09-14 Thread Guyren Howe
Regarding indexes, I recommend the website use-the-index-luke.com. That guy’s 
other website about modern SQL is also great.

Regarding self-tuning, it was actually part of the original vision for 
relational databases that they would do that, but I’m not aware of any modern 
SQL database that does, although there are third-party tools that will eg offer 
index suggestions (I don’t know which is best, so I won’t propose any).
On Sep 14, 2020, 13:22 -0700, tutilu...@tutanota.com, wrote:
> 4. There is no built-in means to have PG manage (or even suggest) indexes on 
> its own. Trying to figure out what indexes to create/delete/fine-tune, and 
> determine all the extremely complex rules for this art (yes, I just called 
> index management an *art*, because it is!), is just utterly hopeless to me. 
> It never gets any easier. Not even after many years. It's the by far worst 
> part of databases to me (combined with point five). Having to use third-party 
> solutions ensures that it isn't done in practice, at least for me. I don't 
> trust, nor do I want to deal with, external software and extensions in my 
> databases. I still have nightmares from PostGIS, which I only keep around, 
> angrily, out of absolute necessity. I fundamentally don't like third-party 
> add-ons to things, but want the core product to properly support things. 
> Besides, this (adding/managing indexes) is not even some niche/obscure 
> use-case, but something which is crucial for basically any nontrivial 
> database of any kind!


Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

2020-09-14 Thread Guyren Howe
On Sep 14, 2020, 13:22 -0700, tutilu...@tutanota.com, wrote:
> 3. The ability to embed PG to run in an automatic, quiet manner as part of 
> something else. I know about SQLite, but it's extremely limited to the point 
> of being virtually useless IMO, which is why I cannot use that for anything 
> nontrivial

I can’t for the life of me imagine how you arrived at this. SQLite is very 
capable indeed. Its dialect of SQL is (deliberately) very similar to Postgres, 
featuring such niceties as recursive CTEs and window functions, and it can 
handle heavy use and multi-terabyte databases if you need (cf bedrockdb).


Can we get SQL Server-like cross database queries

2020-06-03 Thread Guyren Howe
One of the few really useful features of SQL Server that Postgres doesn’t have 
is straightforward cross-database queries.

You can reference any table in any database on the same server you’re on as 
database.schema.table.

With Postgres, it is necessary to set up a FDW connection between every 
database where you want to perform a query and the database you want to query. 
In an analytics environment where a single server might have a hundred or so 
smaller databases, this is a grossly inefficient proposition.

Is it practical to provide the SQL Server-like feature in Postgres?


Re: Oracle vs. PostgreSQL - a comment

2020-06-02 Thread Guyren Howe
On Jun 2, 2020, at 14:16 , Stephen Frost  wrote:
> 
> Greetings,

> I'm sure there's things we can do to improve the performance of the FDW.
> Not sure we'll get to a point where we are actually caching information
> from the far side... but who knows, maybe if we arrange to have a
> notification sent whenever certain objects are updated...
> 
> These things could be worked on independently, of course, no need to
> have one done before the other.

By all means, let’s improve the FDW. But is it practical to make it possible to 
query across databases on the same server, in a similar manner to SQL Server, 
without needing the FDW at all?


Re: Oracle vs. PostgreSQL - a comment

2020-06-02 Thread Guyren Howe
On Jun 2, 2020, at 12:45 , Ravi Krishna  wrote:
> 
>> 
>> 
>> Generally speaking, I discourage having lots of databases under one PG
>> cluster for exactly these kinds of reasons.  PG's individual clusters
>> are relatively lightweight, after all.
>> 
> 
> Plus PG does not directly support cross database queries using 3 part name, 
> something
> sqlserver excels at.

Gotta say, not generally a fan of SQL Server, but this is very nice. If I’m 
dealing with a server with 50 small databases on it, it’s an utter pain to 
have to set up FDW connections between every pair of databases I want to use 
together.

Any chance of getting this in Postgres? Seems like since the databases are in 
the same program, this ought to be simpler.


Re: How to unnest nested arrays

2020-04-06 Thread Guyren Howe


> On Apr 6, 2020, at 19:44 , David G. Johnston  
> wrote:
> 
> On Mon, Apr 6, 2020 at 6:12 PM Guyren Howe wrote:
> Consider this:
> 
> select (array[array[1, 2], array[3, 4]])[i:i]
> from generate_subscripts(array[array[1, 2], array[3, 4]], 1) i
> 
> which produces:
> 
> {{1,2}}
> {{3,4}}
> 
> I expect and want, from that source:
> 
> {1, 2}
> {3, 4}
> 
> 
> Also: could we _please_ get a version of unnest that doesn’t explode any 
> number of dimensions into 1?

Here’s a solution in pure SQL, for reference:

CREATE OR REPLACE FUNCTION public.pairwise(
cards card[]
)
RETURNS table(c1 card, c2 card)
LANGUAGE sql
AS $function$
with
-- number the cards 1..n so consecutive cards can be matched up
individual_cards as (
select
*
from 
unnest(cards) with ordinality c
)

-- pair card 1 with card 2, card 3 with card 4, and so on;
-- c() is the card constructor function mentioned elsewhere in the thread
select
c(c1.suit, c1.rank),
c(c2.suit, c2.rank)
from 
individual_cards c1 join
individual_cards c2 on c1.ordinality = c2.ordinality - 1
where 
c1.ordinality % 2 = 1

$function$
;

Given that Postgres often (with good cause) touts its type system, it’s a shame 
that this basic structured type is great in many ways, but seriously flawed in 
really simple ones.



How to unnest nested arrays

2020-04-06 Thread Guyren Howe
Consider this:

select (array[array[1, 2], array[3, 4]])[i:i]
from generate_subscripts(array[array[1, 2], array[3, 4]], 1) i

which produces:

{{1,2}}
{{3,4}}

I expect and want, from that source:

{1, 2}
{3, 4}

These don’t work:

select (array[array[1, 2], array[3, 4]])[i:i][:]
{{1,2}}
{{3,4}}

select (array[array[1, 2], array[3, 4]])[i:i][1:1]
{{1}}
{{3}}


Also: could we _please_ get a version of unnest that doesn’t explode any number 
of dimensions into 1?


Re: Backing out of privilege grants rabbit hole

2020-04-02 Thread Guyren Howe
More fully:

REASSIGN OWNED BY doomed_role TO successor_role;
DROP OWNED BY doomed_role;
-- repeat the above commands in each database of the cluster
DROP ROLE doomed_role;

> On Apr 2, 2020, at 20:37 , Guyren Howe  wrote:
> 
> https://www.postgresql.org/docs/12/sql-drop-owned.html
> 
>> On Apr 2, 2020, at 20:34 , AC Gomez wrote:
>> 
>> Do I understand correctly that if a role was assigned countless object 
>> privileges and you want to delete that role you have to sift through a 
>> myriad of privilege grants in what amounts to a time consuming trial and 
>> error exercise until you've got them all?
>> 
>> Or is there a single command that will just delete the role and do a blanket 
>> grant removal at the same time?
>> 



Re: Backing out of privilege grants rabbit hole

2020-04-02 Thread Guyren Howe
https://www.postgresql.org/docs/12/sql-drop-owned.html 


> On Apr 2, 2020, at 20:34 , AC Gomez  wrote:
> 
> Do I understand correctly that if a role was assigned countless object 
> privileges and you want to delete that role you have to sift through a myriad 
> of privilege grants in what amounts to a time consuming trial and error 
> exercise until you've got them all?
> 
> Or is there a single command that will just delete the role and do a blanket 
> grant removal at the same time?



Re: Join help, please

2020-03-18 Thread Guyren Howe
The three types of thing (permitted_work, employee, work_type) don’t stand in a 
1:1 relationship with each other. You might have multiple work_types or 
permitted_works for each employee, I’m guessing.

Each existing combination produces one row in the result. So an employee with 
three permitted_works and 4 work types will produce 12 rows in the joined 
result.

If you want one row per employee, you might consider using array_agg with 
group_by to collapse the multiple work_types or permitted_works into arrays 
alongside the employee information.
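
Something like this, assuming permitted_work carries both keys (adjust to your 
actual schema):

SELECT e.id,
       array_agg(DISTINCT wt.type)   AS work_types,
       array_agg(DISTINCT pw.permit) AS permits
FROM employee e
JOIN permitted_work pw ON pw.employee_key = e.employee_key
JOIN work_type wt ON wt.work_type_key = pw.work_type_key
GROUP BY e.id;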
On Mar 18, 2020, 11:51 -0700, stan , wrote:
> I am confused. Given this view:
>
>
> AS
> SELECT
> employee.id ,
> work_type.type ,
> permit ,
> work_type.overhead ,
> work_type.descrip
> from
> permitted_work
> inner join employee on
> employee.employee_key = permitted_work.employee_key
> inner join work_type on
> work_type.work_type_key = work_type.work_type_key
> ;
>
> Why do I have 38475 rows, when the base table only has 855?
>
> My thinking was that the inner joins would constrain this view to the rows
> that exist in the base (permitted_work) table.
>
> Clearly I am misunderstanding something basic here.
>
>
> --
> "They that would give up essential liberty for temporary safety deserve
> neither liberty nor safety."
> -- Benjamin Franklin
>
>


Re: Advice request : simultaneous function/data updates on many databases

2020-03-04 Thread Guyren Howe
On Mar 4, 2020, at 14:33 , Rory Campbell-Lange  wrote:
> 
> Essentially we wish to reduce the window where the frontend and backend
> aren't synchronised.
> 
> If we have (for example) 200 databases which each take 2 seconds to
> update, a client could be on the wrong frontend code for over 6 minutes.

Send each of the servers a PL/pgSQL routine that executes all the things in a 
transaction and then waits until an agreed clock time before committing. Then 
all the servers commit at the same moment. They will still be out of sync 
somewhat, but this would reduce the degree.
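
A minimal sketch of the idea, run against each server (the function name and 
timestamp are hypothetical):

BEGIN;
SELECT run_all_the_updates();   -- whatever applies the changes
-- sleep until the agreed moment; COMMIT then fires everywhere at once
SELECT pg_sleep(greatest(0, extract(epoch FROM
    timestamptz '2020-03-05 02:00:00+00' - clock_timestamp())));
COMMIT;

Note the transaction holds its locks while it sleeps, so keep the window short.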


table name "unnest" specified more than once

2020-02-27 Thread Guyren Howe
# select
 * 
from 
unnest(array[array['a', 'b'], array['c', 'c']]), 
unnest(array[array['1', '2'], array['3', '4']]);
ERROR:  42712: table name "unnest" specified more than once

I’m trying to cross-join multiple two-dimensional arrays, expecting to retain 
the inner arrays. I’ve been trying for hours without luck; the ever-esoteric 
SQL syntax foils me at every turn.

It’s a shame I can’t get the unnest function not to just concatenate the inner 
arrays if I just put a bunch of them. This doesn’t strike me as optimal 
behavior.
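
For reference, giving each call its own alias at least removes the 
duplicate-name error, though unnest still flattens both arrays completely:

select a.elem, b.elem
from
    unnest(array[array['a', 'b'], array['c', 'c']]) as a(elem),
    unnest(array[array['1', '2'], array['3', '4']]) as b(elem);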

For more context, I’m trying to make a system of functions to score a Texas 
Hold ‘Em game. So I have a card type consisting of a pair of suit and rank, and 
I’m tossing them about. The cross-join is so I can build all candidate hands 
for scoring. I’m trying to create a function I can call like this:

select
best_hands_with_river(
array[
c('H', 'K'), 
c('D', 'A')
],
array[
c('C', '2'),
c('C', 'K'),
c('S', 'K'),
c('H', 'A'),
c('C', 'A')
])

Here, c is a function that constructs a card type. Card is a ROW(varchar, 
varchar).

So: how do I cross-join three identical arrays of my card type?


Re: a proposal for a new functionality: "SELECT * [EXCEPT col1 [,col2]]

2020-02-25 Thread Guyren Howe


> On Feb 25, 2020, at 11:28 , Paul A Jungwirth  
> wrote:
> 
> On Tue, Feb 25, 2020 at 2:18 AM Stanislav Motyčka wrote:
>> Sometimes (for tables with many columns) it would be better and easier to 
>> write "SELECT" statement with clause "EXCEPT":
>> "SELECT * [EXCEPT col1 [,col2]] FROM …"

The single biggest benefit is that I could write many more views that don’t 
need to be changed when I change the underlying table.





Re: Returning a table from a function, that requires multiple selects?

2019-08-30 Thread Guyren Howe
On Aug 30, 2019, at 13:03 , stan  wrote:

> I need to encapsulate, what are basically 2 related function calls into a 
> single
> function. The result of each of the calls is a date type.
> 
> My current thinking is to return a 2 row table with the 2 dates in it. But I 
> seem to
> be having issues getting this to work.
> 
> Is it possible for a function to return a table with results from multiple
> queries?

You could just return a tuple VALUES(a, b). Or you could define a type to 
return if you want to get fancy.
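
A minimal sketch of the first option, with f1 and f2 standing in for the two 
existing functions:

CREATE FUNCTION both_dates(OUT first_date date, OUT second_date date)
LANGUAGE sql AS $$
    SELECT f1(), f2();
$$;

SELECT * FROM both_dates();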

Re: Reproducing query plans in empty database: can I just copy stats and settings?

2019-07-29 Thread Guyren Howe
On Jul 29, 2019, at 12:25 , Adrian Klaver  wrote:
> 
> On 7/29/19 12:07 PM, Guyren Howe wrote:
>> I work with Protected Health Information so have restricted access to an 
>> important database.
>> I would like to look into index and query changes for that database. It 
>> occurs to me that if I was able to reproduce the statistics and cost 
>> settings for the database, then Explain would produce identical results to 
>> the production database.
>> Can do this? Any other advice?
> 
> So what defines restricted access?
> 
> If you can't see/use the data in the production database in your test 
> database I'm not sure how copying the statistics/cost settings is going to 
> help.

Isn’t that the entirety of what the query plan is based on? I was hoping I 
could then run Explain as if I had the data.


Reproducing query plans in empty database: can I just copy stats and settings?

2019-07-29 Thread Guyren Howe
I work with Protected Health Information so have restricted access to an 
important database.

I would like to look into index and query changes for that database. It occurs 
to me that if I was able to reproduce the statistics and cost settings for the 
database, then Explain would produce identical results to the production 
database.

Can do this? Any other advice?


Re: Request for resolution || Support

2019-07-24 Thread Guyren Howe
Another option would be an app that is constantly connected to Postgres using 
LISTEN/NOTIFY.
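
The moving parts are small (channel name and payload are hypothetical):

-- the long-lived app connection subscribes once:
LISTEN incoming_requests;

-- anything in the database can then wake it, e.g. from a trigger:
SELECT pg_notify('incoming_requests', 'payload describing the request');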
On Jul 24, 2019, 22:34 -0700, jay chauhan , wrote:
> Hi Thomas, David/Team,
>
> Thank you for your response. However, we need your confirmation whether my 
> error/issue as mentioned below will be resolved if we upgrade our PostgreSQL 
> version.
>
> < compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit">>
> David response: Use a newer version
> Tomas response: Yeah, you should use release 11 for a new project.
>
> My Issue while migrating procedure/function from Oracle to PostgreSQL:
> Error-1)
> ERROR:  cannot begin/end transactions in PL/pgSQL
> HINT:  Use a BEGIN block with an EXCEPTION clause instead.
> CONTEXT:  PL/pgSQL function 
> icmsuatnew.eload_commission_payout_active(text,text,text,text,text,text) line 
> 486 at SQL statement
> SQL state: 0A000
> David Response on it : Rewrite your code as instructed
>
> Error-2)
> ERROR:  schema "utl_http" does not exist
> LINE 38:     L_HTTP_REQUEST UTL_HTTP.REQ;
>                             ^
> SQL state: 3F000
> Character: 1785
> Thomas response: That's an Oracle thing for doing HTTP requests from PL/SQL.  
> To do that from plpgsql you could try an extension like this one:
> https://github.com/pramsey/pgsql-http
> Or you could write your own function in Python or <your favourite PL>. That's 
> what I'd probably do.
> https://www.postgresql.org/docs/11/plpython-funcs.html
>
>
> Need your urgent help.
>
> Regards,
> Jay Chauhan
> +918802766181
>
>
> > On Mon, Jul 22, 2019 at 3:53 AM Thomas Munro  wrote:
> > > On Mon, Jul 22, 2019 at 8:49 AM David G. Johnston
> > >  wrote:
> > > > On Sunday, July 21, 2019, jay chauhan  wrote:
> > > >> < > > >> x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red 
> > > >> Hat 4.8.5-4), 64-bit">>
> > > >
> > > > Use a newer version
> > >
> > > Yeah, you should use release 11 for a new project.
> > >
> > > https://www.postgresql.org/support/versioning/
> > >
> > > >> Error-2)
> > > >> ERROR:  schema "utl_http" does not exist
> > > >
> > > > Insufficient data provided to even guess
> > >
> > > That's an Oracle thing for doing HTTP requests from PL/SQL.  To do
> > > that from plpgsql you could try an extension like this one:
> > >
> > > https://github.com/pramsey/pgsql-http
> > >
> > > Or you could write your own function in Python or  > > favourite PL>.  That's what I'd probably do.
> > >
> > > https://www.postgresql.org/docs/11/plpython-funcs.html
> > >
> > > Google should help you find examples of people using plpython to
> > > access the various Python HTTP modules.
> > >
> > > --
> > > Thomas Munro
> > > https://enterprisedb.com


Elastic Search much faster at statistics?

2019-07-08 Thread Guyren Howe
I find this… surprising. Caching?

http://blog.nrowegt.com/database-vs-elasticsearch-speed-column-statistics/


Re: Rules and decision logic triggered by / acting on JSONB nodes in Postgres

2018-09-01 Thread Guyren Howe
On Sep 1, 2018, at 7:14 , Amit Kothari  wrote:
> 
> Hi all,
> 
> I’m sorry if this question is off-topic for postgres-core, but didn’t know 
> where else to ask.
> 
> We’ve built a workflow and BPM (business process management) tool with pg as 
> the database, and we’d like to explore conditional logic within steps in a 
> workflow being triggers as functions when a certain JSONB node changes.
> 
> An example -> If (form field contains “bob”) then (change state of another 
> object)
> 
> In effect, since all our objects are JSONB - we wanted to “listen” to a 
> change in one JSONB node in order to make a corresponding change in another 
> JSONB node. 
> 
> Can this be either native in postgres, or even a pg function? 
> 
> In general - how do people approach rules and decision logic in Postgres at a 
> very large scale e.g. say 100,000 rules that must be evaluated on a commit? 
> 
> If it makes more sense - please see the example screenshot on our features 
> page at https://tallyfy.com/features - which visualizes conditional branching.

That *looks* like each team gets their own workflows, so you could restrict 
your rules search to the rules for the team that created the change. Anything 
you could do to restrict the number of rules would help, of course.

One option would be to express rules as an (antecedent, consequent) pair, where 
the antecedent is the argument to json_extract_path and the consequent is the 
rest of the arguments to a function that carries out the change — something 
along those lines, in any event.
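
A sketch of that shape (all names hypothetical):

CREATE TABLE rule (
    id bigserial PRIMARY KEY,
    antecedent_path   text[] NOT NULL,  -- path into the JSONB document
    antecedent_value  jsonb  NOT NULL,  -- value that must appear at that path
    consequent_action text   NOT NULL,  -- name of the action to carry out
    consequent_args   jsonb  NOT NULL   -- arguments for that action
);

-- the rules matched by one changed document ($1 is the document's jsonb)
SELECT r.consequent_action, r.consequent_args
FROM rule r
WHERE jsonb_extract_path($1, VARIADIC r.antecedent_path) = r.antecedent_value;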

Iterating through a small set of such rules would be tolerable. If you really 
do have 100,000 rules that have to be applied to each change, my first thought 
is to use a SAT solver. But even there, you would have to compute the results 
of all your antecedents, so it would depend how many of those you have.

You might use a map-reduce farm, but all the machines are going to be hitting 
your database pretty hard.

You might be able to reduce your 100,000 antecedents to something a bit more 
tractable by finding the overlaps between them and so only computing the shared 
parts of antecedents once.

You might want to use LISTEN-NOTIFY so an external logic engine — perhaps using 
map-reduce, perhaps you just have a set of logic engines that each takes some 
part of that 100,000 rules and just brute forces them — gets to know about the 
changes.

I’d need to know some more details about what you’re up to, to give you a more 
specific answer, but that’s what occurs to me given what you’ve told us.



Re: Immutable function WAY slower than Stable function?

2018-08-07 Thread Guyren Howe
On Aug 7, 2018, at 9:10 , Tom Lane  wrote:
> 
> I've had a to-do item to rewrite and improve the SQL function cache
> mechanism for a long time, but I hadn't thought it was high priority.
> Maybe it should be.

Personally, I would love this. Far too few developers dig into doing things in 
the database. Conceptually, doing all the data things in the database should be 
faster, more versatile, more secure etc.

Modern SQL has tools that can make its awfulness tolerable. CTEs and SQL 
functions spring immediately to mind. But both suffer from optimization fences 
and corner cases like this that arguably make it not worth the effort.


Re: Code of Conduct plan

2018-06-03 Thread Guyren Howe
On Jun 3, 2018, at 16:08 , Gavin Flower  wrote:
> 
> Be very careful in attempting to codify 'correct' behaviour!

+1 this is a distraction.


Re: Whither 1:1?

2018-06-01 Thread Guyren Howe
On Jun 1, 2018, at 10:16 , Olivier Gautherot  wrote:
> 
> You will get a benefit in terms of space only if the optional fields in the 
> second table exist in a reduced number of instances - and the second table is 
> significantly wider. This can make a difference on big tables but this gain 
> may be offset by the cost of the join. In this perspective, I don’t think 
> that there is a clear benefit or drawback: it should be evaluated on a 
> case-by-case basis.

It seems to me that people take time to catch up with modern hardware reality. 
SSDs reduce seek time to virtually zero. Surely, joins are now much, much 
cheaper. If so, I’m inclined to describe wide tables as a premature 
optimization.



Re: Whither 1:1?

2018-06-01 Thread Guyren Howe
On Jun 1, 2018, at 10:00 , James Keener  wrote:
> 
> I don't think I fully understand. Do you mean all pk using a single sequence? 
> I’m not sure how this would avoid nulls or grouping fields.

Presumably, you would create the multiple tables together or after some main 
one or something where you know the PK you’re going to use. Perhaps you create 
a view with triggers to present them as a single table.

It could avoid nulls by not having rows in some of the 1:1 tables. Say you have 
a users table with login credentials and also personal information (phone 
number, email and such). But some of your users are strictly API users so have 
no personal information. If you split the person information into a separate 
table, you can just not have rows for those users.
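
A minimal sketch (hypothetical schema):

CREATE TABLE users (
    id bigserial PRIMARY KEY,
    login text NOT NULL UNIQUE,
    password_hash text NOT NULL
);

-- shares the users primary key; a row exists only for human users
CREATE TABLE user_personal_info (
    user_id bigint PRIMARY KEY REFERENCES users (id),
    phone text NOT NULL,
    email text NOT NULL
);

-- NULLs then appear only in query results, never in storage
SELECT u.login, p.email
FROM users u
LEFT JOIN user_personal_info p ON p.user_id = u.id;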

One of the originators of the relational model didn’t want to have null at all, 
presumably wanting folks to use this pattern instead.



Whither 1:1?

2018-06-01 Thread Guyren Howe
It’s come to my attention that what seems an obvious and useful database design 
pattern — 1:1 relations between tables by having a shared primary key — is 
hardly discussed or used.

It would seem to be a very simple pattern, and useful to avoid storing nulls or 
for groups of fields that tend to be used together.

Thoughts? Is there some downside I can’t see?


Re: Rationale for aversion to the central database?

2018-04-27 Thread Guyren Howe
On Apr 27, 2018, at 8:45 , Basques, Bob (CI-StPaul) 
 wrote:
> 
> Just chiming in . . . we’ve taken a somewhat different approach and actually 
> encourage our programmers to build out their own DBs. We’re using Postgres 
> to aggregate many varied data sources, as a caching system, and then develop 
> against this aggregated data. Yes, we understand that the realtime stuff is 
> a laggy process. The intent here is to give the programmers a background in 
> DB operation and performance expectations, especially since we primarily 
> build for Web use.

I’d love to see a blog post about this.



Rationale for aversion to the central database?

2018-04-08 Thread Guyren Howe
I am a Rails developer at a medium-large size company. I’ve mostly worked at 
smaller companies. I’ve some exposure to other web development communities.

When it comes to databases, I have universally encountered the attitude that 
one should treat the database as a dumb data bucket. There is a *very* strong 
aversion to putting much of any business logic in the database. I encounter 
substantial aversion to having multiple applications access one database; 
instead, all abstraction should live at the application layer.

My best theory is that these communities developed at a time when Windows was 
more dominant, and just generally it was *significantly* easier to use MySQL 
than Postgres for many, particularly new, developers. And it is pretty 
reasonable to adopt an aversion to sophisticated use of the database in that 
case.

This attitude has just continued to today, even as many of them have switched 
to Postgres.

This is only a hypothesis. I am now officially researching the issue. I would 
be grateful for any wisdom from this community.


Aside: it is rare to find a situation in life or anywhere where one widely 
adopted thing is worse in *every way* than another thing, but this certainly 
was and largely still continues to be the case when one compares MySQL and 
Postgres. So why do folks continue to use MySQL? I find this mystifying.


Re: single sql, multiple schemas, one result set

2018-04-03 Thread Guyren Howe
Make a view that joins all the things, with a column providing the name of the 
schema that they came from.
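
A sketch for two of the schemas (extend mechanically, or generate the DDL from 
information_schema):

CREATE VIEW all_table1 AS
    SELECT 'schema1' AS schema_name, t.* FROM schema1.table1 t
    UNION ALL
    SELECT 'schema2', t.* FROM schema2.table1 t;

-- then, e.g.:
SELECT schema_name, count(*) FROM all_table1 GROUP BY schema_name;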

> On Apr 3, 2018, at 10:47 , PegoraroF10  wrote:
> 
> Suppose a DB with dozens of schemas with same structure.
> DB
>  Schema1
>Table1
>Table2
>  Schema2
>Table1
>Table2
>  Schema3
>Table1
>Table2
> Then we want to execute a SQL statement on specific schemas, and the result 
> could be a UNION ALL. So what would a function look like that runs that SQL 
> on each schema and returns just one result set?
> 
> Then I would use something like ...
> select * from SQLFromMultipleShema(Array['Schema1,Schema3'],'select count(*)
> from Table2');
> 
> 
> 
> 
> --
> Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
>