Re: (When) can a single SQL statement return multiple result sets?

2024-04-15 Thread Merlin Moncure
On Mon, Apr 15, 2024 at 10:24 AM Jan Behrens 
wrote:

> On Wed, 10 Apr 2024 19:02:48 -0400
> Tom Lane  wrote:
> >
> > > Here, "DELETE FROM magic" returns multiple result sets, even though it
> > > is only a single SQL statement.
> >
> > Right, so it's kind of a case that you have to support.  We're not
> > likely to rip out rules anytime soon, even if they're a bit
> > deprecated.
>
> As it seems to be a corner case that rarely occurs in practice, I was
> considering to simply not support this case in my client library. I
> don't know which SQL error code I could return in that case though.
> Maybe "0A000" (feature_not_supported) or
> "21000" (cardinality_violation). Not sure if either of those is a good
> choice. Any better idea?


If you are asking whether "rules" can be ignored or error-walled in terms of
your library design, I'd say yes.  100% yes.

The main caveat would then be the proposed multi-resultset stored procedure
feature, which might break the 'one result per semicolon' assumption you
might be chasing.  Since that feature has some basis in the standard, I'd
weigh the risk/reward against it if I were you.

merlin


Re: (When) can a single SQL statement return multiple result sets?

2024-04-10 Thread Merlin Moncure
On Wed, Apr 10, 2024 at 4:22 PM Jan Behrens  wrote:

> Hello,
>
> While writing a PostgreSQL client library for Lua supporting
> Pipelining (using PQsendQueryParams), I have been wondering if there
> are any single SQL commands that return multiple result sets. It is
> indeed possible to create such a case by using the RULE system:
>
> db=> CREATE VIEW magic AS SELECT;
> CREATE VIEW
> db=> CREATE RULE r1 AS ON DELETE TO magic
> db-> DO INSTEAD SELECT 42 AS "answer";
> CREATE RULE
> db=> CREATE RULE r2 AS ON DELETE TO magic
> db-> DO ALSO SELECT 'Hello' AS "col1", 'World!' AS "col2";
> CREATE RULE
> db=> DELETE FROM magic; -- single SQL statement!
>  answer
> 
>  42
> (1 row)
>
>  col1  |  col2
> ---+
>  Hello | World!
> (1 row)
>
> DELETE 0
>
> Here, "DELETE FROM magic" returns multiple result sets, even though it
> is only a single SQL statement.
>


I guess you should have named your table "sorcery", because that's
what this is.  In the corporate world, we might regard the 'CREATE RULE'
feature as a 'solution opportunity'  :-).  You might be able to overlook
this on your end IMO, as INSTEAD OF triggers on views are standardized and
have effectively superseded rules for this purpose.
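
For illustration, a minimal sketch of the trigger-based equivalent (the
backing table is made up; the thread's view had no columns):

CREATE TABLE magic_backing (answer int);
CREATE VIEW magic AS SELECT answer FROM magic_backing;

CREATE FUNCTION magic_delete() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    DELETE FROM magic_backing WHERE answer = OLD.answer;
    RETURN OLD;
END $$;

CREATE TRIGGER magic_del INSTEAD OF DELETE ON magic
    FOR EACH ROW EXECUTE FUNCTION magic_delete();

DELETE FROM magic;  -- one statement, one DELETE result, no extra result sets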

> why can't I write a stored procedure or function that returns multiple
result sets?

Functions arguably should not be able to do this; doesn't the standard
allow for procedures (top-level statements invoked with CALL) to return
multiple results?

merlin


Re: Timing out A Blocker Based on Time or Count of Waiters

2024-03-25 Thread Merlin Moncure
On Fri, Mar 22, 2024 at 11:25 AM Fred Habash  wrote:

> Facing an issue where sometimes humans login to a database and run DDL
> statements causing a long locking tree of over 1000 waiters. As a
> workaround, we asked developers to always start their DDL sessions
> with 'SET lock_timeout = 'Xs'.
>
> I reviewed the native lock timeout parameter in Postgres and found 7. None
> seem to be related to blocker timeouts directly.
>
> idle_in_transaction_session_timeout
> idle_session_timeout
> lock_timeout: How long a session waits for a lock
> statement_timeout
> authentication_timeout
> deadlock_timeout
> log_lock_waits
>
> Instead, I put together a quick procedure that counts waiter sessions for
> a given blocker and terminates it if waiter count exceeds a threshold.
>
> Is there not a native way to ...
> 1. Automatically time out a blocker
> 2. A metric that shows how many waiters for a blocker?
>


I guess this probably does not belong in the native codebase because in
most real world scenarios with contention you would end up with priority
inversion or a situation where no work gets done.  With the current locking
rules, theoretically the work queue will always clear (assuming the locker
doesn't hold the transaction indefinitely), whereas with your setting
enabled it might never clear, assuming the terminated locker retries.

In your case, a hand written 'unblocker' script might be the way to go, or
(probably better) encourage patterns where critical tables are not blocked,
say by building up a scratch table and swapping it in within a separate
transaction.  Reducing contention, rather than mitigating the symptoms of
it, is *always* a good thing.
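
As for a waiter metric, something along these lines gives a rough count of
waiters per blocking pid (pg_blocking_pids() requires 9.6+); whether to
terminate the blocker from there is up to you:

SELECT blocker.pid AS blocker_pid,
       count(*)    AS waiter_count
FROM pg_stat_activity waiter
CROSS JOIN LATERAL unnest(pg_blocking_pids(waiter.pid)) AS blocker(pid)
GROUP BY blocker.pid
ORDER BY waiter_count DESC;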

merlin


Re: PostgreSQL as advanced job queuing system

2024-03-25 Thread Merlin Moncure
On Mon, Mar 25, 2024 at 4:43 AM Dominique Devienne 
wrote:

> On Sat, Mar 23, 2024 at 3:13 AM Merlin Moncure  wrote:
>
>> On Fri, Mar 22, 2024 at 6:58 AM ushi  wrote:
>>
>>> the idea to implement a job queuing system using PostgreSQL.
>>>
>>
>> I wrote an enterprise scheduler, called pgtask, which ochestates a very
>> large amount of work [...]
>>
>
> Hi. Anything you can share? OSS? Doesn't look like it...
> If it's not, a more details higher level architecture overview would be
> nice.
>

let me float that, I would love to project-ize this.  Stay tuned

merlin

>


Re: PostgreSQL as advanced job queuing system

2024-03-22 Thread Merlin Moncure
On Fri, Mar 22, 2024 at 6:58 AM ushi  wrote:

> Hello List,
>
> i am playing with the idea to implement a job queuing system using
> PostgreSQL. To meet requirements the system needs to offer some advanced
> features compared to "classic" queuing systems:
>
> - users can create new queues at any time
> - queues have priorities
> - priorities of queues can change at any time
> - jobs in queues with the highest priority should be processed first
>

You can definitely do this.  I wrote an enterprise scheduler, called
pgtask, which orchestrates a very large amount of work each night.  It's
running a distributed enterprise analytics batch environment in the vein
of airflow.  Here's a couple of screenshots.  It's a single threaded stored
procedure architecture that uses dblink calls to distribute the work.

https://imgur.com/a/UqZt63V

https://imgur.com/a/68wyWBL

I'm running at a pretty high scale and it's reliable.  It's not really
positioned as a proper queue, but more of a graph orchestration system, and
it supports realtime and quasi-realtime work.  The main bottleneck is that
dblink does not have an epoll style 'wait for first to finish or fail' (it
really needs one), forcing a polling loop in sql which bounds the number of
active connections a bit.

Being able to manage the state explicitly in the database is wonderful, if
you know what you're doing.
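
As a rough illustration of that point for the priority requirements above
(the queue/job tables are made up; this is not pgtask itself), the usual
building block is a SKIP LOCKED dequeue:

WITH next_job AS (
    SELECT j.id
    FROM job j
    JOIN queue q ON q.id = j.queue_id
    WHERE j.state = 'pending'
    ORDER BY q.priority DESC, j.id
    FOR UPDATE OF j SKIP LOCKED
    LIMIT 1
)
UPDATE job SET state = 'running'
FROM next_job
WHERE job.id = next_job.id
RETURNING job.id;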

merlin


Re: Clarification regarding managing advisory locks in postgresql

2024-02-08 Thread Merlin Moncure
On Thu, Jan 25, 2024 at 4:44 AM Sasmit Utkarsh 
wrote:

> Okay Thanks. Also please help me understand the below scenarios
>
> From the above statement, I understand is (please correct if I'm wrong
> here), When we fork a client process, each process gets its own database
> connection or transaction context.
>
So far so good


> Therefore, locks acquired in one process (or transaction) do not directly
> affect locks in another process (or transaction).
>
Not following you here. By definition, a lock impacts other processes;
that's the entire purpose.  They affect other processes in that two
processes cannot take a lock on the same thing at the same time.
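
For example, with session-level advisory locks:

-- session 1:
SELECT pg_try_advisory_lock(42);   -- true, lock acquired
-- session 2 (a different connection):
SELECT pg_try_advisory_lock(42);   -- false while session 1 holds it
-- session 1:
SELECT pg_advisory_unlock(42);     -- now session 2 can acquire it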


> Now, I'm faced with another situation where I'm using libpq in C as client
> programs and while calling some function it acquires pg_advisory_lock for
> the request  with some identifier in transaction A. This can be thought
> of as “lock the operation with id = X”  and then make some SQL
> requests(retrieve) from the database. During that if it forks into another
> process B,
>

Client side code should not fork and keep using the same connection across
the fork.  This amounts to multi-process access to a single connection, and
generally speaking you should not have two or more processes or threads
hitting the same connection returned from libpq.  This is undefined
behavior, so I suspect the questions below this point are moot.

merlin

>


Re: Moving to Postgresql database

2024-01-17 Thread Merlin Moncure
On Tue, Jan 16, 2024 at 11:05 AM Dominique Devienne 
wrote:

> On Tue, Jan 16, 2024 at 5:07 PM Adrian Klaver 
> wrote:
>
>> On 1/16/24 00:06, Dominique Devienne wrote:
>> > On Mon, Jan 15, 2024 at 5:17 AM veem v > > > wrote:
>> > Is any key design/architectural changes should the app development
>> > team [...], should really aware about
>> > Hi. One of the biggest pitfall of PostgreSQL, from the app-dev
>> perspective,
>> > is the fact any failed statement fails the whole transaction, with
>> > ROLLBACK as the only recourse.
>>
>> "SAVEPOINT establishes a new savepoint within the current transaction.
>>
>
> I wish it was that easy.
> I've been scared away from using them, after reading a few articles...
> Also, that incurs extra round trips to the server, from the extra commands.
>

Er, *every* statement incurs a round trip to the server.   Candidly, I'm
not sure your point here is entirely thought through, unless you mean ad
hoc sql written at the console rather than statements issued
programmatically.  The savepoint infrastructure directly implements
transaction control, and does it really well.  It's classic, broadly
implemented, and standard.

If you are concerned about round trips in general, you'd want to move to a
function or a procedure, where you have classic exception handling, if/else
blocks, etc, and there are no round trips.  postgres really rewards mastery
of server side development practices.
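
For instance, a rough sketch (table and column names made up, assuming
tag(name) has a unique constraint); the error handling runs server side in
a single round trip, and each EXCEPTION block uses the same subtransaction
machinery a SAVEPOINT does:

CREATE OR REPLACE FUNCTION upsert_tag(p_name text) RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
    v_id bigint;
BEGIN
    INSERT INTO tag(name) VALUES (p_name) RETURNING id INTO v_id;
    RETURN v_id;
EXCEPTION WHEN unique_violation THEN
    SELECT id INTO v_id FROM tag WHERE name = p_name;
    RETURN v_id;
END $$;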

merlin

>


Re: Help needed for the resolution of memory leak

2024-01-17 Thread Merlin Moncure
On Wed, Jan 17, 2024 at 1:14 PM Sasmit Utkarsh 
wrote:

> Hi Merlin et al.
>
> Below are some couple of observations attached as an "overview_of_code"
> and other attachments "function_def_other_details" and leak sanitizer
> report. Please assist with some clarifications given in overview_of_code
> with (***). Let me know if you need any more information
>

***How do we handle for the case clearing when PGresult object is
assigned a pointer to the data of the specified field within the
existing PGresult object?
i.e when SQL_get_tpf_rw() actually completes in each iteration?



It is your responsibility to clear PGresult objects and close PGconn
objects.  Each one created must be cleaned up.  This is basic libpq usage.
I suggest studying the documentation.



Start here: https://www.postgresql.org/docs/current/libpq-exec.html

Also Study here: https://www.postgresql.org/docs/current/libpq-example.html


You should not reuse a pointer unless you have cleared the object first.


Is the leak reported due to improper handling of the above case ?
or is it due to some other flow

Your leaks look mostly due to not clearing PGresult objects.  However, the
real issue here is that you need to learn basic libpq usage a little
better... try writing a smaller program and see when it starts to
complain about leaks.


merlin


Re: Help needed for the resolution of memory leak

2024-01-16 Thread Merlin Moncure
On Tue, Jan 16, 2024 at 9:10 AM Sasmit Utkarsh 
wrote:

> Hi Merlin et al.
>
> I have tried to have the above change added in the missing places. Still,
> I see the below leaks reported by the address sanitizer. Please see
> the attachments for the leak reported and the function definition updated.
> Not sure for PQexecPrepared if we call PQclear(res) for cleaning up as
> well. let me know if you need any more information
>

You have many other leaks.  See the code below; you are making up to three
sequential calls that create a result before calling clear.  All calls
creating and returning a PGresult have to be cleared before the
pointer is reused.

merlin



LOG_DEBUG("%s() conninfo=%s",__func__,conninfo);

if(is_shadow_db)
{
shadow_db_conn = PQconnectdb(shadow_db_conn_info);
if ( PQstatus(shadow_db_conn) != CONNECTION_OK )
{
    LOG_ERROR("Connection to shadow database failed! %s",
              PQerrorMessage(conn));
    PQfinish(shadow_db_conn);
    exit(1);
}
res = PQexec(shadow_db_conn, "SET bytea_output = 'escape'");
LOG_DEBUG("%s() Connection to shadow_shc_data database SUCCESSFUL",__func__);
// execute_stored_procedure(shadow_db_conn);
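/* note: the res returned by PQexec above is never PQclear()'d before res is
   reused below -- this is one of the leaks */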
}

conn = PQconnectdb(conninfo);
if ( PQstatus(conn) != CONNECTION_OK ) {
    LOG_ERROR("Connection to database failed! %s", PQerrorMessage(conn));
    PQfinish(conn);
    exit(1);
} else {
    res = PQexec(conn, "SET bytea_output = 'escape'");
    LOG_DEBUG("%s() Connection to shc_data database SUCCESSFUL",__func__);
}
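/* note: res still holds the un-cleared result of the PQexec above and is
   simply overwritten below -- the second leak */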

res = PQexec(conn, "START TRANSACTION");
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
LOG_ERROR("START TRANSACTION failed: %s", PQerrorMessage(conn));
SQL_exit_nicely(conn,res);
}
PQclear(res);


Re: Help needed for the resolution of memory leak

2024-01-15 Thread Merlin Moncure
On Mon, Jan 15, 2024 at 11:32 AM Sasmit Utkarsh 
wrote:

> Hi Team,
>
> I am trying to test a code which basically tries to read some data from
> postgresql db in a loop through a function SQL_get_tpf_rw() whose
> definition and other details are shared in the attached file along with the
> memory leak report resulted during testing. Could you let me know if i
> missed calling  anywhere PQclear()
> in SQL_get_tpf_rw() which resulted in the below .
>
> Last few lines from the report
>
>
yep:  for example,

if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
    LOG_ERROR("SELECT failed: %s", PQerrorMessage(conn));
    LOG_DEBUG("ROLLBACK TRANSACTION AND CHAIN");
    res = PQexec(conn,"ROLLBACK TRANSACTION AND CHAIN");
    LOG_ERROR("INVALID_FILE_ADDRESS %08X",fa);
    rc = ERR_INVALID_FILE_ADDRESS;


See the res assignment above: you're reusing the res object before clearing
it.  There may be other cases, but this one jumped off the page.


merlin


Re: Help understand why DELETE is so slow

2023-12-04 Thread Merlin Moncure
On Mon, Dec 4, 2023 at 3:43 PM Ping Yao  wrote:

>
> Hi Merlin,
>
> Thank you for the comment. I think we do. What I found odd here is the
> Query Plan (with analyze) clearly shows the actual execution to be very
> quick, that's why I don't think it's an index problem. What do you think?
>

I'm not sure with citus.  It isn't hard to verify; just check the referring
tables and ensure any referencing key is indexed.  Perhaps the citus based
plans don't surface that item -- hard to say.  However, when you have
complaints about slow deletes, this is always the first thing to check.  The
second thing is slow triggers.

merlin

>


Re: Help understand why DELETE is so slow

2023-12-04 Thread Merlin Moncure
On Fri, Dec 1, 2023 at 11:48 AM Ping Yao  wrote:

> Hello All.
>
> First timer here, long time user though. Thank you in advance.
>
> Can someone help me understand why my simple DELETE query is so slow to
> run?
>

Not familiar enough with citus to diagnose any citus issues.  Having said
that, can you confirm that all tables that have a foreign key reference to
this table have an index on the referencing field?

For example, if you have a table customer, with a column 'customer_id', a
table 'customer_employee' might refer to the customer table via
'customer_id REFERENCES customer'.  If that referencing column is not
indexed, deletes on customer will be very slow.  That may not be what's
happening here with the citus stuff, hard to tell.  Something to rule out
though.
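
Following that example, the fix is simply an index on the referencing
column:

CREATE INDEX IF NOT EXISTS customer_employee_customer_id_idx
    ON customer_employee (customer_id);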

merlin


Re: Prepared statements versus stored procedures

2023-11-20 Thread Merlin Moncure
On Mon, Nov 20, 2023 at 4:07 AM Laurenz Albe 
wrote:

> On Sun, 2023-11-19 at 17:30 +, Simon Connah wrote:
> > I was reading about prepared statements and how they allow the server to
> > plan the query in advance so that if you execute that query multiple
> times
> > it gets sped up as the database has already done the planning work.
> >
> > My question is this. If I make a stored procedure doesn't the database
> > already pre-plan and optimise the query because it has access to the
> whole
> > query? Or could I create a stored procedure and then turn it into a
> prepared
> > statement for more speed? I was also thinking a stored procedure would
> help
> > as it requires less network round trips as the query is already on the
> server.
>
> Statements in functions and procedures don't get planned until the function
> or procedure is called for the first time.  These plans don't get cached
> unless
> the procedural language you are using has special support for that.
>
> Currently, only functions and procedures written in PL/pgSQL cache
> execution
> plans of static SQL statements.  And you are right, that is usually a good
> thing.
>

Adding to this,
Stored procedures and functions can provide really dramatic speedups by
eliminating round trips between statements; with traditional programming
approaches you have to bring all the data back to the client side just to
transform it, run logic, etc., only to send it back to the database.  How
much benefit this provides is really situation specific, but it can be
impactful in many common situations.

Also, they provide the benefit of hiding schema details and providing a
"database API" in situations where you want the application contract to the
database to be written against the query output (perhaps in json) vs the
schema.  This pattern is controversial in some circles but I employ it
often and it runs well.  It can also be comforting not to rely on client
side code to properly frame up the transaction.
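
A rough sketch of the "database API" idea (table and column names are
invented): the application codes against the function's json output rather
than the schema.

CREATE OR REPLACE FUNCTION get_order(p_order_id bigint) RETURNS jsonb
LANGUAGE sql STABLE AS $$
  SELECT jsonb_build_object(
    'order', to_jsonb(o),
    'items', (SELECT coalesce(jsonb_agg(to_jsonb(i)), '[]'::jsonb)
              FROM order_item i
              WHERE i.order_id = o.id))
  FROM orders o
  WHERE o.id = p_order_id;
$$;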

This is only touching the surface -- there are many, many advantages to
server side programming and it is a tremendously valuable skill to learn
and master.  The main downside, of course, is that postgres server side
code can't be used outside postgres without modification.

merlin


Re: Postgres Out Of Memory Crash

2023-11-06 Thread Merlin Moncure
On Thu, Nov 2, 2023 at 4:13 AM Avi Weinberg  wrote:

> Hi experts,
>
>
>
> I'm using Patroni Postgres installation and noticed that twice already
> postgres crashed due to out of memory.  I'm using logical replication with
> around 30-40 active subscribers on this machine.  The machine has 128GB but
> only 32GB is allocated to Postgres.  How can I know what is actually
> causing the out of memory issue?  Is it caused by not optimal postgres
> configuration or something else?
>
>
>
> /usr/lib/postgresql/13/bin/postgres -D
> /home/postgres/pgdata/pgroot/data
> --config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf --port=5432
> --cluster_name=postgres-cluster --wal_level=logical --hot_standby=on
> --max_connections=533 --max_wal_senders=90 --max_prepared_transactions=0
> --max_locks_per_transaction=64 --track_commit_timestamp=on
> --max_replication_slots=90 --max_worker_processes=30 --wal_log_hints=on
>

Couple things here.  You don't really allocate memory to postgres; rather,
you set up memory reserved for certain kinds of buffering operations via
shared_buffers and other, less important, settings.  This memory is
carefully managed, and is likely not what is behind your OOM condition.
Ironically, increasing shared_buffers can make your problem more likely to
occur since you are taking memory away from other tasks.

>  --max_connections=533

Probably your problem is at least partially here.   This number was
configured specifically, implying you are running out of connections and
had to crank this value.

If you have a lot of idle connections sitting around (say, if you have
several application servers managing connection pools),  a sudden spike in
memory load and/or connection utilization can cause this to occur.  Suggest
one or more of:
* lowering shared buffers
* implementing pgbouncer and lowering max_connections
* increasing physical memory
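
A quick way to see whether idle connections are piling up:

SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;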

Also,
> I'm using logical replication with around 30-40 active subscribers on
this machine.

Hm.  Have you considered streaming replica setup, so that you can attach
read only processes to the standby?

merlin


Re: PgAmin view

2023-10-27 Thread Merlin Moncure
On Thu, Oct 26, 2023 at 5:56 AM Shaozhong SHI 
wrote:

> If a PgAmin view is created and tables it queries are missing, what will
> happen?
>
> If a PdAmin view is created, and it may take 20 hours to complete, what
> will happen?
>
> Would views automatically run, when you start the PgAmin?
>

Views are rewritten into the referencing query when that query is planned;
a view does not execute anything on its own.

merlin


Re: psql \r and \e -- what does clearing the query buffer actually do?

2023-10-23 Thread Merlin Moncure
On Mon, Oct 23, 2023 at 1:50 PM Tom Lane  wrote:

> Merlin Moncure  writes:
> > \r (clear 'query buffer'):  Other than dutifully reporting that the query
> > buffer has been cleared, I cannot for the life of me detect any
> observable
> > behavior.
>
> Uh ... surely there's a lot.  For example:
>
> regression=# foo
> regression-# bar;
> ERROR:  syntax error at or near "foo"
> LINE 1: foo
> ^
> regression=# foo
> regression-# \r
> Query buffer reset (cleared).
> regression=# bar;
> ERROR:  syntax error at or near "bar"
> LINE 1: bar;
> ^
> regression=#
>
> > Perhaps, it is reset, only to re-reset to last known prompt when prompt
> > returned, but,
> > \r\e
> > ...claims to rest but the query buffer still contains the last executed
> > command, which will then run with :q! in vim.
>
> What \r clears is the current (active) input buffer.  \e is documented
> thus:
>
> If no filename is given, the current query buffer is copied to a
> temporary file which is then edited in the same fashion. Or, if the
> current query buffer is empty, the most recently executed query is
> copied to a temporary file and edited in the same fashion.
>
> So \r\e will indeed result in editing the previous query.
>
> > Regarding \e, things mostly work sanely, except that there is no way to
> not
> > run a query except to explicitly write a blank buffer back, say with vim
> > :wq after deleting all the lines in the buffer..
>
> It's sufficient to get rid of any trailing semicolon in the editor.
> Again, the man page's discussion of \e seems to me to cover this.
> If you want to suggest some adjustments to make it clearer, we
> could discuss that.
>

Hm, well that explains my misunderstanding, TIL.  The man page for \e at
least is very clear both in terms of behavior and intent.  FWIW, the finer
details are not in \?, which I did check (vs the man page, which you
correctly guessed I did not check).  It reports:

Query Buffer
  \e [FILE] [LINE]       edit the query buffer (or file) with external editor
  \ef [FUNCNAME [LINE]]  edit function definition with external editor
  \ev [VIEWNAME [LINE]]  edit view definition with external editor
  \p                     show the contents of the query buffer
  \r                     reset (clear) the query buffer
  \s [FILE]              display history or save it to file
  \w FILE                write query buffer to file

The basic problem is that the query buffer is used both to represent the
unsent query text and, when empty, the most recently executed query.

Maybe after the 'Query Buffer' line, something like,
  (An empty query buffer will refer to the last executed query, if any)

...don't know if it's worth adjusting in hindsight, but the \e/\p mechanics
are odd per the docs, oh well.

merlin


psql \r and \e -- what does clearing the query buffer actually do?

2023-10-23 Thread Merlin Moncure
Hello all,

Couple things -- after unintentionally running a query for the (what feels
like) millionth time, that suggested a review of psql query editing
behaviors. Starting with,

\r (clear 'query buffer'):  Other than dutifully reporting that the query
buffer has been cleared, I cannot for the life of me detect any observable
behavior.  If there is in fact one, the results are, with some forbearance,
unintuitive.  Consider:
postgres=# \p
select 0;
postgres=# \r
Query buffer reset (cleared).
postgres=# \p
select 0;

Perhaps it is reset, only to be re-populated with the last known query when
the prompt returns, but
\r\e
...claims to reset, yet the query buffer still contains the last executed
command, which will then run on :q! in vim.

Regarding \e, things mostly work sanely, except that there is no way to not
run a query other than to explicitly write a blank buffer back, say with vim
:wq after deleting all the lines in the buffer.  However, if you do so,
the previously run query shows up again for the next \e, which is odd.  Is
this behavior intentional?  Is there something I'm missing?  Apologies if
this has been discussed; a cursory search did not turn up results, psql,
query, buffer, etc. not being selective terms.

merlin


Re: Inheritance in PostgreSQL

2023-10-18 Thread Merlin Moncure
On Wed, Oct 18, 2023 at 7:43 AM Thomas Kellerer  wrote:

> Merlin Moncure schrieb am 18.10.2023 um 03:20:
> > The only thing you can't really do in SQL easily without writing
> > nasty triggers are things like, 'this table must be linked from one
> > and only one of these candidate tables'.  I think the language
> > probably ought to support this, but I don't think postgres would
> > unless the standard did.
>
> Isn't that what assertions are intended to solve in the SQL standard?
>

Hm, interesting ... maybe, maybe not.  SQL '92 -- wow.  Guessing at
why not: are they deferrable, or always deferred?  It doesn't seem like it;
simple inheritance cases would otherwise seem problematic, with chicken/egg
conditions.  This might explain why they are not in common use despite being
introduced 30 years ago.  Here is an Oracle discussion on the topic:

https://forums.oracle.com/ords/apexds/post/sql-assertions-declarative-multi-row-constraints-8418

...with the discussion implying that interaction with the transaction state
may be an unsolved problem, at least within oracle.  I suspect there may
also be challenges relating to performant implementation.  Sadly, I think
the correct approach remains to utilize complex triggers, or nothing.

merlin


Re: Inheritance in PostgreSQL

2023-10-17 Thread Merlin Moncure
On Tue, Oct 17, 2023 at 5:36 PM Jeff Laing 
wrote:

> “Don’t use table inheritance” IS on that page
>

Yeah, inheritance probably would have been removed a long time ago except
that it used to sit underneath the partitioning feature.  The basic idea of
implementing polymorphic record storage essentially hacks SQL in a way it
doesn't want to be hacked, thus the feature never really worked properly.
If you want variant storage, use hstore, json, etc., or use the various
techniques that split the entity across multiple tables.
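
For example, a minimal sketch of the jsonb flavor of variant storage (table
and column names are made up):

CREATE TABLE event (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    event_type text NOT NULL,
    payload    jsonb NOT NULL DEFAULT '{}'
);

-- query a type-specific attribute without a per-type child table
SELECT id, payload->>'file_name' AS file_name
FROM event
WHERE event_type = 'upload';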

The only thing you can't really do in SQL easily without writing nasty
triggers is something like 'this table must be linked from one and only one
of these candidate tables'.  I think the language probably ought to support
this, but I don't think postgres would unless the standard did.  I try to
avoid handwriting RI in triggers when possible, but in this case it's the
only approach that works within the language's constraints and can
formally validate the model.

merlin

>


Re: Ad hoc SETOF type definition?

2023-09-27 Thread Merlin Moncure
On Tue, Sep 26, 2023 at 1:15 PM Tom Lane  wrote

> it clear that the type exists independently of the function.  (Our
> behavior of automatically making composite types for tables seems to
> me to have been a rather unfortunate choice.)
>

I really like this behavior and exploit it heavily, in order to,
*) use tables as variable inputs to functions
*) record marshalling, populate_record, etc
*) type safe variable declaration in plpgsql functions
*) arrays of table type for temporary storage (jsonb is eating into this
case though)
*) dblink replication tricks to migrate data across the wire (baroque with
fdw, but still useful in ad hoc coding)

Granted, from the classic sql programming perspective, this is all
highly exotic and weird.  There is an organic beauty, though, that comes out
in deep sql or plpgsql coding, and a lot of it is from the type system :).

In fact, I find the concept that 'tables are types' (which I think you are
implying should ideally not be the case by default) so brilliant and
profound that it is really what sets postgresql apart from competitive
offerings.  Granted, you can do all of the same things with composite
types, json, etc., but tables often do the job handily and safely with less
programming effort, and the type naturally extends as the table is refined
over time.
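
A quick sketch of what I mean (names made up): the table's row type can be
used directly for function parameters, plpgsql variables, and arrays of
rows.

CREATE TABLE customer (id bigint, name text);

CREATE FUNCTION describe(c customer) RETURNS text
LANGUAGE sql AS $$ SELECT c.id::text || ': ' || c.name $$;

DO $$
DECLARE
    c  customer;             -- variable of the table's row type
    cs customer[] := '{}';   -- array of the table type as scratch storage
BEGIN
    c  := ROW(1, 'Alice')::customer;
    cs := cs || c;
    RAISE NOTICE '%', describe(cs[1]);
END $$;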

I find that the opposite case, basically, to create composite types is
increasingly rare in practice, with jsonb handling transient and unsafe
cases, and 'table created types' covering most of the rest.  A lot of it
comes down to style I guess.

merlin


Re: Large scale reliable software system

2023-06-27 Thread Merlin Moncure
On Mon, Jun 26, 2023 at 6:49 PM B M  wrote:

> Dear all,
>
> After greeting,
>
> I taught PostgreSQL myself and developed a small scale experimental
> software system using PostgreSQL in the back-end.
>
> I would like to know your advices to develop a large scale reliable
> software system using PostgreSQL in the back-end, through which i can share
> the storage with the different system users where they login to the system
> through the web application front-end with different passwords and
> usernames , save the privacy of each user data, improve overall system
> security and performance, achieve fast response, make backups and save the
> stored data from loss. The system will be hosted on a cloud.
>
> Thank you in advance.
>

* your sql is code, and treat it as such, check it into git etc
* robust deployment strategy is essential to scaling team
* write a lot of tests
* become intimate with pg_stat_statements (a starter query follows this list)
* keep your transactions as short as possible while preserving safety
* avoid developers who advocate for keeping business logic out of the
database religiously (controversial)
* try to avoid assumptions that only one technical stack interacts with
your database
* do not waste time making ERDs use a tool that generates them (i like
schemaspy)
* test your DR strategy before disaster strikes
* think about security model up front
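
Regarding pg_stat_statements, a starter query along these lines (column
names as of PostgreSQL 13; older versions use total_time/mean_time):

SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;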

merlin


Re: Question about where to deploy the business logics for data processing

2023-06-12 Thread Merlin Moncure
On Thu, Jun 8, 2023 at 10:22 PM Nim Li  wrote:

> I wonder if anyone in the community has gone through changes like this?  I
> mean ... moving the business logics from PL/SQL within the database to the
> codes in NestJS framework, and reply on only the TypeORM to manage the
> update of the database without any SQL codes?  Any thoughts about such a
> change?
>

Heads up, this is something of a religious database debate in the industry,
and you are asking a bunch of database guys what they think about this, and
their biases will show in their answers.

Having said that, your developers are utterly, completely, wrong. This is
classic "my technology good, your technology bad", and most of the reasons
given to migrate the stack boil down to "I don't know SQL and will do
absolutely anything to avoid learning it", to the point of rewriting the
entire freaking project in (wait for it) javascript, which might very well
be the worst possible language for data management.

The arguments supplied are tautological: "SQL is bad because you have to
write SQL, which is bad", except for the laughably incorrect "sql can't be
checked into git".  Guess what, it can (try git add my_func.sql), and there
are many techniques to deal with this.

Now, database deployments are a very complex topic, and the complexity
doesn't go away when using an ORM.  In fact, it often gets worse.  There are
tools which can generate change scripts from database A to A'; are there
tools to do that for NestJS object models?  Is there automatic dependency
tracking for them?  Next thing you know, they will be moving all your
primary keys to guids ("scaling problem, solved!") and whining about
database performance when you actually get some users.

WHY is writing SQL so bad?  Is it slower?  Faster?  Better supported?
plpgsql is very widely supported and draws from a larger talent pool than
"NestJS".  Suppose you want to mix some python or enterprise java into your
application stack.  What then?

ORMs are famously brittle and will often break if any data interaction to
the database does not itself go through the ORM, meaning you will be
writing and deploying programs to do simple tasks.  They are slow,
discourage strong data modelling, interact with the database inefficiently,
and do not manage concurrent access to data well.

merlin


Re: Composite type: Primary Key and validation

2023-06-05 Thread Merlin Moncure
On Mon, Jun 5, 2023 at 12:08 PM Laurenz Albe 
wrote:

> On Mon, 2023-06-05 at 17:06 +0200, Lorusso Domenico wrote:
> > A composite type is useful because I've to add all these information on
> many tables and because
> > it more easy to pass all these informations to functions that have to
> ensure the right format and evaluation.
> >
> > Talking about first point I could use the "table inheritance", but I've
> to reset all constraints for each table :-(
>
> Inheritance is actually a very good way to do this.
> You don't inherit constraints, but at least the column definitions.


Agreed on not using a composite type.

Another suggestion: this might be a good fit for a range type.

merlin


Re: CTE, lateral or jsonb_object_agg ?

2023-05-22 Thread Merlin Moncure
On Sat, May 20, 2023 at 9:43 AM Marcos Pegoraro  wrote:

> I have a table like pg_settings, so records have name and value.
> This select is really fast, just 0.1 or 0.2 ms, but it runs millions of
> times a day, so ...
>
> Then all the time I have to select up to 10 of these records but the
> result has to be a single record. So it can be done with ...
>
> --Using CTE
> with
> BancoPadrao as (select varvalue from sys_var where name =
> $$/Geral/BancoPadrao$$),
> BancoMatricula as (select varvalue from sys_var where name =
> $$/Geral/BancoMatricula$$),
> BancoParcela as (select varvalue from sys_var where name =
> $$/Geral/BancoParcela$$),
> BancoMaterial as (select varvalue from sys_var where name =
> $$/Geral/BancoMaterial$$)
> select (select * from BancoPadrao) BancoPadrao,
>(select * from BancoMatricula) BancoMatricula,
>(select * from BancoParcela) BancoParcela,
>(select * from BancoMaterial) BancoMaterial;
>

Try the classic method:

select
 max(varvalue) filter (where name = $$/Geral/BancoPadrao$$) as BancoPadrao,
 max(varvalue) filter (where name = $$/Geral/BancoMatricula$$ ) as
BancoMatricula,
 max(varvalue) filter (where name = $$/Geral/BancoParcela$$  ) as
BancoParcela,
 max(varvalue) filter (where name = $$/Geral/BancoMaterial$$  ) as
BancoMaterial
from sys_Var where Name = any(
  '{/Geral/BancoPadrao,/Geral/BancoMatricula,/Geral/BancoParcela,/Geral/BancoMaterial}'::text[]);

Regardless, these timings are mostly below what I would consider to be the
noise threshold; the actual query runtime is not much compared to the work
the server has to spend setting up the query.  If you want real benefits
for this kind of case, consider things like prepared statements
(PREPARE/EXECUTE) and/or application caching.  You can also consider
organizing "sys_var" into a record instead of a key value store.
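
For example, the PREPARE/EXECUTE route (same names as above) is parsed once
per session and can reuse a plan on subsequent calls:

PREPARE banco_vars AS
select
 max(varvalue) filter (where name = $$/Geral/BancoPadrao$$) as BancoPadrao,
 max(varvalue) filter (where name = $$/Geral/BancoMatricula$$) as BancoMatricula
from sys_var
where name = any('{/Geral/BancoPadrao,/Geral/BancoMatricula}'::text[]);

EXECUTE banco_vars;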

merlin

>


Re: what happens if a failed transaction is not rolled back?

2023-04-24 Thread Merlin Moncure
On Mon, Apr 24, 2023 at 8:41 PM Tom Lane  wrote:

> Merlin Moncure  writes:
> > Hm. I also noticed when looking at this that aborted transactions with
> > savepoints are not subjected to the idle_in_transaction timeout which is
> a
> > bit surprising.
>
> Hmm ... I think it's intentional that idle_in_transaction no longer
> applies once the transaction has failed.  But if there's a live
> savepoint, then we should enforce it since resources may still be
> held.  Seems like a bug, if your observation is accurate.
>
>
Hm, double checking -- it's not (the observation doesn't hold up).

merlin


Re: what happens if a failed transaction is not rolled back?

2023-04-24 Thread Merlin Moncure
On Mon, Apr 24, 2023 at 4:20 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Mon, Apr 24, 2023 at 12:56 PM David Wheeler 
> > wrote:
> >> Now I’m curious. Does it have the same impact on performance that an
> idle
> >> in transaction connection has? Eg does it prevent vacuum? Does it still
> >> hold locks?
>
> > Absent documentation to the contrary I would expect the system to at best
> > be in an idle-in-transaction state as-if the failed command never was
> > executed.
>
> A quick experiment will show you that we release locks as soon as the
> transaction is detected to have failed.  I believe the same is true of
> other interesting resources such as snapshots (which'd be what affects
> vacuum) but it's less easy to observe that from the SQL level.  At least
> by intention, a failed transaction won't hold any resources that would
> impact other sessions.
>
> > The concept of savepoints, whether in use in a particular
> > transaction, would require at least that much state be preserved.
>
> Of course, we can't release resources that were acquired by a still-live
> subtransaction, a/k/a savepoint.
>


I think testing whether pg_stat_activity.backend_xid is not null does the
trick.  If it's null, the backend either never did anything since
transaction start (including immediately after a procedure's commit;) that
was worth assigning an xid for, or it had one that was released when the
transaction aborted (if there is an active savepoint, backend_xid stays
non-null).  Of course, you can't run that check from the aborted
transaction itself until it's rolled back first.
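
Something along these lines, from another session:

-- backends still holding an xid (e.g., a live savepoint inside an otherwise
-- aborted transaction keeps backend_xid non-null)
SELECT pid, state, backend_xid
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL
  AND pid <> pg_backend_pid();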

Hm. I also noticed when looking at this that aborted transactions with
savepoints are not subjected to the idle_in_transaction timeout, which is a
bit surprising.

merlin


Re: Converting row elements into a arrays?

2023-03-03 Thread Merlin Moncure
On Thu, Mar 2, 2023 at 3:47 PM Ron  wrote

> I'm used to the bog standard COUNT, AVG, MIN, MAX, SUM. It didn't occur to
> me that there would be others...
>

wait until you find out you can write your own:

CREATE OR REPLACE FUNCTION agg_leftagg(l TEXT, r anyelement) returns text as
$$
BEGIN
  RETURN CASE
WHEN l IS NOT NULL THEN  format('%s-%s', l, r)
ELSE r::TEXT
  END;
END;
$$ LANGUAGE PLPGSQL;

CREATE AGGREGATE leftagg(anyelement) (SFUNC=agg_leftagg, STYPE=TEXT);

CREATE TEMP TABLE s AS SELECT generate_series(1,10) s;

SELECT leftagg(s) FROM s GROUP BY random() > .5;

postgres=# SELECT leftagg(s) FROM s GROUP BY random() > .5;
  leftagg
------------
 2-3-5-6-10
 1-4-7-8-9
(2 rows)

this is why I fell in love with postgres 20 years ago, and never looked back

merlin


Re: Sequence vs UUID

2023-02-09 Thread Merlin Moncure
On Wed, Feb 8, 2023 at 5:33 AM Peter J. Holzer  wrote:

> On 2023-02-08 14:48:03 +0530, veem v wrote:
> > So wanted to know from experts here, is there really exists any scenario
> in
> > which UUID really cant be avoided?
>
> Probably not. The question is usually not "is this possible" but "does
> this meet the requirements at acceptable cost".
>
>
> > Sequence Number = n*d+m+offset. Where n is the sequence order number, d
> is the
> > dimensions of the multi-master replication, m ranges from 0 to n-1 is the
> > number assigned to each node in the replication, and offset is the
> number to
> > offset the sequence numbers.
>
> Yes, you can do this. In fact, people (including me) have already done
> this.
>
> But it's relatively easy to mess this up:
>
> Firstly, you have to make sure that d is larger than your number of
> (active) replicas will ever be, but still small enough that you will
> never overflow. Probably not a problem with 64 bit sequences (if you set
> d to 1E6, you can still count to 9E12 on each node), but might be a
> problem if you are for some reason limited to 32 bits.
>
> Secondly (and IMHO more importantly) you have to make sure each node
> gets its own unique offset. So this needs to be ensured during
> deployment, but also during migrations, restores from backups and other
> infrequent events.


??  All you have to do is ensure each node has its own unique id, and that
id is involved in sequence generation.  This has to be done for other
reasons than id generation anyway, and is a zero effort/risk process.

The id would then contain the identifier of the node that *generated* the
id, rather than the node that contains the id.  This is exactly analogous
to strategies that use a mac# as part of the id prefix, for example.  Once
generated, it's known unique and you don't have to consider anything else.
This is exactly what I do, and there is no interaction with backups,
deployments, migrations, etc.  Node expansion does require that each node
get a unique node id, and that's it.
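
A minimal sketch of the idea (the node_id constant and the helper function
are assumptions, not a built-in facility):

CREATE SEQUENCE my_table_seq;

-- say this node was deployed with node_id = 3
CREATE OR REPLACE FUNCTION next_global_id(node_id int) RETURNS bigint
LANGUAGE sql AS
$$ SELECT node_id::bigint * 1000000000000 + nextval('my_table_seq') $$;

SELECT next_global_id(3);   -- 3000000000001: unique across nodes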

merlin


Re: Sequence vs UUID

2023-02-07 Thread Merlin Moncure
On Mon, Feb 6, 2023 at 1:22 PM Peter J. Holzer  wrote:

> On 2023-02-06 20:04:39 +0100, Julian Backes wrote:
> > I don't really understand what you mean by 'performance'. To me it is not
> > surprising that incrementing (I know it is not just incrementing) a
> > 64bit integer is faster than generating 128 bit data with a good amount
> of
> > random data even if it seems to be too slow.
>
> But UUIDs are random and that plays havoc with locality. For example
> consider one table with invoices and another with invoice items. If you
> want to get all the invoices including the items of a single day, the
> data is probably nicely clustered together in the tables. But the join
> needs to look up random ids in the index, which will be spread all over
> the index. In a simple benchmark for this scenario the UUIDs were about
> 4.5 times slower than sequential ids. (In other benchmarks the
> difference was only a few percent)


This is really key.

While many of the people posting here may understand this, all of the
databases I've seen that are written with the UUID pattern appear to be
written by developers oblivious to this fact.  The UUID pattern seems to be
popular with developers who want to abstract the database away underneath
the code, who might use an ORM, and who tend to be weaker on database facing
constraint checking.  My direct observation is that these databases scale
poorly and the developers spend a lot of time building tools that fix
broken data stemming from application bugs.

I'm certain this is not the experience of everyone here.  I do however find
the arguments against sequences to be somewhat silly; partition safe
sequence generation is simple to solve using simple methods.  The "ID
guessing" security argument is similarly weak; if your application relies
on id obfuscation to be secure you might have much bigger issues to contend
with IMO.

merlin


Re: Sequence vs UUID

2023-01-26 Thread Merlin Moncure
On Thu, Jan 26, 2023 at 1:18 PM veem v  wrote:

> Hello, We were trying to understand whether we should use UUID or Sequence
> in general for primary keys. In many of the blogs(one is below) across
> multiple databases, I saw over the internet and all are mostly stating the
> sequence is better as compared to UUID. But I think in the case of
> concurrent data load scenarios UUID will spread the contention point
> whereas sequence can be a single point of contention.
>
> So we want to understand from experts here, if there are any clear rules
> available or if we have any pros vs cons list available for each of those
> to understand the exact scenario in which we should go for one over other?
> Basically I wanted to see if we can perform some test on sample data to
> see the percentage of overhead on read and write performances of the query
> in presence of UUID VS Sequence to draw some conclusion in general? And
> also considering open source postgres as the base for many databases like
> redshift etc, so the results which apply to progress would apply to others
> as well.
>
>
> https://www.percona.com/blog/2019/11/22/uuids-are-popular-but-bad-for-performance-lets-discuss/
>
* UUIDs are big for an identifier (16 bytes); a sequence is 4 or 8
* sequences are NOT a contention point; the database uses some tricks to work
around that
* UUIDs are basically random data, causing page fragmentation.  This is
particularly bad in auto clustering architectures like sql server
* Also, UUIDs can deliver very poor buffer hit ratios when sweeping ranges
of records on large tables.

merlin


Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-07 Thread Merlin Moncure
On Sat, Jun 4, 2022 at 12:37 AM Tom Lane  wrote:

> Bryn Llewellyn  writes:
> > I'm going to try to think like this:
> > The number of possible spellings of the names of keys in a JSON object
> is some flavor of infinite. So including this in an object:
> > "k": null
> > really is saying something. It says that I do know about "k" and that
> yet I have simply no information available about its value.
>
> I'd read it as asserting that key "k" is meaningful for this object,
> but the correct value for that key is not known.
>
> I have a hard time with your assertion that {"x": 42, "y": null}
> should be considered equivalent to {"x": 42}, because it would
> render key-exists predicates useless.  Either you have to say that
> key "y" is claimed to exist in both of these objects and indeed every
> object, or you have to make it fail if the key's value is null (so that
> it'd say "false" in both of these cases).  Either of those options
> seems both weird and useless.
>

Yeah. I would expect that for json or jsonb, given two values a and b,
a is distinct from b
should give the same answer as
a::text is distinct from b::text

merlin


Re: [EXT] Re: Accessing composite type elements

2022-06-02 Thread Merlin Moncure
On Thu, Jun 2, 2022 at 12:05 PM Garfield Lewis 
wrote:

> > Binary representations are not too well documented :-(.  However,
> > looking at record_send() helps here.
>
> will have a look…
>
>
> > Right, with the additional proviso that -1 "length" indicates a null
> > field value.
>
> Thx, Tom… never thought the null field…
>


Take a look at libpqtypes.  It's a client side extension library to libpq
that implements the binary protocol.

https://github.com/pgagarinov/libpqtypes

merlin

>


Re: How to explicitly lock and unlock tables in pgsql?

2022-03-18 Thread Merlin Moncure
On Thu, Mar 17, 2022 at 2:52 AM Laurenz Albe  wrote:
>
> On Wed, 2022-03-16 at 20:30 +, Shaozhong SHI wrote:
> > Table locks present a barrier for progressing queries.
> >
> > How to explicitly lock and unlock tables in pgsql, so that we can guarantee 
> > the progress of running scripts?
>
> You cannot unlock tables except by ending the transaction which took the lock.
>
> The first thing you should do is to make sure that all your database 
> transactions are short.
>
> Also, you should nevr explicitly lock tables.  Table locks are taken 
> automatically
> by the SQL statements you are executing.

Isn't that a bit of an overstatement?
LOCK TABLE foo;

locks the table, with the benefit that you can choose the lock mode to
decide what is and is not allowed to run after you lock it.  The main
advantage vs automatic locking is preemptively blocking things so as
to avoid deadlocks.
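
For example, choosing the mode up front (the lock is released only at
transaction end):

BEGIN;
LOCK TABLE foo IN SHARE ROW EXCLUSIVE MODE;
-- ... work that must not race with concurrent writers ...
COMMIT;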

merlin




Re: Couldn't cast to record[]

2022-03-04 Thread Merlin Moncure
On Thu, Mar 3, 2022 at 4:01 AM Suresh Kumar R
 wrote:
>
> Hi, I created an table with composite type array as datatype for one column.
> When I query that table I need the pg_typeof(column) as record[] instead of
> composite_type[].
> I tried creating a separate function and returning record[], but below error
> is thrown,
>
> ERROR:  PL/pgSQL functions cannot return type record[]
>
> Below is sample for my requirement.
>
> pali=# create type address as (number bigint, city varchar);
> CREATE TYPE
> pali=# create table person(name varchar, addresses address[]);
> CREATE TABLE
> pali=# insert into person values ('Michael', array[(82, 'NYC'),(24,
> 'Dunkirk')]::address[]);
> INSERT 0 1
> pali=# select pg_typeof(addresses::record[]) from person ;
>  pg_typeof
> ---
>  address[]
>
> Here I expect record[] instead of address[].


Why do you think you need this? If you need variant record storage,
you probably want to go to jsonb, then use jsonb_populate_recordset to
convert back to a specific type.
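
Using the address type from above, roughly:

SELECT *
FROM jsonb_populate_recordset(null::address,
       '[{"number": 82, "city": "NYC"}, {"number": 24, "city": "Dunkirk"}]');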

merlin




Re: Strange results when casting string to double

2022-02-19 Thread Merlin Moncure
On Sat, Feb 19, 2022 at 8:35 AM Tomas Pospisek  wrote:
>
> On 18.02.22 22:42, Peter J. Holzer wrote:
>
> > If there has been a glibc update (or a postgresql update) in those 480
> > days (Ubuntu 14.04 is out of its normal maintenance period but ESM is
> > still available) the running processes may well run different code than
> > a newly started program. So it could be a bug which has since been
> > fixed.
>
> That would be visible via `lsof`. `libc. The file `...libc...so` that
> `postgres` is keeping open would have the text `DEL` (as in deleted) in
> the `FD` column of `lsof`'s output.
>
> As opposed to a newly started program which would have `REG` (regular
> file) there.

If this doesn't bear fruit, are there debugging symbols?  Setting a
breakpoint might produce some insight.

merlin




Re: Can we go beyond the standard to make Postgres radically better?

2022-02-15 Thread Merlin Moncure
On Sun, Feb 13, 2022 at 4:00 AM Pavel Stehule  wrote:
>
>
>
> ne 13. 2. 2022 v 10:45 odesílatel Guyren Howe  napsal:
>>
>>
>> The MySQL autocomplete is designed without context filtering. Maybe we can 
>> have this implementation too (as alternative)
>>
>> so using all column names + all table names + aliases.column names (when we 
>> know defined alias)
>>
>> Another idea about column excluding. Any implementation on the client side 
>> is very complex, because you need to parse sql. But maybe we can enhance SQL 
>> with some syntax.
>>
>> SELECT ... and LAST optional clause can be HIDE, OR EXCLUDE NAME or EXCLUDE 
>> TYPE
>>
>> SELECT * FROM postgistab HIDE TYPE LIKE 'wkb%'
>> SELECT * FROM postgistab ORDER BY ... EXCLUDE COLUMN LIKE 'shape%'
>>
>> WITH x AS (SELECT * FROM xx)
>> SELECT * FROM x EXCLUDE COLUMN x1,x2
>>
>> The column excluding should be separate *last* clase.
>>
>> More with this syntax is less possibility of collision with ANSI SQL
>>
>> Not against this. Seems somewhere in here might be a nice quality of life 
>> change.
>>
>> Still.
>>
>> I originally suggested that SQL is terrible and we should fearlessly either 
>> replace it or migrate it toward something better. And the thread winds up 
>> with a debate about a minor change to a development tool.
>>
>> I’m back to just having no earthly idea why anyone who finds relations to be 
>> a productive tool for building a model would think that SQL being the only 
>> means to do that is Okay.
>
> I think the rating of data langues is very subjective, and I am happy with 
> SQL more than with special languages like D or Quel, or other. I know SQL has 
> a lot of disadvantages, but it was designed for humans and it works for me.

Exactly. SQL has proven to be more productive, and code written in it
has greater longevity than the alternatives.  It's also generally more
terse in the hands of a good author.  The authors of all the 'SQL
sucks' rants don't really explore why this is the case.  For example,
SQL has transactions and pretty much all other major languages don't.
They may have them in a limited sense, but not standardized throughout
the syntax and the standard libraries.  High quality automatic
concurrency models are another factor.


merlin




Re: Can we go beyond the standard to make Postgres radically better?

2022-02-10 Thread Merlin Moncure
On Thu, Feb 10, 2022 at 10:54 AM Merlin Moncure  wrote:

> On Wed, Feb 9, 2022 at 11:15 PM Guyren Howe  wrote:
>
>>
>>
>
>>- *Also nested function definitions, so top-level functions can be
>>   built out of local auxiliary functions.*
>>- *Other languages*
>>   - *Tutorial D, Datalog, Quell, let’s open this puppy up!*
>>   - *SQL is a terrible, no good, very bad language*
>>
>> Postgres used to suport QUEL way back in the day.  Being blunt, there is
> zero chance of this happening in core.  A query language translator out of
> core, preprocessing the language into SQL, might work.   SQL has evolved
> far beyond what those languages could practically offer.   Main benefit
> here would be to better support relational division, which bizarrely has
> yet to arrive in SQL.
>  *A portable, low-level API*
>

FYI, there was a semi-serious commercial attempt to do this back in 2001,
Dataphor. It's been opensourced.  Wikipedia has a pretty good write up on
it:
https://en.wikipedia.org/wiki/Dataphor

IMNSHO suggestions like these should travel this technical path; take the
data language you want and translate it into SQL.  Some of these
translations will end up being complicated (read: slow).  Iterating this
kind of thing outside of core would allow for faster development.

merlin

>


Re: Can we go beyond the standard to make Postgres radically better?

2022-02-10 Thread Merlin Moncure
On Wed, Feb 9, 2022 at 11:15 PM Guyren Howe  wrote:

> Postgres has since the outset gone beyond the SQL standard in many ways :
> types, inheritance, programmability, generality are all well beyond what
> SQL used to mandate and still well beyond the current standard.
>
> There are huge developer benefits available to focusing more on making a
> great relational programming environment, well outside the SQL standard.
>
> Examples of small things Postgres could have:
>
>- *SELECT * - b.a_id from a natural join b*
>   - *let me describe a select list by removing fields from a
>   relation. In the example, I get all fields in the join of  a  and b 
> other
>   than the shared key, which I only get once.*
>
I could see this as being extremely useful; I would support non standard
ways of subtracting from '*', though not necessarily in that syntax.


>
>- *note how this simplifies maintaining views wrt  changes in tables*
>- *Let me put the FROM clause first*
>   - *if I can write FROM a join b SELECT a.height, a.name
>   , b.email then an editor can give me autocomplete when 
> I’m
>   writing the select clause.*
>
I see the benefit, but it's not big enough to justify deviating from the
standard IMO.  The SQL grammar is already very complex and this wouldn't
help, even if it's deterministically parsable.

>
>- *Hierarchical schemas*
>
This would violate the standard, and would be unclear.  The rules are set:
an identifier is [database.][schema.][table.]column, and your change would
introduce ambiguity.


> *Examples of larger things Postgres might have:*
>
>- *First-class functions.*
>   - *A global-only namespace is dumb. Schemas are only slightly less
>   dumb. The obvious way to store and call functions is as values of 
> fields.
>   Let me organize them how I choose*
>
Not following this; can you elaborate on specific uses?


>
>- *Also let me pass around functions as values, let me make new ones
>   and so on. Postgres is the best relational programming environment 
> already
>   because it has a decent type system and more general features. Let’s 
> expand
>   and also simplify that.*
>
> Ditto above. Mostly we can work around this with dynamic SQL today without
too much effort.

>
>- *Also nested function definitions, so top-level functions can be
>   built out of local auxiliary functions.*
>- *Other languages*
>   - *Tutorial D, Datalog, Quell, let’s open this puppy up!*
>   - *SQL is a terrible, no good, very bad language*
>
Postgres used to support QUEL way back in the day.  Being blunt, there is
zero chance of this happening in core.  A query language translator out of
core, preprocessing the language into SQL, might work.  SQL has evolved
far beyond what those languages could practically offer.  The main benefit
here would be better support for relational division, which bizarrely has
yet to arrive in SQL.
 *A portable, low-level API*

>
>- *An alternative to SQLite that provides CRUD operations on a
>   Postgres database.*
>
> This has been frequently discussed in the archives. Short version, won't
happen; it would put too many constraints on the architecture.

merlin

>


Re: Can Postgres beat Oracle for regexp_count?

2022-02-03 Thread Merlin Moncure
On Wed, Feb 2, 2022 at 4:26 PM Tom Lane  wrote:
>
> "David G. Johnston"  writes:
> > Given we don't have a regexp_count function this isn't surprising...
>
> FYI, it's there in HEAD.
>
> In the meantime, you could possibly do something like
>
> =# select count(*) from regexp_matches('My High Street', '([A-Z][a-z]+[\s])', 
> 'g');
>  count
> ---
>  2
> (1 row)

alternate version:
postgres=# select array_upper(regexp_split_to_array('My High Street My
High Street', 'My High Street'), 1) - 1;
 ?column?
──
2

can help slide this into complex queries a little more easily by
avoiding the aggregation :-).

merlin




Re: pg_try_advisory_lock is waiting?

2022-01-31 Thread Merlin Moncure
On Fri, Jan 28, 2022 at 6:34 PM Mladen Gogala  wrote:
>
> On 1/28/22 19:08, Tom Lane wrote:
>
> I doubt it.  I think the FOR UPDATE in the sub-select is blocked
> because the other session has an uncommitted update on the row
> it wants to lock.  This command won't reach the pg_try_advisory_lock
> call until that row lock comes free.
>
> Yes, I figured it out, but pg_try_advisory_lock returned TRUE even without 
> "FOR UPDATE" clause in the subquery. Shouldn't it return false because it 
> can't lock the row until the uncommitted update finishes?

advisory locks and row locks are completely distinct and separate.
It's also not a good idea to make any assumptions on order of
operations as to which lock is acquired first using subqueries in that
fashion.
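
A quick illustration of the distinction (hypothetical table t), assuming
session 1 holds an uncommitted row update:

-- session 1:
begin;
update t set val = val + 1 where id = 1;   -- row lock held, transaction left open

-- session 2:
select pg_try_advisory_lock(1);            -- returns true; unrelated to the row lock above
select * from t where id = 1 for update;   -- this, however, blocks until session 1 commits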

merlin




Re: Counting the number of repeated phrases in a column

2022-01-27 Thread Merlin Moncure
On Thu, Jan 27, 2022 at 11:56 AM  wrote:
> Le 27/01/2022 à 18:35, Merlin Moncure a écrit :
> > select distinct array_to_string(v[a:b], ' ') phrase, count(*) as occurrences
> > from
> > (
> >select array_agg(t) v
> >from
> >(
> >  select trim(replace(unnest(v), E'\n', '')) t
> >  from regexp_split_to_array(, ' ') v
> >) q
> >where length(t) > 1
> > ) q
> > cross join lateral generate_series(1, array_upper(v, 1)) a
> > cross join lateral generate_series(a + 1, array_upper(v, 1)) b
> > group by 1
> > having count(*) > 1;
> >
> > We are definitely in N^2 space here, so look for things to start
> > breaking down for sentences > 1000 words.
> >
> > merlin
> >
>
> (for better complexity) you may search about "Ukkonen suffix tree"
> Similar problem as yours :
> https://www.geeksforgeeks.org/suffix-tree-application-3-longest-repeated-substring/?ref=lbp

Yep.  Many problems like this are well solved in imperative languages
and will fit poorly into SQL's quasi-functional space.  That
implementation could probably be converted to pl/pgsql pretty easily,
or a 'sql + tables' variant as a fun challenge.  It also slightly
exploits the fact that only the most repeated needle is returned,
rather than all of them.

Still, the need for single-statement, stateless SQL solutions to
interesting problems comes up all the time in everyday development
practice, for simplicity's sake, even when there are better
approaches out there.  It's also fun.

merlin




Re: Counting the number of repeated phrases in a column

2022-01-27 Thread Merlin Moncure
On Thu, Jan 27, 2022 at 11:09 AM Rob Sargent  wrote:
>
> On 1/27/22 10:03, Merlin Moncure wrote:
>
> On Wed, Jan 26, 2022 at 5:23 PM Merlin Moncure  wrote:
>
> with s as (select 'Hello World Hello World' as sentence)
> select
>   phrase,
>   array_upper(string_to_array((select sentence from s), phrase), 1) -
> 1 as occurrances
> from
> (
>   select array_to_string(x, ' ') as phrase
>   from
>   (
> select distinct v[a:b]  x
> from regexp_split_to_array((select sentence from s), ' ') v
> cross join lateral generate_series(1, array_upper(v, 1)) a
> cross join lateral generate_series(a + 1, array_upper(v, 1)) b
>   ) q
> ) q;
>
> Simplified to:
> select distinct array_to_string(v[a:b], ' ') phrase, count(*) as occurrences
> from regexp_split_to_array('Hello World Hello World', ' ') v
> cross join lateral generate_series(1, array_upper(v, 1)) a
> cross join lateral generate_series(a + 1, array_upper(v, 1)) b
> group by 1;
>
>  phrase  │ occurances
> ─┼
>  World Hello │  1
>  Hello World Hello   │  1
>  Hello World │  2
>  Hello World Hello World │  1
>  World Hello World   │  1
>
> merlin
>
>
> And since we're looking for repeated phrases maybe add
>
> having count(*) > 1

thanks.  Also, testing on actual data, I noticed that a couple of other
things are mandatory, mainly doing a bit of massaging before
tokenizing:

select distinct array_to_string(v[a:b], ' ') phrase, count(*) as occurrences
from
(
  select array_agg(t) v
  from
  (
select trim(replace(unnest(v), E'\n', '')) t
from regexp_split_to_array(, ' ') v
  ) q
  where length(t) > 1
) q
cross join lateral generate_series(1, array_upper(v, 1)) a
cross join lateral generate_series(a + 1, array_upper(v, 1)) b
group by 1
having count(*) > 1;

We are definitely in N^2 space here, so look for things to start
breaking down for sentences > 1000 words.

merlin




Re: Counting the number of repeated phrases in a column

2022-01-27 Thread Merlin Moncure
On Wed, Jan 26, 2022 at 5:23 PM Merlin Moncure  wrote:
>
> with s as (select 'Hello World Hello World' as sentence)
> select
>   phrase,
>   array_upper(string_to_array((select sentence from s), phrase), 1) -
> 1 as occurrances
> from
> (
>   select array_to_string(x, ' ') as phrase
>   from
>   (
> select distinct v[a:b]  x
> from regexp_split_to_array((select sentence from s), ' ') v
> cross join lateral generate_series(1, array_upper(v, 1)) a
> cross join lateral generate_series(a + 1, array_upper(v, 1)) b
>   ) q
> ) q;

Simplified to:
select distinct array_to_string(v[a:b], ' ') phrase, count(*) as occurrences
from regexp_split_to_array('Hello World Hello World', ' ') v
cross join lateral generate_series(1, array_upper(v, 1)) a
cross join lateral generate_series(a + 1, array_upper(v, 1)) b
group by 1;

 phrase  │ occurances
─┼
 World Hello │  1
 Hello World Hello   │  1
 Hello World │  2
 Hello World Hello World │  1
 World Hello World   │  1

merlin




Re: Counting the number of repeated phrases in a column

2022-01-26 Thread Merlin Moncure
On Tue, Jan 25, 2022 at 11:10 AM Shaozhong SHI  wrote:
>
> There is a short of a function in the standard Postgres to do the following:
>
> It is easy to count the number of occurrence of words, but it is rather 
> difficult to count the number of occurrence of phrases.
>
> For instance:
>
> A cell of value:  'Hello World' means 1 occurrence a phrase.
>
> A cell of value: 'Hello World World Hello' means no occurrence of any 
> repeated phrase.
>
> But, A cell of value: 'Hello World World Hello Hello World' means 2 
> occurrences of 'Hello World'.
>
> 'The City of London, London' also has no occurrences of any repeated phrase.
>
> Anyone has got such a function to check out the number of occurrence of any 
> repeated phrases?

Let's define a phrase as a sequence of two or more words, delimited by
spaces.  You could find it with something like:

with s as (select 'Hello World Hello World' as sentence)
select
  phrase,
  array_upper(string_to_array((select sentence from s), phrase), 1) -
1 as occurrances
from
(
  select array_to_string(x, ' ') as phrase
  from
  (
select distinct v[a:b]  x
from regexp_split_to_array((select sentence from s), ' ') v
cross join lateral generate_series(1, array_upper(v, 1)) a
cross join lateral generate_series(a + 1, array_upper(v, 1)) b
  ) q
) q;

This would be slow for large sentences obviously, and you'd probably
want to prepare the string by stripping some characters and such.

merlin




Re: Can we use sql language to create a function in Postgres?

2022-01-05 Thread Merlin Moncure
On Wed, Jan 5, 2022 at 5:27 AM Shaozhong SHI  wrote:
>
> Any online documentation or examples for using Oracle SQL in Postgres?

FYI there are commercial offerings (in particular, EDB, which I am not
affiliated with) that minimize compatibility concerns with oracle.  If
doing this by hand, utilizing the lists (with SPECIFIC questions) and
other internet resources is the way to go. There may be some guides,
but the SQL language and other nuances are moving fast, so try to get
contemporary help if possible.  Conversion to postgres will work, and
what you are doing is a good idea.

merlin




Re: The tragedy of SQL

2021-09-16 Thread Merlin Moncure
On Wed, Sep 15, 2021 at 7:31 PM FWS Neil  wrote:
> On Sep 15, 2021, at 2:44 PM, Merlin Moncure  wrote:
> > I think you ought to recognize that many people on this list make
> > money directly from managing that complexity :-).
>
> I did not intend to disparage anyone.  People, including myself, make money 
> when they provide value and there is certainly value here.
>
> But, I am not sure I understand your inference.  Are you saying (and I am not 
> implying you are) that PostgreSQL does not progress in line with the original 
> SQL goals of simplifying data access because people are making money off of 
> the current complexity?

Not at all.  I'm saying that this mailing list is stuffed with people
who work on and with the database, and SQL is in many cases
(including mine) a passion.  Postgres is a technical marvel, and we
make money by utilizing it effectively and by rendering, er,
constructive feedback to its stewards so that it may improve and we
may all become more efficient.

Points made upthread suggesting SQL is bad due to being based on sets
were ironic IMO.  SQL is successful and enduring precisely because you
can direct operations by describing data relationships rather than
implement specific operations by hand.  Fluency in that technique
leads to solution achievement that other technologies cannot approach.
It rewards artistry and creativity, which I guess rubs some people the
wrong way since it does not align to factory development strategies.

No offense taken; I enjoy the debate; this thread is an indulgence,
being a bit of time off from capricious C suite executives, agile
methodology zealots, etc. So fire away.

merlin




Re: The tragedy of SQL

2021-09-15 Thread Merlin Moncure
On Tue, Sep 14, 2021 at 3:16 PM FWS Neil  wrote:
>
> > On Sep 14, 2021, at 11:10 AM, Michael Nolan  wrote:
> >
> > I started programming in 1967, and over the last 50+ years I've programmed 
> > in more languages than I would want to list.  I spent a decade writing in 
> > FORTRAN on a GA 18/30 (essentially a clone of the IBM 1130) with limited 
> > memory space, so you had to write EFFICIENT code, something that is a bit 
> > of a lost art these days.  I also spent a decade writing in COBOL.
> >
> > I've not found many tasks that I couldn't find a way to write in whatever 
> > language I had available to write it in.  There may be bad (or at least 
> > inefficient) languages, but there are lots of bad programmers.
>
> Yep, me too.  I would say that SQL has not achieved its design goals yet.  
> The original concept was to write what you want to achieve and have the 
> server figure out the best way to get at it.
>
> What people hate about SQL is that the programmer has to optimize SQL to get 
> acceptable performance.  And the optimization is different for every 
> implementation.  I think SQL has not hit its stride yet.  When the common 
> $1000 server has 1024+ CPUs and 1+TB memory, and SQL implementations have 
> adopted good multithreading architecture with access to 1024+ CPU dedicated 
> AI engines, etc. a lot of the crap associated with performant SQL will go 
> away.
>
> At this point, I think it will be smart to strip out implementation details 
> that have made it into the SQL syntax.  There will no longer be a need for 
> it. This will make the SQL language simpler and easier to use, understand, 
> and reason about.

I think you ought to recognize that many people on this list make
money directly from managing that complexity :-).

Processing time remains bounded, and for complex queries how much time
is spent executing vs planning is a very difficult tradeoff.  Just
like in math, there are many ways to describe the same thing, and it
is for the clever to pick the best ones that run efficiently and well.
And thank goodness for that; it sends my kids to college.

Regarding your other point about implementations, it's no
different than having different browsers that render html and execute
javascript.  Having several high quality implementations to choose
from is healthy and good, and allows choosing based on cost and feature
needs.  I would personally be much more inclined to gripe about
implementations that do not support standard syntax or have other
major standards issues (SQL server, I'm looking at you).

merlin




Re: The tragedy of SQL

2021-09-14 Thread Merlin Moncure
On Tue, Sep 14, 2021 at 9:01 AM Rob Sargent  wrote:
> > ORMs are a function of poor development culture and vendor advocacy, not
> > the fault of SQL. If developers don't understand or are unwilling to
> > use joins in language A, they won't in language B either.
>
> Back in the day, within IBM there were two separate relational databases.  
> System-R (which came from San Hose) and PRTV (the Peterlee Relational Test 
> vehicle).  As I understand it SQL came from System-R and the optimizer 
> (amongst other things) came from PRTV.
>
> PRTV 
> (https://en.wikipedia.org/wiki/IBM_Peterlee_Relational_Test_Vehicle_(PRTV)) 
> did not use SQL, and was never a released product, except with a graphical 
> add-on which was sold to two UK local authorities for urban planning.
>
> So there are (and always have been) different ways to send requests to a 
> relational DB, it is just that SQL won the day.
>
> Ah, lets not forget Mr Lane's favourite: quel

Sure, I quite like, er, liked quel also, being more mathematical and
formal.  It's a shame it didn't make the cut. This is however a
telling example that standardization trumps purity once languages hit
a certain spot.  There are many languages with dumb things that will
never get fixed :-).  As they say, 'the devil you know'.

QUEL also uses idiomatic English for most operations, which I guess is
probably a contributing factor for developer resistance to SQL, since
native speakers are a minority of the earth's population. Oh well.


merlin




Re: The tragedy of SQL

2021-09-14 Thread Merlin Moncure
On Tue, Sep 14, 2021 at 12:32 AM Guyren Howe  wrote:
> If I had $5 million to invest in a startup, I would hire as many of the core 
> Postgres devs as I could to make a new database with all the sophistication 
> of Postgres but based on Datalog (or something similar). (Or maybe add 
> Datalog to Postgres). If that could get traction, it would lead in a decade 
> to a revolution in productivity in our industry.

I've long thought that there is more algebraic type syntax sitting
underneath SQL yearning to get out.  If you wanted to try something
like that today, a language pre-compiler or translator which converted
the code to SQL is likely the only realistic approach if you wanted to
get traction.  History is not very kind to these approaches though and
SQL is evolving and has huge investments behind it...much more than 5
million bucks.

ORMs are a function of poor development culture and vendor advocacy, not
the fault of SQL. If developers don't understand or are unwilling to
use joins in language A, they won't in language B either.

merlin




Re: Doubt on pgbouncer

2021-07-07 Thread Merlin Moncure
On Sat, Jul 3, 2021 at 10:36 AM Rama Krishnan  wrote:
>
>
> Hi Team,
>
> How can I split read and write queries using pgbouncer

Check out pgbouncer-rr.  I haven't used it, but it may do what you need.

merlin




Re: Primary keys and composite unique keys(basic question)

2021-04-05 Thread Merlin Moncure
On Mon, Apr 5, 2021 at 9:37 PM Rob Sargent  wrote:
>
> It's a small thing, but UUIDs are absolutely not memorizable by
> humans; they have zero semantic value.  Sequential numeric identifiers
> are generally easier to transpose and the value gives some clues to
> its age (of course, in security contexts this can be a downside).
>
> I take the above as a definite plus.  Spent too much of my life correcting 
> others’ use of “remembered” id’s that just happened to perfectly match the 
> wrong thing.
>
> Performance-wise, UUIDs are absolutely horrible for data at scale as
> Tom rightly points out.  Everything is randomized, just awful.  There
> are some alternate implementations of UUID that mitigate this but I've
> never seen them used in the wild in actual code.
>
>
> That b-tree’s have been optimized to handle serial ints might be a considered 
> a reaction to that popular (and distasteful) choice.  Perhaps there should be 
> a ’non-optimized’ option.

It's not just the BTree, but the heap as well.   For large tables, you
are pretty much guaranteed to read a page for each record you want to
load via the key regardless of the pattern of access.  It's incredibly
wasteful regardless of the speed of the underlying storage fabric.
Very few developers actually understand this.

If computers were infinitely fast this wouldn't matter, but they aren't :-).

merlin




Re: Primary keys and composite unique keys(basic question)

2021-04-05 Thread Merlin Moncure
On Fri, Apr 2, 2021 at 3:40 AM Laurenz Albe  wrote:
>
> On Thu, 2021-04-01 at 21:28 -0500, Merlin Moncure wrote:
> > I would never use UUIDS for keys though.
>
> That makes me curious for your reasons.
>
> I see the following disadvantages:
>
> - A UUID requires twice as much storage space as a bigint.
>
> - B-tree indexes are space optimized for inserting at the
>   rightmost leaf page, but UUIDs are random.
>
> - UUIDs are more expensive to generate.
>
> On the other hand, many processes trying to insert into
> the same index page might lead to contention.
>
> Is there anything I have missed?

It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value.  Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).

Performance-wise, UUIDs are absolutely horrible for data at scale as
Tom rightly points out.  Everything is randomized, just awful.  There
are some alternate implementations of UUID that mitigate this but I've
never seen them used in the wild in actual code.

merlin




Re: Primary keys and composite unique keys(basic question)

2021-04-05 Thread Merlin Moncure
On Thu, Apr 1, 2021 at 10:26 PM Rob Sargent  wrote:
>
> On 4/1/21 8:28 PM, Merlin Moncure wrote:
> >
> > This is one of the great debates in computer science and it is not
> > settled.  There are various tradeoffs around using a composite key
> > derived from the data (aka natural key) vs generated identifiers. It's
> > a complex topic with many facets: performance, organization,
> > validation, and correctness are all relevant considerations.  I would
> > never use UUIDS for keys though.
> >
> > merlin
> >
> >
> And, pray tell, for what exactly would you use universally unique
> identifiers.

I don't disagree that UUID are an ok choice in that scenario.  I'll
tell you what though, that scenario comes up fairly rarely.  However,
there are a couple of alternatives if you're curious.

*) Generate ids from a generator service.  This pattern is fairly
common. It has some downsides (slower, more complicated inserts
mainly) but works well in other ways.  You can mitigate the
performance downsides by allocating identifiers in blocks.

*) Use sequences, but with a sequence id added as a composite or
masked into the integer. This works pretty well in practice; a rough
sketch is below.
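
A minimal sketch of the masking variant, with hypothetical names and an
arbitrary 16/48 bit split (the shard id is just whatever small integer
identifies the generating node):

create sequence order_id_seq;

-- 16 bits of node/shard id in the high bits, 48 bits of sequence value
create or replace function next_order_id(shard_id int) returns bigint as
$$
  select (shard_id::bigint << 48) | nextval('order_id_seq');
$$ language sql;

select next_order_id(3);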

merlin




Re: Primary keys and composite unique keys(basic question)

2021-04-01 Thread Merlin Moncure
On Wed, Mar 31, 2021 at 3:36 AM Mohan Radhakrishnan
 wrote:
>
> Hello,
>  We have UUIDs in our tables which are primary keys. But in some 
> cases
> we also identify a composite unique key apart from the primary key.
>
> My assumption is that there should be a unique key index created by us using 
> the composite key. And when we fetch using this composite key instead of the 
> primary key we have a performance boost due to the index.
>
> Is this right ? Are there more details I should read to understand this 
> better ? Please point. If I should use the query planner to look at the 
> statistics I will.

This is one of the great debates in computer science and it is not
settled.  There are various tradeoffs around using a composite key
derived from the data (aka natural key) vs generated identifiers. It's
a complex topic with many facets: performance, organization,
validation, and correctness are all relevant considerations.  I would
never use UUIDS for keys though.

merlin




Re: Binary encoding of timetz type

2021-03-22 Thread Merlin Moncure
On Tue, Mar 16, 2021 at 1:31 PM Ruslan  wrote:
>
> Hi folks,
>
> Apologies if it's the wrong place to ask. I have a question for postgres 
> developers.
>
> I wrote parser for COPY binary encoding. Everything is great but one thing 
> worries me. It seems like the time offset field has reversed digit sign. I'm 
> referring to this function
>
> https://github.com/postgres/postgres/blob/REL_13_STABLE/src/backend/utils/adt/date.c#L2150-L2159
>
> COPY response for value '13:37:00-0300'::timetz gives me binary 
> 0x000b69d1d7002a30. And value '13:37:00+0300'::timetz is encoded as 
> 0x000b69d1d700d5d0.
>
> 0xd5d0 is -10800 for signed integer and 0x2a30 is 10800.
>
> I reverse the sign in my code but feeling that I'm missing something is 
> nagging me. Please advise.

Check out libpqtypes -- it has client side send/receive functions for
binary format you can crib from.

https://github.com/pgagarinov/libpqtypes/blob/master/source/src/datetime.c

merlin




Re: Postgres Analog of Oracle APPEND hint

2021-03-05 Thread Merlin Moncure
On Thu, Feb 25, 2021 at 10:26 AM Rumpi Gravenstein  wrote:
>
> Unfortunately, I am not looking to load from an external source.  My process 
> is moving data from source PostgreSQL tables to target PostgreSQL tables.


INSERT INTO ...SELECT ... is one of the fastest ways possible to move
data around.
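
For example (hypothetical tables), one set-based statement instead of
row-by-row copying:

insert into target_orders (id, customer_id, total)
select id, customer_id, total
from source_orders
where created_at >= date '2021-01-01';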

merlin




Re: New "function tables" in V13 documentation

2020-11-09 Thread Merlin Moncure
On Sun, Nov 8, 2020 at 3:57 PM Thomas Kellerer  wrote:
>
> In case someone is interested: there is a little discussion going on on 
> Reddit whether the new format of presenting functions in V13 is a step 
> backwards:
>
>
> https://www.reddit.com/r/PostgreSQL/comments/jpi0rp/does_anyone_else_feel_like_the_v13_docs_are_a/

It's more than a little ironic that reddit's "old" format (still
visible via old.reddit.com) is objectively clearer and better along
exactly the same lines.

merlin




Re: DB running out of memory issues after upgrade

2020-02-18 Thread Merlin Moncure
On Tue, Feb 18, 2020 at 12:10 PM Nagaraj Raj  wrote:
>
> Below are the same configurations ins .conf file before and after updagrade
>
> show max_connections; = 1743
> show shared_buffers = "4057840kB"
> show effective_cache_size =  "8115688kB"
> show maintenance_work_mem = "259MB"
> show checkpoint_completion_target = "0.9"
> show wal_buffers = "16MB"
> show default_statistics_target = "100"
> show random_page_cost = "1.1"
> show effective_io_concurrency =" 200"
> show work_mem = "4MB"
> show min_wal_size = "256MB"
> show max_wal_size = "2GB"
> show max_worker_processes = "8"
> show max_parallel_workers_per_gather = "2"

This smells like the oom killer for sure.  How did you arrive at some of
these values, in particular max_connections and effective_cache_size?
How much memory is in this server?

merlin




Re: WaitForMultipleObjects in C Extension

2020-01-14 Thread Merlin Moncure
On Tue, Jan 14, 2020 at 1:39 AM İlyas Derse  wrote:
>
>  Hi guys I need your experiences. I made an Extension in C. When I stop the 
> query on PostgreSQL,I want to control my extension's result with use cases in 
> C.So I think , I should use WaitForMultipleObjects. Is there a like a 
> SignalObjectHandle() ? By the way I'm using Windows.

Are you using threading in your extension?  This is something to avoid
except in very specific cases, particularly when using native calls.

merlin




Re: How to prevent POSTGRES killing linux system from accepting too much inserts?

2019-12-18 Thread Merlin Moncure
On Wed, Dec 18, 2019 at 3:53 AM James(王旭)  wrote:
>
> Hello,
>>
>> I encountered into this kernel message, and I cannot login into the Linux 
>> system anymore:
>>
>>
>>
>>> Dec 17 23:01:50 hq-pg kernel: sh (6563): drop_caches: 1
>>>
>>> Dec 17 23:02:30 hq-pg kernel: INFO: task sync:6573 blocked for more than 
>>> 120 seconds.
>>>
>>> Dec 17 23:02:30 hq-pg kernel: "echo 0 > 
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>
>>> Dec 17 23:02:30 hq-pg kernel: syncD 965ebabd1040 0  
>>> 6573   6572 0x0080
>>>
>>> Dec 17 23:02:30 hq-pg kernel: Call Trace:
>>>
>>> Dec 17 23:02:30 hq-pg kernel: [] ? 
>>> generic_write_sync+0x70/0x70
>>
>>
>> After some google I guess it's the problem that IO speed is low, while the 
>> insert requests are coming too much quickly.So PG put these into cache first 
>> then kernel called sync.
>>
>> I know I can queue the requests, so that POSTGRES will not accept these 
>> requests which will result in an increase in system cache.
>>
>> But is there any way I can tell POSTGRES, that you can only handle 2 
>> records per second, or 4M per second, please don't accept inserts more than 
>> that speed.
>>
>> For me, POSTGRES just waiting is much better than current behavior.
>>
>>
>> Any help will be much appreciated.

This is more a problem with the o/s than with postgres itself.

synchronous_commit is one influential parameter that can possibly help
mitigate the issue with some safety tradeoffs (read the docs).   For
linux, one possible place to look is tuning dirty_background_ratio and
related parameters.  The idea is you want the o/s to be more
aggressive about syncing to reduce the impact of i/o storm; basically
you are trading off some burst performance for consistency of
performance.  Another place to look is checkpoint behavior.   Do some
searches, there is tons of information about this on the net.
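
On the postgres side, for example (example value only; read the docs on the
durability tradeoff before using it):

alter system set synchronous_commit = off;
select pg_reload_conf();
-- the linux vm.dirty_background_ratio / vm.dirty_ratio knobs are sysctl settings, not SQL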

merlin




Re: PostgreSQL && data types in ESQL/C

2019-11-07 Thread Merlin Moncure
On Wed, Nov 6, 2019 at 12:32 AM Matthias Apitz  wrote:
> Hello,
>
> On our project roadmap to port our LMS (Library Management System) from
> Sybase/Oracle to PostgreSQL we are now in the phase of addressing the
> ESQL/C and C++ code parts (some million lines of code).
>
> I wrote a small ESQL/C test code to see how the various data types are
> handled.
>
> In general: Is there any good manual about ESQL/C in PostgreSQL?
> Because, even if there are standards any implementation has its details.
>
> In detail:
>
> I've created a table with the most used data types:
>
> $ cat mytypes.sql
>
> create table mytypes (
>   myint  integer,
>   mychar char (4),
>   mydate date,
>   myvchar varchar(81),
>   myblob bytea
>   ) ;
>
> and have loaded a row with some data which is shown in pgsql as:
>
> $ printf "select * from mytypes;\n" | psql -Usisis -d newsisis
>  myint | mychar |   mydate   |   myvchar    |                        myblob
> -------+--------+------------+--------------+------------------------------------------------------
>      1 | char   | 08.05.1945 | освобождение | \xd0bed181d0b2d0bed0b1d0bed0b6d0b4d0b5d0bdd0b8d0b50a
> (1 Zeile)
>
> in the ESQL/C code the host variables are declared as:
>
> EXEC SQL BEGIN DECLARE SECTION;
> ...
> int  myint;
> char mychar[8];
> char mydate[10+1];
> char myvchar[81];
> char myblob[1024];
> ...
> EXEC SQL END DECLARE SECTION;
>
> and the FETCH into these is done with:
>
> EXEC SQL FETCH IN c_statename INTO :myint, :mychar, :mydate, 
> :myvchar, :myblob;
>
> which gives with an ESQL/C test pgm which prints the above host
> variables:
>
> $ /usr/local/sisis-pap/pgsql/bin/ecpg embedded.pgc
> $ gcc -m64 -o embedded embedded.c -I/usr/local/sisis-pap/pgsql/include 
> -L/usr/local/sisis-pap/pgsql/lib/ -lpq -lecpg
>
> $ ./embedded
> stmt: SELECT myint, mychar, mydate, myvchar, myblob FROM mytypes;
> myint   [1]
> mychar  [char]
> mydate  [08.05.1945]
> myvchar [освобождение]
> myblob  [\xd0bed181d0b2d0bed0b1d0bed0b6d0b4d0b5d0bdd0b8d0b50a]
>
> It seems(!):
>
> - an int appears as binary integer
> - all others types (even the column type 'date') appear as C type char*
> - 'date', 'char' and  'varchar' are delivered as '\0' terminated strings
> - 'bytea' appears as '\0' terminated string coded in hex with "\x" in front
>
> Our DBCALL layer must convert these char strings in the data form the
> application layer is expecting, for example a BLOB ('bytea') into a C struct

https://www.postgresql.org/docs/9.1/ecpg-variables.html#ECPG-VARIABLES-TYPE-MAPPING

It looks like there is a workaround for the null terminated strings
and dates.  In practice though it looks like you may need to be
prepared to tweak either the ecpg library or your application to get
this to work.

merlin




Re: question about array indexing

2019-11-04 Thread Merlin Moncure
On Mon, Nov 4, 2019 at 10:05 AM Szymon Lipiński  wrote:
>
> Hey,
> I'm wondering if this isn't a bug somewhere in the parser. I had to add 
> additional parenthesis around the regexp_split_to_array.
>
>
> $ with x as (select 'a b c' a)
> select
> regexp_split_to_array(a, ' ')
> from x;
>
>  regexp_split_to_array
> ---
>  {a,b,c}
> (1 row)
>
> $ with x as (select 'a b c' a)
> select
> regexp_split_to_array(a, ' ')[1]
> from x;
>
> ERROR:  syntax error at or near "["
> LINE 3: regexp_split_to_array(a, ' ')[1]
>  ^
> $ with x as (select 'a b c' a)
> select
> (regexp_split_to_array(a, ' '))[1]
> from x;
>  regexp_split_to_array
> ---
>  a
> (1 row)

It's always been that way.  You also have to wrap function column expansions:
select (func()).*

I don't know the exact technical reason why that's required, but my
money is on precedence related problems.

merlin




Re: Automatically parsing in-line composite types

2019-10-31 Thread Merlin Moncure
On Wed, Oct 30, 2019 at 5:41 PM Andres Freund  wrote:
>
> Hi,
>
> On 2019-10-29 14:33:00 -0400, Tom Lane wrote:
> > Mitar  writes:
> > > I think RowDescription should be extended to provide full recursive
> > > metadata about all data types. That would be the best way to do it.
> >
> > [ shrug... ]  In a world where stability of the wire protocol were
> > of zero value, maybe we would do that.  In the real world, don't
> > hold your breath.
>
> Hm. Wouldn't it be fairly easy to allow the client to specify how much
> metadata they'd want? I.e. opt-in into getting more complete metadata?
>
> Presumably a lot of clients/applications wouldn't want the server to do
> the extra work / use bandwidth for the full details anyway, so making a
> more expansive RowDescription be explicitly opt-in would be good, even
> if there were zero compatibility concerns.
>
> There's different ways we could do the opt-in. We could use the "_pq_."
> startup option stuff to opt in, we could make it an optional parameter
> to D messages (it'd be mildly hacky because unfortunately
> describe_target is not a counted text), we could use an additional
> describe_type etc...

Suppose you had a set of 1000 records, with an array of composites (10
items per array).  How many times would the extra metadata description
get sent following this approach?  The binary wire format is already
fatter in terms of bandwidth in typical cases (unless escaping is
involved which can cause exponential growth of the text format).  If
the answer is 10k, I'd be worried about performance.

merlin




Re: Automatically parsing in-line composite types

2019-10-30 Thread Merlin Moncure
On Wed, Oct 30, 2019 at 11:15 AM Mitar  wrote:
>
> Hi!
>
> On Wed, Oct 30, 2019 at 8:37 AM Merlin Moncure  wrote:
> > Check out libpqtypes: https://github.com/pgagarinov/libpqtypes
>
> Interesting. I have looked at the code a bit but I do not find how it
> determines the type for inline compound types, like the ones they
> appear in my original SQL query example. Could you maybe point me to
> the piece of code there handling that? Because to my
> understanding/exploration that information is simply not exposed to
> the client in any way. :-(

It looks it up from the database.  See implementation in
https://github.com/pgagarinov/libpqtypes/blob/master/source/src/handler.c
 (look for macro LOOKUP_TYPES) and usage in
https://github.com/pgagarinov/libpqtypes/blob/master/source/src/regression-test.c.

> > it does exactly what you want. It's a wrapper for libpq that provides
> > client side parsing for the binary protocol with array and composite
> > type parsing.
>
> It looks to me that it does parsing of composite types only if they
> are registered composite types. But not for example ones you get if
> you project a subset of fields from a table in a subquery. That has no
> registered composite type?

Correct. Only declared (via CREATE TYPE) composite types will work due
to protocol limitations.  For custom C programming this is fine, but
limiting if you are interested in writing a driver that can handle any
type of object the database can throw at you (and don't forget
custom types at the C level!).

> Also, how you are handling discovery of registered types, do you read
> that on-demand from the database? They are not provided over the wire?

The client application has to declare at connection time which types
it is interested in, then they are looked up at the SQL level.   This
would be fairly typical of C applications, I think; but mostly this
works around the limitations of the binary protocol.

> > Virtually any
> > non-C client application really ought to be using json rather than the
> > custom binary structures libpqtyps would provide.
>
> I thought that initially, too, but then found out that JSON has some
> heavy limitations because the implementation in PostgreSQL is standard
> based. There is also no hook to do custom encoding of non-JSON values.
> So binary blobs are converted in an ugly way (base64 would be better).
> You also loose a lot of meta-information, because everything non-JSON
> gets converted to strings automatically. Like knowing what is a date.
> I think MongoDB with BSON made much more sense here. It looks like
> perfect balance between simplicity of JSON structure and adding few
> more useful data types.
>
> But yes, JSON is great also because clients often have optimized JSON
> readers. Which can beat any other binary serialization format. In
> node.js, it is simply the fastest there is to transfer data:

Sure, JSON has only very, very basic type support.  In a practical
sense this means type safety has to be built above the json layer,
(e.g. {"field: "foo", "type":"shape", "data": "(1,1), (2,2)"}) exactly
as we do with the textual sql protocol.
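
A rough sketch of that pattern, with a hypothetical table and a geometric
column; the client re-parses "data" according to "type":

select json_build_object(
  'field', 'delivery_area',
  'type',  'shape',
  'data',  area::text     -- serialize the value; the type tag drives client-side parsing
)
from zones;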

The postgres hacker community will tend to target major standards as a
matter of culture and prudence; this used to annoy me, now I support
this strongly.  Since the type system is extensible it's theoretically
possible to implement bson support or some other funky type safe
format.  I would personally argue (perhaps with good company) against
such a type being incorporated in core, or even in contrib.  In fact, I
argued (win some, lose some, heh!) that jsonb should be in contrib,
not core; we should be moving stuff OUT of the regular namespace and
into extensions, not the other way around.

Aside: now that stored procedures are good to go, there is one feature
left that IMNSHO postgres desperately needs,  and that is external
package repository management (on the order of CPAN, npm, etc) so that
3rd party repositories handle grooming, precompiling, packaging,
downloading, and deploying (via SQL) of extensions that interact with
the database at the C level.  Enterprise environments and managed
postgres providers will never allow custom compiled C extensions which
is the kiss of death; even if I wanted to use those extensions, I
can't.  So if you decided to scratch in itch and create a postgres
BSON type, no one would likely use it, since the chances of adoption
in core are slim to none.

I had seen your article, and liked it.  During the development of what
was to become the jsonb type, I had argued quite strenuously not to
have it completely displace the old json type variant on performance
and other grounds.

merlin




Re: Automatically parsing in-line composite types

2019-10-30 Thread Merlin Moncure
On Tue, Oct 29, 2019 at 12:58 PM Mitar  wrote:
>
> Hi!
>
> On Tue, Oct 29, 2019 at 9:06 AM Fabio Ugo Venchiarutti
>  wrote:
> > You can use subqueries and array_agg() to deepen your output tree all
> > the way to a stack overflow, a single _to_json() call at the
> > top will recursively traverse and convert whatever you feed it.
>
> Yes, what you are describing is exactly the sad state of things: the
> only way to meaningfully retrieve inline composite types which are
> made when one aggregate things like that, or when you subselect a set
> of fields from a table in a sub-query, is that you then convert the
> whole thing to JSON and transmit it in that way. Because this is the
> only way you can parse things on the client. Because if you leave it
> as raw composite type encoding, you cannot really parse that on the
> client correctly in all cases without knowing what types are stored
> inside those composite types you are getting.
>
> But JSON is not a lossless transport format: it does not support full
> floating point spec (no inf, NANs) and for many types of fields it
> just converts to string representation of that, which can be
> problematic. For example, if you have binary blobs.
>
> So no, JSON is a workaround, but it is sad that we should have to use
> it. PostgreSQL seems to be almost there with the support for composite
> types and nested query results, only it seems you cannot really parse
> it out. I mean, why PostgreSQL even has its own binary format for
> results, then it could just transmit everything as JSON. :-) But that
> does not really work for many data types.
>
> I think RowDescription should be extended to provide full recursive
> metadata about all data types. That would be the best way to do it.

Check out libpqtypes: https://github.com/pgagarinov/libpqtypes

it does exactly what you want. It's a wrapper for libpq that provides
client side parsing for the binary protocol with array and composite
type parsing.  This library was written before the json train left the
station; its only use today (assuming you don't like the cute
'PQexecf') would be scenarios where performance requirements are
extreme and the client application is written in C (so you can benefit
from direct binary copying vs string serialization).  Virtually any
non-C client application really ought to be using json rather than the
custom binary structures libpqtypes would provide.  I co-wrote the
library; it works wonderfully, but I've since moved on to json.

JSON also undercuts the argument that the database ought to provide
deep type facilities on both sides of the protocol IMO; I used to be
interested in that, now I'm not; the world is moving on.

merlin




Re: perf tuning for 28 cores and 252GB RAM

2019-06-18 Thread Merlin Moncure
On Mon, Jun 17, 2019 at 6:46 PM Jeff Janes  wrote:
>
> On Mon, Jun 17, 2019 at 4:51 PM Michael Curry  wrote:
>>
>> I am using a Postgres instance in an HPC cluster, where they have generously 
>> given me an entire node. This means I have 28 cores and 252GB RAM. I have to 
>> assume that the very conservative default settings for things like buffers 
>> and max working memory are too small here.
>>
>> We have about 20 billion rows in a single large table.
>
>
> What is that in bytes?  Do you only have that one table?
>
>>
>> The database is not intended to run an application but rather to allow a few 
>> individuals to do data analysis, so we can guarantee the number of 
>> concurrent queries will be small, and that nothing else will need to use the 
>> server. Creating multiple different indices on a few subsets of the columns 
>> will be needed to support the kinds of queries we want.
>>
>> What settings should be changed to maximize performance?
>
>
> With 28 cores for only a few users, parallelization will probably be 
> important.  That feature is fairly new to PostgreSQL and rapidly improving 
> from version to version, so you will want to use the last version you can 
> (v11).  And then increase the values for max_worker_processes, 
> max_parallel_maintenance_workers, max_parallel_workers_per_gather, and 
> max_parallel_workers.  With the potential for so many parallel workers 
> running at once, you wouldn't want to go overboard on work_mem, maybe 2GB.  
> If you don't think all allowed users will be running large queries at the 
> same time (because they are mostly thinking what query to run, or thinking 
> about the results of the last one they ran, rather than actually running 
> queries), then maybe higher than that.
>
> If your entire database can comfortably fit in RAM, I would make 
> shared_buffers large enough to hold the entire database.  If not, I would set 
> the value small (say, 8GB) and let the OS do the heavy lifting of deciding 
> what to keep in cache.  If you go with the first option, you probably want to 
> use pg_prewarm after each restart to get the data into cache as fast as you 
> can, rather than let it get loaded in naturally as you run queries;  Also, 
> you would probably want to set random_page_cost and seq_page_cost quite low, 
> like maybe 0.1 and 0.05.
>
> You haven't described what kind of IO capacity and setup you have, knowing 
> that could suggest other changes to make.  Also, seeing the results of 
> `explain (analyze, buffers)`, especially with track_io_timing turned on, for 
> some actual queries could provide good insight for what else might need 
> changing.

This is all fantastic advice.  If all the data fits in memory (or at
least, all the data that is typically read from) and the cache is warm
then your database becomes an in memory database with respect to read
operations and all the i/o concerns and buffer management overhead go
away.

If your database does not fit in memory and your storage is fast, one
influential setting to look at besides the above is
effective_io_concurrency; it gets you faster (in some cases much
faster) bitmap heap scans. Also make sure to set effective_cache_size
high, reflecting the large amount of memory you have; this will
influence query plan choice.
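
For example (example values only; adjust to the hardware and workload):

alter system set effective_io_concurrency = 200;
alter system set effective_cache_size = '192GB';
select pg_reload_conf();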

merlin




Re: Where **not** to use PostgreSQL?

2019-03-01 Thread Merlin Moncure
On Thu, Feb 28, 2019 at 6:24 AM Chris Travers  wrote:
>
> On Thu, Feb 28, 2019 at 1:09 PM Pavel Stehule  wrote:
>>
>> Hi
>>
>> čt 28. 2. 2019 v 12:47 odesílatel Thomas Güttler 
>>  napsal:
>>>
>>> Hi experts,
>>>
>>> where would you suggest someone to **not** use PostgreSQL?
>
>
> Hard question.  There are a lot of general places where PostgreSQL is not by 
> itself the best fit, and where a careful weighing of pros and cons would need 
> to be made before deciding to use it.
>
> Having used PostgreSQL in place of ElasticSearch for PB-scale deployments, as 
> a high-throughput queue system, and near-real-time OLAP workloads, I am 
> fairly aware of how hard it can be pushed.
>
> So the answers here are not "don't use PostgreSQL here" but "think about it 
> first and consider alternatives."
>
>
>>>
>>>
>>> Why would you do this?
>
>
> Replacing with "What would you consider to be the tradeoffs?"
>>>
>>>
>>> What alternative would you suggest instead?
>
>
> So a few possibilities:
>
> 1.  a) TB-scale full text search systems.
>  b) PostgreSQL's full text search is quite capable but not so powerful 
> that it can completely replace Lucene-based systems.  So you have to consider 
> complexity vs functionality if you are tying with other data that is already 
> in PostgreSQL.  Note further that my experience with at least ElasticSearch 
> is that it is easier to scale something built on multiple PostgreSQL 
> instances into the PB range than it is to scale ElasticSearch into the PB 
> range.
>  c) Solr or ElasticSearch

In my company we had to swap out solr for postgres.  The main
challenge was that solr's limited query language was not able to deal
with complex authorization use cases that were coming in after the
original project was deployed.  Our only solution was to  heavily
denormalize the documents so that when simple changes happened on the
OLTP side we had to push large amounts of data into SOLR.  In addition
to being slow, solr (or I guess the underlying lucene) started to
develop weird failure modes; there were unpredictable swings in memory
and/or disk usage, underlying system resource exhaustion (especially
fd).  The whole thing felt unstable; we had tested heavily with the
original requirements but the tech did not evolve with the project.

The solution was to junk the whole thing and replace it with an API
compatible version of solr in the database.  To index the document we
use a special search string with upper case keys and lower case values
in a tab delimited text string; pg_trgm/gin does the rest of the
lifting.   It can't compete with solr on best case behavior but gives
much better worst case behavior, and, since we don't have to
denormalize, the system fits within memory, making scaling a snap.
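
A rough sketch of the shape of it (hypothetical table; the actual search
strings are application specific, e.g. 'AUTHOR smith\tTITLE postgres at
scale', tab delimited):

create extension if not exists pg_trgm;

create table documents (
  id          bigserial primary key,
  search_text text not null
);

create index documents_search_idx on documents
  using gin (search_text gin_trgm_ops);

-- ilike '%...%' predicates can use the trigram gin index
select id from documents where search_text ilike '%title postgres%';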

The moral of the story here is 'Just use postgres'.  This is not
zealotry; if I were a microsoft inclined person, I might be advising
use of sql server.  If you are not totally and completely aware of the
limits of the system you are probably operating within them.  The
database is evolving rapidly and insanely powerful servers,
supercomputers even, from the perspective of even 10 years ago, can be
had for a mouse click on the cheap.

There are of course a few use cases where postgres is not the optimal tech:
highly unstructured data, super high transaction rate master-master setups,
loss tolerant data archiving, warm log storage, etc.  These problems
show up quite rarely in the world of data which is generally directed
towards systems support of business applications.  If you haven't
mastered the database first, you probably shouldn't be building out
complex systems in non-database technology since you don't know what
you don't know (for example, the immense value that transactions bring
to the table).

merlin



Re: libpq PQexecParams & value placeholders

2018-12-20 Thread Merlin Moncure
On Fri, Dec 14, 2018 at 6:09 AM Oleg  wrote:
>
> On Fri, Dec 14, 2018 at 02:47:12PM +0300, Dmitry Igrishin wrote:
> > пт, 14 дек. 2018 г. в 14:33, Oleg :
> > >
> > > Hi, all.
> > >
> > > Do we really need a numeric value placeholders like $1 in command string?
> > It's a syntax defined at the backend side.
> > (https://www.postgresql.org/docs/current/sql-prepare.html)
>
> I know this, but i can't understand why this approach is used.
>
> > > Construction of such string for complex non-static queries is very 
> > > annoying.
> > That's why there are libraries like Pgfe
> > (https://github.com/dmitigr/pgfe) or libpqtypes
>
> This is C++ :-(.
>
> > (http://libpqtypes.esilo.com/)
>
> This is great! PQexecf() is what i need. Why this api is not the part of 
> libpq?
> Thank you for the link!

FYI, I need to move the project to github and I have a few bugfixes
that have not been sent up yet.  Please let me know if you have any
issues.

The postgres core team very graciously extended the libpq API in order
to make libpqtypes work seamlessly with libpq, so they deserve some
credit here.  Direct C programming against the database is a somewhat
niche developer segment and when incorporating new libraries there you
have to carefully manage usefulness vs support down the line.

Having said that I still believe it remains to be the best way to
write against the libpq API, or C database development generally.
There's nothing else like it FWICT.

I personally lost some interest in the project when the json train
really started to leave the station as it changed the way I developed
against the database.

merlin



Re: plpgsql and intarray extension; int[] - int[] operator does not exist ?

2018-11-19 Thread Merlin Moncure
On Mon, Nov 19, 2018 at 4:36 PM Tom Lane  wrote:
>
> Merlin Moncure  writes:
> > On Mon, Nov 19, 2018 at 11:56 AM Tom Lane  wrote:
> >> The search_path in the trigger probably doesn't include public.
> >> You could add a "SET search_path = whatever" clause to the trigger
> >> function definition to ensure it runs with a predictable path.
>
> > Might be worth considering:
> > SET LOCAL search_path = whatever
> > So that LOCAL makes the change local to the transaction.
>
> What I was suggesting was to create the trigger function using
>
> create function ... as $$ ...body... $$ set search_path = whatever;
>
> That makes the setting local to each function call.

Oh, I see.  Yes, that's even better.
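
For example, a minimal sketch (hypothetical trigger function), assuming the
intarray operators live in public:

create or replace function strip_zero_tags() returns trigger as
$$
begin
  new.tags := new.tags - array[0];   -- int[] - int[] operator from intarray
  return new;
end;
$$ language plpgsql
set search_path = public, pg_temp;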

merlin



Re: plpgsql and intarray extension; int[] - int[] operator does not exist ?

2018-11-19 Thread Merlin Moncure
On Mon, Nov 19, 2018 at 11:56 AM Tom Lane  wrote:
>
> "Day, David"  writes:
> > Any suggestions as to why the int[] operations are not understood in the 
> > trigger context.?
>
> The search_path in the trigger probably doesn't include public.
> You could add a "SET search_path = whatever" clause to the trigger
> function definition to ensure it runs with a predictable path.

Might be worth considering:
SET LOCAL search_path = whatever

So that LOCAL makes the change local to the transaction.

merlin



Re: Postgres 11 procedures and result sets

2018-10-02 Thread Merlin Moncure
On Mon, Oct 1, 2018 at 6:57 PM Tom Lane  wrote:
> Jan Kohnert  writes:
> > I have a question regarding the new stored procedures in Postgres 11 (I 
> > tested
> > beta4):
> > I'd like to know if it is somehow possible to get a (or possibly more) 
> > result
> > set from selects within the SP, as it is possible in MariaDB, MySQL, or SQL
> > Server.
>
> Not there as of v11, other than the refcursor approach you already know
> about.  We hope to have something nicer worked out for v12.  There
> are a lot of compatibility issues to sort through :-(

There are a few other ways of dealing with this.

If the data being returned isn't very large, you can stuff multiple
'datasets' into a single json.  I do this all the time today, with
functions.  Yet another tactic is to create temp tables (maybe ON
COMMIT DROP) and refer to those tables after calling the procedure.  I
would strongly consider this if the returned data was large and the
function/procedure was not called at a high rate (making system
catalog thrash in issue).  I would probably use these tactics,
especially the json style return, even after multi-result style
invocation were to drop.

merlin



Re: Logical locking beyond pg_advisory

2018-09-17 Thread Merlin Moncure
On Sun, Sep 16, 2018 at 3:53 PM marcelo  wrote:
>
> I need a mechanism of "logical locking" more ductile than the pg_advisory 
> family.
> I'm thinking of a table ("lock_table") that would be part of the database, 
> with columns
> * tablename varchar - name of the table "locked"
> * rowid integer, - id of the row "locked"
> * ownerid varchar, - identifier of the "user" who acquired the lock
> * acquired timestamp - to be able to release "abandoned" locks after a 
> certain time
>
> and a group of functions
> 1) lock_table (tablename varchar, ownerid varchar) bool - get to lock over 
> the entire table, setting rowid to zero
> 2) unlock_table (tablename varchar, ownerid varchar) bool - unlock the table, 
> if the owner is the recorded one
> 3) locked_table (tablename varchar, ownerid varchar) bool - ask if the table 
> is locked by some user other than the ownerid argument
> 4) lock_row (tablename varchar, rowid integer, ownerid varchar) bool - 
> similar to pg_try_advisory_lock
> 5) unlock_row (tablename varchar, rowid integer, ownerid varchar) bool - 
> similar to pg_advisory_unlock
> 6) unlock_all (ownerid varchar) bool - unlock all locks owned by ownerid
>
> The timeout (default, maybe 15 minutes) is implicitly applied if the lock is 
> taken by another user (there will be no notification).
> Redundant locks are not queued, they simply return true, may be after an 
> update of the acquired column.
> Successful locks insert a new row, except the rare case of a timeout, which 
> becomes an update (ownerid and acquired)
> Unlock operations deletes the corresponding row
>
> My question is double
> a) What is the opinion on the project?
> b) What are the consequences of the large number of inserts and deletions
> c) Performance. In fact, pg_advisory* implies a network roundtrip, but (I 
> think) no table operations.

Why can't you use the advisory lock functions?  The challenge with
manually managed locks is that they are slow and you will lose the
coordination the database provides you.  For example, if your
application crashes you will have to clean up all held locks yourself.
Building out that infrastructure will be difficult.
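
If the goal is just keying a lock on (table, row), the two-int form of the
advisory lock functions already covers it (hypothetical table name):

select pg_try_advisory_lock('accounts'::regclass::oid::int, 42);   -- returns true/false immediately
-- ... do the work ...
select pg_advisory_unlock('accounts'::regclass::oid::int, 42);
-- or use pg_try_advisory_xact_lock(...) and let commit/rollback release it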

merlin



Re: JSONB filed with default JSON from a file

2018-08-13 Thread Merlin Moncure
On Mon, Aug 13, 2018 at 12:56 PM mrcasa bengaluru  wrote:
>>
>> I assume that you could declare the column as
>>
>> address jsonb not null default 'your json here'::jsonb;
>
>
> Thanks! However, this involves writing the entire JSON in the schema file 
> looks inconvenient. I was hoping I would be able to reference to an external 
> JSON file which could be used for the default value.

1) Stick the default json in a table somewhere, say default_json with
one row, one column
2) Wrap the table with a function, default_json() that returns the
value from the table
3) Make a default function for the table, DEFAULT default_json().
That way you externalize the default into the database
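
A minimal sketch of those three steps (hypothetical names and json content):

create table default_json (address jsonb not null);
insert into default_json values ('{"street": "", "city": "", "zip": ""}');

create or replace function default_json() returns jsonb as
$$ select address from default_json $$ language sql stable;

create table customer (
  id      bigserial primary key,
  address jsonb not null default default_json()
);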

merlin



Re: check_function_bodies not doing much

2018-08-07 Thread Merlin Moncure
On Tue, Aug 7, 2018 at 2:31 PM Tom Lane  wrote:
>
> Marcelo Lacerda  writes:
> > I was trying to get postgres to warn me that I'm referencing a table that
> > it doesn't exists inside a function so I was told on the IRC to check the
> > setting "check_function_bodies", however when I use it in a plpgsql
> > function it doesn't actually check if the tables in the body exist. Is this
> > the correct behavior?
>
> Yes.  It's supposed to be a syntax check, not a check that the function
> would work when executed.  (Depending on the particular PL you're using,
> which you didn't mention, it might be a pretty weak syntax check too.)
>
> An example of why a thorough check would be inadvisable is that a trigger
> function might contain references to OLD and NEW that are in code paths
> protected by checks on the trigger event type.  That could be perfectly
> OK, but a static check couldn't tell.
>
> I believe there are some external tools floating around that check things
> more aggressively, and hence with a higher rate of false positives.

The only valid use of this GUC that I can think of is to work around
this problem:
postgres=# create or replace function f() returns void as
$$
  create temp table x(id int);
  delete from x;
$$ language sql;
ERROR:  relation "x" does not exist

...I've since given up on writing plain sql functions except for
inline cases though so I don't use it anymore.  Static resolution of
tables is not very useful since the state of the database at the time
of function creation is different than what it might be when the
function is run (as opposed to compiled languages obviously).

merlin



Re: Multiple PostgreSQL instances on one machine

2018-06-11 Thread Merlin Moncure
On Fri, Jun 8, 2018 at 3:29 PM Tony Sullivan  wrote:
>
> I am trying to consolidate some machines in my server room particularly in
> the testing environment and I was hoping someone could point me in the
> right direction.
>
> I currently have three machines running PostgreSQL for testing purposes.
> Each week a backup is made of the production database and this is deployed
> onto these three machines. None of these machines is any where near
> capacity. I would like to host all three instances on one machine.
>
> I know that "initdb" can be used to create additional installations, but
> the part I am unsure about is the tablespace. The production database has
> a tablespace defined that resides on its SSD. When I use the production
> backup, I have to create a mount point on the test machine with the same
> name as the one on the production database. I am not certain how I would
> handle this situation if I am serving three separate instances of that
> database from one computer.

Generally speaking, this is not an issue.  You can consolidate all three
machines to one cluster, or to multiple clusters running on the same box.
The approaches have pros and cons relative to each other.  I would
personally tend to simply create different databases on the same
cluster unless you had a reason not to.  The databases would share:

*) database roles (in particular superuser accounts)
*) WAL (possibly presenting a i/o bottleneck)
*) replication for HS/SR (which is all or nothing at the cluster level)
*) shared memory
*) availability -- if one database crashes they will all crash
*) tcp/ip port (this is good -- you don't have to worry about what
port you're connecting to)

merlin



Re: Locks analysis after-the-fact

2018-05-01 Thread Merlin Moncure
On Fri, Apr 27, 2018 at 2:55 AM, Olivier Macchioni
 wrote:
> Hello all,
>
> PostgreSQL version 9.4.17
>
> We have a number of queries running on the same DB from many systems. Among 
> other things, we need to INSERT / UPDATE on a table based on external events 
> - this is done via a pool of 5 SQL connections which process an average of 
> 500 events / second. The table is used by other systems as well...
>
> We have more and more locks on this table, which prevents it to be updated as 
> it should - and I'm trying to see how to improve things without setting up an 
> external queue system just to manage those events.
>
> I've enabled "log_lock_waits = on" in the configuration, which generates the 
> following kind of log entries:
>
> 2018-04-27 07:48:07 CEST [1419.269] "[unknown]" xx@yy LOG:  process 1419 
> still waiting for ExclusiveLock on advisory lock 
> [16417,1653339453,2672182422,2] after 1000.166 ms
> 2018-04-27 07:48:07 CEST [1419.270] "[unknown]" xx@yy DETAIL:  Process 
> holding the lock: 16418. Wait queue: 1419.
> 2018-04-27 07:48:07 CEST [1419.271] "[unknown]" xx@yy STATEMENT:  SELECT 
> pg_advisory_xact_lock(1653339453,  -1622784874);
> 2018-04-27 07:48:07 CEST [1419.272] "[unknown]" xx@yy LOG:  process 1419 
> acquired ExclusiveLock on advisory lock [16417,1653339453,2672182422,2] after 
> 1318.721 ms
> 2018-04-27 07:48:07 CEST [1419.273] "[unknown]" xx@yy STATEMENT:  SELECT 
> pg_advisory_xact_lock(1653339453,  -1622784874);
>
> I can easily identify the session owner of the different processes, but the 
> process holding the lock belong to an ERP which generates a lot of different 
> kind of queries - I'd like to identify precisely which one is causing this 
> long-lasting lock.
>
> It doesn't look like this is possible via the PostgreSQL logging features, 
> and doing the usual lock troubleshooting "on the spot" using pg_locks is not 
> practical due to the short duration and odd timing of those locks.
>
> Does anyone have an idea on how to process in such a case?

Here, the locking query is obvious: "SELECT pg_advisory_xact_lock".
If this is an ERP application, I bet advisory locks are being used to
simulate pessimistic locking in some way.  Unfortunately that by itself
is no help, as the real work is elsewhere in the same transaction.
Echoing the advice above, you need to enable full statement logging and
make sure your log_line_prefix contains the pid in some easily
searchable way, such as [%p].  Once you have that, you can trace back
through the log file from the locking pid and get to the bottom of the
situation pretty easily.
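
Roughly (a sketch, assuming superuser access; ALTER SYSTEM is available
on 9.4, and you can raise log_min_duration_statement again once the
culprit is found):

ALTER SYSTEM SET log_line_prefix = '[%p] %t %u@%d ';
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET log_min_duration_statement = 0;  -- log every statement with its duration
SELECT pg_reload_conf();

Then grep the log for the pid reported as "Process holding the lock"
and read backwards to the start of that transaction.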

Be mindful of statement logging: it can consume disk space really
quickly and add a lot of overhead to the database.

merlin



Re: Rationale for aversion to the central database?

2018-04-27 Thread Merlin Moncure
On Sun, Apr 8, 2018 at 4:39 PM, Guyren Howe  wrote:
> I am a Rails developer at a medium-large size company. I’ve mostly worked at
> smaller companies. I’ve some exposure to other web development communities.
>
> When it comes to databases, I have universally encountered the attitude that
> one should treat the database as a dumb data bucket. There is a *very*
> strong aversion to putting much of any business logic in the database. I
> encounter substantial aversion to have multiple applications access one
> database, or even the reverse: all abstraction should be at the application
> layer.
>
> My best theory is that these communities developed at a time when Windows
> was more dominant, and just generally it was *significantly* easier to use
> MySQL than Postgres for many, particularly new, developers. And it is pretty
> reasonable to adopt an aversion to sophisticated use of the database in that
> case.
>
> This attitude has just continued to today, even as many of them have
> switched to Postgres.
>
> This is only a hypothesis. I am now officially researching the issue. I
> would be grateful for any wisdom from this community.
>
>
> Aside: it is rare to find a situation in life or anywhere where one widely
> adopted thing is worse in *every way* than another thing, but this certainly
> was and largely still continues to be the case when one compares MySQL and
> Postgres. So why do folks continue to use MySQL? I find this mystifying.

This is a very common attitude in the industry, and a very unfortunate
one.  It's particularly common in the so-called 'enterprise' stacks --
java, .net, etc.  It's also completely crazy.  SQL isn't _that_
difficult to learn, and in the particular case of postgres it allows
faster implementation of solutions to many classes of data-driven
problems than competing platforms.  SQL is an ideal language for
business logic IMSHO -- particularly the non-procedural forms of it.
There are a number of causal factors behind this attitude:

*) outsourcing companies push standard stacks
*) major software vendors (ms/ibm/etc) push 'lock-in' toolsets that
lack standardization
*) poor understanding of data structure fundamentals is extremely pervasive
*) salaries for data developers skew higher (causing HR departments to
look for lower cost skillsets, totally oblivious to total development
costs)
*) 'next big thing in data' companies constantly marketing the 'new
paradigm' and pushing against proven, low cost solutions -- Hadoop,
big data, etc.
*) developer monoculture (why do I need to know anything other than java?)
*) perceived slow performance of databases when the problem is really storage
*) poor math training discourages learning of set-based logic in school
*) corporate takeover of the C.S. education syllabus -- in many overseas
schools the educational programs are written by software vendors and
are essentially tool training, or worse, positional education for
outsourcing grind shops

I think things are getting better.  SQL is undergoing a kind of
renaissance for various reasons, and postgresql in particular is
really on a tear.

merlin



Re: Asynchronous Trigger?

2018-04-27 Thread Merlin Moncure
On Thu, Mar 29, 2018 at 5:29 PM, Cory Tucker  wrote:
> Is it possible to have the execution of a trigger (or any function) not
> block the completion of the statement they are associated with?
>
> A pattern I had hoped to implement was to do a quick update of rows that
> signaled they needed attention, and then an async per-row trigger would come
> and do the maintenance (in this case, make an expensive materialized view).
>
> Any suggestions welcome.

Generally the idea is to have the trigger do some quick, low-risk
action, such as inserting a record into a 'stuff to do' table.  Then
some other process comes around and does the heavy lifting.
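
Something like this (a rough sketch -- table and column names are made
up, and the worker query needs 9.5+ for SKIP LOCKED):

CREATE TABLE work_queue (
    id        bigserial PRIMARY KEY,
    row_id    bigint NOT NULL,
    queued_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION flag_for_refresh() RETURNS trigger AS
$$
BEGIN
  INSERT INTO work_queue (row_id) VALUES (NEW.id);
  NOTIFY refresh_work;  -- optional: wakes a LISTENing worker right away
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER needs_refresh
  AFTER INSERT OR UPDATE ON my_table
  FOR EACH ROW EXECUTE PROCEDURE flag_for_refresh();

The worker then pops batches without blocking other workers:

DELETE FROM work_queue
 WHERE id IN (SELECT id FROM work_queue
               ORDER BY id
               LIMIT 100
               FOR UPDATE SKIP LOCKED)
RETURNING row_id;

...and does the expensive materialized view maintenance outside the
original transaction.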

merlin



Re: is pg_advisory_lock() suitable for long runs

2018-04-03 Thread Merlin Moncure
On Sat, Mar 31, 2018 at 1:49 PM, Radoslav Nedyalkov
 wrote:
> Hi all,
> it's very simple and intuitive case but let me describe first.
> 1. session 1 calls pg_advisory_lock(1234) and succeeds.
> 2. session 2 calls pg_advisory_lock(1234) and stops on waiting.
> All fine BUT pid for session2 appears already with backend_xmin in
> pg_stat_activity
> which means vacuum won't be able to remove rows younger than session2
> backend_xmin.
>
> Well, we planned to use pg_advisory_lock() as a boot phase in a hot-standby
> appserver
> and apparently this will be problematic as the session2 might wait for
> weeks.
>
> Any thoughts ? Do we miss something ?

Holding a transaction open for weeks is generally not a good idea at
all.  Advisory locks were invented very specifically to allow
application locks to be held without involving long-running
transactions.  Holding a session open for weeks might be ok, but any
blocked lockers ought to time out and try another course of action
rather than sit inside pg_advisory_lock() pinning backend_xmin.
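
For the boot-phase use case, something like this keeps the waiter from
sitting in pg_advisory_lock() indefinitely (sketch only):

SELECT pg_try_advisory_lock(1234) AS got_it;
-- returns immediately; if got_it is false, sleep in the application
-- (outside of any transaction, so no xmin is held) and try again, or
-- give up and take another path

Setting lock_timeout before calling pg_advisory_lock() should also
bound the wait, but the retry/back-off logic still belongs in the
application.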

merlin



Re: PGSQL 10, many Random named DB

2018-01-25 Thread Merlin Moncure
On Thu, Jan 25, 2018 at 3:38 AM, Durumdara  wrote:
> Dear Members!
>
> Thank you for the suggestions.
> Yes, the sysadmin allowed incoming connections from net.
> We will check your list when we are there.

Don't bother.  We have a confirmed attack; it's time for immediate remediation.

1. Disconnect the machine from the network _immediately_ and move it to a DMZ
2. Create a new server and dump/restore the data to the new machine
3. Determine if the attacker was able to compromise the rest of your
network.  Warning signs (a couple of quick triage queries follow the list):
  *) installed extensions you didn't install (dblink, plsh, plperl, etc.)
  *) strange errors in the database log
  *) strange errors in syslog (is this linux/unix?)
4. Review network security (passwordless access to other boxes, etc.)
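
For example (run as superuser; anything you don't recognize in the
output is a red flag):

SELECT extname, extversion FROM pg_extension;
SELECT rolname FROM pg_roles WHERE rolsuper;
SELECT usename, client_addr, backend_start, query FROM pg_stat_activity;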

In short, you have to assume your network is compromised and only
change that assumption when it is disproven.  These days, if the
hacker has shell access you pretty much have to reinstall the box from
the ground up.  Root escalation is a thing (there are probably
Meltdown-related escalations out there), and once the hacker has root
it's game over.  In fact, in some cases you have to throw out the
hardware too.

merlin



Re: why SSD is slower than HDD SAS 15K ?

2018-01-15 Thread Merlin Moncure
On Mon, Jan 15, 2018 at 7:38 AM, Neto pr  wrote:
> Hello all,
> Someone help me analyze the two execution plans below (Explain ANALYZE
> used), is the  query 9 of TPC-H benchmark [1].
> I'm using two servers HP Intel Xeon 2.8GHz/4-core - Memory 8GB. O.S.
> Debian8, using EXT4 filesystem.
>
> Server 1
> - HDD SAS 15 Krpm - 320 GB (Location where O.S. Debian and Postgresql are
> installed).
>
> Server 2
> - Samsung Evo SSD 500 GB (Location where Postgresql is Installed)
> - HDD Sata 7500 Krpm - 1TB (Location where O.S Debian is installed)
>
> My DBMS parameters presents in postgresql.conf is default, but in SSD I have
> changed random_page_cost = 1.0.
>
> I do not understand, because running on an HDD SAS a query used half the
> time. I explain better, in HDD spends on average 12 minutes the query
> execution and on SSD spent 26 minutes.
> I think maybe the execution plan is using more write operations, and so the
> HDD SAS 15Krpm has been faster.
> I checked that the temporary tablespace pg_default is on the SSD in server
> 2, because when running show temp_tablespaces in psql returns empty, will be
> in the default directory, where I installed the DBMS in:
> /media/ssd500gb/opt/pgv101norssd/data.
>
> Anyway, I always thought that an SSD would be equal or faster, but in the
> case and four more cases we have here, it lost a lot for the HDDs.

Generally for reading data, yes, but you also changed the query plan
(random_page_cost influences which plan the planner picks).  To get to
the bottom of this, let's get SSD performance numbers for both plans
and HDD performance numbers for both plans.  You're trying to measure
device performance but are probably measuring the relative
efficiencies of the generated plans.
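
Something along these lines on each server (sketch only -- substitute
the real TPC-H query 9 text where I've used a stand-in statement):

-- settings that match the HDD server:
SET random_page_cost = 4.0;   -- the default
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM lineitem;
-- settings that match the SSD server:
SET random_page_cost = 1.0;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM lineitem;

Run the same pair on both machines; if the device really is the
difference, it will show up with the plan held constant.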

merlin



Re: Segmentation fault with core dump

2018-01-10 Thread Merlin Moncure
On Wed, Jan 10, 2018 at 11:08 AM, Tom Lane  wrote:
> Glauco Torres  writes:
>> (gdb) bt
>> #0  ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
>> pb=pb@entry=0x1be06d2d06644)
>> at bufmgr.c:4137
>> #1  0x00801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
>> b=0x1be06d2d06644 ,
>> c=0x2fc9dfbb1815c , cmp=0x6a4d20
>> )
>> at qsort.c:107
>> #2  0x00801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
>> n=, es=es@entry=20, cmp=cmp@entry=0x6a4d20
>> ) at qsort.c:157
>> #3  0x008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=,
>> n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
>> ) at qsort.c:203
>> #4  0x006a81cf in BufferSync (flags=flags@entry=128) at
>> bufmgr.c:1863
>
> Hm.  I'm not normally one to jump to the conclusion that something is a
> compiler bug, but it's hard to explain this stack trace any other way.
> The value of "n" passed to the inner invocation of pg_qsort should not
> have been more than 29914, but working from either the value of d or the
> value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
> looks a lot more like an address in the array than a proper value of n.
>
> I suppose this might be due to a corrupted copy of the postgres executable
> rather than an actual compiler bug.  Did you build it yourself?
>
> BTW, I notice that ckpt_buforder_comparator assumes it can't possibly
> see the same block ID twice in the array, which I think is an
> unsupportable assumption.  But I cannot see a way that that could lead
> to a crash in pg_qsort --- at worst it might cause a little inefficiency.

A simple

SELECT version();

...can give a lot of hints about who/what compiled the database if you
don't know.
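
The build and compiler details are right there in the output, e.g.
(illustrative only, details will vary):

SELECT version();
-- PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18) 6.3.0 20170516, 64-bit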

merlin



Re: help with generation_series in pg10

2018-01-10 Thread Merlin Moncure
On Mon, Jan 8, 2018 at 11:19 PM, Adrian Klaver
 wrote:
> On 01/08/2018 05:25 PM, Márcio A. Sepp wrote:
>>
>>
>> Hi,
>>
>>
>> In pg10 generation series doesn't work like in 9.5.
>> Ex. in 9.5:
>> z=# select generate_series(1, 10), generate_series(1, 5);
>>   generate_series | generate_series
>> -----------------+-----------------
>> 1 |   1
>> 2 |   2
>> 3 |   3
>> 4 |   4
>> 5 |   5
>> 6 |   1
>> 7 |   2
>> 8 |   3
>> 9 |   4
>>10 |   5
>> (10 registros)
>>
>>
>> so, in version 10 the same sql show different result set.
>
>
> The reason why:
>
> https://www.postgresql.org/docs/10/static/release-10.html#idm46428657945600
>
> Change the implementation of set-returning functions appearing in a query's
> SELECT list (Andres Freund)

An interesting (and good) side effect of this change is that this query:
select generate_series(1, nextval('s')), generate_series(1, nextval('s'));

...now terminates.
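
For reference, the same target-list query on 10 runs the SRFs in
lockstep and pads the shorter one with nulls instead of cycling it:

select generate_series(1, 10), generate_series(1, 5);
 generate_series | generate_series
-----------------+-----------------
               1 |               1
               2 |               2
               3 |               3
               4 |               4
               5 |               5
               6 |
               7 |
               8 |
               9 |
              10 |
(10 rows)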

merlin