Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Jose Luis Tallon

On 8/6/23 15:56, Robert Haas wrote:

Yeah, I've had similar thoughts. I'm not exactly sure what the
advantages of such a refactoring might be, but the current structure
feels pretty limiting. It works OK because we don't do anything in the
postmaster other than fork a new backend, but I'm not sure if that's
the best strategy. It means, for example, that if there's a ton of new
connection requests, we're spawning a ton of new processes, which
means that you can put a lot of load on a PostgreSQL instance even if
you can't authenticate. Maybe we'd be better off with a pool of
processes accepting connections; if authentication fails, that
connection goes back into the pool and tries again.


    This. It's limited by connection I/O, hence a perfect use for 
threads (minimizing per-connection overhead).


IMV, "session state" would be best stored/managed here. Would need a way 
to convey it efficiently, though.



If authentication
succeeds, either that process transitions to being a regular backend,
leaving the authentication pool, or perhaps hands off the connection
to a "real backend" at that point and loops around to accept() the
next request.


Nicely done by passing the FD around

But at this point, we'd just get a nice reimplementation of a threaded 
connection pool inside Postgres :\
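
For reference, handing an already-accepted client socket to another 
process is a well-known trick on Unix-like systems; a minimal sketch (not 
Postgres code; error handling omitted) could look like this:

#include <string.h>
#include <sys/socket.h>

/*
 * Hand an accepted client socket to another process over a Unix-domain
 * socket ("channel"), using SCM_RIGHTS ancillary data.
 */
static int
send_fd(int channel, int fd_to_pass)
{
    struct msghdr   msg = {0};
    struct iovec    iov;
    char            dummy = 'F';        /* must send at least one byte */
    struct cmsghdr *cmsg;
    union
    {
        char            buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr  align;          /* ensures proper alignment */
    }               u;

    iov.iov_base = &dummy;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    return (sendmsg(channel, &msg, 0) < 0) ? -1 : 0;
}

The receiving process does the mirror-image recvmsg() and then treats the 
socket as if it had accept()ed it itself.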



Whether that's a good idea in detail or not, the point remains that
having the postmaster handle this task is quite limiting. It forces us
to hand off the connection to a new process at the earliest possible
stage, so that the postmaster remains free to handle other duties.
Giving the responsibility to another process would let us make
decisions about where to perform the hand-off based on real
architectural thought rather than being forced to do it a certain way
because nothing else will work.


At least "tcop" surely feels like belonging in a separate process 


    J.L.





Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Jose Luis Tallon

On 7/6/23 23:37, Andres Freund wrote:

[snip]
I think we're starting to hit quite a few limits related to the process model,
particularly on bigger machines. The overhead of cross-process context
switches is inherently higher than switching between threads in the same
process - and my suspicion is that that overhead will continue to
increase. Once you have a significant number of connections we end up spending
a *lot* of time in TLB misses, and that's inherent to the process model,
because you can't share the TLB across processes.


IMHO, as one sysadmin who has previously played with Postgres on "quite 
large" machines, I'd propose what most would call a "hybrid model":


* Threads are a very valuable addition for the "frontend" of the server. 
Most would call this a built-in session-aware connection pooler :)


    Heikki's (and others') efforts towards separating connection state 
into discrete structs are clearly a prerequisite for this; 
implementation-wise, just toss the connState into a TLS (thread-local 
storage) variable and many problems simply vanish (see the sketch below).


    Postgres wouldn't be the first to adopt this approach, either...
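
To illustrate the point (hypothetical names, not an actual proposal):

#include <stdint.h>

/* All per-connection state gathered into one struct (fields made up) */
typedef struct ConnState
{
    int         sock;           /* client socket */
    uint32_t    database_id;    /* connected database */
    uint32_t    role_id;        /* authenticated role */
    /* ... GUC values, prepared statements, portal state, etc. ... */
} ConnState;

/*
 * One instance per worker thread (C11 thread-local storage): each thread
 * reaches "its" connection state without locking and without threading an
 * extra argument through every call chain.
 */
static _Thread_local ConnState *CurrentConnState = NULL;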

* For "heavyweight" queries, the scalability of "almost independent" 
processes w.r.t. NUMA is just _impossible to achieve_ (locality of 
reference!) with a pure threaded system. When CPU+mem-bound 
(bandwidth-wise), threads add nothing IMO.


Indeed a separate postmaster is very much needed in order to control the 
processes / guard overall integrity.



Hence, my humble suggestion is to consider a hybrid architecture which 
benefits from each model's strengths. I am quite convinced that the 
transition would be much safer and simpler (I do share most of Tom's and 
others' concerns...).


Other projects to draw inspiration from:

 * Postfix -- multi-process; Postfix's master guards processes and 
performs privileged operations; unprivileged "subsystems"; interesting 
IPC solutions
 * Apache -- MPMs provide flexibility and support for e.g. non-threaded 
workloads (PHP is the most popular; cf. the "prefork" multi-process MPM)
 * Nginx is actually multi-process (one per CPU) + event-based 
(multiplexing) ...
 * PowerDNS is internally threaded, but has a "guardian" process. Seems 
to be evolving towards a more hybrid model.



I would suggest something along the lines of:

* postmaster -- process supervision and (potentially privileged) 
operations; process coordination (i.e. descriptor passing); mostly as-is

* *frontend* -- connection/session handling; possibly even event-driven
* backends -- process heavyweight queries as independently as possible. 
Can spawn worker threads AND processes when needed
* *dispatcher* -- takes care of cached/lightweight queries (cached 
catalog / full snapshot visibility+processing)
* utility processes can be left "as is" mostly, except for making the 
heavy-sync ones (e.g. vacuum workers, stat workers) multi-threaded


For fixed-size buffers, i.e. pages / chunks, I'd say mmap'd (anonymous) 
shared memory isn't that bad... but I haven't read the actual code in years.


For message queues / invalidation messages, I guess that shmem-based 
sync is really a nuisance. My understanding is that Linux-specific 
mechanisms (i.e. eventfd) aren't quite considered... or are they?
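
For completeness, the kind of Linux-specific primitive I mean, as a bare 
sketch:

#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

int
main(void)
{
    int         efd = eventfd(0, EFD_CLOEXEC);  /* kernel-side 64-bit counter */
    uint64_t    n;

    /* producer (another thread, or a child after fork()): bump the counter */
    n = 1;
    (void) write(efd, &n, sizeof(n));

    /*
     * consumer: read() returns the accumulated count and resets it to zero,
     * blocking while the counter is zero -- a cheap wakeup/notification
     * channel usable from either threads or processes.
     */
    (void) read(efd, &n, sizeof(n));

    close(efd);
    return 0;
}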



The amount of duplicated code we have to deal with due to the process model
is quite substantial. We have local memory, statically allocated shared memory
and dynamically allocated shared memory variants for some things. And that's
just going to continue.


Code duplication is indeed a problem... but I wouldn't call "different 
approaches/solutions for very similar problems depending on 
context/requirements" a duplicate. I might well be wrong / lack detail, 
though... (again: I haven't read PG's code for some years already).



Just my two cents.


Thanks,

    J.L.

--
Parkinson's Law: Work expands to fill the time allotted to it.


Re: Detecting File Damage & Inconsistencies

2020-11-11 Thread Jose Luis Tallon

On 11/11/20 21:56, Simon Riggs wrote:

[snip]

REINDEX VERIFY
After the new index is created, but before we drop the old index:
Check whether the two indexes match:
* checks whether the previous index had pointers to row versions that
don't exist
* checks whether the heap has rows that were not in the old index
This approach piggybacks on existing operations. AccessShareLock is
held on both indexes before the old one is dropped.


FWIW, as long as it's optional (due to the added runtime), it'd be a 
welcome feature.


Maybe something along the lines of:

    REINDEX (verify yes) 



Other ideas are welcome.


I still have nightmares from a specific customer case w/ shared storage 
(using VxFS) shared between two postmaster instances ---supposedly they 
could never be active concurrently, though I'm not completely sure that it 
didn't actually happen--- and the corruption that we found there. I seem to 
remember that they even had scripts to remove the locking when switching 
over and back :S


I don't think Postgres can do much about this, but maybe someone can 
come up with a suitable countermeasure.



Just my .02€

Thanks,

    / J.L.






Re: Internal key management system

2020-06-18 Thread Jose Luis Tallon

On 18/6/20 19:41, Cary Huang wrote:

Hi all

Having read through the discussion, I have some comments and 
suggestions that I would like to share.


I think it is still quite early to even talk about an external key 
management system, even if it is running on the same host as PG. This 
is most likely achieved as an extension that can provide communication 
to an external key server, and it would be a separate project/discussion. 
I think the focus now is to finalize the internal KMS design, and 
we can discuss how to migrate internally managed keys to the 
external one when the time is right.


As long as there exists a clean interface, and the "default" (internal) 
backend is a provider of said functionality, it'll be fine.


Given that having different KMSs within a single instance (e.g. per 
database) is quite unlikely, I suggest just exposing hook-like 
function-pointer variables and being done with it. Requiring a preloaded 
library for this purpose doesn't seem too restrictive ---at least at 
this stage--- and can be very easily evolved in the future --- a 
super-simple API which receives a struct made of function pointers, plus 
another function to reset it to "internal defaults", and that's it.
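
To illustrate the shape of such an API (all names below are made up; this 
is just a sketch, nothing like it exists in PostgreSQL today):

#include <stdbool.h>

/*
 * Provider interface: filled in by a preloaded library, or left pointing
 * at the built-in (internal) KMS.
 */
typedef struct KmsRoutines
{
    bool    (*generate_key)(const char *key_id);
    bool    (*wrap_key)(const char *key_id,
                        const unsigned char *in, int inlen,
                        unsigned char *out, int *outlen);
    bool    (*unwrap_key)(const char *key_id,
                          const unsigned char *in, int inlen,
                          unsigned char *out, int *outlen);
    bool    (*rotate_key)(const char *key_id);
} KmsRoutines;

/* called from a shared_preload_libraries module, e.g. in its _PG_init() */
extern void kms_set_provider(const KmsRoutines *routines);

/* restore the internal, passphrase-derived-KEK implementation */
extern void kms_reset_provider(void);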




Key management system is generally built to manage the life cycle of 
cryptographic keys, so our KMS in my opinion needs to be built with 
key life cycle in mind such as:


* Key generation
* key protection
* key storage
* key rotation
* key rewrap
* key disable/enable
* key destroy


Add the support functions for your suggested "key information" 
functionality, and that's a very rough first draft of the API ...


KMS should not perform the above life cycle management by itself 
automatically or hardcoded, instead it should expose some interfaces 
to the end user or even a backend process like TDE to trigger the above.
The only key KMS should manage by itself is the KEK, which is derived 
from cluster passphrase value. This is fine in my opinion. This KEK 
should exist only within KMS to perform key protection (by wrapping) 
and key storage (save as file).


Asking for the "cluster password" is something better left optional / 
made easily overrideable ... or we risk thousands of clusters suddenly 
not working after a reboot :S



Just my .02€


Thanks,

    J.L.




Re: Raw device on PostgreSQL

2020-05-01 Thread Jose Luis Tallon

On 30/4/20 6:22, Thomas Munro wrote:

On Thu, Apr 30, 2020 at 12:26 PM Tomas Vondra
 wrote:

Yeah, I think the question is what are the expected benefits of using
raw devices. It might be an interesting exercise / experiment, but my
understanding is that most of the benefits can be achieved by using file
systems but with direct I/O and async I/O, which would allow us to
continue reusing the existing filesystem code with much less disruption
to our code base.

Agreed.

[snip] That's probably the main work
required to make this work, and might be a valuable thing to have
independently of whether you stick it on a raw device, a big data
file, NV RAM
   ^^  THIS, with NV DIMMs / PMEM (persistent memory) possibly 
becoming a hot topic in the not-too-distant future

or some other kind of storage system -- but it's a really
difficult project.


Indeed. But you might have already pointed out the *only* required 
feature for this to work: a "database" mapping each relfilenode ---which is 
actually an int, or rather a tuple (relfilenode, segment) where both 
components are currently 32-bit: that is, a 64-bit "object ID" of sorts--- 
to a *set of extents* ---yes, extents, not blocks: sequential I/O is still 
faster in all known storage/persistent (vs RAM) systems that the current 
I/O primitives would be able to write to.


Some conversion from "absolute" (within the "file") to "relative" 
(within the "tablespace") offsets would need to happen before delegating 
to the kernel... or even dereferencing a pointer to an mmap'd region !, 
but not much more, ISTM (but I'm far from an expert in this area).
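
To make the idea concrete, a toy version of that mapping could be (nothing 
like this exists today; the structures and block size are made up purely 
for illustration):

#include <stdint.h>

#define RAW_BLCKSZ 8192                 /* assume the usual 8 kB page */

/* one contiguous run of blocks belonging to a (relfilenode, segment) */
typedef struct RawExtent
{
    uint64_t    start_block;            /* first relation block it covers */
    uint64_t    nblocks;                /* extent length, in blocks */
    uint64_t    device_offset;          /* byte offset of the extent on disk */
} RawExtent;

/* translate a relation block number into an absolute device offset */
static int64_t
raw_block_offset(const RawExtent *extents, int nextents, uint64_t blkno)
{
    int         i;

    for (i = 0; i < nextents; i++)
    {
        if (blkno >= extents[i].start_block &&
            blkno < extents[i].start_block + extents[i].nblocks)
            return (int64_t) (extents[i].device_offset +
                              (blkno - extents[i].start_block) * RAW_BLCKSZ);
    }
    return -1;                          /* block not allocated on the device */
}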


Off the top of my head:

CREATE TABLESPACE tblspcname [other_options] LOCATION '/dev/nvme1n2' 
WITH (kind=raw, extent_min=4MB);


  or something similar to that approach might do it.

    Please note that I have purposefully specified "namespace 2" on an 
"enterprise" NVMe device, to show the possibility.


OR

  use some filesystem (e.g. XFS) with DAX [1] (mount -o dax) where 
available, along with something equivalent to WITH (kind=mmapped)



... though the locking we currently get "for free" from the kernel would 
need to be replaced by something else.



Indeed, it seems like an enormous amount of work, but it may well pay 
off. I can't fully assess the effort, though.



Just my .02€

[1] https://www.kernel.org/doc/Documentation/filesystems/dax.txt


Thanks,

    / J.L.






Re: where should I stick that backup?

2020-04-11 Thread Jose Luis Tallon

On 10/4/20 21:38, Andres Freund wrote:

Hi,

On 2020-04-10 12:20:01 -0400, Robert Haas wrote:

- We're only talking about writing a handful of tar files, and that's
in the context of a full-database backup, which is a much
heavier-weight operation than a query.
- There is not really any state that needs to be maintained across calls.
- The expected result is that a file gets written someplace, which is
not an in-memory data structure but something that gets written to a
place outside of PostgreSQL.

Wouldn't there be state like a S3/ssh/https/... connection?
...to try and save opening a new connection in the context of a 
(potentially) multi-TB backup? :S

And perhaps
a 'backup_id' in the backup metadata DB that'd one would want to update
at the end?


This is, indeed, material for external tools. Each caters to a 
particular set of end-user requirements.


We have many examples already, most of them even co-authored by this 
list's regulars... and IMHO none is suitable for ALL use-cases.



BUT I agree that providing better tools with Postgres itself, ready to 
use --- that is, uncomment the default "archive_command" and get going, as 
a very basic starting point --- would be a huge advancement in the right 
direction. More importantly (IMO): including the call to "pgfile" or an 
equivalent quite clearly signals to any inadvertent user that there is more 
to safely archiving WAL segments than just doing "cp -a" blindly and 
hoping that the tool magically does all the required steps [needed to ensure 
data safety in this case, rather than the usual behaviour]. It's 
probably more effective than just amending the existing comments to 
point users to a (new?) section within the documentation.



This comment comes from experience: I've lost count of how many times I 
have had to "fix" the default command for WAL archiving --- precisely 
because it had been copied blindly from the default without further 
thought about the implications should anything deviate from the expected 
behaviour during WAL archiving, only to be noticed at (attempted) recovery 
time :\



HTH.

Thanks,

    J.L.






Re: where should I stick that backup?

2020-04-11 Thread Jose Luis Tallon

On 10/4/20 15:49, Robert Haas wrote:

On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian  wrote:

Good point, but if there are multiple APIs, it makes shell script
flexibility even more useful.

[snip]

One thing I do think would be realistic would be to invent a set of
tools that are perform certain local filesystem operations in a
"hardened" way.

+10

  Maybe a single tool with subcommands and options. So
you could say, e.g. 'pgfile cp SOURCE TARGET' and it would create a
temporary file in the target directory, write the contents of the
source into that file, fsync the file, rename it into place, and do
more fsyncs to make sure it's all durable in case of a crash. You
could have a variant of this that instead of using the temporary file
and rename in place approach, does the thing where you open the target
file with O_CREAT|O_EXCL, writes the bytes, and then closes and fsyncs
it.
Behaviour might be decided in the same way as the default for 
'wal_sync_method' gets chosen, as the most appropriate for a particular 
system.

And you could have other things too, like 'pgfile mkdir DIR' to
create a directory and fsync it for durability. A toolset like this
would probably help people write better archive commands


Definitely, "mkdir" and "create-exclusive" (along with cp) would be a 
great addition and simplify the kind of tasks properly (i.e. with 
risking data loss every time)

[excerpted]

pg_basebackup -Ft --pipe-output 'bzip | pgfile create-exclusive - %f.bz2'

[]

pg_basebackup -Ft --pipe-output 'bzip | gpg -e | ssh someuser@somehost
pgfile create-exclusive - /backups/tuesday/%f.bz2'

Yep. Would also fit the case for non-synchronous NFS mounts for backups...

It is of course not impossible to teach pg_basebackup to do all of
that stuff internally, but I have a really difficult time imagining us
ever getting it done. There are just too many possibilities, and new
ones arise all the time.


Indeed. The beauty of Unix-like OSs is precisely this.


A 'pgfile' utility wouldn't help at all for people who are storing to
S3 or whatever. They could use 'aws s3' as a target for --pipe-output,
[snip]
(Incidentally, pg_basebackup already has an option to output the
entire backup as a tarfile on standard output, and a user can already
pipe that into any tool they like. However, it doesn't work with
tablespaces. So you could think of this proposal as extending the
existing functionality to cover that case.)


Been there already :S  Having pg_basebackup output multiple tarballs 
(one per tablespace), ideally separated via some delimiter so that 
splitting can be done trivially on the receiving end, would help.


...but that's probably matter for another thread.


Thanks,

    / J.L.






Just for fun: Postgres 20?

2020-02-09 Thread Jose Luis Tallon

Hackers,

    Musing on some other date-related things, I stumbled upon the thought 
that naming the upcoming release PostgreSQL 20 might be preferable to 
the current/expected "PostgreSQL 13".



Cons:

 * Discontinuity in versions. 12 -> 20.  Now that we have the precedent 
of 9.6 -> 10 (for very good reasons, I think), this is probably a minor 
issue... Mostly the inconvenience of having to add tests for the skipped 
versions, I believe.


    Any others that I don't know about?

Pros:

 * Simplified supportability assessment: PostgreSQL 20, released in 
2020, would be supported until the release of PostgreSQL 25 (late 2025 
if the release cadence is kept as it is today). Simple and straightforward.


 * We avoid users skipping the release altogether due to superstition 
or analogous reasons ---this might be a major issue in some cultures---. 
Postgres 13 would certainly be skipped in production in some 
environments that I know about o_0



Nothing really important, I guess. I think of it as a thought experiment 
mostly, but it might spark some ultimately useful debate.



Thanks,

    / J.L.






Re: color by default

2019-12-31 Thread Jose Luis Tallon

On 31/12/19 14:35, Tom Lane wrote:

Peter Eisentraut  writes:

With the attached patch, I propose to enable the colored output by
default in PG13.

FWIW, I shall be setting NO_COLOR permanently if this gets committed.
I wonder how many people there are who actually *like* colored output?
I find it to be invariably less readable than plain B&W text.

I may well be in the minority, but I think some kind of straw poll
might be advisable, rather than doing this just because.


+1

...and Happy New Year!


    / J.L.






Re: Proposal: Global Index

2019-12-19 Thread Jose Luis Tallon

On 19/12/19 4:03, Bruce Momjian wrote:

On Mon, Nov 25, 2019 at 03:44:39PM -0800, Jeremy Schneider wrote:

On 11/25/19 15:05, Jeremy Schneider wrote:

... the cost of doing the individual index lookups across 180
partitions (and 180 indexes) was very high, so they stored max and min
txn id per partition and would generate a query with all the dates that
a txn id could have been in so that only a small number of partition
indexes would be accessed.

.. If we are looking for higher concurrency, we can usually
add a hack/workaround that filters on a partition key to provide “pretty
good” pruning.  The net result is that you get 2-3x the IO due to the
lack of global index (same workaround as first story above).

Is that basically like a global BRIN index with granularity at the
partition level?

Exactly!  :-)


Actually, one "kind of" BRIN index *per partitioned table* mapping (key 
range) -> (partition oid)... and so concurrency doesn't need to be very 
affected.


(we don't need to do things just like other RDBMS do, ya know... ;)


IIRC, this precise approach was suggested around 2016, when initially 
discussing the "declarative partitioning" feature that originated Postgres' 
current partitioning scheme, as a way to optimize partition pruning.



Just my .02€

    / J.L.






Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-20 Thread Jose Luis Tallon

On 20/9/19 4:06, Michael Paquier wrote:

On Thu, Sep 19, 2019 at 05:40:41PM +0300, Alexey Kondratov wrote:

On 19.09.2019 16:21, Robert Haas wrote:

So, earlier in this thread, I suggested making this part of ALTER
TABLE, and several people seemed to like that idea. Did we have a
reason for dropping that approach?

Personally, I don't find this idea very attractive as ALTER TABLE is
already complicated enough with all the subcommands we already support
in the command, all the logic we need to maintain to make combinations
of those subcommands in a minimum number of steps, and also the number
of bugs we have seen because of the amount of complication present.


Yes, but please keep the other options: as it is, CLUSTER, VACUUM FULL 
and REINDEX already rewrite the table in full; being able to write the 
result to a different tablespace than the one the original object was 
stored in opens up a whole world of very interesting possibilities, 
including a quick way out of a "so little disk space available that vacuum 
won't work properly" situation --- which I'm sure MANY users will 
appreciate, including me.



If we add this option to REINDEX, then for 'ALTER TABLE tb_name action1,
REINDEX SET TABLESPACE tbsp_name, action3' action2 will be just a direct
alias to 'REINDEX TABLE tb_name SET TABLESPACE tbsp_name'. So it seems
practical to do this for REINDEX first.

The only concern I have against adding REINDEX to ALTER TABLE in this
context is that it will allow users to write such a chimera:

ALTER TABLE tb_name REINDEX SET TABLESPACE tbsp_name, SET TABLESPACE
tbsp_name;

when they want to move both table and all the indexes. Because simple
ALTER TABLE tb_name REINDEX, SET TABLESPACE tbsp_name;
looks ambiguous. Should it change tablespace of table, indexes or both?


Indeed.

IMHO, that form of the command should not allow that much flexibility... 
even on "principle of least surprise" grounds :S


That is, I'd restrict the ability to change the (output) tablespace to the 
"direct" forms --- REINDEX name, VACUUM (FULL) name, CLUSTER name --- 
whereas ALTER TABLE|INDEX ... SET TABLESPACE would continue to work as it 
does now.


Now that I come to think of it, maybe saying "output" or "move to" 
rather than "set tablespace" would make more sense for this variation of 
the commands (clearer, less prone to confusion)?



Tricky question, but we don't change the tablespace of indexes when
using an ALTER TABLE, so I would say no on compatibility grounds.
ALTER TABLE has never touched the tablespace of indexes, and I don't
think that we should begin to do so.


Indeed.


I might be missing something, but is there any reason not to *require* an 
explicit transaction for the above multi-action commands? I mean, have 
it be:


BEGIN;

ALTER TABLE tb_name SET TABLESPACE tbsp_name;    -- moves the table, 
but possibly NOT the indexes?


ALTER TABLE tb_name REINDEX [OUTPUT TABLESPACE tbsp_name];    -- 
REINDEX, placing the resulting index on tbsp_name instead of the 
original one


COMMIT;

... and have the parser/planner combine the steps if it'd make sense (it 
probably wouldn't in this example)?



Just my .02€


Thanks,

    / J.L.






Re: [PATCH] Implement uuid_version()

2019-07-05 Thread Jose Luis Tallon

On 5/7/19 11:00, Peter Eisentraut wrote:

On 2019-07-05 00:08, Jose Luis Tallon wrote:

On 4/7/19 17:30, Alvaro Herrera wrote:

On 2019-Jul-04, Tom Lane wrote:


A possible option 3 is to keep the function in pgcrypto but change
its C code to call the core code.

Updated patch with this change included.

Great, thanks!




Re: [PATCH] Implement uuid_version()

2019-07-04 Thread Jose Luis Tallon

On 4/7/19 17:30, Alvaro Herrera wrote:

On 2019-Jul-04, Tom Lane wrote:


A possible option 3 is to keep the function in pgcrypto but change
its C code to call the core code.

This seems most reasonable, and is what José Luis proposed upthread.  We
don't have to bump the pgcrypto extension version, as nothing changes
for pgcrypto externally.


Yes, indeed.

...which means I get another todo item if nobody else volunteers :)


Thanks!

    / J.L.






Re: [PATCH] Implement uuid_version()

2019-07-02 Thread Jose Luis Tallon

On 2/7/19 9:26, Peter Eisentraut wrote:

On 2019-06-30 14:50, Fabien COELHO wrote:

I'm wondering whether pg_random_uuid() should be taken out of pgcrypto if
it is available in core?

That would probably require an extension version update dance in
pgcrypto.  I'm not sure if it's worth that.  Thoughts?


What I have devised for my upcoming patch series is to use a 
compatibility "shim" that calls the corresponding core code when the 
expected usage does not match the new names/signatures...
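
For the function being discussed in this thread, for instance, the 
pgcrypto side of such a shim could be as thin as this (a sketch, assuming 
the in-core gen_random_uuid() from Peter's patch is in place):

#include "postgres.h"
#include "fmgr.h"

/* prototype of the in-core function (normally via the generated fmgr headers) */
extern Datum gen_random_uuid(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(pg_random_uuid);

Datum
pg_random_uuid(PG_FUNCTION_ARGS)
{
    /* keep the old SQL-visible name, but forward to the core implementation */
    return gen_random_uuid(fcinfo);
}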


This way we wouldn't even need to bump pgcrypto's version (full backwards 
compatibility -> no bump needed). Another matter is whether this should 
raise some "deprecation warning" or the like; I don't think we have any 
such mechanisms available yet.



FWIW, I'm implementing an "alias" functionality for extensions, too, in 
order to achieve transparent (for the user) extension renames.


HTH


Thanks,

    / J.L.






Re: [PATCH] Implement uuid_version()

2019-06-11 Thread Jose Luis Tallon

On 11/6/19 13:11, Peter Eisentraut wrote:

On 2019-06-11 12:31, Jose Luis Tallon wrote:

I wonder whether re-implementing some more of the extension's 
functionality (i.e. UUID v5) in terms of pgcrypto and in-core facilities 
makes sense / would actually be accepted into core?

Those other versions are significantly more complicated to implement,
and I don't think many people really need them, so I'm not currently
interested.


For the record: I was volunteering to implement that functionality. I'd 
only need some committer to take a look and erm... commit it :)


Thank you, in any case; The patch you have provided will be very useful.


    / J.L.







Re: [PATCH] Implement uuid_version()

2019-06-11 Thread Jose Luis Tallon

On 11/6/19 10:49, Peter Eisentraut wrote:

On 2019-04-09 08:04, Peter Eisentraut wrote:

On 2019-04-08 23:06, Andres Freund wrote:

The randomness based UUID generators don't really have dependencies, now
that we have a dependency on strong randomness.  I kinda think the
dependency argument actually works *against* uuid-ossp - precisely
because of its dependencies (which also vary by OS) it's not a proper
replacement for a type of facility a very sizable fraction of our users
need.

Yeah, I think implementing a v4 generator in core would be trivial and
address almost everyone's requirements.

Here is a proposed patch for this.  I did a fair bit of looking around
in other systems for a naming pattern but didn't find anything
consistent.  So I ended up just taking the function name and code from
pgcrypto.

As you can see, the code is trivial and has no external dependencies.  I
think this would significantly upgrade the usability of the uuid type.


Yes, indeed. Thanks!

This is definitely a good step towards removing external dependencies 
for general usage of UUIDs. As recently commented, enabling extensions 
at some MSPs/Cloud providers can be a bit challenging.



I wonder whether re-implementing some more of the extension's 
functionality (i.e. UUID v5) in terms of pgcrypto and in-core facilities 
makes sense / would actually be accepted into core?


I assume that Peter would like to commit that potential patch series?


Thanks,

    / J.L.






Re: [PATCH] Implement uuid_version()

2019-04-08 Thread Jose Luis Tallon

On 8/4/19 17:06, Robert Haas wrote:

On Sun, Apr 7, 2019 at 10:15 AM David Fetter  wrote:

I see some.

UUIDs turn out to be super useful in distributed systems to give good
guarantees of uniqueness without coordinating with a particular node.
Such systems have become a good bit more common since the most recent
time this was discussed.

That's not really a compelling reason, though, because anybody who
needs UUIDs can always install the extension.  And on the other hand,
if we moved UUID support into core, then we'd be adding a hard compile
dependency on one of the UUID facilities, which might annoy some
developers.  We could possibly work around that by implementing our
own UUID facilities in core,


Yup. My proposal basically revolves around implementing v3 / v4 / v5 
(the most used/useful versions for the aforementioned use cases) in core, 
using the already existing MD5 and SHA-1 facilities (which the current 
uuid-ossp extension already links against as a fallback in certain 
configurations)... and leaving the remaining functionality in the 
extension, just as it is now.
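
The v5 construction itself is tiny once a SHA-1 primitive is at hand; 
roughly (a sketch only, using OpenSSL's SHA-1 here purely for illustration 
--- an in-core version would use Postgres' own hash code instead):

#include <stdint.h>
#include <string.h>
#include <openssl/sha.h>

/* Build a version-5 (name-based, SHA-1) UUID per RFC 4122 into out[16]. */
static void
uuid_v5(const uint8_t ns[16], const char *name, uint8_t out[16])
{
    SHA_CTX     ctx;
    uint8_t     digest[SHA_DIGEST_LENGTH];      /* 20 bytes */

    SHA1_Init(&ctx);
    SHA1_Update(&ctx, ns, 16);                  /* namespace UUID, network order */
    SHA1_Update(&ctx, name, strlen(name));      /* the name being hashed */
    SHA1_Final(digest, &ctx);

    memcpy(out, digest, 16);                    /* truncate the hash to 128 bits */
    out[6] = (out[6] & 0x0F) | 0x50;            /* version = 5 */
    out[8] = (out[8] & 0x3F) | 0x80;            /* variant = RFC 4122 */
}

v3 is the same construction with MD5 and a version nibble of 3; v4 is just 
16 random bytes with the same two bit fix-ups.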


This way, we guarantee backwards compatibility: Those already using the 
extension wouldn't have to change anything, and new users won't need to 
load any extension to benefit from this (base) functionality.



  but I'm not volunteering to do the work,

Of course, I'd take care of that :)

and I'm not sure that the work has enough benefit to justify the
labor.


With this "encouragement", I'll write the code and submit the patches to 
a future commitfest. Then the normal procedure will take care of judging 
whether it's worth being included or not :$



My biggest gripe about uuid-ossp is that the name is stupid.  I wish
we could see our way clear to renaming that extension to just 'uuid',
because as J.L. says, virtually nobody's actually compiling against
the OSSP library any more.  The trick there is how to do that without
annoying existing users.  Maybe we could leave behind an "upgrade"
script for the uuid-ossp extension that does CREATE EXTENSION uuid,
then alters all objects owned by the current extension to be owned by
the new extension, and maybe even drops itself.


I believe my proposal above mostly solves the issue: new users with 
"standard" needs won't need to load any extension (better than the current 
situation), and old users will get the same functionality as they have 
today (only now part in core and part in the extension)...


 ...and a relatively simple "alias" (think Linux kernel modules) 
facility would make the transition fully transparent: rename extension 
to "uuid" ---possibly dropping the dependency on uuid-ossp in the 
process--- and expose an "uuid-ossp" alias for backwards compatibility.



Thanks,

    J.L.






Re: [PATCH] Implement uuid_version()

2019-04-07 Thread Jose Luis Tallon

On 6/4/19 18:35, Tom Lane wrote:

Jose Luis Tallon  writes:

      While working on an application, the need arose to be able to
efficiently differentiate v4/v5 UUIDs (for use in partial indexes, among
others)
... so please find attached a trivial patch which adds the
functionality.

No particular objection...


      I'm not sure whether this actually would justify a version bump for
the OSSP-UUID extension

Yes.  Basically, once we've shipped a given version of an extension's
SQL script, that version is *frozen*.  Anything at all that you want
to do to it has to be done in an extension update script, because
otherwise there's no clean migration path for users.


Got it, and done. Please find attached a v2 patch with the upgrade 
script included.



Thank you for taking a look. Your time is much appreciated :)


    J.L.


diff --git a/contrib/uuid-ossp/Makefile b/contrib/uuid-ossp/Makefile
index c52c583d64..7f29bec535 100644
--- a/contrib/uuid-ossp/Makefile
+++ b/contrib/uuid-ossp/Makefile
@@ -4,7 +4,7 @@ MODULE_big = uuid-ossp
 OBJS = uuid-ossp.o $(UUID_EXTRA_OBJS) $(WIN32RES)
 
 EXTENSION = uuid-ossp
-DATA = uuid-ossp--1.1.sql uuid-ossp--1.0--1.1.sql uuid-ossp--unpackaged--1.0.sql
+DATA = uuid-ossp--1.1.sql uuid-ossp--1.0--1.1.sql uuid-ossp--1.1--1.2.sql uuid-ossp--unpackaged--1.0.sql
 PGFILEDESC = "uuid-ossp - UUID generation"
 
 REGRESS = uuid_ossp
diff --git a/contrib/uuid-ossp/uuid-ossp--1.1--1.2.sql b/contrib/uuid-ossp/uuid-ossp--1.1--1.2.sql
new file mode 100644
index 00..8e47ca60a1
--- /dev/null
+++ b/contrib/uuid-ossp/uuid-ossp--1.1--1.2.sql
@@ -0,0 +1,9 @@
+/* contrib/uuid-ossp/uuid-ossp--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION uuid-ossp UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION uuid_version(namespace uuid)
+RETURNS int4
+AS 'MODULE_PATHNAME', 'uuid_version'
+IMMUTABLE STRICT LANGUAGE C PARALLEL SAFE;
diff --git a/contrib/uuid-ossp/uuid-ossp.c b/contrib/uuid-ossp/uuid-ossp.c
index f5ae915f24..b4997281c0 100644
--- a/contrib/uuid-ossp/uuid-ossp.c
+++ b/contrib/uuid-ossp/uuid-ossp.c
@@ -122,6 +122,7 @@ PG_FUNCTION_INFO_V1(uuid_generate_v1mc);
 PG_FUNCTION_INFO_V1(uuid_generate_v3);
 PG_FUNCTION_INFO_V1(uuid_generate_v4);
 PG_FUNCTION_INFO_V1(uuid_generate_v5);
+PG_FUNCTION_INFO_V1(uuid_version);
 
 #ifdef HAVE_UUID_OSSP
 
@@ -531,3 +532,16 @@ uuid_generate_v5(PG_FUNCTION_ARGS)
   VARDATA_ANY(name), VARSIZE_ANY_EXHDR(name));
 #endif
 }
+
+Datum
+uuid_version(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *arg = PG_GETARG_UUID_P(0);
+	dce_uuid_t uu;
+
+	/* function is marked STRICT, so arg can't be NULL */
+	memcpy(&uu, arg, UUID_LEN);
+	UUID_TO_NETWORK(uu);
+
+	PG_RETURN_INT32(uu.time_hi_and_version >> 12);
+}
diff --git a/contrib/uuid-ossp/uuid-ossp.control b/contrib/uuid-ossp/uuid-ossp.control
index 657476c182..3479b06eff 100644
--- a/contrib/uuid-ossp/uuid-ossp.control
+++ b/contrib/uuid-ossp/uuid-ossp.control
@@ -1,5 +1,5 @@
 # uuid-ossp extension
 comment = 'generate universally unique identifiers (UUIDs)'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/uuid-ossp'
 relocatable = true
diff --git a/doc/src/sgml/uuid-ossp.sgml b/doc/src/sgml/uuid-ossp.sgml
index b3b816c372..43dd565886 100644
--- a/doc/src/sgml/uuid-ossp.sgml
+++ b/doc/src/sgml/uuid-ossp.sgml
@@ -156,6 +156,22 @@ SELECT uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org');
 

   
+
+  
+   Functions Returning UUID attributes
+   
+
+ 
+  uuid_version()
+  
+   
+Returns the UUID version (1,3,4,5). Assumes variant 1 (RFC4122).
+   
+  
+ 
+
+   
+  
  
 
  


[PATCH] Implement uuid_version()

2019-04-06 Thread Jose Luis Tallon

Hackers,

    While working on an application, the need arose to be able to 
efficiently differentiate v4/v5 UUIDs (for use in partial indexes, among
others)


... so please find attached a trivial patch which adds the 
functionality. The "uuid_version_bits()" function (from the test suite?) 
seems quite a bit hackish, apart from being inefficient :(



    I'm not sure whether this actually would justify a version bump for 
the OSSP-UUID extension ---a misnomer, BTW, since at least in all the 
systems I have access to, the extension is actually linked against 
libuuid from e2fsutils, but I digress --- or not, given that it doesn't 
change exposed functionality.



    Another matter, which I'd like to propose in a later thread, is 
whether it'd be interesting to include the main UUID functionality 
directly in core, with the remaining functions in ossp-uuid (just like 
it is now, for backwards compatibility): Most current patterns for 
distributed/sharded databases are based on using UUIDs for many PKs.



Thanks,

    J.L.


diff --git a/contrib/uuid-ossp/uuid-ossp--1.1.sql b/contrib/uuid-ossp/uuid-ossp--1.1.sql
index c9cefd7360..a2eb217fd8 100644
--- a/contrib/uuid-ossp/uuid-ossp--1.1.sql
+++ b/contrib/uuid-ossp/uuid-ossp--1.1.sql
@@ -52,3 +52,8 @@ CREATE FUNCTION uuid_generate_v5(namespace uuid, name text)
 RETURNS uuid
 AS 'MODULE_PATHNAME', 'uuid_generate_v5'
 IMMUTABLE STRICT LANGUAGE C PARALLEL SAFE;
+
+CREATE FUNCTION uuid_version(namespace uuid)
+RETURNS int4
+AS 'MODULE_PATHNAME', 'uuid_version'
+IMMUTABLE STRICT LANGUAGE C PARALLEL SAFE;
diff --git a/contrib/uuid-ossp/uuid-ossp.c b/contrib/uuid-ossp/uuid-ossp.c
index f5ae915f24..b4997281c0 100644
--- a/contrib/uuid-ossp/uuid-ossp.c
+++ b/contrib/uuid-ossp/uuid-ossp.c
@@ -122,6 +122,7 @@ PG_FUNCTION_INFO_V1(uuid_generate_v1mc);
 PG_FUNCTION_INFO_V1(uuid_generate_v3);
 PG_FUNCTION_INFO_V1(uuid_generate_v4);
 PG_FUNCTION_INFO_V1(uuid_generate_v5);
+PG_FUNCTION_INFO_V1(uuid_version);
 
 #ifdef HAVE_UUID_OSSP
 
@@ -531,3 +532,16 @@ uuid_generate_v5(PG_FUNCTION_ARGS)
   VARDATA_ANY(name), VARSIZE_ANY_EXHDR(name));
 #endif
 }
+
+Datum
+uuid_version(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *arg = PG_GETARG_UUID_P(0);
+	dce_uuid_t uu;
+
+	/* function is marked STRICT, so arg can't be NULL */
+	memcpy(&uu, arg, UUID_LEN);
+	UUID_TO_NETWORK(uu);
+
+	PG_RETURN_INT32(uu.time_hi_and_version >> 12);
+}
diff --git a/doc/src/sgml/uuid-ossp.sgml b/doc/src/sgml/uuid-ossp.sgml
index b3b816c372..43dd565886 100644
--- a/doc/src/sgml/uuid-ossp.sgml
+++ b/doc/src/sgml/uuid-ossp.sgml
@@ -156,6 +156,22 @@ SELECT uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org');
 

   
+
+  
+   Functions Returning UUID attributes
+   
+
+ 
+  uuid_version()
+  
+   
+Returns the UUID version (1,3,4,5). Assumes variant 1 (RFC4122).
+   
+  
+ 
+
+   
+  
  
 
  


Re: phase out ossp-uuid?

2019-02-08 Thread Jose Luis Tallon

On 7/2/19 23:03, Andres Freund wrote:

Hi,

On 2019-02-07 09:03:06 +, Dave Page wrote:

On Thu, Feb 7, 2019 at 8:26 AM Peter Eisentraut
 wrote:

I suggest we declare it deprecated in PG12 and remove it altogether in PG13.

Much as I'd like to get rid of it, we don't have an alternative for
Windows do we? The docs for 11 imply it's required for UUID support
(though the wording isn't exactly clear, saying it's required for
UUID-OSSP support!):
https://www.postgresql.org/docs/11/install-windows-full.html#id-1.6.4.8.8

Given that we've now integrated strong crypto support, and are relying
on it for security (scram), perhaps we should just add a core uuidv4?


This. But just make it "uuid" and so both parties will get their own:

On 7/2/19 11:37, I wrote:

AFAIR, Windows has its own DCE/v4 UUID generation support; if needed, 
UUID v5 can be generated using built-in crypto hashes. v1 UUIDs are the 
ones (potentially) more cumbersome to generate, plus they are the least 
useful IMHO.


- UUIDv3    <- with built-in crypto hashes

- UUIDv4    <- with built-in crypto random

- UUIDv5    <- with built-in crypto hashes

Only v1 remains. For those use cases one could use ossp-uuid, so what 
about:


* Rename the extension's type to ossp_uuid or the like.

* Have uuid in-core (we already have the required platform-independent 
crypto, so I wouldn't expect portability issues)


I reckon that most use cases should be either UUID v4 or UUID v5 these 
days. For those using v1 UUIDs, either implement v1 in core or provide 
some fallback/migration path; This would only affect the 
"uuid_generate_v1()" and "uuid_generate_v1mc()" calls AFAICS.



Moreover, the documentation (as far back as 9.4) already states:

"If you only need randomly-generated (version 4) UUIDs, consider using 
the |gen_random_uuid()| function from the pgcrypto 
 module instead."


So just importing the datatype into core would go a long way towards 
removing the dependency for most users.



Thanks,

    / J.L.




Re: phase out ossp-uuid?

2019-02-07 Thread Jose Luis Tallon

On 7/2/19 10:03, Dave Page wrote:

On Thu, Feb 7, 2019 at 8:26 AM Peter Eisentraut
 wrote:

I'm wondering whether we should phase out the use of the ossp-uuid
library? (not the uuid-ossp extension)


Hmm... FWIW, just get it in core altogether? Seems small and useful 
enough... if it carries the opfamily with it, UUID would become really 
convenient to use for distributed applications.


(I've been using that functionality for a while already)


   We have had preferred
alternatives for a while now, so it shouldn't be necessary to use this
anymore, except perhaps in some marginal circumstances?  As we know,
ossp-uuid isn't maintained anymore, and a few weeks ago the website was
gone altogether, but it seems to be back now.

I suggest we declare it deprecated in PG12 and remove it altogether in PG13.

Much as I'd like to get rid of it, we don't have an alternative for
Windows do we? The docs for 11 imply it's required for UUID support
(though the wording isn't exactly clear, saying it's required for
UUID-OSSP support!):
https://www.postgresql.org/docs/11/install-windows-full.html#id-1.6.4.8.8


AFAIR, Windows has its own DCE/v4 UUID generation support. UUID v5 can 
be generated using built-in crypto hashes. v1 are the ones (potentially) 
more cumbersome to generate plus they are the least useful IMHO.



Just my .02€

Thanks,

    / J.L.





Re: Using POPCNT and other advanced bit manipulation instructions

2018-12-20 Thread Jose Luis Tallon

On 20/12/18 6:53, David Rowley wrote:

Back in 2016 [1] there was some discussion about using the POPCNT
instruction to improve the performance of counting the number of bits
set in a word.  Improving this helps various cases, such as
bms_num_members and also things like counting the allvisible and
frozen pages in the visibility map.

[snip]

I've put together a very rough patch to implement using POPCNT and the
leading and trailing 0-bit instructions to improve the performance of
bms_next_member() and bms_prev_member().  The correct function should
be determined on the first call to each function by way of setting a
function pointer variable to the most suitable supported
implementation.   I've not yet gone through and removed all the
number_of_ones[] arrays to replace with a pg_popcount*() call.


IMVHO: Please do not disregard potential optimization by the compiler 
around those calls... o_0  That might explain the reduced performance 
improvement observed.


Not that I can see any obvious alternative to your implementation right 
away ...
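
For reference, the function-pointer dispatch described above could look 
roughly like this (GCC/x86-only sketch, made-up names); note that every 
call then goes through an indirect call, which is part of what the 
compiler cannot optimize across:

#include <stdint.h>

static int pg_popcount64_choose(uint64_t word);
static int pg_popcount64_portable(uint64_t word);

/* resolved on first use; later calls jump straight to the chosen function */
static int (*pg_popcount64)(uint64_t) = pg_popcount64_choose;

__attribute__((target("popcnt")))
static int
pg_popcount64_hw(uint64_t word)
{
    return __builtin_popcountll(word);   /* compiles to a POPCNT instruction */
}

static int
pg_popcount64_portable(uint64_t word)
{
    int         n = 0;

    while (word)
    {
        word &= word - 1;                /* clear the lowest set bit */
        n++;
    }
    return n;
}

static int
pg_popcount64_choose(uint64_t word)
{
    pg_popcount64 = __builtin_cpu_supports("popcnt") ?
        pg_popcount64_hw : pg_popcount64_portable;
    return pg_popcount64(word);
}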



That
seems to have mostly been done in Thomas' patch [3], part of which
I've used for the visibilitymap.c code changes.  If this patch proves
to be possible, then I'll look at including the other changes Thomas
made in his patch too.

What I'm really looking for by posting now are reasons why we can't do
this. I'm also interested in getting some testing done on older
machines, particularly machines with processors that are from before
2007, both AMD and Intel.


I can offer a 2005-vintage Opteron 2216 rev3 (bought late 2007) to test 
on. Feel free to toss me some test code.


cpuinfo flags:    fpu de tsc msr pae mce cx8 apic mca cmov pat clflush 
mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow 
rep_good nopl extd_apicid eagerfpu pni cx16 hypervisor lahf_lm 
cmp_legacy 3dnowprefetch vmmcall



  2007-2008 seems to be around the time both
AMD and Intel added support for POPCNT and LZCNT, going by [4].


Thanks





Re: Thinking about EXPLAIN ALTER TABLE

2018-12-11 Thread Jose Luis Tallon

We were just busy shooting down a different suggestion of
behavior-changing GUCs.  A GUC that turns all ALTERs into no-ops
sure seems like a foot-gun to me.

Yeah, I like EXPLAIN better.

+1 for EXPLAIN


IMVHO, and for "symmetry" with existing mechanisms:

* EXPLAIN ALTER TABLE    ==> "DDL dry run", but tell me what would be 
done (similar to what EXPLAIN SELECT does)


* EXPLAIN PERFORM ALTER TABLE    (EXPLAIN EXEC?)    would explain + do

    ...and bonus points for explaining each step just before it is 
performed. This way, it'd be easy for users to verify that a particular 
step (e.g. a table rewrite) is the one taking æons to run or hammering the 
storage.


    Of course, regular "ALTER TABLE" stays as it is.


Just my .02€ :)

I'm not familiar with this part of the code and clearly can't devote 
enough time to do it in the near future, but count on me to test it if 
accepted.



Thanks,

    Jose