Re: Question regarding how databases support atomicity

2024-05-07 Thread Siddharth Jain
Thanks All for the kind responses. I understand how MVCC takes care of
atomicity for updates to rows. I was developing a project where, let's say,
data for each table is stored in its own folder together with metadata (we
are not talking about Postgres now). So if I have two tables A and B, I have
a folder structure like:
A
\_ metadata.json
B
\_ metadata.json
Now if I want to rename a table, I need to move the folder and also update
the metadata accordingly. These are two separate operations but they need to
be done atomically - all or none. In this case it is possible that we
succeed in renaming the folder but fail to update the metadata for whatever
reason. Then, if we try to undo the folder rename, that fails too, again for
whatever reason. How does one deal with such scenarios? Are there no such
scenarios in Postgres?


On Fri, May 3, 2024 at 8:29 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Friday, May 3, 2024, Siddharth Jain  wrote:
> >> The way I understand this is that if there is a failure in between, we
> >> start undoing and reverting the previous operations one by one.
>
> > Not in PostgreSQL.  All work performed is considered provisional until a
> > commit succeeds.  At which point all provisional work, which had been
> > tagged with the same transaction identifier, becomes reality to the rest
> of
> > the system, by virtue of marking the transaction live.
>
> Right.  We don't use UNDO; instead, we use multiple versions of
> database rows (MVCC).  A transaction doesn't modify the contents
> of existing rows, but just marks them as provisionally outdated, and
> then inserts new row versions that are marked provisionally inserted.
> Other transactions ignore the outdate markings and the uncommitted new
> rows, until the transaction commits, at which time the new versions
> become live and the old ones become dead.  If the transaction never
> does commit -- either through ROLLBACK or a crash -- then the old row
> versions remain live and the new ones are dead.  In either case, we
> don't have a consistency or correctness problem, but we do have dead
> rows that must eventually get vacuumed away to prevent indefinite
> storage bloat.  That can be done by background housekeeping processes
> though (a/k/a autovacuum).
>
> I believe Oracle, for one, actually does use UNDO.  I don't know
> what they do about failure-to-UNDO.
>
> regards, tom lane
>
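
A minimal two-session sketch of the behavior Tom describes, assuming a
throwaway table t(id int, val int) with one row (an illustration of the
visible behavior, not of Postgres internals):

-- Session 1:
BEGIN;
UPDATE t SET val = val + 1 WHERE id = 1;
-- the old row version is marked provisionally outdated; a new version
-- is inserted alongside it

-- Session 2, concurrently:
SELECT val FROM t WHERE id = 1;
-- still sees the old version, because session 1 has not committed

-- Session 1:
COMMIT;
-- the new version becomes live; the old one is dead and will eventually
-- be vacuumed away. Had session 1 crashed or run ROLLBACK instead, the
-- old version would have stayed live and the new one would be dead.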


Re: Question regarding how databases support atomicity

2024-05-03 Thread Siddharth Jain
On Fri, May 3, 2024 at 8:00 PM Siddharth Jain  wrote:

> I am trying to sharpen my understanding of databases. Let's say there is
> an operation foo as part of the public API that internally translates to
> more than one operation - I am sure there are examples like this in Postgres.
> So to do foo we have to do following in order in all or none fashion:
>
> 1. Step 1
> 2. Step 2
> 3. Step 3
>
> The way I understand this is that if there is a failure in between, we
> start undoing and reverting the previous operations one by one. But what if
> there is a failure and we are not able to revert an operation? How is that
> situation handled? E.g., something failed when we tried to do Step 3. Now
> we revert Step 2 and succeed, but when we try to revert Step 1 we fail.
> What happens now? To me, it seems it's impossible to guarantee true
> atomicity in general.
>
> S.
>


Re: How do you optimize the disk IO when you cannot assume a file will start at a boundary then?

2024-02-22 Thread Siddharth Jain
Thanks Tomas

On Thu, Feb 22, 2024 at 3:05 AM Tomas Vondra 
wrote:

> On 2/22/24 02:22, Siddharth Jain wrote:
> > Hi All,
> >
> > I understand the storage layer in databases goes to great lengths to
> ensure:
> > - a row does not cross a block boundary
> > - reads/writes/allocation happen in units of blocks
> > etc. The motivation is that at the OS level, reads and writes happen in
> > pages (blocks), not individual bytes. I am only concerned about SSDs but
> > I think the principle applies to HDDs as well.
> >
> > But how can we do all this when we are not even guaranteed that the
> > beginning of a file will be aligned with a block boundary? Refer to
> > https://stackoverflow.com/questions/8018449/is-it-guaranteed-that-the-beginning-of-a-file-is-aligned-with-pagesize-of-file-s
> >
> > Further, I don't see any APIs exposing I/O operations in terms of blocks.
> > All File I/O APIs I see expose a file as a randomly accessible contiguous
> > byte buffer. Would it not have been easier if there were APIs that
> exposed
> > I/O operations in terms of blocks?
> >
> > Can someone explain this to me?
> >
>
> The short answer is that this is well outside our control. We do the
> best we can - split our data files to "our" 8kB pages - and hope that
> the OS / filesystem will do the right thing to map this to blocks at the
> storage level.
>
> The filesystems do the same thing, to some extent - they align stuff
> with respect to the beginning of the partition, but if the partition
> itself is not properly aligned, that won't really work.
>
> As for the APIs, we work with what we have in POSIX - I don't think
> there are any APIs working with blocks, and it's not clear to me how
> it would fundamentally differ from the APIs we have now. Moreover, it's
> not really clear which of the "blocks" would matter: the Postgres 8kB
> page? The filesystem page? The storage block/sector size?
>
> FWIW I think for SSDs this matters way more than for HDD, because SSDs
> have to erase the space before a rewrite, which makes it much more
> expensive. But that's not just about the alignment, but about the page
> size (with smaller pages being better).
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
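
For what it's worth, the "our 8kB pages" Tomas mentions can be inspected
from SQL; both settings below are read-only compile-time values, typically
8192 bytes:

SHOW block_size;      -- size of a PostgreSQL heap/index page
SHOW wal_block_size;  -- size of a WAL page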


How do you optimize the disk IO when you cannot assume a file will start at a boundary then?

2024-02-21 Thread Siddharth Jain
Hi All,

I understand the storage layer in databases goes to great lengths to ensure:
- a row does not cross a block boundary
- reads/writes/allocation happen in units of blocks
etc. The motivation is that at the OS level, reads and writes happen in pages
(blocks), not individual bytes. I am only concerned about SSDs but I think
the principle applies to HDDs as well.

But how can we do all this when we are not even guaranteed that the
beginning of a file will be aligned with a block boundary? Refer to
https://stackoverflow.com/questions/8018449/is-it-guaranteed-that-the-beginning-of-a-file-is-aligned-with-pagesize-of-file-s
for details.

Further, I don't see any APIs exposing I/O operations in terms of blocks.
All File I/O APIs I see expose a file as a randomly accessible contiguous
byte buffer. Would it not have been easier if there were APIs that exposed
I/O operations in terms of blocks?

Can someone explain this to me?

Sid


what happens if a failed transaction is not rolled back?

2023-04-24 Thread Siddharth Jain
Hi All,

I understand that when writing application code, we should roll back a
transaction that fails to commit. This is typically done in the catch block
of a try-catch exception handler. But what if the developer does not roll
back the transaction? What happens in that case?

Note that I am not asking: what happens if a transaction is not rolled back?
I am asking: what happens if a *failed* transaction is not rolled back?

Failed transaction = you try to commit it but get an exception back from
the database.

thanks.

S.
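
For concreteness, a psql sketch of the scenario (assuming a table t with a
primary key on id). After any error inside a transaction block, the session
sits in an aborted state until the block is ended:

BEGIN;
INSERT INTO t (id) VALUES (1);
INSERT INTO t (id) VALUES (1);
-- ERROR: duplicate key value violates unique constraint
SELECT 1;
-- ERROR: current transaction is aborted, commands ignored until end of
-- transaction block
COMMIT;
-- on an aborted transaction this performs a rollback (psql reports
-- ROLLBACK), so the server leaves no work behind even if the application
-- never issues an explicit ROLLBACK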


Re: Question on SSI - Serializable Snapshot Isolation

2023-04-18 Thread Siddharth Jain
OK, thanks Tom.

On Mon, Apr 17, 2023 at 2:39 PM Tom Lane  wrote:

> Siddharth Jain  writes:
> > When using SSI <https://wiki.postgresql.org/wiki/SSI>, let's say we have
> two
> > transactions T1 and T2 and there is a serialization conflict. Postgres
> > knows when one or the other transaction is doomed to fail
> > [image: image.png]
>
> Please don't use images for things that could perfectly well be
> expressed as text.  They're not quotable, they may not show up
> in the archives (as this one doesn't), etc etc.  email is a text
> medium, despite Google's attempts to convince you otherwise.
>
> > but will not raise serialization error until the transaction commits.
> This
> > can cause a massive perf hit because the transactions could be long
> > running. Why not raise the error early on when the conflict is detected
> to
> > avoid wasting CPU and other resources? Can anyone explain this to me?
>
> Try src/backend/storage/lmgr/README-SSI, notably this bit:
>
> * This SSI implementation makes an effort to choose the
> transaction to be canceled such that an immediate retry of the
> transaction will not fail due to conflicts with exactly the same
> transactions.  Pursuant to this goal, no transaction is canceled
> until one of the other transactions in the set of conflicts which
> could generate an anomaly has successfully committed.  This is
> conceptually similar to how write conflicts are handled.
>
> The main point here is that "at least one of these transactions
> will have to fail" is very different from "all of these transactions
> have to fail".  If the implementation prematurely forecloses on one
> of them, it may be that *no* useful work gets done because the others
> also fail later on for other reasons; moreover, it might be that the
> victim transaction could have committed after those others failed.
> Withholding judgment about which one to cancel until something has
> committed ensures that more than zero work gets completed.
>
> Also note that AFAICS we do notice fairly promptly once a transaction
> has been marked as doomed; it's not the case that we wait till the
> transaction's own commit to check that.
>
> regards, tom lane
>
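
A classic write-skew sketch that exercises this, assuming a table t(k int,
v int) and both sessions running at SERIALIZABLE:

-- Session A:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT sum(v) FROM t WHERE k = 1;

-- Session B:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT sum(v) FROM t WHERE k = 2;
INSERT INTO t VALUES (1, 100);

-- Session A:
INSERT INTO t VALUES (2, 100);
COMMIT;   -- succeeds

-- Session B:
COMMIT;
-- ERROR: could not serialize access due to read/write dependencies among
-- transactions. Consistent with the README text quoted above, B is not
-- canceled until A has successfully committed, so an immediate retry of B
-- can succeed.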


Question on SSI - Serializable Snapshot Isolation

2023-04-17 Thread Siddharth Jain
Hi All,

When using SSI <https://wiki.postgresql.org/wiki/SSI>, let's say we have two
transactions T1 and T2 and there is a serialization conflict. Postgres
knows when one or the other transaction is doomed to fail
[image: image.png]
but will not raise a serialization error until the transaction commits. This
can cause a massive perf hit because the transactions could be long
running. Why not raise the error early on, when the conflict is detected, to
avoid wasting CPU and other resources? Can anyone explain this to me? Is
this an area that could be improved upon? Thanks.

S.


Transaction Rollback errors

2023-04-11 Thread Siddharth Jain
Hi All,

when my application (Node.js) receives a class 40 error:

Class 40 — Transaction Rollback
40000  transaction_rollback
40002  transaction_integrity_constraint_violation
40001  serialization_failure
40003  statement_completion_unknown
40P01  deadlock_detected

then does it mean PG has already rolled back the tx and therefore I should
not attempt to roll it back again? Thanks,

S.


What permissions are needed to drop a column from a table in postgres?

2023-04-02 Thread Siddharth Jain
Hi All,

What permissions does a user need to be able to drop columns from a table
in postgres? I did read this:
https://www.postgresql.org/docs/14/sql-grant.html
but could not find the answer.

Thanks

S.
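
For the record, a one-line sketch (hypothetical table name): dropping a
column is a form of ALTER TABLE, which is not covered by any grantable
privilege but by ownership:

ALTER TABLE invoices DROP COLUMN note;
-- must be run by the table's owner, a member of the owning role, or a
-- superuser; no GRANT can confer this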


Re: Question on creating keys on partitioned tables

2023-03-31 Thread Siddharth Jain
On Fri, Mar 31, 2023 at 9:07 AM Tom Lane  wrote:

> Siddharth Jain  writes:
> > I think the two are equivalent. If not, could you please explain why?
>
> Well, they're formally equivalent if you require there to be only one
> X value per partition (ie, PARTITION BY LIST with only one listed value
> per partition); if there's more, they're not the same thing.
>

Yes, this is what I have. I think I understand now. Is the following correct?

When a PK is created on (X,Y) on the parent table, what happens internally
is that the command is run individually on each of the child tables.
Nothing more, nothing less.



>
> Neither one guarantees that Y is globally unique.  We have no mechanism
> for enforcing uniqueness across partitions except for partition key
> columns.
>
> regards, tom lane
>
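
A sketch of the two options under discussion (hypothetical names):

-- Option 1: composite PK declared once on the partitioned parent;
-- each partition gets the corresponding unique index automatically
CREATE TABLE t1 (x int, y int, PRIMARY KEY (x, y)) PARTITION BY LIST (x);
CREATE TABLE t1_p1 PARTITION OF t1 FOR VALUES IN (1);

-- Option 2: no parent PK; a PK on y alone created on each partition,
-- enforcing y-uniqueness only within that partition
CREATE TABLE t2 (x int, y int) PARTITION BY LIST (x);
CREATE TABLE t2_p1 PARTITION OF t2 FOR VALUES IN (1);
ALTER TABLE t2_p1 ADD PRIMARY KEY (y);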


Re: Question on creating keys on partitioned tables

2023-03-31 Thread Siddharth Jain
Thanks Laurenz.

I think the two are equivalent. If not, could you please explain why?

On Fri, Mar 31, 2023 at 6:46 AM Laurenz Albe 
wrote:

> On Thu, 2023-03-30 at 17:05 -0700, Siddharth Jain wrote:
> > I have this question. Say I create a partitioned table on column X.
> >
> > Option 1:
> >
> > I add a primary key on (X,Y). Y is another column. Even though Y is a
> globally unique PK (global meaning it is unique across partitions, not just
> in one partition), Postgres does not allow me to
> > create a PK on Y in a partitioned table.
> >
> > Option 2:
> >
> > I add PK on Y on each of the partitions
> >
> > Are these not equivalent? If not, which is better and why?
>
> No, they are not equivalent.
>
> Option 2 comes closer to guaranteeing uniqueness for column Y, so use that.
>
> > PS: This is what my best friend had to say:
> >
> > [...] If you are using the "table inheritance" approach [...]
>
> Don't even consider that.  Declarative partitioning is so much better.
>
> Yours,
> Laurenz Albe
>


Question on creating keys on partitioned tables

2023-03-30 Thread Siddharth Jain
Hi All,

I have this question. Say I create a partitioned table on column X.

Option 1:

I add a primary key on (X,Y). Y is another column. Even though Y is a
globally unique PK (global meaning it is unique across partitions, not just
in one partition), Postgres does not allow me to create a PK on Y in a
partitioned table.

Option 2:

I add PK on Y on each of the partitions

Are these not equivalent? If not, which is better and why?

Thanks

S.

PS: This is what my best friend had to say:

In PostgreSQL partitioning, the decision of where to place the primary key
can depend on the specific requirements of the application and the
partitioning strategy being used.


If you are using the "table inheritance" approach to partitioning, where
child tables inherit from a parent table, then the primary key should be
placed on the parent table. This is because the child tables do not have
their own primary key constraints, and their primary key columns are
inherited from the parent table.


On the other hand, if you are using the "declarative partitioning"
approach, where each partition is a separate table defined within a
partitioned table, then the primary key can be placed on either the parent
table or the child tables. However, placing the primary key on the child
tables can improve query performance, as it allows for more efficient
indexing and partition pruning.


In summary, if you are using table inheritance for partitioning, place the
primary key on the parent table. If you are using declarative partitioning,
you can choose to place the primary key on either the parent table or the
child tables, with potential performance benefits for placing it on the
child tables.


Re: could not bind IPv4 address "127.0.0.1": Address already in use

2023-03-08 Thread Siddharth Jain
Thanks Tom. Found the problem.

On Tue, Mar 7, 2023 at 8:28 PM Tom Lane  wrote:

> Siddharth Jain  writes:
> > But when I try to start the server I get this:
>
> > 2023-03-07 17:16:43.228 PST [25925] LOG:  could not bind IPv6 address
> > "::1": Address already in use
> > 2023-03-07 17:16:43.228 PST [25925] HINT:  Is another postmaster already
> > running on port 5432? If not, wait a few seconds and retry.
> > 2023-03-07 17:16:43.228 PST [25925] LOG:  could not bind IPv4 address
> > "127.0.0.1": Address already in use
> > 2023-03-07 17:16:43.228 PST [25925] HINT:  Is another postmaster already
> > running on port 5432? If not, wait a few seconds and retry.
>
> That's pretty clear I think: the ports are already bound.
>
> > I ran:
> >   netstat -ln -p tcp | grep 5432
> > and there is no output.
>
> This is the wrong incantation for this purpose.  Try
>
> netstat -an -p tcp | grep 5432
>
> I expect you'll see a couple of ports in LISTEN state.  Unfortunately,
> netstat is not much help at locating the processes listening to such
> ports.  This might be more help:
>
> lsof -n | grep 5432
>
> regards, tom lane
>


could not bind IPv4 address "127.0.0.1": Address already in use

2023-03-07 Thread Siddharth Jain
Hi All,

I am trying to run Postgres 14 on a Mac OS. I installed Postgres from EDB
and ran initdb and it gave me this:

❯ initdb
The files belonging to this database system will be owned by user "xxx".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/pgsql/14/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... America/Los_Angeles
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

pg_ctl -D /var/lib/pgsql/14/data -l logfile start

But when I try to start the server I get this:

❯ pg_ctl -D /var/lib/pgsql/14/data start
waiting for server to start...
2023-03-07 17:16:43.226 PST [25925] LOG:  starting PostgreSQL 14.7 on
x86_64-apple-darwin20.6.0, compiled by Apple clang version 12.0.0
(clang-1200.0.32.29), 64-bit
2023-03-07 17:16:43.228 PST [25925] LOG:  could not bind IPv6 address
"::1": Address already in use
2023-03-07 17:16:43.228 PST [25925] HINT:  Is another postmaster already
running on port 5432? If not, wait a few seconds and retry.
2023-03-07 17:16:43.228 PST [25925] LOG:  could not bind IPv4 address
"127.0.0.1": Address already in use
2023-03-07 17:16:43.228 PST [25925] HINT:  Is another postmaster already
running on port 5432? If not, wait a few seconds and retry.
2023-03-07 17:16:43.228 PST [25925] WARNING:  could not create listen
socket for "localhost"
2023-03-07 17:16:43.228 PST [25925] FATAL:  could not create any TCP/IP
sockets
2023-03-07 17:16:43.229 PST [25925] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

I ran:

 ~/ netstat -ln -p tcp | grep 5432

and there is no output. I also tried:

❯ pg_ctl status
pg_ctl: no server running

What is wrong here and how do I fix it?

S.


Re: How does Postgres store a B-Tree on disk while using the OS file system?

2023-03-06 Thread Siddharth Jain
Thanks Christophe, will try to go through it. It's a bit difficult to grasp.

On Mon, Mar 6, 2023 at 5:08 PM Christophe Pettus  wrote:

>
>
> > On Mar 6, 2023, at 16:24, Siddharth Jain  wrote:
> > My question: How can it then store a B Tree on disk? I would think
> storing a B Tree requires storing disk offset addresses and so on (for a
> node to navigate to another etc.). For this, one would need to write
> directly to the disk using low-level disk access functions and not use file
> system API.
>
> All of PostgreSQL's relations (tables and indexes) are stored in files.
> (They're actually stored in a set of files if it's larger than 1GB, but
> each relation is treated as one logical file.)  The "pointers" in this case
> are just offsets from the start of that file.
>
> There's some additional information here:
>
> https://www.postgresql.org/docs/current/storage.html
>
> and here:
>
> https://www.postgresql.org/docs/current/btree-implementation.html
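
A small illustration of the file-plus-offset scheme Christophe describes
(hypothetical table name some_table):

SELECT pg_relation_filepath('some_table');
-- e.g. base/16384/16385: the relation's file, relative to the data directory
SELECT ctid, * FROM some_table LIMIT 3;
-- ctid is (block number, item offset) within that file - the kind of
-- "pointer" a btree index entry stores to reach a heap row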


How does Postgres store a B-Tree on disk while using the OS file system?

2023-03-06 Thread Siddharth Jain
I am trying to sharpen my understanding of Postgres. As I understand,
Postgres does not write directly to disk blocks. It uses the file system
provided by the OS:
https://dba.stackexchange.com/questions/80036/is-there-a-way-to-store-a-postgresql-database-directly-on-a-block-device-not-fi
My question: how can it then store a B-Tree on disk? I would think storing
a B-Tree requires storing disk offset addresses and so on (for a node to
navigate to another, etc.). For this, one would need to write directly to
the disk using low-level disk access functions and not use the file system
API.


Comparing Postgres logical replication to MySQL

2023-02-26 Thread Siddharth Jain
Hi All,

I am wondering if there is any article comparing Postgres logical
replication to MySQL row based replication?
https://dev.mysql.com/doc/refman/8.0/en/replication-sbr-rbr.html
Are these two equivalent? If not, what are the differences?

S.


Re: Does Postgres 14 have a query cache?

2023-02-18 Thread Siddharth Jain
Thanks all for the replies. Just wanted to confirm.

On Sat, Feb 18, 2023 at 10:45 AM Steven Lembark  wrote:

> On Sat, 18 Feb 2023 12:43:42 -0600
> Ron  wrote:
>
> > > I think the answer is no but wanted to confirm here. this is what
> > > my best friend told me.
>
> There are caches for prepared statements, table rows, indexes.
>
> Which of the caches are you interested in?
>
>
> --
> Steven Lembark
> Workhorse Computing
> lemb...@wrkhors.com
> +1 888 359 3508
>
>
>


Re: How to use the BRIN index properly?

2023-02-08 Thread Siddharth Jain
OK so in that case we are left with the B-Tree index.

If the B-Tree index will be so large that it cannot fit in memory, then is
it worth creating it at all? Are there any established patterns here?

On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus  wrote:

>
>
> > On Feb 8, 2023, at 13:17, Siddharth Jain  wrote:
> >
> > As I explained in my question, that is indeed our dilemma. Our insertion
> > order will not be equal to index order, i.e., referring to your response:
> >
> > > whose data is added in the same order as the key in the BRIN index
> >
> > does NOT hold.
>
> A BRIN index is not a good choice in this case.  You can CLUSTER the data
> on an index, but that's a one-time operation: PostgreSQL will not maintain
> that order after the CLUSTER.  If the number of rows in the table at the
> time of the CLUSTER is much larger than the number that are inserted
> between CLUSTER operations, then a BRIN index might be useful, but
> clustering a very large table is an expensive operation, and requires an
> exclusive lock on the table while it is being done.
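
A sketch of the one-time reorder Christophe describes (hypothetical names:
a table events with a created_at column; CLUSTER needs a suitable index,
typically btree, and holds an ACCESS EXCLUSIVE lock for the duration):

CREATE INDEX events_created_btree ON events (created_at);
CLUSTER events USING events_created_btree;
-- rows are now physically ordered by created_at, so a BRIN index helps:
CREATE INDEX events_created_brin ON events USING brin (created_at);
-- rows inserted after the CLUSTER are not kept in this order, so the
-- correlation (and the BRIN index's usefulness) decays over time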


Re: How to use the BRIN index properly?

2023-02-08 Thread Siddharth Jain
As I explained in my question, that is indeed our dilemma. Our insertion
order will not be equal to index order, i.e., referring to your response:

> whose data is added in the same order as the key in the BRIN index

does NOT hold.

On Wed, Feb 8, 2023 at 12:27 PM Ron  wrote:

> Is the data in your tables stored in natural correlation with those
> *three* columns?  I'm dubious that can even happen.
>
> BRIN is best for *range queries* on tables whose data is added in the
> same order as the key in the BRIN index (for example, a BRIN index on a
> timestamp field in a log table where new records are always being appended
> in "timestamp" order).
>
> It would also be great for history tables where you can pre-sort the data
> by, for example, customer_id, and then put the BRIN on customer_id.
>
> On 2/8/23 13:58, Siddharth Jain wrote:
>
> Our insertion order is of course != index order, otherwise the question
> would have been trivial.
> We use Postgres 14.
>
> On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain  wrote:
>
>> Hello,
>>
>> We have large tables with billions of rows in them and want to take
>> advantage of the BRIN index on them.
>>
>> Issues we are facing:
>>
>>- as I understand, BRIN index is useful only if the data is stored in
>>index order. As an example we want to create a composite BRIN index on 3
>>columns - integers and strings (varchar). How can we tell Postgres to 
>> store
>>data in index order as new records are inserted into the database?
>>- I understand that turning on autosummarize will keep the index
>>fresh and up-to-date as new records are inserted. Is this correct?
>>
>> Thanks for your help.
>>
>> S.
>>
>
> --
> Born in Arizona, moved to Babylonia.
>


Re: How to use the BRIN index properly?

2023-02-08 Thread Siddharth Jain
Our insertion order is of course != index order, otherwise the question
would have been trivial.
We use Postgres 14.

On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain  wrote:

> Hello,
>
> We have large tables with billions of rows in them and want to take
> advantage of the BRIN index on them.
>
> Issues we are facing:
>
>- as I understand, BRIN index is useful only if the data is stored in
>index order. As an example we want to create a composite BRIN index on 3
>columns - integers and strings (varchar). How can we tell Postgres to store
>data in index order as new records are inserted into the database?
>- I understand that turning on autosummarize will keep the index fresh
>and up-to-date as new records are inserted. Is this correct?
>
> Thanks for your help.
>
> S.
>


How to use the BRIN index properly?

2023-02-08 Thread Siddharth Jain
Hello,

We have large tables with billions of rows in them and want to take
advantage of the BRIN index on them.

Issues we are facing:

   - as I understand, a BRIN index is useful only if the data is stored in
   index order. As an example, we want to create a composite BRIN index on 3
   columns - integers and strings (varchar). How can we tell Postgres to store
   data in index order as new records are inserted into the database?
   - I understand that turning on autosummarize will keep the index fresh
   and up-to-date as new records are inserted. Is this correct?

Thanks for your help.

S.
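
On the second bullet, a sketch of how autosummarize is switched on
(hypothetical names). It makes autovacuum summarize newly filled page
ranges so the index stays up to date, but it does not reorder existing
data:

CREATE INDEX big_table_brin ON big_table USING brin (col_a, col_b, col_c)
    WITH (autosummarize = on);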


Re: FATAL: database "xxx" does not exist when it does

2023-02-01 Thread Siddharth Jain
> Using Postgres 14.4, I created a database as the postgres user:
>>
>> create database xxx;
>>
>> postgres=> \c xxx
>> psql (13.9 (Debian 13.9-0+deb11u1), server 14.4)
>> WARNING: psql major version 13, server major version 14.
>>  Some psql features might not work.
>> SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits:
>> 256, compression: off)
>> You are now connected to database "xxx" as user "postgres".
>>
>> I gave all privileges on xxx to user sysbench:
>>
>> postgres=> GRANT ALL PRIVILEGES ON DATABASE xxx TO sysbench;
>>
>> I can see:
>>
>>  Name |  Owner   | Encoding | Collate |  Ctype  |   Access privileges
>> ------+----------+----------+---------+---------+-----------------------
>>  xxx  | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =Tc/postgres         +
>>       |          |          |         |         | postgres=CTc/postgres+
>>       |          |          |         |         | sysbench=CTc/postgres
>>
>> But user sysbench is not able to connect to this database:
>>
>> sysbench=> \c xxx
>> FATAL:  database "xxx" does not exist
>> Previous connection kept
>>
>> How can I fix this?
>>
>


Re: How to avoid having to run the GRANT command for newly added tables?

2022-11-23 Thread Siddharth Jain
Thanks Christophe. It works.

On Wed, Nov 23, 2022 at 12:08 PM Christophe Pettus  wrote:

>
>
> > On Nov 23, 2022, at 12:06, Siddharth Jain  wrote:
> > Is there any way to accomplish what I want? Thanks.
>
> There is:
>
>
> https://www.postgresql.org/docs/current/sql-alterdefaultprivileges.html
>
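
A minimal sketch using the names from the original message; this affects
tables created in the future by the role postgres in schema public (run it
once per creating role and schema):

ALTER DEFAULT PRIVILEGES FOR ROLE postgres IN SCHEMA public
    GRANT ALL ON TABLES TO test_user;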


How to avoid having to run the GRANT command for newly added tables?

2022-11-23 Thread Siddharth Jain
Hello - I have a Postgres 14 server. I created a database and then gave all
privileges to tables in the database to a user as shown below:
grant connect on database testdb to test_user;
GRANT all ON ALL TABLES IN SCHEMA "public" TO test_user;
I now verified that test_user is able to access an EXISTING table:
select count(*) from test;
 count
-------
     0
(1 row)
However, when the postgres superuser creates a NEW table:
create table test2( id integer primary key);
CREATE TABLE
the test_user is not able to access it:
select count(*) from test2;
ERROR:  permission denied for table test2
This is really inconvenient for me and I'd like to avoid having to run the
GRANT command again and again after creating new tables. Is there any way
to accomplish what I want? Thanks.


Re: Some questions about Postgres

2022-11-06 Thread Siddharth Jain
Thanks Laurenz.

On Fri, Nov 4, 2022 at 1:55 AM Laurenz Albe 
wrote:

> On Thu, 2022-11-03 at 10:42 -0700, Christophe Pettus wrote:
> > > On Nov 3, 2022, at 10:38, Siddharth Jain  wrote:
> > > I read an old article comparing MySQL to Postgres and wanted to get
> answers to following questions.
> >
> > Given your questions, I suspect that you read the technical note from
> Uber about why they switched back
> > to MySQL from PostgreSQL.  There are quite a few responses out there to
> it, including:
> >
> > https://thebuild.com/presentations/uber-perconalive-2017.pdf
>
> This is also insightful:
> https://use-the-index-luke.com/blog/2016-07-29/on-ubers-choice-of-databases
>
> Yours,
> Laurenz Albe
>


Re: Some questions about Postgres

2022-11-03 Thread Siddharth Jain
Thanks Christophe. Yes that is the article and I read the response as well.

On Thu, Nov 3, 2022 at 10:42 AM Christophe Pettus  wrote:

>
>
> > On Nov 3, 2022, at 10:38, Siddharth Jain  wrote:
> > I read an old article comparing MySQL to Postgres and wanted to get
> answers to following questions.
>
> Given your questions, I suspect that you read the technical note from Uber
> about why they switched back to MySQL from PostgreSQL.  There are quite a
> few responses out there to it, including:
>
> https://thebuild.com/presentations/uber-perconalive-2017.pdf
>
> To answer the specific questions:
>
> 1. Yes, all PostgreSQL indexes point directly to tuples on disk.  They do
> not point to a primary key (which a table might not even have).
>
> 2. Yes, each backend connection to PostgreSQL is a separate process.  The
> usual way of mitigating any performance issues this might introduce is to
> use a pooler such as pgbouncer.
>
> 3. This is equivalent to PostgreSQL's shared buffers.
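
Points 2 and 3 are easy to see from SQL (a sketch; the setting and catalog
view below are standard):

SHOW shared_buffers;
-- PostgreSQL's shared page cache, the analogue of innodb_buffer_pool

SELECT pid, backend_type, state FROM pg_stat_activity;
-- client backends appear alongside background processes; each row's pid
-- is a separate operating-system process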


Some questions about Postgres

2022-11-03 Thread Siddharth Jain
Hi all,

I am new to Postgres. I read an old article comparing MySQL to Postgres and
wanted to get answers to following questions. All questions are w.r.t.
Postgres 14:

- Do Postgres secondary indexes point directly to tuples on disk?
- Does Postgres use a separate process for each connection?
- Does Postgres use any application cache similar to innodb_buffer_pool?

Thanks a lot. If you can provide references to documentation, that would be
great.


How to load data from CSV into a table that has array types in its columns?

2022-10-26 Thread Siddharth Jain
Hello,

Given a long list like this:

1,2
2,4 --> 2 appears a first time
7,9
8,9
5,3
2,5 --> note 2 appears a second time

I want to load it into this table:

create table tbl (
  id integer primary key,
  fids integer[]
)

so we will have 2 -> [4,5] where 2 is id and [4,5] are the fids

My actual dataset is very large and has 100M rows in it. How can it be
efficiently loaded into postgres?

I think I have to use a program for this and am trying to use the pg
library for Node.js. I am reading the data in batches of 1M or 100k rows,
for example. I have created a dictionary in Node.js where I am storing the
hashmap. The part where I am stuck is how to generate the SQL command.

1. I want to make one call to the server for the batch of 1M rows, not 1M
calls
2. Some keys in the hashmap might already exist in the database. For these
keys we want to append to the array
3. Some keys will not exist and for these we want to insert new rows

Can someone help me please? I am using Postgres for the first time.
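
One way to do this in a single server round trip per batch - a sketch,
assuming the Node.js side flattens each batch into two parallel arrays
bound as $1 (ids) and $2 (fids):

INSERT INTO tbl (id, fids)
SELECT id, array_agg(fid)
FROM unnest($1::int[], $2::int[]) AS u(id, fid)
GROUP BY id
ON CONFLICT (id) DO UPDATE
    SET fids = tbl.fids || EXCLUDED.fids;

The GROUP BY collapses duplicate ids within a batch (ON CONFLICT would
otherwise reject a command that touches the same row twice), and the ||
operator appends the new fids for ids that already exist. For 100M rows,
COPYing into an unlogged staging table and running one merge query like
the above is another common pattern.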