Re: OOM Killer kills PostgreSQL

2020-05-20 Thread Stephen Frost
Greetings,

* Piotr Włodarczyk (piotrwlodarczy...@gmail.com) wrote:
> We encountered an unexpected PostgreSQL shutdown. After a little
> investigation we discovered that the problem is the OOM killer, which
> kills our PostgreSQL.

You need to configure your system to not overcommit.

Read up on the overcommit_memory and overcommit_ratio Linux settings.
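
For example, on most Linux systems that means something along these
lines in /etc/sysctl.conf (the ratio is only a starting point- size it
against your actual RAM and swap):

vm.overcommit_memory = 2   # strict accounting: allocations fail instead of the OOM killer firing later
vm.overcommit_ratio = 80   # commit limit = swap + 80% of RAM

Apply it with 'sysctl -p' and keep an eye on CommitLimit/Committed_AS
in /proc/meminfo afterwards.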

Thanks,

Stephen




Re: Best partition type for billions of addresses

2020-05-02 Thread Stephen Frost
Greetings,

* Arya F (arya6...@gmail.com) wrote:
> > * Arya F (arya6...@gmail.com) wrote:
> > > I need to store about 600 million rows of property addresses across
> > > multiple counties. I need to have partitioning setup on the table as
> > > there will be updates and inserts performed to the table frequently
> > > and I want the queries to have good performance.
> >
> > That's not what partitioning is for, and 600m rows isn't all *that*
> > many.
> 
> But I have noticed that my updates and inserts have slowed down
> dramatically when I started going over about 20 million rows and the
> reason was because every time it has to update the index. When I
> removed the index, my insert performance stayed good no matter the
> size of the table.

Sure it does- maintaining an index on every insert has a cost.

> So I should be able to achieve good performance with just one
> partition? Maybe I just need to get hardware with more memory?

Instead of jumping to partitioning, I'd suggest you post your actual
table structures, queries, and explain results here and ask for help.

https://wiki.postgresql.org/wiki/Slow_Query_Questions
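
For example (table and column names here are placeholders), the sort of
output that makes a question answerable:

\d+ your_table
explain (analyze, buffers) select * from your_table where some_column = 42;
select version();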

Thanks,

Stephen




Re: Best partition type for billions of addresses

2020-05-02 Thread Stephen Frost
Greetings,

* Arya F (arya6...@gmail.com) wrote:
> I need to store about 600 million rows of property addresses across
> multiple counties. I need to have partitioning setup on the table as
> there will be updates and inserts performed to the table frequently
> and I want the queries to have good performance.

That's not what partitioning is for, and 600m rows isn't all *that*
many.

> From what I understand hash partitioning would not be the right
> approach in this case, since for each query PostgreSQL has to check
> the indexes of all partitions?
> 
> Would list partitioning be suitable? if I want PostgreSQL to know
> which partition the row is it can directly load the relevant index
> without having to check other partitions. Should I be including the
> partition key in the where clause?
> 
> I'd like to hear some recommendations on the best way to approach
> this. I'm using PostgreSQL 12

In this case, it sounds like "don't" is probably the best option.

Partitioning is good for data management, particularly when you have
data that "ages out" or should be removed/dropped at some point,
provided your queries use the partition key.  Partitioning doesn't speed
up routine inserts and updates that are using a proper index and only
updating a small set of rows at a time.
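
As a quick sketch of the case where partitioning does pay off (names
made up here)- time-based data where old partitions eventually get
dropped:

create table events (event_time timestamptz not null, payload text)
    partition by range (event_time);

create table events_2020_04 partition of events
    for values from ('2020-04-01') to ('2020-05-01');
create table events_2020_05 partition of events
    for values from ('2020-05-01') to ('2020-06-01');

-- queries that include the partition key only touch the relevant partition
select count(*) from events
 where event_time >= '2020-05-01' and event_time < '2020-05-08';

-- "aging out" a month is a quick metadata operation, not a huge DELETE
alter table events detach partition events_2020_04;
drop table events_2020_04;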

Thanks,

Stephen




Re: PostgreSQL does not choose my indexes well

2020-04-25 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Turns out to be because what was provided wasn't actually what was being
> > used- there's a domain in there and that seems to gum up the works and
> > make it so we don't consider the partial index as being something we can
> > use (see the discussion at the end of the other sub-thread).
> 
> Some simple experiments here don't find that a domain-type column prevents
> use of the partial index.  So it's still not entirely clear what's
> happening for the OP.  I concur with Jeff's suggestion to try forcing
> use of the desired index, and see whether it happens at all and what
> the cost estimate is.

Once burned, twice shy, I suppose- considering we weren't given the
actual DDL the first round, I'm guessing there are other differences.

> I'm also wondering exactly which Postgres version this is.

Also a good question.

Thanks,

Stephen




Re: PostgreSQL does not choose my indexes well

2020-04-24 Thread Stephen Frost
Greetings,

* Jeff Janes (jeff.ja...@gmail.com) wrote:
> In order to read 1409985 / 12839 = 109 rows per buffer page, the table must
> be extraordinarily well clustered on this index.  That degree of clustering
> is going to steal much of the thunder from the index-only scan.  But in my
> hands, it does still prefer the partial index with index-only scan by a
> cost estimate ratio of 3 to 1 (despite it actually being slightly slower)
> so I don't know why you don't get it being used.

Turns out to be because what was provided wasn't actually what was being
used- there's a domain in there and that seems to gum up the works and
make it so we don't consider the partial index as being something we can
use (see the discussion at the end of the other sub-thread).

Thanks,

Stephen




Re: PostgreSQL does not choose my indexes well

2020-04-23 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > I do wonder if we are maybe missing a bet at times though, considering
> > that I'm pretty sure we'll always go through the index in order, and
> > therefore randomly, even when we don't actually need the results in
> > order..?  Has there been much consideration for just opening an index
> > and sequentially scanning it in cases like this where we have to go
> > through all of the index anyway and don't need the results in order?
> 
> As I recall, it's unsafe to do so because of consistency considerations,
> specifically there's a risk of missing or double-visiting some entries due
> to concurrent index page splits.  VACUUM has some way around that, but it
> doesn't work for regular data-fetching cases.  (nbtree/README has more
> about this, but I don't feel like looking it up for you.)

That README isn't exactly small, but the mention of VACUUM having a
trick there helped me find this:

---
The tricky part of this is to avoid missing any deletable tuples in the
presence of concurrent page splits: a page split could easily move some
tuples from a page not yet passed over by the sequential scan to a
lower-numbered page already passed over.  (This wasn't a concern for the
index-order scan, because splits always split right.)  To implement this,
we provide a "vacuum cycle ID" mechanism that makes it possible to
determine whether a page has been split since the current btbulkdelete
cycle started.  If btbulkdelete finds a page that has been split since
it started, and has a right-link pointing to a lower page number, then
it temporarily suspends its sequential scan and visits that page instead.
It must continue to follow right-links and vacuum dead tuples until
reaching a page that either hasn't been split since btbulkdelete started,
or is above the location of the outer sequential scan.  Then it can resume
the sequential scan.  This ensures that all tuples are visited.
---

So the issue is with a page split happening and a tuple being moved to
an earlier leaf page, resulting in us potentially not seeing it even
though we should have during a sequential scan.  The trick that VACUUM
uses seems pretty involved and would be more complicated to use here,
since it's not OK to return the same tuples multiple times (though
perhaps in a BitmapIndexScan we could handle that..).  Then again, maybe
the skipping scan mechanism that's been talked about recently would let
us avoid having to scan the entire index even in cases where the
condition doesn't include the initial index columns, since it looks
like that might be what we're doing now.

> My guess based on your results is that the OP's table *isn't* all-visible,
> or at least the planner doesn't know it is.

Hrmpf, even then I seem to end up with an IndexOnlyScan-

=# select * from pg_visibility_map('entidad') where all_visible;
 blkno | all_visible | all_frozen 
-------+-------------+------------
(0 rows)

analyze entidad;

=# select relallvisible from pg_class where relname = 'entidad';
 relallvisible 
---------------
             0
(1 row)

=# explain (analyze, buffers) select * from entidad where cod_tabla = 4;
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_entidad_tabla_4 on entidad  (cost=0.43..170908.14 rows=1657114 width=24) (actual time=0.312..3511.629 rows=1720668 loops=1)
   Heap Fetches: 3441336
   Buffers: shared hit=6444271 read=469499
 Planning Time: 2.831 ms
 Execution Time: 3563.413 ms
(5 rows)

I'm pretty suspicious that they've made some odd planner configuration
changes or something along those lines to end up with the plan they got,
or there's some reason we don't think we can use the partial index.

Thanks,

Stephen




Re: PostgreSQL does not choose my indexes well

2020-04-23 Thread Stephen Frost
Greetings,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Thu, Apr 23, 2020 at 8:29 AM Tom Lane  wrote:
> > "David G. Johnston"  writes:
> > > On Thursday, April 23, 2020, Thomas Kellerer  wrote:
> > >> Plus: scanning idx_tabla_entidad is more efficient because that index is
> > >> smaller.
> >
> > > Really?  The absence of 33 million rows in the partial index seems like
> > it
> > > would compensate fully and then some for the extra included columns.
> >
> > On the other hand, an indexscan is likely to end up being effectively
> > random-access rather than the purely sequential access involved in
> > a seqscan.
> 
> I feel like I'm missing something as the OP's query is choosing indexscan -
> just it is choosing to scan the full index containing the searched upon
> field instead of a partial index that doesn't contain the field but whose
> predicate matches the where condition - in furtherance of a count(*)
> computation where the columns don't really matter.

The actual query isn't a count(*) though, it's a 'select *'.

> I do get "its going to perform 1.4 million random index entries and heap
> lookup anyway - so it doesn't really matter" - but the first answer was
> "the full index is smaller than the partial" which goes against my
> intuition.

Yeah, I'm pretty sure the full index is quite a bit bigger than the
partial index- see my note from just a moment ago.

> The sequential scan that isn't being used would have to touch 25x the
> number of records - so its non-preference seems reasonable.

Agreed on that.

Thanks,

Stephen




Re: PostgreSQL does not choose my indexes well

2020-04-23 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> "David G. Johnston"  writes:
> > On Thursday, April 23, 2020, Thomas Kellerer  wrote:
> >> Plus: scanning idx_tabla_entidad is more efficient because that index is
> >> smaller.
> 
> > Really?  The absence of 33 million rows in the partial index seems like it
> > would compensate fully and then some for the extra included columns.
> 
> On the other hand, an indexscan is likely to end up being effectively
> random-access rather than the purely sequential access involved in
> a seqscan.  

An indexscan is what was chosen though, so this doesn't really seem to
be a question of index scan vs. seq scan- it's a question of why one
index was picked over the other.  It does seem a bit odd that we'd use
a regular index scan instead of a BitmapHeap/Index scan, though.

> (If the index was built recently, then it might not be
> so bad --- but the planner doesn't know that, so it assumes that the
> index leaf pages are laid out pretty randomly.)  Moreover, unless the
> table is mostly marked all-visible, there will be another pile of
> randomized accesses into the heap to validate visibility of the index
> entries.

If the table *is* marked all visible, though, then certainly that index
will be better, and I think that's what a lot of this is coming down to
in this particular case.

Populating the tables provided, based on the minimal info we got and
minimizing the number of pages that 'cod_tabla=4' rows land on:

insert into tabla select generate_series, 'abcdef' from generate_series(1,20);
insert into entidad select generate_series, 4, generate_series+1 from generate_series(1,1409985);
insert into entidad select generate_series+1409985, generate_series % 20 + 1, generate_series+1 from generate_series(1,34413354) where generate_series % 20 + 1 <> 4;
vacuum analyze entidad;
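
For reference, definitions along these lines are what's being assumed
here- the column and type names are guesses (only cod_tabla is shown
explicitly above), and the partial index has to cover all three columns
for the index-only scans below to be possible:

create table tabla (cod_tabla bigint primary key, nombre varchar);
create table entidad (
    cod_entidad bigint primary key,
    cod_tabla bigint not null,
    cod_entidad_tabla bigint
);
create index idx_tabla_entidad on entidad (cod_tabla);
create index idx_entidad_tabla_4 on entidad (cod_entidad_tabla)
    include (cod_entidad, cod_tabla) where cod_tabla = 4;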

With this, the table is 1.7GB, idx_tabla_entidad is about 700MB, while
idx_entidad_tabla_4 is only 81MB.

With this, on v12-HEAD, PG will happily use the partial index:

=# explain (analyze, buffers) select * from entidad where cod_tabla = 4;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_entidad_tabla_4 on entidad  (cost=0.43..55375.20 rows=1422085 width=24) (actual time=0.050..144.745 rows=1409985 loops=1)
   Heap Fetches: 0
   Buffers: shared hit=8497
 Planning Time: 0.338 ms
 Execution Time: 183.081 ms
(5 rows)

Dropping that index and then running it again shows:

=# explain (analyze, buffers) select * from entidad where cod_tabla = 4;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on entidad  (cost=26641.72..608515.59 rows=1422085 width=24) (actual time=102.844..242.522 rows=1409985 loops=1)
   Recheck Cond: (cod_tabla = 4)
   Heap Blocks: exact=8981
   Buffers: shared read=12838
   ->  Bitmap Index Scan on idx_tabla_entidad  (cost=0.00..26286.20 rows=1422085 width=0) (actual time=101.969..101.969 rows=1409985 loops=1)
         Index Cond: (cod_tabla = 4)
         Buffers: shared read=3857
 Planning Time: 0.264 ms
 Execution Time: 277.854 ms
(9 rows)

If we spread out where the 'cod_tabla=4' tuples are, the partial index
is still used (note that we end up with more like 1.7M tuples instead of
1.4M, but I don't think that's terribly relevant):

truncate entidad;
insert into entidad select generate_series, generate_series % 20 + 1, generate_series+1 from generate_series(1,34413354);

=# explain (analyze, buffers) select * from entidad where cod_tabla = 4;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_entidad_tabla_4 on entidad  (cost=0.43..65231.31 rows=1664459 width=24) (actual time=0.036..185.171 rows=1720668 loops=1)
   Heap Fetches: 0
   Buffers: shared hit=10375
 Planning Time: 0.247 ms
 Execution Time: 233.205 ms
(5 rows)

Things get a lot worse when we drop that partial index:

drop index idx_entidad_tabla_4;

=# explain (analyze, buffers) select * from entidad where cod_tabla = 4;
                              QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on entidad

Re: Fwd: upgrade to PG11 on secondary fails (no initdb was launched)

2019-06-14 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Jun 14, 2019 at 10:39:40AM -0400, Bruce Momjian wrote:
> > On Fri, Jun 14, 2019 at 03:12:29PM +0200, Fabio Pardi wrote:
> > > Using --size-only, tells rsync to only check the size of the blocks.
> > > That is: if the block is present on the destination, and is the same
> > > size as the origin, then skip.
> > 
> > The files are _exactly_ the same on primary and standby, so we don't
> > need to check anything.  Frankly, it is really only doing hard linking
> > of the files.
> 
> Here is the description from our docs:
> 
>What this does is to record the links created by pg_upgrade's
>link mode that connect files in the old and new clusters on the
>primary server.  It then finds matching files in the standby's old
>cluster and creates links for them in the standby's new cluster.
>Files that were not linked on the primary are copied from the
>primary to the standby.  (They are usually small.)  This provides
>rapid standby upgrades.  Unfortunately, rsync needlessly copies
>files associated with temporary and unlogged tables because these
>files don't normally exist on standby servers.
> 
> The primary and standby have to be binary the same or WAL replay would
> not work on the standby.  (Yes, I sometimes forgot how this worked so I
> wrote it down in the docs.)  :-)

Right- this is *not* a general process for building a replica.  This is
specifically *only* for when doing a pg_upgrade, where *everything* is
shut down when it runs and every step is checked to ensure that there
are no errors during the process.

Thanks!

Stephen




Re: Shortest offline window on database migration

2019-06-04 Thread Stephen Frost
Greetings,

* Haroldo Kerry (hke...@callix.com.br) wrote:
> The bottleneck at dump is CPU (a single one, on a 44 thread server), as we
> are using the -Fc option, that does not allow multiple jobs.
> We tried some time ago to use the --jobs option of pg_dump but it was
> slower, even with more threads. Our guess is the sheer volume of files
> outweighs the processing gains of using a compressed file output. Also
> pg_dump even with multiple jobs spends a lot of time (1h+) on the "reading
> dependency data" section that seems to be single threaded (our application
> is multi-tenant and we use schemas to separate tenant data, hence we have a
> lot of tables).

You might want to reconsider using the separate-schemas-for-tenants
approach.  This isn't the only annoyance you can run into with lots and
lots of tables.  That said, are you using the newer version of pg_dump
(which is what you should be doing when migrating to a newer version of
PG, always)?  We've improved it over time, though I can't recall off-hand
if this particular issue was improved in between the releases being
discussed here.  Of course, lots of little files and dealing with them
could drag down performance when working in parallel.  Still a bit
surprised that it's ending up slower than -Fc.

> We are creating the replica using :
> docker exec pg-2 pg_basebackup -h 192.168.0.107 -U replication -P --xlog -D
> /var/lib/postgresql/data_9.6
> and it is taking 1h10m , instead of the 2h I reported initially, because we
> were using rsync with checksums to do it, after experimenting with
> pg_basebackup we found out it is faster, rsync was taking 1h just to
> calculate all checksums. Thanks for your insight on this taking too long.

So, it's a bit awkward still, unfortunately, but you can use pgbackrest
to effectively give you a parallel-replica-build.  The steps are
something like:

Get pgbackrest WAL archiving up and going, with the repo on the
destination server/filesystem, but have 'compress=n' in the
pgbackrest.conf for the repo.
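
The repo side of pgbackrest.conf would look something like this (paths
and the stanza name are placeholders, and the option names are from the
2.x series- check the docs for your version):

[global]
repo1-path=/path/to/repo
compress=n

[mydb]
pg1-path=/var/lib/postgresql/9.6/main

with archive_command = 'pgbackrest --stanza=mydb archive-push %p' set on
the source cluster.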

Run: pgbackrest --stanza=mydb --type=full --process-max=8 backup

Once that's done, just do:

mv /path/to/repo/backup/mydb/20190605-12F/pg_data /new/pgdata
chmod -R g-rwx /new/pgdata

Then in /new/pgdata, create a recovery.conf file like:

restore_command = 'pgbackrest --stanza=mydb archive-get %f "%p"'

And start up the DB server.

We have some ideas about how to make that whole thing cleaner, but the
rewrite into C has delayed our efforts; perhaps once that's done (this
fall), we can look at it.

Of course, you won't have an actual backup of the new database server at
that point yet, so you'll want to clean things up and make that happen
ASAP.  Another option, which is what I usually recommend, is just to
take a new backup (properly) and then do a restore from it, but that'll
obviously take longer since there's two copies being done instead of one
(though you can parallelize to your heart's content, so it can still be
quite fast if you have enough CPU and I/O).

Thanks,

Stephen




Re: Use Postgres as a column store by creating one table per column

2019-05-24 Thread Stephen Frost
Greetings,

* Lev Kokotov (lev.koko...@gmail.com) wrote:
> Is it efficient to use Postgres as a column store by creating one table per
> column?

Short answer is no, not in a traditional arrangement, anyway.  The tuple
overhead would be extremely painful.  It's possible to improve on that,
but it requires sacrificing what the tuple header gives you- visibility
information, along with some other things.  The question will be if
that's acceptable or not.

> I'm thinking since Postgres stores tables in continuous blocks of 16MB each
> (I think that's the default page size?) I would get efficient reads and
> with parallel queries I could benefit from multiple cores.

The page size in PG is 8k, not 16MB.
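
Easy enough to confirm on a given build (8192 is the default and can
only be changed at compile time):

=# show block_size;
 block_size 
------------
 8192
(1 row)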

Thanks,

Stephen




Re: Fwd: upgrade to PG11 on secondary fails (no initdb was launched)

2019-05-23 Thread Stephen Frost
Greetings,

Please don't post these kinds of questions to this list, it's not the
right list.

Pick the correct list to use in the future, and don't cross-post to
multiple lists.

This list is specifically for performance issues and questions regarding
PostgreSQL, not about how to upgrade.  For that, I would suggest either
-general OR -admin (not both).

> Any idea how to handle it ? I'm sending it to the performance mail list
> because no one answered it in the admin list ..

This isn't an acceptable reason to forward it to another list.  These
lists have specific purposes and should be used for those purposes.
Further, no one is under any obligation to respond to questions posed to
these lists and any help provided is entirely at the discretion of those
on the list as to if they wish to, and have time to, help, or not.

Thanks,

Stephen




Re: PostgreSQL VS MongoDB: a use case comparison

2018-11-20 Thread Stephen Frost
Greetings,

* Merlin Moncure (mmonc...@gmail.com) wrote:
> On Tue, Nov 20, 2018 at 11:28 AM Stephen Frost  wrote:
> > Oh yes, having a dictionary would be a great start to reducing the size
> > of the jsonb data, though it could then become a contention point if
> > there's a lot of new values being inserted and such.  Naturally there
> > would also be a cost to pulling that data back out as well but likely it
> > would be well worth the benefit of not having to store the field names
> > repeatedly.
> 
> Yes, the biggest concern with a shared dictionary ought to be
> concurrency type problems.

Hmmm, I wonder if we could do something like have a dictionary per
page..  Or perhaps based on some hash of the toast ID..  Not sure. :)

Thanks!

Stephen




Re: PostgreSQL VS MongoDB: a use case comparison

2018-11-20 Thread Stephen Frost
Greetings,

* Merlin Moncure (mmonc...@gmail.com) wrote:
> On Tue, Nov 20, 2018 at 10:43 AM Stephen Frost  wrote:
> > * Merlin Moncure (mmonc...@gmail.com) wrote:
> > > On Mon, Nov 19, 2018 at 11:26 AM Stephen Frost  wrote:
> > > > Looks like a lot of the difference being seen and the comments made
> > > > about one being faster than the other are because one system is
> > > > compressing *everything*, while PG (quite intentionally...) only
> > > > compresses the data sometimes- once it hits the TOAST limit.  That
> > > > likely also contributes to why you're seeing the on-disk size
> > > > differences that you are.
> > >
> > > Hm.  It may be intentional, but is it ideal?  Employing datum
> > > compression in the 1kb-8kb range with a faster but less compressing
> > > algorithm could give benefits.
> >
> > Well, pglz is actually pretty fast and not as good at compression as
> > other things.  I could certainly see an argument for allowing a column
> > to always be (or at least attempted to be) compressed.
> >
> > There's been a lot of discussion around supporting alternative
> > compression algorithms but making that happen is a pretty big task.
> 
> Yeah; pglz is closer to zlib.  There's much faster stuff out
> there...Andres summed it up pretty well;
> https://www.postgresql.org/message-id/20130605150144.GD28067%40alap2.anarazel.de
> 
> There are also some interesting discussions on jsonb specific
> discussion approaches.

Oh yes, having a dictionary would be a great start to reducing the size
of the jsonb data, though it could then become a contention point if
there's a lot of new values being inserted and such.  Naturally there
would also be a cost to pulling that data back out as well but likely it
would be well worth the benefit of not having to store the field names
repeatedly.

Then again, taken far enough, what you end up with are tables... :)

Thanks!

Stephen




Re: PostgreSQL VS MongoDB: a use case comparison

2018-11-20 Thread Stephen Frost
Greetings,

* Merlin Moncure (mmonc...@gmail.com) wrote:
> On Mon, Nov 19, 2018 at 11:26 AM Stephen Frost  wrote:
> > Looks like a lot of the difference being seen and the comments made
> > about one being faster than the other are because one system is
> > compressing *everything*, while PG (quite intentionally...) only
> > compresses the data sometimes- once it hits the TOAST limit.  That
> > likely also contributes to why you're seeing the on-disk size
> > differences that you are.
> 
> Hm.  It may be intentional, but is it ideal?  Employing datum
> compression in the 1kb-8kb range with a faster but less compressing
> algorithm could give benefits.

Well, pglz is actually pretty fast and not as good at compression as
other things.  I could certainly see an argument for allowing a column
to always be (or at least attempted to be) compressed.

There's been a lot of discussion around supporting alternative
compression algorithms but making that happen is a pretty big task.

Thanks!

Stephen




Re: PostgreSQL VS MongoDB: a use case comparison

2018-11-20 Thread Stephen Frost
Greetings,

* Fabio Pardi (f.pa...@portavita.eu) wrote:
> thanks for your feedback.

We prefer on these mailing lists to not top-post but instead to reply
inline, as I'm doing here.  This helps the conversation by eliminating
unnecessary dialogue and being able to make comments regarding specific
points clearly.

> I agree with you the compression is playing a role in the comparison.
> Probably there is a toll to pay when the load is high and the CPU
> stressed from de/compressing data. If we will be able to bring our
> studies that further, this is definitely something we would like to measure.

I was actually thinking of the compression as having more of an impact
with regard to the 'cold' cases because you're pulling fewer blocks when
it's compressed.  The decompression cost on CPU is typically much, much
less than the cost to pull the data off of the storage medium.  When
things are 'hot' and in cache then it might be interesting to question
if the compression/decompression is worth the cost.

> I also agree with you that at the moment Postgres really shines on
> relational data. To be honest, after seeing the outcome of our research,
> we are actually considering to decouple some (or all) fields from their
> JSON structure. There will be a toll to be payed there too, since we are
> receiving data in JSON format.

PostgreSQL has tools to help with this, you might look into
'json_to_record' and friends.
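
For example (the field names here are made up):

select *
  from json_to_record('{"patient_id": 42, "marriage_date": "2001-05-20"}'::json)
    as r(patient_id int, marriage_date date);

That gives you properly typed columns you can insert into a real table
or join against.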

> And the toll will be in time spent to deliver such a solution, and
> indeed time spent by the engine in doing the conversion. It might not be
> that convenient after all.

Oh, the kind of reduction you'd see in space from both an on-disk and
in-memory footprint would almost certainly be worth the tiny amount of
CPU overhead from this.

> Anyway, to bring data from JSON to a relational model is out of topic
> for the current discussion, since we are actually questioning if
> Postgres is a good replacement for Mongo when handling JSON data.

This narrow viewpoint isn't really sensible though- what you should be
thinking about is what's appropriate for your *data*.  JSON is just a
data format, and while it's alright as a system inter-exchange format,
it's rather terrible as a storage format.

> As per sharing the dataset, as mentioned in the post we are handling
> medical data. Even if the content is anonymized, we are not keen to
> share the data structure too for security reasons.

If you really want people to take your analysis seriously, others must
be able to reproduce your results.  I certainly appreciate that there
are very good reasons that you can't share this actual data, but your
testing could be done with completely generated data which happens to be
similar in structure to your data and have similar frequency of values.

The way to approach generating such a data set would be to aggregate up
the actual data to a point where the appropriate committee/board agree
that it can be shared publicly, and then you build a randomly generated
set of data which aggregates to the same result and then use that for
testing.
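
For example, something along these lines (the structure and field names
are made up) produces an arbitrarily large data set with controlled
value frequencies that anyone can regenerate:

create table test_data (id bigint, doc jsonb);

insert into test_data
select i,
       jsonb_build_object(
           'patient_id', i,
           'marriage_date', date '1970-01-01' + (i % 18000),
           'status', (array['single','married','divorced'])[1 + i % 3]
       )
  from generate_series(1, 1000000) as i;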

> That's a pity I know but i cannot do anything about it.
> The queries we ran and the commands we used are mentioned in the blog
> post but if you see gaps, feel free to ask.

There were a lot of gaps that I saw when I looked through the article-
starting with things like the actual CREATE TABLE command you used, and
the complete size/structure of the JSON object, but really what a paper
like this should include is a full script which creates all the tables,
loads all the data, runs the analysis, calculates the results, etc.

Thanks!

Stephen




Re: PostgreSQL VS MongoDB: a use case comparison

2018-11-19 Thread Stephen Frost
Greetings,

* Fabio Pardi (f.pa...@portavita.eu) wrote:
> We are open to any kind of feedback and we hope you enjoy the reading.

Looks like a lot of the difference being seen and the comments made
about one being faster than the other are because one system is
compressing *everything*, while PG (quite intentionally...) only
compresses the data sometimes- once it hits the TOAST limit.  That
likely also contributes to why you're seeing the on-disk size
differences that you are.

Of course, if you want to see where PG will really shine, you'd stop
thinking of data as just blobs of JSON and actually define individual
fields in PG instead of just one 'jsonb' column, especially when you
know that field will always exist (which is obviously the case if you're
building an index on it, such as your MarriageDate) and then remove
those fields from the jsonb and just reconstruct the JSON when you
query.  Doing that you'll get the size down dramatically.
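
A sketch of what that looks like (table and field names made up):

create table people (
    id            bigint primary key,
    marriage_date date,   -- pulled out of the JSON: small, typed, indexable
    doc           jsonb   -- everything else stays here
);

create index on people (marriage_date);

-- reconstruct the original document shape at query time
select doc || jsonb_build_object('MarriageDate', marriage_date) as full_doc
  from people
 where marriage_date between '2000-01-01' and '2000-12-31';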

And that's without even going to that next-level stuff of actual
normalization where you pull out duplicate data from across the JSON
and have just one instance of that data in another, smaller, table and
use a JOIN to bring it all back together.  Even better is when you
realize that then you only have to update one row in this other table
when something changes in that subset of data, unlike when you
repeatedly store that data in individual JSON entries all across the
system and such a change requires rewriting every single JSON object in
the entire system...

Lastly, as with any performance benchmark, please include full details-
all scripts used, all commands run, all data used, so that others can
reproduce your results.  I'm sure it'd be fun to take your json data and
create actual tables out of it and see what it'd be like then.

Thanks!

Stephen




Re: need meta data table/command to find query log

2018-03-06 Thread Stephen Frost
Greetings,

These questions are not appropriate for the 'performance' mailing list
but should be either on 'admin' or 'general'.  Please use the
appropriate list for asking questions in the future.

* Rambabu V (ram.wis...@gmail.com) wrote:
> Please help us to get the query log details from meta data table/command in
> postgresql. As we are not maintaining log files more than 2 days due to
> lack of space.

It's entirely unclear what you are asking for here when you say "meta
data."  Information about tables is stored in the system catalog,
particularly the "pg_class" and "pg_attribute" tables, but that's
independent from the WAL.  To read the WAL files, you can use pg_waldump
(or pg_xlogdump on older versions), though that's not 'meta' data.
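
For example (the paths and segment name here are placeholders):

pg_xlogdump /path/to/pgdata/pg_xlog/000000010000000000000001

(or pg_waldump against pg_wal/ on version 10 and later)- though again,
that's a dump of low-level WAL records, not a log of the queries that
were run.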

> And also please provide document or sop for database upgrade from 9.3 to
> 9.6, as our database size was 4.5 tb and having table spaces as well. as it
> was production database system we do-not want to take any risk, please help
> us on this as well.

You'll likely want to use pg_upgrade to perform such an upgrade:

https://www.postgresql.org/docs/10/static/pgupgrade.html
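
The general shape of a link-mode upgrade is something like this (paths
are placeholders- run it with --check first, and test the whole
procedure on a copy before touching the 4.5TB production cluster):

pg_upgrade \
    -b /usr/pgsql-9.3/bin -B /usr/pgsql-9.6/bin \
    -d /var/lib/pgsql/9.3/data -D /var/lib/pgsql/9.6/data \
    --link --check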

Thanks!

Stephen




Re: by mistake dropped physical file dropped for one table.

2018-03-06 Thread Stephen Frost
Greetings,

* Rambabu V (ram.wis...@gmail.com) wrote:
> by mistake one physical file dropped for one of our table, as we do-not
> have backup for this table we are getting below error.
> 
> ERROR:  could not open file "base/12669/16394": No such file or directory
> 
> please help us to recover the table.

You're not likely able to recover that table.  To do so would require
completely stopping the system immediately and attempting filesystem
manipulation to "undelete" the file, or pulling back chunks from the
filesystem which contain pieces of the file and attempting to
reconstruct it.

If you've been keeping all WAL since the beginning of the cluster, it's
possible you could recover that way, but you claim to not have any
backups, so I'm guessing that's pretty unlikely.

Thanks!

Stephen




Re: Updating large tables without dead tuples

2018-02-23 Thread Stephen Frost
Greetings,

* l...@laurent-hasson.com (l...@laurent-hasson.com) wrote:
> This was done during a maintenance window, and that table is read-only except 
> when we ETL data to it on a weekly basis, and so I was just wondering why I 
> should pay the "bloat" penalty for this type of transaction. Is there a trick 
> that could be used here?

Yes, create a new table and INSERT the data into that table, then swap
the new table into place as the old table.  Another option, if you don't
mind the exclusive lock taken on the table, is to dump the data to
another table, then TRUNCATE the current one and then INSERT into it.
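
A minimal sketch of the first approach (names made up; note that views
and foreign keys referencing the table will follow the rename to the
_old name, so check for those):

-- build the fresh copy; no dead tuples, no bloat
create table mytable_new (like mytable including all);
insert into mytable_new select * from mytable;  -- or straight from your ETL source

-- swap it into place; the renames take a brief exclusive lock
begin;
alter table mytable rename to mytable_old;
alter table mytable_new rename to mytable;
commit;

drop table mytable_old;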

There's other options too, involving triggers and such to allow updates
and other changes to be captured during this process, avoiding the need
to lock the table, but that gets a bit complicated.

> More generally, I suspect that the MVCC architecture is so deep that 
> something like LOCK TABLE, which would guarantee that there won't be 
> contentions, couldn't be used as a heuristic to not create dead tuples? That 
> would make quite a performance improvement for this type of work though.

I'm afraid it wouldn't be quite that simple- in particular, you have to
think about what happens when you issue a rollback...

Thanks!

Stephen

