Re: [GENERAL] pg_basebackup running from a remote machine

2017-11-15 Thread Jeff Janes
On Tue, Nov 14, 2017 at 8:28 AM, Laurenz Albe 
wrote:

> rakeshkumar464 wrote:
> > If pg_basebackup is run from a remote machine with compress option
> --gzip ,
> > compress level 9,
> > will the compression occur prior to the data being sent on the network or
> > after it has been received
> > at the remote machine.
>
> That only means that the output TAR file will be compressed; it has
> nothing to do with the data transferred from the server.
>
> If you want to compress the data sent over the network, use
> pg_basebackup over an SSL connection with SSL compression enabled.
>

But SSL compression is disabled by default on most systems, and not even
supported at all on many without a custom compilation of the SSL
library.
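
If the OpenSSL on both ends does support it, compression is requested from
the client side.  A minimal sketch (untested; host and database names are
placeholders, and many OpenSSL packages will refuse compression no matter
what the client asks for):

    psql "host=dbhost dbname=mydb sslmode=require sslcompression=1"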

Cheers,

Jeff


Re: [GENERAL] filter records by substring match of an postgresql array column

2017-11-08 Thread Jeff Janes
On Wed, Nov 8, 2017 at 4:28 AM, Arup Rakshit 
wrote:

>
> And to do I wanted to add an index like:
>
> CREATE INDEX trgm_idx_video_tags ON videos USING gist
> ((array_to_string(tags, ', ')) gist_trgm_ops)
>
> But on running this I get an error as:
>
> ERROR:  functions in index expression must be marked IMMUTABLE
>
> How can I fix this?
>
>
Wrap array_to_string with a text[] argument into a function and mark it as
immutable:

create function txt_array_to_string (text[]) returns text language sql
immutable as $$ select array_to_string($1,', ') $$;

And then build your index on that function.
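
For example (a sketch, reusing the tags column and index name from your
message; the query pattern at the end is just an illustration):

    CREATE INDEX trgm_idx_video_tags ON videos USING gist
        (txt_array_to_string(tags) gist_trgm_ops);

    -- queries must use the same expression to get the index, e.g.:
    SELECT * FROM videos WHERE txt_array_to_string(tags) ILIKE '%kitten%';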

I don't think there are any caveats on this.  Array_to_string is not
immutable because it can work with dates and numbers, which can change with
configuration settings, such as timezone.  But when given a text[] argument,
I think it actually is immutable.

Cheers,

Jeff


Re: [GENERAL] filter records by substring match of an postgresql array column

2017-11-08 Thread Jeff Janes
On Nov 8, 2017 02:34, "Arup Rakshit"  wrote:

Hi,

I do have a videos table, and it has a column called `tags` of type array.
I would like to select all videos where any string inside tag column
matches a given substring. What method should I use? The *Contains `@>`
operator* will do full string comparisons as far as I understood.



The extension parray_gin (
https://pgxn.org/dist/parray_gin/doc/parray_gin.html) offers the @@>
operator.
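
A rough sketch, going from that extension's documentation (the index name and
the search pattern here are illustrative only, so check the docs for the
exact matching semantics):

    CREATE EXTENSION parray_gin;
    CREATE INDEX videos_tags_idx ON videos USING gin (tags parray_gin_ops);
    SELECT * FROM videos WHERE tags @@> '{kitten}';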

Cheers,

Jeff


Re: [GENERAL] [HACKERS] SSL and Encryption

2017-11-03 Thread Jeff Janes
On Thu, Nov 2, 2017 at 10:22 PM, John R Pierce <pie...@hogranch.com> wrote:

> On 11/2/2017 10:12 PM, Jeff Janes wrote:
>
>> https://wiki.postgresql.org/wiki/List_of_drivers
>>
>> What is 'python native'?  psycopg works as long as you update your libpq.
>>
>
>
> I thought pythonistas preferred using a native driver that didn't use
> libpq ?



I'm not a pythonista, more of a python-when-I-must, but psycopg is the
dominant driver I've seen used and discussed.  I had to use a different
driver (zxJDBC, for jython) and basically all advice I found was for
psycopg and not applicable to zxJDBC, which most people apparently had
never even heard of.

Anyway, I would think psycopg would be classed as a native binding, as it
uses libpq which is 'native' to PostgreSQL.

If someone has greater knowledge here, it would be nice to update the wiki
pages with new info.

Talk of what is native or not in pythons now has me thinking of the
Everglades.

Cheers,

Jeff


Re: [GENERAL] [HACKERS] SSL and Encryption

2017-11-02 Thread Jeff Janes
On Thu, Nov 2, 2017 at 9:58 PM, John R Pierce  wrote:

> On 11/2/2017 9:39 PM, Michael Paquier wrote:
>
> The SCRAM discussion is spread across two threads mainly with hundreds
> of emails, which may discourage even the bravest. Here are links to
> the important 
> documentation:https://www.postgresql.org/docs/current/static/auth-methods.html#auth-password
>
> so that says...  ... [scram-sha-256] is the most secure of the currently
> provided methods, but it is not supported by older client libraries
>
> whats the state of the more popular bindings now?   jdbc, python native,
> etc ?
>
https://wiki.postgresql.org/wiki/List_of_drivers

What is 'python native'?  psycopg works as long as you update your libpq.

Cheers,

Jeff


Re: [GENERAL] Where to find development builds of pg for windows

2017-10-27 Thread Jeff Janes
On Sat, Oct 14, 2017 at 12:18 AM, legrand legrand <
legrand_legr...@hotmail.com> wrote:

> Hello,
>
> Using PG mainly on windows, I would have liked to test new releases /
> development versions before they are available in
> https://www.postgresql.org/download/windows/
>
> Are there some build farms or places where I could find "build snapshots
> for
> windows" without having to build them by myself ?
>

I don't know of nightlies available for Windows.  But you can find
pre-release (beta and release candidate) binaries here when they are in
season:

https://www.enterprisedb.com/products-services-training/pgdevdownload

(As linked from https://www.postgresql.org/download/snapshots/)

But since v10 was just released, it will be a long time before the next
beta (for v11) is out.

Cheers,

Jeff


Re: [GENERAL] Using cp to back up a database?

2017-10-09 Thread Jeff Janes
On Mon, Oct 9, 2017 at 9:41 AM, Ron Johnson <ron.l.john...@cox.net> wrote:

> On 10/09/2017 11:33 AM, Jeff Janes wrote:
>
> On Mon, Oct 9, 2017 at 6:40 AM, Ron Johnson <ron.l.john...@cox.net> wrote:
>
>> Hi,
>>
>> v8.4.20
>>
>> This is what the current backup script uses:
>>
>> /usr/bin/psql -U postgres -c "SELECT pg_start_backup('Incrementalba
>> ckup',true);"
>> cp -r /var/lib/pgsql/data/* $dumpdir/data/
>> /usr/bin/psql -U postgres template1 -c "SELECT pg_stop_backup();"
>>
>>
> That's fine, as long as you have a wal archive.  Although I don't know
> what is "Incremental" about it.
>
>
> From reading the docs, that field is just a label with no intrinsic
> meaning.
>

Its purpose is to communicate with you.  If the person who implemented my
backup solution is attempting to communicate with me, and I have no idea
what that communication means, then that makes me quite nervous.

Cheers,

Jeff


Re: [GENERAL] Using cp to back up a database?

2017-10-09 Thread Jeff Janes
On Mon, Oct 9, 2017 at 7:15 AM, Ron Johnson  wrote:

>
> Sure I want a consistent database.  Who doesn't?
>
> But log shipping requires you to rsync /var/lib/pgsql/data to the remote
> server, and that's consistent, so why wouldn't rsync to a local directory
> also be consistent?
>

But it isn't consistent by itself.  That is why the log shipping is
required (or an equivalent method of keeping the necessary logs around), to
fix up the consistency.

Cheers,

Jeff


Re: [GENERAL] Using cp to back up a database?

2017-10-09 Thread Jeff Janes
On Mon, Oct 9, 2017 at 6:40 AM, Ron Johnson  wrote:

> Hi,
>
> v8.4.20
>
> This is what the current backup script uses:
>
> /usr/bin/psql -U postgres -c "SELECT pg_start_backup('
> Incrementalbackup',true);"
> cp -r /var/lib/pgsql/data/* $dumpdir/data/
> /usr/bin/psql -U postgres template1 -c "SELECT pg_stop_backup();"
>
>
That's fine, as long as you have a wal archive.  Although I don't know what
is "Incremental" about it.  If you upgrade to a version which wasn't quite
so ancient, you could use pg_basebackup.


>
> Should it use rsync or pg_dump instead?
>

rsync is dangerous if not used with great care, and probably isn't going to
get you much for just doing a backup.

pg_dump is also fine, but it does something different, it creates a logical
backup, not a physical backup.  The backup from pg_dump cannot be used to
seed a PITR or streaming replica.  On the other hand, it can be restored into
a database of a different version and a different architecture.  And with
pg_dump the pg_start_backup and pg_stop_backup are useless and confusing.

Cheers


Re: [GENERAL] pgcrypto encrypt

2017-09-19 Thread Jeff Janes
On Tue, Sep 19, 2017 at 12:20 PM, Bruce Momjian  wrote:

> On Wed, Sep  6, 2017 at 04:19:52PM -0400, Stephen Cook wrote:
> > Hello!
> >
> > Is there a way to decrypt data encrypted with the pgcrypto "encrypt"
> > function, outside the database? Assuming that I know the key etc...
>
> Yes, I think so.  pgcrypto uses openssl and gpg internally, so using
> those tools should work.
>

I know that pgp_sym_encrypt and pgp_sym_decrypt interoperate well with
"outside the database" gpg, although dealing with armoring and de-armoring
as well as text mode or binary mode is a bit of a bother until you get used
to it.

But he seems to be asking about the "F.25.4. Raw Encryption Functions".  I
wouldn't want to reassure him that it would be easy to make those work
outside the database, without having seen it done.  But it should of course
be possible to make it work, even if that means rearranging the code of
pgcrypto and compiling into something that is standalone.

Cheers,

Jeff


Re: [GENERAL] Configuration of pgaudit settings in postgreSQL.conf causes postgreSQL to fail to start

2017-09-14 Thread Jeff Janes
On Sep 14, 2017 7:07 AM, "Arthur Zakirov"  wrote:

On Wed, Sep 13, 2017 at 02:42:18PM +, Troy Hardin wrote:
> Putting either of these two lines in the .conf file cause it to fail to
start.

Can you show error messages from logs?


And the version.


Re: [GENERAL] contrecord is requested

2017-09-12 Thread Jeff Janes
On Mon, Sep 11, 2017 at 1:27 PM, Scott Marlowe 
wrote:

> So we have a db we're trying to rewind and get synced to the master.
> pg_rewind says it doesn't need rewinding, and when we try to bring it
> up, it gets this error:
>
> "contrecord is requested by 2E7/4028"
>
> And fails to get back up.
>
> Is this a known issue? Possible bug in the continuation record code?
>
> The only references I can find for it are in the xlogreader code.
>

I've seen this twice lately and both times it was user error.

One time someone who shall remain nameless made a replica of a remote QA
server using "pg_basebackup -R ...", but then copied the *.conf files
**including recovery.conf** from the running replica of the remote
production server into the new directory for the replica of the remote QA
server.  So primary_conninfo had been overwritten to point to the wrong
master server.

The other time someone who shall also remain nameless accidentally fully
opened up a newly cloned copy (from a cold backup, I think) of a dummy
benchmarking server, instead of putting it into standby.  And then tried to
shut it down and re-open it as a standby without doing a full refresh. But
of course it was too late to do that.

Cheers,

Nameless


Re: [GENERAL] pg_ident mapping Kerberos Usernames

2017-09-11 Thread Jeff Janes
On Sun, Sep 10, 2017 at 4:31 PM,  wrote:

>
> GSSAPI is the authentication mechanism of choice, and it's working fine.
>
> Here is what I'm trying to accomplish.
>
> 'user1' == 'user1' and 'us...@a.domain.tld' == 'user1'.
>
> From reading the docs, this is done via the pg_ident.conf file, and from
> reading the logs, there is a problem with my map.
>
> Hmm... Interesting thought.
> *testing*
> It sort of works. Setting the maps below maps the users straight across.
> 'user1' == 'user1' and 'us...@a.domain.tld' == 'us...@a.domain.tld', so
> it's partially working.
>
> pg_ident.conf:
> testnet /^([0-9A-Za-z_-]+)@A\.DOMAIN\.TLD$ \1
> testnet /^([0-9A-Za-z_-]+)@A\.DOMAIN\.TLD$ \1...@a.domain.tld
>

I think you want this:

testnet   /(.*)                                \1
testnet   /^([0-9A-Za-z_-]+)@A\.DOMAIN\.TLD$   \1
testnet   /^([0-9A-Za-z_-]+)                   \1...@a.domain.tld


But since your pg_hba has include_realm=1, I don't know how you are getting
the realmless "system user" names in the first place, so the last line
really shouldn't be necessary.

Cheers,

Jeff


Re: [GENERAL] Needing verification on instructions for streaming replication

2017-09-11 Thread Jeff Janes
On Mon, Sep 11, 2017 at 7:02 AM, Dave Florek 
wrote:

> Hi,
>
> I'm trying to setup streaming replication on a cluster of postgresql
> databases and I followed the instructions outlined here (
> https://wiki.postgresql.org/wiki/Streaming_Replication) but I'm stuck on
> the archive and restore commands. In reading the comments from the archive
> command, WAL archiving needs a shared directory between the primary and
> standby machines in order to replicate. Is this something actually
> required? If so, does postgresql have any recommendations on where this
> shared directory ought to be hosted?
>

You can use streaming replication without any WAL archive at all.  If the
only reason you want a WAL archive is so that the standby can fall behind
the primary without losing the ability to catch back up again, then you can
use a replication slot to make the primary hold onto WAL files until the
standby is done with them.  That is going to be simpler than using a WAL
archive, provided the master has enough room to keep them around should the
standby fail for a while.
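
A minimal sketch of the slot approach (the slot name is arbitrary, and
max_replication_slots must be above zero on the primary):

    -- on the primary:
    SELECT pg_create_physical_replication_slot('standby1');

    -- in the standby's recovery.conf:
    -- primary_slot_name = 'standby1'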

If you want a WAL archive for other reasons, then the choice of how to host
it would depend on those reasons.

Cheers,

Jeff


Re: [GENERAL] pg_ident mapping Kerberos Usernames

2017-09-10 Thread Jeff Janes
On Sun, Sep 10, 2017 at 11:25 AM,  wrote:

> On 09/10/2017 02:39 AM, Magnus Hagander wrote:
>
>> On Sat, Sep 9, 2017 at 6:44 PM,  wrote:
>>
>> Hi,
>>
>> I'm trying to get pg_ident to map "user1" and "us...@a.domain.tld"
>> to "user1" in postgres, or
>> vice versa. I'm not picky about which way works.
>>
>> Kerberos authentication works. I've gotten "user1" to login
>> successfully with a Kerberos ticket,
>> but I'm not able to get "us...@a.domain.tld" to match.
>>
>> Environment:
>> * PostgreSQL 9.6 from PostgreSQL repos
>> * CentOS 7
>> * FreeIPA for Kerberos, LDAP, etc.
>> * Realm A.DOMAIN.TLD
>> * "user1" database exists
>> * "user1" role exists
>> * Logging into CentOS usernames are configured to drop the domain, so
>> they appear as "user1"
>> rather then "us...@a.domain.tld".
>>
>>
>> pg_hba.conf:
>>
>> local   all   postgres                        peer
>> host    all   all        127.0.0.1/32         md5
>> host    all   all        ::1/128              md5
>> host    all   all        192.168.1.0/24       gss include_realm=1 map=testnet krb_realm=A.DOMAIN.TLD
>> # (The gss line above is all on one line; Thunderbird is truncating lines.)
>>
>>
>> pg_ident.conf:
>>
>> testnet   /^([0-9A-Za-z_-]+)@A\.DOMAIN\.TLD$   \1
>> testnet   /^([0-9A-Za-z_-]+)$                  \1
>>
>>
>> Regex that works for both in regexr.com :
>>
>> /^([0-9A-Za-z-_]+)(@A\.DOMAIN\.TLD)?$/gm
>>
>>
>> Command and lines from pg_log:
>>
>> $ psql -h db0 # Logged in as user1 with Kerberos ticket
>>
>> < 2017-09-09 19:50:49.376 CDT - 192.168.1.201 [unknown] > LOG:
>> connection received:
>> host=192.168.1.201 port=44918
>> < 2017-09-09 19:50:49.398 CDT - 192.168.1.201 user1 > LOG:
>> connection authorized: user=user1
>> database=user1
>> < 2017-09-09 19:50:50.912 CDT - 192.168.1.201 user1 > LOG:
>> disconnection: session time:
>> 0:00:01.537 user=user1 database=user1 host=192.168.1.201 port=44918
>>
>> $ psql -h db0 -U us...@a.domain.tld # Logged in as user1 with
>> Kerberos ticket
>>
>> < 2017-09-09 19:50:54.959 CDT - 192.168.1.201 [unknown] > LOG:
>> connection received:
>> host=192.168.1.201 port=44920
>> < 2017-09-09 19:50:55.023 CDT - 192.168.1.201 us...@a.domain.tld >
>> LOG: no match in usermap
>> "testnet" for user "us...@a.domain.tld" authenticated as
>> "us...@a.domain.tld"
>> < 2017-09-09 19:50:55.023 CDT - 192.168.1.201 us...@a.domain.tld >
>> FATAL:  GSSAPI authentication
>> failed for user "us...@a.domain.tld"
>> < 2017-09-09 19:50:55.023 CDT - 192.168.1.201 us...@a.domain.tld >
>> DETAIL:  Connection matched
>> pg_hba.conf line 87: "host   all
>>  all 192.168.1.0/24   gss
>> include_realm=1 map=testnet
>> krb_realm=A.DOMAIN.TLD"
>>
>>
>> Is this something that is possible, or is it something where I need
>> to pick one way to do it?
>>
>>
>> This looks like you are trying to connect with the actual username
>> user1@A.DOMAIN.TLD. pg_ident only sets what you are allowed to log in as,
>> not what it will attempt.
>>
>> If you are using psql, you are probably doing something like "psql -h
>> myserver". You need to add the user, so "psql -h myserver -U user1", to
>> instruct it of which username to actually use for the login.
>>
>> --
>>   Magnus Hagander
>>   Me: https://www.hagander.net/ 
>>   Work: https://www.redpill-linpro.com/ 
>>
>
> Hi Magnus,
>
> Yes, the system username is "user1", per the default ipa-client-install
> SSSD setup, and the map is working for that. Without the map, I have to
> specify the full Kerberos username, u...@domain.tld, in the psql command.
>
> Works with map:
>
> $ psql -h db0 #Implied -U user1 -d user1
> $ psql -h db0 -U user1 -d user1
>
> Does not work with map:
>
> $ psql -h db0 -U us...@a.domain.tld -d user1
>

If you want that to work with the map, then you need to change the map to
add the domain, rather than removing it, which is what you currently do.

But it is hard to figure out what it is you actually want.  You listed some
cases that work and some that don't, but haven't said which ones you want
to work and which you want not to work.  (Presumably if you want **all**
cases to work, you would just use 'trust' and be done with it.)


Re: [GENERAL] Performance with high correlation in group by on PK

2017-09-07 Thread Jeff Janes
On Tue, Aug 29, 2017 at 1:20 AM, Alban Hertroys <haram...@gmail.com> wrote:

> On 28 August 2017 at 21:32, Jeff Janes <jeff.ja...@gmail.com> wrote:
> > On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys <haram...@gmail.com>
> wrote:
> >>
> >> Hi all,
> >>
> >> It's been a while since I actually got to use PG for anything serious,
> >> but we're finally doing some experimentation @work now to see if it is
> >> suitable for our datawarehouse. So far it's been doing well, but there
> >> is a particular type of query I run into that I expect we will
> >> frequently use and that's choosing a sequential scan - and I can't
> >> fathom why.
> >>
> >> This is on:
> >>
> >>
> >> The query in question is:
> >> select "VBAK_MANDT", max("VBAK_VBELN")
> >>   from staging.etl1_vbak
> >>  group by "VBAK_MANDT";
> >>
> >> This is the header-table for another detail table, and in this case
> >> we're already seeing a seqscan. The thing is, there are 15M rows in
> >> the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT",
> >> "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact,
> >> we only have 1 at the moment!).
> >
> >
> > You need an "index skip-scan" or "loose index scan".  PostgreSQL doesn't
> > currently detect and implement them automatically, but you can use a
> > recursive CTE to get it to work.  There are some examples at
> > https://wiki.postgresql.org/wiki/Loose_indexscan
>
> Thanks Jeff, that's an interesting approach. It looks very similar to
> correlated subqueries.
>
> Unfortunately, it doesn't seem to help with my issue. The CTE is
> indeed fast, but when querying the results from the 2nd level ov the
> PK with the CTE results, I'm back at a seqscan on pdw2_vbak again.
>

Something like this works:

create table foo as select trunc(random()*5) as col1, random() as col2 from
generate_series(1,1);
create index on foo (col1, col2);
vacuum analyze foo;


with recursive t as (
   select * from (select col1, col2 from foo
                  order by col1 desc, col2 desc limit 1) asdfsaf
   union all
   select
     (select col1 from foo where foo.col1 < t.col1
      order by col1 desc, col2 desc limit 1) as col1,
     (select col2 from foo where foo.col1 < t.col1
      order by col1 desc, col2 desc limit 1) as col2
   from t where t.col1 is not null
)
select * from t where t is not null;

It is pretty ugly that you need one subquery in the select list for each
column to be returned.  Maybe someone can find a way to avoid that part.  I
tried using lateral joins to get around it, but couldn't make that work.

Cheers,

Jeff


Re: [GENERAL] Confused about max_standby_streaming_delay

2017-09-07 Thread Jeff Janes
On Thu, Sep 7, 2017 at 1:16 AM, Robert Inder <rob...@interactive.co.uk>
wrote:

>
>
> On 6 September 2017 at 20:47, Jeff Janes <jeff.ja...@gmail.com> wrote:
>
>>
>>> Have I misunderstood something?  Or is Postgres not actually configured
>>> the way I think it is?
>>>
>>
>> The standby will wait for ten minutes to obtain the lock it wishes to
>> obtain.  In 9.4, if something other than dump of database b was already
>> blocking it for 8 minutes before the dump starts, then the dump of database
>> b will only have 2 minutes, not 10, before it gets cancelled.
>>
>
> H...
> You're saying that the time for dumping database b may be spent 8 minutes
> waiting on a lock then 2 minutes actually dumping.
>

No, I'm saying that maybe the replay process was already waiting for
something else for 8 minutes before the pg_dump of database b even
attempted to start.  So it would be cancelled after 2 minutes.

Are these database a, database b, etc. different databases in the same
postgres instance (CREATE DATABASE A) or are they entirely different
postgres instances (initdb -D /opt/database_a)?  Your original description
of different unix users with different installations made me think it is the
second case, but I just want to make sure.

Cheers,

Jeff


Re: [GENERAL] Confused about max_standby_streaming_delay

2017-09-06 Thread Jeff Janes
On Wed, Sep 6, 2017 at 9:43 AM, Robert Inder 
wrote:

...

> And I've read that the answer to this is to set
> max_standby_streaming_delay in postgresql94.conf.
> So I've set it to "600s" -- ten minutes.
>
> I thought this would mean that when there was a conflict with an update
> from the live server, Postgres would give the dump 10 minutes "grace" in
> which to finish before killing it.
>
> Ten minutes may or may not be enough.  But in a case where it isn't
> enough, and the dump is abandonned, I would expect to see something like
>
> the dump of database_a finishing at 5 minutes past the hour,
> the dump of database_b
>   starting after the dump of database_a,
>   having a conflict,
>   being given 10 minutes to complete, and then
>   being abandonned
> the dump of database_c starting after the dump of database_b and
> finishing (say) 3 minutes later
>
> So the dump of database_c should finish at around 18 minutes past the hour.
>
> BUT that is not what I am seeing.
> On occasions where the dump of database_b is being abandonned, the
> successful dump of
> database_c is timestamped less than 10 minutes after the dump of database_a
>
> Which does not fit with the dump of database_b being given 10 minutes in
> which to finish
>
> Have I misunderstood something?  Or is Postgres not actually configured
> the way I think it is?
>

The standby will wait for ten minutes to obtain the lock it wishes to
obtain.  In 9.4, if something other than dump of database b was already
blocking it for 8 minutes before the dump starts, then the dump of database
b will only have 2 minutes, not 10, before it gets cancelled.  So, are
there any long running jobs in database b other than the pg_dump?

Cheers,

Jeff


Re: [GENERAL] Performance with high correlation in group by on PK

2017-08-28 Thread Jeff Janes
On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys  wrote:

> Hi all,
>
> It's been a while since I actually got to use PG for anything serious,
> but we're finally doing some experimentation @work now to see if it is
> suitable for our datawarehouse. So far it's been doing well, but there
> is a particular type of query I run into that I expect we will
> frequently use and that's choosing a sequential scan - and I can't
> fathom why.
>
> This is on:
>
>
> The query in question is:
> select "VBAK_MANDT", max("VBAK_VBELN")
>   from staging.etl1_vbak
>  group by "VBAK_MANDT";
>
> This is the header-table for another detail table, and in this case
> we're already seeing a seqscan. The thing is, there are 15M rows in
> the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT",
> "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact,
> we only have 1 at the moment!).
>

You need an "index skip-scan" or "loose index scan".  PostgreSQL doesn't
currently detect and implement them automatically, but you can use a
recursive CTE to get it to work.  There are some examples at
https://wiki.postgresql.org/wiki/Loose_indexscan

Cheers,

Jeff


Re: [GENERAL] install the oracle data wrapper extension

2017-08-22 Thread Jeff Janes
On Tue, Aug 22, 2017 at 2:47 PM, Peter Koukoulis 
wrote:

>
> Hi
>
> Is there a reliable way to create a database link from PosgreSQL 9.6 to
> Oracle 11g?
> I am running 9.6 on Linux 64 bit.
> I can connect to the Oracle database with sqlplus from the PostgreSQL
> server.
>
> Also, I have attempted to install the oracle_fdw wrapper as an
> alternative, but I keep getting the following error:
>
> ft_node=# create extension oracle_fdw;
> ERROR:  could not open extension control file 
> "/usr/share/postgresql/9.6/extension/oracle_fdw.control":
> No such file or directory
> ft_node=# \conninfo
> You are connected to database "ft_node" as user "postgres" on host
> "localhost" at port "5432".
> SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384,
> bits: 256, compression: off)
>
> Any suggestions?
>

What steps did you take to install oracle_fdw?

Cheers,

Jeff


Re: [GENERAL] A question on GIN indexes and arrays

2017-08-20 Thread Jeff Janes
On Sun, Aug 20, 2017 at 1:28 PM, Wells Oliver 
wrote:

>
> Why is this happening and what can I do to get my GIN indexes working?
> Thanks!
>
>
What extensions do you have installed in each database?  I bet one of them
(like intarray) redefines @> for one of your databases.

Try fully qualifying the operator.  OPERATOR(pg_catalog.@>)

Cheers,

Jeff


Re: [GENERAL] How to delete default privileges

2017-08-15 Thread Jeff Janes
On Tue, Aug 15, 2017 at 3:02 PM, Francisco Reyes  wrote:

> I have a DB where we changed ownership of all objects.
>
> We had:
>
> ALTER DEFAULT PRIVILEGES FOR ROLE old_dbowner GRANT SELECT ON tables TO
> dbgroup_ro_group;
> ALTER DEFAULT PRIVILEGES FOR ROLE old_dbowner GRANT SELECT ON sequences TO
> dbgroup_ro_group;
>
> ALTER DEFAULT PRIVILEGES FOR ROLE old_dbowner GRANT SELECT, UPDATE,
> DELETE, INSERT ON tables TO dbgroup_rw_group;
> ALTER DEFAULT PRIVILEGES FOR ROLE old_dbowner GRANT SELECT, USAGE
> ON sequences TO dbgroup_rw_group;
> ALTER DEFAULT PRIVILEGES FOR ROLE old_dbowner GRANT EXECUTE
> ON functions TO dbgroup_rw_group;
>
>
> But now there is a new DB owner and I have run the above, but with
> new_dbowner. How can I delete the old default grants? Can't find examples
> and don't see it on the documentation either.
>
>
> Any help would be greatly appreciated.


You just reverse them, changing GRANT...TO into REVOKE...FROM.  When the
altered-default is the same thing as the default-default, then the
altered-default disappears.
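
For example, to undo the grants you listed, reverse each statement for the
old owner (untested, but it is the direct reversal of what you ran):

    ALTER DEFAULT PRIVILEGES FOR ROLE old_dbowner REVOKE SELECT ON tables FROM dbgroup_ro_group;
    ALTER DEFAULT PRIVILEGES FOR ROLE old_dbowner REVOKE SELECT ON sequences FROM dbgroup_ro_group;
    ALTER DEFAULT PRIVILEGES FOR ROLE old_dbowner REVOKE SELECT, UPDATE, DELETE, INSERT ON tables FROM dbgroup_rw_group;
    ALTER DEFAULT PRIVILEGES FOR ROLE old_dbowner REVOKE SELECT, USAGE ON sequences FROM dbgroup_rw_group;
    ALTER DEFAULT PRIVILEGES FOR ROLE old_dbowner REVOKE EXECUTE ON functions FROM dbgroup_rw_group;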

Cheers,

Jeff


Re: [GENERAL] How to make server generate more output?

2017-08-11 Thread Jeff Janes
On Fri, Aug 11, 2017 at 1:14 PM, Rui Pacheco  wrote:

> Hello,
>
> I know this is a bit vague but I’m looking for a configuration
> parameter/startup switch that once set or enabled would make Postgresql
> return more data that normal. Specifically the wire protocol would return
> more notification messages and such. I remember seeing this while
> developing an implementation of the wire protocol.
>
> Does anyone know what this could be?
>

client_min_messages ?
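
If that is the setting you remember, it can be raised per session or per
connection; a sketch (the exact level you want may differ):

    SET client_min_messages TO debug1;

    -- or for a single psql session, via the environment:
    -- PGOPTIONS='-c client_min_messages=debug1' psql ...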


Re: [GENERAL] hot standby questions

2017-08-03 Thread Jeff Janes
On Wed, Aug 2, 2017 at 8:19 PM, armand pirvu  wrote:

>
> Hi
>
> Just trying to put together the hot_standby setup
> All docs I read are pointing to use as prefered method to use
> pg_basebackup to set the base
> So far so good
> But
>
> psql postgres -c "select pg_start_backup('backup')"
> pg_basebackup -D /var/lib/pgsql/sample  -Ft -z -P
> psql postgres -c "select pg_stop_backup()"
>

pg_basebackup does the equivalent of pg_start_backup and pg_stop_backup for
you.  It is not helpful, and might even sometimes be harmful, to do them
yourself when using pg_basebackup.


>
> Pretty much everywhere I looked, -x is not mentioned to be used
>
> So what gives ? What did I miss ? It's gotta be soomething
>

That certainly isn't my experience.  If you find sites that don't mention
-x, -X, or --xlog-method, then I would be reluctant to take any of that
site's other advice seriously.

But note that in version 10, -x will go away and the default will be
changed so that not specifying anything will be the same as -X stream.
Perhaps you are reading advice aimed at a future version.
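
For what it's worth, a typical invocation that streams the WAL alongside the
base backup looks something like this (the target directory is just a
placeholder):

    pg_basebackup -D /var/lib/pgsql/standby -X stream -P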

Cheers,

Jeff


Re: [GENERAL] select md5 result set

2017-08-03 Thread Jeff Janes
On Wed, Aug 2, 2017 at 4:25 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Wed, Aug 2, 2017 at 3:42 PM, Peter Koukoulis 
> wrote:
>
>>
>> SQL> select dbms_sqlhash.gethash('select x,y from test1',2) as md5_value
>> from dual;
>>
>> MD5_VALUE
>> 
>> 
>> 9FDA7FA725B783172CA371DA04AD5754
>>
>>
>> Can I do something similar in PostgreSQL ?
>>
>>
> ​Similar.​
>
>  SELECT md5(string_agg(vals::text, ''))
>  FROM ( VALUES (1, 2), (2, 3) ) vals (x, y)
>
>
>
That is going to build up the entire string in memory, so will fail if the
text representation of the entire table doesn't fit in 1GB.

I don't see any feature in PostgreSQL for calculating hashes over streaming
data.  But it wouldn't be too hard to create something in plperl, for
example, to do that.  You would have to make sure the query always returns
rows in the same order (I don't know if Oracle's function handles that for
you) and that things like datestyle and timezone don't cause differences.
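
Building on David's example, putting an ORDER BY inside the aggregate keeps
the result deterministic; a sketch against the test1 table from your question
(assuming columns x and y):

    SELECT md5(string_agg(t::text, '' ORDER BY x, y))
    FROM test1 AS t;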

You could use something like:

\copy (select * from blah order by something) to program 'md5sum' binary

but I don't know how you would get the output back into your program once
it shows up on your screen.

Cheers,

Jeff


Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-08-01 Thread Jeff Janes
On Tue, Aug 1, 2017 at 9:24 AM, Dmitry Lazurkin <dila...@gmail.com> wrote:

> On 08/01/2017 07:13 PM, Jeff Janes wrote:
>
> I think that HashSet is a Java-specific term.  It is just a hash table in
> which there is no data to store, just the key itself (and probably a cache
> of the hashcode of that key), correct?
>
>
> Yes. And in Java HashSet implemented on top of HashMap (:
>
> I think a more general solution would be to get the planner and executor
> to run the in-list query using the Hash Join, the same way it runs the
> in-VALUES one.
>
>
> Do additional plan nodes have a big overhead?
>

I don't think a new plan node will have meaningfully more overhead than new
code of equal generality and robustness which has just been bolted on to
the side of an existing node.  I don't know how to test that, though.  If
the existing hash code is slow, it would probably be better to spend time
improving it for everyone than coming up with yet another implementation.
I know the existing simplehash.h code as specialized for tuples in
src/backend/executor/execGrouping.c
is horribly inefficient with memory usage when the tuples are skinny, but
that shouldn't be a problem for the number of values likely to be present
in a hard-coded in-list.


>
>
> I was impressed at how well the JSON and hstore worked, you might want to
> look at how they do it.  It must be using an internal hash table of some
> sort.
>
> JSONB and HSTORE keep sorted pairs and use binary search.
>
Ah, that would be another way to approach it.  How many types have btree
ordering functions without hashing functions, or the reverse?

Cheers,

Jeff


Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-08-01 Thread Jeff Janes
On Mon, Jul 31, 2017 at 12:29 PM, Dmitry Lazurkin <dila...@gmail.com> wrote:

> On 31.07.2017 19:42, Jeff Janes wrote:
>
> I think it is simply because no one has gotten around to implementing it
> that way.  When you can just write it as a values list instead, the
> incentive to make the regular in-list work better is not all that strong.
>
> Cheers,
>
> Jeff
>
>
> I see from EXPLAIN that the IN-clause uses just an array with the function
> ANY. I think that for an efficient implementation of this task I should
> implement a new datatype "hashset". Am I wrong?
>

I think that HashSet is a Java-specific term.  It is just a hash table in
which there is no data to store, just the key itself (and probably a cache
of the hashcode of that key), correct?  PostgreSQL already has a template
for in-memory hash tables, src/include/lib/simplehash.h (and also one for
possibly-shared in-memory tables, src/backend/utils/hash/dynahash.c) , and
you should be able to specialize it for the case there there is no data
associated with the key.  I think the harder part would be to get the
planner to use the hash table you implement.  You would also have to
include code to fall back onto the array scanning for data types which do
not have a hash method defined.

I think a more general solution would be to get the planner and executor to
run the in-list query using the Hash Join, the same way it runs the
in-VALUES one.

I was impressed at how well the JSON and hstore worked, you might want to
look at how they do it.  It must be using an internal hash table of some
sort.  But those only support strings as keys, while the in-list has to
support every data type, including user-defined-ones, so they have more
opportunities for optimization.

Cheers,

Jeff


Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-31 Thread Jeff Janes
On Mon, Jul 24, 2017 at 8:03 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Mon, Jul 24, 2017 at 7:58 PM, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane  wrote:
>>
>>>
>>> The cost to form the inner hash is basically negligible whether it's
>>> de-duped or not, but if it's not (known) de-duped then the cost
>>> estimate for the semijoin is going to rise some, and that discourages
>>> selecting it.
>>>
>>
>> ​Why does the "hash semi join" care about duplication of values on the
>> inner relation?  Doesn't it only care whether a given bucket exists
>> irrespective of its contents?
>>
>
> Rather, it cares about the contents in so far as confirming that at least
> one of the tuples in the bucket indeed has the same joining value as the
> outer relation (lost track of the fact that two values can share the same
> hash).  But once it finds one it can move onto the new outer relation tuple
> while an inner join would have to spend more time looking for additional
> matches.
>

What I gathered from the code comments from the last time I dug into
something like this, the main reason to try to de-dup and then use a join,
rather than a semi-join, is that doing it that way allows us to swap the
order of the rels in the join, which then opens other avenues for
optimization.  For example, "A join (B semijoin C)" could become "A join (B
join dedupC)" which could become "(dedupC join A) join B".  But if we don't
actually adopt the swap, then it does seem like it should retain the
semi-join.   Why continue digging through the hash collision chain lookin
for key collisions that can't exist? I don't know, maybe there are some
bits set that make it still do semi-join, just doesn't present itself as
such?

Cheers,

Jeff


Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-31 Thread Jeff Janes
On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane  wr
>
>
> regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id
> IN
> :values_clause;
> QUERY PLAN
> 
> ---
>  Aggregate  (cost=245006.16..245006.17 rows=1 width=8) (actual
> time=3550.581..3550.581 rows=1 loops=1)
>  Execution time: 3550.700 ms
>
>


>
> regression=# set enable_hashagg TO 0;
> regression=# set enable_sort TO 0;
> SET
> regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id
> IN
> :values_clause;
>   QUERY PLAN
> 
> ---
>  Aggregate  (cost=320003.90..320003.91 rows=1 width=8) (actual
> time=3548.364..3548.364 rows=1 loops=1)
>  Execution time: 3548.463 ms
>
>


> At least in this example, the actual runtimes are basically identical
> regardless, so there is no great point in sweating over it.
>


Since the run times are equal, but one is estimated to be 30% more
expensive, I think there is at least some little reason to sweat over it.

Incidentally, I accidentally ran this against a server running with your
patch from
https://www.postgresql.org/message-id/10078.1471955...@sss.pgh.pa.us.  On
that server, it did choose the semi-join.  But I have no idea why, as it
seems like the effect of that patch would have been to change the distinct
estimate from the magic hard-coded 200, to the natural 200 coming from the
query itself.  Why would that affect the cost?

Cheers,

Jeff


Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-31 Thread Jeff Janes
On Tue, Jul 25, 2017 at 2:03 AM, Dmitry Lazurkin <dila...@gmail.com> wrote:

> On 25.07.2017 05:50, Jeff Janes wrote:
>
>> It isn't either-or.  It is the processing of millions of rows over the
>> large in-list which is taking the time. Processing an in-list as a hash
>> table would be great, but no one has gotten around to it implementing it
>> yet.  Maybe Dmitry will be the one to do that.
>>
>
> Thanks. Yes, I want to. But... Is this task too complex for a novice?
>

Yes, it is probably too complex for a novice.  On the other hand, you might
not be a novice anymore once you complete it!

Cheers,

Jeff


Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-31 Thread Jeff Janes
On Mon, Jul 24, 2017 at 3:12 PM, Dmitry Lazurkin  wrote:

> On 25.07.2017 00:31, David G. Johnston wrote:
>
>
> Basically you want to write something like:
>
> SELECT *
> FROM ids
> JOIN ( :values_clause ) vc (vid) ON (vc.vid = ids.id)​
>
> or
>
> WITH vc AS (SELECT vid FROM  ORDER BY ... LIMIT )
> SELECT *
> FROM ids
> JOIN vc ON (vid = ids.id)
>
>
> This query uses JOIN plan node as IN (VALUES ...).
>
> And I have one question. I don't understand why IN-VALUES doesn't use
> Semi-Join? PostgreSQL has Hash Semi-Join...  For which task the database
> has node of this type?
>
>
I think it is simply because no one has gotten around to implementing it
that way.  When you can just write it as a values list instead, the
incentive to make the regular in-list work better is not all that strong.

Cheers,

Jeff


Re: [GENERAL] Index Only Scan and Heap Fetches

2017-07-27 Thread Jeff Janes
On Tue, Jul 18, 2017 at 7:21 AM, Mikhail  wrote:

> Hi guys,
>
> I'm running the process, that executes "select * from sr where sr.id=210
> for update;", then some calculations and finally "update sr set usage =
>  where sr.id = 210;". That operation is done in a loop.
>
> In parallel session i'm running the query:
>
> test=# explain (analyze, buffers) select id from sr where id = 210;
> QUERY PLAN
>
> 
> --
> 
> Index Only Scan using sr_pk on sr (cost=0.57..8.59 rows=1 width=4) (actual
> time=0.018..1.172 rows=1 loops=1)
>Index Cond: (id = 210)
>Heap Fetches: 10
>Buffers: shared hit=592
> Planning time: 0.057 ms
> Execution time: 1.183 ms
> Running that several times I can see that the number of "Heap Fetches" is
> varying in some range (from 1 to ~80-100), sequentially growing till
> ~(80-100) then starting from 1 again.
> Considering that the autovacuum process is turned off (for research
> purposes only :) ), I was expecting the infinite growth of Heap
> Fetches since no cleaning of dead rows or visibility map support occurs.
>
> Can someone explain, what else can decrease the number of heap access
> needed to check the rows visibility?
>

Btree indexes have a micro-vacuum feature.  If you visit a heap tuple based
on a reference from an index tuple, and find that the heap tuple is
dead-to-all, then when you get back to the index you can kill that index's
reference to the heap tuple. Future accesses via that same index for the
same tuple then no longer need to visit the heap.

Cheers,

Jeff


Re: [GENERAL] Monitoring of a hot standby with a largely idle master

2017-07-27 Thread Jeff Janes
On Wed, Jul 26, 2017 at 1:44 AM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Mon, Jul 24, 2017 at 9:08 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> > On Sun, Jul 16, 2017 at 8:47 AM, Michael Paquier <
> michael.paqu...@gmail.com>
> > wrote:
> >> What do you think about the patch attached?
> >
> > Looks OK.  Should it mention specifically "On a hot standby" rather than
> "On
> > a standby"?  Otherwise people might be left confused on how they are
> > supposed to do this on a generic standby.  It is the kind of thing which
> is
> > obvious once you know it, but confusing the first time you encounter it.
>
> Yes, right. Let's update as you suggest.
>

new version looks good.

Thanks,

Jeff


Re: [GENERAL] Indexes being ignored after upgrade to 9.5

2017-07-27 Thread Jeff Janes
On Tue, Jul 25, 2017 at 10:34 PM, Nick Brennan  wrote:

> Hi,
>
> We have recently promoted our Prod DB slave (2TB) to migrate to new
> hardware, and upgraded from v9.2.9.21 to 9.5.1.6 using pg_upgrade.
>
>
> The upgrade went without incident and we have been running for a week, but
> the optimizer is ignoring indexes on 2 of our largest partitioned tables
> causing very slow response times.
>
>
> The indexes are Btree indexes on BIGINT columns, which the optimizer used
> to return queries with ms response times on 9.2. Post-upgrade the queries
> sequential scan and do not use indexes unless we force them.
>

Can you show the explain (analyze) plans for both forcing and non-forcing?
And with both 9.5 and the old 9.2, if that is still available.
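
Something along these lines, run both ways, would show what the planner is
doing (the query here is only a stand-in for one of your slow ones):

    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM big_part_table WHERE bigint_col = 12345;

    SET enable_seqscan = off;
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM big_part_table WHERE bigint_col = 12345;
    RESET enable_seqscan;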


>
> We've added duplicate indexes and analyzing, however the new indexes are
> still ignored unless we force using enable_seqscan=no or reduce
> random_page_cost to 2. The query response times using the new indexes are
> still as slow when we do this.
>
Still as slow as what?  As slow as when you use the seq scan, or as slow as
when you used index scans back under 9.2, or as slow as the
non-duplicate indexes were?

Cheers,

Jeff


Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread Jeff Janes
On Jul 24, 2017 14:19, "PT"  wrote:

On Mon, 24 Jul 2017 13:17:56 +0300
Dmitry Lazurkin  wrote:

> On 07/24/2017 01:40 AM, PT wrote:
> > In this example you count approximately 40,000,000 values, which is
> > about 40% of the table.
>
> 4 000 000 (:
>
> > If you really need these queries to be faster, I would suggest
> > materializing the data, i.e. create a table like:
> >
> > CREATE TABLE id_counts (
> >  id BIGINT PRIMARY KEY,
> >  num BIGINT
> > )
> >
> > Then use a trigger or similar technique to keep id_counts in sync
> > with the id table. You can then run queries of the form:
> >
> > SELECT sum(num) FROM id_counts WHERE id IN :values:
> >
> > which I would wager houseboats will be significantly faster.
> I use count only as an example because it uses a seqscan. I want to
> optimize the IN-clause ;-).

The IN clause is not what's taking all the time. It's the processing of
millions of rows that's taking all the time.


It isn't either-or.  It is the processing of millions of rows over the
large in-list which is taking the time. Processing an in-list as a hash
table would be great, but no one has gotten around to implementing it
yet.  Maybe Dmitry will be the one to do that.

Cheers,

Jeff


Re: [GENERAL] Monitoring of a hot standby with a largely idle master

2017-07-24 Thread Jeff Janes
On Sun, Jul 16, 2017 at 8:47 AM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Fri, Jul 14, 2017 at 9:11 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> > On Thu, Jul 13, 2017 at 10:38 AM, Michael Paquier
> > <michael.paqu...@gmail.com> wrote:
> >>
> >> On Thu, Jul 13, 2017 at 7:23 PM, Jeff Janes <jeff.ja...@gmail.com>
> wrote:
> >> >
> >> > I think that pg_stat_wal_receiver should be crossreferenced in
> >> > https://www.postgresql.org/docs/9.6/static/hot-standby.html, near the
> >> > same
> >> > place which it crossreferences table 9-79.  That would make it more
> >> > discoverable.
> >>
> >> Hm. Hot standby may not involve streaming replication. What about a
> >> paragraph here instead?
> >>
> >> https://www.postgresql.org/docs/devel/static/warm-standby.html#streaming-replication
> >>
> >> In the monitoring subsection, we could tell that on a standby the WAL
> >> receiver status can be retrieved from this view when changes are
> >> streamed. What do you think?
> >
> >
> > That works for me.
>
> What do you think about the patch attached?
>

Looks OK.  Should it mention specifically "On a hot standby" rather than
"On a standby"?  Otherwise people might be left confused on how they are
supposed to do this on a generic standby.  It is the kind of thing which is
obvious once you know it, but confusing the first time you encounter it.



>
>  
>   You can retrieve a list of WAL sender processes via the
> - 
> + 
>   pg_stat_replication view. Large differences
> between
> In the previous paragraph I have noticed that the link reference is
> incorrect. pg_stat_replication is listed under
> monitoring-stats-dynamic-views-table.
>

Yes, that is clearly wrong.  But why not link directly to the description
of the view itself, pg-stat-replication-view, rather than the correct table
which mentions the view?  Is that the accepted docs style to link to the
more generic place?  (Same thing applies to your patch, it could link
directly to pg-stat-wal-receiver-view.)

Sorry for the delay, it took me a while to get the new doc build system to
work (solution seems to be, "Don't use CentOS6 anymore")

Cheers,

Jeff


Re: [GENERAL] Monitoring of a hot standby with a largely idle master

2017-07-14 Thread Jeff Janes
On Thu, Jul 13, 2017 at 10:38 AM, Michael Paquier <michael.paqu...@gmail.com
> wrote:

> On Thu, Jul 13, 2017 at 7:23 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> >
> > I think that pg_stat_wal_receiver should be crossreferenced in
> > https://www.postgresql.org/docs/9.6/static/hot-standby.html, near the
> same
> > place which it crossreferences table 9-79.  That would make it more
> > discoverable.
>
> Hm. Hot standby may not involve streaming replication. What about a
> paragraph here instead?
> https://www.postgresql.org/docs/devel/static/warm-standby.html#streaming-replication
>
> In the monitoring subsection, we could tell that on a standby the WAL
> receiver status can be retrieved from this view when changes are
> streamed. What do you think?
>

That works for me.

Cheers,

Jeff


Re: [GENERAL] LDAP authentication without OU in ldapbasedn

2017-07-13 Thread Jeff Janes
On Thu, Jul 13, 2017 at 2:46 AM, Gregory Nicol  wrote:

> Good morning all,
>
>
>
> I can’t seem to get LDAP Authentication working without an OU in the
> ldapbasedn. My users are spread across multiple OUs without a common root
> OU which is why I’m trying to authenticate with just the DC.
>
>

Have you tried using the "simple bind mode" where you specify just the
ldapsuffix and the ldapserver?
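
A simple-bind entry in pg_hba.conf looks roughly like this (server name,
suffix, and address range are placeholders):

    host  all  all  192.168.0.0/24  ldap ldapserver=ldap.example.com ldapprefix="" ldapsuffix="@example.com"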

Cheers,

Jeff


Re: [GENERAL] Monitoring of a hot standby with a largely idle master

2017-07-13 Thread Jeff Janes
On Thu, Jul 13, 2017 at 1:15 AM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Thu, Jul 13, 2017 at 5:26 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> >
> > I think that none of the recovery information functions
> > (https://www.postgresql.org/docs/9.6/static/functions-admin.
> html#FUNCTIONS-RECOVERY-INFO-TABLE)
> > can distinguish a hot standby which is connected to an idle master,
> versus
> > one which is disconnected.  For example, because the master has crashed,
> or
> > someone has changed the firewall rules.
> >
> > Is there a way to monitor from SQL the last time the standby was able to
> > contact the master and initiate streaming with it?  Other than trying to
> > write a function that parses it out of pg_log?
>
> Not directly I am afraid. One way I can think about is to poll
> periodically the state of pg_stat_replication on the primary or
> pg_stat_wal_receiver on the standby and save it in a custom table. The
> past information is not persistent as any replication-related data in
> catalogs is based on the shared memory state of the WAL senders and
> the WAL receiver, and those are wiped out at reconnection.
>

Thanks, that looks like what I want (or will be, once I get the other side
to upgrade to 9.6).
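
For reference, on 9.6 the standby-side check can be as simple as this; no row
at all means the WAL receiver is not currently running, which is exactly the
disconnected case:

    SELECT status, last_msg_receipt_time, latest_end_time, conninfo
    FROM pg_stat_wal_receiver;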

I think that pg_stat_wal_receiver should be crossreferenced in
https://www.postgresql.org/docs/9.6/static/hot-standby.html, near the same
place which it crossreferences table 9-79.  That would make it more
discoverable.

Cheers,

Jeff


[GENERAL] Monitoring of a hot standby with a largely idle master

2017-07-12 Thread Jeff Janes
I think that none of the recovery information functions (
https://www.postgresql.org/docs/9.6/static/functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE)
can distinguish a hot standby which is connected to an idle master, versus
one which is disconnected.  For example, because the master has crashed, or
someone has changed the firewall rules.

Is there a way to monitor from SQL the last time the standby was able to
contact the master and initiate streaming with it?  Other than trying to
write a function that parses it out of pg_log?

Cheers,

Jeff


[GENERAL] debugging SSL connection problems

2017-07-10 Thread Jeff Janes
Is there a way to get libpq to hand over the certificate it gets from the
server, so I can inspect it with other tools that give better diagnostic
messages?  I've tried to scrape it out of the output of "strace -s8192",
but since it is binary it is difficult to figure out where it begins and
ends within the larger server response method.

Thanks,

Jeff


Re: [GENERAL] Strange case of database bloat

2017-07-05 Thread Jeff Janes
On Tue, Jul 4, 2017 at 10:18 PM, Chris Travers 
wrote:

>
> Questions
> ===
> I assume that it is the fact that rows update frequently which is the
> problem here? But why doesn't Postgres re-use any of the empty disk pages?
>

Can you install the contrib extension pg_freespacemap and use "select *
from pg_freespace('table_name')" to see if PostgreSQL agrees that the space
is re-usable?
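
Something like this (a sketch; the table name is a placeholder, and avail is
only an approximation of the free bytes per page):

    CREATE EXTENSION pg_freespacemap;

    SELECT count(*) AS pages, sum(avail) AS approx_free_bytes
    FROM pg_freespace('table_name');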

Cheers,

Jeff


Re: [GENERAL] Re: have trouble understanding xmin and xmax with update operations from two different sessions

2017-07-04 Thread Jeff Janes
On Mon, Jul 3, 2017 at 10:39 AM, rajan  wrote:

> Thanks, Jeff.
>
> Now I am going back to my old question.
>
> Even though *Session 2* fails to update with an UPDATE 0 message, its txid
> is saved in the xmax of the tuple updated by *Session 1*.
>
> As it becomes an old txid, how come new txids are able to view it?
>

The database can see everything.  That is its job.  It constructs the
principles of ACID out of non-ACID components.  But once you use
pageinspect or select the system columns mxin and xmax, you start to peek
through that illusion.

Cheers,

Jeff


Re: [GENERAL] Re: have trouble understanding xmin and xmax with update operations from two different sessions

2017-07-03 Thread Jeff Janes
On Mon, Jul 3, 2017 at 3:02 AM, rajan  wrote:

> Thanks for the explanation.
>
> will I be able to view the information using this function,
> SELECT * FROM heap_page_items(get_raw_page('testnumbers', 0));
>
> Also, please let me know which column I should refer for viewing the
> pointer.
>


It is 't_ctid'
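
For instance (a sketch, assuming the pageinspect extension is installed and
using the table from your earlier message):

    SELECT lp, t_xmin, t_xmax, t_ctid
    FROM heap_page_items(get_raw_page('testnumbers', 0));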

Cheers,

Jeff


Re: [GENERAL] Re: have trouble understanding xmin and xmax with update operations from two different sessions

2017-07-02 Thread Jeff Janes
On Sat, Jul 1, 2017 at 8:55 PM, rajan  wrote:

> Thanks, Jeff. That helps understanding it 50%.
>
> *Session 2* fails to UPDATE the record which is in *(0,2)* and this tuple
> is
> marked for deletion. It means that *(0,2) never exists* when Session 2 is
> trying to perform the update.
>

That it never exists is an appearance presented to the user.  The database
system works hard to maintain that illusion but the database system itself
sees through the illusion.  It blocks on (0,2) waiting for session 1 to
commit, and then once that happens session 2 goes and finds the new version
of that row ((0,4) in this case) and locks it.  If you use pageinspect, you
can see that (0,2) has left a pointer behind pointing to (0,4) to make it
easy to find.

Cheers,

Jeff


Re: [GENERAL] Re: have trouble understanding xmin and xmax with update operations from two different sessions

2017-07-01 Thread Jeff Janes
On Sat, Jul 1, 2017 at 6:32 PM, rajan  wrote:

> hello,
>
> thanks for replies, Adrian, Steven.
>
> >So calling it can advance the xid manually. Some testing here showed
> >that what xmin or xmax is created depends on when you call txid_current
> >in either the original session or the concurrent sessions.
>
> I understand this and I am executing my statements inside a Transaction
> block so the xid is not incremented when calling it.
>
> >Also worth noting that an UPDATE in Postgres is a DELETE/INSERT process.
> >The clue is the ctid value. In  Session 2 you are looking at the
> >original row(ctid=(0, 2) which has been marked as deleted(non-zero
> >xmax). In Session 3 you are looking at the new row(ctid(0, 4)).
>
> Yes. But why (ctid(0,4)) in *Session 3* carries the xmax of the txid 519115
> in which the update failed with *UPDATE 0* . This is where I can not
> understand,
> 1. Row (0,4) is updated with correct value and (0,3) is not visible in
> Session 2, which is good.
> 2. but in *Session 3* (0,4) also carries xmax which means what? Is it also
> marked for deletion? It can't be, right?
>

When session 2 encounters the locked row which meets the criterion for the
update, it has to wait for the locking transaction to finish.  At that
point it locks the row (by writing its transaction ID into the xmax, and
setting a flag not visible to you, unless you use pageinspect) and then
re-evaluates if it still meets the criterion.  Since it doesn't meet the
criterion anymore, it doesn't finish updating the tuple.

Cheers,

Jeff


Re: [GENERAL] 9.6 parameters messing up my 9.2 pg_dump/pg_restore

2017-06-29 Thread Jeff Janes
On Thu, Jun 29, 2017 at 12:05 AM, Ken Tanzer  wrote:

> Thanks for the responses.  For me, using the 9.2 binary was the winner.
> Shoulda thought of that!
>
> On Wed, Jun 28, 2017 at 1:30 PM, Tom Lane  wrote:
>
>>
>> Generally speaking, it helps a lot if you don't insist on restoring the
>> output in a single transaction.  In this case, that would allow the
>> restore to ignore the new parameters and move on.
>>
>> regards, tom lane
>>
>
> Well sure, I can see it increases your chances of getting _something_
> restored.  But there's also a lot to be said for ensuring that _all_ your
> data restored, and did so correctly, no?
>

Record the errors, and look through them to decide if they are important or
not.

But better yet, use v9.2 of pg_dump to dump things out of a 9.2 server
which you want to load to another 9.2 server.  Don't be at the mercy of
your $PATH.

(Or even more better yet, upgrade the servers from 9.2 to 9.6, and then use
9.6's pg_dump)

Cheers,

Jeff


Re: [GENERAL] 10beta1 role

2017-06-22 Thread Jeff Janes
On Thu, Jun 22, 2017 at 1:39 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Thu, Jun 22, 2017 at 1:34 PM, Adrian Klaver <adrian.kla...@aklaver.com>
> wrote:
>
>> On 06/22/2017 01:29 PM, Jeff Janes wrote:
>>
>>> On Thu, Jun 22, 2017 at 1:22 PM, Adrian Klaver <
>>> adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>> wrote:
>>>
>>> On 06/22/2017 01:13 PM, Jeff Janes wrote:
>>>
>>>
>>>
>>> But he is seeing the message when he starts the database, not when he
>>> does initdb of it.
>>>
>>
>> Hmm, on my machine:
>>
>>
>
>
>> /usr/local/pgsql10/bin/pg_ctl -D pg100/ -l logfile start
>> waiting for server to start done
>> server started
>>
>
>
> But look inside "logfile".
>
>
>>
>> /usr/local/pgsql10/bin/psql -d postgres -U aklaver -p 5472
>> psql (10beta1)
>> Type "help" for help.
>>
>> postgres=# \du
>>
>
> Right, the database starts.  pg_ctl just can't prove it started, because
> it doesn't know how to connect to it.
>
>
>
Correction, it can prove it started, because when it connects as the wrong
user, it gets an error message which it would only get if the database is
up.  But in the process, it leaves a message behind in the server's log
file.

Cheers,

Jeff


Re: [GENERAL] 10beta1 role

2017-06-22 Thread Jeff Janes
On Thu, Jun 22, 2017 at 1:34 PM, Adrian Klaver <adrian.kla...@aklaver.com>
wrote:

> On 06/22/2017 01:29 PM, Jeff Janes wrote:
>
>> On Thu, Jun 22, 2017 at 1:22 PM, Adrian Klaver <adrian.kla...@aklaver.com
>> <mailto:adrian.kla...@aklaver.com>> wrote:
>>
>> On 06/22/2017 01:13 PM, Jeff Janes wrote:
>>
>>
>>
>> But he is seeing the message when he starts the database, not when he
>> does initdb of it.
>>
>
> Hmm, on my machine:
>
>


> /usr/local/pgsql10/bin/pg_ctl -D pg100/ -l logfile start
> waiting for server to start done
> server started
>


But look inside "logfile".


>
> /usr/local/pgsql10/bin/psql -d postgres -U aklaver -p 5472
> psql (10beta1)
> Type "help" for help.
>
> postgres=# \du
>

Right, the database starts.  pg_ctl just can't prove it started, because it
doesn't know how to connect to it.

Cheers,

Jeff


Re: [GENERAL] 10beta1 role

2017-06-22 Thread Jeff Janes
On Thu, Jun 22, 2017 at 1:22 PM, Adrian Klaver <adrian.kla...@aklaver.com>
wrote:

> On 06/22/2017 01:13 PM, Jeff Janes wrote:
>
>> On Thu, Jun 22, 2017 at 12:06 PM, Ray Stell <ste...@vt.edu > ste...@vt.edu>> wrote:
>>
>> I used "initdb -U" to specify an alternate superuser.  On startup it
>> throws these msgs:
>>
>> 2017-06-22 14:36:34 EDT,0,startup FATAL:  28000: role "postgresql"
>> does not exist
>>
>> 2017-06-22 14:36:34 EDT,0,startup LOCATION: InitializeSessionUserId,
>> miscinit.c:503
>>
>>
>> Earlier versions do the same thing if you start them with the wait option
>> (-w).
>>
>> The difference is that wait is now the default, and you use -W to turn it
>> off.
>>
>
> The would seem to work for the pg_ctl init[db] mode, however the OP is
> using the plain initdb where -W is:
>
> -W
> --pwprompt
>

But he is seeing the message when he starts the database, not when he does
initdb of it.

Cheers,

Jeff


Re: [GENERAL] 10beta1 role

2017-06-22 Thread Jeff Janes
On Thu, Jun 22, 2017 at 12:06 PM, Ray Stell  wrote:

> I used "initdb -U" to specify an alternate superuser.  On startup it
> throws these msgs:
>
> 2017-06-22 14:36:34 EDT,0,startup FATAL:  28000: role "postgresql" does
> not exist
>
> 2017-06-22 14:36:34 EDT,0,startup LOCATION: InitializeSessionUserId,
> miscinit.c:503


Earlier versions do the same thing if you start them with the wait option
(-w).

The difference is that wait is now the default, and you use -W to turn it
off.

With the wait option in use, when starting up the server pg_ctl keeps
trying to connect to the server so once it is running, it can report
success. But it doesn't know who to connect as, so it just uses the default.

Cheers,

Jeff


Re: [GENERAL] autovacuum holds exclusive lock on table preventing it from to be updated

2017-06-19 Thread Jeff Janes
On Mon, Jun 19, 2017 at 10:33 AM, Dmitry O Litvintsev 
wrote:

> Hi
>
> Since I have posted this nothing really changed. I am starting to panic
> (mildly).
>
> The source (production) runs :
>
>   relname   |   mode   | granted |
> substr|
> query_start  |  age
> +--+
> -+--
> +---+
>  t_inodes_iio_idx   | RowExclusiveLock | t   |
> autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound)   |
> 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
>


This is close to unreadable.  You can use \x to get output from psql
which survives email more readably.

Your first report was 6 days ago.  Why is the job only 4 days old?  Are you
frequently restarting your production server, so that the vacuum job never
gets a chance to finish?  If so, that would explain your predicament.

And how big is this table, that it takes at least 4 days to VACUUM?

vacuum_cost_delay = 50ms
>

That is a lot.  The default value for this is 0.  The default value
for autovacuum_vacuum_cost_delay is 20, which is usually too high for giant
databases.

I think you are changing this in the wrong direction.  Rather than increase
vacuum_cost_delay, you need to decrease autovacuum_vacuum_cost_delay, so
that you won't keep having problems in the future.


On your test server, change vacuum_cost_delay to zero and then initiate a
manual vacuum of the table.  It will block on the autovacuum's lock, so
then kill the autovacuum (best to have the manual vacuum queued up first,
otherwise it will be a race between when you start the manual vacuum and
when the autovacuum automatically restarts, to see who gets the lock). See
how long it takes this unthrottled vacuum to run, and how much effect the
IO it causes has on the performance of other tasks.  If acceptable, repeat
this on production (although really, I don't think you have much of a choice
about whether the effect is acceptable or not--it needs to be done).
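
In SQL terms, the test-server run would look something like this (a sketch;
the ALTER TABLE value at the end is only an illustration of how to make
autovacuum less timid on this table going forward):

SET vacuum_cost_delay = 0;
VACUUM VERBOSE ANALYZE public.t_inodes;
-- afterwards, so autovacuum keeps up on its own:
ALTER TABLE public.t_inodes SET (autovacuum_vacuum_cost_delay = 2);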

Cheers,

Jeff


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Jeff Janes
On Mon, Jun 19, 2017 at 8:49 AM, Merlin Moncure <mmonc...@gmail.com> wrote:

> On Mon, Jun 19, 2017 at 10:36 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> > If you have a RAID, set it to the number of spindles in your RAID and
> forget
> > it. It is usually one of the less interesting knobs to play with.
> (Unless
> > your usage pattern of the database is unusual and exact fits the above
> > pattern.)
>
> Isn't that advice obsolete in a SSD world though?  I was able to show
> values up to 256 for a single device provided measurable gains for a
> single S3500.  It's true though that the class of queries that this
> would help is pretty narrow.


I don't think it is obsolete, you just have to be creative with how you
interpret 'spindle' :)

With a single laptop hard-drive, I could get improvements of about 2 fold
by setting it to very high numbers, like 50 or 80. By giving the hard drive
the option of dozens of different possible sectors to read next, it could
minimize head-seek.  But that is with just one query running at a time.
With multiple queries all running simultaneously all trying to take
advantage of this, performance gains quickly fell apart.  I would expect
the SSD situation to be similar to that, where the improvements are
measurable but also fragile, but I haven't tested it.

Cheers,

Jeff


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Jeff Janes
On Sun, Jun 18, 2017 at 7:09 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Sun, Jun 18, 2017 at 6:02 PM, Patrick B 
> wrote:
>
>> Hi guys.
>>
>> I just wanna understand the effective_io_concurrency value better.
>>
>> My current Master database server has 16 vCPUS and I use
>> ​​
>>  effective_io_concurrency = 0.
>>
>
> ​It seems as though the number of virtual CPUs little to no bearing on
> whether, or to what value, you should set this parameter.  Obviously with
> only one CPU parallelism wouldn't be possible (I'm assuming a single query
> does not make multiple parallel requests for data)
>

Ah, but it does.  That is exactly what this parameter is for.

Unfortunately, it is only implemented in very narrow circumstances.  You
have to be doing bitmap index scans of many widely scattered rows to make
it useful.  I don't think that this is all that common of a situation.  The
problem is that at every point in the scan, it has to be possible to know
what data block it is going to want N iterations in the future, so you can
inform the kernel to pre-fetch it.  That is only easy to know for bitmap
scans.
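
If you want to check whether a particular query is even eligible, look for a
bitmap heap scan in its plan.  A rough illustration -- the table, column, and
values here are made up:

SET effective_io_concurrency = 8;   -- e.g. the number of spindles
EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM big_table WHERE scattered_col BETWEEN 100 AND 200;
-- prefetching only happens if the plan contains a "Bitmap Heap Scan"
-- fed by a "Bitmap Index Scan"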

If you have a RAID, set it to the number of spindles in your RAID and
forget it. It is usually one of the less interesting knobs to play with.
 (Unless your usage pattern of the database is unusual and exactly fits the
above pattern.)


Cheers,

Jeff


Re: [GENERAL] workaround for column cross-correlation

2017-06-12 Thread Jeff Janes
On Mon, Jun 12, 2017 at 8:17 PM, Justin Pryzby  wrote:

> I know PG 10 will have support "CREATE STATISTICS.." for this..
>
> ..but I wondered if there's a recommended workaround in earlier versions ?
>

Not without seeing the query.


>
> 2) memory explosion in hash join (due to poor estimate?) caused OOM.
>

As far as I know, the only way a hash join should do this is if the join
includes a huge number of rows with exactly the same 32 bit hash codes.
Otherwise, it should spill to disk without causing OOM.  Hash aggregates,
on the other hand, are a different matter.

 Cheers,

Jeff


Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-09 Thread Jeff Janes
On Fri, Jun 9, 2017 at 9:13 AM, Eric Lemoine 
wrote:

> On 06/08/2017 10:41 PM, Éric wrote:
> >
> >
> >
> >> Have you experimented with other queries that don't involve PostGIS?
> >> I'm wondering if your hook-installation code fails to work properly
> >> unless PostGIS was loaded first.  This would be easier to credit if
> >> there are hooks both extensions try to get into.
> >
> >
> > I think you're right on Tom. It looks like I cannot reproduce the issue
> if I start by calling a PostGIS function rather than a Pointcloud function.
> So it may well be a conflict between PostGIS and Pointcloud. Both use
> fn_extra, and that makes we wonder. This old thread [*] makes me wonder
> too! I still need to figure out the bug, but I can see some light now!
> thanks
> >
> > [*]  984D0F47C5FF4D0DB0D71A4F6EF670ED%40cleverelephant.ca#
> 984d0f47c5ff4d0db0d71a4f6ef67...@cleverelephant.ca>
>
>
> I now think that the performance bug is not related to the fn_extra
> thing. I had hope but not anymore :) I don't see where the Pointcloud
> and PostGIS extensions could conflict.
>

Can you run 'perf top' on the slow query?  That might pretty quickly tell
you which function is taking up your time.

Cheers,

Jeff


Re: [GENERAL] Why is posgres picking a suboptimal plan for this query?

2017-05-24 Thread Jeff Janes
On Wed, May 24, 2017 at 1:42 PM, Sam Saffron  wrote:

> I have this query that is not picking the right index unless I hard code
> dates:
>
>
> SELECT "topics".* FROM "topics"
> WHERE topics.last_unread_at >= '2017-05-11 20:56:24'
>
> "Index Scan using index_topics_on_last_unread_at on topics
> (cost=0.41..8.43 rows=1 width=725) (actual time=0.005..0.065 rows=5
> loops=1)"
> " Index Cond: (last_unread_at >= '2017-05-11 20:56:24'::timestamp
> without time zone)"
> "Planning time: 0.136 ms"
> "Execution time: 0.087 ms"
>


PostgreSQL knows that few entries come after 2017-05-11 (it thinks 1,
actually 5) and comes up with a plan which works well for that situation.

SELECT "topics".* FROM "topics"
> WHERE topics.last_unread_at >= (select first_topic_unread_at from
> user_stats us where us.user_id = 1)
>

> "Seq Scan on topics (cost=8.30..24030.12 rows=20555 width=725) (actual
> time=3.186..59.636 rows=5 loops=1)"
> " Filter: (last_unread_at >= $0)"
> " Rows Removed by Filter: 61660"
> " InitPlan 1 (returns $0)"
> " -> Index Scan using user_stats_pkey on user_stats us
> (cost=0.29..8.30 rows=1 width=8) (actual time=0.008..0.009 rows=1
> loops=1)"
> " Index Cond: (user_id = 1)"
> "Planning time: 0.147 ms"
> "Execution time: 59.671 ms"
>


At the time PostgreSQL plans this query, it doesn't know what the answer to
the subquery is going to be.  Not having the true answer at its fingertips,
it guesses that one third of the table is going to fall after the results
of that subquery.

Maybe it should first execute the subquery and then re-plan the rest of the
query based on the results.  But there is no provision for it to do that,
and no concrete plans (that I know of) to implement such a thing.


>
> The results here simply do not make sense to me, should I be piping
> dates in here to avoid this issue and running 2 queries instead of 1?
>

That is the most pragmatic approach.  It isn't very nice, but the
alternatives are worse.
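
Something along these lines, using your tables (the timestamp in the second
statement is whatever the first one returned, inlined as a literal or bind
parameter):

SELECT first_topic_unread_at FROM user_stats WHERE user_id = 1;
-- then, in a second round trip, with the returned value plugged in:
SELECT topics.* FROM topics
WHERE topics.last_unread_at >= '2017-05-11 20:56:24';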

Cheers,

Jeff


Re: [GENERAL] Encrypt with BouncyCastle and decrypt with pg_pub_decrypt

2017-05-19 Thread Jeff Janes
On Thu, May 18, 2017 at 4:57 PM, Kang, Kamal 
wrote:

> Hi all,
>
>
>
> I am trying to encrypt a string using Bouncy Castle PGP Java apis, Base64
> encode the encrypted string and then decrypt using pg_pub_decrypt but it is
> failing with error “Wrong Key”. Just wanted to know if this is doable or
> pg_pub_decrypt only works with encrypted strings from pg_pub_encrypt?
>


pg_pub_decrypt is compatible with gpg, so if Bouncy Castle is also
compatible with gpg I don't see why it wouldn't also work.  Without more
information, it is hard to provide more advice.  Encrypt a dummy payload
with a dummy key and show us what you get and what you do with it.
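
For comparison, a round trip done entirely inside PostgreSQL looks roughly
like this.  This is a sketch: it assumes the pgcrypto extension is installed
and that :pubkey / :privkey are psql variables holding ASCII-armored keys you
exported yourself.

SELECT pgp_pub_decrypt(
         pgp_pub_encrypt('dummy payload', dearmor(:'pubkey')),
         dearmor(:'privkey'));
-- if the Java side base64-encodes the ciphertext, decode it first:
--   pgp_pub_decrypt(decode(:'ciphertext_b64', 'base64'), dearmor(:'privkey'))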

Cheers,

Jeff


Re: [GENERAL] database is not accepting commands

2017-05-17 Thread Jeff Janes
On Tue, May 16, 2017 at 1:28 AM, reem  wrote:

> We have 1.5 TB database that's shown an error and block all commands.
> The error is :
> "ERROR:  database is not accepting commands to avoid wraparound data loss
> in
> database "dbname"
> HINT:  Stop the postmaster and use a standalone backend to vacuum that
> database.
> You might also need to commit or roll back old prepared transactions."
>
> I tried to do vacuum in the backend mode. Also I tried to set
> zero_damaged_pages = on then do the vacuum again but same error appeared.
> The error appeared after two hours of vacuuming where verbose shows passing
> tables.
>

I don't see any way that error message can be generated while in standalone
mode.

Are you sure you sure you don't have multiple instances running, and you
are mixing up the logs between them?

Cheers,

Jeff


Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-05 Thread Jeff Janes
On Wed, May 3, 2017 at 3:57 AM, Thomas Güttler <guettl...@thomas-guettler.de
> wrote:

> Am 02.05.2017 um 05:43 schrieb Jeff Janes:
>
>> On Sun, Apr 30, 2017 at 4:37 AM, Thomas Güttler <
>> guettl...@thomas-guettler.de <mailto:guettl...@thomas-guettler.de>>
>> wrote:
>>
>> Is is possible that PostgreSQL will replace these building blocks in
>> the future?
>>
>>  - redis (Caching)
>>
>>
>> PostgreSQL has its own caching.  It might not be quite as effective as
>> redis', but you can us it if you are willing to
>> take those trade offs.
>>
>
> What kind of caching does PG offer?
>

It has shared_buffers to cache the data it needs frequently (not query
results, but the data needed to produce the results), and also uses the
file systems cache.  This is what I am referring to.  I wouldn't recommend
using PostgreSQL simply as a cache for something else, if you don't want
any other features of the database.  But if you want to throw Redis up as a
layer of cache in front of PostgreSQL, maybe you should first see if that
RAM, and a bit of tuning, can be used to make PostgreSQL fast enough to not
need the Redis cache.


>
>>
>>
>>  - s3 (Blob storage)
>>
>>
>>
>
> No.  You can certainly use PostgreSQL to store blobs.  But then, you need
>> to store the PostgreSQL data **someplace**.
>> If you don't store it in S3, you have to store it somewhere else.
>>
>
> I don't understand what you mean here. AFAIK storing blobs in PG is not
> recommended since it is not very efficient.
>

If the metadata is stored in PG and the blobs themselves are stored
individually S3, you have a transaction atomicity problem. Solving that is
not likely to be very efficient, either.  You have to pick your poison.

Cheers,

Jeff


Re: [GENERAL] Can PG replace redis, amqp, s3 in the future?

2017-05-01 Thread Jeff Janes
On Sun, Apr 30, 2017 at 4:37 AM, Thomas Güttler <
guettl...@thomas-guettler.de> wrote:

> Is is possible that PostgreSQL will replace these building blocks in the
> future?
>
>  - redis (Caching)
>

PostgreSQL has its own caching.  It might not be quite as effective as
redis', but you can use it if you are willing to take those trade-offs.

 - rabbitmq (amqp)
>

PostgreSQL has its own system for this, and other ones can be layered on
top of fully transactional tables.
Again, you can use one or the other, depending on your needs, if you are
willing to deal with the trade offs.



>  - s3 (Blob storage)
>

No.  You can certainly use PostgreSQL to store blobs.  But then, you need
to store the PostgreSQL data **someplace**.  If you don't store it in S3,
you have to store it somewhere else.

Cheers,

Jeff


Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-19 Thread Jeff Janes
Please don't top-post, thanks.

On Tue, Apr 18, 2017 at 3:18 PM, Rj Ewing  wrote:

> On Tue, Apr 18, 2017 at 3:00 PM, Bruce Momjian  wrote:
>
>>
>> Full text search of JSON and JSONB data is coming in Postgres 10, which
>> is to to be released in September of this year:
>>
>> https://www.depesz.com/2017/04/04/waiting-for-postgresql-10-
>> full-text-search-support-for-json-and-jsonb/
>
>

A step in the right direction for me, however it doesn't appear to support
> per field full text searching.
> It is exciting though!
>


Your best bet might be to ignore the per-field searching in the initial
(indexed) pass of the query to get everything that has all the search
terms, regardless of which field they occur in.  And the re-check whether
each of the found values was found in the appropriate field in a later pass.

Something like

select * from sample where
 to_tsvector(json_thing->>:key1) @@ :value1
  and to_tsvector(json_thing->>:key2) @@ :value2
  and to_tsvector('english',json_thing) @@ (:value1 || :value2)

From the initial email:

> An idea that has come up is to use a materialized view or secondary table
with triggers, where we would have 3 columns (id, key, value).

How would this be different from the "triple store" you are abandoning?

Cheers,

Jeff


Re: [GENERAL] Why so long?

2017-04-19 Thread Jeff Janes
On Wed, Apr 19, 2017 at 8:24 AM, Steve Clark 
wrote:

> Hello,
>
> I am confused. I have a table that has an incrementing primary key id.
>
> When I select max(id) from table is returns almost instantly but
> when I select min(id) from table it takes longer than I want to wait.
>
> Shouldn't postgresql be able to quickly find the minimum id value in the
> index?
>

Not if the low end of the index is stuffed full of obsolete entries, which
haven't been cleaned up because it is not being vacuumed often enough.

Do you have autovacuum on?  Have you manually vacuumed the table recently?
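
A quick way to check (a sketch; 'your_table' is a placeholder name):

SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'your_table';
-- and if n_dead_tup is huge:
VACUUM your_table;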

Cheers,

Jeff


Re: [GENERAL] streaming replication and archive_status

2017-04-18 Thread Jeff Janes
On Tue, Apr 18, 2017 at 5:20 AM, Luciano Mittmann 
wrote:

>
>
> Hi Jeff,
>
> **Does each file in pg_xlog/archive_status/ have a corresponding file one
> directory up?
>
> no corresponding file on pg_xlog directory. That is the question.. for
> some reason or some parameter that I do not know, the files are considered
> consumed but are not erased later.
>

I can see how a well-timed crash could leave behind a few .done files, but
not 75 thousand of them.

Are they still accumulating, or was it only an historical accumulation?

Also, is this on Windows?

Cheers,

Jeff


Re: [GENERAL] # of connections and architecture design

2017-04-18 Thread Jeff Janes
On Tue, Apr 18, 2017 at 2:42 AM, Moreno Andreo 
wrote:

> Hi all,
> As many of you has read last Friday (and many has tired to help, too,
> and I still thank you very much), I had a bad service outage.
> I was pointed to reduce number of maximum connections using a pooler, and
> that's what I'm building in test lab, but I'm wondering if there's
> something I can do with my overall architecture design.
> ATM we host one database per customer (about 400 now) and every customer
> has two points of access to data:
> - Directly to database, via rubyrep, to replicate the database he has in
> his own machine
> - Wia WCF self-hosted web services to read other customers data
> Every customer can access (and replicate) his database from a number of
> different positions (max 3).
> Customers are organized in groups (max 10 per group), and there is the
> chance that someone accesses someone else's data via WCF.
> For example, pick up a group of 5: everyone running rubyrep with only one
> position enabled, and getting data from others' database.
> If I'm not wrong, it's 5 connections (rubyrep) plus 5*4 (everyone
> connecting to everyone else's database) for WCF, so 25 connections
> Now imagine a group of 10
> Last friday I've been told that 350 connections is quite a big number and
> things can begin to slow down. Ok. When something slows down I'm used to
> search and find the bottleneck (CPU, RAM, IO, etc). If everything was
> running apparently fine (CPU < 10%, RAM used < 20%, I/O rarely over 20%),
> how can I say there's a bottleneck that's slowing down things? Am I missing
> something?
> Another thing is that on a customer server (with a similar, smaller
> architecture)  I _do_ have a connection leak problem that's under
> investigation, but when things begin to slow down I simply run a
> pg_terminate_backend on all connection with an age > 10 min and everything
> goes back to normal. On my server, last friday, it did not help, so I
> thought that was not the main problem.
> I've got no problems in splitting this architecture in how many servers I
> need, but I think I need some tips on how to design this, in order to avoid
> major issues in the near future (ask for details if needed).
>
> The current PostgreSQL 9.5.6 server is an 8 core VM with 52 GB RAM and
> Debian 8.
> WCF server is Windows 2012 R2 4-core, 16 GB RAM.
>
> While facing the issue none of them showed up any kind of overload and
> their logs were clean.
>
> I'm a bit scared it can happen again.
>

The logs being clean doesn't help much, if your log settings are set to be
too terse.

Is log_lock_waits on?  log_checkpoints?  track_io_timing (doesn't show up
in the logs, you have to query database views)?

Is log_min_duration_statement set to a reasonable value?
 log_autovacuum_min_duration?

Are you using pg_stat_statement (also doesn't show up in the logs, you have
to query it), and perhaps auto_explain?
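
For the logging settings, you can flip them on that 9.5 server without a
restart, something like the following (the thresholds are just
illustrations; pg_stat_statements and auto_explain still need
shared_preload_libraries and a restart):

ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET track_io_timing = on;
ALTER SYSTEM SET log_min_duration_statement = '1s';
ALTER SYSTEM SET log_autovacuum_min_duration = '1s';
SELECT pg_reload_conf();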

Cheers,

Jeff


Re: [GENERAL] streaming replication and archive_status

2017-04-17 Thread Jeff Janes
2017-04-17 17:08 GMT-03:00 Jeff Janes <jeff.ja...@gmail.com>:

> On Mon, Apr 17, 2017 at 12:22 PM, Luciano Mittmann <mittm...@gmail.com>
>  wrote:
>
>> Hi All,
>>
>> anyone knows why there are so many files in the directory
>> pg_xlog/archive_status/ in replication server?
>>
>> # pg_xlog/archive_status/ | wc -l
>>
>> 75217
>>
>> Is possible to clean this .done files or just don't need to worry ?
>>
>> It's not occurs on primary or standalone servers, just on replication.
>>
>
> What version?  Are you logging checkpoints, and if so what do they say?
>
>
On Mon, Apr 17, 2017 at 1:24 PM, Luciano Mittmann <mittm...@gmail.com>
wrote:

> Hi Jeff,
>
> checkpoint message on standby node:
>
> [ 2017-04-17 17:21:56 BRT] @ LOG:  restartpoint complete: wrote 21475
> buffers (2.6%); 0 transaction log file(s) added, 0 removed, 0 recycled;
> write=149.816 s, sync=0.064 s, total=149.890 s; sync files=314,
> longest=0.002 s, average=0.000 s; distance=145449 kB, estimate=236346 kB
> [ 2017-04-17 17:21:56 BRT] @ LOG:  recovery restart point at 126/A7072A88
> [ 2017-04-17 17:21:56 BRT] @ DETAIL:  last completed transaction was at
> log time 2017-04-17 17:21:02.289164-03
>

I wish the checkpoint logging code would tell you how many transaction log
files it intentionally retained as well.  I've not seen it "accidentally"
retain files, but seeing the number logged would help simplify
troubleshooting.

Does each file in pg_xlog/archive_status/ have a corresponding file one
directory up?

Cheers,

Jeff


Re: [GENERAL] streaming replication and archive_status

2017-04-17 Thread Jeff Janes
On Mon, Apr 17, 2017 at 12:22 PM, Luciano Mittmann 
wrote:

> Hi All,
>
> anyone knows why there are so many files in the directory
> pg_xlog/archive_status/ in replication server?
>
> # pg_xlog/archive_status/ | wc -l
>
> 75217
>
> Is possible to clean this .done files or just don't need to worry ?
>
> It's not occurs on primary or standalone servers, just on replication.
>

What version?  Are you logging checkpoints, and if so what do they say?

Cheers,

Jeff


Re: [GENERAL] Service outage: each postgres process use the exact amount of the configured work_mem

2017-04-14 Thread Jeff Janes
On Fri, Apr 14, 2017 at 10:12 AM, Moreno Andreo 
wrote:

> Hi all,
> About 2 hours and half ago, suddenly (and on the late afternoon of the
> Easter Friday), customers reported failing connections to our server, or
> even very slow.
> After a bit of checking (that also involved server reboot) I noticed
> (using top) that every process regarding postgres is using exactly the
> amout I configured as work_mem (3 GB). And AFAIK it's not good.
>
> 30085 postgres 20 0 *3370048* 156656 153876 S 6.7 0.3 0:00.20 postgres
> 29833 postgres 20 0 *337* 65260 62416 S 1.7 0.1 0:00.17 postgres
> 29632 postgres 20 0 *3372468* 11712 6028 S 0.7 0.0 0:00.60 postgres
>

They are all sharing the same shared_buffers (not work_mem), and that
shared space is being listed for each one of them.

This is unlikely to be related to your current problems.  Nothing shown in
that output is alarming.


> What can be happened?
>

What error messages are the customers getting when they fail to connect?
What error messages are showing up in the server log file?  What do you see
in pg_stat_actvity?

Cheers,

Jeff


Re: [GENERAL] store key name pattern search

2017-04-04 Thread Jeff Janes
On Tue, Apr 4, 2017 at 8:41 AM, Armand Pirvu (home) 
wrote:

> Hi
>
> I have the following case
>
>
> select * from foo;
>   col1
> 
> -
>  "show_id"=>"1", "group_id"=>"32", "group_name"=>"slb",
> "group_add_by"=>"557651"
>  "show_id"=>"2", "group_id"=>"33", "group_name"=>"slc",
> "item_add_by"=>"557652"
> (2 rows)
>
> Is there anyway I can do a pattern search by hstore key name something like
>
> select * from foo where skeys(col1) like '%add_by%';
>
> I looked on the doc but did not see anything , or did I miss it ?
>

select * from foo where array_to_string(akeys(col1),';') like '%add\_by%';

Note that I back-slashed the underscore, otherwise it acts as a wildcard
and may match more than you bargained for.
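
If you would rather keep skeys(), an EXISTS over the set-returning function
should work too on 9.3 or later (untested sketch):

SELECT * FROM foo
WHERE EXISTS (SELECT 1 FROM skeys(col1) AS k WHERE k LIKE '%add\_by%');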

Cheers,

Jeff


[GENERAL] Trigger based logging alternative to table_log

2017-03-27 Thread Jeff Janes
I have some code which uses table_log (
http://pgfoundry.org/projects/tablelog/) to keep a log of changes to
selected tables.  I don't use the restore part, just the logging part.

It creates a new table for each table being logged, with several additional
columns, and adds triggers to insert rows in the new table for changes in
the original.

The problem is that table_log hasn't been maintained in nearly 10 years,
and pgfoundry itself seems to have one foot in the grave and one on a
banana peel.

There are several other systems out there which store the data in hstore or
json, which I would probably use if doing this from scratch.  But I'd
rather preserve the existing log tables than either throw away that data,
or port it over to a new format.

Is there any better-maintained code out there which would be compatible
with the existing schema used by table_log?

Cheers,

Jeff


Re: [GENERAL] Table not cleaning up drom dead tuples

2017-03-14 Thread Jeff Janes
On Tue, Mar 14, 2017 at 5:09 AM, Антон Тарабрин 
wrote:

> Good day. It seems that we have some strange case of VACUUM malfunction
> and table bloating.
>
> PostgreSQL 9.5.3
>

Are you using replication slots?

See this, fixed in 9.5.5:

commit de396a1cb34626619ddc6fb9dec6d12abee8b589
Author: Andres Freund 
Date:   Wed Aug 17 13:15:03 2016 -0700

Properly re-initialize replication slot shared memory upon creation.

Slot creation did not clear all fields upon creation. After start the
memory is zeroed, but when a physical replication slot was created in
the shared memory of a previously existing logical slot, catalog_xmin
would not be cleared. That in turn would prevent vacuum from doing its
duties.


Cheers,

Jeff


Re: [GENERAL] hight cpu %sy usage

2017-02-27 Thread Jeff Janes
On Mon, Feb 27, 2017 at 6:13 AM, dby...@163.com  wrote:

> hello everyone,
>
> i have PostgreSQL 9.5.3 server running on redhalt 6.6
> when i run one query  with pgbench the cpu is 80% and sy% is 60%
>
>
Why is this a problem?  If you run the query as fast as you can, all of the
time spent running the query has to go somewhere.  Is there something
inherently wrong with much of it going to sy rather than something else?

Can you show us the pgbench command you used?  -c, -j, etc.,


> 1.the query runing is 2.7ms
>
>
Where is that from?  It contradicts the information from the "explain
analyze".


>
> i drop the btree index  table_name_1_user_id_idx
> create index table_name_1_user_id_idx  on talbe_name_1 using
> hash(user_id);
> vacuum analyze table_name_1
>
> then the cpu is normal 
>
>

What happened to the TPS?  It is easy to shift load from sy to us if you do
it by making things slower overall, bloating the time spent in user space.
I suspect that that is what happened.

Cheers,

Jeff


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-23 Thread Jeff Janes
On Thu, Feb 23, 2017 at 2:42 AM, Bill Moran <wmo...@potentialtech.com>
wrote:

> On Wed, 22 Feb 2017 13:19:11 -0800
> Jeff Janes <jeff.ja...@gmail.com> wrote:
>
> > On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure <mmonc...@gmail.com>
> wrote:
> > >
> > > On Thursday, February 16, 2017, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > >
> > >> Tim Bellis <tim.bel...@metaswitch.com> writes:
> > >> > Even though this is a read only query, is it also expected to be
> > >> blocked behind the vacuum? Is there a way of getting indexes for a
> table
> > >> which won't be blocked behind a vacuum?
> > >>
> > >> It's not the vacuum that's blocking your read-only queries.  It's the
> > >> ALTER TABLE, which needs an exclusive lock in order to alter the
> table's
> > >> schema.  The ALTER is queued waiting for the vacuum to finish, and
> lesser
> > >> lock requests queue up behind it.  We could let the non-exclusive lock
> > >> requests go ahead of the ALTER, but that would create a severe risk
> of the
> > >> ALTER *never* getting to run.
> > >>
> > >> I'd kill the ALTER and figure on trying again after the vacuum is
> done.
> > >>
> > >>
> > > I've been drilled by this and similar lock stacking issues enough
> times to
> > > make me near 100% sure deferring the ALTER would be the better choice
> > >
> > >
> > This seems like a rather one-sided observation.  How could you know how
> > often the unimplemented behavior also would have "drilled" you, since it
> is
> > unimplemented?
> >
> > There was a proposal for a "LOCK TABLE...DEFERABLE" which would allow
> other
> > requestors jump the queue if they were compatible with the held lock.  If
> > that is implemented, then you would just manually lock the table
> deferably
> > before invoking the ALTER TABLE command, if that is the behavior you
> wanted
> > (but it wouldn't work for things that can't be run in transactions)
>
> This seems redundant to me.
>
> We already have LOCK ... NOWAIT. It's fairly trivial to write an upgrade
> script that uses LOCK to explicitly lock tables that it's going to ALTER,
> then busy-waits if the lock is not immediately grantable.
>

As fairly trivial as it is, I bet I would mess it up a few times before I
got it right.  And then it would probably still be wrong in corner cases.
What if it fails not because the lock is unavailable, but for some obscure
error other than the ones anticipated or encountered during testing? And
busy-waiting is generally nasty and a waste of resources.


>
> The fact that so many ORMs and similar tools don't take advantage of that
> functionality is rather depressing.
>
> In my experience, I've also seen heavily loaded systems that this wouldn't
> work on, essentially because there is _always_ _some_ lock on every table.
> This is a case where experienced developers are required to take some
> extra time to coordinate their upgrades to work around the high load. But
> the proposed LOCK TABLE ... DEFERRABLE wouldn't help there either, because
> the ALTER would be deferred indefinitely.
>

True.  That is why it would be available only upon request, not the new
default.



> Personally, I feel like the existing behavior is preferrable. Software
> teams need to take the time to understand the locking implications of their
> actions or they'll have nothing but trouble anyway.
>
> As I've seen time and again: writing an application that handles low load
> and low concurrency is fairly trivial, but scaling that app up to high
> load and/or high concurrency generally sorts out the truely brilliant
> developers from the merely average.
>

So why not give the merely average some better tools?

Cheers,

Jeff


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-22 Thread Jeff Janes
On Fri, Feb 17, 2017 at 10:36 AM, Tim Bellis <tim.bel...@metaswitch.com>
wrote:

>
>
>
>
> *From:* Jeff Janes [mailto:jeff.ja...@gmail.com]
> *Sent:* 17 February 2017 02:59
> *To:* Tim Bellis <tim.bel...@metaswitch.com>
> *Cc:* pgsql-general@postgresql.org
> *Subject:* Re: [GENERAL] Autovacuum stuck for hours, blocking queries
>
>
>
> On Wed, Feb 15, 2017 at 9:30 AM, Tim Bellis <tim.bel...@metaswitch.com>
> wrote:
>
> I have a postgres 9.3.4 database table which (intermittently but reliably)
> gets into a state where queries get blocked indefinitely (at least for many
> hours) behind an automatic vacuum. I was under the impression that vacuum
> should never take any blocking locks for any significant period of time,
> and so would like help resolving the issue.
>
> The process blocking the query is:
> postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum
> worker process   
> which is running the query
> autovacuum: VACUUM public.
>
>
>
> Are you sure it doesn't really say:
>
>
>
> autovacuum: VACUUM public. (to prevent wraparound)
>
> *[Tim Bellis] It doesn’t. I was using the query from *
> *https://wiki.postgresql.org/wiki/Lock_Monitoring*
> <https://wiki.postgresql.org/wiki/Lock_Monitoring>* and looking at the
> ‘current_statement_in_blocking_process’ column. Is there a different query
> I should be using?*
>

That query seems to be a bit mangled.  At one time, it only found row-level
locks.  Someone changed that, but didn't remove the comment "these only
find row-level locks, not object-level locks"

Also, the "WHERE NOT blocked_locks.GRANTED" should perhaps be:

WHERE NOT blocked_locks.GRANTED and blocking_locks.GRANTED;

As it is, every waiting query reports that it is waiting on all of its
fellow victims as well as the thing(s) actually blocking it. But my WHERE
clause is not really correct either, as it is possible that one
blocked thing is being blocked by a different blocked thing which is ahead
of it in the queue, when without that intervening blocked requestor it
could be immediately granted if its request mode is compatible with the
held mode(s).  I don't think there is a query that can reveal what is most
immediately blocking it.
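
(On 9.6 and later there is a built-in that side-steps the hand-rolled join
entirely; a sketch:)

SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;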

But, I don't see how this explains what you see.  An autovacuum without
"(to prevent wraparound)" should not block anything for much more than a
second (unless you changed deadlock_timeout) and should not be blocked by
anything either as it just gives up on the operation if the lock is not
immediately available.

Cheers,

Jeff


Re: [GENERAL] bloat indexes - opinion

2017-02-22 Thread Jeff Janes
On Tue, Feb 21, 2017 at 1:44 PM, Patrick B  wrote:

> Hi guys,
>
> I've got a lot of bloat indexes on my 4TB database.
>
> Let's take this example:
>
> Table: seg
> Index: ix_filter_by_tree
> Times_used: 1018082183
> Table_size: 18 GB -- wrong. The table is mostly on pg_toast table. Its
> real size is 2TB
> Index_size: 17 GB
> Num_writes 16245023
> Index definition: CREATE INDEX ix_filter_by_tree ON seg USING btree
> (full_path varchar_pattern_ops) WHERE (full_path IS NOT NULL)
>
>
What is this from?  If you think the table size reported should include
toast, then change it to do that, or request the author of whatever-this-is
to make that change.

What indication is there that the index is bloated?  If the
meat-and-potatoes of a table is held in toast, then wouldn't you expect the
size of the table and the size of the index to be about the same?
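
To see where the space actually is, something like this (standard size
functions, using the table from your example):

SELECT pg_size_pretty(pg_relation_size('seg'))       AS heap_only,
       pg_size_pretty(pg_table_size('seg'))          AS heap_plus_toast,
       pg_size_pretty(pg_indexes_size('seg'))        AS all_indexes,
       pg_size_pretty(pg_total_relation_size('seg')) AS everything;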

Cheers,

Jeff


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-22 Thread Jeff Janes
On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure  wrote:

>
>
> On Thursday, February 16, 2017, Tom Lane  wrote:
>
>> Tim Bellis  writes:
>> > Even though this is a read only query, is it also expected to be
>> blocked behind the vacuum? Is there a way of getting indexes for a table
>> which won't be blocked behind a vacuum?
>>
>> It's not the vacuum that's blocking your read-only queries.  It's the
>> ALTER TABLE, which needs an exclusive lock in order to alter the table's
>> schema.  The ALTER is queued waiting for the vacuum to finish, and lesser
>> lock requests queue up behind it.  We could let the non-exclusive lock
>> requests go ahead of the ALTER, but that would create a severe risk of the
>> ALTER *never* getting to run.
>>
>> I'd kill the ALTER and figure on trying again after the vacuum is done.
>>
>>
> I've been drilled by this and similar lock stacking issues enough times to
> make me near 100% sure deferring the ALTER would be the better choice
>
>
This seems like a rather one-sided observation.  How could you know how
often the unimplemented behavior also would have "drilled" you, since it is
unimplemented?

There was a proposal for a "LOCK TABLE...DEFERABLE" which would allow other
requestors jump the queue if they were compatible with the held lock.  If
that is implemented, then you would just manually lock the table deferably
before invoking the ALTER TABLE command, if that is the behavior you wanted
(but it wouldn't work for things that can't be run in transactions)

Ideally each requestor would specify whether it will hold the lock for a long
time or a short time.  Then a short requestor which is blocked behind a
long requestor could let other compatible-with-held requests jump over it.
But once it was only blocked by short locks, it would reassert the normal
order, so it can't get permanently blocked by a constantly overlapping
stream of short locks.  But how would you get all lock requestors to
provide a reasonable estimate?

Cheers,

Jeff


Re: [GENERAL] Indexes and MVCC

2017-02-22 Thread Jeff Janes
On Sun, Feb 19, 2017 at 8:52 AM, Rakesh Kumar 
wrote:

>
> https://www.youtube.com/watch?v=8mKpfutwD0U=1741s
>
> Somewhere around 13th minute, Chris Tavers mentions this:
>
> 1 - In the queuing table, the application deletes lot of rows (typical for
> a queuing table).
> 2 - Query trying to find out rows to be picked by the queue, accesses them
> via indexes.
> 3 - Vacuum took lot of time to clean up dead rows.
> 4 - Indexes do not reach out to the latest visible tuple and has to
> traverse lot of dead
>  rows before hitting the row required by the sql. This is because of
> (3).
>
> My Question:
> Is the (4) true for all updates.  AFAIK, if an update occurs on even one
> index col, the
> index itself creates a new version in MVCC.
>

The index doesn't create a new version, but a new pointer to the new version
in the table is created in the index. But it doesn't remove the
pointer to the old version at the time the new pointer is created.


> Only HOT updates will end up in situation
> described in (3).
>

No, quite the opposite.  Anyone can clean up a HOT update, once the
previous version is old enough that it can't be of interest to anyone
anymore. Only vacuum can completely remove the dead tuple from a non-HOT
update.

However, btree indexes have a feature where, if an index scan finds a tuple
which is old enough that it can't be interesting to anyone, it remembers that
and marks the index entry as dead on the way back out, so the next process
doesn't need to follow it.  But the key here is that the tuple has to be old
enough that it is not interesting to *anyone*.  If there is a long-lived
transaction, even if it is not interested in this particular table, it will
inhibit this mechanism from working (as well as inhibiting vacuum itself from
cleaning the dead tuples up).
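
To check whether such a long-lived transaction is hanging around (a sketch):

SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;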

Cheers,

Jeff


Re: [GENERAL] Bad planning data resulting in OOM killing of postgres

2017-02-21 Thread Jeff Janes
On Thu, Feb 16, 2017 at 3:27 PM, David Hinkle  wrote:

> I guess this doesn't work, latest test run crashed.  It still uses the
> bad plan for the hostid column even after n_distinct is updated.
>
> cipafilter=# select attname, n_distinct from pg_stats where tablename
> cipafilter-# = 'log_raw' and (attname = 'urlid' or attname = 'titleid'
> or attname =
> cipafilter(# 'hostid');
>  attname | n_distinct
> -+-
>  urlid   | 1.51625e+08
>  hostid  |  304710
>  titleid |  886499
> (3 rows)
>
> cipafilter=# explain DELETE FROM hosts WHERE NOT EXISTS ( SELECT
> log_raw.hostid FROM log_raw WHERE log_raw.hostid = hosts.hostid );
>QUERY PLAN
> 
> -
>  Delete on hosts  (cost=22249475.67..74844813.47 rows=1 width=12)
>->  Hash Anti Join  (cost=22249475.67..74844813.47 rows=1 width=12)
>  Hash Cond: (hosts.hostid = log_raw.hostid)
>  ->  Seq Scan on hosts  (cost=0.00..5017.10 rows=304710 width=10)
>  ->  Hash  (cost=12799395.52..12799395.52 rows=543645052 width=10)
>->  Seq Scan on log_raw  (cost=0.00..12799395.52
> rows=543645052 width=10)
> (6 rows)
>
> I guess I will also try throwing in 'set enable_hashjoin = false;' and
> see if that gets these purges to go.
>



Another option would be to force the de-dup to happen, with:

explain with t as (select distinct hostid from log_raw) delete from hosts
where not exists (select 1 from t where t.hostid=hosts.hostid)

That way you can use the hash join without running out of memory, in case
the hash join is actually faster than the merge join.  Also, it just seems
cleaner than fiddling with enable_* parameters and then having to remember
to reset them when done.

Cheers,

Jeff


Re: [GENERAL] disk writes within a transaction

2017-02-17 Thread Jeff Janes
On Thu, Feb 16, 2017 at 11:33 AM, 2xlp - ListSubscriptions <
postg...@2xlp.com> wrote:

> Can someone enlighten me to how postgres handles disk writing?  I've read
> some generic remarks about buffers, but that's about it.
>
> We have a chunk of code that calls Postgres in a less-than-optimal way
> within a transaction block.  I'm wondering where to prioritize fixing it,
> as the traffic on the wire isn't an issue.
>
> Basically the code looks like this:
>
> begin;
> update foo set foo.a='1' where foo.bar = 1;
> ...
> update foo set foo.b='2' where foo.bar = 1;
> ...
> update foo set foo.c='3' where foo.bar = 1;
> commit;
>
> If the updates are likely to be a memory based operation, consolidating
> them can wait.  If they are likely to hit the disk, I should schedule
> refactoring this code sooner than later.
>

You are going to generate more volume of WAL data, which has to reach disk
eventually.  Although it is likely they will all be consolidated into about
the same number of physical writes and syncs.

You are also likely to inhibit the heap-only tuple (HOT) mechanism, because you
will end up with 4 copies of the row which all have to fit in the same
block.  If they don't, it has to migrate some of them to a different block
plus do index maintenance, so you will generate more dirty blocks that
way.  How many more depends on how many indexes you have, and whether the
columns being updated are themselves included in indexes.

There is also a CPU issue when the same tuple is updated repeatedly in a
single transaction.  Each update has to wade through all the previous row
versions, so it is an N^2 operation in the number of updates.

It will probably be easier to refactor the code than to quantify just how
much damage it does.
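
For what it's worth, the consolidated form of your example would just be
(sketch):

BEGIN;
UPDATE foo SET a = '1', b = '2', c = '3' WHERE bar = 1;
-- ... the rest of the transaction ...
COMMIT;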

cheers,

Jeff


Re: [GENERAL] Bad planning data resulting in OOM killing of postgres

2017-02-13 Thread Jeff Janes
On Mon, Feb 13, 2017 at 12:43 PM, David Hinkle 
wrote:

> Thanks Jeff,
>
> No triggers or foreign key constrains:
>
> psql:postgres@cipafilter = \d+ titles
>  Table "public.titles"
>  Column  │   Type│Modifiers
>  │ Storage  │ Stats target │ Description
> ─┼───┼──
> ┼──┼──┼─
>  title   │ character varying │
>  │ extended │  │
>  titleid │ integer   │ not null default
> nextval('titles_titleid_seq'::regclass) │ plain│  │
> Indexes:
> "titles_pkey" PRIMARY KEY, btree (titleid)
> "titles_md5_title_idx" btree (md5(title::text))
>
> Do you see anything in there that would be problematic?
>


I'm out of ideas here.  What happens if you just select the rows, rather
than deleting them?  Does it have memory problems then?  If not, can you
post the explain (analyze, buffers) of doing that?
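
That is, something like this, using your table names:

EXPLAIN (ANALYZE, BUFFERS)
SELECT titles.* FROM titles
WHERE NOT EXISTS ( SELECT 1 FROM log_raw WHERE log_raw.titleid = titles.titleid );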

Cheers,

Jeff


Re: [GENERAL] Bad planning data resulting in OOM killing of postgres

2017-02-13 Thread Jeff Janes
On Mon, Feb 13, 2017 at 11:53 AM, David Hinkle 
wrote:

> Thanks guys, here's the information you requested:
>
> psql:postgres@cipafilter = show work_mem;
>  work_mem
> ──
>  10MB
> (1 row)
>


OK, new theory then.  Do you have triggers on or foreign key constraints to
the table you are deleting from?  It queues up each deleted row to go back
and fire the trigger or validate the constraint at the end of the
statement.  You might need to drop the constraint, or delete in smaller
batches by adding some kind of dummy condition to the WHERE clause which
you progressively move.

Or select the rows you want to keep into a new table, and then drop the old
one, rename the new one, and rebuild any constraints or indexes and other
dependencies.  This can be pretty annoying if there are a lot of them.
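
Roughly like this -- an untested sketch; the primary key, the titleid
default/sequence, the md5 index, privileges, and any other dependent objects
have to be recreated by hand:

BEGIN;
CREATE TABLE titles_new AS
  SELECT t.* FROM titles t
  WHERE EXISTS (SELECT 1 FROM log_raw lr WHERE lr.titleid = t.titleid);
DROP TABLE titles;
ALTER TABLE titles_new RENAME TO titles;
-- then recreate the pkey, the titleid default, and titles_md5_title_idx
COMMIT;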

Cheers,

Jeff


Re: [GENERAL] Bad planning data resulting in OOM killing of postgres

2017-02-13 Thread Jeff Janes
On Mon, Feb 13, 2017 at 9:40 AM, David Hinkle  wrote:

> I'm having trouble with purges related to a large table.   The delete
> query consumes ram until postgres crashes due to OOM.   I have a very
> large table called log_raw.  There are half a dozen related tables,
> such as 'urls' and 'titles'.   log_raw.urlid = urls.urlid and urls
> contains the text of the various urls, for example.
>
> Each time I try to purge these side tables the unit OOM's.
>
> psql:postgres@cipafilter = explain DELETE FROM titles WHERE NOT EXISTS
> ( SELECT 1 FROM log_raw WHERE log_raw.titleid = titles.titleid );
>QUERY PLAN
> 
> ─
>  Delete on titles  (cost=22166473.44..24850954.67 rows=870382 width=12)
>->  Hash Anti Join  (cost=22166473.44..24850954.67 rows=870382 width=12)
>  Hash Cond: (titles.titleid = log_raw.titleid)
>  ->  Seq Scan on titles  (cost=0.00..17871.64 rows=870664 width=10)
>  ->  Hash  (cost=12744792.64..12744792.64 rows=542011264 width=10)
>->  Seq Scan on log_raw  (cost=0.00..12744792.64
> rows=542011264 width=10)
> (6 rows)
>
> psql:postgres@cipafilter = select count(*) from (select titleid from
> log_raw group by titleid) as a;
>  count
> 
>  872210
> (1 row)
>
> cipafilter=# select n_distinct from pg_stats where tablename =
> 'log_raw' and attname = 'titleid';
>  n_distinct
> 
> 282
> (1 row)
>
> The planning data is wildly low for each of these fields, and I wonder
> if because of that error the planner thinks it can keep all these id's
> in ram while it works. Analyze doesn't fix it.   Increasing the
> statistics target improves the data in n_distinct but not
> considerably, as increasing it 3 or 4 fold leads to it still being
> wildly off.  ALTER TABLE set n_distinct doesn't seem to be used by the
> planner as it doesn't change any of the plans I've generated or seem
> to be taken into account in the row estimates. I'm out of ideas.
> Anybody have any ideas?
>


Your data on log_raw.titleid is probably clustered, so that any given page
of the table all has the same value for titleid. This really messes up the
sampling algorithm used by ANALYZE.  To overcome that, you would have to
increase the statistics target by 3 or 4 orders of magnitude, not a factor
of 3 or 4.

However, that doesn't seem to be the actual problem.  Surprisingly enough,
a hash anti-join doesn't automatically de-duplicate the hash table as it is
being built.  So n_distinct correctly does not have an influence on the
estimated RAM usage, because it doesn't influence the actual ram usage
either.

It sounds like your work_mem is set way too high.  What is it set to?  And
what version of PostgreSQL are you using?

Cheers,

Jeff


Re: [GENERAL] Locks Postgres

2017-02-10 Thread Jeff Janes
On Thu, Feb 9, 2017 at 9:00 PM, Patrick B  wrote:

> Hi guys
>
> I just wanna understand the locks in a DB server:
> [image: Imagem inline 1]
>
> Access share = Does that mean queries were waiting because an
> update/delete/insert was happening?
>



It would seem more plausible that your chart is showing the locks that are
*held*, not the locks that are *waiting to be granted*.  But without
knowing where the chart came from, we can't know for sure.

If those are locks being held, it just means your server was kind of busy
(which you already knew).  But we don't know how busy.  A single complex
query can easily hold several dozen locks.

Cheers,

Jeff


Re: [GENERAL] Transaction apply speed on the standby

2017-01-26 Thread Jeff Janes
On Thu, Jan 26, 2017 at 8:34 AM, Rakesh Kumar 
wrote:

> Ver 9.6.1
>
> In a streaming replication can it be assumed that if both primary and
> standby are of the same hardware, then the rate at which transactions are
> applied on the standby will be same as that on primary. Or standbys are
> always slower than primary in applying transactions because of the way
> replication works.
>

It could go either way.  The standby only has to apply the changes, not
compute them, so if the primary does something like:

UPDATE foobar SET col1 = (some expensive expression) WHERE col2 = ?;

then the standby will replay it much faster than the primary needed to
execute it.

On the other hand, replay is done single-threaded.  If the primary has a
lot of active concurrent connections, replaying them serially could be much
slower than it took to produce them in the first place.  This might be true
of both CPU and of IO.  If your IO is a big RAID system, the primary could
keep multiple spindles active simultaneously by having multiple connections
waiting on different pieces of data independently, while replay will wait
on them serially.  There is currently not a prefetch mechanism for replay.

Cheers,

Jeff


Re: [GENERAL] Why autvacuum is not started?

2017-01-09 Thread Jeff Janes
On Mon, Jan 9, 2017 at 8:45 AM, Edmundo Robles  wrote:

> I have running Postgresql  9.4 and... if i have  a  table  with  following
>  configuration:
> autovacuum_vacuum_scale_factor=0.0,
> autovacuum_analyze_scale_factor=0.0,
> autovacuum_vacuum_threshold=1000,
> autovacuum_analyze_threshold=1000,
> autovacuum_enabled=true
>
> Why  autovacuum is not started if  the table has more than 1000 inserts???
>

Inserts do not generate obsolete tuples, and so are not counted against
the "vacuum threshold" as described here:
https://www.postgresql.org/docs/9.4/static/routine-vacuuming.html.

But inserts can change the data distributions, so do count against "analyze
threshold".

Due to index-only scans and freeze maps, there are reasons to revisit this
topic, so that insert-only tables do get vacuumed and not just analyzed.
But that re-think has yet to be finished, and certainly won't be
back-ported to 9.4.
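
You can see the counters autovacuum is working from with something like this
(the table name is a placeholder):

SELECT n_tup_ins, n_tup_upd, n_tup_del, n_dead_tup,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'your_table';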

Cheers,

Jeff


Re: [GENERAL] Plpgsql - Custom fields Postgres 9.5

2016-12-15 Thread Jeff Janes
On Wed, Dec 14, 2016 at 1:17 PM, Patrick B  wrote:

>
>
> 2. To call the function, I have to login to postgres and then run: select
> logextract(201612015, 201612015);
> How can I do it on cron? because the dates will be different every time.
>

PostgreSQL already knows what today's date is.  Why does cron have to tell
it?  Just do 'select logextract()' and let PostgreSQL compute the dates for
itself.

Cheers,

Jeff


Re: [GENERAL] INSERT - UPDATE throughput oscillating and SSD activity after stopping the client

2016-12-06 Thread Jeff Janes
On Tue, Dec 6, 2016 at 2:44 AM, Tom DalPozzo  wrote:

> Hi,
> about SSD light:
>


> I guessed it was WAL -> actual db files data traffic. It explains why the
> light stops blinking after shutting down the server (I did it via kill
> command) .
>

Do you kill with -15 (the default) or -9?  And which process, the postgres
master itself or just some random child?


> But if so, I expected the light to restart blinking after restarting
> the server (in order to continue WAL->db activity).
>

The normal checkpoint is paced.  So trickling out data slowly will keep the
light on, but not actually stress the system.

When you shut down the system, it does a fast checkpoint.  This gets the
data written out as quickly as possible (since you are shutting down, it
doesn't worry about interfering with performance for other users, as there
are none), so once it is done you don't see the light anymore.  If you do a
clean shutdown (kill -15 of the postgres master) this fast checkpoint
happens upon shutdown.  If you do an abort (kill -9) then the fast
checkpoint happens upon start-up, once recovery is finished but before the
database is opened for regular use.

 Cheers,

Jeff


Re: [GENERAL] INSERT - UPDATE throughput oscillating and SSD activity after stopping the client

2016-12-05 Thread Jeff Janes
On Fri, Dec 2, 2016 at 9:40 AM, Tom DalPozzo  wrote:

> Hi,
> I've two tables, t1 and t2, both with one bigint id indexed field and one
> 256 char data field; t1 has always got 1 row, while t2 is increasing as
> explained in the following.
>
> My pqlib client countinously updates  one row in t1 (every time targeting
> a different row) and inserts a new row in t2. All this in blocks of 1000
> update-insert per commit, in order to get better performance.
> Wal_method is fsync, fsync is on, attached my conf file.
> I've a 3.8ghz laptop with evo SSD.
>
> Performance is  measured every two executed blocks and related to these
> blocks.
>
> Over the first few minutes performance is around 10Krow/s then it slowly
> drops, over next few minutes to 4Krow/s, then it slowly returns high and so
> on, like a wave.
> I don't understand this behaviour. Is it normal? What does it depend on?
>

Yes, that is normal.  It is also very complicated.  It depends on pretty
much everything.  PostgreSQL, kernel, filesystem, IO controller, firmware,
hardware, other things going on on the computer simultaneously, etc.


>
> Also, when I stop the client I see the SSD light still heavily working.
>

This is normal.  It writes out critical data to a WAL log first, and then
leisurely writes out the changes to the actual data files later.  In the
case of a crash, the WAL will be used to replay the data file changes which
may or may not have made it to disk.

It would last quite a while unless I stop the postgresql server, in this
> case it suddenly stops.
>

Do you stop postgresql with fast or immediate shutdown?


> If I restart the server it remains off.
> I'm wondering if it's normal. I'd like to be sure that my data are safe
> once commited.
>

If your kernel/fs/SSD doesn't lie about syncing the data, then your data is
safe once committed. (It is possible there are bugs in PostgreSQL, of
course, but nothing you report indicates you have found one).

If you really want to be sure that the full stack, from PostgreSQL down to
the hardware on the SSD, is crash safe, the only real way is to do some
"pull the plug" tests.

Cheers,

Jeff


Re: [GENERAL] Moving pg_xlog

2016-12-02 Thread Jeff Janes
On Thu, Dec 1, 2016 at 6:17 PM, Michael Paquier 
wrote:

> On Thu, Dec 01, 2016 at 05:48:51PM +0200, Achilleas Mantzios wrote:
> >
> > Performance is the reason. You would benefit from moving pg_xlog to a
> > different controller with its own write cache or to a different SSD with
> a
> > write cache which is capacitor-backed. So in enterprise/server-class
> setups
> > the above would boost the performance. Using the same SSD with a
> different
> > partition won't give you much.
>
> For performance, on-disk write pattern of data in pg_xlog is sequential
> writes, while there will be likely random writes on the main data folder.
>

This is only the case if you have a write cache, or are doing bulk loads.
With small transactions and without a write cache, the need for constant
syncs totally destroys the benefits of sequential writes.

Cheers,

Jeff


Re: [GENERAL] Moving pg_xlog

2016-12-01 Thread Jeff Janes
On Thu, Dec 1, 2016 at 5:55 AM, Robert Inder 
wrote:

> I'm running  Postgres9.4 in master/hot-standby mode on a few pairs of
> servers.
>
> While recovering from A Bit Of Bother last week, I came across a
> posting saying that pg_xlog should be on a separate partition.
>
> I tried to find out more about this, by consulting the PostgresQL
> documentation (i.e.
> https://www.postgresql.org/docs/9.4/static/index.html )
> But all I could find was a mention that "It is advantageous if the log
> is located on a different disk from the main database files".
>
> The questions:
> 1. WHY is this good?  Is it (just) to stop pg_xlog filling the
> database disk/partition?


More like the reverse. Running the data partition out of space is bad.
Running the pg_xlog partition out of space is worse. Running both
partitions out of space at the same time is worse yet, which of course you
will do if they are the same partition and that one partition runs out of
space.



> Or are there performance implications?
> SPECIFICALLY: my database is currently in "/", which is on SSD.  Is it
> better to move pg_xlog to another partition on the same SSD?  Or to a
> physical disk or SAN?
>


If you have something with fast fsyncs (a battery-backed write cache,
maybe an SSD) that is not big enough to hold your entire database, then
you would want to put pg_xlog on that device and the rest of the database
elsewhere (if you are doing OLTP, anyway).

On some kernels and some file systems, having a constant stream of fsyncs
(from pg_xlog) interacts poorly with having ordinary non-immediately-synced
writes (from the regular data files) on the same partition.



> 2. What are the implications for doing a base backup?  I believe I
> read that putting pg_xlog on a different partition meant it would be
> omitted from a file-system bulk copy (e.g. rsync),


rsync has lots of options to control what happens with symbolic links and
mount points, or to exclude certain directories outright, symbolic links
and mount points notwithstanding.


> and this was a GOOD
> thing, because the copy operation would be faster -- not copying
> pg_xlog would not prevent the standby server from starting, because
> the information it needed would be in the WAL files that would be
> shipped separately.  Have I got that right?
>
> Finally, the suggestion.
>
> I'd really like to read an explicit discussion of this in the official
> documentation, rather than just glean what I can from answers to
> questions.
>

The official documentation cannot have a dissertation on every combination
of hardware, OS, file-system type, version of that file-system, and your
usage pattern.  That is inherently the realm of the wiki or the blogs.

Cheers,

Jeff


Re: [GENERAL] Backup "Best Practices"

2016-11-29 Thread Jeff Janes
On Mon, Nov 28, 2016 at 11:20 PM, Thomas Kellerer 
wrote:

> Israel Brewster schrieb am 28.11.2016 um 23:50:
>
>>
>>> pg_archivecleanup -n /mnt/server/archiverdir
>>> 00010010.0020.backup
>>>
>>
>> Ok, but where does that "00010010.0020.backup"
>> come from? I mean, I can tell it's a WAL segment file name (plus a
>> backup label), but I don't have anything like that in my WAL
>> archives, even though I've run pg_basebackup a couple of times. Do I
>> have to call something to create that file? Some flag to
>> pg_basebackup? At the moment I am running pg_basebackup such that it
>> generates gziped tar files, if that makes a difference.
>>
>
> The .backup file will be inside the tar file if I'm not mistaken


I don't think it will be, but there will be a backup_label file in there,
which contains much of the same information as the .backup file.  In this
case, though, the contents of the file are not important; only the name
is.  pg_archivecleanup doesn't attempt to open the file name given as its
second argument, it just looks at the name itself.  So you could pull
backup_label out of the tar file, parse its contents, and use them to
construct the command to give to pg_archivecleanup.

I think it would really be nice if pg_basebackup -D backup_dir -Ft would
create the backup_label file not only in the tarball, but also (as the
final step) create it as a loose file in backup_dir.

Cheers,

Jeff


Re: [GENERAL] Backup "Best Practices"

2016-11-28 Thread Jeff Janes
On Mon, Nov 28, 2016 at 2:50 PM, Israel Brewster 
wrote:

>
> - What is the "best" (or just a good) method of keeping the WAL archives
>> under control? Obviously when I do a new basebackup I can "cleanup" any old
>> files that said backup doesn't need,
>>
>
> You have said you might be interested in doing PITR. So you want to delay
> the cleanup so as to not compromise that ability.  You need to develop a
> policy on how far back you want to be able to do a PITR.
>
>
>
>> but how do I know what those are?
>>
>
> pg_archivecleanup -n /mnt/server/archiverdir 00010010.
> 0020.backup
>
>
> Ok, but where does that "00010010.0020.backup" come
> from? I mean, I can tell it's a WAL segment file name (plus a backup
> label), but I don't have anything like that in my WAL archives, even though
> I've run pg_basebackup a couple of times.
>

I get one file like that for every pg_basebackup I run.  Could your
archive_command be doing something to specifically short-circuit the
writing of those files?  Like testing the length of %p or %f?
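
One quick way to check whether the archiver is completing successfully at
all is to ask the server directly (the pg_stat_archiver view exists since
9.4):

SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal
FROM pg_stat_archiver;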




> Do I have to call something to create that file? Some flag to
> pg_basebackup? At the moment I am running pg_basebackup such that it
> generates gziped tar files, if that makes a difference.
>


That is how I run it as well.  I don't think there is a flag to
pg_basebackup which even allows you to bypass the creation of those files.
You are looking in the WAL archive itself, correct?  Not somewhere in a
listing of the base.tar.gz file?

Cheers,

Jeff


Re: [GENERAL] Query regarding deadlock

2016-11-25 Thread Jeff Janes
On Thu, Nov 24, 2016 at 5:44 PM, Yogesh Sharma <
yogesh1.sha...@nectechnologies.in> wrote:

> Dear All,
>
> Thanks in advance.
> I found below deadlock in postgresql logs.
> I cannot change calling of REINDEX and insert query sequence because it is
> execute automatically through some cron script.
>
> ERROR:  deadlock detected
>  DETAIL:  Process 2234 waits for AccessShareLock on relation 16459 of
> database 16385; blocked by process 4111.
>  Process 4111 waits for ShareLock on relation 16502 of database 16385;
> blocked by process 2234.
>  Process 2234: INSERT INTO table1 ( id , unique_id )VALUES( '1', '4')
>  Process 4111: REINDEX TABLE table1
>
> Could you please provide any solution to resolve this deadlock.
>

What are tables 16459 and 16502?  Are they related to each other through
triggers or FK constraints?
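
For what it's worth, you can map the relation OIDs from the log message to
names directly (the OIDs below are the ones from your deadlock report):

SELECT oid, relname, relkind
FROM pg_class
WHERE oid IN (16459, 16502);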

Are you reindexing multiple tables in the same transaction?  If not, I
don't see why these should deadlock.  One should win, and the other should
block.

If you are reindexing multiple tables in the same transaction, why are you
doing that?  I can't think of a situation where you couldn't use separate
transactions per table.

Cheers,

Jeff


Re: [GENERAL] Backup "Best Practices"

2016-11-25 Thread Jeff Janes
On Wed, Nov 23, 2016 at 10:16 AM, Israel Brewster 
wrote:

> I was wondering if someone could inform me about, or point me to an online
> article about best practices for backing up a postgresql database cluster?
> At the moment, I have two servers running with streaming replication for
> failover purposes, and then I do nightly database dumps for recovery/backup
> purposes. However, I feel this approach is lacking in a couple areas I can
> think of:
>
> - If a total failure occurs towards the end of the day, we could
> potentially loose a whole days worth of data.
>

Why wouldn't the streaming replica salvage that?  Are they expected to
fail together?  Is the NFS share on which you want to store your
basebackup and WAL also expected to fail together with them?


> Similar argument for user error - there is no way to "undo" a catastrophic
> user data error without going all the way back to the previous day
> - Less-than-ideal recovery under some scenarios. Since each database in
> the cluster is dumped individually, this is good should only *one* database
> need to be restored, but could get tedious should the entire cluster need
> to be restored.
>
> To mitigate these issues, I am thinking of supplementing the individual
> dumps with a full base backup and WAL archiving to a NFS share. This should
> enable (relatively) quick/easy recovery from backup, plus the ability to do
> PIT Recovery. I do have a few questions with this approach, however:
>
> - How do I figure out how often I should take a full base backup? I know
> this will depend on the amount of traffic my database is doing, and how
> long I am willing to wait for WAL files to be replayed - the more WAL files
> needing replayed, the longer recovery will take - but is there some rule of
> thumb that I can use to calculate how often I need a new base backup?
> Perhaps based on the number of WAL files?
>

You have to try it and see.  Different types of WAL records take different
amounts of time to replay, so there is no rule of thumb; it depends on the
type of traffic you have in your database.  And it could be limited by a
single CPU, or by IO.  If the restore_command needs to restore the WAL
from a remote server, replay is very likely to be limited by the latency
of doing that.  In fact, this is often the bottleneck even when restoring
from the local server, at least if archiving is mostly driven by
archive_timeout.

When I need to re-clone production to get a fresh server to use for dev or
testing, I do so using almost exactly the same method I would use for
restoring production from a disaster (restore from most recent basebackup,
then recovery from WAL archive).  So I know how long it takes for the
recovery to happen based on true experience, and I take a new basebackup
when that length of time starts to annoy me.
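
If you want to put a number on it while a test restore is running, you can
watch replay progress from a connection to the recovering server, once it
is accepting read-only connections (hot_standby = on).  Function names as
of 9.x:

SELECT pg_last_xlog_replay_location(),
       pg_last_xact_replay_timestamp();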




> - What is the "best" (or just a good) method of keeping the WAL archives
> under control? Obviously when I do a new basebackup I can "cleanup" any old
> files that said backup doesn't need,
>

You have said you might be interested in doing PITR. So you want to delay
the cleanup so as to not compromise that ability.  You need to develop a
policy on how far back you want to be able to do a PITR.



> but how do I know what those are?
>

pg_archivecleanup -n /mnt/server/archiverdir
00010010.0020.backup



> - Should I be looking at any other backup methods in addition to/instead
> of the basebackup/WAL archive scheme?
>


You may want to consider pg_receivexlog to maintain your WAL archive,
rather than archive_command.  That way you don't have to worry about the
trade-offs involved in setting archive_timeout.  But unless you use it
with a replication slot, it is not very safe, as pg_receivexlog could stop
working and your database would happily run along without protection.
Also, it is hard to be sure you are reliably issuing fsyncs over NFS, so
with archive_command over NFS there is always the risk that your WAL data
is not actually reaching disk in a timely fashion.  So if you can run
pg_receivexlog on the NFS-host machine pointed at its local storage, not
looping back over NFS, that is safer.
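
If you do go the pg_receivexlog route, a physical replication slot
(available since 9.4) is what keeps the server from recycling WAL while
pg_receivexlog is down; the slot name below is just an example, and you
would hand it to pg_receivexlog with its --slot option:

SELECT * FROM pg_create_physical_replication_slot('wal_archive_slot');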

Cheers,

Jeff


Re: [GENERAL] Strict min and max aggregate functions

2016-11-20 Thread Jeff Janes
On Sun, Nov 20, 2016 at 2:45 AM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

>
>
> 2016-11-19 22:12 GMT+01:00 Jeff Janes <jeff.ja...@gmail.com>:
>
>> I need "strict" MIN and MAX aggregate functions, meaning they return NULL
>> upon any NULL input, and behave like the built-in aggregates if none of the
>> input values are NULL.
>>
>> This doesn't seem like an outlandish thing to want, and I'm surprised I
>> can't find other discussion of it.  Perhaps because none of the words here
>> are very effective as search terms as they are so individually common.
>>
>> I've hit upon a solution that works, but it is both ugly and slow (about
>> 50 fold slower than the built-ins; for my current purpose this is not a big
>> problem but I would love it to be faster if that could be done easily).
>>
>> So here is my approach.  Any suggestions to improve it?  Or are there
>> better canned solutions I've failed to find?
>>
>>
>> -- If no values have been delivered to the aggregate, the internal state
>> is the
>> -- NULL array.  If a null values has been delivered, the internal status
>> is an
>> -- array with one element, which is NULL.  Otherwise, it is an array with
>> one element,
>> -- the least/greatest seen so far.
>>
>> CREATE OR REPLACE FUNCTION strict_min_agg (anyarray,anyelement )
>> RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
>> SELECT CASE
>> WHEN $1 IS NULL THEN ARRAY[$2]
>> WHEN $1[1] IS NULL THEN $1
>> WHEN $2 IS NULL THEN ARRAY[$2] -- use $2 not NULL to preserve
>> type
>> ELSE ARRAY[least($1[1],$2)] END ;
>> $$;
>>
>>
>> CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
>> RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
>> SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END ;
>> $$;
>>
>> CREATE AGGREGATE strict_min (x anyelement) (
>> sfunc = strict_min_agg,
>> stype = anyarray,
>> finalfunc = strict_min_final
>> );
>>
>
> can you use plpgsql instead sql?
>

I can.  Would there be an advantage?

> you can use composite type instead array too.
>

I tried a composite type of (flag int, value anyelement), but you can't
use anyelement in a composite type.  So the aggregate function couldn't be
polymorphic.  Or, that was my conclusion after making a few attempts.
Maybe I need to give up on polymorphism if I want to get performance?
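
For concreteness, this is roughly what a plpgsql version of the transition
function would look like (a minimal sketch, intended to behave the same as
the SQL version quoted above; you would point sfunc at it instead):

CREATE OR REPLACE FUNCTION strict_min_agg_plpgsql (state anyarray, val anyelement)
RETURNS anyarray LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
    IF state IS NULL THEN
        RETURN ARRAY[val];
    ELSIF state[1] IS NULL THEN
        RETURN state;
    ELSIF val IS NULL THEN
        RETURN ARRAY[val];  -- keep the element type, as in the SQL version
    ELSE
        RETURN ARRAY[least(state[1], val)];
    END IF;
END;
$$;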

Cheers,

Jeff


[GENERAL] Strict min and max aggregate functions

2016-11-19 Thread Jeff Janes
I need "strict" MIN and MAX aggregate functions, meaning they return NULL
upon any NULL input, and behave like the built-in aggregates if none of the
input values are NULL.

This doesn't seem like an outlandish thing to want, and I'm surprised I
can't find other discussion of it.  Perhaps because none of the words here
are very effective as search terms as they are so individually common.

I've hit upon a solution that works, but it is both ugly and slow (about 50
fold slower than the built-ins; for my current purpose this is not a big
problem but I would love it to be faster if that could be done easily).

So here is my approach.  Any suggestions to improve it?  Or are there
better canned solutions I've failed to find?


-- If no values have been delivered to the aggregate, the internal state is
-- the NULL array.  If a null value has been delivered, the internal state
-- is an array with one element, which is NULL.  Otherwise, it is an array
-- with one element, the least/greatest seen so far.

CREATE OR REPLACE FUNCTION strict_min_agg (anyarray, anyelement)
RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE
        WHEN $1 IS NULL THEN ARRAY[$2]
        WHEN $1[1] IS NULL THEN $1
        WHEN $2 IS NULL THEN ARRAY[$2]  -- use $2, not NULL, to preserve type
        ELSE ARRAY[least($1[1], $2)]
    END;
$$;


CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END;
$$;

CREATE AGGREGATE strict_min (x anyelement) (
    sfunc = strict_min_agg,
    stype = anyarray,
    finalfunc = strict_min_final
);
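
For reference, usage looks like this (results follow from the semantics
described above):

SELECT strict_min(x) FROM (VALUES (3), (1), (2)) AS t(x);          -- 1
SELECT strict_min(x) FROM (VALUES (3), (1), (NULL::int)) AS t(x);  -- NULL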


Cheers,

Jeff


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-13 Thread Jeff Janes
On Sat, Nov 12, 2016 at 5:33 PM, otar shavadze  wrote:

> Tried
>
>   OPERATOR(pg_catalog.@>)
>
>
>  as Tom mentioned, but still, don't get fast performance when value does
> not existed in any array.
>

Did you build the correct index?


>
> Also "played" with many   different ways, gin, gist indexes (gin with and
> without *gin__int_ops*)  but, always, there was some situation, where
> search in array was slow.
>

Yes.  There will always be some situation when the array search is slow.
Is that situation one that a specific person cares about?  Hard to tell,
since you have not given us any additional useful information.

> I don't know exactly, may be I am wrong, but what I understood after
> several day "trying", is that, I never will use arrays, with tables more
> than 500 000-1000 000 rows, because then  searching in this array is
> somehow problematic.
>
> I rebuild my structure and added another table (instead of using array)
> and then used join's instead of searching in array.
>
> That's works perfectly,   joining works fast as hell, even for several
> millions rows in each table.
>

"Properly" normalizing your data is a wonderful thing, no doubt about it,
if you are prepared to deal with the consequences of doing so.  But not
everyone has that luxury.  Which is why there is more than one way of doing
things.

Cheers,

Jeff


Re: [GENERAL] Trigram is slow when 10m rows

2016-11-13 Thread Jeff Janes
On Sun, Nov 13, 2016 at 3:54 AM, Aaron Lewis 
wrote:

> I have a simple table with Trigram index,
>
> create table mytable(hash char(40), title text);
> create index title_trgm_idx on mytable using gin(title gin_trgm_ops);
>
> When I run a query with 10m rows, it uses the Trigram index, but takes
> 3s to execute, very slow.
> (I have 80m rows, but only inserted 10m for testing purpose)
>
> test=# select count(*) from mytable;
>   count
> --
>  13971887
> (1 row)
>
> test=# explain analyze select * from mytable where title ilike 'x264';
>   QUERY PLAN
> 
> --
>  Bitmap Heap Scan on mytable  (cost=462.69..5639.67 rows=1380
> width=83) (actual time=2937.308..2937.308 rows=0 loops=1)
>Recheck Cond: (title ~~* 'x264'::text)
>Rows Removed by Index Recheck: 11402855
>Heap Blocks: exact=39557 lossy=158010
>->  Bitmap Index Scan on title_trgm_idx  (cost=0.00..462.35
> rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1)
>  Index Cond: (title ~~* 'x264'::text)
>  Planning time: 0.611 ms
>  Execution time: 2937.729 ms
> (8 rows)
>
> Any ideas to speed things up?
>


What version of PostgreSQL and pg_trgm are you using?  It might work
better under 9.6 with pg_trgm 1.3.

Since your query doesn't use wildcards, it is probably better suited to a
regular btree index, perhaps with citext.
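
For example, assuming you only ever need exact (case-insensitive) matches,
an expression index like this would be served by a plain btree (the index
name is illustrative):

CREATE INDEX mytable_title_lower_idx ON mytable (lower(title));

SELECT * FROM mytable WHERE lower(title) = lower('x264');

citext would instead let you keep writing title = 'x264' directly, at the
cost of changing the column's type.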

Cheers,

Jeff


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-10 Thread Jeff Janes
On Thu, Nov 10, 2016 at 7:11 AM, Tom Lane  wrote:

> otar shavadze  writes:
> >> Hmmm ... actually, I wonder if maybe '@>' here is the contrib/intarray
> >> operator not the core operator?  The intarray operator didn't get
> plugged
> >> into any real estimation logic until 9.6.
>
> > So, you mean that better would be go to version 9.6 ?
>
> If you are using that contrib module, and it's capturing this operator
> reference, that would probably explain the bad estimate.  You could
> drop the extension if you're not depending on its other features, or you
> could explicitly qualify the operator name ("operator(pg_catalog.@>)"),
> or you could upgrade to 9.6 (don't forget to do ALTER EXTENSION ... UPDATE
> afterwards).
>

Isn't the operator determined at index build time?  If he doesn't want to
update to 9.6, I think he would need to rebuild the index, removing
the "gin__int_ops" specification.
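
Something along these lines, sticking with 9.4/9.5 (the index, table, and
column names here are only illustrative):

DROP INDEX my_intarray_gin_idx;

-- Rebuild with the core int[] GIN opclass (the default when gin__int_ops
-- is not specified), so queries written with the core pg_catalog.@>
-- operator can use the index.
CREATE INDEX my_intarray_gin_idx ON mytable USING gin (tags);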

Cheers,

Jeff


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-08 Thread Jeff Janes
On Tue, Nov 8, 2016 at 12:27 PM, otar shavadze  wrote:

>
> p.s. In "pg_stats" really many values (long lists in "most_common_vals",
> "most_common_freqs") and in another columns
> Which one columns should I show you? All?
>

most_common_elems.  Is it empty, or is it not empty?  If not empty, does it
contain the specific values you used in your queries?
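
For example (substitute your actual table and column names):

SELECT most_common_elems, most_common_elem_freqs
FROM pg_stats
WHERE tablename = 'mytable' AND attname = 'tags';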

Cheers,

Jeff

