Re: [GENERAL] Question regarding logical replication

2017-10-27 Thread Francisco Olarte
On Fri, Oct 27, 2017 at 12:04 PM, Weiping Qu <q...@informatik.uni-kl.de> wrote:
> That's a good point and we haven't accounted for disk caching.
> Is there any way to confirm this fact in PostgreSQL?

I doubt it; as its name indicates, the cache should be hidden from the db server.

You could monitor the machine with varying lags and watch the
disk-cache hit ratio, or monitor the throughput loss. A disk-cache
effect should exhibit a constant part for small lags, where you mostly
do cache reads, then a rising part as you begin reading from disk,
stabilizing asymptotically ( as the fraction of reads coming from disk
grows, but it could also exhibit a jump if you are unlucky and evict
pages you'll need soon ). It is not a simple thing to measure, though,
especially with a mixed job load and long delays.

The xlog can do strange things. IIRC it is normally write-only ( only
used on crash recovery, to archive (ship) it and for log based
replication slots ), but postgres recycles segments ( which can have
an impact on big memory machines ). I do not know to what extent a
modern OS can detect the access pattern and do things like evict the
log pages early after sync.

Francisco Olarte.




Re: [GENERAL] Question regarding logical replication

2017-10-27 Thread Francisco Olarte
On Thu, Oct 26, 2017 at 10:20 PM, Weiping Qu <q...@informatik.uni-kl.de> wrote:

> However, the plots showed different trend (currently I don't have plots on
> my laptop) which shows that the more frequently are the CDC processes
> reading from logical slots, the less overhead is incurred over PostgreSQL,
> which leads to higher throughput.

Have you accounted for disk caching? Your CDC may be getting the log
from the cache when running with little lag, but forcing the server to
read from disk when it falls behind.

Francisco Olarte.




Re: [GENERAL] Re: Restoring tables with circular references dumped to separate files

2017-10-22 Thread Francisco Olarte
On Sat, Oct 21, 2017 at 10:48 PM, doganmeh <meh...@edgle.com> wrote:
...
> On another note, I used to take full backups (entire database), however
> switched to table by table scheme in order to make it more VCS friendly.
> Namely, so I only check into github the dumps of the tables that are updated
> only.
> So, from that perspective, is there a dump-restore scenario that is widely
> used, but is also VCS friendly? To my knowledge, pg_restore does not restore
> backups that are in "plain text" format, and compressed formats such as
> "tar" would not be github friendly.

Not widely used, but you have the directory format ( disclaimer: I have
not tested it for VCS friendliness ). It populates a directory similar
to what uncompressing a tar-format dump would; I do not know whether it
renames the files from run to run, but that should be easy to test.

Also note it is documented as compressed BY DEFAULT, but you can use
options to avoid compression, and it is the only format which supports
parallel dumps.

Also, custom and tar can be made uncompressed, but I do not think
that's a great idea.
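
In case it helps, a minimal sketch ( untested here, names are placeholders ) of an uncompressed, parallel directory-format dump and its restore:

pg_dump -Fd -Z 0 -j 4 -f mydb.dump.d mydb
pg_restore -j 4 -d mydb_restored mydb.dump.d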

Francisco Olarte.




Re: [GENERAL] Table partition - parent table use

2017-09-15 Thread Francisco Olarte
Luiz:

1st thing, do not top-quote. It's hard to read and I, personally,
consider it insulting ( not the first time it's done, and for obvious
reasons ).

On Fri, Sep 15, 2017 at 4:24 PM, Luiz Hugo Ronqui <lron...@tce.sp.gov.br> wrote:
> Our usage allows us to insert all rows into the hot partition, since its a 
> rare event to receive data that otherwise would have to be redirected to a 
> "colder" partition.
> This way, its not a problem that the parent table would always be searched. 
> In fact it would guarantee that these bits, received "out of time", would get 
> accounted.

The problem of always being searched is not for recent rows, but for
historic ones. Imagine hot=2016-7, warm=2013-5 and cold=the rest.

If hot=parent and you make a query for 2014 data it's going to search
hot and warm, not just warm. If hot!=parent it is going to search
parent and warm ( and use a seq scan on parent in the normal case, as
the stats show it as empty, and it will be if things are going well ).

> The number of partitions, especially the "cold" ones, is not a hard limit... 
> we can expand it with time.

I know, my recommendation was to make them in such a way that, if you
use more than one, once a row lands in a historic partition it never
moves ( i.e., use things like cold-200x, cold-201x, not
cold-prev-decade, cold-two-decades-ago ).

> The idea includes schemas and tablespaces, along with its management 
> benefits,  specifically for these partitioned data. One of our current 
> problems is exactly the time it takes for backup and restore operations. I 
> did not mentioned it before because of the size of the original message.

We normally do the schema trick, and as 90% of the data is in the
historic schema, we skip most of it.

Francisco Olarte.




Re: [GENERAL] Table partition - parent table use

2017-09-15 Thread Francisco Olarte
Hi Luiz:

On Thu, Sep 14, 2017 at 11:06 PM, Luiz Hugo Ronqui
<lron...@tce.sp.gov.br> wrote:
...
> We have a database with data being inserted for almost 10 years and no
> policy defined to get rid of old records, even though we mostly use only the
> current and last year's data. Some etl processes run on older data from time
> to time.
> After this time, some tables have grown to a point where even their indexes
> are bigger than the server's available RAM. Because some queries were
> getting slower, despite optimizations, we started experimenting with table
> partitioning.
> The idea was creating 3 partitions for each table of interest: the "hot",
> the "warm" and the "cold". The first would have the last 2 years. The
> second, data from 3 to 5 years and the third, all the rest.

I would consider using more than one cold partition, and maybe moving
them AND warm to a different schema. Maybe a fixed span in each,
something like cold-2000-2009, cold-2010-2019. You can update the
constraints adequately; the thing is, you periodically update the
constraints on hot, warm and the last cold, moving data among them
appropriately, then do a really good backup of warm and the colds and
you can forget about them in daily backups. Also, if you want to drop
"stale" data in the future, or un-inherit partitions to speed up
queries, it is easier to do.
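
As a minimal sketch of the layout I mean ( made-up table and column names, pre-10 inheritance style ):

CREATE TABLE events ( ts timestamptz NOT NULL, payload text );
CREATE TABLE events_hot  ( CHECK ( ts >= '2016-01-01' ) ) INHERITS (events);
CREATE TABLE events_warm ( CHECK ( ts >= '2013-01-01' AND ts < '2016-01-01' ) ) INHERITS (events);
CREATE TABLE events_cold_2010_2012 ( CHECK ( ts >= '2010-01-01' AND ts < '2013-01-01' ) ) INHERITS (events);
CREATE TABLE events_cold_2000_2009 ( CHECK ( ts >= '2000-01-01' AND ts < '2010-01-01' ) ) INHERITS (events);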

...
> Then one thing came to mind: Why not to use the "parent" table as the hot
> one, without doing any redirection at all? That way we could:
> 1)  Keep the referential integrity of the current model untouched;
> 2)  Dismiss the trigger redirection along with the hybernate issue;
> 3)  Have a much smaller dataset to use in most of our queries;
> 4)  Have all the historic data when needed

You can do it, but remember the parent normally does not have
constraints, so it is always scanned ( fast, as it is known to be
empty ). Also, SELECT FROM ONLY parent, which is useful to detect when
you are missing partitions, won't work in this case. But you can test it.

...
> I have run some basic tests and all seemed to work as expected, but since I
> couldn't find any use of  the parent table besides  being the head of the
> hierarchy, I am affraid of doing something that could stop  because it wasnt
> designed to work like that to begin with...

Seems fine to me. I never used that because I normally use special
insertion programs for my partitioned tables ( my usage allows that ),
so I always insert directly into the appropriate partition ( so I just
use inheritance, no triggers or rules ).

Francisco Olarte.




Re: [GENERAL] PostgreSQL COPY Statement Error On Linux

2017-09-12 Thread Francisco Olarte
George:

On Tue, Sep 12, 2017 at 6:40 PM, George Neuner <gneun...@comcast.net> wrote:
> Francisco already pointed out that Linux doesn't understand the
> backslashes in the file path, however it should be noted that Windows
> *does* understand forward slashes and that [modulo disk names in
> Windows] you can use forward slash paths on both systems.

That's not strictly correct. Linux understands backslashes in paths
fine, they are just not a directory separator ( there are only two
reserved byte values in path names, IIRC: slash for the dir separator
and nul for string end, a C issue ). Windows, OTOH, inherits its path
separator logic from MSDOS 2.0, and unless it has changed in the last
fifteen years it treats either slash as a separator.

But the issue is that Windows treats a \\netname\resource prefix as a
network request and transforms it internally, while Linux does not. In
*ix you have to connect to the machine and mount its resources first,
similarly to what you do with local disks. Normally //x is treated the
same as /x:

folarte@n:~$ ls -ldi /tmp //tmp ///tmp
655361 drwxrwxrwt 14 root root 45056 Sep 12 19:17 /tmp
655361 drwxrwxrwt 14 root root 45056 Sep 12 19:17 //tmp
655361 drwxrwxrwt 14 root root 45056 Sep 12 19:17 ///tmp

Francisco Olarte.




Re: [GENERAL] PostgreSQL COPY Statement Error On Linux

2017-09-12 Thread Francisco Olarte
On Tue, Sep 12, 2017 at 12:30 PM, Osahon Oduware <osahon@gmail.com> wrote:
> I am trying to utilize the "COPY" statement below to copy a .CSV file to a
> table in a PostgreSQL database.:
> COPY .() FROM
> '\\shared\network\path\to\csv\test.csv' DELIMITER ',' CSV HEADER;
>
> This works with a PostgreSQL database installed in a WINDOWS environment
> (Windows 7), but fails with the following error with a similar PostgreSQL
> database in a Linux environment (Centos 7):
> org.postgresql.util.PSQLException: ERROR: could not open file
> "\\shared\network\path\to\csv\test.csv" for reading: No such file or
> directory
>
> I have granted READ access to EVERYONE on the CSV folder on the network path
> as depicted in the attached image.
> Could someone point me to the reason for the failure in Linux?

You are invoking a server-side COPY. This means the SERVER needs to be
able to access the file under the name you've given to it.

The network path you have given is valid on Windows machines ( a UNC
path? It's been a decade and a half since I last used Windows ), but
not on Linux. Typically on Linux you mount the shared folder somewhere,
say /some/where, and give the path as /some/where/path/to/csv/test.csv.

What you probably need is a CLIENT-side copy. I cannot see which client
program you are using; it may be some fancy GUI stuff, in which case I
cannot help you. If you are using the standard "psql" tool you can just
use \copy. As explained in the docs this just does "copy from stdin" (
or to stdout ) on the client side and redirects the file you give on
the command line ( or you can issue [psql ... -c "copy ... from stdin"]
on a command line and feed the file via shell redirections, but, IIRC,
Windows shells are terrible at quoting arguments and redirecting I/O,
so it may be better to avoid that ).
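
A minimal sketch of the client-side version ( assuming the share is mounted under /some/where, table and column names are made up ):

\copy my_table (col1, col2) from '/some/where/path/to/csv/test.csv' with (format csv, header)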

Francisco Olarte.




Re: [GENERAL] column names query

2017-09-07 Thread Francisco Olarte
Hi Hamman:

On Thu, Sep 7, 2017 at 3:17 PM,  <haman...@t-online.de> wrote:
> I would like to do something like
> \copy (select heading(select  my query here)) to /tmp/heading_testfile1

It's already been pointed out: you can do something using CSV COPY
with headers ( why headers are not allowed in other formats remains a
mystery to me ) and LIMIT 0 ( I would rather try adding AND FALSE to
the WHERE clause, it may lead to a faster response, although I doubt
it ).
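
Something along these lines ( untested, table name is a placeholder ) should write just the heading line:

\copy (select * from my_table where false) to '/tmp/heading_testfile1' with (format csv, header)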

Francisco Olarte.




Re: [GENERAL] column names query

2017-09-07 Thread Francisco Olarte
On Thu, Sep 7, 2017 at 9:18 AM,  <haman...@t-online.de> wrote:
> is there a simple way to retrieve column names from a query - basically the 
> way psql adds
> column headings when I do a select?

How do you do the query? I mean, JDBC, Perl? After all psql is just a
C program doing a query via libpq and it can do it; we may be able to
provide some useful info if you show us yours first.

Francisco Olarte.




Re: [GENERAL] Numeric numbers

2017-09-02 Thread Francisco Olarte
Олег:

On Sat, Sep 2, 2017 at 7:04 PM, Олег Самойлов <ol...@mipt.ru> wrote:
...

>> Well, I just skipped over the rest of the code. I consider casting to
>> unespecified numeric widths a very bad habit and did not want to
>> encourage it.
> There is nothing in documentation that this casting is a very bad
> habit. And this is most interesting part, about infinite zeros. Looked
> like a bug.

It's not in the documentation, as that rarely deals with habits and
styles, and many people could consider it perfectly kosher. It's just
something *I* personally consider a bad habit; feel free to use it as
much as you like, or to recommend it as good if you want. Normally I
wouldn't even mention it, as I did not in my first response; I only did
so to explain why I ignored the tail.



Francisco Olarte.




Re: [GENERAL] Numeric numbers

2017-09-02 Thread Francisco Olarte
CCing the list ( hint: use reply-all in your MUA, otherwise people
will lose the thread; this message came only to me. If that was what
you wanted, please indicate so in future messages, as the custom on
this list is to reply to list + posters ).

On Sat, Sep 2, 2017 at 6:21 PM, Олег Самойлов <ol...@mipt.ru> wrote:
> On Sat, 2017-09-02 at 17:54 +0200, Francisco Olarte wrote:
>> It's probably doing 1(integer) => double precioson => numeric(20) or
>> something similar if you do not specify.
>>
>> Francisco Olarte.
>
> Well, the question was not only about why there is only 20 "3" after
> point, I suspect this (may be it's not good enough documented, but
> reasonable), but also about why are there infinite amount of "0" after
> point if I subtract "3"s.

Well, I just skipped over the rest of the code. I consider casting to
unspecified numeric widths a very bad habit and did not want to
encourage it.

Francisco Olarte.




Re: [GENERAL] Numeric numbers

2017-09-02 Thread Francisco Olarte
Vincenzo:

On Sat, Sep 2, 2017 at 6:20 PM, Vincenzo Romano
<vincenzo.rom...@notorand.it> wrote:
> And I think Francisco is asking why only 20 digits.

No need to think, I wasn't asking anything.

I'm used to FLOAT ( I think actually DP ) numbers being converted to
numeric with a scale of 20, and I normally never cast to unspecified
numeric precision ( except for toy one-liners and the like ), so I was
trying to show the OP why he got 20.

Francisco Olarte.




Re: [GENERAL] Numeric numbers

2017-09-02 Thread Francisco Olarte
On Sat, Sep 2, 2017 at 4:16 PM, Олег Самойлов <ol...@mipt.ru> wrote:
> Hi all. I have silly question. Look at "numeric" type. According to
> docs it must be "up to 131072 digits before the decimal point; up to
> 16383 digits after the decimal point". Well, lets see.
>
> => select 1::numeric/3;
>         ?column?
> ------------------------
>  0.33333333333333333333

=> select 1::numeric(100,90)/3;
            ?column?
----------------------------------
 0.33333333333333333333333333...3
(1 row)

It's probably doing 1 (integer) => double precision => numeric(20) or
something similar if you do not specify.

Francisco Olarte.




Re: [GENERAL] DROP [TEMP] TABLE syntax, as reason why not?

2017-08-24 Thread Francisco Olarte
On Thu, Aug 24, 2017 at 12:48 PM, Vincenzo Romano
<vincenzo.rom...@notorand.it> wrote:

> Isn't a CHOICE for better syntax enough?
> Aren't symmetry and consistency valuable arguments?
> Syntactic sugar is not evil on its own.
> It can help people writing code that can be better understood.

Valid arguments, but those extensions are NOT free to develop, test
and maintain. And every syntax extension, especially one like this,
introduces the possibility of collisions with future standards ( de
facto or de iure, although Pg already deviates from ANSI on the temp
stuff ).

Francisco Olarte.




Re: [GENERAL] DROP [TEMP] TABLE syntax, as reason why not?

2017-08-24 Thread Francisco Olarte
On Thu, Aug 24, 2017 at 11:46 AM, Vincenzo Romano
<vincenzo.rom...@notorand.it> wrote:
> Once you accept that Postgres is already extending the standard, I
> would focus on syntax consistency and symmetry as a yet-another-extra
> value from Postgres.
>
> Moreover, "DROP TEMP TABLE..." would make it clear and explicit that
> the table is temporary.
> And it would thus "protect the programmer from typos and errors"
> (intentional tongue-in-cheek).


Given that DROP TEMP TABLE x would be just syntactic sugar for DROP
TABLE pg_temp.x, I think the (slight) increase in the bug surface is
enough to avoid it, as the pg_temp. prefix makes it equally clear and
explicit that you are dropping a temporary table.

And if the programmer can forget the pg_temp., they can equally forget
the TEMP.
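
A quick sketch of what I mean ( hypothetical table name ):

CREATE TEMP TABLE scratch (i int);
DROP TABLE pg_temp.scratch;  -- explicit: this can only drop the temporary one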

Francisco Olarte.




Re: [GENERAL] Would you add a --dry-run to pg_restore?

2017-08-03 Thread Francisco Olarte
On Wed, Aug 2, 2017 at 7:10 PM, Edmundo Robles <edmu...@sw-argos.com> wrote:
>
> I  imagine   pg_restore can  execute  the instructions on dump but  don't  
> write on disk.   just like David said: "tell me what is going to happen but 
> don't actually do it"


IIRC pg_restore does not really execute SQL itself. It just passes the
commands to the server when in text mode, like psql does, and builds
some commands and sends them to the server to execute when in
custom/tar mode. I doubt it has the ability to validate the contents of
the dump.

>> Edmundo Robles <edmu...@sw-argos.com> writes:
>> > I mean,  to   verify the integrity of backup  i do:
>> > gunzip -c backup_yesterday.gz | pg_restore  -d my_database  && echo
>> > "backup_yesterday is OK"

I also think that if he is piping, the dump must be in text mode; I
seem to recall the custom format needs seekable files, though I'm not
too sure about tar ( it should not ). In this case, as previously
suggested, a simple gunzip -t is enough to verify the backup file's
integrity, but checking internal correctness is very difficult ( as it
may even depend on server configuration, i.e., needing some predefined
users / locales / encodings ).
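
i.e., something as simple as this checks the compressed file, though it says nothing about the SQL inside:

gunzip -t backup_yesterday.gz && echo "backup_yesterday.gz is not corrupted"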

Francisco Olarte.




Re: [GENERAL] Do not INSERT if UPDATE fails

2017-08-03 Thread Francisco Olarte
On Wed, Aug 2, 2017 at 6:23 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
> Does insert's "on conflict" clause not work for this usage?

Did you even bother to read the queries? He is using two different tables.

Francisco Olarte.




Re: [GENERAL] PSQL command line print speed

2017-05-18 Thread Francisco Olarte
Adrian:

On Wed, May 17, 2017 at 8:40 PM, Adrian Klaver
<adrian.kla...@aklaver.com> wrote:
> On 05/17/2017 09:46 AM, Adrian Myers wrote:
>> Ah I should have mentioned, the pager is off.
> Is that by choice and if so why?
>
> With the pager off you have to wait for the entire output to write to the
> screen. For anything but a small dataset that is going to take time.

Are you sure? IIRC he specifically said the output started promptly
but was slow, i.e. a speed problem, not the latency problem you are
hinting at.

And, IIRC again, psql uses libpq, which always buffers the entire
response to the queries.

And nothing prohibits starting the output as soon as it can be
calculated even if no pager is there ( note you have to do things like
calculate column widths, but this is done with and without a pager ).
In fact the pager will normally just introduce a small but potentially
noticeable delay.

Francisco Olarte.




Re: [GENERAL] PSQL command line print speed

2017-05-18 Thread Francisco Olarte
CCing the list.

Adrian:

On Wed, May 17, 2017 at 6:58 PM, Adrian Myers <hadrianmy...@gmail.com> wrote:
> That is a great question, and no, I'm not. This is on Windows and I see this
> with normal cmd.exe and ConEmu but I have been using ConEmu for some time
> and have not looked into settings there. Thanks for that suggestion.

I do not know what ConEmu is, and have been out of Windows since W2K.
But I do remember that, although Windows console windows were great (
when using their API ), their terminal emulation capabilities were not
that good, and sometimes they did things like slow ( aka "smooth" )
scrolling and similar things which greatly impaired their performance.

I would test that ConEmu with some simple program. If you have a
classic cat it would be nice ( not type, which is internal, nor any
Windows program which may be skipping stdin and going directly to the
console API ); even a simple 'while((c=getc())!=EOF) putc(c)' should be
fast if the console/redirections are not playing tricks.

You can also test with a single table with a single text field, put a
several-pages-long text value there and select it. If that is slow, I
would bet on terminal emulator problems.

Francisco Olarte.




Re: [GENERAL] PSQL command line print speed

2017-05-17 Thread Francisco Olarte
Adrian:

On Wed, May 17, 2017 at 6:03 PM, Adrian Myers <hadrianmy...@gmail.com> wrote:
> Queries which perform in a few milliseconds through a driver (psycopg2 in
> this case) can take several seconds or minutes to complete in the console,
> seemingly just due to it printing lines at a very leisurely pace (in other
> words, this is not the delay caused by the query itself, often it starts
> printing almost instantly but spends a very long time just writing output).
> While overall application performance is unaffected, simple admin tasks and
> poking around the data take longer than I would like as a result. Is there
> any way to improve just the display/write performance in the console?

Are you sure the culprit is psql and not your terminal emulator?


Francisco Olarte.-




Re: [GENERAL] Top posting....

2017-05-12 Thread Francisco Olarte
George:

On Fri, May 12, 2017 at 2:23 AM, George Neuner <gneun...@comcast.net> wrote:

> I agree 100%.  But excessive brevity can make it so a reader can't
> follow the conversation.  Users of web forums often assume *you* can
> easily look back up the thread because *they* can.  In my experience,
> it isn't always easy to do.

Excessive = too much, normally implies bad things.

Francisco Olarte.




[GENERAL] Top posting....

2017-05-11 Thread Francisco Olarte
Slightly unrelated...

On Wed, May 10, 2017 at 11:21 PM, Gavin Flower
<gavinflo...@archidevsys.co.nz> wrote:
> It is normal on this list not to top post, but rather to add comments at the
> end (so people can see the context) - though interspersed comments in the
> body of the text is okay when appropriate!

I'd rather say interspersed comments within the TRIMMED text body are
the appropriate thing to do, bottom posting ( edited ) being a
particular case of that when only a single topic/question is being
answered.

Full quoting ( I mean the people who even quote others' signatures ) is
especially ugly; combined with top posting I feel it is insulting ( to
me it reads as 'you do not deserve me taking the time to edit a bit and
make things clear' ) ( but well, I started when the whole university
multiplexed over a 9600 bps link, so I may be a bit extreme on this ).

Regards.
Francisco Olarte.




Re: [GENERAL] Python versus Other Languages using PostgreSQL

2017-05-09 Thread Francisco Olarte
On Tue, May 9, 2017 at 1:44 PM, vinny <vi...@xs4all.nl> wrote:
> In fact, I don't think many companies/developers even choose a language
> or database, but rather just use whatever they have experience in.

That is choosing. You choose them because you know them.


Francisco Olarte.




Re: [GENERAL] Python versus Other Languages using PostgreSQL

2017-05-09 Thread Francisco Olarte
Paul:

On Tue, May 9, 2017 at 2:45 AM, Paul Hughes <p...@vivation.com> wrote:
> My question still remains though - why is it that all the largest web 
> platforms that have used PostgreSQL *specifically* choose Python as their 
> back-end language?

Do you have any data supporting that? AFAIK people tend to choose the
language first, database second, not the other way round, and many
times the platform language is nailed, but the db can be changed.
Also, WHICH platforms are you referring to?

> Why are Postgres and Python so married, in the same way that Node.js is 
> largely married to MondogDB?

I do not think either of these is true.

Francisco Olarte.




Re: [GENERAL] Merging records in a table with 2-columns primary key

2017-04-02 Thread Francisco Olarte
Alexander:

On Sun, Apr 2, 2017 at 5:27 PM, Alexander Farber
<alexander.far...@gmail.com> wrote:
> 2) Is there a way to use an UPDATE reviews instead of the inefficient
> (because copying) INSERT ... SELECT ... ON CONFLICT DO NOTHING?

Mmm, I've just sent a suggestion to use delete+reinsert, and would like
to point out that in pg UPDATE ~= DELETE + INSERT. I use those because
many times they are more efficient ( simple conditions on the delete,
insert is fast in postgres, and you can vacuum in the middle to reuse
the space if a large portion is going to get reinserted ).
Francisco Olarte.




Re: [GENERAL] Merging records in a table with 2-columns primary key

2017-04-02 Thread Francisco Olarte
Alexander:

On Sun, Apr 2, 2017 at 4:26 PM, Alexander Farber
<alexander.far...@gmail.com> wrote:
> The purpose of the function is to merge several user records to one (with
> the lowest uid).

It looks complicated ( more on this later )

> While merging the reviews records I delete all self-reviews and try to copy
> over as many remaining reviews as possible.
...
> test=> SELECT out_uid FROM merge_users(ARRAY[1,2,3,4]);
> ERROR:  new row for relation "reviews" violates check constraint
> "reviews_check"
> DETAIL:  Failing row contains (1, 1, User 4 says: 3 is ugly).

Mmm, maybe this is related to constraint immediateness, but anyway I
would suggest another approach for calculating the set of new reviews.

If you just delete every review for the set and capture the
not-self-referential:

with deleted as ( delete from reviews where uid = any(in_uids) returning * )
, candidates as ( select * from deleted where author <> all(in_uids) )

You can then generate a new review-set from it with some criteria:

, cleaned as (select author, min(review) as review from candidates group by 1)

And then insert them back

insert into reviews select $out_id, author, review from cleaned;
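
Putting the pieces together, a sketch of the whole thing ( assuming reviews has just (uid, author, review), and that in_uids / out_uid are your function's variables ):

WITH deleted AS (
  DELETE FROM reviews WHERE uid = ANY(in_uids) RETURNING *
), candidates AS (
  SELECT * FROM deleted WHERE author <> ALL(in_uids)
), cleaned AS (
  SELECT author, min(review) AS review FROM candidates GROUP BY 1
)
INSERT INTO reviews (uid, author, review)
SELECT out_uid, author, review FROM cleaned;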

If I were more fluent with the window functions I would recommend
ordering the cleaned query by uid=$out_id DESC and getting the first
row with one of them ( the DESC order puts true first, IIRC, so it
favors keeping the original review for $out_id), or using string_agg
to try to keep all the texts ).

I've found that on many of these "merging" problems it's easier to
extract everything, clean it, and reinsert. Normally my data sets are
big, so I just delete into a "temporary" table ( not in the SQL sense,
just one which I create and then drop ), clean it there and reimport.
It also makes debugging the code much easier ( as the temp table can be
cloned to test easily ). For encapsulation WITH helps a lot, or, in a
function, you can use a real temporary table.

Francisco Olarte




Re: [GENERAL] Incremental / Level -1 backup in PG

2017-03-22 Thread Francisco Olarte
On Wed, Mar 22, 2017 at 9:40 AM, rakeshkumar464
<rakeshkumar...@outlook.com> wrote:
> basebackup + WAL archive lets you do just exactly this.
...
> Yes John I do know about using WAL archive.  IMO that will not be as fast as
> restoring using the incremental backup.

That's an opinion, have you tried measuring? Because normally I've found that

1.- Incremental backups are slow and impose a greater runtime penalty
on the system than log-change-archiving methods.

2.- Incremental restores are not that fast.

> Eg:
> It is common to take a full backup on weekends and incremental on
> weeknights.  If we have to restore
> upto Thu afternoon, which one do you think will be faster :-
>
> 1 -  Restore from basebackup.
> 2 -  Restore from wed night backup
> 3 - Apply WAL logs after wed night backup until the time we want to restore.

You are assuming your backup product does direct-diff to base. Those
are gonna be costly when friday arrives.

> vs
> 1 - Restore from basebackup
> 2 - Apply WAL logs from weekend until the time we want to restore.

> If first choice is lot faster in Oracle,DB2,

Is it really testable / a lot faster ? ( bear in mind if a product
just supports one strategy there is a huge interest in telling it is
the faster one )

> I have reasons to believe that
> the same should be true for PG also. But as someone explained, the PG
> technology can not support this.

I fear incremental backup capabilities would make postgres slower.

Anyway, with base backup + WAL archive you always have the option of
making it incremental: just restart recovery on the backup copy each
time you receive a WAL segment and you are done. In fact, you can treat
a replication slave as a very-low-lag backup.
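
A minimal sketch of the usual settings for that ( paths are placeholders, 9.6-era syntax; recovery.conf goes on the restore/standby copy ):

# postgresql.conf on the primary
wal_level = replica
archive_mode = on
archive_command = 'cp %p /wal_archive/%f'

# recovery.conf on the backup copy kept in recovery
standby_mode = 'on'
restore_command = 'cp /wal_archive/%f %p'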

Francisco Olarte.




Re: [GENERAL] COPY ... FROM stdin WITH FORMAT csv

2017-03-21 Thread Francisco Olarte
Alexander:

On Tue, Mar 21, 2017 at 6:31 PM, Alexander Farber
<alexander.far...@gmail.com> wrote:
> I keep rereading https://www.postgresql.org/docs/9.6/static/sql-copy.html
> but just can't figure the proper syntax to put some records into the table:

It's not that complex, let's see

> words=> COPY words_reviews (uid, author, nice, review, updated) FROM stdin
> WITH FORMAT 'csv';
> ERROR:  syntax error at or near "FORMAT"
> LINE 1: ...d, author, nice, review, updated) FROM stdin WITH FORMAT 'cs...
>  ^

Here you have an error ( more on this later ) so the next lines are
going to be interpreted as a new command

> words=> 1,2,1,'1 is nice by 2','2017-03-01',

Which you can clearly see because the prompt is => , initial, not ->,
continuation.

> words-> 1,3,1,'1 is nice by 3','2017-03-02',
> words-> 1,4,1,'1 is nice by 4','2017-03-03',
> words-> 2,1,1,'2 is nice by 1','2017-03-01',
> words-> 2,3,1,'2 is nice by 3','2017-03-02',

Also, you are putting an extra , at the end of the lines. This means
you have an empty string at the end, one extra field. I do not remember
now whether that hurts, but better to omit it.

> words-> 2,4,0,'2 is not nice by 4','2017-03-03'
> words-> \.
> Invalid command \.. Try \? for help.

All the lines up to here are considered part of the previous SQL
command ( remember => vs -> ). You are not in COPY mode, so psql tries
to interpret '\.' as a meta-command ( like \d ) but fails.

> words-> ;
> ERROR:  syntax error at or near "1"
> LINE 1: 1,2,1,'1 is nice by 2','2017-03-01',
> ^

And here you terminate the SQL command, so it fails ( note it refers
to the first error, the initial line with => ).

> I am not sure if FORMAT csv or FORMAT 'csv' should be used.

That is easy, try both. BUT! if you read the manual with care you will
notice it is "with ( options )", not "with options", so you are
missing the parentheses:

web_tarotpagos_staging=# create temporary table t(a varchar, b varchar);
CREATE TABLE

-- NO parens, no luck:
web_tarotpagos_staging=# copy t(a,b) from stdin with format csv;
ERROR:  syntax error at or near "format"
LINE 1: copy t(a,b) from stdin with format csv;
^
web_tarotpagos_staging=# copy t(a,b) from stdin with format 'csv';
ERROR:  syntax error at or near "format"
LINE 1: copy t(a,b) from stdin with format 'csv';
                                    ^

BUT, as soon as I put them:
web_tarotpagos_staging=# copy t(a,b) from stdin with (format csv);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> \.

Note the enter data approach. Also note copy is safe to try as you can
just abort it.
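
For your table something like this should work ( a sketch, not run against your schema; note CSV quoting uses double quotes, single quotes would end up inside the data ):

words=> COPY words_reviews (uid, author, nice, review, updated) FROM stdin WITH (FORMAT csv);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1,2,1,"1 is nice by 2",2017-03-01
>> 1,3,1,"1 is nice by 3",2017-03-02
>> \.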

> And I have tried adding/removing commas at the end of lines too.
That is commented above.

> I have looked at pg_dump output, but it does not use csv.

pg_dump uses the default text format, a little more difficult but
vastly superior ( IMNSHO ) to CSV. It ( by default ) separates records
with newlines and fields with tab, and escapes newlines, tabs and
backslashes in data with backslash, so the transformation is
contextless, much easier than csv:

Copy out: Replace NULL with '\N', newline with '\n', tab with '\t',
backslash with '\\', join fields with tab, print with newline at end.

Copy In: Read till newline, split on tabs, replace '\n' with newline,
'\t' with tab, '\\' with backslash.

Much easier to get right than CSV ( how do you encode the C string ",;
\n\"\n\t;  \t\"\'" ? )

Francisco Olarte.




Re: [GENERAL] Postgres goes to auto recovery mode after system restart(check this draft)

2017-03-20 Thread Francisco Olarte
Manoj:

On Mon, Mar 20, 2017 at 10:55 AM, Manojkumar S
<manojkumar.krishnamur...@zohocorp.com> wrote:
>   I started Postgres from command line using pg_ctl.exe and restarted my
> windows machine. Whenever I start postgres again after machine restart,
> postgres goes to auto recovery mode. What is the reason for this behavior?

Unclean shutdown (of postgres) -> recovery on start.

> Is there a way to overcome this? . The same behavior is being reproduced
> every time with any version of postgres and even if I start postgres with
> postgres.exe.
> PS: This behavior does not occur if I start postgres as a windows service

Then start it as a service. My guess is that when started as a service
it gets informed of (system) shutdowns and does a clean (postgres)
shutdown, but when started as a normal program Windows just kills it on
shutdown ( this happens on other OSes too, depending on how you manage
it ). IIRC Windows has infrastructure to do that for services, but I
haven't used it since they launched XP, so I'm really rusty and
outdated.
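
If you want to go that way, IIRC something like this registers an instance as a service ( service name and data dir are placeholders ):

pg_ctl register -N postgresql-9.6 -D "C:\pgdata"
net start postgresql-9.6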

Francisco Olarte




Re: [GENERAL] Postgres backup solution

2017-03-15 Thread Francisco Olarte
Rich:

On Tue, Mar 14, 2017 at 11:42 PM, Rich Shepard <rshep...@appl-ecosys.com> wrote:
>   2.) I'm far from being a professional DBA but if I had to back up a 13T
> database what I'd do (since I use only linux) is run pg_dump with the
> archive (tar) format, then use dirvish to synchronize it with a remote copy.
> Dirvish <http://www.dirvish.org/> uses rsync and records only changes since
> the last run. I use it to back up my server/workstation daily. I've restored
> files accidently deleted with no problems using either cp or rsync.

Dirvish ( I use it ) uses rsync and can send only the changes from the
last run, but you'd better use the directory format for such a huge db,
as tar is a single-file backup and you will not be able to use the hard
links and other nice things rsync/dirvish can do to save space ( this
way you send diffs and link unchanging files, which, if partitioning or
some other tactic for unchanging tables is used, can result in big
space savings ).

If your db is small enough I would recommend the custom format, with
built-in compression. I never use the tar format, as I find dir or
custom are always better than it ( and normally everything you can do
with tar x on a tar backup is possible with pg_restore, and then some ).

Francisco Olarte.




Re: [GENERAL] Index using in jsonb query

2017-03-12 Thread Francisco Olarte
On Sun, Mar 12, 2017 at 9:50 AM, SuperCiccio <sc_030...@yahoo.it> wrote:
> But even if I create a specific index
> CREATE INDEX on datatable (((jsonfield#>>'{path1,path2}')::numeric));
> it isn't used in such a query; it is used in this query:
> select field1,field2 from datatable where
> (jsonfield#>>'{path1,path2}')::numeric = 1000;
> Definitely, I didn't find a way to optimize a query that checks a range in a
> json subfield.

Have you checked why it does not use the index ( how many rows are
returned, the estimates, etc. )? Maybe it is not using it because not
using it is faster: index fetches are slower than sequential fetches (
for the same number of rows ), so for some queries it is better not to
use the index ( especially if you are testing on small tables, i.e.,
for a single-page table nothing beats a sequential scan ).
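
Something like this ( using the names from your mail ) usually shows what the planner picks and why:

EXPLAIN (ANALYZE, BUFFERS)
SELECT field1, field2 FROM datatable
WHERE (jsonfield#>>'{path1,path2}')::numeric >= 1000;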


Francisco Olarte.




Re: [GENERAL] CHECK for 2 FKs to be non equal

2017-03-11 Thread Francisco Olarte
Alexander:

On Sat, Mar 11, 2017 at 10:41 AM, Alexander Farber
<alexander.far...@gmail.com> wrote:
>  uid integer NOT NULL REFERENCES words_users(uid) CHECK (uid <>
> author) ON DELETE CASCADE,

Maybe a stupid question, but have you tried "references .. on delete .. check"?

I mean, the manual for create table says:

>>>

 column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]

...And a little down


where column_constraint is:

[ CONSTRAINT constraint_name ]
{ NOT NULL |
  NULL |
  CHECK ( expression ) [ NO INHERIT ] |
  DEFAULT default_expr |
  UNIQUE index_parameters |
  PRIMARY KEY index_parameters |
  REFERENCES reftable [ ( refcolumn ) ] [ MATCH FULL | MATCH PARTIAL |
MATCH SIMPLE ]
[ ON DELETE action ] [ ON UPDATE action ] }
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
<<<<

So ON DELETE is an optional part of a REFERENCES constraint, not a
constraint per se, and your line is being parsed as "references..." (
correct constraint ) + "check..." ( correct constraint ) + "on
delete..." ( WTF is this? ). The ON DELETE has to come right after the
REFERENCES so they are parsed as a single big constraint.
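
i.e., if I'm right, reordering like this should parse ( untested ):

uid integer NOT NULL REFERENCES words_users(uid) ON DELETE CASCADE CHECK (uid <> author),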

> What am I doing wrong please?

Not RTFM ? ( if I'm right, or not understanding it )

Francisco Olarte.




Re: [GENERAL] via psycopg2 or pg2pg? Move rows from one database to other

2017-02-27 Thread Francisco Olarte
Thomas:

On Mon, Feb 27, 2017 at 12:47 PM, Thomas Güttler
<guettl...@thomas-guettler.de> wrote:
> Thank you for explaining the steps of your algorithm.

My pleasure. But check it anyway, I may have forgotten something ( I
normally implement these things after writing a big flow diagram on a
piece of paper and checking it for a while, or something similar; I
find it easier to spot the missing bits graphically ).

> Just one question: How to do the actual transfer of data?

It does not matter too much, but ..

> I see two solutions:
> 1, Read the data into a script (via psycopg2 (we love python))
> and dump it into a second connection.
> 2, connect postgres to postgres and transfer the data without a database
> adapter like psycopg2.

For 2 you need an adapter, the foreign data wrapper, anyway. I
personally would go for 1, especially if you can colocate the program
near the main db ( same machine or network, so you have enough speed ).
Problems are normally much easier to diagnose this way, as you only
deal with psycopg2, not with psycopg2 AND the fdw, and you will need a
program to do the transfers anyway. Also, IIRC, you had a lot of
machines, so you will need a main program to do all the retrying and
accounting. And you can optimize some things ( like copying from
several satellites and then inserting them at once ).

YMMV anyway; just use whichever is easier for you, but avoid false
laziness ;-)

Francisco Olarte.




Re: [GENERAL] Move rows from one database to other

2017-02-23 Thread Francisco Olarte
Thomas:

On Thu, Feb 23, 2017 at 4:25 PM, Thomas Güttler
<guettl...@thomas-guettler.de> wrote:
> This sound good. Is there a name for this trick, to find more details?

Not that I know of. It's really old, basic stuff, with many variations
possible. I've been doing variations of it since the half-inch tape and
punched card times, and I use it a lot for file processing ( as
renaming in Linux is atomic on many filesystems ).

Francisco Olarte.




Re: [GENERAL] Move rows from one database to other

2017-02-23 Thread Francisco Olarte
Thomas:

On Thu, Feb 23, 2017 at 4:26 PM, Thomas Güttler
<guettl...@thomas-guettler.de> wrote:
> Am 23.02.2017 um 13:44 schrieb Leknín Řepánek:
>> Maybe you can do something like
>> WITH cte AS (
>> DELETE FROM t1 WHERE cond
>> RETURNINIG *
>> )
>> INSERT into t2
>> SELECT * FROM cte;
>>
>> To move rows between tables with combination with fdw_postgres and
>> foreign table.

> ... this way you don't need the second table ... nice.

If you do this with an FDW be careful with the semantics, as I think
you end up with several commits.

If the local ( t1 ) is committed before the remote, you can lose rows.

If the remote ( t2 ) is committed before the local, you'll have dupes,
so you need some way to purge them.

These things can be solved with the aid of a transaction manager and
prepared transactions, but I'm not sure of their status in your
versions, and those things are difficult.

Francisco Olarte.




Re: [GENERAL] Move rows from one database to other

2017-02-23 Thread Francisco Olarte
Thomas:

On Thu, Feb 23, 2017 at 4:16 PM, Thomas Güttler
<guettl...@thomas-guettler.de> wrote:
> Am 22.02.2017 um 16:00 schrieb Adrian Klaver:
>> only written on a successful transfer. To improve the chances of
>> successful transfer more smaller transfer batches
>> rather then larger transfers.

> I really need a solid solution.
> You said "... improve the chances of successful transfer ...". This makes me
> nervous.

I think what Adrian is saying is that smaller batches improve the
chances of each individual transfer; if one fails, you retry it.

> Delays are no problems, but data loss or duplication is.

Remember you can never guarantee 'exactly once' without very complex
solutions, I think you can do "at least once" or "at most once". That
means lose or duplicate.

That being said, IF you have some kind of global, unchanging ( at the
central site ) unique key, you could try the following, using a
holding table in each satellite and assuming you have 'on conflict do
nothing'.

1.- Move rows from main to the holding table in the satellite, in a
single transaction. This is to let you work with an unmovable set ( as
your process is the only one touching the holding tables ). If there is
already some data in holding it is no problem, it is from a previous
crashed transfer.

2.- Insert everything from the holding table into main, using on
conflict do nothing ( see the sketch after this list ).

3.- When everything is committed in main, truncate the satellite
holding table.
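
A minimal sketch of step 2, pretending both tables are reachable from one session ( e.g. via a foreign table ); main, holding and the unique key k are placeholders:

INSERT INTO main
SELECT * FROM holding
ON CONFLICT (k) DO NOTHING;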

If the satellite crashes in 1 it will roll back; you have not touched main.

If you crash in 2 you will find the holding table partially full in the
next round, and main will be rolled back ( it's important not to commit
until everything is done in 2, i.e., if you have read problems in the
satellite do not go to 3, just crash and roll back everything ). You
can either loop with the current set or append more data; your choice,
it does not matter, as you have to reinsert. The on conflict do nothing
in 2 will take care of potential duplicates.

If you crash in 3 you will transfer the lot again, but the do-nothing
in 2 will eliminate it and 3 will eventually purge it.

You can optimize on that, but basically you just repeat this until
everything goes fine. I do this ( just with two DBs, not 100 ) and it
works.

It does a lot of duplicate work, but only on problems, it normally runs smooth.

If you do not have "on conflict do nothing" ( I do not remember the
versions ) you can use an extra step. Instead of inserting into main in
2, do 2.a - copy holding to a copy table at the main side ( truncating
it beforehand if present ) and 2.b - insert the new rows from that
copy, either by using an anti-join with main or by deleting ( in the
same transaction ) the dupes before inserting.

Francisco Olarte.




Re: [GENERAL] Move rows from one database to other

2017-02-23 Thread Francisco Olarte
Thomas:

On Wed, Feb 22, 2017 at 1:51 PM, Thomas Güttler
<guettl...@thomas-guettler.de> wrote:
> I want to **move** the data. The data should get deleted on the satellite
> after transfer.
> I don't know how to delete the data which was copied, since inserts can
> happen during the copy statement.

Depending on the structure / control / kind of ops you have on the
satellite you can try a two-table trick, similar to a common one used
to process files in spool directories.

1st, you move everything to a holding table transactionally in the
satellite ( insert into holding table delete from main table returning
* )
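
i.e., something like this in the satellite ( table names are placeholders ):

WITH moved AS (
  DELETE FROM main_table RETURNING *
)
INSERT INTO holding_table SELECT * FROM moved;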

Then you have the holding table to transfer and clear, untouched by
anyone else ( as the transfer process is the only one touching it ).

You can do some tricks. You could make holding inherit from main if you
want to be able to see the data while it is being transferred, or play
renaming tricks. It all depends on the concrete app, but you can try to
fit the pattern into it; I've done it several times and it's a useful
one.

Francisco Olarte.




Re: [GENERAL] Service configuration file and password security

2017-02-16 Thread Francisco Olarte
On Thu, Feb 16, 2017 at 2:57 PM, JP Jacoupy <jpjaco...@protonmail.com> wrote:
> Can the password be stored in an encrypted way inside a service
> configuration file?

There is not.

Passwords are not stored in the ( sometimes shared, world-readable )
service configuration file ( pg_service ), but in the
UNREADABLE-FOR-OTHERS password file. They need to be stored in a way
which libpq and friends can use, so even if they were encrypted they
could be extracted easily by just debugging and setting a breakpoint in
it.

If you are looking for something like a password store with a master
password, at least on unix a user account with a .pgpass file works as
one. You store every password in a file readable only by that user,
.pgpass, and you use that user's login credentials to get access to it.
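
For reference, the password file is one line per entry, e.g. ( values are placeholders, and the file must be chmod 0600 ):

# ~/.pgpass  --  hostname:port:database:username:password
db.example.com:5432:mydb:myuser:s3cretpassword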

Francisco Olarte.




Re: [GENERAL] Documentation inconsistency (at least to me)

2017-02-14 Thread Francisco Olarte
Thomas:

On Mon, Feb 13, 2017 at 11:26 PM, Thomas Kellerer <spam_ea...@gmx.net> wrote:
> I wonder why regexp_split_to_array() is listed under "String functions and
> operators" [1] but string_to_array() is listed under "Array functions and
> operators" [2]
>
> I find that a bit inconsistent - I would expect to find both in the same
> chapter.

Seeing the description of string_to_array, I would assume it is there
to pair with array_to_string, which seems to be its dual, as a way to
[un]marshal arrays with potential nulls. OTOH regexp_split is not
easily invertible and performs the type of operation you normally do on
more free-form text, user input and the like.

> I would suggest to put both into "String functions and operators" because
> after all string_to_array() does more or less the same as
> regexp_split_to_array() does. But at the end of the day I think it's just
> important that both are in the same chapter (unless I overlooked a huge
> difference between the two that warrants this distinction).

Given the above I would certainly expect array_to_string among the
array functions, and string_to_array next to it, and any regexp-related
function clustered with its siblings, either in the string page or (
in other manuals ) in a dedicated section.

Francisco Olarte.




Re: [GENERAL] Custom shuffle function stopped working in 9.6

2017-02-12 Thread Francisco Olarte
On Sat, Feb 11, 2017 at 5:37 PM, Alexander Farber
<alexander.far...@gmail.com> wrote:
...
> after switching to 9.6.2 from 9.5.3 the following custom function has
> stopped working:
> CREATE OR REPLACE FUNCTION words_shuffle(in_array varchar[])
> RETURNS varchar[] AS
> Any suggestions for a better shuffling function please?

I've seen several suggestions and hints, but it seems no one suggested
the classical shuffling algorithm, even though one of the proposed
solutions is not guaranteed to terminate.

An easy way to shuffle is to swap every element with a random one taken
from its position to the start ( or end ) of the array ( NOT a random
one anywhere in the array: that gives you N^N equally likely outcomes
for an N-element array, and since N! does not evenly divide N^N, some
of the N! permutations come up more often than others ( see at the end
) ) ( of course even my version is not going to be perfectly uniform
given random() is not perfect, but it will be a bit better ).

Not having access to a server I've just tried this on 9.3 on sqlfiddle:

CREATE OR REPLACE FUNCTION words_shuffle(in_array varchar[])
RETURNS varchar[] AS
$$
declare
  a   varchar[] := in_array;
  n   integer   := array_length(a, 1);
  tmp varchar;
  r   integer;
begin
  for i in reverse n..2 loop
    r := floor(random() * i) + 1;
    tmp := a[i]; a[i] := a[r]; a[r] := tmp;
  end loop;
  return a;
end
$$
LANGUAGE plpgsql VOLATILE;

As you can see I walk from the end, swapping each element with one
towards the start ( this way I pick r in the range 1..i, instead of
i..n, which would be a little harder to debug ). I stop at 2 because
element 1 can only be swapped with itself. I've marked it VOLATILE as
it returns different things each time you call it. My tests show it
working, but it may have some problems with the type conversions, as
I'm not used to writing this kind of code in plpgsql; still, you get
the idea.

Francisco Olarte.

P.S.:
-- shufflings of three elements, swapping with any element vs. swapping only from its position to the end:


Swapping with any element in the array
0,0,0: abc =>swap(0,0)=> abc =>swap(1,0)=> bac =>swap(2,0)=> cab
0,0,1: abc =>swap(0,0)=> abc =>swap(1,0)=> bac =>swap(2,1)=> bca
0,0,2: abc =>swap(0,0)=> abc =>swap(1,0)=> bac =>swap(2,2)=> bac
0,1,0: abc =>swap(0,0)=> abc =>swap(1,1)=> abc =>swap(2,0)=> cba
0,1,1: abc =>swap(0,0)=> abc =>swap(1,1)=> abc =>swap(2,1)=> acb
0,1,2: abc =>swap(0,0)=> abc =>swap(1,1)=> abc =>swap(2,2)=> abc
0,2,0: abc =>swap(0,0)=> abc =>swap(1,2)=> acb =>swap(2,0)=> bca
0,2,1: abc =>swap(0,0)=> abc =>swap(1,2)=> acb =>swap(2,1)=> abc
0,2,2: abc =>swap(0,0)=> abc =>swap(1,2)=> acb =>swap(2,2)=> acb
1,0,0: abc =>swap(0,1)=> bac =>swap(1,0)=> abc =>swap(2,0)=> cba
1,0,1: abc =>swap(0,1)=> bac =>swap(1,0)=> abc =>swap(2,1)=> acb
1,0,2: abc =>swap(0,1)=> bac =>swap(1,0)=> abc =>swap(2,2)=> abc
1,1,0: abc =>swap(0,1)=> bac =>swap(1,1)=> bac =>swap(2,0)=> cab
1,1,1: abc =>swap(0,1)=> bac =>swap(1,1)=> bac =>swap(2,1)=> bca
1,1,2: abc =>swap(0,1)=> bac =>swap(1,1)=> bac =>swap(2,2)=> bac
1,2,0: abc =>swap(0,1)=> bac =>swap(1,2)=> bca =>swap(2,0)=> acb
1,2,1: abc =>swap(0,1)=> bac =>swap(1,2)=> bca =>swap(2,1)=> bac
1,2,2: abc =>swap(0,1)=> bac =>swap(1,2)=> bca =>swap(2,2)=> bca
2,0,0: abc =>swap(0,2)=> cba =>swap(1,0)=> bca =>swap(2,0)=> acb
2,0,1: abc =>swap(0,2)=> cba =>swap(1,0)=> bca =>swap(2,1)=> bac
2,0,2: abc =>swap(0,2)=> cba =>swap(1,0)=> bca =>swap(2,2)=> bca
2,1,0: abc =>swap(0,2)=> cba =>swap(1,1)=> cba =>swap(2,0)=> abc
2,1,1: abc =>swap(0,2)=> cba =>swap(1,1)=> cba =>swap(2,1)=> cab
2,1,2: abc =>swap(0,2)=> cba =>swap(1,1)=> cba =>swap(2,2)=> cba
2,2,0: abc =>swap(0,2)=> cba =>swap(1,2)=> cab =>swap(2,0)=> bac
2,2,1: abc =>swap(0,2)=> cba =>swap(1,2)=> cab =>swap(2,1)=> cba
2,2,2: abc =>swap(0,2)=> cba =>swap(1,2)=> cab =>swap(2,2)=> cab
F(abc) = 4
F(acb) = 5
F(bac) = 5
F(bca) = 5
F(cab) = 4
F(cba) = 4
Swapping from its own position in the array to the end ( last can be
omitted, of course )
0,1,2: abc =>swap(0,0)=> abc =>swap(1,1)=> abc =>swap(2,2)=> abc
0,2,2: abc =>swap(0,0)=> abc =>swap(1,2)=> acb =>swap(2,2)=> acb
1,1,2: abc =>swap(0,1)=> bac =>swap(1,1)=> bac =>swap(2,2)=> bac
1,2,2: abc =>swap(0,1)=> bac =>swap(1,2)=> bca =>swap(2,2)=> bca
2,1,2: abc =>swap(0,2)=> cba =>swap(1,1)=> cba =>swap(2,2)=> cba
2,2,2: abc =>swap(0,2)=> cba =>swap(1,2)=> cab =>swap(2,2)=> cab
F(abc) = 1
F(acb) = 1
F(bac) = 1
F(bca) = 1
F(cab) = 1
F(cba) = 1




Re: [GENERAL] Special index for "like"-based query

2016-12-30 Thread Francisco Olarte
Job:

On Fri, Dec 30, 2016 at 1:01 PM, Alban Hertroys <haram...@gmail.com> wrote:
>> On 30 Dec 2016, at 11:42, Job <j...@colliniconsulting.it> wrote:
...
>> The index applied on the timestamp field is a btree("timestamp")
...
>> select domain, sum(accessi) as c_count from TABLE where action='1' AND 
>> profile IN ('PROFILE_CODE') AND timestamp::date  BETWEEN '2016-12-27' AND 
>> '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP BY 
>> domain ORDER BY c_count DESC  LIMIT 101
...
>> Last question: the table is partitioned. I need to manually create index for 
>> every sub-tables or there is a way to create on every sub-tables once?

I think Alban missed this ( or I missed his response ). Yes, you need
to create the indexes for the partitions. When in doubt, create a
partition and \d+ it.

IIRC you can do 'create table partition_name ( LIKE master INCLUDING
indexes ) INHERITS (master)', but you'll have to test. Anyway, this is
normally not too useful as the master table is normally indexless and
kept empty. I normally script the partition creation, and I would
recommend doing that too, something like the sketch below.
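
Something along these lines is what I script ( untested sketch; the
table names, the column name and the range are made up, adapt them to
your master and your timestamp column ):

create table master_201701
  ( like master including indexes including defaults )
  inherits ( master );
alter table master_201701
  add constraint master_201701_ts_chk
  check ( "timestamp" >= '2017-01-01' and "timestamp" < '2017-02-01' );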


Now onto the BETWEEN PROBLEM:

> It's usually more efficient to cast the constants you're comparing to, than 
> to cast a field value for each record in the set. The exception to that is 
> when you have an index on the casted field.
> In your case, since you're casting to date and time separately, and whole 
> days even, it's probably more efficient to combine that into:
> … AND timestamp BETWEEN '2016-12-27 00:00:00'::timestamp with time zone AND 
> '2016-12-30 23:59:59'::timestamp with time zone ...

Alban is correct here, in both counts.

- DO NOT CAST THE COLUMNS, cast the constants ( unless you are
building a specialized index; you can build an index on cast(timestamp
as date), and it would be useful if you did a lot of queries ONLY ON
DATES ).

- DO NOT USE CLOSED INTERVALS for real number-like columns ( remember
I told you timestamps are a point in the time line, so real-like ).
The main problem stems from the fact that you cannot cover the real
line with non-overlapping CLOSED intervals; BETWEEN uses closed
intervals and subtle problems permeate from this fact. Math is a harsh
mistress.

( Even when working with integer-like numbers half-open intervals are
normally the best way to go in the not so short term, but between
seems so nice and natural and reads so well that even I use it where I
should not )

> But even then, you're excluding items that fall in the second between the end 
> date and the next day. The new range types are useful there, for example:
>
> … AND timestamp <@ '[2016-12-27 00:00:00, 2016-12-31 00:00:00)'::tsrange

And this is how it is solved with those new-fangled interval thingies
( I've been keying (ts>=xxx and ts <yyy), parens included, for so long
that I never remember those, but they are nearly the same ). Probably
the optimizer splits it anyway.

> The above isn't entirely correct, as tsrange uses timestamp without time 
> zone, but you get the gist.

Or use >=, < those work.

> However, if those time ranges can have other values than '[00:00. 23:59]', 
> then you probably need 2 indexes on that timestamp column; one cast to date 
> and one to time. Otherwise, you end up creating timestamp range filters for 
> each day in the range in the query (which could still be the better approach).

Even if they have other values, a single index on the timestamp column
is the way to go if you only select single intervals. I mean, Xmas
morning ( date between 25 and 25 and time between 8:00 and 12:59 ) can
easily be selected by the interval [20161225T08, 20161225T13), but all
the mornings in December can not ( although a query with
ts>='20161201' and ts<'20170101' and ts::time >='08:00' and
ts::time<'13:00' should work quite well, the first two conditions
guide to an index scan and the rest is done with a filter ).
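
Written out as queries ( a sketch only; the table name t and the
column name ts are placeholders for yours ):

-- Xmas morning: one contiguous range, a plain btree on ts is enough
select * from t
 where ts >= '2016-12-25 08:00' and ts < '2016-12-25 13:00';

-- all December mornings: a range for the index scan plus a time filter
select * from t
 where ts >= '2016-12-01' and ts < '2017-01-01'
   and ts::time >= '08:00' and ts::time < '13:00';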

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Special index for "like"-based query

2016-12-30 Thread Francisco Olarte
On Fri, Dec 30, 2016 at 11:00 AM, Job <j...@colliniconsulting.it> wrote:
> I tried to create a GIST/GIN index on a timestamp without time zone field
> but it does not work.
> Are there alternatives index types or timezone could speed query up?

Remember a timestamp is just a real number ( a point on the time line
) with some fancy formatting for I/O ( or you will suffer ). This
means when you have a ts column and want to query for a date it is
usually better to do [ts>='2016-12-29' and ts<'2016-12-30'] than doing
[cast(ts as date) = '2016-12-29'] ( similar to how a real number is
better queried as [r>=1.0 and r<2.0] than [int(r)=1] ). Normally you
get good results with btree indexes.
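
As a sketch ( table and column names are made up ):

create index t_ts_idx on t ( ts );  -- a plain btree
-- half-open range, can use the index directly:
select * from t where ts >= '2016-12-29' and ts < '2016-12-30';
-- the cast form would need its own expression index to be indexable:
-- create index t_ts_date_idx on t ( cast(ts as date) );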

And, basically, if you need help with some queries you could try
posting them whole, even redacted, along with the table defs; this way
people can see the problem and not invent one based on a partial
description. I do not see anything in common between a 'like-based
query' and timestamp columns.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres pg_bulkload c filter function in c programming

2016-12-29 Thread Francisco Olarte
On Thu, Dec 29, 2016 at 8:41 PM, rajmhn <rajmhn@gmail.com> wrote:
> Thanks Francis.That seems to be a good solution.

Yep, but not for your problem as ...

>
> Thought to use pg_bulkload, a third party library instead of copy, where
> reject handling can be done in efficient way.

Mine was just an idea to do the part of the load you described,
assuming pg_bulkload usage was optional. Since it is not, it will not
work. MAYBE you can use the technique to preprocess the files for
pg_bulkload ( if possible this is nice, as the good thing about
preprocessing them is you repeat until you get them right, no
DB-touchy ).

> Transformation(FILTER)
> functions can be implemented with any languages in pg_bulkload before it was
> loaded to table. SQL, C, PLs are ok, but you should write functions as fast
> as possible because they are called many times.



> In this case, function should be written in Perl and called inside the
> Postgressql function. Do you think that will work it out? But pg_bulkload is
> preferring C function over SQL function for performance.

I'm not familiar with pg_bulkload usage. I've read about it but all my
loading problems have been solved better by using copy ( especially
factoring in total time, I already know how to use copy and a couple
dozen languages in which to write filters to preclean data for copy.
In the time I learn enough of pg_bulkload I can filter and load a lot
of data ).

Regarding C vs perl, it seems pg_bulkload does server side processing.
In the server the function calling overhead is HUGE, especially when
transitioning between different languages. IMO the time spent doing
the data processing in perl would be negligible compared with the time
to pass the data around to perl. C will be faster because the calling
barrier is smaller inside the server.

Just for data processing of things like yours I've normally found
filters like the one I described can easily saturate an SSD array, and
the difference in time for processing is dwarfed by the difference in
time for developing the filter. In fact, in any modern OS with
write-through and readahead disk management the normal difference
between filtering in perl or C is: perl may use 10% of 1 core, C 1%,
the perl filter is developed in 15 minutes, the C one in an hour, and
the perl filter takes some extra milliseconds to start. AND, if you
are not familiar with processing data in C you can easily code a
slower solution than in perl ( as perl was designed for this ).


> I will try this option as you suggested.

Just remember my option is not using pg_bulkload with perl stored
procedures. I cannot recommend anything if you use pg_bulkload.

I suggested using copy and perl to preclean the data. It just seemed
to me from the description of your problem that you were using too
complex a tool. Now that you are introducing new terms, like reject
handling, I'll step out until I can make a suggestion ( don't bother
to define it for me, it seems a bulkload-related term and I'm not able
to study that tool ).


FrancisCO Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres pg_bulkload c filter function in c programming

2016-12-29 Thread Francisco Olarte
Hi:

On Thu, Dec 29, 2016 at 3:01 PM, Jan de Visser <j...@de-visser.net> wrote:
> On Thursday, December 29, 2016 5:10:08 AM EST rajmhn wrote:
...
>> I'm new to C. Gone through this documentation. Not clear, how to start.
...
> It seems to me it would be much easier to load the data into a temporary
> table, and from there transform into the form you need it to be. If you're not
> experienced in C (and probably even if you are) you can import *a lot* of data
> in the time it would take you to develop that custom filter.
> Besides, your requirements don't appear to be that difficult.

For his type of requirements I would recommend going the perl ( any
similar language will do, but after using it for 25 years I find it is
the king for that ) + pipe route. I mean, write a script which spits
out "copy xxx from stdin\n" plus the transformed rows, and pipe it
through psql.

Total time is difficult to beat, as debugging is very easy: write the
filtering function using <>, chomp, split, s///, join and print, and
debug it by feeding it some lines with head.

Untested code, from what I remember of the specs, could be something like:

print "Copy my_table from stdin;\n";
my $e = "The magic col-e default value";
while(<>) {
  chomp; # get rid of OPTIONAl line terminator
  # Get the columns and do the c-d column swap
  my ($a,$b,$d,$c)=split /\|/, $_;
  # Add dashed to the date in column b:
  $b =~ s/^(\d\d\d\d)(\d\d)(\d\d)$/$1-$2-$3/;
  # zap not numerics a to 0:
  ($a=~/^\d+$/) or ($a = 0);
  # And send it with the default separators ( scaping left as an
exercise to the reader, ask if needed, I've wrtten and tested it
several times ).
  print join("\t", $a,$b,$c,$d,$e),"\n";
}

Test it with "head the-datafile | perl the_script.pl |
my_favourite_pager" until correct; the beauty of this approach is you
do not touch the db while debugging, and you feed it to psql when
done. In my experience the perl script overhead is unnoticeable on any
2k+ machine ( and perl was specifically designed to be good at this
kind of thing ).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-28 Thread Francisco Olarte
On Wed, Dec 28, 2016 at 5:53 PM, Jan de Visser <j...@de-visser.net> wrote:
> .but the term "impedance mismatch"
> is at least 25 year old;

Much older, I was told it in class at least 32 years ago.

> as far as I know it was coined

_Borrowed_ from electrical engineering / communication techs.

It is used to highlight how signals 'bounce' at the points of a
transmission path where impedances do not match. It extrapolates the
fact that if you have a battery with an internal resistance R the way
to extract the maximum energy on a load is for it to match the
impedance, be R too. Higher load impedance and the fraction of energy
in the load goes up, the total down. Lower load impedance and the
fraction in the load goes down, the total up. In either case absolute
power in the load goes down. Match the impedance and the energy in the
load is the maximum ( and equal to the internal loss in the battery ).

The term has been used in radio texts since the dawn of ( radio ) times.

It's used a lot as a similar problem appears when mixing two different
technologies: each time you cross the barrier you lose something, or
hit a problem.

> And despite the smart people in academia warning us about that mismatch in the
> early 90s, we bravely soldiered (I'm taking full blame myself here) on and
> 10-15 years later came up with abominations like Hibernate...
> History lesson over, carry on...

I think that goes together with "everyone can be a programmer" and
"every Java ( a language with several apparent concessions made to
people who did not even understand objects, like String.format, and
targeted to enable "everyone" to do OO ) coder can do databases".

Well, rant mode off. Today is "dia de los Inocentes", the Spanish
version of April Fools. I was tempted to write something about
different impedances in the copper tracks used for DB data traffic
when entering the CPU silicon interconnects via golden cables.


Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Disabling inheritance with query.

2016-12-22 Thread Francisco Olarte
Edmundo:

On Wed, Dec 21, 2016 at 11:36 PM, Edmundo Robles <edmu...@sw-argos.com> wrote:
> i need  disable  inheritance  from many tables in a query like
> "delete from pg_inherits where inhparent=20473"  instead alter table ...
> but  is safe?   which is the risk for  database if  i  delete it?

Dangers of touching the catalog directly have already been pointed out
by TL, along with the question of why normal ALTER TABLE isn't ok.

If it is because there are a lot of child tables, I would like to
point out that a simple script ( if you are fluent in any scripting
language, or even in SQL ) can be used to automatically generate a
bunch of alter table commands. Even a simple text editor will do (
turn your query above into something generating a bunch of table
names, edit it ). Or just try something like ( beware, untested )

with childs as (select relname from pg_class, pg_inherits where
pg_class.oid=inhrelid and inhparent=20473)
SELECT 'ALTER TABLE ' || relname || ' rest of alter table command;'
from childs ;

And feed the result back to the server using your favorite tool (
quoting maybe needed, schema names may be needed, YMMV ).
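
A slightly fuller sketch of the same idea, with quoting and schemas
handled ( the parent name in the generated command is made up, you
know yours; in psql 9.6+ you can run the output directly with \gexec ):

select format('ALTER TABLE %I.%I NO INHERIT parent_table;',
              n.nspname, c.relname)
  from pg_class c
  join pg_namespace n on n.oid = c.relnamespace
  join pg_inherits i on i.inhrelid = c.oid
 where i.inhparent = 20473;  -- your parent oid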

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_restore to a port where nobody is listening?

2016-12-21 Thread Francisco Olarte
On Wed, Dec 21, 2016 at 6:22 PM, Daniel Westermann
<daniel.westerm...@dbi-services.com> wrote:
> I have a PostgreSQL 9.5.4 and a PostgreSQL 9.6.1 instance installed on the
> same host.

...
> What do I miss? I can give any port to pg_restore and it just seems to be
> fine.

Hmm, are you by chance using debian/ubuntu/any derivative? Maybe
pg_restore is not directly executed. In Ubuntu I have this:

$ type -path pg_restore
/usr/bin/pg_restore
$ file /usr/bin/pg_restore
/usr/bin/pg_restore: symbolic link to ../share/postgresql-common/pg_wrapper
$ file /usr/share/postgresql-common/pg_wrapper
/usr/share/postgresql-common/pg_wrapper: Perl script text executable

And that pg_wrapper thingie has the habit of completely fscking my
connection options / service files ( even though I do not have a
server installed, I only install the client programs to connect to the
remote servers ).

You could check with type/file whether you have something similar.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_dump and quoted identifiers

2016-12-15 Thread Francisco Olarte
On Thu, Dec 15, 2016 at 4:20 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
...
> Hmm.  It might shed some light if you put "echo" in front of that
> to see what gets printed:
>
> $ echo pg_dump -d postgres -t "\"Statuses\""
> pg_dump -d postgres -t "Statuses"

Also, when having strange issues, I've found the combo

echo <the string> | od -tx1 -tc

very useful; it helps rule out the potential fancy quotes pointed out previously.


Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] huge table occupation after updates

2016-12-10 Thread Francisco Olarte
Tom:

On Sat, Dec 10, 2016 at 6:01 PM, Tom DalPozzo <t.dalpo...@gmail.com> wrote:
> As for crash proof, I meant that once my client app is told that her update
> request was committed, it mustn't get lost (hdd failure apart of course).
> And I can't wait to flush the cache before telling to the app :"committed".
> I can replicate also the cache on the standby PC of course.

You are making inconsistent requests. When the server tells your app
it's committed, it has flushed the transaction log cache. If your
assertion above is real, you cannot wait for commit, so your
requirements are impossible to satisfy ( of course, you could run with
scissors, but that will lose data even without hdd failure ).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] huge table occupation after updates

2016-12-10 Thread Francisco Olarte
A couple of things first.

1.- This list encourages inline replying, editing the text, and frowns
upon top posting.

2.- Your HTML formatting with such a small size makes it harder for me
( and I can assume some others ) to properly read your messages.

If you want to discourage people replying to you, keep doing the two above.

On Sat, Dec 10, 2016 at 3:25 PM, Tom DalPozzo <t.dalpo...@gmail.com> wrote:
> you're right, VACUUM FULL  recovered the space, completely.

Well, it always does. ;-)

> So, at this point I'm worried about my needs.
> I cannot issue vacuum full as I read it locks the table.

Well, first hint of your needs. Bear in mind vacuum fulls can be very
fast on small tables ( i.e., if you have the 1.5Mb table, do 2000
updates and then a vacuum full, that will be very fast; time it ).

> In my DB, I (would) need to have a table with one bigint id field+ 10 bytea
> fields, 100 bytes long each (more or less, not fixed).
> 5/1 rows maximum, but let's say 5000.
> As traffic I can suppose 1 updates per row per day (spread over groups
> of hours; each update involving two of those fields, randomly.
> Also rows are chosen randomly (in my test I used a block of 2000 just to try
> one possibility).
> So, it's a total of 50 millions updates per day, hence (50millions * 100
> bytes *2 fields updated) 10Gbytes net per day.

Not at all. That's the volume of updated data, you must multiply by
the ROW size, not just the changed size, in your case 50M * 1100 ( to
have some wiggle room ), 55Gbytes.

But this is the UPPER BOUND you asked for. Not the real one.

> I'm afraid it's not possible, according to my results.

It certainly is. You can set a very aggressive autovacuum schedule for
the table, or even better, you may vacuum AFTER each hourly update.
This will mark dead tuples for reuse. It will not be as fast, but it
can certainly be fast enough.
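
For example ( a sketch only; the table name and the numbers are made
up, tune them to your update rate ):

-- more aggressive autovacuum just for this table:
alter table hist set ( autovacuum_vacuum_scale_factor = 0.01,
                       autovacuum_vacuum_threshold = 1000 );
-- or an explicit, plain (non-full) vacuum right after each batch:
vacuum hist;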

And, if you only update once an hour, you may try other tricks ( like
copy to a temp table, truncate the original and insert the temp table
in the original, although I fear this will lock the table too, but it
will be a very short time, your readers may well tolerate it. )

Yours seems a special app with special needs; try a few, measure, it
is certainly possible.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] huge table occupation after updates

2016-12-10 Thread Francisco Olarte
Hi Tom

On Sat, Dec 10, 2016 at 1:15 PM, Tom DalPozzo <t.dalpo...@gmail.com> wrote:
...
> Reported table size is 1.5MB. OK.
That's 150 bytes per row, pretty normal.
> Now, for 1000 times,  I update  2000 different rows each time, changing d0
> filed keeping the same length, and at the end of all,  I issued VACUUM.

And probably autovacuum or something similar kicked in meanwhile. 2M
updates is 200 updates per row, that's pretty heavy traffic, many
tables do not get that in their whole lifetime.

> Now table size is 29MB.
> Why so big? What is an upper bound to estimate a table occupation on disk?

Strictly, you could probably calculate an upper bound as row
size * number of insertions, given an update approximates an insertion
plus a deletion. Given the original insertion used 1.5Mb and you
repeated that about 200 times, I would estimate 201*1.5=301.5Mb as an
upper bound, but I doubt that's of any use.

Those many updates probably left your table badly fragmented, with few
rows per page. On normal usage you do not need to worry, as periodic
vacuum would mark the space for reuse and the table would not grow
that big. But issuing 1k updates on 20% of the table is hardly normal
usage; if you need this kind of usage maybe you should rethink your
strategies.

Vacuum full will probably pack the table and ''recover'' the space, it
should be fast with just 29Mb on disk. Not knowing your intended usage
nothing can be recommended, but I've had some usage patterns where a
heavy update plus vacuum full was successfully used.
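
Something like this ( table name made up ) lets you time it and check
the effect:

select pg_size_pretty(pg_total_relation_size('my_table'));  -- before
vacuum full my_table;  -- takes an exclusive lock, but quick at ~29Mb
select pg_size_pretty(pg_total_relation_size('my_table'));  -- after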

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] btree gist indices, null and open-ended tsranges

2016-12-01 Thread Francisco Olarte
Hi Chris:

On Thu, Dec 1, 2016 at 12:56 PM, Chris Withers <ch...@simplistix.co.uk> wrote:
> So, first observation: if I make room nullable, the exclude constraint does
> not apply for rows that have a room of null. I guess that's to be expected,
> right?

I would expect it, given:

n=> select null=null, null<>null, not (null=null);
 ?column? | ?column? | ?column?
----------+----------+----------
          |          |
(1 row)

Those are nulls, BTW:

n=> select (null=null) is null, (null<>null) is null, (not (null=null)) is null;
 ?column? | ?column? | ?column?
----------+----------+----------
 t        | t        | t
(1 row)

I.e., the same happens with a nullable unique column: you can have one
of each non-null value and as many nulls as you want.

SQL null is a strange beast.
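
A quick sketch of that ( names made up ):

create table rooms ( room int unique );
insert into rooms values (1), (null), (null);  -- ok, nulls never collide
insert into rooms values (1);                  -- fails, duplicate key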


Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Overwrite pg_catalog?

2016-12-01 Thread Francisco Olarte
Juliano:

On Thu, Dec 1, 2016 at 12:16 PM, Juliano <jpli...@protonmail.com> wrote:
> I tried to restore pg_catalog to my new database, but the existing
> pg_catalog can't be overwritten or dropped, and postgres auto creates
> pg_catalog when I create a new DB.

This is because, in general, pg_catalog is maintained as a side effect
of DDL statements; restoring it is not going to do what you think.

I.e., if you create a table, a row goes into pg_class, but if you
somehow manage to insert into pg_class, a table is not properly
created ( more things need to be done ).

Copying pg_catalog from one db to another is like trying to copy the
root dir and FAT from one floppy ( how old fashioned ) to another; it
cannot be done with normal tools and probably won't do what you think
it does.


> So, there is a way to restore the pg_catalog to a new database?

Probably not, but this has the faint smell of http://xyproblem.info/ ;
what are you trying to achieve by doing that?

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Re: [GENERAL] FTS query, statistics and planner estimations…

2016-11-09 Thread Francisco Olarte
On Wed, Nov 9, 2016 at 11:19 AM, Pierre Ducroquet
<pierre.ducroq...@people-doc.com> wrote:
> Indeed the words in the query are correlated, but I do hope that the FTS
> indexing is able to cope with that.

If the query returns correct results in reasonable time it can. OTOH
the planner, and the statistics system, is another beast. Correlation
info in FTS is HUGE, and the planner is supposed to work with a
smallish summary of the index.

> Otherwise it makes it far less usable than
> what one would expect since real world queries will often contain sentences or
> related words.

Well, I concur it would be great to have it, but having written FTS
engines I suspect it would be difficult to have it AND maintain it. I
have built an FTS system, where the index was a list of
(stemmed-word, document, position) tuples, which was then compressed.
The information for word-word correlation would be huge, as its
cardinality could grow with n^2, especially if you have to keep it in
an updatable format. And it would not help you for the three-, four-,
etc. word cases. And even then, the optimizer may be spending a lot of
time reading and processing it, as it would not fit easily in the
cache.

> Also, PostgreSQL 9.6 introduced phrase search in FTS, and I
> don't see how that would work without a working multi-words query.

Queries work, it's just they are not as fast as you want/expect them
to be. Phrase search is normally done by locating documents with all
the words and then filtering, either with just the index if it
includes word positions or by reading the docs. In general, in FTS,
you need to use selective terms for fast queries.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Re: [GENERAL] FTS query, statistics and planner estimations…

2016-11-09 Thread Francisco Olarte
Pierre:

On Wed, Nov 9, 2016 at 10:22 AM, Pierre Ducroquet
<pierre.ducroq...@people-doc.com> wrote:
> The query does a few joins «after» running a FTS query on a main table.
> The FTS query returns a few thousand rows, but the estimations are wrong,
> leading the optimizer to terrible plans compared to what should happen, and
> thus creates a far higher execution time.

> but the issue remain the same. The table contains about 295,000 documents, and

>  Request  | Estimated rows | Real rows
> --++---
> 'word1'   | 38050  | 37500
> 'word1 word2' | 4680   | 32000
> 'word1 word2 word3'   | 270| 12300
> 'word1 word2 word3 word4' | 10 | 9930
> 'word1 word2 word3 word4 word5'   | 1  | 9930
>
> You can see that with more words in query, the estimation falls far behind
> reality.

I'm not really familiar with FTS but, doing a few divisions of
estimations and rows, it seems the planner estimates the words as
uncorrelated, and your real rows clearly indicate some of them are
strongly correlated ( like w1/w2 and w4/w5, and partially w3 with
w4/w5 ) and very common.
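
Making the division explicit ( numbers from your table, assuming the
planner treats the words as independent ): word1 alone is estimated at
38050 / 295000, about 0.129 selectivity, so two 'independent' words
would give roughly 0.129 * 0.129 * 295000, some 4900 rows, in the same
ballpark as the 4680 estimate ( the per-word selectivities differ a
bit ), while the real, correlated count is 32000.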

> Is that a known limitation of the FTS indexing ? Am I missing something
> obvious, or a poor configuration ?

Someone more familiar with it is needed for that, but what I've found
several times is that FTS does not mix too well with relational
queries at the optimizer level ( as FTS terms can have very diverse
degrees of correlation, which is very difficult to store in the
statistics a relational optimizer normally uses ).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Turning slave into a master - PostgreSQL 9.2

2016-11-01 Thread Francisco Olarte
Hi:

On Tue, Nov 1, 2016 at 3:01 AM, Patrick B <patrickbake...@gmail.com> wrote:
> If I change recovery.conf:
> recovery_target_time = '2016-10-30 02:24:40'
> I get error:
> FATAL:  requested recovery stop point is before consistent recovery point

It looks clear. When you take a base backup from a running system some
recovery is needed to make it consistent. With the target time you can
limit how much is done, but there is a minimum. Think of it: if you
stated '1970-01-01' it would be clearly impossible; your date is
bigger, but still impossible, so try raising it a bit.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libpq heartbeat

2016-10-27 Thread Francisco Olarte
Merlin:

On Thu, Oct 27, 2016 at 7:29 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Thu, Oct 27, 2016 at 11:18 AM, Francisco Olarte
> <fola...@peoplecall.com> wrote:
>> It is, but handling them is not easy, and you have to deal with things
>> like DoS which are not trivial on the server ( as it is a heavy
>> service ). It can be done, and sometimes needs to be done, but is not
>> a thing to take over lightly.
>>> This could be over ssh tunnel for example.
>> In which case it is NOT exposed to the internet. What are you trying to say?
>
> what?   ssh can most certainly convey over the internet.   I said ssh
> *tunnel*; not ssh.   With tunneling the ssh endpoint is the client
> application.   When I built a libpq based intenet facing application
> we used a modified pgbouncer to whitelist the parameterized query
> strings and to force the auth.  We had zero issues.

I'm not a native English speaker, so I have some problems
understanding the finer details.

I said libpq services/protocols are tricky to put on the internet.

You replied, among other things, that it could be over an ssh tunnel (
I use ssh tunnels continuously, to the point I routinely open/close
them on live connections via escape, so I know what they are ).

Then I said in that case it is NOT exposed to the internet, trying to
mean libpq/postgres is NOT exposed to the internet. They are exposed
to the SSH tunnel endpoint. Of course, the payload still goes through
the internet, but they are not directly exposed.

As an example, leaving potential SSL and encryption usage in postgres
aside: if I expose postgres on the internet you can mount a DoS attack
against me, and postmaster will have to defend against it. IIRC
postmaster does not fork until the client is authenticated, but even
then I suspect its accepting code is not designed with the same care
to deal with hostile connection attempts as the one in ssh, or even in
a web server. So you may be able to DoS me with far fewer resources
than you would need to DoS the ssh server.

And, also, if you are exposing the postmaster directly I think client
and server certificates are a must, aside from passwords. But I doubt
someone who does not know about the keepalive stuff, like the OP, uses
more than server certs and passwords, although I hope I'm proven wrong
by him. In general I use the tunneling approach for postgres, as my
security guys continuously monitor and patch the ssh servers. But, if
he were dealing with that kind of stuff and had told me, I would have
pointed him in the direction of ssh[d]_config to enable ssh keepalives
for his firewall problems and not said a word about the non-existent
perils of exposing the postmaster.
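
By ssh keepalives there I mean something like ( OpenSSH option names,
the values are just examples ):

# client side, ~/.ssh/config ( or -o options on the ssh command line ):
Host db-tunnel
    ServerAliveInterval 60
    ServerAliveCountMax 3
# or server side, sshd_config:
ClientAliveInterval 60
ClientAliveCountMax 3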

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libpq heartbeat

2016-10-27 Thread Francisco Olarte
Tom:

On Thu, Oct 27, 2016 at 6:32 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Francisco Olarte <fola...@peoplecall.com> writes:
>> Isn't this a server setting, and so going to affect every connection,
> Yes,

Ok, just checking.

> but there are equivalent libpq parameters for firing heartbeat
> pings from the client side.  Those are per-connection.
> https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS

That's what I told the OP ( with nearly the same link ).

Thanks.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libpq heartbeat

2016-10-27 Thread Francisco Olarte
Merlin:

On Thu, Oct 27, 2016 at 6:10 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Thu, Oct 27, 2016 at 10:01 AM, Francisco Olarte
> <fola...@peoplecall.com> wrote:
>> And I'd like to point libpq sessions does not sound to be the best
>> kind of traffic across a firewall, not a good service / protocol to
>> expose.

> meh -- it's perfectly fine to expose postgres to the internet as long
> as you've handled the security concerns.

It is, but handling them is not easy, and you have to deal with things
like DoS, which are not trivial on the server ( as it is a heavy
service ). It can be done, and sometimes needs to be done, but it is
not a thing to take on lightly.

> This could be over ssh tunnel for example.

In which case it is NOT exposed to the internet. What are you trying to say?

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libpq heartbeat

2016-10-27 Thread Francisco Olarte
On Thu, Oct 27, 2016 at 5:01 PM, hubert depesz lubaczewski
<dep...@depesz.com> wrote:
> On Thu, Oct 27, 2016 at 04:43:55PM +0200, Marcin Giedz wrote:
...
>> I'm wondering if there is any natural implementation of heartbeats in
>> libpq library?
...

> check tcp_keepalives_* in postgresql.conf

Isn't this a server setting, and so going to affect every connection,
be it from the (affected) libpq connections or from other sources (
like jdbc, although he may want keepalives for those too )?

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libpq heartbeat

2016-10-27 Thread Francisco Olarte
Hi Marcin:

On Thu, Oct 27, 2016 at 4:43 PM, Marcin Giedz <marcin.gi...@arise.pl> wrote:
> I'm wondering if there is any natural implementation of heartbeats in libpq
> library? We've been facing specially in firewall env occasionally session
> drops between client and server. Extending session timeout directly on
> firewalls is not possible (company rules). Perhaps there is such mechanism
> "somewhere" to avoid reimplementation ?

RTFM? Now, seriously. Have you read
https://www.postgresql.org/docs/9.6/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS,
specifically searched for keepalive? Keepalives generate traffic which
normally keeps overzealous firewalls happy; I have used them before
successfully.
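
I.e., something like this in the connection string ( the parameter
names are from that page; host / dbname and the values are just
examples ):

host=db.example.com dbname=mydb keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=3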

And I'd like to point out that libpq sessions do not sound like the
best kind of traffic to pass across a firewall, not a good service /
protocol to expose.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] streaming replication and WAL

2016-10-25 Thread Francisco Olarte
I may be confused but...

On Tue, Oct 25, 2016 at 5:08 PM, t.dalpo...@gmail.com
<t.dalpo...@gmail.com> wrote:
> These servers are configured for Sync streaming replication .
> Let's suppose that the standby stays down for a long time, then it restarts,

Doesn't sync replication plus standby down mean primary will stop
accepting work?

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Strange? BETWEEN behaviour.

2016-10-20 Thread Francisco Olarte
Bjørn:

On Thu, Oct 20, 2016 at 4:13 PM, Bjørn T Johansen <b...@havleik.no> wrote:

> Yes, the field name is actually dato but changed it to the English variant..

I routinely have databases with English column names for nearly
everything except 'fecha', Spanish for date, and 'tipo', Spanish for
type, which sometimes collide, and things like 'tabla' and 'columna' (
those are easy to translate ) to avoid this kind of problem.

Anyway, when having problems try to just cut & paste the code as it
fails, because sometimes the subtle detail is precisely what is 'lost
in translation'. Unless you abuse things like ø or ö, people do not
normally have problems running them ( in Spanish we just have to avoid
tildes in vowels and ñ and are fine ).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Strange? BETWEEN behaviour.

2016-10-20 Thread Francisco Olarte
On Thu, Oct 20, 2016 at 1:51 PM, Bjørn T Johansen <b...@havleik.no> wrote:
> I have the following SQL:
> SELECT * from table WHERE date BETWEEN to_timestamp('20.10.2016 
> 00:00:00','DD.MM. HH24:MI:SS') AND to_timestamp('20.10.2016 
> 23:59:59','DD.MM.
> HH24:MI:SS')
> date is of type timestamp.
> I was expecting to get all the records that had datepart = 20.10.2016 but I 
> am not getting that..
> What am I missing?

As has been pointed out, show your data, show your expected but missing values.

Besides, some general comments.

Is this a real query? Because date is a reserved word ( and gives
problems in many places )... a quick test shows it works in this
context, but using reserved words as column names has bitten me before.

Whenever you are trying to get intervals on a datatype which models a
real number ( like timestamp, which is like a point on the line of
time ) it is better to always use half-open intervals ( because they
can cover the line, unlike closed and open ones ). ( It's not the same
for dates, which model a day, an integer, a countable number ).

This means, instead of your query prefer to use:

SELECT * from table
WHERE date >= to_timestamp('20.10.2016 00:00:00','DD.MM.YYYY HH24:MI:SS')
  AND date <  to_timestamp('21.10.2016 00:00:00','DD.MM.YYYY HH24:MI:SS')

This even lets you write the query for a single day in a very clean way:

SELECT * from table
WHERE date >= '2010-10-20'::date
  AND date <  '2010-10-20'::date + '1 day'::interval

I have to make a lot of queries for ts ( really tstz ) ranges @work
and this helps a lot.

Second piece of advice: test your queries piecewise. If you test your constants:

n=> select to_timestamp('20.10.2016 00:00:00','DD.MM.YYYY
HH24:MI:SS'),to_timestamp('20.10.2016 23:59:59','DD.MM.YYYY
HH24:MI:SS');
      to_timestamp      |      to_timestamp
------------------------+------------------------
 2016-10-20 00:00:00+02 | 2016-10-20 23:59:59+02
(1 row)

You'll see you are building timestamps WITH time zone, not plain
timestamps. I think this is not going to influence your queries, but
better convert explicitly ( as it can bite you on some occasions ).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Strange? BETWEEN behaviour.

2016-10-20 Thread Francisco Olarte
Bottom quoting makes it difficult to reply properly, so I reformatted a bit:

On Thu, Oct 20, 2016 at 2:02 PM, William Ivanski
<william.ivan...@gmail.com> wrote:
> Em qui, 20 de out de 2016 às 09:52, Bjørn T Johansen <b...@havleik.no>
> escreveu:
>> date is of type timestamp.
> select * from table where date = '2016/10/20'::date

I think it is the other way round ( date::date = '2016/10/20' ).

To me it seems yours will do:
date = '2016/10/20'::date::timestamp ( = 2016/10/20 00:00:00 )
( widening conversion )

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] tablesample performance

2016-10-18 Thread Francisco Olarte
Andy:

On Tue, Oct 18, 2016 at 7:17 PM, Andy Colson <a...@squeakycode.net> wrote:
> Ah, yes, you're right, there is a bit of a difference there.
>
> Speed wise:
> 1) select one from ones order by random() limit 1;
>> about 360ms
> 2) select one from ones tablesample bernoulli(1) limit 1 ;
>> about 4ms
> 3) select one from ones tablesample bernoulli(1) order by random() limit 1;
>> about 80ms

Expected. It would be nice if you had provided some table structure / size data.
>
> Using the third option in batch, I'm getting about 15 transactions a second.
>
> Oddly:
> select one from ones tablesample bernoulli(0.25) order by random()
> takes almost 80ms also.

mmm, it depends a lot on your total rows and average rows per page.

> bernoulli(0.25) returns 3k rows
> bernoulli(1) returns 14k rows

This hints at ~1.4M rows (14k / 1%). If your rows are small and you
have more than 400 rows per page I would expect that, as a 0.25%
sample would hit every page.

Tom hinted you at an extension. Also, if you are in a function ( which
can loop ) you can do a little trick: instead of bernoulli(1) use
bernoulli(N/table_size). This way you will select very few rows and
speed up the last phase. Anyway, I fear bernoulli must read all the
table too, to be able to discard randomly, so you may not win anything
( I would compare the query time against a simple 'count(one)' query,
to have a benchmark of how much time the server spends reading the
table ). I would bet for 'about 80 ms'.
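
For the benchmark I mean something like ( using the table / column
names from your mail ):

explain analyze select count(one) from ones;
explain analyze select one from ones tablesample bernoulli(1)
                order by random() limit 1;

If both take about the same time, the sampling itself is reading the
whole table and there is little left to gain.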

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] tablesample performance

2016-10-18 Thread Francisco Olarte
On Tue, Oct 18, 2016 at 5:06 PM, Andy Colson <a...@squeakycode.net> wrote:
> I wanted to report an awesome performance boost using tablesample.
> In my stored function I was getting a random row using:
> select one into x from ones order by random() limit 1;
> When the table was smaller it worked fine, but the performance has slowly
> gotten worse.  This morning I was getting around 8 transactions a second.

Which is not a surprise, as it has to at least read all the rows and
generate a random() for each one and keep track of the minimum.

> I just replaced it with:
> select one into x from ones tablesample bernoulli(1) limit 1;

This should be faster, but to me it seems it does a different thing.
This seems to select each row of the table with probability 1% and
return the first selected, i.e., something similar to

select one into x from ones where random()<0.01 limit 1.

Which has the ( diminishing with table size ) risk of selecting zero
rows and is going to select one of the first 100 or so rows with high
probability, unless I'm missing something.

I say this because docs state it returns a 'randomly chosen' sample,
not a 'randomly ORDERED' one, and the straightforward implementation
of sampling returns rows in the primitive scan order. I suppose it
could be easily tested by selecting bernoulli(100), but I have no
server access now to verify it.

With a big table it seems:

select one into x from ones where random()<0.01 order by random() limit 1
or
select one into x from ones tablesample bernoulli(1) order by random() limit 1;

Is more similar to what you originally did ( and the run time should
possibly be something in between ).


I would recommend you execute the function and verify it does what
you want ( as you say it's fast, I would try selecting several
thousand and eyeballing the result; if it does what I fear, the
grouping should be obvious ).

Maybe you do not mind it, in which case it's ok, but a one minute run
should let you know what exactly you are doing.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] My DB has has 5TB, many operations are very slow (on Google Cloud Compute)

2016-10-12 Thread Francisco Olarte
Melvin:

On Tue, Oct 11, 2016 at 9:50 PM, Melvin Davidson <melvin6...@gmail.com> wrote:
> >Requiring and exclusive table lock does not imply slownes. Just try
> >'lock table x in exclusive mode' on an idle system. Pretty fast.
> Sure on an idle system, you will get a table lock right away, but OP's 
> statements imply a large busy system.
Maybe the OP, but not the PP ( previous poster ).

Had you not merged two replies in one and pruned the context too much,
you could have read yourself ( not too sure, maybe I have my local
copies of mail borked and it was another person ) saying, just before
this:

> FYI, moving between tablespaces requires an exclusive table lock, so it's 
> naturally going to be slow.

English is not my mother tongue, but this seems to imply slowness
being blamed on the table lock; maybe someone more knowledgeable in
the finer details of the English language can explain it to me if that
is not the case.


> And if there are transactions occurring against that table, there is no 
> telling how long it will take.
> Since we do not have enough specific info, I stand by my statement.

I would not expect less. I do not remember where the OP stated a busy
system, but anyway the lock itself is going to execute fast, though
maybe after a long delay, and counting the time from the issuing of
the command to the time it ends is a perfectly reasonable way to do
it.

Anyway, ok, exclusive locks cause the slowness.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] My DB has has 5TB, many operations are very slow (on Google Cloud Compute)

2016-10-11 Thread Francisco Olarte
Rakesh:

On Tue, Oct 11, 2016 at 9:00 PM, Rakesh Kumar
<rakeshkumar...@outlook.com> wrote:
>>Cores do not help, postgres is single-threaded. RAM MAY help, but I
> I hope this is no longer true from 9.6 for those queries where PG can use 
> parallelism.

It does, AFAIK, but for queries, not AFAIK for this kind of data
moving op ( and I doubt it will, as presently you can easily saturate
the channels with a single core for that kind of simple op, and
normally if you want to optimize this kind of op it is better to
target concurrency ( the table can be used while moving ) than pure
speed ).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] My DB has has 5TB, many operations are very slow (on Google Cloud Compute)

2016-10-11 Thread Francisco Olarte
Melvin:

On Tue, Oct 11, 2016 at 8:33 PM, Melvin Davidson <melvin6...@gmail.com> wrote:

> FYI, moving between tablespaces requires an exclusive table lock, so it's 
> naturally going to be slow.

Requiring an exclusive table lock does not imply slowness. Just try
'lock table x in exclusive mode' on an idle system. Pretty fast.

The lock may lengthen the operation due to potential delays.

And moving between tablespaces on a big db may well be a fast ( speed
) but long ( time ) process, lots of things have to be done, but they
can be done rather fast. But even if light is fast, it still needs 550
years to reach Antares. You can have a short slow operation ( like
needing a second to get an indexed tuple ) or a fast long one ( like
needing an hour to move a petabyte of data ).

To judge if it's going slow we will need a reference, like how much
time it takes to copy and sync a big uncached file between the
affected volumes. If the move is, say, 1.5 times slower I wouldn't say
it is that slow ( given copy is optimized for this kind of transfer
and a database not so much ).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] My DB has has 5TB, many operations are very slow (on Google Cloud Compute)

2016-10-11 Thread Francisco Olarte
On Thu, Oct 6, 2016 at 12:36 AM, David A <da...@scalaacademy.com> wrote:
> My DB has has 5TB, it's on Google Cloud Compute, mostly on SSDs, part on
> Standard Persistent Disks.
> Querying is ok, but deleting, moving between tablespaces, dropping tables,
> etc, etc is hugely slow
> (note: I do have a number of indexes)

Are you sure it's slow and not just lengthy? Hosting providers tend to
have and publish limits on their bandwidths and similar things, and
maybe you are just saturating your capacity. If you can at least
compare the time to move a tablespace against the time of copying a
similarly sized file between the same disks, someone may be able to
say something.

> Instance RAM: 60GB
> Instance CPU: 16Cores

Cores do not help, postgres is single-threaded. RAM MAY help, but I
suspect your operations are IO bound. Of course, with the sparseness
of the details, one can not say too much.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transactional-DDL DROP/CREATE TABLE

2016-10-06 Thread Francisco Olarte
Geoff:

On Thu, Oct 6, 2016 at 5:43 PM, Geoff Winkless <pgsqlad...@geoff.dj> wrote:
> Nope. Serializable ignores the DROP, and then freezes on CREATE (and
> then fails when the first transaction COMMITs).

Yep, I tested it too.

> Which is also broken,
> because the transaction should fail if (at COMMIT time) the table
> cannot be CREATEd, but that's no reason to not create a table within a
> temporary namespace and perform actions against that table until the
> COMMIT, at which point the table can either be validated systemwide or
> the transaction rolled back.

Well, that may be a nice new addition to the standard, but I doubt it
would fly. If you want that kind of behaviour you should implement it
app-side; it is not that difficult.


>
>> And drop table if exsits means if it exists when the
>> server executes your command, not on the future ( the server cannot
>> know if it will exist then, your own transaction may recreate it or
>> not. Maybe you know your command sequence is not going to depend on
>> intermediate results, the server does not ).
> Then that effectively makes the IF EXISTS useless, because it might in
> fact exist by the time the transaction is committed.

Lots of people find it useful as it is. It's just that normally people
do not try to interleave conditional drop plus create on interleaved
transactions without an upper level retrying loop and expect it to
magically work as they think it should.

And, following that train of thought, unconditional drop is useless,
because by commit time the table may not exist, and select is useless,
because at commit time rows may not exist or may have other values.


> The point of a DROP ... IF EXISTS should surely be that after the
> command, that table should no longer exist, either because it didn't
> exist or because it has been dropped (indeed, the result of "DROP...IF
> EXISTS" is "DROP TABLE").

That's exactly what it does ( unless your transaction aborts in the command ).

> The idea that this can't be done at
> commit-time because people might use the NOTICE response as some sort
> of branch is (IMO) logically bankrupt: you can quite happily test for
> existence without requiring any sort of atomic DROP, if that's your
> intention.

You are contradicting yourself. First you say after the command it
must not exist. Then you say to do it at commit time. If it is done at
commit time you cannot guarantee it does not exist after the command.
And using the branch for testing is not logically bankrupt; atomic
operations with responses are there for a reason, this is why
processors have 'test and set' and 'compare-exchange' and similar.
This one is similar to a test-and-set: you set existence to false and
test whether it existed before. I can easily test and then set, but it
is not the same as TAS. And the notice is not the reason it is not
done at commit time; the reason is the one you said, action must be
taken when you issue the command, not at a magic convenient time in
the future.


Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transactional-DDL DROP/CREATE TABLE

2016-10-06 Thread Francisco Olarte
Adrian:

On Thu, Oct 6, 2016 at 4:31 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> This is how I can trigger the ERROR:

This is how you can trigger the ISSUE, IMO it is a correct behaviour.

Anyway, your example lacks some important details:
1.- SHOW your isolation level.
2.- SHOW your command interleaving.

Here is an example session where IT WORKS like you pretend, and the
table exists before starting showing those details:

\set PROMPT1 'session1-%`date +%H:%M:%S` [%x]'
session1-17:27:26 []start transaction isolation level serializable;
START TRANSACTION
session1-17:27:35 [*]drop table if exists ddl_test;
DROP TABLE
session1-17:27:44 [*]create table ddl_test(id int);
CREATE TABLE
session1-17:28:03 [*]commit;
COMMIT
session1-17:28:10
-
\set PROMPT1 'session2-%`date +%H:%M:%S` [%x]'
session2-17:27:29 []start transaction isolation level serializable;
START TRANSACTION
session2-17:27:39 [*]drop table if exists ddl_test;
GAP
DROP TABLE
session2-17:28:10 [*]create table ddl_test(id int);
CREATE TABLE
session2-17:28:23 [*]commit;
COMMIT
session2-17:28:28

in the ***GAP*** mark session 2 was blocked, and it unblocked when I
issued commit in session 1. ( note the timestamps of command end are
the ones starting the next line, and except the one I marked they were
nearly instant )

Note how you can follow the command interleaving and the isolation level.

OTOH, as you point out, not having the table shows the issue again:

session1-17:33:56 []start transaction isolation level serializable;
START TRANSACTION
session1-17:33:59 [*]drop table if exists ddl_test;
NOTICE:  table "ddl_test" does not exist, skipping
DROP TABLE
session1-17:34:08 [*]create table ddl_test(id int);
CREATE TABLE
session1-17:34:19 [*]commit;
COMMIT

session2-17:28:28 []start transaction isolation level serializable;
START TRANSACTION
session2-17:34:04 [*]drop table if exists ddl_test;
NOTICE:  table "ddl_test" does not exist, skipping
DROP TABLE
session2-17:34:13 [*]create table ddl_test(id int);
ERROR:  duplicate key value violates unique constraint
"pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(ddl_test, 2200) already exists.
session2-17:34:30 [!]

This time session 2 stopped at the create table and directly aborted
when session 1 committed. Correct, IMO, although unaesthetic
behaviour. I think it is due to drop being a no-op if the table did
not exist, as commands are not postponed ( it must show you the notice
or not before completing ), so you are just issuing two create
commands for the same table.


Your serial postponed execution is a nice desire, but I doubt it is necessary.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transactional-DDL DROP/CREATE TABLE

2016-10-06 Thread Francisco Olarte
Hi Geoff:

On Thu, Oct 6, 2016 at 1:58 PM, Geoff Winkless <pgsqlad...@geoff.dj> wrote:
> But surely Transactional DDL implies that (it should appear that) nothing
> happens until transaction-commit. That means "drop table if exists" should
> drop the table if it exists at commit time, not drop the table if it didn't
> exist when the code was first run.

I'm not sure even transactional DML works that way. Bear in mind you
are supposed to be informed of the result of your commands. I mean,
you issue drop if exists and then you are allowed to issue a different
command depending on the result of the drop, i.e., you may be willing
to recreate the table if it existed and not create it if not, so the
drop must record your intentions and lock the table definition, like a
delete does with data rows.

> If the other transaction hasn't committed, then it should either fail with
> rollback when committed (because it tried to create a table that exists at
> commit time) or drop the new table (because it also has a drop clause).

It depends on the implementation. IIRC with serializable isolation
level you are guaranteed a final result coherent with some serial
order of execution of all the completed transactions, but even there
you are not guaranteed it will find an order of execution for all of
them, some may be aborted. Trying to do what you intend will result in
an extremely deadlock-prone system.

And anyway, what isolation level are you working on? Because it seems
you are using a weaker one than serializable, as I think serializable
should give you more or less what you are expecting ( not at commit
time, but the second drop could probably get stuck until the first
transaction commits ).

And surely Transactional D*L does not imply what you state in all
isolation levels. And drop table if exists means if it exists when the
server executes your command, not in the future ( the server cannot
know if it will exist then, your own transaction may recreate it or
not. Maybe you know your command sequence is not going to depend on
intermediate results, the server does not ).


Francisco Olarte.




Re: [GENERAL] Transactional-DDL DROP/CREATE TABLE

2016-10-06 Thread Francisco Olarte
On Thu, Oct 6, 2016 at 11:21 AM, Geoff Winkless <pgsqlad...@geoff.dj> wrote:
> DROP TABLE IF EXISTS mytable; CREATE TABLE mytable 
>
> Occasionally this produces
>
> ERROR: duplicate key value violates unique constraint
> "pg_type_typname_nsp_index" DETAIL: Key (typname,
> typnamespace)=(mytable, 2200) already exists.
>
> I can get away from this by using CREATE TABLE IF NOT EXISTS in the
> same code, but there's the potential that the wrong data will end up
> in the table if that happens, and it also seems a little odd.
>
> Would you not expect this transaction to be atomic? ie at commit time,
> the transaction should drop any table with the same name that has been
> created by another transaction.

It seems to be atomic: either it drops/creates or does nothing. What
you want is beyond atomicity. What does the other transaction do?
What if the other transaction hasn't committed? Or it has created the
table anew ( no drop, the table wasn't there )? What are the isolation
levels involved?

If all the transactions operating on the table are doing just what you
show and nothing more, and they are all serializable, I MAY expect
that, but I wouldn't put much money on it.

Francisco Olarte.




Re: [GENERAL] executing os commands from a function

2016-09-30 Thread Francisco Olarte
Armand

On Thu, Sep 29, 2016 at 11:41 PM, Armand Pirvu (home)
<armand.pi...@gmail.com> wrote:
> I know this may sound like heresy since it involves executing an OS command 
> from a function , but here goes
> After an insert in a table, I want to touch a file

> I used plsh extension but I had to use two functions and a trigger, see code 
> below

> It works but can I be simpler ? Any other alternatives ? In Ingres for 
> example I can use dbevent and an esqlc app which listens

If you are a superuser ( which I suppose you must be to execute plsh )
you could try to put code like this in a plpgsql, or maybe even sql,
security definer ( or plain if not needed ) function:

( slightly sanitized )
At the psql prompt of a client machine:

n=# copy (select 1 as c where false) to '/tmp/ptxtst';
COPY 0

In the server machine:

postgres@server ~ $ stat /tmp/ptxtst
  File: '/tmp/ptxtst'
  Size: 0 Blocks: 0  IO Block: 4096   regular empty file
Device: 802h/2050dInode: 4721101 Links: 1
Access: (0644/-rw-r--r--)  Uid: (   70/postgres)   Gid: (   70/postgres)
Access: 2016-09-30 17:31:21.024617892 +0200
Modify: 2016-09-30 17:31:21.024617892 +0200
Change: 2016-09-30 17:31:21.024617892 +0200
 Birth: -

Further details left for the reader.
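
For reference, a minimal sketch of how that trick could be wrapped for the
original after-insert use case ( untested; the table name, file path and
statement-level trigger are my own assumptions ):

create or replace function touch_flag_file() returns trigger
language plpgsql security definer as $$
begin
  -- a server-side COPY of zero rows just (re)creates the file
  execute format('copy (select 1 where false) to %L', '/tmp/ptxtst');
  return null;  -- AFTER trigger, return value is ignored
end;
$$;

create trigger touch_after_insert
  after insert on some_table          -- hypothetical table
  for each statement execute procedure touch_flag_file();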

Francisco Olarte.




Re: [GENERAL] Monitor pg_xlog size via SQL with postgres 9.4

2016-09-22 Thread Francisco Olarte
Hello:

On Thu, Sep 22, 2016 at 2:23 PM, Sylvain Marechal
<marechal.sylva...@gmail.com> wrote:
> is there a way to monitor the size of the pg_xlog directory in SQL? The goal
> is to monitor the pg_xlog file without ressorting to a 'du' like solution
> that needs a direct access to the machine.

Well, AFAIK SQL doesn't even have the concept of a directory (
functions callable from SQL provide it, but SQL is for databases and,
as a matter of fact, you can have a perfectly valid SQL db without
disks ), so I assume you want some way to get at the machine's disk
usage from a db client, without opening a shell on it. You could try
one of the functions in

https://www.postgresql.org/docs/9.5/static/functions-admin.html#FUNCTIONS-ADMIN-GENFILE

and many of the untrusted programming languages for postgres functions
( plperl, plpython, etc. ) have methods for calling stat on the server.
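
As a sketch of the kind of query those functions allow ( superuser
needed; the pg_xlog path matches 9.4/9.5 ):

select count(*) as segments,
       pg_size_pretty(sum((pg_stat_file('pg_xlog/' || f)).size)) as total
from pg_ls_dir('pg_xlog') as f
where f ~ '^[0-9A-F]{24}$';  -- skip archive_status and other entries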

Francisco Olarte.




Re: [GENERAL] large number dead tup - Postgres 9.5

2016-09-12 Thread Francisco Olarte
Hi:

On Mon, Sep 12, 2016 at 1:17 AM, Patrick B <patrickbake...@gmail.com> wrote:
>> schemaname relname   n_live_tup n_dead_tup
>> -- - -- --
>> public parts 191623953  182477402
...
> Because of that the table is very slow...
> When I do a select on that table it doesn't use an index, for example:
> \d parts;
>> "index_parts_id" btree (company_id)
>> "index_parts_id_and_country" btree (company_id, country)
> explain select * from parts WHERE company_id = 12;
>> Seq Scan on parts  (cost=0.00..6685241.40 rows=190478997 width=223)
>>   Filter: (company_id = 12)

You've already been directed to check that the table is really getting
vacuumed / analyzed, but I'd like to point out that if the count
estimates are nearly correct that plan is good ( it's estimating
getting more than 99% of the table, and a seq scan tends to beat an
index scan easily when selecting that big a part of the table; even
accounting for dead tuples it's still about 50% of the table, and a
seq scan is much faster PER TUPLE than an index scan ( and an index
scan would likely touch every data page for that big a fraction, so
reading all of them sequentially and doing a quick filter is easier ) ).

Francisco Olarte.




Re: [GENERAL] UUIDs & Clustered Indexes

2016-08-30 Thread Francisco Olarte
Luke:

On Tue, Aug 30, 2016 at 3:59 PM, Luke Gordon <gord...@gmail.com> wrote:
> I'm trying to decide on which UUID generator to use for my Postgres
> database, and I've narrowed it down to gen_random & uuid_generate_v1mc.
>
> There's a fascinating article that discusses performance implications
> between gen_random_uuid & uuid_generate_v1mc:
> https://www.starkandwayne.com/blog/uuid-primary-keys-in-postgresql/
> TL;DR, the article suggests:
> "Random produces very fragmented inserts that destroy tables. Use
> uuid_generate_v1mc() [instead].."

He probably means destroying the index, not the table; tables are not
index-ordered in postgres ( unlike the clustered tables some other
products have ).

>
> However, according to a message on this mailing list, Postgres doesn't have
> clustered indexes:
> "But Postgres doesn't _have_ clustered indexes, so that article doesn't
> apply at all. The other authors appear to have missed this important point."
> https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu


No, he hasn't. If you are generating them to use as a kind of primary
key, like a serial, having them timestamp-based means they are generated
in ascending order, which means you append to the table ( at the end )
and to the index, so they end up better filled and perform better.
Especially in the index, ordered insertions tend to be well optimized,
being a common case, and perform quite well. It's not going to be as
advantageous as on a clustered-table system, but it will help ( just
not that much ).



> But, doing a quick check, it appears Postgres does indeed have a mechanism
> for a clustered index:
> https://www.postgresql.org/docs/9.5/static/sql-cluster.html
> So, does anyone know which is best? Or are the performance differences so
> minute they'd only matter in extremely unique circumstances?

Cluster just re-sorts the table and rebuilds the index, as has already
been said in other messages. It's equivalent to: create temp table tt
as select * from the_table; truncate the_table; insert into the_table
select * from tt order by index_expression; drop table tt. It is nice
to do it for tables that are normally ordered but somehow lost it,
like a log table with an indexed insertion-timestamp field that gets
updated a lot, or that has had many old records purged. As you would
typically query it with a range condition on the timestamp, a cluster
will help.

Francisco Olarte.




Re: [GENERAL] pg_hba.conf : bad entry for ADDRESS

2016-08-25 Thread Francisco Olarte
Hi Arnaud:

On Thu, Aug 25, 2016 at 4:35 PM, arnaud gaboury
<arnaud.gabo...@gmail.com> wrote:
>> Are this all the contents of you pg_hba.conf? Note order matters, all
>> non comment ( or at least the host ones ) need to be checked .
> Here is the whole content:
>  79 local   thetradinghall  mailman peer
> map=mailmap
>  80 local   all postgrestrust
>  84 host mattermost mmuser  127.0.0.1/24md5

.. This looks good once you've added the netmask, which had slipped past me.

>> Also, did you signal the postmaster to reread after adding the line?
> What do you mean?

When you change the file you need to signal the postgres main process
( postmaster ) to reread it by sending it a HUP signal, or using
pg_ctl reload ( your OS/distro may have other methods ).
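
For example, from a superuser SQL session the reload can also be
triggered with:

select pg_reload_conf();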

Francisco Olarte.




Re: [GENERAL] pg_hba.conf : bad entry for ADDRESS

2016-08-25 Thread Francisco Olarte
On Thu, Aug 25, 2016 at 4:28 PM, arnaud gaboury
<arnaud.gabo...@gmail.com> wrote:
> On Thu, Aug 25, 2016 at 4:26 PM, Ilya Kazakevich
> <ilya.kazakev...@jetbrains.com> wrote:
>>>I entered this line in pg_hab.conf:
>> Are you sure your file name is correct and it is really used by postgres?
> I think so as another service (Postfix) is running and working.

It has nothing to do with it, unless Postfix is using postgres.

> How can I verify ?

If you used hab, it is wrong; if you used hba, consult the docs for
your version and OS and check.

Francisco Olarte.




Re: [GENERAL] pg_hba.conf : bad entry for ADDRESS

2016-08-25 Thread Francisco Olarte
Hi Arnaud:
On Thu, Aug 25, 2016 at 4:18 PM, arnaud gaboury
<arnaud.gabo...@gmail.com> wrote:
> There is a public IP with a domain name (http works OK).
Nice to know, but it does not matter if all you use is 127.0.0.1.


> I entered this line in pg_hab.conf:

Have you checked the filename? You are saying HAB, but it is HBA (
Host Based Auth ). Maybe a typo, but better safe than sorry. And
have you checked it is stored in the right place?


> --
>  host mattermost mmuser  127.0.0.1   md5
> 

Is this the whole content of your pg_hba.conf? Note order matters; all
non-comment lines ( or at least the host ones ) need to be checked.


Also, did you signal the postmaster to reread after adding the line?

Francisco Olarte.




Re: [GENERAL] ON CONFLICT does not support deferrable unique constraints

2016-08-25 Thread Francisco Olarte
On Wed, Aug 24, 2016 at 9:22 PM, Andreas Joseph Krogh
<andr...@visena.com> wrote:
> As a developer I want it to "just work", if there's an error of any kind then 
> abort the transaction, just as it was non-deferrable.

Everybody wants everything to "just work", for their own ( in a lot of
cases unspecified even to themselves ) definition of "just work".
Developers normally want to be told what "just work" means when
developing something.

That being said, I'm not sure making it work ( deferred constraints
acting like immediate ones during upserts ) is even a good idea. If it
can be conditionally enabled with a simple set and implemented in very
few ( < 20 ) lines of code, OK for me; otherwise I would prefer the
reduced bug surface.

Francisco Olarte.




Re: [GENERAL] ON CONFLICT does not support deferrable unique constraints

2016-08-24 Thread Francisco Olarte
On Wed, Aug 24, 2016 at 6:26 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
> On Wed, Aug 24, 2016 at 6:05 AM, Andreas Joseph Krogh
>> Are there any plans to lift this restriction?
> I'm trying to figure out a method for making this work in my head.
> These two things seem kind of opposed to each other.

He probably wants to treat it as non-deferrable just during the
upsert. I do not know if he has considered that this opens a can of
worms ( like, the constraint may already be broken due to previous DML ).

Francisco Olarte.




Re: [GENERAL] Sequential vs. random values - number of pages in B-tree

2016-08-23 Thread Francisco Olarte
Hi Rob:

On Tue, Aug 23, 2016 at 4:52 PM, Rob Sargent <robjsarg...@gmail.com> wrote:
> By 'this' I was referring to the optimizations mentioned, and am wondering
> if this holds true under user load.

For that you'll have to refer to the source, or ask someone more
versed in the arcana of the pg source.

> Much magic can happen in a custom data
> load, but do these optimization apply to an application loading single (or
> perhaps several) records per transaction.  Does one, in that scenario, not
> suffer any consequence for continuously loading one side of the tree (the
> rightmost node?).

Not that much magic is necessary. The time I did it I just needed to
detect on every insertion whether I was at the rightmost position (
made easier because I had minimum/maximum keys cached in the tree
object header ), and have a special routine for inserting a new last
node ( put it in the last page, whose pointer I had, grabbing a new one
if needed, whose pointer would be appended at the tail of the parent,
etc.; it was just a pruned-down version of the general insert
routine, but it made insertions run easily 20 times faster by avoiding
nearly every check, knowing I was on the right edge ). I do not know if
pg inserts several items at a time in bulk loading, but I doubt it.
Normally every btree indexing library has some optimization for these
cases, as they are common, just like every real sort routine has some
optimization for presorted input.

Francisco Olarte.




Re: [GENERAL] Sequential vs. random values - number of pages in B-tree

2016-08-23 Thread Francisco Olarte
On Tue, Aug 23, 2016 at 4:28 PM, Rob Sargent <robjsarg...@gmail.com> wrote:
> On 08/23/2016 07:44 AM, Francisco Olarte wrote:
>> On Tue, Aug 23, 2016 at 2:26 PM, pinker <pin...@onet.eu> wrote:
>>> I am just surprised by the order of magnitude in the difference though. 2
>>> and 27 minutes that's the huge difference... I did another, simplified
>>> test,
>>> to make sure there is no duplicates and the only difference between both
>>> sets is the order:
>>
>> ...
>>>
>>> INSERT INTO t_sequential SELECT * FROM source_sequential;
>>> 102258,949 ms
>>> INSERT INTO t_random SELECT * FROM source_random;
>>> 1657575,699 ms
>>
>> If I read correctly, you are getting 100s/10Mkeys=10us/key in
>> sequential, and 165 in random.
>>
>> I'm not surprissed at all. I've got greater differences on a memory
>> tree, sorted insertion can be easily optimized to be very fast. AS an
>> example, sequential insertion can easily avoid moving data while
>> filling the pages and, with a little care, it can also avoid some of
>> them when splitting. I'm not current with the current postgres
>> details, but it does not surprise me they have big optimizations for
>> this, especially when index ordered insertion is quite common in
>> things like bulk loads or timestamped log lines.

> And if each insert is in a separate transaction, does this still hold true?

What are you referring to by 'this'? ( BTW, bear in mind one
transaction needs at least a disk flush, and, if done via network, at
least one RTT, so I doubt you can achieve 10us/transaction unless you
have very special conditions ).

Francisco Olarte.




Re: [GENERAL] question on error during COPY FROM

2016-08-23 Thread Francisco Olarte
On Tue, Aug 23, 2016 at 4:06 PM, Rakesh Kumar
<rakeshkumar46...@gmail.com> wrote:
> Is it true that one datafile in PG can only belong to one object (table/index)

If this is a question: yes, AFAIK ( in fact they are split into 1 GB
chunks to prevent problems with quirky filesystems ). Search for "Each
table and index is stored in a separate file" in
https://www.postgresql.org/docs/9.5/static/storage-file-layout.html
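
As a quick illustration from SQL, something like this shows the file
backing a given table ( path relative to the data directory; the table
name is just an example ):

select pg_relation_filepath('parts');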


A full periodic read of the postgres manual, even if superficial, is
quite beneficial. I feel poetic today ;-> . Seriously, it is quite well
written and full of interesting information, even if you skip large
chunks; knowing where the info is can save you a lot of work and mails.
As a matter of fact, it's one of the main reasons I've been using
postgres all these years.

Francisco Olarte.




Re: [GENERAL] question on error during COPY FROM

2016-08-23 Thread Francisco Olarte
On Tue, Aug 23, 2016 at 2:32 PM, Ilya Kazakevich
<ilya.kazakev...@jetbrains.com> wrote:
>>does that mean that I should always execute a VACUUM to recover the
>>wasted space when an error is triggered or will the auto-vacuum mechanism
>>do the job by itself ?
> If you have autovacuum enabled it will clean up tablespace. However, space 
> will not be returned to filesystem but will be reused by database.
> You may run VACUUM FULL manually to return it to filesystem.

A normal vacuum may also return some space, especially after a big bulk
load; see the second paragraph of 23.1.2 at the URL you posted:
> https://www.postgresql.org/docs/9.1/static/routine-vacuuming.html

where it says "However, it will not return the space to the operating
system, except in the special case where one or more pages at the end
of a table become entirely free and an exclusive table lock can be
easily obtained." A big aborted bulk load may fit just that case, as
it may put a lot of tuples in new pages at the end and be executed in
a low-load period where the lock is easier to acquire.


Francisco Olarte.




Re: [GENERAL] Sequential vs. random values - number of pages in B-tree

2016-08-23 Thread Francisco Olarte
Hi pinker:

On Tue, Aug 23, 2016 at 2:26 PM, pinker <pin...@onet.eu> wrote:
> I am just surprised by the order of magnitude in the difference though. 2
> and 27 minutes that's the huge difference... I did another, simplified test,
> to make sure there is no duplicates and the only difference between both
> sets is the order:
...
> INSERT INTO t_sequential SELECT * FROM source_sequential;
> 102258,949 ms
> INSERT INTO t_random SELECT * FROM source_random;
> 1657575,699 ms

If I read correctly, you are getting 100s/10Mkeys = 10us/key in the
sequential case, and 165us/key in the random one.

I'm not surprised at all. I've seen greater differences on an in-memory
tree; sorted insertion can easily be optimized to be very fast. As an
example, sequential insertion can easily avoid moving data while
filling the pages and, with a little care, it can also avoid some of
those moves when splitting. I'm not current with the postgres
internals, but it does not surprise me that they have big optimizations
for this, especially since index-ordered insertion is quite common in
things like bulk loads or timestamped log lines.

Francisco Olarte.




Re: [GENERAL] Limit Heap Fetches / Rows Removed by Filter in Index Scans

2016-08-20 Thread Francisco Olarte
Victor:

On Fri, Aug 19, 2016 at 8:01 PM, Victor Blomqvist <v...@viblo.se> wrote:

> Thanks! A sub select seems to do it.

I suspected it would be needed; that's why I suggested it.

I prefer to write the queries as CTEs because they are much easier to
read, but IIRC they act as a kind of 'optimization fence'. Sometimes
I end up using psql macros to write queries in CTE-like order in my
scripts and compose them into subselects when I have this problem.

> Checking these two queries I can see that the first one visits the
> max 50 rows its allowed to and returns 5 rows, while the second one
> finish off after 13 rows fetched and returns the full 10 rows.

Good. The only problem is you are not guaranteed a result, as in the
contrived example I gave, but if this is what you want it is a way to
go.

Francisco Olarte.




Re: [GENERAL] Limit Heap Fetches / Rows Removed by Filter in Index Scans

2016-08-19 Thread Francisco Olarte
Hi Victor:


On Fri, Aug 19, 2016 at 7:02 PM, Victor Blomqvist <v...@viblo.se> wrote:
> What I want to avoid is my query visiting the whole 1m rows to get a result,
> because in my real table that can take 100sec. At the same time I want the
> queries that only need to visit 1k rows finish quickly, and the queries that
> visit 100k rows at least get some result back.

You are going to have problems with that. If you just want to limit it
to a max of 100k rows and max 10 results, my solution works, probably
better as nested selects than CTEs, but someone more knowledgeable
about the optimizer will need to weigh in ( or tests will be needed ).
This is because "the queries that visit 100k rows at least get some
result back" may be false: you may need to visit the whole 1M rows to
get the first result if you are unlucky. Just set ap=999 where id=1M
and ask for ap>=999 and you've got that degenerate case, which can
only be saved if you have an index on ap ( even with statistics, you
would need a full table scan to find it ).

If you are positive some results are in the first 100k rows, then my
method works fine; how fast it is will need to be tested with the real
data. You can even try using *10, *100, *1k of the real limit until
you have enough results, if you want to time-limit your queries.


Francisco Olarte.




Re: [GENERAL] Sequential vs. random values - number of pages in B-tree

2016-08-19 Thread Francisco Olarte
On Fri, Aug 19, 2016 at 3:20 PM, Daniel Verite <dan...@manitou-mail.org> wrote:
> There's a simple technique that works on top of a Feistel network,
> called the cycle-walking cipher. Described for instance at:
> http://web.cs.ucdavis.edu/~rogaway/papers/subset.pdf
> I'm using the opportunity to add a wiki page:
> https://wiki.postgresql.org/wiki/Pseudo_encrypt_constrained_to_an_arbitrary_range
> with sample plgsql code for the [0..10,000,000] range that might be useful
> for other cases.

Nice reference, nice wiki page, nice job. Bookmarking everything.

> But for the btree fragmentation and final size issue, TBH I don't expect
> that constraining the values within that smaller range will make any
> difference
> in the tests, because it's the dispersion that matters, not the values
> themselves.

Neither do I; that is why I said it was probably good enough for tests, but

> I mean that, whether the values are well dispersed in the [0..1e7] range or
> equally well dispersed in the [0..2**32] range, the probability of a newly
> inserted value to compare greater or lower to any previous values of the list
> should be the same, so shouldn't the page splits be the same, statistically
> speaking?

I know some btrees do prefix-coding of the keys, some do it even with
long integers, and others do delta-coding for integer keys. I seem to
recall postgres does not, which is why I do not think it will make a
difference.

But anyway, to compare two things like that, as the original poster
was doing, I normally prefer to test just one thing at a time; that's
why I would try to do it by writing a sorted file, shuffling it with
sort -R, and COPYing both versions, server side if possible, to
eliminate everything else so the two loads differ only in the key
order.




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Francisco Olarte
On Fri, Aug 19, 2016 at 12:44 PM, Andreas Kretschmer
<akretsch...@spamfence.net> wrote:
> for append-only tables like this consider 9.5 and BRIN-Indexes for
> timestamp-searches. But if you deletes after N weeks BRIN shouldn't work
> properly because of vacuum and re-use of space within the table.
> Do you know BRIN?
>
> So, in your case, consider partitioning, maybe per month. So you can
> also avoid mess with table and index bloat.

If done properly he can use both. For 6 weeks I would use seven
partitions, the current one + 6 previous, dropping the oldest partition
weekly, so effectively they become append-only and he can use BRIN too.

Even better, if he normally inserts in batches ( it happens to me with
some log-like data, I rotate the file and insert all rotated data
periodically ) he can use a staging table ( 1 master, and inheriting
from it seven constrained weekly partitions plus one unconstrained
staging partition ). Insertions go into staging and are moved with a
small delay to the corresponding partition, using an ordered select
so they go in perfect order into their final resting place, and staging
can be vacuumed just after that ( if they are log lines and the maximum
delay is X you just move every row older than that from staging to the
partition with whatever period is best ). The staging partition is
normally small and cached and can be processed quite fast ( with
200k/day an hourly movement will leave staging with less than about
10k rows if the distribution is somewhat uniform ).
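
A rough sketch of that layout, using 9.5-style inheritance partitioning
( all names, dates and the BRIN index are illustrative assumptions ):

create table logs (ts timestamptz not null, line text);
create table logs_staging () inherits (logs);   -- unconstrained staging
create table logs_w33
  ( check (ts >= '2016-08-15' and ts < '2016-08-22') ) inherits (logs);
create index on logs_w33 using brin (ts);

-- periodic move, ordered so rows land in timestamp order in the partition:
with moved as (
  delete from logs_staging
  where ts >= '2016-08-15' and ts < '2016-08-22'
  returning *
)
insert into logs_w33 select * from moved order by ts;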

Francisco Olarte.




Re: [GENERAL] Limit Heap Fetches / Rows Removed by Filter in Index Scans

2016-08-19 Thread Francisco Olarte
Hi Victor:

On Fri, Aug 19, 2016 at 7:06 AM, Victor Blomqvist <v...@viblo.se> wrote:
> Is it possible to break/limit a query so that it returns whatever results
> found after having checked X amount of rows in a index scan?
>
> For example:
> create table a(id int primary key);
> insert into a select * from generate_series(1,10);
>
> select * from a
> where id%2 = 0
> order by id limit 10
>
> In this case the query will "visit" 20 rows and filter out 10 of them. We
> can see that in the query plan:
> "Rows Removed by Filter: 10"
> "Heap Fetches: 20"
>
> Is it somehow possible to limit this query so that it only fetches X amount,
> in my example if we limited it to 10 Heap Fetches the query could return the
> first 5 rows?
>
>
> My use case is I have a table with 35 million rows with a geo index, and I
> want to do a KNN search but also limit the query on some other parameters.
> In some cases the other parameters restrict the query so much that Heap
> Fetches becomes several 100k or more, and in those cases I would like to
> have a limit to my query.

Well, if you accept more abstract limits (i.e. you do not depend on
heap fetches, you just want up to 5 even IDs from the first 10 IDs )
you could try:

with base as (select * from a order by id limit 10)
select * from base where id %2 = 0 order by id limit 5;

( Or do it with a subquery instead of a CTE ).

In general

select * from table where common_condition and filter_condition order
by xx limit N

becomes

with base as (select * from table where common_condition order by xx
limit base_fecthes)
select * from base where filter_condition order by XX limit N;

In the example the common_condition is nonexistent; treat it as true
and optimize after transforming.

Francisco Olarte.




Re: [GENERAL] Sequential vs. random values - number of pages in B-tree

2016-08-18 Thread Francisco Olarte
Daniel:
On Thu, Aug 18, 2016 at 5:24 PM, Daniel Verite <dan...@manitou-mail.org> wrote:
>> unless you know of an easy way to generate a random permutation on the
>> fly without using a lot of memory, I do not.
> It could be done by encrypting the stream.
> For 32 bits integers:
> https://wiki.postgresql.org/wiki/Skip32
> For 64 bits integers:
> https://wiki.postgresql.org/wiki/XTEA

Nearly; probably good enough for tests, but it only generates a
pseudorandom permutation if you encrypt all 2**32 ( or 2**64 ) values,
not with the 1..1E7 range: it will map those into 1E7 different numbers
somewhere in the 2**32 ( or 2**64 ) range. I think there are some
pseudo-random number generators which can be made to work with any
range, but I do not recall which ones right now.

Francisco Olarte.




Re: [GENERAL] Sequential vs. random values - number of pages in B-tree

2016-08-18 Thread Francisco Olarte
Hi:

On Thu, Aug 18, 2016 at 1:32 PM, pinker <pin...@onet.eu> wrote:
...
> create table t01 (id bigint);
> create index i01 on t01(id);
> insert into t01 SELECT s from generate_series(1,1000) as s;
>
> and random values:
> create table t02 (id bigint);
> create index i02 on t02(id);
> insert into t02 SELECT random()*100 from generate_series(1,1000) as s;

It has already been said that btrees work that way; if you find it
strange read a bit about them, this is completely normal, but ...

... what I want to point out is that your test is severely flawed. It
probably does not matter in this case, but you are inserting 10M
DIFFERENT VALUES in the first case and only 100 in the second one,
with an average of 100K DUPLICATES of each. This affects btrees too.
You could try using random()*1G, or at least 100M, for a better test (
which may show even worse behaviour; ideally I would just write 10M
integers to a disk file, shuffle it and compare COPY FROM times for
both ) ( unless you know of an easy way to generate a random
permutation on the fly without using a lot of memory, I do not ).
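
An untested sketch of that fairer test, kept server side ( ORDER BY
random() stands in for the external shuffle; names are made up ):

create table src as select s::bigint as id from generate_series(1,10000000) s;
create table t_seq (id bigint); create index i_seq on t_seq(id);
create table t_rnd (id bigint); create index i_rnd on t_rnd(id);
insert into t_seq select id from src order by id;
insert into t_rnd select id from src order by random();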

Francisco Olarte.




Re: [GENERAL] SQL help - multiple aggregates

2016-08-18 Thread Francisco Olarte
CCing to the list ( if you are new to this list, messages come from
the sender address, you have to use "reply all" ( at least in my MUA,
web gmail ) to make your replies appear in the list ).

On Thu, Aug 18, 2016 at 3:03 PM,  <haman...@t-online.de> wrote:
> Hi Francisco,
> thanks a lot. I will give it a try later

Do it, and do not forget to try the straightforward solution ( sum of
cases ) given by Ladislav Lenart above. I normally prefer to do this
kind of thing the way I showed you because the queries are simpler and
normally only the first one takes time, and using count tends to be the
fastest way to extract the relevant data ( the rest of my query, after
the first with, is just moving data around for pretty-printing, or
pretty-selecting ).
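
For completeness, a sketch of that sum-of-cases single-pass alternative
( untested, reusing the column names from the original question ):

select custid,
       sum(case when vendid = 415  then 1 else 0 end) as c415,
       sum(case when vendid = 983  then 1 else 0 end) as c983,
       sum(case when vendid = 1256 then 1 else 0 end) as c1256
from cv
where vendid in (415, 983, 1256)
group by custid;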

Francisco Olarte.




Re: [GENERAL] SQL help - multiple aggregates

2016-08-18 Thread Francisco Olarte
On Thu, Aug 18, 2016 at 10:56 AM,  <haman...@t-online.de> wrote:
> I have a table cv with custid and vendid columns. Every entry represents the 
> purchase of a product
> available from a specific vendor.
> Now, for a set of "interesting" vendors, I would like to select a new table
> custid, c415, c983, c1256
> based upon part queries
> select custid, count(vendid) as c415 from cv where vendid = 415 group by 
> custid


Divide and conquer, first you get the raw data ( so you have what you
need as 'vertical' tagged columns ): ( beware, untested )...

with raw_data as (
select
 custid, vendid, count(*) as c
from cv
where vendid in (415,983,1256)
group by 1,2
)

Then put it in three columns ( transforming it into diagonal matrix ):

, column_data as (
select
 custid,
 case when vendid=415 then c else 0 end as c415,
 case when vendid=983 then c else 0 end as c983,
 case when vendid=1256 then c else 0 end as c1256
from raw_data
)

and then group then ( putting them into horizontal rows ):

select
 custid,
 max(c415) as c415,
 max(c983) as c983,
 max(c1256) as c1256
from column_data group by 1;

Note:
 I used 0 in the else to get correct counts for the case where not all
vendids are present. If you prefer null you can use it; IIRC max
ignores nulls.

Francisco Olarte.




Re: [GENERAL] Any reasons for 'DO' statement not returning result?

2016-08-13 Thread Francisco Olarte
On Fri, Aug 12, 2016 at 11:34 PM, Xtra Coder <xtraco...@gmail.com> wrote:
...
> In my particular case I'm more interested in an easy way to create complex
> SELECTs that require usage of variables in the one-time through-away scripts
> (some-time during experiments for implementation of functions, to see
> immediate results of the intermediate code). The easiest way would be
> MsSQL-like when declaring a variable outside of SP actually makes it visible
> globally in current session. In such case I do not need 'DO' at all and this
> is simple. Probably PostgreSQL has another way to make that thing simple.

If you are just interested in avoiding some mistakes and/or
parametrizing some queries from the command line, psql ( the CLI
program ) has macro expansion with some SQL-quoting capabilities, see
https://www.postgresql.org/docs/9.5/static/app-psql.html#APP-PSQL-VARIABLES
and be sure to scroll down to "SQL Interpolation" after the built-in
variables list and read that. I've used it several times; just
remember it's a macro processor and the expansion is done by psql, not
by the server.
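
A tiny example of what it allows, run inside psql ( the table and
values are made up ):

\set min_id 1000
\set owner 'O''Hara'
select * from my_table where id >= :min_id and owner = :'owner';
-- :min_id is pasted as-is, :'owner' is interpolated as a quoted literal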

Francisco Olarte.




Re: [GENERAL] Corrupted Data ?

2016-08-13 Thread Francisco Olarte
Hi Adrian:

On Fri, Aug 12, 2016 at 9:01 PM, Adrian Klaver
<adrian.kla...@aklaver.com> wrote:
> "Specially if this happens, you may have some slightly bad disks/ram/
> leading to this kind of problems."
>
> Trying to reconcile that with all the physical machine 1 VMs sharing the
> same RAM and physical disk, but the error only occurring on db3.
>
> Is the VM hosting db3 setup different from the VMs 1 & 2?

They probably share the disk, bus, RAM and disk controllers, but they
surely do not share the disk SECTORS. "Weak" ( nearly failing ) tracks
can give this kind of problem ( although the disk CRC should catch any
odd number of bit errors, but with VMs in the mix who knows where the
error messages could end up ).

Francisco Olarte.

