Re: [GENERAL] Build in function to verify email addresses

2017-11-16 Thread Andreas Joseph Krogh
On Thursday 16 November 2017 at 09:05:00, Pavel Stehule <pavel.steh...@gmail.com> wrote:
Hi

2017-11-16 8:56 GMT+01:00 Nick Dro <postgre...@walla.co.il>:
I believe that every information system needs to send emails.
Currently PostgreSQL doesn't have a function which takes TEXT and returns true
if it's a valid email address (x...@yyy.com / .co.ZZ).
Do you believe such a function should exist in PostgreSQL, or is it best to let
every user implement his own function?

 
 I don't think this functionality should be in upstream - but it is a good
use case for an extension, placed on PGXN or in the PostgreSQL community
repository.
 
Postgres has good regexp support, and this case can be implemented with one
regexp.
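 
For illustration, a minimal sketch of such a check (the function name
is_valid_email and the pattern are assumptions made for this example; the
regexp is deliberately simple and not RFC 5322-complete):

CREATE OR REPLACE FUNCTION is_valid_email(addr text)
RETURNS boolean
LANGUAGE sql IMMUTABLE STRICT AS $$
  -- case-insensitive match: local part, '@', domain labels, dotted TLD
  SELECT addr ~* '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$'
$$;

-- e.g.: SELECT is_valid_email('somebody@example.com');  -- returns true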
 
You can also use PL/PerlU with some modules from CPAN:
 
http://search.cpan.org/~rjbs/Email-Valid-1.202/lib/Email/Valid.pm 
<http://search.cpan.org/~rjbs/Email-Valid-1.202/lib/Email/Valid.pm>
 
Regards
 
Pavel



 
In general, I see no reason for a modern RDBMS not to provide an
email-datatype. IMV that's no different from other types which also could have
been plain text but are convenient to have datatypes for.
Being an open-source project, I guess one must show initiative and start a
discussion on -hackers to see what interest there is in having one in core. I
for one hope there will be.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] PG-10 + ICU and abbreviated keys

2017-11-13 Thread Andreas Joseph Krogh
On Tuesday 14 November 2017 at 00:44:11, Peter Geoghegan <p...@bowt.ie> wrote:
On Mon, Nov 13, 2017 at 2:05 PM, Andreas Joseph Krogh
 <andr...@visena.com> wrote:
 > When sorting on text, we're usually doing so using a multi-column index,
like for instance "CREATE INDEX xxx ON my_table (lower(name) ASC, created
ASC)". Will abbreviated keys help here?

 Yes, they'll help with that, even though the leading column might be
 low cardinality.
 
Nice to know, thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] PG-10 + ICU and abbreviated keys

2017-11-13 Thread Andreas Joseph Krogh
On Monday 13 November 2017 at 22:28:40, Peter Geoghegan <p...@bowt.ie> wrote:
On Mon, Nov 13, 2017 at 12:48 PM, Andreas Joseph Krogh
 <andr...@visena.com> wrote:
 > Thanks.

 As the person that worked on abbreviated keys, I'd like to hear about
 how you get on with this. How much faster is it for you?

 I don't usually get to hear about this, because most users don't
 notice that anything in particular gets faster, because there are many
 performance enhancements added to a release.
 
We haven't migrated any of our databases to v10 yet so I really can't tell.
I'm evaluating ICU-usage as the last step before we decide on moving to v10. Being
a per-column setting that means a pg_dump/reload won't cut it (AFAIU), so I'm
not sure we'll take that route as it involves much manual tweaking which we're
really not interested in spending time on.
 
When sorting on text, we're usually doing so using a multi-column index, like
for instance "CREATE INDEX xxx ON my_table (lower(name) ASC, created ASC)".
Will abbreviated keys help here?
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] PG-10 + ICU and abbreviated keys

2017-11-13 Thread Andreas Joseph Krogh
On Monday 13 November 2017 at 21:46:08, Peter Geoghegan <p...@bowt.ie> wrote:
On Mon, Nov 13, 2017 at 12:40 PM, Peter Geoghegan <p...@bowt.ie> wrote:
 >> Do I have to explicitly specify collation when using ORDER by on that 
column for index and abbreviated keys to be used?
 >
 > Only if you didn't define the column with a per-column collation initially.

 BTW, if you specifically want to quickly verify whether or not
 abbreviated keys were used, you can do that by setting "trace_sort =
 on", and possibly setting "client_min_messages = LOG", too.

 There should be quite a bit of debug output from that that
 specifically mentions abbreviated keys.
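 
A minimal sketch of that session-level verification, using the test table from
earlier in this thread:

SET trace_sort = on;
SET client_min_messages = LOG;
-- any query that sorts the ICU-collated text column:
SELECT name FROM test ORDER BY name;
-- the emitted LOG lines should mention abbreviated keys if they were used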
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] PG-10 + ICU and abbreviated keys

2017-11-13 Thread Andreas Joseph Krogh
On Monday 13 November 2017 at 21:40:47, Peter Geoghegan <p...@bowt.ie> wrote:
On Mon, Nov 13, 2017 at 11:48 AM, Andreas Joseph Krogh
 <andr...@visena.com> wrote:
 > Ok, so I have to explicitly specify like this:
 >
 > create table test(id serial primary key, name varchar collate "nb_NO" not 
null);

 That doesn't look like an ICU locale. You may mean "nb-NO-x-icu". But
 otherwise, yes.
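 
A minimal sketch of the earlier example adjusted to use an ICU collation, as
hinted above:

create table test(
    id   serial primary key,
    name varchar collate "nb-NO-x-icu" not null
);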
 
 
Ok, is there a way I can get a list of ICU-collations?
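 
One way to list them, a sketch against the pg_collation catalog (ICU
collations have collprovider = 'i' in PG-10):

SELECT collname, collcollate
FROM pg_collation
WHERE collprovider = 'i'
ORDER BY collname;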
 
 
> Will ICU be used here as long as PG is compiled with ICU support, as the
Debian packages are, or do I have to specify the collation provider?

 If you did initdb with a version with ICU support, the ICU collations
 should be there.

 > Do I have to explicitly specify collation when using ORDER by on that 
column for index and abbreviated keys to be used?

 Only if you didn't define the column with a per-column collation initially.
 
Ok, thanks.
 
Looking forward to this being a per-database setting so it's (hopefully) more 
transparent.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] PG-10 + ICU and abbreviated keys

2017-11-13 Thread Andreas Joseph Krogh
On Monday 13 November 2017 at 19:07:53, Peter Geoghegan <p...@bowt.ie> wrote:
On Mon, Nov 13, 2017 at 12:27 AM, Andreas Joseph Krogh
 <andr...@visena.com> wrote:
 > In PG-10, with ICU enabled, are abbreviated keys now enabled?

 Yes. ICU will use abbreviated keys on every platform, including Windows.

 > If so, using locale=nb_NO.UTF-8, do I have to use an ICU-specific locale to
take advantage of abbreviated keys?

 You need to use an ICU collation. It must be a per-column collation,
 as you cannot currently use ICU for an entire database. (This
 limitation should be removed in the next release or two.)
 
Ok, so I have to explicitly specify like this:
 
create table test(id serial primary key, name varchar collate "nb_NO" not 
null);
  
Will ICU be used here as long as PG is compiled with ICU support, as the
Debian packages are, or do I have to specify the collation provider?
 
Do I have to explicitly specify collation when using ORDER by on that column 
for index and abbreviated keys to be used?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] PG-10 + ICU and abbreviated keys

2017-11-13 Thread Andreas Joseph Krogh
Hi.
 
In PG-10, with ICU enabled, are abbreviated keys now enabled?
 
If so, using locale=nb_NO.UTF-8, do I have to use an ICU-specific locale to
take advantage of abbreviated keys?
 
Thanks.

 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] database size changed after restoring using pg_restore

2017-11-08 Thread Andreas Joseph Krogh
On Wednesday 08 November 2017 at 23:59:40, Dylan Luong <dylan.lu...@unisa.edu.au> wrote:
Hi 

I am upgrading some databases from a PostgreSQL 9.2 server to a new 
PostgreSQL 9.6 server.

 

I used pg_dump and pg_restore for the upgrade.

 

This is the command I used to dump the database. I ran the command from the 
9.6 server.

pg_dump -h 9.2server -j 16 --format=directory -f 
/pg_backup/backup/dump/mydb-20171108.dmp -U postgres -W mydb

 

This is the command I used to restore the database on the 9.6 server.

pg_restore -j 16 -C -d postgres /pg_backup/backup/dump/mydb-20171108.dmp

 

Everything appears OK; the dump and restore completed without any errors.

 

But when I listed the size of the database (postgres=# \l+) between the 9.2 
and the upgraded 9.6, they were different.

 

on 9.2 it was 3776 MB

on 9.6 it was 1570 MB

 

I also did a few more databases using the same steps and they all appeared to 
be smaller. Is that normal?


Yep.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] OR-clause support for indexes

2017-10-08 Thread Andreas Joseph Krogh
Hi.
 
A while ago there was a proposed patch for adding
$subject: https://commitfest.postgresql.org/8/454/
 
Is this being worked on? Any progress in btree-support?

 --
 Andreas Joseph Krogh


[GENERAL] Storing LOs outside the database and having a proper cleanup-mechanism to prevent dangling files

2017-07-17 Thread Andreas Joseph Krogh
Hi.
 
After struggling with storing LOs in the database and making efficient
backup/restore-routines, I've decided to move LOs out of the DB and use a
"filename"-column containing a UUID referencing the file on the filesystem.
Having this schema I need a robust book-keeping mechanism so I know when I can
delete files on the filesystem once they are no longer referenced from the database.
 
In an email-program I store the whole email-body as a binary in a file and 
store all header-info in an email_message table (for this example's sake).
 
I have a book-keeping table :
 
create table origo_email_message_file_operation(
    filename  VARCHAR not null,
    operation VARCHAR not null,
    PRIMARY KEY (filename, operation)
);
And an email_message table holding the messages, with a "filename"-column
referencing the BLOB.
 
For DELETE I use this routine:
 
deleteData(fileName, messageId) {
    startTX()
    deleteEmail(messageId) // Deletes the entry in the email_message table
    INSERT INTO origo_email_message_file_operation(filename, operation)
        VALUES (fileName, 'DELETE');
    commitTX()
}
 
So for DELETE-operations the file on disk isn't deleted by the main program,
but by a cleanup-job which runs as a cron-job, inspects
origo_email_message_file_operation for DELETE-entries, deletes the referenced
files, and then removes the DELETE-entries from
origo_email_message_file_operation.
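 
A minimal sketch of the bookkeeping SQL such a cleanup-job might run (the
actual file removal happens outside the database; the parameter name is just a
placeholder):

-- 1. find files that are safe to remove from the filesystem
SELECT filename
FROM origo_email_message_file_operation
WHERE operation = 'DELETE';

-- 2. after a file has been removed, drop its bookkeeping row
DELETE FROM origo_email_message_file_operation
WHERE operation = 'DELETE'
  AND filename = :removed_filename;  -- placeholder for the file just removed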
 
 
 
INSERT is like this:
 
insertData(fileName) {
    startTX()
    // First, register the INSERT in case it fails
    INSERT INTO origo_email_message_file_operation(filename, operation)
        VALUES (fileName, 'INSERT');
    commitTX()

    startTX()
    insertEmail() // Inserts the entry in the email_message table
    DELETE FROM origo_email_message_file_operation
        WHERE filename = fileName AND operation = 'INSERT';
    // If this commits there is no entry left in
    // origo_email_message_file_operation and we're all good
    commitTX()
}
 
UPDATE is implemented as INSERT + DELETE.
 
 
The challenge with this is what to do if the INSERT rolls back. If it rolls
back then we end up with an INSERT-entry in origo_email_message_file_operation
with no corresponding "filename"-entry in email_message. But I fail to see how a
cleanup job can know the difference between such an INSERT-entry
in origo_email_message_file_operation caused by a ROLLBACK and an INSERT-entry
caused by an in-progress insertEmail() operation.
 
Does anyone have a robust mechanism for cleaning up files in such scenarios? 
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] Configure Qt Creator to work with PostgreSQL to extensions development

2017-06-25 Thread Andreas Joseph Krogh
On Sunday 25 June 2017 at 23:02:32, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
On 06/25/2017 01:34 PM, Andreas Joseph Krogh wrote:
 > On Sunday 25 June 2017 at 17:04:02, Adrian Klaver
 > <adrian.kla...@aklaver.com> wrote:
 >
 >     On 06/25/2017 07:34 AM, Fabiana Zioti wrote:
 >      > I'm creating an extension to PostgreSQL, with user-defined types and
 >      > user-defined functions.
 >      >
 >      > Extensions can be written in C as well as C ++, correct?
 >
 >     Given that you are asking about Qt Creator do you mean written in C++ as
 >     well as C?
 >
 >     An extension is just packaging of objects:
 >
 >     https://www.postgresql.org/docs/9.6/static/extend-extensions.html
 >
 >     I am not sure C/C++ really is a factor.
 >     Maybe if you offer more detail on what you are trying to achieve.
 >
 >      > I am currently using ATOM to develop in Ubuntu. But I would like
 >     to work
 >      > with Qt.
 >
 >     Not seeing how Qt will work in a Postgres extension?
 >
 > I think what he's asking is how to use Qt Creator as a development
 > environment for developing PG-extensions, not linking Qt in from an
 > extension.

 Even if that is the case, Qt Creator is optimized for building GUI's
 based on Qt, I suspect the OP will be fighting it to build Postgres
 extensions. Seems to me sticking with Atom would be the wiser choice. At
 any rate the above awaits more information for confirmation.
 
I'd give CLion a try: https://www.jetbrains.com/clion/
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Configure Qt Creator to work with PostgreSQL to extensions development

2017-06-25 Thread Andreas Joseph Krogh
On Sunday 25 June 2017 at 17:04:02, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
On 06/25/2017 07:34 AM, Fabiana Zioti wrote:
 > I'm creating an extension to PostgreSQL, with user-defined types and
 > user-defined functions.
 >
 > Extensions can be written in C as well as C ++, correct?

 Given that you are asking about Qt Creator do you mean written in C++ as
 well as C?

 An extension is just packaging of objects:

 https://www.postgresql.org/docs/9.6/static/extend-extensions.html

 I am not sure C/C++ really is a factor.
 Maybe if you offer more detail on what you are trying to achieve.

 > I am currently using ATOM to develop in Ubuntu. But I would like to work
 > with Qt.

 Not seeing how Qt will work in a Postgres extension?
 
I think what he's asking is how to use Qt Creator as a development environment
for developing PG-extensions, not linking Qt in from an extension.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

2017-06-22 Thread Andreas Joseph Krogh
On Thursday 22 June 2017 at 17:10:40, Achilleas Mantzios <ach...@matrix.gatewaynet.com> wrote:
[snip]
What's so much better about LOs vs. bytea? Do you do a lot of updates on the
binary data, changing only parts of it?
 
BYTEA really sucks when dealing with large objects and streaming to clients
(JDBC). The only solution (I've found) which doesn't explode in memory usage is
using LOs (OIDs). Note that we're dealing with multi-gigabyte objects, which
need to be "transactionally safe" (hence stored in the DB).
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

2017-06-22 Thread Andreas Joseph Krogh
On Thursday 22 June 2017 at 19:30:49, Andres Freund <and...@anarazel.de> wrote:
On 2017-06-22 18:10:40 +0300, Achilleas Mantzios wrote:
 > > Once again having pg_largeobject as a system-catalog prevents LOs
 > > from working smoothly. Neither replication nor having LOs on a
 > > different tablespace (by moving pg_largeobject) works.

 > I think logical decoding was designed for supporting DML SQL commands
 > (i.e. a finite set of commands) and not specific functions (lo_*)
 > which by nature can be arbitrary, infinite and version specific.

 That's not really the reason. The first reason its currently unsupported
 is that LOs are stored in a system catalog, and currently all system
 catalogs are excluded from the change stream.  The second problem is how
 exactly to represent the changes - we can't represent it as the whole LO
 being changed, as that'd increase the volume of WAL and replicated
 writes dramatically.  Thus we need to invent an API that can represent
 creation, deletion, and writes to arbitrary offsets, for output plugins.
 
pg_largeobject being a system catalog is the cause of much pain as I've found 
out. It also prevents moving it to a separate tablespace and maintaining 
pg_upgrade compatibility.
 
The first initiative would (possibly) be coming up with a new model for 
storing LOs, not involving system-catalogs. But, pg_largeobject doesn't seem 
all that magic and appears to contain "logical-decoding compatible" 
column-types:
 
\d pg_largeobject
          Table "pg_catalog.pg_largeobject"
 ┌────────┬─────────┬───────────┬──────────┬─────────┐
 │ Column │  Type   │ Collation │ Nullable │ Default │
 ├────────┼─────────┼───────────┼──────────┼─────────┤
 │ loid   │ oid     │           │ not null │         │
 │ pageno │ integer │           │ not null │         │
 │ data   │ bytea   │           │ not null │         │
 └────────┴─────────┴───────────┴──────────┴─────────┘

 If this was a regular table there would be nothing preventing it from being 
replicated successfully using logical decoding, correct?
 
 
> > I wish PG in some future version will address these quirks so one can 
operate on LOs more smoothly.

 You're welcome to help...
 
Every time issues arise regarding LOs there seems to be little interest in
improving matters, and if it doesn't itch...
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

2017-06-22 Thread Andreas Joseph Krogh
On Thursday 22 June 2017 at 15:25:20, Achilleas Mantzios <ach...@matrix.gatewaynet.com> wrote:
On 22/06/2017 13:38, Andreas Joseph Krogh wrote:
On Thursday 22 June 2017 at 11:43:02, Achilleas Mantzios <ach...@matrix.gatewaynet.com> wrote:
On 22/06/2017 11:21, Andreas Joseph Krogh wrote:
Hi.
 
1. Why should one prefer built-in logical replication in pg-10 to pglogical, 
does it do anything pglogical doesn't?
It seems pglogical is more feature-rich...
2. As I understand built-in logical replication in pg-10 doesn't support 
large-objects, which we use a lot. Does pglogical replicate large objects? I 
cannot find any notes about large-objects under "Limitations and Restrictions": 
https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/ 
<https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/>
 You may do a simple test, create a table with a largeobject and try to read 
the logical stream, if it cannot represent the lo_import, lo_open, lowrite, 
lo_close (and I 'd bet they can't be encoded) then neither pglogical (being 
based on the same logical decoding technology) will support them
    
The point of email-lists like this is that one may share knowledge so one
doesn't have to test everything oneself, and can build on knowledge from
others. I'm looking for an answer from someone who's not betting, but knows.
 I gave you enough knowledge already. Here's some more :
 - go and install 10
 - create a table containing one col with type oid (large object) and one bytea
 - follow the simple setup here : 
https://www.postgresql.org/docs/10/static/logicaldecoding-example.html 
<https://www.postgresql.org/docs/10/static/logicaldecoding-example.html>
 - insert a row
 - Do again : SELECT * FROM pg_logical_slot_get_changes('regression_slot', 
NULL, NULL);

 Do you see any of your oid image data in the output? Do you see any of the 
bytea ? (the answer here in 9.5 is  "no"/"yes").
 If in 10.0 is still the case, then you should think about moving to bytea.  
Hm, it turns out it's not quite that simple...
 
Test-case:
 
create table drus(id bigint primary key, lo oid, data bytea);
SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');
INSERT INTO drus (id, lo, data) values(1, 
lo_import('/tmp/faktura_27.pdf'), decode('AAAEEE', 'hex'));

select * from drus;
 ┌────┬─────────┬──────────┐
 │ id │   lo    │   data   │
 ├────┼─────────┼──────────┤
 │  1 │ 2873269 │ \xaaaeee │
 └────┴─────────┴──────────┘


SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
 ┌────────────┬──────┬─────────────────────────────────────────────────────────────────────────────────┐
 │    lsn     │ xid  │                                      data                                       │
 ├────────────┼──────┼─────────────────────────────────────────────────────────────────────────────────┤
 │ B/E585B858 │ 9391 │ BEGIN 9391                                                                      │
 │ B/E586BE78 │ 9391 │ table public.drus: INSERT: id[bigint]:1 lo[oid]:2873269 data[bytea]:'\xaaaeee' │
 │ B/E586BF80 │ 9391 │ COMMIT 9391                                                                     │
 └────────────┴──────┴─────────────────────────────────────────────────────────────────────────────────┘
 (3 rows)

  
So far so good, the oid-value (2873269) is apparently in the change-set, but...
 
Set up publication:
CREATE PUBLICATION bolle FOR ALL TABLES; 
 CREATE PUBLICATION

 === ON REPLICA ===
 
# create table on replica:
create table drus(id bigint primary key, lo oid, data bytea);
 
# create subscription:
CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5433 user=andreak 
dbname=fisk' PUBLICATION bolle;
NOTICE:  created replication slot "mysub" on publisher 
 CREATE SUBSCRIPTION
2017-06-22 16:38:34.740 CEST [18718] LOG:  logical replication apply worker 
for subscription "mysub" has started
 2017-06-22 16:38:34.747 CEST [18720] LOG:  logical replication table 
synchronization worker for subscription "mysub", table "drus" has started
 2017-06-22 16:38:35.746 CEST [18720] LOG:  logical replication table 
synchronization worker for subscription "mysub", table "drus" has finished

 Looks good:
select * from drus;
 ┌────┬─────────┬──────────┐
 │ id │   lo    │   data   │
 ├────┼─────────┼──────────┤
 │  1 │ 2873269 │ \xaaaeee │
 └────┴─────────┴──────────┘
 (1 row)

  
...until :
 
SELECT lo_export(drus.lo, '/tmp/faktura.pdf') from drus where id = 1; 
 2017-06-22 16:40:04.967 CEST [18657] ERROR:  large object 2873269 does not 
exist
 2017-06-22 16:40:04.967 CEST [18657] STATEMENT:  SELECT lo_export(drus.lo, 
'/tmp/faktura.pdf') from drus where id = 1;
 ERROR:  large object 2873269 does not exist

 So, the OID-value is replicated but pg_largeobject is empty:
 
select * from pg_largeobject;

Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

2017-06-22 Thread Andreas Joseph Krogh
On Thursday 22 June 2017 at 11:43:02, Achilleas Mantzios <ach...@matrix.gatewaynet.com> wrote:
On 22/06/2017 11:21, Andreas Joseph Krogh wrote:
Hi.
 
1. Why should one prefer built-in logical replication in pg-10 to pglogical, 
does it do anything pglogical doesn't?
It seems pglogical is more feature-rich...
2. As I understand built-in logical replication in pg-10 doesn't support 
large-objects, which we use a lot. Does pglogical replicate large objects? I 
cannot find any notes about large-objects under "Limitations and Restrictions": 
https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/ 
<https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/>
 You may do a simple test, create a table with a largeobject and try to read 
the logical stream, if it cannot represent the lo_import, lo_open, lowrite, 
lo_close (and I 'd bet they can't be encoded) then neither pglogical (being 
based on the same logical decoding technology) will support them
    
The point of email-lists like this is that one may share knowledge so one
doesn't have to test everything oneself, and can build on knowledge from
others. I'm looking for an answer from someone who's not betting, but knows.
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] pglogical vs. built-in logical replication in pg-10

2017-06-22 Thread Andreas Joseph Krogh
Hi.
 
1. Why should one prefer built-in logical replication in pg-10 to pglogical, 
does it do anything pglogical doesn't?
It seems pglogical is more feature-rich...
2. As I understand built-in logical replication in pg-10 doesn't support 
large-objects, which we use a lot. Does pglogical replicate large objects? I 
cannot find any notes about large-objects under "Limitations and 
Restrictions": 
https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




[GENERAL] OR-clause support for indexes

2017-06-17 Thread Andreas Joseph Krogh
Hi.
 
Back in 9.6 dev-cycle Teodor (from PostgresPro) posted a patch providing 
indexed OR-clauses: https://commitfest.postgresql.org/8/454/
Sadly it didn't make it to 9.6, and wasn't re-submitted for the 10 dev-cycle.
 
@Teodor; Do you have plans to submit a patch for PG-11?
And, have you made any progress making it work for btree?
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




[GENERAL] How to include BIGINT-column in RUM-index sorted by timestamp

2017-05-15 Thread Andreas Joseph Krogh
Hi PostgresPro (RUM-creators).
 
With the latest version of RUM: https://github.com/postgrespro/rum
, one is able to store BIGINT as part of the index, it seems. I'm however not 
able to make a query use the index without the Filter-step.
 
This table is roughly like this:
CREATE TABLE email_delivery(
    received_timestamp timestamp not null,
    folder_id          bigint NOT NULL references origo_email_folder(entity_id),
    fts_all            tsvector
);
 
I have created the index like this:
 
CREATE INDEX rum_idx ON origo_email_delivery USING rum (fts_all rum_tsvector_addon_ops, folder_id, received_timestamp)
    WITH (attach = 'received_timestamp', to = 'fts_all');


The query is:
 
EXPLAIN (COSTS OFF)
SELECT del.entity_id, del.received_timestamp,
       del.received_timestamp <=> '3000-01-01'::TIMESTAMP, del.folder_id
FROM email_delivery del
WHERE del.fts_all @@ to_tsquery('simple', 'andre:*:*')
  AND del.folder_id IN (44965, 2470520)
ORDER BY del.received_timestamp <=> '3000-01-01'::TIMESTAMP
LIMIT 10;
 
which gives the following plan:
 

 ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
 │                                           QUERY PLAN                                            │
 ├────────────────────────────────────────────────────────────────────────────────────────────────┤
 │ Limit                                                                                           │
 │   ->  Index Scan using rum_idx on email_delivery del                                            │
 │         Index Cond: (fts_all @@ '''andre'':* & ''jose'':*'::tsquery)                            │
 │         Order By: (received_timestamp <=> '3000-01-01 00:00:00'::timestamp without time zone)   │
 │         Filter: (folder_id = ANY ('{44965,2470520}'::bigint[]))                                 │
 └────────────────────────────────────────────────────────────────────────────────────────────────┘

  
This indicates that the folder_id column, which is part of the index, isn't 
actually "part of the index", meaning it's not used in the index for some 
reason.
 
Can you tell me how to craft an index so that folder_id = ANY 
('{44965,2470520}'::bigint[]) is part of the Index Cond?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] Upgrading postgresql minor version

2017-05-10 Thread Andreas Joseph Krogh
On Wednesday 10 May 2017 at 17:13:50, Ron Ben <ronb...@walla.co.il> wrote:
Not possible 
https://www.postgresql.org/download/linux/debian/
 
To upgrade I do: apt-get install postgresql-9.3
There is no way to "roll back" from here.
I cannot choose which version to install; it installs the latest version
packaged for Debian. Currently it's 9.3.16.
 
The documentation says that upgrading a minor version is just "replacing the
executables", so there has to be a way to save them on the side for rollback.
If not, the documentation is insufficient. We all know that sometimes even the
smallest change can cause trouble.


 
Well - if you choose apt as your package-manager then this is really not an
issue for PostgreSQL. What you're really asking is how to make apt install (or
downgrade) a specific minor-version even if an updated version exists in the
package-repo. Maybe you'll have more luck asking that in other places than a
PG-mailing-list.

 --
 Andreas Joseph Krogh
 




Re: [GENERAL] Upgrading postgresql minor version

2017-05-10 Thread Andreas Joseph Krogh
On Wednesday 10 May 2017 at 16:55:50, Ron Ben <ronb...@walla.co.il> wrote:
  I think you misunderstood me.
pg_dump dumps the data: the tables, functions and the data saved in them.
I have daily backups for this so I'm not worried.
 
What I'm worried about are the "executable files". These files are what is
actually being updated when you update the version.
I want to be able to roll them back in case of upgrade failure without rolling
back the data.
 
Basically what I want is to save on the side all the files that will be changed
by the update, and in case of failure just to replace them back.
Is it doable? How?



 
What about just re-installing the previous version the same way you installed 
the "patch-release"-version?

 --
 Andreas Joseph Krogh
 




Re: [GENERAL] RUM-index and support for storing BIGINT as part of a tsvector+timestamp

2017-02-03 Thread Andreas Joseph Krogh
Ping...
 
Any feedback is welcome, thanks.
 
On Thursday 12 January 2017 at 13:13:34, Andreas Joseph Krogh <andr...@visena.com> wrote:
Hi PostgresPro-guys.
 
I've asked this before but didn't get any response, so I'll try again. I know
you have a lot on your plate...
 
On RUM's TODO-list is this:
 

 * Allow multiple additional information (lexemes positions + timestamp).
 * Add support for arrays.

Will any of these items support storing BIGINT as part of a tsvector+timestamp
in the same index, so one can have an index like this:
 
create index rum_idx on message using rum (fts_all some_opclass, folder_id, 
received_timestamp) WITH (attach=received_timestamp, "to"=fts_all, 
order_by_attach=t);
 
Making it possible for a query like this to use one index effectively:

WHERE fts_all @@ to_tsquery('simple', 'andre:*:*') AND folder_id = ANY(
ARRAY[2,3]::BIGINT[]) ORDER BY received_timestamp DESC LIMIT 10;
 
 
The reason I ask is that storing only tsvector+timestamp often isn't enough,
because often there is some other dimension which is *very* relevant to limit
the results (like what folder, or account, to search for email-messages in,
instead of searching for all in the system).
 
Thanks.

 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>

 
 
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Searching array for multiple items

2017-01-25 Thread Andreas Joseph Krogh
On Wednesday 25 January 2017 at 09:47:56, Thomas Kellerer <spam_ea...@gmx.net> wrote:
Alex Magnum schrieb am 25.01.2017 um 09:29:
 > I can search an array with 1 = ANY('{1,3,4,7}'::int[])
 >
 > I need to check for one or multiple items in the array.
 >
 > e.g.'1,7,3'  = ANY('{1,3,4,7}'::int[]
 >
 > I do need to check if
 > a) all items exist in the array

 You can use the contains (or is contained) operator for that:

    array[1,7,3] <@ array[1,3,4,7] is true

    array[1,7,10] <@ array[1,3,4,7] is false

 > b) at least one item exists in the array

 You can use the "overlaps" operator:

    array[1,7,3] && array[1,3,4,7] returns true

    array[10,11] && array[1,3,4,7] returns false

 > Does the order of left and right side matter?

 For the contains or (is contained) operator the order matters, for the 
overlaps operator it does not.

 For more details see 
https://www.postgresql.org/docs/current/static/functions-array.html

 Thomas
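 
For reference, a minimal sketch of an index these operators can use, assuming
a hypothetical table my_table with an integer-array column items (the default
GIN opclass for arrays supports <@, @>, && and =):

CREATE INDEX my_table_items_gin ON my_table USING gin (items);

-- e.g.: SELECT * FROM my_table WHERE items && ARRAY[1,7,3];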
 
Can you elaborate on index-usage? I.e. will the suggested queries above utilize
indexes (gist?). If so, can you give an example with the definition of the index
and an explain-plan?
 
It would be interesting to see how this performs vs. contrib/intarray.
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] RUM-index and support for storing BIGINT as part of a tsvector+timestamp

2017-01-12 Thread Andreas Joseph Krogh
Hi PostgresPro-guys.
 
I've asked this before but didn't get any response, so I'll try again. I know 
you have a lot on you plate...
 
On RUM's TODO-list is this:
 

 * Allow multiple additional information (lexemes positions + timestamp). 
 * Add support for arrays. Will any of these items support storing BIGINT as 
part of a tsvector+timestamp in the same index?
so one an have an index like this:
 
create index rum_idx on message using rum (fts_all some_opclass, folder_id, 
received_timestamp) WITH (attach=received_timestamp, "to"=fts_all, 
order_by_attach=t);
 
Making it possible for a query like this to use one index effectively:

WHERE fts_all @@ to_tsquery('simple', 'andre:*:*') AND folder_id = ANY(
ARRAY[2,3]::BIGINT[]) ORDER BY received_timestamp DESC LIMIT 10;
 
 
The reason I ask is that storing only tsvector+timestamp often is't enough 
because often there is some other dimention which is *very* relevant to limit 
the results (like what folder, or account, to search for email-messages in, 
instead of searching for all in the system).
 
Thanks.

 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>



Re: [GENERAL] Question slow query

2017-01-11 Thread Andreas Joseph Krogh
On Thursday 12 January 2017 at 03:15:59, Patrick B <patrickbake...@gmail.com> wrote:
Hi guys,  
I've got a slow query, running at 25 seconds.
 
 ->  Bitmap Heap Scan on ja_notes r_1103088 (cost=234300.55..1254978.62 rows=553275 width=101) (actual time=1423.411..10572.549 rows=475646 loops=1)
       Recheck Cond: (n_type = ANY ('{note,note_1,note_2}'::n_type[]))
       Filter: (timezone('Etc/UTC'::text, d_modified) >= '2016-08-07 23:12:34'::timestamp without time zone)
       Rows Removed by Filter: 13725231
 
 
As you can see, 13,725,231 rows were removed by the Filter.
 
Should I create an index for this column?
d_modified | timestamp with time zone | default statement_timestamp()
 
I tried but it didn't help... the query doesn't even hit the index:

 

CREATE INDEX q ON test USING BTREE (d_modified);
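 
A possible reason the index isn't used is that the filter wraps d_modified in
timezone(), so a plain btree on d_modified can't match that expression; a
minimal sketch of an expression index that could (the index name is made up):

CREATE INDEX ja_notes_d_modified_utc_idx
    ON ja_notes (timezone('Etc/UTC', d_modified));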

 

Am I missing something? 



 
Yes, you've not shown us:
1. The query
2. The schema
3. Complete EXPLAIN-output
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] join_collapse_limit = 14

2017-01-07 Thread Andreas Joseph Krogh
On Saturday 07 January 2017 at 18:25:42, Tom Lane <t...@sss.pgh.pa.us> wrote:
Andreas Joseph Krogh <andr...@visena.com> writes:
 > On Saturday 07 January 2017 at 17:48:49, Tom Lane <t...@sss.pgh.pa.us>
 > wrote:
 >  If you've got just one problem query, it might be worth your time to take
 >  note of the optimal join order (as seen in EXPLAIN when a good plan is
 >  chosen) and rearrange the query so that the syntactic join order matches
 >  that.  Then you should get the same plan even when join_collapse_limit is
 >  small.

 > Will geqo kick in if join_collapse_limit = 4 and there are many more joins or
 > will the planner just stop trying to rearrange JOINs after this limit?

 geqo_threshold applies to the size of the join subproblems that exist
 after flattening is done, or not done, according to join_collapse_limit
 and from_collapse_limit: if a subproblem is >= geqo_threshold relations
 then it's planned via geqo, else the normal search.  So if geqo_threshold
 is more than join_collapse_limit then it's impossible for a JOIN nest to
 result in use of the geqo code.

 > In any case; I assume you're suggesting doing something like "SET LOCAL
 > join_collapse_limit=" in the same transaction so this doesn't 
have
 > to be a system-wide setting?

 Right, there's no reason not to set it locally in your session for
 experimental purposes.  I'd just be nervous about turning it up globally
 for production without having checked all your queries ...

 regards, tom lane
 
OK, thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] join_collapse_limit = 14

2017-01-07 Thread Andreas Joseph Krogh
On Saturday 07 January 2017 at 17:48:49, Tom Lane <t...@sss.pgh.pa.us> wrote:
Andreas Joseph Krogh <andr...@visena.com> writes:
 > I wonder; In general, is there any downside of having join_collapse_limit = 
14
 > on modern hardware (32 cores, 64GB RAM), and geqo_threshold=16 ?
 > I'm aware of it increasing planning-time, but is this really an issue in
 > practice?

 It can be.  The number of possible join orders is exponential in the join
 size (at least 2^n, maybe worse, don't recall at the moment).  There are
 heuristics in the planner that usually let it avoid investigating large
 parts of that space; but the only thing that will certainly limit the
 problem is preventing large join subproblems from being considered in the
 first place --- which is what join_collapse_limit/from_collapse_limit are
 for.

 Depending on what your queries are like, you might not ever hit any of the
 bad cases, so it might be fine.  But I'd be nervous about throwing a wide
 mix of complex queries at a planner set with high thresholds.

 If you've got just one problem query, it might be worth your time to take
 note of the optimal join order (as seen in EXPLAIN when a good plan is
 chosen) and rearrange the query so that the syntactic join order matches
 that.  Then you should get the same plan even when join_collapse_limit is
 small.
 
Thanks.
Will geqo kick in if join_collapse_limit = 4 and there are many more joins, or
will the planner just stop trying to rearrange JOINs after this limit?
In other words; Is join_collapse_limit=1 the only way or might it work for 
other "small" values as well?
 
In any case; I assume you're suggesting doing something like "SET LOCAL 
join_collapse_limit=" in the same transaction so this doesn't have 
to be a system-wide setting?
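 
A minimal sketch of that per-transaction approach (the value 14 is just the
one discussed in this thread):

BEGIN;
SET LOCAL join_collapse_limit = 14;
-- run the problematic query here
COMMIT;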
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] join_collapse_limit = 14

2017-01-07 Thread Andreas Joseph Krogh
Hi all.
 
I have a query which takes forever and the only way I've been able to make it 
perform reasonably well is increasing join_collapse_limit to 14 (12 still 
produced lots of nest-loops).
This way lots of nest-loops (which I think caused the slowness) was made into 
hash-joins and performance was acceptable again.
 
I wonder; In general, is there any downside of having join_collapse_limit = 14 
on modern hardware (32 cores, 64GB RAM), and geqo_threshold=16 ?
I'm aware of it increasing planning-time, but is this really an issue in 
practice?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] Er Data Modeller for PostgreSQL

2016-12-23 Thread Andreas Joseph Krogh
On Friday 23 December 2016 at 06:50:54, Günce Kaya <guncekay...@gmail.com> wrote:
Hi All,  
I'm looking for an ER data modelling tool for PostgreSQL. I use Navicat Premium
for PostgreSQL and the tool has a modeller, but I would like to display a
database model belonging to the tables of a schema under a database.
 
If I use Navicat's modeller, I have to drag and drop each table which I
want to add to the data model, so it's a pretty manual process.
 
Is there any way to display an ER model that shows only a chosen schema under
the database?
 
Any advice would be appreciated. 
 
Regards,
 
-- 
Gunce Kaya



 
We use IntelliJ IDEA from JetBrains: https://www.jetbrains.com/idea/
It features a nice integrated DB-tool (the commercial version only, I think).
 

Database tools

Take advantage of intelligent coding assistance when editing SQL; connect to
live databases; run queries; browse and export data; and even manage your
schemes in a visual interface–right from the IDE.

learn more <https://www.jetbrains.com/datagrip/features/>
 
 
...which I think is an embedded version of their DataGrip
application: https://www.jetbrains.com/datagrip/
 

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Replacing an existing unique constraint (not UNIQUE INDEX) with primary key

2016-12-20 Thread Andreas Joseph Krogh
On Tuesday 20 December 2016 at 11:42:56, Achilleas Mantzios <ach...@matrix.gatewaynet.com> wrote:
On 20/12/2016 12:27, Andreas Joseph Krogh wrote:
On Tuesday 20 December 2016 at 11:02:27, Achilleas Mantzios <ach...@matrix.gatewaynet.com> wrote:
On 20/12/2016 11:43, Andreas Joseph Krogh wrote:
[snip]
 BEGIN;
 ALTER TABLE person ADD CONSTRAINT person_pk PRIMARY KEY (entity_id);
 alter table person drop constraint person_entity_id_key CASCADE;
 alter table phone add CONSTRAINT phone_fk FOREIGN KEY (person_entity_id) 
REFERENCES person(entity_id);
 alter table address add CONSTRAINT address_fk FOREIGN KEY (person_id) 
REFERENCES person(entity_id);
 COMMIT;  
Yea, I was hoping to avoid having to manually add the FK's to the referencing 
tables (34).
Is there really no way to accomplish this without DROP CONSTRAINT ... 
 CASCADE, hacking the system-catalogs or something?
 
 You may write a script to output those 34 FK constraints. Definitely safer 
than hacking pg_constraint.conindid   
Yes.
 
I'd still argue that what I'm trying to do should "just work" as PG treats 
UNIQUE CONSTRAINT and UNIQUE INDEX the same wrt. the planner and FK-enforcement.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Replacing an existing unique constraint (not UNIQUE INDEX) with primary key

2016-12-20 Thread Andreas Joseph Krogh
On Tuesday 20 December 2016 at 11:02:27, Achilleas Mantzios <ach...@matrix.gatewaynet.com> wrote:
On 20/12/2016 11:43, Andreas Joseph Krogh wrote:
[snip]
 BEGIN;
 ALTER TABLE person ADD CONSTRAINT person_pk PRIMARY KEY (entity_id);
 alter table person drop constraint person_entity_id_key CASCADE;
 alter table phone add CONSTRAINT phone_fk FOREIGN KEY (person_entity_id) 
REFERENCES person(entity_id);
 alter table address add CONSTRAINT address_fk FOREIGN KEY (person_id) 
REFERENCES person(entity_id);
 COMMIT;  
Yea, I was hoping to avoid having to manually add the FK's to the referencing 
tables (34).
Is there really no way to accomplish this without DROP CONSTRAINT ... 
 CASCADE, hacking the system-catalogs or something?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] Replacing an existing unique constraint (not UNIQUE INDEX) with primary key

2016-12-20 Thread Andreas Joseph Krogh
Hi all.
 
For historical reasons I have a table which at first had an "id"-column (the 
PK) and later got an "entity_id"-column (which is a UNIQUE CONSTRAINT).
 
I'm now trying to get rid of the "id"-column and make the "entity_id"-column 
the new PK. The tricky part is that both of these columns are referenced as 
FK's from many tables, so disabling/removing FKs is not so easy. I'm facing a 
problem when issuing:
ALTER TABLE person ADD PRIMARY KEY USING INDEX person_entity_id_key; 
ERROR:  index "person_entity_id_key" is already associated with a constraint

 
A full example of what I'm trying to do (replacing the PK of the 
"person"-table) is here:

DROP TABLE IF EXISTS phone;
DROP TABLE IF EXISTS address;
DROP TABLE IF EXISTS person;

CREATE TABLE person  (id BIGINT PRIMARY KEY, entity_id BIGINT NOT NULL UNIQUE, name VARCHAR NOT NULL);
CREATE TABLE address (id BIGINT PRIMARY KEY, person_id BIGINT NOT NULL REFERENCES person (id));
CREATE TABLE phone   (id BIGINT PRIMARY KEY, person_entity_id BIGINT NOT NULL REFERENCES person (entity_id), number VARCHAR NOT NULL);

INSERT INTO person (id, entity_id, name) VALUES (1, 101, 'Andreas'), (2, 102, 'Santa');
INSERT INTO address (id, person_id) VALUES (1, 1), (2, 2);
INSERT INTO phone (id, person_entity_id, number) VALUES (1, 101, '1800555123'), (2, 102, '1800555456');

-- Drop the deprecated foreign key on address
ALTER TABLE address DROP CONSTRAINT address_person_id_fkey;

-- Update address and make person_id point to person.entity_id instead of person.id
UPDATE address a SET person_id = p.entity_id FROM person p WHERE p.id = a.person_id;
ALTER TABLE address ADD FOREIGN KEY (person_id) REFERENCES person (entity_id);

-- Drop the deprecated id-column
ALTER TABLE person DROP COLUMN id;

-- Try to make the new PK using the UNIQUE CONSTRAINT person_entity_id_key
ALTER TABLE person ADD PRIMARY KEY USING INDEX person_entity_id_key;

ERROR:  index "person_entity_id_key" is already associated with a constraint
  
  
I see that if I had declared person.entity_id without the UNIQUE-keyword and 
instead created a UNIQUE INDEX:
create UNIQUE INDEX person_entity_id_key on person(entity_id);  
Then the ADD PRIMARY KEY USING INDEX command would have succeeded.
 
I have lots of queries which have GROUP BY person.id which now should use 
GROUP BY person.entity_id, and not having to also list all other columns 
selected from the person-table.
 
How do I proceed with this?

 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] Index size

2016-12-04 Thread Andreas Joseph Krogh
On Sunday 04 December 2016 at 16:51:54, Melvin Davidson <melvin6...@gmail.com> wrote:

 On Sun, Dec 4, 2016 at 4:43 AM, Samuel Williams <
space.ship.travel...@gmail.com <mailto:space.ship.travel...@gmail.com>> wrote: 
Melvin, of course there are differences. However, I suspect there are at least 
SOME tangible differences which can be identified.  

  On 4 December 2016 at 15:53, Melvin Davidson <melvin6...@gmail.com 
<mailto:melvin6...@gmail.com>> wrote:     On Sat, Dec 3, 2016 at 9:32 PM, Steve 
Atkins<st...@blighty.com <mailto:st...@blighty.com>> wrote: 
> On Dec 3, 2016, at 3:57 PM, Samuel Williams <space.ship.travel...@gmail.com 
<mailto:space.ship.travel...@gmail.com>> wrote:
 >
 > Thanks everyone for your feedback so far. I've done a bit more digging:
 >
 > MySQL in MBytes (about 350 million rows):
 >
 > index_user_event_on_what_category_id_created_at_latlng | 22806.00
 > index_user_event_for_reporting | 18211.00
 > index_user_event_on_created_at | 9519.00
 > index_user_event_on_user_id | 6884.00
 > index_user_event_on_poi_id | 4891.00
 > index_user_event_on_deal_id | 3979.00
 >
 > Postgres (about 250 million rows):
 >
 > index_user_event_on_what_category_id_created_at_latlng | 25 GB
 > index_user_event_for_reporting | 19 GB
 > index_user_event_on_created_at | 7445 MB
 > index_user_event_on_user_id | 7274 MB
 > index_user_event_on_deal_id | 7132 MB
 > index_user_event_on_poi_id | 7099 MB
 >
 > So, the index is a bit bigger, plus there is also the PKEY index which
 > increases disk usage by another whole index. Keep in mind in the
 > above, MySQL has about 40% more data.
 >
 > With some indexes, it looks like MySQL might not be adding all data to
 > the index (e.g. ignoring NULL values). Does MySQL ignore null values
 > in an index? Can we get the same behaviour in Postgres to minimise
 > usage? What would be the recommendation here?

 It's unlikely anyone will be able to usefully answer the questions you
 should be asking without seeing the schema and index definitions,
 and maybe some clues about how you're querying the data.

 Cheers,
   Steve 
--
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org 
<mailto:pgsql-general@postgresql.org>)
 To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general 
<http://www.postgresql.org/mailpref/pgsql-general>




You are comparing apples to oranges. MySQL and PostgreSQL engines are 
different by design, so likewise
 the size of the indexes will be different.
 You may as well ask why a 2015 Porsche 911 Carrera 6 cyl, 3.4 L, Auto(AM-S7) 
gets 22 MPG,
 but the 2016 Jaguar XF MPG gets 24 mpg.

-- Melvin Davidson
I reserve the right to fantasize.  Whether or not you 
wish to share my fantasy is entirely up to you. 






   Samuel, 
Please note that the accepted convention in this forum is to bottom post. 
Please do not top post.



 
The accepted convention in this forum is to not have purple signature:-)
 
-- Andreas Joseph Krogh


 


Re: [GENERAL] Full text search randomly not working for short prefixes?

2016-12-02 Thread Andreas Joseph Krogh
On Friday 02 December 2016 at 16:33:12, Tom Lane <t...@sss.pgh.pa.us> wrote:
cen <imba...@gmail.com> writes:
 > Something funny going on with my full text search.. and I have no idea what.

 The way to debug this sort of thing is generally to look at what tsquery
 you're actually getting.  I get

 regression=# select to_tsquery(unaccent('a:*'));
 NOTICE:  text-search query contains only stop words or doesn't contain 
lexemes, ignored
  to_tsquery
 
  
 (1 row)

 regression=# select to_tsquery(unaccent('an:*'));
 NOTICE:  text-search query contains only stop words or doesn't contain 
lexemes, ignored
  to_tsquery
 
  
 (1 row)

 regression=# select to_tsquery(unaccent('ana:*'));
  to_tsquery
 
  'ana':*
 (1 row)

 Of course, only the last is going to match 'ana'.

 So you need to use a text search configuration in which a/an are
 not stop words.  Or possibly you could cast the unaccent result
 directly to tsquery rather than passing it through to_tsquery(),
 though likely that would just have a different set of failure modes
 with queries where you do wish stemming would occur.

 The problem with "no" seems to be the same.
 
One can always specify 'simple' as the config, eliminating any "stop-word
smartness":
 
andreak=> select to_tsquery('simple', 'a:*');
 to_tsquery
------------
 'a':*
(1 row)

  
-- Andreas Joseph Krogh




Re: [GENERAL] Incorrect information about GIN-index in RUM's docs

2016-11-15 Thread Andreas Joseph Krogh
On Tuesday 15 November 2016 at 08:48:37, Oleg Bartunov <obartu...@gmail.com> wrote:
    On Tue, Nov 15, 2016 at 10:37 AM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: This item isn't valid, is it?
 

 * There isn't phrase search with GIN index. This problem relates with the
previous problem. It needs position information to perform phrase search.

RUM being an extension, and index access methods as extensions being new in
9.6, where phrase-search was introduced, this isn't really true, or am I
missing something?
 
 
RUM is very good for phrase search, since it has access to coordinates in 
index, check slides from PGconf.eu (
http://www.sai.msu.su/~megera/postgres/talks/pgconfeu-fts-2016.pdf 
<http://www.sai.msu.su/~megera/postgres/talks/pgconfeu-fts-2016.pdf>).  Where 
did you find the citation ?



 
https://github.com/postgrespro/rum#user-content-introduction
 
I assume I read the item wrong, though. As it's phrased, it (for me at least, but
I'm Norwegian) seems like phrase-search is not possible with GIN, which it
definitely is. So I assume the real issue that this item is trying to point
out is that phrase-search isn't really optimized with GIN, just syntactic
sugar, while with RUM it is optimized, not just syntactic sugar. Am I correct
in assuming this?
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] Incorrect information about GIN-index in RUM's docs

2016-11-14 Thread Andreas Joseph Krogh
This item isn't valid, is it?
 

 * There isn't phrase search with GIN index. This problem relates with the
previous problem. It needs position information to perform phrase search.

RUM being an extension, and index access methods as extensions being new in
9.6, where phrase-search was introduced, this isn't really true, or am I
missing something?
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




[GENERAL] Question about RUM's TODO-list

2016-11-14 Thread Andreas Joseph Krogh
Hi.
 
The following TODO-items are on RUM's TODO-list:
 

 * Allow multiple additional information (lexemes positions + timestamp). 
 * Add support for arrays. 
 
Will any of these items support storing BIGINT as part of a tsvector+timestamp,
so one can have an index like this:
create index rum_idx on message using rum (fts_all some_opclass, folder_id, received_timestamp)
    WITH (attach=received_timestamp, "to"=fts_all, order_by_attach=t);
 
Making it possible for a query like this to use one index:
WHERE fts_all @@ to_tsquery('simple', 'andre:*:*') AND folder_id = ANY(
ARRAY[2,3]::BIGINT[]) ORDER BY received_timestamp DESC LIMIT 10;
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] which work memory parameter is used for what?

2016-11-09 Thread Andreas Joseph Krogh
On Wednesday 09 November 2016 at 15:54:13, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
On 11/08/2016 06:45 PM, Andreas Joseph Krogh wrote:
 > On Wednesday 09 November 2016 at 03:02:54, John R Pierce
 > <pie...@hogranch.com> wrote:
 >
 >     On 11/8/2016 2:34 PM, Hector Yuen wrote:
 >     > I am confused on which are the parameters for different queries. I am
 >     > trying to run VACUUM on a big table, and it is easier for me to set
 >     > the work memory for the specific session instead of tuning it in
 >     > postgresql.conf.
 >     >
 >     > I noticed that if I do:
 >     >
 >     > set work_mem='1GB';
 >     >
 >     > it doesn't help VACUUM, I have to do:
 >     >
 >     > set maintenance_work_mem='1GB';
 >     >
 >     > to accelerate the operation. I could notice that by running VACUUM
 >     > VERBOSE and see that the table was scanned less times an the operation
 >     > finished a lot faster.
 >     >
 >     > My question is, for which operations does work_mem matter and for
 >     > which ones does maintenance_work_mem do? I am specially interested in
 >     > operations like ANALYZE and VACUUM, I believe ANALYZE depends on
 >     > work_mem and VACUUM on maintenance_work_mem.
 >     >
 >     > Can you confirm my understanding?
 >     >
 >
 >    
 https://www.postgresql.org/docs/current/static/runtime-config-resource.html
 >
 >
 >     maintenance_work_mem is used by vacuum and create index operations
 >     (including implicit index creation such as add foreign key).
 >
 > 
 > There is no such thing in PG.

 Can you be more specific as;

 
https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM
 
"implicit index creation such as add foreign key"
No implicit index will be created.
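 
A quick way to verify this (sketch, using throwaway tables):
 
CREATE TABLE parent (id bigint PRIMARY KEY);
CREATE TABLE child  (id bigint PRIMARY KEY, parent_id bigint REFERENCES parent);
-- the primary keys create indexes implicitly, the foreign key does not:
SELECT indexname FROM pg_indexes WHERE tablename IN ('parent', 'child');
-- lists only parent_pkey and child_pkey; nothing for parent_id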
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] which work memory parameter is used for what?

2016-11-08 Thread Andreas Joseph Krogh
På onsdag 09. november 2016 kl. 03:02:54, skrev John R Pierce <
pie...@hogranch.com <mailto:pie...@hogranch.com>>:
On 11/8/2016 2:34 PM, Hector Yuen wrote:
 > I am confused on which are the parameters for different queries. I am
 > trying to run VACUUM on a big table, and it is easier for me to set
 > the work memory for the specific session instead of tuning it in
 > postgresql.conf.
 >
 > I noticed that if I do:
 >
 > set work_mem='1GB';
 >
 > it doesn't help VACUUM, I have to do:
 >
 > set maintenance_work_mem='1GB';
 >
 > to accelerate the operation. I could notice that by running VACUUM
 > VERBOSE and see that the table was scanned less times an the operation
 > finished a lot faster.
 >
 > My question is, for which operations does work_mem matter and for
 > which ones does maintenance_work_mem do? I am specially interested in
 > operations like ANALYZE and VACUUM, I believe ANALYZE depends on
 > work_mem and VACUUM on maintenance_work_mem.
 >
 > Can you confirm my understanding?
 >

 https://www.postgresql.org/docs/current/static/runtime-config-resource.html


 maintenance_work_mem is used by vacuum and create index operations
 (including implicit index creation such as add foreign key).
 
There is no such thing in PG.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Exclude pg_largeobject form pg_dump

2016-11-07 Thread Andreas Joseph Krogh
På mandag 07. november 2016 kl. 22:01:41, skrev Guillaume Lelarge <
guilla...@lelarge.info <mailto:guilla...@lelarge.info>>:
2016-11-07 7:06 GMT+01:00 amul sul <sula...@gmail.com 
<mailto:sula...@gmail.com>>: On Mon, Nov 7, 2016 at 2:03 AM, Guillaume Lelarge
 <guilla...@lelarge.info <mailto:guilla...@lelarge.info>> wrote:
 >>
 >> Agreed. I was afraid of that, but for some reason, didn't find that. I'll
 >> fix this.
 >
 >
 > Fixed in v4.
 >

 This fix is broken.

  70  -   if (dopt.include_everything && !dopt.schemaOnly)
  71 +   if (dopt.include_everything && !dopt.schemaOnly && !dopt.outputBlobs)
  72         dopt.outputBlobs = true;

 dopt.outputBlobs set to FALSE when option -B is specified and this IF
 condition reverts to TRUE which force to dump blobs.
    
I don't see what you mean. It forces dump of Blobs if we didn't use -B and if 
we include everything in the dump, which seems good to me. What did you try 
that didn't work as expected?



 
I guess what he means is that if -B is given, the following code 
sets dopt.outputBlobs = false
 
+            case 'B':            /* Don't dump blobs */
 +                dopt.outputBlobs = false;
 +                break;
  
 
Then this IF sets it back to TRUE:
 
+    if (dopt.include_everything && !dopt.schemaOnly && !dopt.outputBlobs)
          dopt.outputBlobs = true;
 
 
...making it impossible to turn off dumping of blobs.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>




[GENERAL] Parallel pg_dump/restore and LOs

2016-11-06 Thread Andreas Joseph Krogh
Hi all.
 
When dumping large-objects it doesn't appear that it's done in parallel, 
meaning only one thread for dumping all LOs. Is this the case?
When pg_dump comes to dumping large-objects it outputs (when -v is specified):
pg_dump: saving large objects
 Then it seems only 1 CPU is working.
 
Are there some plans to make dumping large tables (typically LOs) benefit from 
parallelism?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] Exclude pg_largeobject form pg_dump

2016-10-23 Thread Andreas Joseph Krogh
På søndag 23. oktober 2016 kl. 20:44:34, skrev Guillaume Lelarge <
guilla...@lelarge.info <mailto:guilla...@lelarge.info>>:
[snip]
Remember that, if it gets commited, it'll be for next release (aka 10), and 
not 9.6 and earlier.



 
The patch working for 9.6 is fine for me, but getting this in master is of 
course the best.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Exclude pg_largeobject form pg_dump

2016-10-23 Thread Andreas Joseph Krogh
På søndag 23. oktober 2016 kl. 19:15:17, skrev Andreas Joseph Krogh <
andr...@visena.com <mailto:andr...@visena.com>>:
På søndag 23. oktober 2016 kl. 17:06:57, skrev Guillaume Lelarge <
guilla...@lelarge.info <mailto:guilla...@lelarge.info>>:
2016-03-08 21:06 GMT+01:00 Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>>: På tirsdag 08. mars 2016 kl. 21:03:01, skrev 
David G. Johnston <david.g.johns...@gmail.com 
<mailto:david.g.johns...@gmail.com>>:
On Tue, Mar 8, 2016 at 9:45 AM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote:
På tirsdag 08. mars 2016 kl. 17:38:04, skrev Joshua D. Drake <
j...@commandprompt.com <mailto:j...@commandprompt.com>>:
On 03/08/2016 08:02 AM, Andreas Joseph Krogh wrote:
 > På tirsdag 08. mars 2016 kl. 16:57:01, skrev Tom Lane <t...@sss.pgh.pa.us 
<mailto:t...@sss.pgh.pa.us>
 > <mailto:t...@sss.pgh.pa.us <mailto:t...@sss.pgh.pa.us>>>:
 >
 >     Andreas Joseph Krogh <andr...@visena.com <mailto:andr...@visena.com>> 
writes:
 >      > What I'm looking for is "inverse -b" in an otherwise complete
 >     dump. Any plans
 >      > to add that?
 >
 >     [ shrug... ]  Nobody ever asked for it before.
 >
 >     regards, tom lane
 >
 > It surely helps testing production-datasets which contain lots of BLOBs
 > where one wants to dump the production-data into a test-env. We have
 >  >1TB databases containing > 95% blobs so it would help us tremendously
 > to have this option.

 I have quite a few customers that would benefit from the ability to not
 have blobs present in dumps.
 
Great! So how do we proceed to get "--no-blobs" added to pg_dump?
Maybe CommandPrompt and Visena should co-fund development of such an addition, 
if it's accepted by -hackers?
We'd be willing to pay for such an addition for the 9.5 branch, as a patch.
 
​Unfortunately this doesn't qualify as a bug fix - it is a new feature and 
thus is ineligible for inclusion in official 9.5

 

David J.




 


Of course. That's why I mentioned that, if possible, an unofficial patch to 
9.5 could be developed, funded partly by Visena (my company). Given that 
someone is willing to do this of course.
 


 
That probably should look like the patch attached. It applies cleanly on HEAD, 
and works AFAICT. If this patch seems interesting enough, I'll add it to the 
next commit fest (note for myself, update the ref/pg_dump.sgml documentation 
file).

 For Andreas' information, it also applies on 9.5, though I didn't check if it 
worked afterwards.



 
+1 for adding it to the commitfest.
 
It's almost scary how simple this patch is and that no one ever got around to 
implementing it.
 
Thanks, I'll test it on 9.5 soon.
 
It's totally OK for me to use 9.6 (now that it's released) to dump 9.5 DBs, so 
I'm all good with this patch, thanks!
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Exclude pg_largeobject form pg_dump

2016-10-23 Thread Andreas Joseph Krogh
På søndag 23. oktober 2016 kl. 17:06:57, skrev Guillaume Lelarge <
guilla...@lelarge.info <mailto:guilla...@lelarge.info>>:
2016-03-08 21:06 GMT+01:00 Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>>: På tirsdag 08. mars 2016 kl. 21:03:01, skrev 
David G. Johnston <david.g.johns...@gmail.com 
<mailto:david.g.johns...@gmail.com>>:
On Tue, Mar 8, 2016 at 9:45 AM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote:
På tirsdag 08. mars 2016 kl. 17:38:04, skrev Joshua D. Drake <
j...@commandprompt.com <mailto:j...@commandprompt.com>>:
On 03/08/2016 08:02 AM, Andreas Joseph Krogh wrote:
 > På tirsdag 08. mars 2016 kl. 16:57:01, skrev Tom Lane <t...@sss.pgh.pa.us 
<mailto:t...@sss.pgh.pa.us>
 > <mailto:t...@sss.pgh.pa.us <mailto:t...@sss.pgh.pa.us>>>:
 >
 >     Andreas Joseph Krogh <andr...@visena.com <mailto:andr...@visena.com>> 
writes:
 >      > What I'm looking for is "inverse -b" in an otherwise complete
 >     dump. Any plans
 >      > to add that?
 >
 >     [ shrug... ]  Nobody ever asked for it before.
 >
 >     regards, tom lane
 >
 > It surely helps testing production-datasets which contain lots of BLOBs
 > where one wants to dump the production-data into a test-env. We have
 >  >1TB databases containing > 95% blobs so it would help us tremendously
 > to have this option.

 I have quite a few customers that would benefit from the ability to not
 have blobs present in dumps.
 
Great! So how do we proceed to get "--no-blobs" added to pg_dump?
Maybe CommandPrompt and Visena should co-fund development of such an addition, 
if it's accepted by -hackers?
We'd be willing to pay for such an addition for the 9.5 branch, as a patch.
 
​Unfortunately this doesn't qualify as a bug fix - it is a new feature and 
thus is ineligible for inclusion in official 9.5

 

David J.




 


Of course. That's why I mentioned that, if possible, an unofficial patch to 
9.5 could be developed, funded partly by Visena (my company). Given that 
someone is willing to do this of course.
 


 
That probably should look like the patch attached. It applies cleanly on HEAD, 
and works AFAICT. If this patch seems interesting enough, I'll add it to the 
next commit fest (note for myself, update the ref/pg_dump.sgml documentation 
file).

 For Andreas' information, it also applies on 9.5, though I didn't check if it 
worked afterwards.



 
+1 for adding it to the commitfest.
 
It's almost scary how simple this patch is and that no one ever got around to 
implementing it.
 
Thanks, I'll test it on 9.5 soon.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] pg_upgrade not able to cope with pg_largeobject being in a different tablespace

2016-10-18 Thread Andreas Joseph Krogh
På tirsdag 18. oktober 2016 kl. 12:39:03, skrev Magnus Hagander <
mag...@hagander.net <mailto:mag...@hagander.net>>:
    On Thu, Oct 13, 2016 at 7:35 AM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: På torsdag 13. oktober 2016 kl. 16:09:34, 
skrev Bruce Momjian <br...@momjian.us <mailto:br...@momjian.us>>:
On Thu, Oct 13, 2016 at 10:14:08AM +0200, Andreas Joseph Krogh wrote:
 > I would assume that having pg_largeobject in a separate tablespace is more 
and
 > more common these days, having real-cheap SAN vs. fast-SSD for normal 
tables/
 > indexes/wal.

 So common that no one has ever asked for this feature before?
 
 
Sometimes one gets the feeling that one is the only one in the universe doing 
something one considers "quite common":-)
 
> So - I'm wondering if we can fund development of pg_upgrade to cope with this
 > configuration or somehow motivate to getting this issue fixed?
 >  
 > Would any of the PG-companies (2ndQ, EDB, PgPro) take a stab at this?
 >  
 > Any feedback welcome, thanks.

 You would need to get buy-in that that community wants the relocation of
 pg_largeobject to be supported via an SQL command, and at that point
 pg_upgrade would be modified to support that.  It is unlikely pg_upgrade
 is going to be modified to support something that isn't supported at the
 SQL level.  Of course, you can create a custom version of pg_upgrade to
 do that.
 
Doesn't "ALTER TABLE pg_largeobject SET TABLESPACE myspace_lo" count as being 
"at the SQL-level"?
 
Well, it requires that you set allow_system_table_mods on, which is documented 
as a developer option. It's documented with things like "The following 
parameters are intended for work on the PostgreSQL source code, and in some 
cases to assist with recovery of severely damaged databases. There should be no 
reason to use them on a production database.".

 Perhaps we should add another disclaimer there saying that if you make 
changes in this mode, tools like pg_upgrade (or for that matter, pg_dump or 
pretty much anything at all) may not work anymore?
 
 
The whole problem seems to come from the fact that BLOBs are stored in 
pg_largeobject which for some reason is implemented as a system-catalogue in 
PG, which imposes all kinds of weird problems, from a DBA-perspective.
 
Yes, there are several issues related to how lo style large objects work. I've 
often gone to similar implementations but in userspace on top of custom tables 
to work around those.
 
 
Can we pay you at EDB for making such a custom version of pg_upgrade for 9.6?
 
 
You're assuming pg_upgrade is the only potential problem. If you are willing 
to spend towards it, it would probably be better to spend towards the "upper 
layer" problem which would be to make it possible to move pg_largeobject to a 
different tablespace *without* turning on system_table_mods.
 
That said, I cannot comment to the complexity of either doing that *or* doing 
a custom pg_upgrade that would support it. But solving a long-term problem 
seems better than solving a one-off one.



 
I totally agree that investing in a long-term solution is the best. However, I 
need (would like very much) to upgrade a 9.5 cluster to 9.6 and would rather 
not wait for a solution to land in 10.x.
 
IIRC there was a discussion on -hackers not too long ago about pg_largeobject 
and releasing it from being a "system catalogue", but I think it stalled and 
got nowhere.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>

 


Re: [GENERAL] pg_upgrade not able to cope with pg_largeobject being in a different tablespace

2016-10-13 Thread Andreas Joseph Krogh
På torsdag 13. oktober 2016 kl. 16:09:34, skrev Bruce Momjian <br...@momjian.us 
<mailto:br...@momjian.us>>:
On Thu, Oct 13, 2016 at 10:14:08AM +0200, Andreas Joseph Krogh wrote:
 > I would assume that having pg_largeobject in a separate tablespace is more 
and
 > more common these days, having real-cheap SAN vs. fast-SSD for normal 
tables/
 > indexes/wal.

 So common that no one has ever asked for this feature before?
 
 
Sometimes one gets the feeling that one is the only one in the universe doing 
something one considers "quite common":-)
 
> So - I'm wondering if we can fund development of pg_upgrade to cope with this
 > configuration or somehow motivate to getting this issue fixed?
 >  
 > Would any of the PG-companies (2ndQ, EDB, PgPro) take a stab at this?
 >  
 > Any feedback welcome, thanks.

 You would need to get buy-in that that community wants the relocation of
 pg_largeobject to be supported via an SQL command, and at that point
 pg_upgrade would be modified to support that.  It is unlikely pg_upgrade
 is going to be modified to support something that isn't supported at the
 SQL level.  Of course, you can create a custom version of pg_upgrade to
 do that.
 
Doesn't "ALTER TABLE pg_largeobject SET TABLESPACE myspace_lo" count as being 
"at the SQL-level"?
 
The whole problem seems to come from the fact that BLOBs are stored in 
pg_largeobject which for some reason is implemented as a system-catalogue in 
PG, which imposes all kinds of weird problems, from a DBA-perspective.
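 
To illustrate (sketch):
 
-- pg_largeobject sits in pg_catalog like any other system catalog, which is
-- why ALTER TABLE ... SET TABLESPACE on it is refused without
-- allow_system_table_mods
SELECT relname, relnamespace::regnamespace AS schema
FROM pg_class
WHERE relname = 'pg_largeobject';
-- returns: pg_largeobject | pg_catalog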
 
Can we pay you at EDB for making such a custom version of pg_upgrade for 9.6?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] pg_upgrade not able to cope with pg_largeobject being in a different tablespace

2016-10-13 Thread Andreas Joseph Krogh
Hi all.
 
(I tried raising this on -hackers but got a "this is not supported" answer, 
which is quite unsatisfactory.)
 
I have an issue with pg_upgrade upgrading 9.5 to 9.6. (my system is 
Ubuntu-16.04 and packages from http://apt.postgresql.org/) 
<http://apt.postgresql.org/)>
 
In short pg_upgrade fails with:
 
Linking user relation files
 No match found in new cluster for old relation with OID 16388 in database 
"andreak": "pg_toast.pg_toast_2613" which is the TOAST table for 
"pg_catalog.pg_largeobject"
 No match found in new cluster for old relation with OID 16390 in database 
"andreak": "pg_toast.pg_toast_2613_index" which is an index on 
"pg_toast.pg_toast_2613" which is the TOAST table for 
"pg_catalog.pg_largeobject"
Failed to match up old and new tables in database "andreak"
 Failure, exiting
 
I issued the following command:
PG_NEW_VERSION=9.6
 PG_OLD_VERSION=9.5
 /usr/lib/postgresql/$PG_NEW_VERSION/bin/pg_upgrade \
           --old-bindir=/usr/lib/postgresql/$PG_OLD_VERSION/bin/ \
           --new-bindir=/usr/lib/postgresql/$PG_NEW_VERSION/bin/ \
           --old-datadir=/var/lib/postgresql/$PG_OLD_VERSION/main \
           --new-datadir=/var/lib/postgresql/$PG_NEW_VERSION/main \
           -o " -c 
config_file=/etc/postgresql/$PG_OLD_VERSION/main/postgresql.conf" \
           -O " -c 
config_file=/etc/postgresql/$PG_NEW_VERSION/main/postgresql.conf" \
           --link

 
 
My database and tablespaces are created like this:
 I have tablespaces configured outside PGDATA:
mkdir /var/lib/postgresql/9.5/tablespaces/andreak
 mkdir /var/lib/postgresql/9.5/tablespaces_lo/andreak
psql -c "create tablespace andreak OWNER andreak location 
'/var/lib/postgresql/9.5/tablespaces/andreak'" postgres;
 psql -c "create tablespace andreak_lo OWNER andreak location 
'/var/lib/postgresql/9.5/tablespaces_lo/andreak'" postgres;
 createdb --tablespace=andreak -O andreak andreak


psql -U postgres -c "alter table pg_largeobject set tablespace andreak_lo" -d 
andreak


(I've set allow_system_table_mods=on in postgresql.conf)
 
These are symlinked:
ln -s /storage/wal/9.5/pg_xlog /var/lib/postgresql/9.5/main/pg_xlog
 ln -s /storage/fast_ssd/9.5/tablespaces /var/lib/postgresql/9.5/tablespaces
 ln -s /storage/archive_disk/9.5/tablespaces_lo 
/var/lib/postgresql/9.5/tablespaces_lo

 
I would assume that having pg_largeobject in a separate tablespace is more and 
more common these days, having real-cheap SAN vs. fast-SSD for normal 
tables/indexes/wal.
 
So - I'm wondering if we can fund development of pg_upgrade to cope with this 
configuration or somehow motivate to getting this issue fixed?
 
Would any of the PG-companies (2ndQ, EDB, PgPro) take a stab at this?
 
Any feedback welcome, thanks.
 

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-25 Thread Andreas Joseph Krogh
På torsdag 25. august 2016 kl. 18:12:34, skrev Oleg Bartunov <
obartu...@gmail.com <mailto:obartu...@gmail.com>>:
Andreas,
  
sorry for delay,
it looks like a bug to me, could you please, share your dataset with me, so I 
could reproduce the behaviour.

 
 
I'll send you a Google Drive link on your email
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] ON CONFLICT does not support deferrable unique constraints

2016-08-24 Thread Andreas Joseph Krogh
På onsdag 24. august 2016 kl. 18:36:16, skrev Francisco Olarte <
fola...@peoplecall.com <mailto:fola...@peoplecall.com>>:
On Wed, Aug 24, 2016 at 6:26 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
 > On Wed, Aug 24, 2016 at 6:05 AM, Andreas Joseph Krogh
 >> Are there any plans to lift this restriction?
 > I'm trying to figure out a method for making this work in my head.
 > These two things seem kind of opposed to each other.

 He probably wants to just treat is as non-deferrable just during the
 upsert. I do not know if he has thought this opens a can of worms (
 like, the constraint may be already broken due to precious DML ).

 Francisco Olarte.
 
As a developer I want it to "just work": if there's an error of any kind, then 
abort the transaction, just as if the constraint were non-deferrable.
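 
A minimal sketch of what I mean (table and names made up):
 
CREATE TABLE account (
    email text,
    CONSTRAINT account_email_key UNIQUE (email) DEFERRABLE INITIALLY IMMEDIATE
);
-- today this is rejected with an error along the lines of "ON CONFLICT does
-- not support deferrable unique constraints/exclusion constraints as arbiters"
INSERT INTO account (email) VALUES ('a@example.com')
    ON CONFLICT (email) DO NOTHING;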
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] ON CONFLICT does not support deferrable unique constraints

2016-08-24 Thread Andreas Joseph Krogh
Hope some -hackers read this...
 
Are there any plans to lift this restriction?
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] Postgres Pain Points 2 ruby / node language drivers

2016-08-12 Thread Andreas Joseph Krogh
På fredag 12. august 2016 kl. 11:24:16, skrev Daevor The Devoted <
doll...@gmail.com <mailto:doll...@gmail.com>>:
  [snip] If you don't like your domain-model to be very close to your physical 
DB-model, there's nothing preventing you from having a persistence-model, using 
the ORM, and map that to/from your domain-model. However, I don't see any of 
these challenges getting easier by throwing the ORM out and having the 
developers handling everything themselves using SQL directly.
 
 



Could you elaborate on this method? Or direct me to docs describing it in more 
detail? I'm quite interested in this approach, but would like to see what 
amount of overhead (development and maintenance) this might result in. 



 
I'm not in favor of this method myself as, IMO, it adds an unnecessary 
mapping-layer which just slows down development. But it might make sense 
depending on how much control you have over your DB, domain-model, code, 
developers etc.
 
I don't have any pointers describing this separation though; try asking Mr. 
Google, that's what I'd do.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Postgres Pain Points 2 ruby / node language drivers

2016-08-12 Thread Andreas Joseph Krogh
På fredag 12. august 2016 kl. 11:07:08, skrev Chris Travers <
chris.trav...@gmail.com <mailto:chris.trav...@gmail.com>>:
[snip]
My preference is stored procedures plus service locators, to be honest.  It 
enables a degree of loose coupling and even dynamic discovery that ORMs are 
generally not well suited to.
 
But I think the difference may be bigger.  ORMs make sense when you want a 
database for your application.  They break down badly when you want an 
application for your database.  Usually I tend to want the latter.
 



 
I'm in the first group, and I think many are when evaluating ORMs or forming an 
opinion about them. Since you're in the latter, I understand your concerns better.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Postgres Pain Points 2 ruby / node language drivers

2016-08-12 Thread Andreas Joseph Krogh
På fredag 12. august 2016 kl. 10:33:19, skrev Chris Travers <
chris.trav...@gmail.com <mailto:chris.trav...@gmail.com>>:
    On Fri, Aug 12, 2016 at 9:59 AM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: På fredag 12. august 2016 kl. 05:27:42, 
skrev Chris Travers <chris.trav...@gmail.com <mailto:chris.trav...@gmail.com>>:
    On Thu, Aug 11, 2016 at 10:20 PM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: På torsdag 11. august 2016 kl. 19:13:08, 
skrev support-tiger <supp...@tigernassau.com <mailto:supp...@tigernassau.com>>:
 


I cannot not comment on this. Saying that ORM seems dumb, and working with PG 
using ORM does not fly, is a very good recipe for not being taken seriously.
 
And yet everyone I have talked to understands that ORMs are pretty problematic 
as a concept.  They make some things easy but they have some pretty massive 
downsides.  ORMs, it is true, do solve some problems, but they usually create 
many more in the process.  The reason is that as much as relations look like 
collections of objects, they are best organized along very different 
principles.  While we break down our tables based on functional dependencies 
between data values, we break down our object models based on how we can 
encapsulate state changes behind consistent interfaces.  The latter is 
dependent on use, while the former far less so.
  
Of course you *can* use them well.  I remember talking about this with one 
author or a major ORM and he said that on thing he often does is create views 
with triggers and then use the ORM against those.  This solves the problem 
above very well.  But it still leaves the fact that the database and the 
application have to share an implicit understanding of an object model and 
keeping that in sync as the project grows can be troublesome.




 
I don't understand why people bashing ORMs seem to think that once you have an 
ORM in your project you have to use it for everything. Of course, the ORM 
doesn't free you from using SQL directly where appropriate. IMO ORMs are best 
used for CRUD, but not for reporting or batch-processing. In a large project 
you have both, so combining them is, IMO, best.
 
The problems I mention above occur when doing CRUD via an ORM (at least all 
ORMs I have worked with).  The fundamental problem is that ORMs make for bad 
data management in most cases because *either* you structure your database 
table structures around your object model of your application *or* you add 
complexity of a logical level structured for your application.
  
But either way, the database has to have intimate familiarity with the 
internals of the applications using it.  And there isn't an easy way around 
this.



 
If you don't like your domain-model to be very close to your physical 
DB-model, there's nothing preventing you from having a persistence-model, using 
the ORM, and map that to/from your domain-model. However, I don't see any of 
these challenges getting easier by throwing the ORM out and having the 
developers handling everything themselves using SQL directly.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Postgres Pain Points 2 ruby / node language drivers

2016-08-12 Thread Andreas Joseph Krogh
På fredag 12. august 2016 kl. 05:27:42, skrev Chris Travers <
chris.trav...@gmail.com <mailto:chris.trav...@gmail.com>>:
    On Thu, Aug 11, 2016 at 10:20 PM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: På torsdag 11. august 2016 kl. 19:13:08, 
skrev support-tiger <supp...@tigernassau.com <mailto:supp...@tigernassau.com>>:
 


I cannot not comment on this. Saying that ORM seems dumb, and working with PG 
using ORM does not fly, is a very good recipe for not being taken seriously.
 
And yet everyone I have talked to understands that ORMs are pretty problematic 
as a concept.  They make some things easy but they have some pretty massive 
downsides.  ORMs, it is true, do solve some problems, but they usually create 
many more in the process.  The reason is that as much as relations look like 
collections of objects, they are best organized along very different 
principles.  While we break down our tables based on functional dependencies 
between data values, we break down our object models based on how we can 
encapsulate state changes behind consistent interfaces.  The latter is 
dependent on use, while the former far less so.
  
Of course you *can* use them well.  I remember talking about this with one 
author or a major ORM and he said that on thing he often does is create views 
with triggers and then use the ORM against those.  This solves the problem 
above very well.  But it still leaves the fact that the database and the 
application have to share an implicit understanding of an object model and 
keeping that in sync as the project grows can be troublesome.




 
I don't understand why people bashing ORMs seem to think that once you have an 
ORM in your project you have to use it for everything. Of course, the ORM 
doesn't free you from using SQL directly where appropriate. IMO ORMs are best 
used for CRUD, but not for reporting or batch-processing. In a large project 
you have both, so combining them is, IMO, best.
 
But once you have a non-trivial project, the promise that ORMs are often sold 
on ('you don't have to know SQL') evaporates and you find that you have to know 
SQL and the ORM well to get half-way decent performance.



 
I don't believe for a second that having an ORM in a project relieves the 
developers from knowing SQL.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-11 Thread Andreas Joseph Krogh
På torsdag 11. august 2016 kl. 19:13:10, skrev Artur Zakirov <
a.zaki...@postgrespro.ru <mailto:a.zaki...@postgrespro.ru>>:
On 07.08.2016 11:05, Andreas Joseph Krogh wrote:
 > På søndag 07. august 2016 kl. 08:27:06, skrev Oleg Bartunov
 > <obartu...@gmail.com <mailto:obartu...@gmail.com>>:
 >
 >     [snip]
 >     have you considered <=| and |=> operators ? <=> in ORDER BY works
 >     like KNN.
 >
 >
 > I don't get how these operators should work. Neither give me the
 > expected results.
 >
 > Using <=>
 >
 >
 > SELECT del.entity_id, del.folder_id, del.received_timestamp FROM
 > origo_email_delivery del WHERE del.fts_all @@ to_tsquery('simple',
 > 'andreas:*:*') ORDER BY '2000-01-01' :: TIMESTAMP <=>
 > del.received_timestamp LIMIT 10;
 >
 >  entity_id | folder_id |   received_timestamp
 > ---+---+-
 >   1224278 |   1068087 | 2015-08-17 23:53:26
 >   1224382 |   1068087 | 2015-08-18 03:07:55
 >   1224404 |   1068087 | 2015-08-18 03:49:02
 >   1505713 |     48496 | 2015-10-27 14:51:45
 >    142132 |     66658 | 2012-12-03 14:14:05.488
 >    122565 |     90115 | 2012-11-20 15:41:04.936
 >    200744 |     66655 | 2013-01-28 21:47:44.561
 >   1445927 |    888665 | 2015-09-29 00:26:56
 >    123671 |     83509 | 2012-11-21 14:16:26.448
 >   1129928 |     66658 | 2015-05-09 08:39:14.128
 > (10 rows)
 >
 >
 > Using <=|
 >
 > SELECT del.entity_id, del.folder_id, del.received_timestamp FROM
 > origo_email_delivery del WHERE del.fts_all @@ to_tsquery('simple',
 > 'andreas:*:*') ORDER BY '2000-01-01' :: TIMESTAMP <=|
 > del.received_timestamp LIMIT 10;
 >
 >
 >  entity_id | folder_id |   received_timestamp
 > ---+---+-
 >   1224278 |   1068087 | 2015-08-17 23:53:26
 >   1224382 |   1068087 | 2015-08-18 03:07:55
 >   1224404 |   1068087 | 2015-08-18 03:49:02
 >   1505713 |     48496 | 2015-10-27 14:51:45
 >    142132 |     66658 | 2012-12-03 14:14:05.488
 >    122565 |     90115 | 2012-11-20 15:41:04.936
 >    200744 |     66655 | 2013-01-28 21:47:44.561
 >   1445927 |    888665 | 2015-09-29 00:26:56
 >    123671 |     83509 | 2012-11-21 14:16:26.448
 >   1129928 |     66658 | 2015-05-09 08:39:14.128
 > (10 rows)
 >
 >
 > Neither are ordered by received_timestamp
 >
 > Can you explain how to get ORDER BY received_timestamp DESC?
 >
 > Thanks.
 >
 > --
 > *Andreas Joseph Krogh*
 > CTO / Partner - Visena AS
 > Mobile: +47 909 56 963
 > andr...@visena.com <mailto:andr...@visena.com>
 > www.visena.com <https://www.visena.com>
 > <https://www.visena.com>
 >

 Do you need simple ordering by received_timestamp column? Not ordering
 by distance between received_timestamp and some date?

 Then you can use simple "ORDER BY received_timestamp". For example, we
 have data:

 =# SELECT * FROM test;
   id |     fts     |        received
 +-+-
    1 | 'andreas':1 | 2015-08-17 23:53:26
    2 | 'andreas':1 | 2015-08-18 03:07:55
    3 | 'andreas':1 | 2015-08-18 03:49:02
    4 | 'andreas':1 | 2012-12-03 14:14:05.488
    5 | 'andreas':1 | 2012-11-20 15:41:04.936
    6 | 'andreas':1 | 2013-01-28 21:47:44.561
    6 | 'andreas':1 | 2015-09-29 00:26:56
    7 | 'andreas':1 | 2012-11-21 14:16:26.448
    8 | 'andreas':1 | 2015-05-09 08:39:14.128
 (9 rows)

 I created index:

 CREATE INDEX rumidx ON test USING rum (fts rum_tsvector_timestamp_ops,
 received) WITH (attach = 'received', to = 'fts');

 Then we can execute queries:

 =# SELECT id, received FROM test WHERE fts @@ to_tsquery('simple',
 'andreas') ORDER BY received LIMIT 8;
   id |        received
 +-
    5 | 2012-11-20 15:41:04.936
    7 | 2012-11-21 14:16:26.448
    4 | 2012-12-03 14:14:05.488
    6 | 2013-01-28 21:47:44.561
    8 | 2015-05-09 08:39:14.128
    1 | 2015-08-17 23:53:26
    2 | 2015-08-18 03:07:55
    3 | 2015-08-18 03:49:02
 (8 rows)

 =# SELECT id, received FROM test WHERE fts @@ to_tsquery('simple',
 'andreas') ORDER BY received DESC LIMIT 8;
   id |        received
 +-
    6 | 2015-09-29 00:26:56
    3 | 2015-08-18 03:49:02
    2 | 2015-08-18 03:07:55
    1 | 2015-08-17 23:53:26
    8 | 2015-05-09 08:39:14.128
    6 | 2013-01-28 21:47:44.561
    4 | 2012-12-03 14:14:05.488
    7 | 2012-11-21 14:16:26.448
 (8 rows)
 
Yes, this gives the correct result, but the whole motivation for using the 
RUM-index is for the query to use the same index for ORDER BY, as it seems to 
do when using the <=> operator.
 
The query you gave above does not use the index for sorting, AFAIU.
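 
E.g. with your test table above I'd expect something like this to still show an 
explicit Sort node in the plan (sketch, untested here):
 
EXPLAIN
SELECT id, received FROM test
WHERE fts @@ to_tsquery('simple', 'andreas')
ORDER BY received DESC
LIMIT 8;
-- if a Sort node appears above the index scan, the RUM-index is not being
-- used for the ordering itself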
 
Operators <=>, |=>, <=| you can use to order by nearest date to specific
 date:
 [snip]
 
I hope this is what you want.
 
I still d

Re: [GENERAL] Postgres Pain Points 2 ruby / node language drivers

2016-08-11 Thread Andreas Joseph Krogh
På torsdag 11. august 2016 kl. 19:13:08, skrev support-tiger <
supp...@tigernassau.com <mailto:supp...@tigernassau.com>>:
A database is only as good as the ability to use it.  Using ORM's with
 Postgres seems dumb, like having a Ferrari but only allowed to drive 25
 kmh. Really, for getting data in and out with Postgres, SQL is pretty
 simple. The ORM abstractions take as much effort as simply writing
 direct SQL statements and convenience functions with straight sql.
 Ruby:  The "pg" driver seems to be tweaked to work okay with active
 record, and maybe Sequel, but we are having problems with with straight
 ruby/sinatra.  (see comments above re ORM's).   Since we changed to
 Fedora and Ruby 2.3, we are now having breaking connections and JSONB
 query problems.  This could be from lack of clear docs and examples - we
 cant' find any good examples or tutorials with JSONB. It seems with this
 driver it is Rails or the highway.

 Nodejs:  The docs and tutorials for "node-postgres" are a mess.  The
 github readme, examples, and wikis are all different.  Then there is
 "native" and "pooling" and prepared statement alternatives. Callback
 hell is bad enough, then to have lots of different approaches. 
 Async/Await should help this issue (whenever it arrives in Node) but
 meanwhile, it's a painful struggle.  Whatever examples exist are for
 such trivial use, they fall short of helping with real-life examples,
 like sequential results.

 Python:  The Postgres team supported driver psycopg2 seems to work great
 (but we really don't want to rewrite lots of code into python

 It would be great if the Ruby and Node drivers can be brought under the
 Postgres team umbrella and make them as reliable and clearly documented
 as the Python or jdbc drivers.
 
I cannot not comment on this. Saying that ORM seems dumb, and working with PG 
using ORM does not fly, is a very good recipe for not being taken seriously. Of 
course you can make an ORM hose your DB, but any developer could just as easily 
write SQL which does the same. Also, remember that PG is based on volunteer 
effort and if drivers for some languages are more mature and better supported 
than others, that's because core-pg-developers or someone in "the inner 
circle", or someone else, has use for them. Just because language X or 
framework Y popus up doesn't mean the PG-project should maintain drivers for 
them or support them.
 
So - yes, it would be great if all the drivers for all languages would be made 
as reliable and clearly documented as the most supported ones.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-07 Thread Andreas Joseph Krogh
På søndag 07. august 2016 kl. 08:27:06, skrev Oleg Bartunov <obartu...@gmail.com
 <mailto:obartu...@gmail.com>>:
[snip]
have you considered <=| and |=> operators ? <=> in ORDER BY works like KNN.



 
I don't get how these operators are supposed to work. Neither gives me the 
expected results.
 
Using <=>
 
SELECT del.entity_id, del.folder_id, del.received_timestamp
FROM origo_email_delivery del
WHERE del.fts_all @@ to_tsquery('simple', 'andreas:*:*')
ORDER BY '2000-01-01'::TIMESTAMP <=> del.received_timestamp
LIMIT 10;
 entity_id | folder_id |   received_timestamp 
 ---+---+-
   1224278 |   1068087 | 2015-08-17 23:53:26
   1224382 |   1068087 | 2015-08-18 03:07:55
   1224404 |   1068087 | 2015-08-18 03:49:02
   1505713 | 48496 | 2015-10-27 14:51:45
    142132 | 66658 | 2012-12-03 14:14:05.488
    122565 | 90115 | 2012-11-20 15:41:04.936
    200744 | 66655 | 2013-01-28 21:47:44.561
   1445927 |    888665 | 2015-09-29 00:26:56
    123671 | 83509 | 2012-11-21 14:16:26.448
   1129928 | 66658 | 2015-05-09 08:39:14.128
 (10 rows)

  
Using <=|
SELECT del.entity_id, del.folder_id, del.received_timestamp
FROM origo_email_delivery del
WHERE del.fts_all @@ to_tsquery('simple', 'andreas:*:*')
ORDER BY '2000-01-01'::TIMESTAMP <=| del.received_timestamp
LIMIT 10;
 
 entity_id | folder_id |   received_timestamp 
 ---+---+-
   1224278 |   1068087 | 2015-08-17 23:53:26
   1224382 |   1068087 | 2015-08-18 03:07:55
   1224404 |   1068087 | 2015-08-18 03:49:02
   1505713 | 48496 | 2015-10-27 14:51:45
    142132 | 66658 | 2012-12-03 14:14:05.488
    122565 | 90115 | 2012-11-20 15:41:04.936
    200744 | 66655 | 2013-01-28 21:47:44.561
   1445927 |    888665 | 2015-09-29 00:26:56
    123671 | 83509 | 2012-11-21 14:16:26.448
   1129928 | 66658 | 2015-05-09 08:39:14.128
 (10 rows)

  
Neither are ordered by received_timestamp
 
Can you explain how to get ORDER BY received_timestamp DESC?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-07 Thread Andreas Joseph Krogh
På lørdag 06. august 2016 kl. 20:54:32, skrev Artur Zakirov <
a.zaki...@postgrespro.ru <mailto:a.zaki...@postgrespro.ru>>:
Hello,   2016-08-02 21:08 GMT+03:00 Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>>:  The ORDER BY part seems strange; It seems one 
has to find a value "lower than any other value" to use as a kind of base, why 
is this necessary? It also seems that in order to be able to sort DESC one has 
to provide a timestamp value "higher than any other value", is this correct? 
 
It would be great if the docs explained this.
 
We will write more detailed documentation for RUM.



 
Great!
 
 
I really miss the opportunity to include a BIGINT as part of the index, so 
that the WHERE-clause could be like this:
 
WHERE del.fts_all @@ to_tsquery('simple', 'andreas') AND del.folder_id IN (1,2,3)
 
Having this would be perfect for my use-case, searching email in folders 
sorted by received_date, and having it use ONE index.
 
Will this be supported?


 We have a plan to use generic types to able to include bigint, timestamp and 
other types as part of index.

 
Does this eliminate the need for a btree_rum equivalent of btree_gin, being 
that the RUM-index will handle all "btree-able" datatypes?
 
 
But I cant tell date of it.


 
I understand.
Do you think it will be done by the time 9.6 is released?
 
Thanks.
 

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-02 Thread Andreas Joseph Krogh
Hi.
 
I see the RUM-index is updated, which is great!
 
I wonder, to be able to sort by timestamp one has to create the index like 
this:
 
CREATE INDEX rumidx ON origo_email_delivery USING rum (fts_all rum_tsvector_timestamp_ops, received_timestamp)
    WITH (attach = 'received_timestamp', TO = 'fts_all', order_by_attach = TRUE);
Then, to be able to use the index for sorting by the 
"received_timestamp"-column one has to issue a query like this:
EXPLAIN ANALYZE
SELECT del.entity_id, del.subject, del.received_timestamp,
       fts_all <=> to_tsquery('simple', 'andreas') AS rank
FROM origo_email_delivery del
WHERE del.fts_all @@ to_tsquery('simple', 'andreas')
ORDER BY '2000-01-01'::TIMESTAMP <=> del.received_timestamp
LIMIT 10;
 
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=14.40..26.47 rows=10 width=89) (actual time=10.908..10.952 rows=10 loops=1)
   ->  Index Scan using rumidx on origo_email_delivery del  (cost=14.40..3221.22 rows=2657 width=89) (actual time=10.906..10.947 rows=10 loops=1)
         Index Cond: (fts_all @@ '''andreas'' & ''kr'''::tsquery)
         Order By: (received_timestamp <=> '2000-01-01 00:00:00'::timestamp without time zone)
 Planning time: 0.491 ms
 Execution time: 11.010 ms
(6 rows)
 
The ORDER BY part seems strange; it seems one has to find a value "lower than 
any other value" to use as a kind of base. Why is this necessary? It also seems 
that in order to sort DESC one has to provide a timestamp value "higher than 
any other value", is this correct?
 
It would be great if the docs explained this.
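 
For the record, my current understanding (sketch, untested) is that to get 
newest-first one has to pick a base value above any real timestamp, so that 
"nearest" means "latest":
 
SELECT del.entity_id, del.received_timestamp
FROM origo_email_delivery del
WHERE del.fts_all @@ to_tsquery('simple', 'andreas')
ORDER BY '3000-01-01'::TIMESTAMP <=> del.received_timestamp
LIMIT 10;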
 
I really miss the opportunity to include a BIGINT as part of the index, so 
that the WHERE-clause could be like this:
 
WHERE del.fts_all @@ to_tsquery('simple', 'andreas') AND del.folder_id IN (1,2,3)
 
Having this would be perfect for my use-case, searching email in folders 
sorted by received_date, and having it use ONE index.
 
Will this be supported?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] Enquiry

2016-06-29 Thread Andreas Joseph Krogh
På onsdag 29. juni 2016 kl. 09:36:12, skrev mohd abdul azeem <
mohdabdulaz...@hotmail.com <mailto:mohdabdulaz...@hotmail.com>>:
Hello  
This is Mohammed. I am a database developer using postgres as the backend 
database. i am working on the postgres version 9.3.5 for development purpose.
 
I have just signed up with the postgres community. i hope this is correct 
place for getting my queries answered.  
 
I am planning to learn the scalability feature in postgres. found that there 
is no inbuilt support in postgres. 
 
it would be great if you could point me in the right direction.
 
Thanks
Mohammed Abdul Azeem

 
It would make it easier for us to help you if you asked a specific question.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Updated RUM?

2016-06-27 Thread Andreas Joseph Krogh
På mandag 27. juni 2016 kl. 13:20:03, skrev Oleg Bartunov <obartu...@gmail.com 
<mailto:obartu...@gmail.com>>:
    On Fri, Jun 24, 2016 at 2:36 PM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: Hi.
 
@PostgresPro; Will there be any updates to the rum git-repo any time soon?
 
Sure.
  
 
I understand you're working hard on this, care to share some vision/roadmap? 
I'm specifically interested in if RUM will support the same datatypes as GIN 
supports (arrays, JSON etc.), and what opclasses you plan to implement (for 
equalty and sorting). Will there be some kind of btree_rum like there is for 
btree_gin, to add RUM-support for the types there exists Btree-support for?
 
We are testing fts+timestamp combination right now.  The challenge is to have 
posting list/tree sorted by addinfo (timestamp) not by item pointer. It works 
already, but we are not quite satisfied with generated WAL traffic and are 
researching some options in improving posting tree building and generic wal.
  
We have in mind some sort of generalization like anyarray, but don't have any 
real ideas.



 
Cool!
 
Will RUM support having a tsvector and a bigint in the same index, like GIN does 
(utilizing btree_gin), and then sorting by some 3rd timestamp-column?
 
This will enable queries like:
 
create table delivery(
    id BIGSERIAL primary key,
    fts_all TSVECTOR not null,
    folder_id BIGINT NOT NULL,
    sent TIMESTAMP not null,
    message varchar not null
);
 
SELECT del.id, del.sent
FROM delivery del
WHERE del.fts_all @@ to_tsquery('simple', 'hi:*')
  AND del.folder_id = ANY(ARRAY[2,3]::BIGINT[])
ORDER BY del.sent DESC
LIMIT 101 OFFSET 0;
 
Having this query use one RUM-index with optimized sorting by timestamp would 
be extremely useful.
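 
Something along these lines is what I'm hoping for (purely hypothetical sketch; 
such an opclass combination doesn't exist yet, AFAIK):
 
CREATE INDEX rum_all_idx ON delivery
    USING rum (fts_all rum_tsvector_timestamp_ops, folder_id, sent)
    WITH (attach = 'sent', "to" = 'fts_all', order_by_attach = TRUE);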

 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] Updated RUM?

2016-06-24 Thread Andreas Joseph Krogh
Hi.
 
@PostgresPro; Will there be any updates to the rum git-repo any time soon?
 
I understand you're working hard on this, care to share some vision/roadmap? 
I'm specifically interested in if RUM will support the same datatypes as GIN 
supports (arrays, JSON etc.), and what opclasses you plan to implement (for 
equalty and sorting). Will there be some kind of btree_rum like there is for 
btree_gin, to add RUM-support for the types there exists Btree-support for?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] Question about RUM-index

2016-06-17 Thread Andreas Joseph Krogh
På fredag 17. juni 2016 kl. 15:47:08, skrev Oleg Bartunov <obartu...@gmail.com 
<mailto:obartu...@gmail.com>>:
    On Fri, Jun 17, 2016 at 3:52 PM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: På fredag 17. juni 2016 kl. 13:53:34, skrev 
Oleg Bartunov <obartu...@gmail.com <mailto:obartu...@gmail.com>>:
    On Fri, Jun 17, 2016 at 2:10 PM, Oleg Bartunov <obartu...@gmail.com 
<mailto:obartu...@gmail.com>> wrote:     On Fri, Jun 17, 2016 at 9:32 AM, 
Andreas Joseph Krogh<andr...@visena.com <mailto:andr...@visena.com>> wrote: På 
torsdag 16. juni 2016 kl. 00:50:45, skrev Jeff Janes <jeff.ja...@gmail.com 
<mailto:jeff.ja...@gmail.com>>:
On Wed, Jun 15, 2016 at 3:56 AM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: Hi.
 
First; Is this the correct forum to ask questions about the Postgres Pro's new 
RUM-index?
 
If not, please point me to the right forum.
 
I think that https://github.com/postgrespro/rum/issues 
<https://github.com/postgrespro/rum/issues> might be the best forum.



 
Oleg and friends; Should we use GitHub-issues as forum (one issue per 
question/thread?), pgsql-general or something else?
 
Andreas,
  
we are hardly working on our internal version of rum and will open it after 
resolving some issues. I think the best place to discuss it is -hackers.



 
Ah, as someone corrected me, we are working hard !



 
He he, I figured that was what you meant:-)
 
Ok, so basically - will the RUM-index support the same indexing-properties as 
GIN (being able to index tsvector, BIGINT-arrays, JSONB etc.) and be able to 
use the index for sorting on e.g. timestamp, ts_rank or some BIGINT-column?
 
Like my example, will it be possible to issue a query like this:
 
SELECT del.id, del.sent
FROM delivery del
WHERE 1 = 1
  AND del.fts_all @@ to_tsquery('simple', 'hi:*')
  AND del.folder_id = ANY(ARRAY[2,3]::BIGINT[])
ORDER BY del.sent DESC
LIMIT 101 OFFSET 0;


and have it use one RUM-index so the whole operation, including sorting, is as 
efficient as possible?
 
 
we have implementation for timestamp. One need to write opclass to deal with 
arrays, it shouldn't be difficult.



 
Will the opclass dealing with bigint-arrays also handle the single-element 
case, that is only one bigint?
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Question about RUM-index

2016-06-17 Thread Andreas Joseph Krogh
På fredag 17. juni 2016 kl. 13:53:34, skrev Oleg Bartunov <obartu...@gmail.com 
<mailto:obartu...@gmail.com>>:
    On Fri, Jun 17, 2016 at 2:10 PM, Oleg Bartunov <obartu...@gmail.com 
<mailto:obartu...@gmail.com>> wrote:     On Fri, Jun 17, 2016 at 9:32 AM, 
Andreas Joseph Krogh<andr...@visena.com <mailto:andr...@visena.com>> wrote: På 
torsdag 16. juni 2016 kl. 00:50:45, skrev Jeff Janes <jeff.ja...@gmail.com 
<mailto:jeff.ja...@gmail.com>>:
On Wed, Jun 15, 2016 at 3:56 AM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: Hi.
 
First; Is this the correct forum to ask questions about the Postgres Pro's new 
RUM-index?
 
If not, please point me to the right forum.
 
I think that https://github.com/postgrespro/rum/issues 
<https://github.com/postgrespro/rum/issues> might be the best forum.



 
Oleg and friends; Should we use GitHub-issues as forum (one issue per 
question/thread?), pgsql-general or something else?
 
Andreas,
  
we are hardly working on our internal version of rum and will open it after 
resolving some issues. I think the best place to discuss it is -hackers.



 
Ah, as someone corrected me, we are working hard !



 
He he, I figured that was what you meant:-)
 
Ok, so basically - will the RUM-index support the same indexing-properties as 
GIN (being able to index tsvector, BIGINT-arrays, JSONB etc.) and be able to 
use the index for sorting on e.g. timestamp, ts_rank or some BIGINT-column?
 
Like my example, will it be possible to issue a query like this:
 
SELECT del.id, del.sent
FROM delivery del
WHERE 1 = 1
  AND del.fts_all @@ to_tsquery('simple', 'hi:*')
  AND del.folder_id = ANY(ARRAY[2,3]::BIGINT[])
ORDER BY del.sent DESC
LIMIT 101 OFFSET 0;


and have it use one RUM-index so the whole operation, including sorting, is as 
efficient as possible?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Question about RUM-index

2016-06-17 Thread Andreas Joseph Krogh
På torsdag 16. juni 2016 kl. 00:50:45, skrev Jeff Janes <jeff.ja...@gmail.com 
<mailto:jeff.ja...@gmail.com>>:
On Wed, Jun 15, 2016 at 3:56 AM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: Hi.
 
First; Is this the correct forum to ask questions about the Postgres Pro's new 
RUM-index?
 
If not, please point me to the right forum.
 
I think that https://github.com/postgrespro/rum/issues 
<https://github.com/postgrespro/rum/issues> might be the best forum.



 
Oleg and friends; Should we use GitHub-issues as forum (one issue per 
question/thread?), pgsql-general or something else?
 
Note that GIN does almost what I want, except using the index when sorting by 
the "sent"-timestamp.
 
So I wonder if RUM can do any better?
What I don't understand is how to have "folder_id" as part of the RUM-index so 
that I can search in an array of folders using the index, AND have the whole 
result sorted by "sent"-timestamp also using the RUM-index.
 
I think you would have to implement an operator for integers for RUM much like 
btree_gin does for GIN.  Sorry don't know how to do that, except to say look in 
the RUM code to see how it does it for time-stamps.
 
 
In the (limited) documentation sorting using timestamp is done like this:
 
ORDER BY sent <-> '2000-01-01'::TIMESTAMP
 
which I don't understand; Why must one specify a value here, and how does that 
value affect the result?
 
 
This is essentially identical to ORDER BY ABS(sent - '2000-01-01'::TIMESTAMP); 
 except it can use the index.
 
So maybe pick a constant outside the range of possible values, and use that as 
one argument to <->.



 
This should be unnecessary and hidden from the user. Maybe some "ORDER BY 
rum_timestamp(sent)" or similar could abstract this away and make it much 
clearer to the user?

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Question about RUM-index

2016-06-15 Thread Andreas Joseph Krogh
På onsdag 15. juni 2016 kl. 15:27:32, skrev David G. Johnston <
david.g.johns...@gmail.com <mailto:david.g.johns...@gmail.com>>:
On Wed, Jun 15, 2016 at 6:56 AM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote:
Hi.
 
First; Is this the correct forum to ask questions about the Postgres Pro's new 
RUM-index?
 
​-general is fairly broad in scope and the authors do monitor here as far as I 
am aware.  I'm not sure this is strictly related to that extension though, 
anyway.​

 
 
If not, please point me to the right forum.
 
I'm trying to do this (done with GIN):
 
create extension if not exists btree_gin;
drop table if EXISTS delivery;
create table delivery(
    id BIGSERIAL primary key,
    fts_all TSVECTOR not null,
    folder_id BIGINT NOT NULL,
    sent TIMESTAMP not null,
    message varchar not null
);
create index gin_idx on delivery using GIN(fts_all, folder_id);
​In general I'm not sure what you are so focused on multi-column indexes 
through this email.​
 [snip] 
 
​What index?  You don't have "sent" in the one CREATE INDEX statement you 
specified above.​

 
 
So I wonder if RUM can do any better?
What I don't understand is how to have "folder_id" as part of the RUM-index so 
that I can search inan array of folders using the index, AND have the whole 
result sorted by "sent"-timestamp also using the RUM-index.
 
 
​A bit out of my experience here...but see comment on multi-column indexes 
above.​

 
In the (limited) documentation sorting using timestamp is done like this:
 
 
 
 
ORDER BY sent <-> '2000-01-01'::TIMESTAMP
 
which I don't understand; Why must one specify a value here, and how does that 
value affect the result?
 
 
​You don't (I think...);  is perfectly valid, it will use the 
"<=, and >=" operators in the btree operator family to perform the 
ordering...see comment regarding multi-column indexes.  Even if you stick sent 
into such an index it would not (I guess it could...but your where clause 
doesn't filter on it so doing so seem counter-productive) be a leading column 
so it is unlikely that the index would be helpful in supplying tuples to the 
sort node in order - the sort would still have work to perform.

 

Maybe its simple ignorance but AFAIK the sorting node never consults an index 
to perform its work.  Where indexes come into to play is the node under the 
sort is able to supply its tuples in sorted order then the sort node will have 
nothing to do.

 

https://www.postgresql.org/docs/9.6/static/indexes-ordering.html 
<https://www.postgresql.org/docs/9.6/static/indexes-ordering.html>

 

"""

In addition to simply finding the rows to be returned by a query, an index may 
be able to deliver them in a specific sorted order. This allows a query's ORDER 
BY specification to be honored without a separate sorting step. Of the index 
types currently supported by PostgreSQL, only B-tree can produce sorted output 
— the other index types return matching rows in an unspecified, 
implementation-dependent order.

​"""​

HTH
 
David J.




 
I'm focused on using *one* index for this query so that it is as efficient as 
possible. I'm searching through millions of records and the user expects 
"millisecond-response"; we're building a web-based email-system and people 
expect search to be fast, as GMail and other services are. No index-type can do 
this yet, but RUM is on its way and seems promising.
 
The reason I don't have "sent" in the (GIN) index is that I'm only using it 
for sorting and GIN cannot be used for sorting, which is why I'm looking into 
using the RUM-index; To hopefully get the best of GIN (able to index 
BIGINT-arrays, JSONB and tsvector) and be able to sort on timestamp (which RUM 
is able to do).
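 
For reference, the timestamp-ordering part of such an index looks roughly like this in the rum README (a sketch only: the opclass and option names follow the current README and may differ between rum versions, and whether a plain BIGINT column such as folder_id can be put into the same index is exactly the limitation discussed elsewhere in this archive):
 
 CREATE EXTENSION rum;
 
 -- index the tsvector and attach the timestamp as additional info
 CREATE INDEX rum_fts_sent_idx ON delivery
     USING rum (fts_all rum_tsvector_addon_ops, sent)
     WITH (attach = 'sent', to = 'fts_all');
 
 -- ordering then uses the distance operator against the attached column
 SELECT id, sent
 FROM delivery
 WHERE fts_all @@ to_tsquery('simple', 'hi:*')
 ORDER BY sent <=> '2000-01-01'::timestamp
 LIMIT 101;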
 
Based on your answer it doesn't seem you are very familiar with the 
RUM-extension, so much of your reply doesn't really seem relevant.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] Question about RUM-index

2016-06-15 Thread Andreas Joseph Krogh
Hi.
 
First; Is this the correct forum to ask questions about the Postgres Pro's new 
RUM-index?
 
If not, please point me to the right forum.
 
I'm trying to do this (done with GIN):
 
create extension if not exists btree_gin;

drop table if EXISTS delivery;
create table delivery(
    id        BIGSERIAL primary key,
    fts_all   TSVECTOR not null,
    folder_id BIGINT NOT NULL,
    sent      TIMESTAMP not null,
    message   varchar not null
);

create index gin_idx on delivery using GIN(fts_all, folder_id);

CREATE OR REPLACE FUNCTION update_delivery_tsvector_tf() RETURNS TRIGGER AS $$
BEGIN
    NEW.fts_all = to_tsvector('simple', NEW.message);
    return NEW;
END;
$$ LANGUAGE PLPGSQL;

CREATE TRIGGER update_delivery_tsvector_t BEFORE INSERT OR UPDATE ON delivery
    FOR EACH ROW EXECUTE PROCEDURE update_delivery_tsvector_tf();

insert into delivery(folder_id, sent, message) values
      (1, '2015-01-01', 'Yes hit four')
    , (1, '2014-01-01', 'Hi man')
    , (2, '2013-01-01', 'Hi man')
    , (2, '2013-01-01', 'fish')
;

analyze delivery;
set ENABLE_SEQSCAN to off;

explain analyze SELECT del.id, del.sent
FROM delivery del
WHERE 1 = 1
  AND del.fts_all @@ to_tsquery('simple', 'hi:*')
  AND del.folder_id = ANY(ARRAY[2,3]::BIGINT[])
ORDER BY del.sent DESC
LIMIT 101 OFFSET 0;
 
                                                         QUERY PLAN 
 
 

 Limit  (cost=3.63..3.63 rows=1 width=16) (actual time=0.025..0.025 rows=1 
loops=1)
   ->  Sort  (cost=3.63..3.63 rows=1 width=16) (actual time=0.024..0.024 
rows=1 loops=1)
 Sort Key: sent DESC
 Sort Method: quicksort  Memory: 25kB
 ->  Bitmap Heap Scan on delivery del  (cost=2.40..3.62 rows=1 
width=16) (actual time=0.019..0.019 rows=1 loops=1)
   Recheck Cond: ((fts_all @@ '''hi'':*'::tsquery) AND (folder_id 
= ANY ('{2,3}'::bigint[])))
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on gin_idx  (cost=0.00..2.40 rows=1 
width=0) (actual time=0.015..0.015 rows=1 loops=1)
 Index Cond: ((fts_all @@ '''hi'':*'::tsquery) AND 
(folder_id = ANY ('{2,3}'::bigint[])))
 Planning time: 0.153 ms
 Execution time: 0.047 ms
 (11 rows)

  
Note that GIN does almost what I want, except use the index when sorting by 
"sent"-timestamp.
 
So I wonder if RUM can do any better?
What I don't understand is how to have "folder_id" as part of the RUM-index so 
that I can search in an array of folders using the index, AND have the whole 
result sorted by "sent"-timestamp also using the RUM-index.
 
In the (limited) documentation sorting using timestamp is done like this:
 
ORDER BY sent <-> '2000-01-01'::TIMESTAMP
 
which I don't understand; Why must one specify a value here, and how does that 
value affect the result?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] Slides for PGCon2016; "FTS is dead ? Long live FTS !"

2016-05-31 Thread Andreas Joseph Krogh
På tirsdag 31. mai 2016 kl. 16:12:52, skrev Oleg Bartunov <obartu...@gmail.com 
<mailto:obartu...@gmail.com>>:
[snip] He he, I reported 1st issue: https://github.com/postgrespro/rum/issues/1
 <https://github.com/postgrespro/rum/issues/1>
 
Would be cool to see this fixed so I actually could have a sip of the rum:-)


  
It's not easy to fix this. We don't want rum to depend on btree_gin, so 
probably the easiest way is to have a separate operator <=> in rum.



 
+1 for separate operator!
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Slides for PGCon2016; "FTS is dead ? Long live FTS !"

2016-05-30 Thread Andreas Joseph Krogh
På mandag 30. mai 2016 kl. 22:27:11, skrev Oleg Bartunov <obartu...@gmail.com 
<mailto:obartu...@gmail.com>>:
    On Sun, May 29, 2016 at 12:59 AM, Oleg Bartunov <obartu...@gmail.com 
<mailto:obartu...@gmail.com>> wrote:     On Thu, May 26, 2016 at 11:26 PM, 
Andreas Joseph Krogh<andr...@visena.com <mailto:andr...@visena.com>> wrote: Hi.
 
Any news about when slides for $subject will be available?
 
I submitted slides to pgcon site, but it usually takes awhile, so you can 
download our presentation directly
http://www.sai.msu.su/~megera/postgres/talks/pgcon-2016-fts.pdf 
<http://www.sai.msu.su/~megera/postgres/talks/pgcon-2016-fts.pdf>
  



 
Please, download new version of slides. I added CREATE INDEX commands in 
examples.



 
Great!
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Slides for PGCon2016; "FTS is dead ? Long live FTS !"

2016-05-29 Thread Andreas Joseph Krogh
På søndag 29. mai 2016 kl. 19:49:06, skrev Oleg Bartunov <obartu...@gmail.com 
<mailto:obartu...@gmail.com>>:
[snip]  
I want to run 9.6 beta in production right now because of this:-)
 
wait-wait :)  We'd be happy to have feedback from production, of course, but 
please, wait a bit. We are adding support for sorting the posting list/tree not 
by item pointer as in gin, but by making use of additional information, for 
example a timestamp, which will provide additional speedup on top of the existing one.



 
Awesome!
 
 
Also, we are sure there are some bugs :)



 
He he, I reported 1st issue: https://github.com/postgrespro/rum/issues/1
 
Would be cool to see this fixed so I actually could have a sip of the rum:-)

 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Slides for PGCon2016; "FTS is dead ? Long live FTS !"

2016-05-29 Thread Andreas Joseph Krogh
På lørdag 28. mai 2016 kl. 23:59:55, skrev Oleg Bartunov <obartu...@gmail.com 
<mailto:obartu...@gmail.com>>:
    On Thu, May 26, 2016 at 11:26 PM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: Hi.
 
Any news about when slides for $subject will be available?
 
I submitted slides to pgcon site, but it usually takes awhile, so you can 
download our presentation directly
http://www.sai.msu.su/~megera/postgres/talks/pgcon-2016-fts.pdf 
<http://www.sai.msu.su/~megera/postgres/talks/pgcon-2016-fts.pdf>
  
There are some missing features in rum index, but I hope we'll update github 
repository really soon.



 
This is simply amazing!
 
I want to run 9.6 beta in production right now because of this:-)
 
Hats off guys, congrats to PostgresPro, and huge thanks!!
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] Slides for PGCon2016; "FTS is dead ? Long live FTS !"

2016-05-26 Thread Andreas Joseph Krogh
Hi.
 
Any news about when slides for $subject will be available?
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] index on ILIKE/LIKE - PostgreSQL 9.2

2016-05-12 Thread Andreas Joseph Krogh
På torsdag 12. mai 2016 kl. 10:05:01, skrev Andreas Joseph Krogh <
andr...@visena.com <mailto:andr...@visena.com>>:
[snip] I created this test:
 
create table ja_jobs(id bigserial primary key, title varchar not null, 
clientid bigint not null, time_job bigint not null);
CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title 
gin_trgm_ops, clientid);

 --- insert some test-data
 
As you see, this uses the index (when casting clientid to bigint):


 
andreak=# explain analyze SELECT DISTINCT title  
 FROM ja_jobs WHERE title ILIKE '%ras du%'
 and clientid = 12::bigint AND time_job > 257826264
 order BY title
 limit 10;
   QUERY 
PLAN   
 
---
  Limit  (cost=8.43..8.44 rows=1 width=32) (actual time=0.033..0.034 rows=1 
loops=1)
    ->  Unique  (cost=8.43..8.44 rows=1 width=32) (actual time=0.032..0.032 
rows=1 loops=1)
  ->  Sort  (cost=8.43..8.43 rows=1 width=32) (actual 
time=0.032..0.032 rows=1 loops=1)
    Sort Key: title
    Sort Method: quicksort  Memory: 25kB
    ->  Bitmap Heap Scan on ja_jobs  (cost=7.20..8.42 rows=1 
width=32) (actual time=0.025..0.025 rows=1 loops=1)
  Recheck Cond: (((title)::text ~~* '%ras du%'::text) AND 
(clientid = '12'::bigint))
  Filter: (time_job > 257826264)
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on ix_ja_jobs_trgm_clientid_gin  
(cost=0.00..7.20 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=1)
    Index Cond: (((title)::text ~~* '%ras du%'::text) 
AND (clientid = '12'::bigint))
  Planning time: 0.169 ms
  Execution time: 0.061 ms
 (13 rows)

 
Forgot to say, this is in PG-9.6 (master), but should work on previous 
versions.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] index on ILIKE/LIKE - PostgreSQL 9.2

2016-05-12 Thread Andreas Joseph Krogh
På torsdag 12. mai 2016 kl. 09:57:58, skrev Andreas Joseph Krogh <
andr...@visena.com <mailto:andr...@visena.com>>:
På torsdag 12. mai 2016 kl. 02:30:33, skrev Lucas Possamai <
drum.lu...@gmail.com <mailto:drum.lu...@gmail.com>>:
Hi there!  
I've got a simple but slow query:
 
 SELECT DISTINCT title  
 FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
 and clientid = 31239  AND time_job > 1457826264
 order BY title
 limit 10 
 
Explain analyze: 
 
Limit  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 
rows=1 loops=1)
   ->  Unique  (cost=5946.40..5946.41 rows=1 width=19) (actual 
time=2746.753..2746.763 rows=1 loops=1)
         ->  Sort  (cost=5946.40..5946.41 rows=1 width=19) (actual 
time=2746.750..2746.754 rows=4 loops=1)
               Sort Key: "title"
               Sort Method: quicksort  Memory: 25kB
               ->  Bitmap Heap Scan on "ja_jobs"  (cost=49.02..5946.39 rows=1 
width=19) (actual time=576.275..2746.609 rows=4 loops=1)
                     Recheck Cond: (("clientid" = 31239) AND ("time_job" > 
1457826264))
                     Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
                     Rows Removed by Filter: 791
                     ->  Bitmap Index Scan on "ix_jobs_client_times" 
 (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 
loops=1)
                           Index Cond: (("clientid" = 31239) AND ("time_job" > 
1457826264))
 Total runtime: 2746.879 ms  
Then, I created a trgm index:
 
CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title 
gin_trgm_ops);
 
Explain analyze after the index: (Yes, I ran the analyze)
 
Limit  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 
rows=0 loops=1)
   ->  Unique  (cost=389.91..389.91 rows=1 width=20) (actual 
time=3720.507..3720.507 rows=0 loops=1)
         ->  Sort  (cost=389.91..389.91 rows=1 width=20) (actual 
time=3720.505..3720.505 rows=0 loops=1)
               Sort Key: "title"
               Sort Method: quicksort  Memory: 25kB
               ->  Bitmap Heap Scan on "ja_jobs"  (cost=385.88..389.90 rows=1 
width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
                     Recheck Cond: (("clientid" = 31239) AND ("time_job" > 
1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
                     Rows Removed by Index Recheck: 4
                     ->  BitmapAnd  (cost=385.88..385.88 rows=1 width=0) 
(actual time=3720.469..3720.469 rows=0 loops=1)
                           ->  Bitmap Index Scan on "ix_jobs_client_times" 
 (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 
loops=1)
                                 Index Cond: (("clientid" = 31239) AND 
("time_job" > 1457826264))
                           ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin" 
 (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 
loops=1)
                                 Index Cond: (("title")::"text" ~~ '%RYAN WER
%'::"text")
 Total runtime: 3720.653 ms   
 
so.. the query is still slow.. 
Do you guys  know what can be done ? related to the ILIKE?

 
cheers
Lucas

 
It uses available indexes, but that isn't good enough.
 
Try including clientid in the index, using the btree_gin extension:
 
CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title 
gin_trgm_ops, clientid);
 
Note that if clientid is a bigint you have to cast the value to bigint for 
btree_gin to use it (note that this isn't necessary if you use a prepared 
statement):
 
SELECT DISTINCT title  
         FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
         and clientid = 31239::bigint  AND time_job > 1457826264
         order BY title
         limit 10

 
Also note that the index cannot and won't be used for sorting. A bitmap-AND is 
also inevitable because GIN-indexes cannot be used for the '>' operator, so PG 
uses the ix_jobs_client_times btree-index and bitmap-ANDs the result.
 
Can you post your complete schema?
 
I created this test:
 
create table ja_jobs(id bigserial primary key, title varchar not null, 
clientid bigint not null, time_job bigint not null);
CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title 
gin_trgm_ops, clientid);

 --- insert some test-data
 
As you see, this uses the index (when casting clientid to bigint):


 
andreak=# explain analyze SELECT DISTINCT title  
 FROM ja_jobs WHERE title ILIKE '%ras du%'
 and clientid = 12::bigint AND time_job > 257826264
 order BY title
 limit 10;
   QUERY 
PLAN   
 
---

Re: [GENERAL] index on ILIKE/LIKE - PostgreSQL 9.2

2016-05-12 Thread Andreas Joseph Krogh
På torsdag 12. mai 2016 kl. 02:30:33, skrev Lucas Possamai <drum.lu...@gmail.com
 <mailto:drum.lu...@gmail.com>>:
Hi there!  
I've got a simple but slow query:
 
 SELECT DISTINCT title  
 FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
 and clientid = 31239  AND time_job > 1457826264
 order BY title
 limit 10 
 
Explain analyze: 
 
Limit  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 
rows=1 loops=1)
   ->  Unique  (cost=5946.40..5946.41 rows=1 width=19) (actual 
time=2746.753..2746.763 rows=1 loops=1)
         ->  Sort  (cost=5946.40..5946.41 rows=1 width=19) (actual 
time=2746.750..2746.754 rows=4 loops=1)
               Sort Key: "title"
               Sort Method: quicksort  Memory: 25kB
               ->  Bitmap Heap Scan on "ja_jobs"  (cost=49.02..5946.39 rows=1 
width=19) (actual time=576.275..2746.609 rows=4 loops=1)
                     Recheck Cond: (("clientid" = 31239) AND ("time_job" > 
1457826264))
                     Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
                     Rows Removed by Filter: 791
                     ->  Bitmap Index Scan on "ix_jobs_client_times" 
 (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 
loops=1)
                           Index Cond: (("clientid" = 31239) AND ("time_job" > 
1457826264))
 Total runtime: 2746.879 ms  
Then, I created a trgm index:
 
CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title 
gin_trgm_ops);
 
Explain analyze after the index: (Yes, I ran the analyze)
 
Limit  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 
rows=0 loops=1)
   ->  Unique  (cost=389.91..389.91 rows=1 width=20) (actual 
time=3720.507..3720.507 rows=0 loops=1)
         ->  Sort  (cost=389.91..389.91 rows=1 width=20) (actual 
time=3720.505..3720.505 rows=0 loops=1)
               Sort Key: "title"
               Sort Method: quicksort  Memory: 25kB
               ->  Bitmap Heap Scan on "ja_jobs"  (cost=385.88..389.90 rows=1 
width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
                     Recheck Cond: (("clientid" = 31239) AND ("time_job" > 
1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
                     Rows Removed by Index Recheck: 4
                     ->  BitmapAnd  (cost=385.88..385.88 rows=1 width=0) 
(actual time=3720.469..3720.469 rows=0 loops=1)
                           ->  Bitmap Index Scan on "ix_jobs_client_times" 
 (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 
loops=1)
                                 Index Cond: (("clientid" = 31239) AND 
("time_job" > 1457826264))
                           ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin" 
 (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 
loops=1)
                                 Index Cond: (("title")::"text" ~~ '%RYAN WER
%'::"text")
 Total runtime: 3720.653 ms   
 
so.. the query is still slow.. 
Do you guys  know what can be done ? related to the ILIKE?

 
cheers
Lucas

 
It uses available indexes, but that isn't good enough.
 
Try including clientid in the index, using the btree_gin extension:
 
CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title 
gin_trgm_ops, clientid);
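 
For completeness: both opclasses used here come from extensions, which must exist in the database before the index can be created. A minimal sketch:
 
 CREATE EXTENSION IF NOT EXISTS pg_trgm;    -- provides gin_trgm_ops
 CREATE EXTENSION IF NOT EXISTS btree_gin;  -- lets GIN index a plain scalar column like clientid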
 
Note that if clientid is a bigint you have to cast the value to bigint for 
btree_gin to use it (note that this isn't necessary if you use a prepared 
statement):
 
SELECT DISTINCT title  
         FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
         and clientid = 31239::bigint  AND time_job > 1457826264
         order BY title
         limit 10

 
Also note that the index cannot and won't be used for sorting. A bitmap-AND is 
also inevitable because GIN-indexes cannot be used for the '>' operator, so PG 
uses the ix_jobs_client_times btree-index and bitmap-ANDs the result.
 
Can you post your complete schema?
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] "Keyed" sequence?

2016-04-28 Thread Andreas Joseph Krogh
På torsdag 28. april 2016 kl. 20:30:20, skrev Israel Brewster <
isr...@ravnalaska.net <mailto:isr...@ravnalaska.net>>:
This is probably crazy talk, but in Postgresql is there any way to have a 
"keyed" sequence? That is, I have a table with a list of departments. While 
*relatively* static, technically speaking this is a dynamic list - someone 
certainly could decide to add a department, or combine two departments into one 
new one, whatever. In any case, I would ideally like to have a sequence per 
department, but since the list is dynamic it's not as simple as creating 33 
independent sequences and then having a lookup table somewhere, although I 
guess I could implement something of the sort with triggers. 
Is there any elegant way of accomplishing this? Something like SELECT 
nextval('department_seq['ops'])?
 
Maybe this will help you: 
http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
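 
For reference, the function from that blog post is roughly along these lines (a sketch; the epoch and the shard id are illustrative values, see the post for the original):
 
 CREATE SCHEMA insta5;
 CREATE SEQUENCE insta5.table_id_seq;
 
 CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$
 DECLARE
     our_epoch  bigint := 1314220021721;  -- custom epoch, in milliseconds
     seq_id     bigint;
     now_millis bigint;
     shard_id   int    := 5;              -- illustrative shard/"department" number
 BEGIN
     SELECT nextval('insta5.table_id_seq') % 1024 INTO seq_id;
     SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
     result := (now_millis - our_epoch) << 23;  -- 41 bits of time
     result := result | (shard_id << 10);       -- 13 bits of shard id
     result := result | (seq_id);               -- 10 bits of per-shard sequence
 END;
 $$ LANGUAGE plpgsql;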
 
Then you can do:
 "id" bigint NOT NULL DEFAULT insta5.next_id(), 
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Initdb --data-checksums by default

2016-04-20 Thread Andreas Joseph Krogh
På onsdag 20. april 2016 kl. 11:22:33, skrev Alex Ignatov <
a.igna...@postgrespro.ru <mailto:a.igna...@postgrespro.ru>>:
[snip] Why do you think that common pg-users don't care about their data?  
Did I say that?
  Also why do we have wal_level=minimal, fsync=on and other stuff?  
To make certain guarantees that data is by default durable.
 

What I'm saying is that everything is a compromise, cost/benefit. The universe 
might explode tomorrow but the chances are slim, so no use preparing for it.
Those caring enough probably use checksums, battery-backed RAID etc.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Initdb --data-checksums by default

2016-04-20 Thread Andreas Joseph Krogh
På onsdag 20. april 2016 kl. 11:02:31, skrev Alex Ignatov <
a.igna...@postgrespro.ru <mailto:a.igna...@postgrespro.ru>>:
  On 20.04.2016 11:40, Andreas Joseph Krogh wrote:
På onsdag 20. april 2016 kl. 10:33:14, skrev Alex Ignatov <
a.igna...@postgrespro.ru <mailto:a.igna...@postgrespro.ru>>:


 On 20.04.2016 11:29, Devrim Gündüz wrote:
 > Hi,
 >
 > On Wed, 2016-04-20 at 10:43 +0300, Alex Ignatov wrote:
 >> Today in Big Data epoch silent data corruption becoming more and more
 >> issue to afraid of. With uncorrectable read error rate ~ 10^-15  on
 >> multiterabyte disk bit rot is the real issue.
 >> I think that today checksumming data  must be mandatory  set by default.
 >> Only if someone doesn't care about his data he can manually turn this
 >> option off.
 >>
 >> What do you think about defaulting --data-checksums in initdb?
 > I think this should be discussed in -hackers, right?
 >
 > Regards,
 May be you right but i want to know what people think about it before
 i'll write to hackers.
 
-1 on changing the default.
 
10^15 ~= 1000 TB, which isn't very common yet. Those having it probably are 
aware of the risk and have enabled checksums already.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 
 It is per bit, not bytes. So it is ~100 TB. We are working with some enterprises 
who have a WAL creation rate of ~4GB per minute - so it is only max 100 days before 
you get bit rot and have a probability of silent data corruption.
 Also don't forget that it is a theoretical limit, and Google tells us that HDD 
 and SSD are not as reliable as manufacturers claim. So this 10^-15 can easily be 
 much higher.  
Ok, but still - the case you're describing isn't the common-case for PG-users. 
Enterprises like that certainly should use --data-checksums, I'm not arguing 
against that, just that it shouldn't be the default-setting.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Initdb --data-checksums by default

2016-04-20 Thread Andreas Joseph Krogh
På onsdag 20. april 2016 kl. 10:33:14, skrev Alex Ignatov <
a.igna...@postgrespro.ru <mailto:a.igna...@postgrespro.ru>>:


 On 20.04.2016 11:29, Devrim Gündüz wrote:
 > Hi,
 >
 > On Wed, 2016-04-20 at 10:43 +0300, Alex Ignatov wrote:
 >> Today in Big Data epoch silent data corruption becoming more and more
 >> issue to afraid of. With uncorrectable read error rate ~ 10^-15  on
 >> multiterabyte disk bit rot is the real issue.
 >> I think that today checksumming data  must be mandatory  set by default.
 >> Only if someone doesn't care about his data he can manually turn this
 >> option off.
 >>
 >> What do you think about defaulting --data-checksums in initdb?
 > I think this should be discussed in -hackers, right?
 >
 > Regards,
 May be you right but i want to know what people think about it before
 i'll write to hackers.
 
-1 on changing the default.
 
10^15 ~= 1000 TB, which isn't very common yet. Those having it probably are 
aware of the risk and have enabled checksums already.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] How to build docs from master?

2016-04-11 Thread Andreas Joseph Krogh
På mandag 11. april 2016 kl. 15:05:06, skrev Artur Zakirov <
a.zaki...@postgrespro.ru <mailto:a.zaki...@postgrespro.ru>>:
On 11.04.2016 15:27, Andreas Joseph Krogh wrote:
 > Hi all.
 > I have installed PG (master) from git, but it seems the docs are not
 > built issuing "make install".

 Yes. You need to run "make world" to build everything or "make docs" to
 build docs. And run "make install-world" to install everything or "make
 install-docs" to install docs.

 > I've tried "make docs" but it fails with:
 >        interface Makefile:127: recipe for target 'HTML.index' failed
 > make[3]: *** [HTML.index] Error 1
 > make[3]: Leaving directory '/home/andreak/dev/postgresql/doc/src/sgml'
 > Makefile:8: recipe for target 'all' failed
 > make[2]: *** [all] Error 2
 > make[2]: Leaving directory '/home/andreak/dev/postgresql/doc/src'
 > Makefile:16: recipe for target 'all' failed
 > make[1]: *** [all] Error 2
 > make[1]: Leaving directory '/home/andreak/dev/postgresql/doc'
 > GNUmakefile:17: recipe for target 'docs' failed
 > make: *** [docs] Error 2
 > Any hints?

 Do you have also the following error?

 ***
 ERROR: `jade' is missing on your system.
 ***

 Did you install necessary tools? You can see it from
 http://www.postgresql.org/docs/devel/static/docguide-toolsets.html
 
Ah, issuing:
 
 apt-get install docbook docbook-dsssl docbook-xsl libxml2-utils openjade1.3 opensp xsltproc
 
did the trick, thanks!

 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] How to build docs from master?

2016-04-11 Thread Andreas Joseph Krogh
Hi all.
 
I have installed PG (master) from git, but it seems the docs are not built 
issuing "make install".
I've tried "make docs" but it fails with:
 
  interface Makefile:127: recipe for target 'HTML.index' failed
 make[3]: *** [HTML.index] Error 1
 make[3]: Leaving directory '/home/andreak/dev/postgresql/doc/src/sgml'
 Makefile:8: recipe for target 'all' failed
 make[2]: *** [all] Error 2
 make[2]: Leaving directory '/home/andreak/dev/postgresql/doc/src'
 Makefile:16: recipe for target 'all' failed
 make[1]: *** [all] Error 2
 make[1]: Leaving directory '/home/andreak/dev/postgresql/doc'
 GNUmakefile:17: recipe for target 'docs' failed
 make: *** [docs] Error 2
 
 
Any hints?
 
Thanks.

 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] Multi-row constraints, how to avoid unnecessary trigger execution?

2016-04-06 Thread Andreas Joseph Krogh
På onsdag 06. april 2016 kl. 19:00:05, skrev Tobia Conforto <
tobia.confo...@gruppo4.eu <mailto:tobia.confo...@gruppo4.eu>>:
Sándor,
 I'd rather have the application developers use regular DML, which could 
become quite complex, and just perform my check on the database side, at 
transaction commit time.

 Andreas,
 thanks, but I need to avoid duplicate executions on different rows too.

 I just came up with this "hack" which seems to be working:

 create or replace function my_trigger() returns trigger as $$
 begin
 create temporary table my_trigger() on commit drop;

 -- perform expensive test here and raise error if it fails
 if ... then
 raise ...;
 end if;

 return null;
 exception when duplicate_table then
 -- already ran in the current transaction, skip test
 return null;
 end;
 $$ language 'plpgsql';

 create constraint trigger my_trigger after insert or update or delete on 
my_table
 initially deferred for each row execute procedure my_trigger();

 Any improvement is welcome.
 
You are aware that your suggestion also will fire for each row?
 
Your function will also suffer from executing multiple times for the same row 
if it's updated multiple times within the same transaction, which my suggestion 
specifically avoids.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Multi-row constraints, how to avoid unnecessary trigger execution?

2016-04-06 Thread Andreas Joseph Krogh
På onsdag 06. april 2016 kl. 18:32:50, skrev Tobia Conforto <
tobia.confo...@gruppo4.eu <mailto:tobia.confo...@gruppo4.eu>>:
I have a complex data validation requirement that spans many rows and possibly 
more than one table.

 The application must be able to perform several data manipulation statements 
that could invalidate the requirement between one another, and only have the 
database check this requirement at transaction commit time.

 Ideally I would have some sort of after trigger that is deferred to commit 
time, have it search for invalid or missing records and raise errors if any is 
found.

 If I'm reading the manual correctly, the only kind of trigger that can be 
deferred to commit time is a constraint trigger. The problem is that this 
trigger must be declared for each row, but I need to only perform the 
validation once per transaction, because it executes complex queries.

 What is the best way to address this issue?

 Is there any other way, other than a constraint trigger, to defer execution 
of a piece of code at transaction commit time?

 Otherwise, can I check for repeated invocations of my trigger function in the 
same transaction and return early on the second and following ones? I could do 
that by creating a temporary table on commit drop, but it seems overkill. Is 
there a lighter solution?
 
Hi.
 
(note that my answer here only prevents executing the trigger-logic more than 
once for each row, so it will fire for each row affected at commit, just not 
multiple times for the same row if it's updated several times in the same 
transaction)
 
The trick is to use constraint-triggers, and to have a condition (column) to 
test for so that it does the actual work only once.
 
Triggers in PG are fired in alphabetical order, so a good naming-scheme for such 
triggers is _1, _2, _3
 
It's the first trigger which does the actual work (in this case 
index_email_1_tf()).
 
I use a special column, t_updated, for checking. This column has no other 
purpose than to help the triggers. Note that trigger _2 is NOT DEFERRED, this 
is important.
 
Here is what I use:
 
-- Trigger function to index email
CREATE OR REPLACE FUNCTION index_email_1_tf() returns TRIGGER AS $$
declare
    v_email message;
BEGIN
    SELECT * FROM message WHERE entity_id = NEW.entity_id INTO v_email;
    perform index_email(v_email);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION index_email_2_tf() returns TRIGGER AS $$
BEGIN
    update message set t_updated = TRUE WHERE entity_id = NEW.entity_id;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION index_email_3_tf() returns TRIGGER AS $$
BEGIN
    update message set t_updated = NULL WHERE entity_id = NEW.entity_id;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER index_email_1_t
    AFTER INSERT OR UPDATE OF re_index ON message
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW WHEN (NEW.t_updated IS NULL)
    EXECUTE PROCEDURE index_email_1_tf();

CREATE CONSTRAINT TRIGGER index_email_2_t
    AFTER INSERT OR UPDATE OF re_index ON message
    -- NOT DEFERRED
    FOR EACH ROW WHEN (NEW.t_updated IS NULL)
    EXECUTE PROCEDURE index_email_2_tf();

CREATE CONSTRAINT TRIGGER index_email_3_t
    AFTER INSERT OR UPDATE OF t_updated ON message
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW WHEN (NEW.t_updated)
    EXECUTE PROCEDURE index_email_3_tf();
 
Hope this helps
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Exclude pg_largeobject form pg_dump

2016-03-08 Thread Andreas Joseph Krogh
På tirsdag 08. mars 2016 kl. 21:03:01, skrev David G. Johnston <
david.g.johns...@gmail.com <mailto:david.g.johns...@gmail.com>>:
On Tue, Mar 8, 2016 at 9:45 AM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote:
På tirsdag 08. mars 2016 kl. 17:38:04, skrev Joshua D. Drake <
j...@commandprompt.com <mailto:j...@commandprompt.com>>:
On 03/08/2016 08:02 AM, Andreas Joseph Krogh wrote:
 > På tirsdag 08. mars 2016 kl. 16:57:01, skrev Tom Lane <t...@sss.pgh.pa.us 
<mailto:t...@sss.pgh.pa.us>
 > <mailto:t...@sss.pgh.pa.us <mailto:t...@sss.pgh.pa.us>>>:
 >
 >     Andreas Joseph Krogh <andr...@visena.com <mailto:andr...@visena.com>> 
writes:
 >      > What I'm looking for is "inverse -b" in an otherwise complete
 >     dump. Any plans
 >      > to add that?
 >
 >     [ shrug... ]  Nobody ever asked for it before.
 >
 >     regards, tom lane
 >
 > It surely helps testing production-datasets which contain lots of BLOBs
 > where one wants to dump the production-data into a test-env. We have
 >  >1TB databases containing > 95% blobs so it would help us tremendously
 > to have this option.

 I have quite a few customers that would benefit from the ability to not
 have blobs present in dumps.
 
Great! So how do we proceed to get "--no-blobs" added to pg_dump?
Maybe CommandPrompt and Visena should co-fund development of such an addition, 
if it's accepted by -hackers?
We'd be willing to pay for such an addition for the 9.5 branch, as a patch.
 
Unfortunately this doesn't qualify as a bug fix - it is a new feature and 
thus is ineligible for inclusion in official 9.5

 

David J.




 
Of course. That's why I mentioned that, if possible, an unofficial patch to 
9.5 could be developed, funded partly by Visena (my company). Given that 
someone is willing to do this of course.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Exclude pg_largeobject form pg_dump

2016-03-08 Thread Andreas Joseph Krogh
På tirsdag 08. mars 2016 kl. 17:38:04, skrev Joshua D. Drake <
j...@commandprompt.com <mailto:j...@commandprompt.com>>:
On 03/08/2016 08:02 AM, Andreas Joseph Krogh wrote:
 > På tirsdag 08. mars 2016 kl. 16:57:01, skrev Tom Lane <t...@sss.pgh.pa.us
 > <mailto:t...@sss.pgh.pa.us>>:
 >
 >     Andreas Joseph Krogh <andr...@visena.com> writes:
 >      > What I'm looking for is "inverse -b" in an otherwise complete
 >     dump. Any plans
 >      > to add that?
 >
 >     [ shrug... ]  Nobody ever asked for it before.
 >
 >     regards, tom lane
 >
 > It surely helps testing production-datasets which contain lots of BLOBs
 > where one wants to dump the production-data into a test-env. We have
 >  >1TB databases containing > 95% blobs so it would help us tremendously
 > to have this option.

 I have quite a few customers that would benefit from the ability to not
 have blobs present in dumps.
 
Great! So how do we proceed to get "--no-blobs" added to pg_dump?
Maybe CommandPrompt and Visena should co-fund development of such an addition, 
if it's accepted by -hackers?
We'd be willing to pay for such an addition for the 9.5 branch, as a patch.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Exclude pg_largeobject form pg_dump

2016-03-08 Thread Andreas Joseph Krogh
På tirsdag 08. mars 2016 kl. 17:10:43, skrev Tom Lane <t...@sss.pgh.pa.us 
<mailto:t...@sss.pgh.pa.us>>:
Andreas Joseph Krogh <andr...@visena.com> writes:
 > På tirsdag 08. mars 2016 kl. 15:43:37, skrev Adrian Klaver <
 > adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>>:
 >  Off hand I would say you are running pg_dump as a user that is not a
 >  superuser:

 > Yes, since when should I not be able to dump a DB (owned by a non-superuser)
 > as that user?

 The problem is that -t '*' is being interpreted as matching system
 catalogs.  You might be able to get somewhere with

 pg_dump -t '*' -N pg_catalog ...

 Probably we should fix pg_dump so it doesn't try to dump system catalogs
 as tables, even if the switches seem to ask it to.

 regards, tom lane
 
That didn't work either:
pg_dump -t '*' -N pg_catalog
 pg_dump: [archiver (db)] query failed: ERROR:  permission denied for relation 
pg_authid
 pg_dump: [archiver (db)] query was: LOCK TABLE pg_catalog.pg_authid IN ACCESS 
SHARE MODE

 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Exclude pg_largeobject form pg_dump

2016-03-08 Thread Andreas Joseph Krogh
På tirsdag 08. mars 2016 kl. 16:57:01, skrev Tom Lane <t...@sss.pgh.pa.us 
<mailto:t...@sss.pgh.pa.us>>:
Andreas Joseph Krogh <andr...@visena.com> writes:
 > What I'm looking for is "inverse -b" in an otherwise complete dump. Any 
plans
 > to add that?

 [ shrug... ]  Nobody ever asked for it before.

 regards, tom lane
 
It surely helps testing production-datasets which contain lots of BLOBs where 
one wants to dump the production-data into a test-env. We have >1TB databases 
containing > 95% blobs so it would help us tremendously to have this option.
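 
A workaround that follows from this behaviour, until such an option exists (a sketch; the schema and database names below are placeholders): restricting the dump to the application schemas with -n makes pg_dump leave the large objects out, without matching the system catalogs the way -t '*' does:
 
 pg_dump -n public -n myschema mydb > mydb-noblobs.dump   # one -n per application schema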
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Exclude pg_largeobject form pg_dump

2016-03-08 Thread Andreas Joseph Krogh
På tirsdag 08. mars 2016 kl. 16:54:19, skrev Adrian Klaver <
adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>>:
On 03/08/2016 07:46 AM, Andreas Joseph Krogh wrote:
 > På tirsdag 08. mars 2016 kl. 16:30:36, skrev Tom Lane <t...@sss.pgh.pa.us
 > <mailto:t...@sss.pgh.pa.us>>:
 >
 >     Andreas Joseph Krogh <andr...@visena.com> writes:
 >      > På tirsdag 08. mars 2016 kl. 15:43:37, skrev Adrian Klaver <
 >      > adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>>:
 >      >  Do you care about not dumping the pg_largeobject table or not
 >     dumping
 >      >  the data it contains?
 >      >
 >      > I have several tables with OID-columns and I'd like to dump my DB
 >     without any
 >      > data in pg_largeobject (> 95% of the space is occupied by data in
 >      > pg_largeobject).
 >      > I've tried to exclude (using -T) the tables containing
 >     OID-columns but
 >      > pg_largeobject is still dumped containing the data it seems.
 >
 >     A look at the pg_dump source code says that it skips blobs if any of
 >     -s, -n, -t are used.  There's a -b switch to undo that and include
 >     them anyway, but no "inverse -b" to skip them in an otherwise-complete
 >     dump.
 >
 >     So you could do something along the lines of pg_dump -t '*' ...
 >     although this will result in *all* non-schema-named objects being
 >     excluded, I believe, which might be a problem.
 >
 >     regards, tom lane
 >
 > Hm:
 > pg_dump -v -t '*' > andreak-noblob.dmp
 > pg_dump: reading extensions
 > pg_dump: identifying extension members
 > pg_dump: reading schemas
 > pg_dump: reading user-defined tables
 > pg_dump: [archiver (db)] query failed: ERROR:  permission denied for
 > relation pg_authid


 Off hand I would say you are running pg_dump as a user that is not a
 superuser:
 [snip]
 
Yes, since when should I not be able to dump a DB (owned by a non-superuser) 
as that user?
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Exclude pg_largeobject form pg_dump

2016-03-08 Thread Andreas Joseph Krogh
På tirsdag 08. mars 2016 kl. 16:30:36, skrev Tom Lane <t...@sss.pgh.pa.us 
<mailto:t...@sss.pgh.pa.us>>:
Andreas Joseph Krogh <andr...@visena.com> writes:
 > På tirsdag 08. mars 2016 kl. 15:43:37, skrev Adrian Klaver <
 > adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>>:
 >  Do you care about not dumping the pg_largeobject table or not dumping
 >  the data it contains?
 >
 > I have several tables with OID-columns and I'd like to dump my DB without 
any
 > data in pg_largeobject (> 95% of the space is occupied by data in
 > pg_largeobject).
 > I've tried to exclude (using -T) the tables containing OID-columns but
 > pg_largeobject is still dumped containing the data it seems.

 A look at the pg_dump source code says that it skips blobs if any of
 -s, -n, -t are used.  There's a -b switch to undo that and include
 them anyway, but no "inverse -b" to skip them in an otherwise-complete
 dump.

 So you could do something along the lines of pg_dump -t '*' ...
 although this will result in *all* non-schema-named objects being
 excluded, I believe, which might be a problem.

 regards, tom lane
 
Hm:
 
pg_dump -v -t '*' > andreak-noblob.dmp
 pg_dump: reading extensions
 pg_dump: identifying extension members
 pg_dump: reading schemas
 pg_dump: reading user-defined tables
 pg_dump: [archiver (db)] query failed: ERROR:  permission denied for relation 
pg_authid
 pg_dump: [archiver (db)] query was: LOCK TABLE pg_catalog.pg_authid IN ACCESS 
SHARE MODE
 
 
What I'm looking for is "inverse -b" in an otherwise complete dump. Any plans 
to add that?
 

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Exclude pg_largeobject form pg_dump

2016-03-08 Thread Andreas Joseph Krogh
På tirsdag 08. mars 2016 kl. 15:43:37, skrev Adrian Klaver <
adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>>:
On 03/08/2016 01:53 AM, Andreas Joseph Krogh wrote:
 > Hi all.
 > Is there a way to exclude pg_largeobject from pg_dump? -T pg_largeobject
 > doesn't work. I've tried to exclude tables using OID-datatype also but
 > that didn't work either.

 Well pg_largeobject is a system catalog so pretty sure it cannot be
 excluded.

 What tables are you trying to exclude with OID?

 Do you care about not dumping the pg_largeobject table or not dumping
 the data it contains?
 
I have several tables with OID-columns and I'd like to dump my DB without any 
data in pg_largeobject (> 95% of the space is occupied by data in 
pg_largeobject).
I've tried to exclude (using -T) the tables containing OID-columns but 
pg_largeobject is still dumped containing the data it seems.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] Exclude pg_largeobject form pg_dump

2016-03-08 Thread Andreas Joseph Krogh
Hi all.
 
Is there a way to exclude pg_largeobject from pg_dump? -T pg_largeobject 
doesn't work. I've tried to exclude tables using OID-datatype also but that 
didn't work either.
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>




Re: [GENERAL] Replacement for Oracle Text

2016-02-19 Thread Andreas Joseph Krogh
På fredag 19. februar 2016 kl. 12:41:49, skrev Daniel Westermann <
daniel.westerm...@dbi-services.com <mailto:daniel.westerm...@dbi-services.com>>:
>>Daniel Westermann schrieb am 19.02.2016 um 11:53:
>>> if I'd need to implement/replace Oracle Text 
(ww.oracle.com/technetwork/testcontent/index-098492.html).
 >>>> What choices do I have in PostgreSQL (9.5+) ?

 >Postgres also has a full text search (which I find much easier to use than 
Oracle's):
 >
 >http://www.postgresql.org/docs/current/static/textsearch.html

 Yes, i have seen this. Can this be used to index and search binary documents, 
e.g. pdf ?

 
What we do is extract plain-text from PDF/Word etc. client-side in the 
application, and then index that in the database.
Works very well.
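 
A minimal sketch of that approach (the table and column names here are hypothetical):
 
 -- store the client-side extracted plain text next to the original document
 CREATE TABLE document (
     id       bigserial PRIMARY KEY,
     blob_oid oid,            -- the original PDF/Word file as a large object
     plain    text NOT NULL   -- plain text extracted in the application
 );
 
 CREATE INDEX document_fts_idx ON document USING gin (to_tsvector('simple', plain));
 
 -- query with the same expression so the index can be used
 SELECT id FROM document
 WHERE to_tsvector('simple', plain) @@ to_tsquery('simple', 'invoice');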
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Fatal error when not numeric value - PostgreSQL 9.2

2016-02-03 Thread Andreas Joseph Krogh
44
  3915135 │ \x1A │ \x1A  │ 
fe88ff8f-6b4d-4e3d-8020-3475a101d25e │ \x1A   │ \x1A  │ \x1A    │ 
fe88ff8f-6b4d-4e3d-8020-3475a101d25e
 (2 rows)
 

 
 
Note that you have a comma after the last paren:
 
 'fe88ff8f-6b4d-4e3d-8020-3475a101d25e'), 
 
I had to remove that to not get this error-message:
 
[42601] ERROR: syntax error at or near ")"
   Position: 428
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Moving a large DB (> 500GB) to another DB with different locale

2016-01-15 Thread Andreas Joseph Krogh
På torsdag 14. januar 2016 kl. 00:34:51, skrev Jim Nasby <
jim.na...@bluetreble.com <mailto:jim.na...@bluetreble.com>>:
On 1/13/16 2:39 PM, Andreas Joseph Krogh wrote:
 > Where can I find more info about how to use and configure pg_logical to
 > replicate a 9.4 DB to 9.5?

 http://2ndquadrant.com/en/resources/pglogical/
 
Thanks, I found detailed instructions in 
/usr/share/doc/postgresql-9.5-pglogical/README.md.gz
Any chance of putting it online?
 
I see that wal_level = 'logical', and that is a problem for us as we already 
use wal_level = 'hot_standby' on this installation as it replicates to another 
server.
 
Is it possible to use pglogical together with hot_standby 
streaming-replication?
 
Thank.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Moving a large DB (> 500GB) to another DB with different locale

2016-01-15 Thread Andreas Joseph Krogh
På fredag 15. januar 2016 kl. 14:33:24, skrev Shulgin, Oleksandr <
oleksandr.shul...@zalando.de <mailto:oleksandr.shul...@zalando.de>>:
On Fri, Jan 15, 2016 at 1:02 PM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: På torsdag 14. januar 2016 kl. 00:34:51, 
skrev Jim Nasby <jim.na...@bluetreble.com <mailto:jim.na...@bluetreble.com>>:
On 1/13/16 2:39 PM, Andreas Joseph Krogh wrote:
 > Where can I find more info about how to use and configure pg_logical to
 > replicate a 9.4 DB to 9.5?

http://2ndquadrant.com/en/resources/pglogical/ 
<http://2ndquadrant.com/en/resources/pglogical/>
 
Thanks, I found detailed instructions in 
/usr/share/doc/postgresql-9.5-pglogical/README.md.gz
Any chance of putting in online?
 
I see that wal_level = 'logical', and that is a problem for us as we already 
use wal_level = 'hot_standby' on this installation as it replicates to another 
server.
 
Is it possible to use pglogical together with hot_standby 
streaming-replication?
 
Well, the wal_level change is just a matter of database restart: you got to do 
that once in a while anyway, e.g. for minor version updates.  I would expect 
you only need this wal_level on the walsender side, thus for pglogical_output, 
the logical decoding plugin.



 
My point is that we cannot not have streaming-replication, so we need to keep 
wal_level = 'hot_standby' AFAIU. Is there a way to do both streaming-replication
and pglogical for just replicating one of many databases in the same cluster?
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Moving a large DB (> 500GB) to another DB with different locale

2016-01-15 Thread Andreas Joseph Krogh
På fredag 15. januar 2016 kl. 16:04:00, skrev Shulgin, Oleksandr <
oleksandr.shul...@zalando.de <mailto:oleksandr.shul...@zalando.de>>:
On Fri, Jan 15, 2016 at 3:41 PM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: På fredag 15. januar 2016 kl. 14:33:24, 
skrev Shulgin, Oleksandr <oleksandr.shul...@zalando.de 
<mailto:oleksandr.shul...@zalando.de>>:
On Fri, Jan 15, 2016 at 1:02 PM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote:  
I see that wal_level = 'logical', and that is a problem for us as we already 
use wal_level = 'hot_standby' on this installation as it replicates to another 
server.
 
Is it possible to use pglogical together with hot_standby 
streaming-replication?
 
Well, the wal_level change is just a matter of database restart: you got to do 
that once in a while anyway, e.g. for minor version updates.  I would expect 
you only need this wal_level on the walsender side, thus for pglogical_output, 
the logical decoding plugin.



 
My point is that we cannot not have streaming-replication, so we need to keep 
wal_level = 'hot_standby' AFAIU. Is there a way to do both streaming-replication
and pglogical for just replicating one of many databases in the same cluster?
 
But logical is "greater than" hot_standby, so you can still have streaming 
replication with wal_level = logical.



 
This answers my initial question, thanks.
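 
For reference, a minimal sketch of the relevant primary-side settings (values are illustrative; changing wal_level requires a restart):
 
 wal_level = logical            # superset of hot_standby, so physical streaming replication keeps working
 max_wal_senders = 10           # physical + logical walsenders
 max_replication_slots = 10     # pglogical uses replication slots
 max_worker_processes = 10      # pglogical runs background workers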
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Moving a large DB (> 500GB) to another DB with different locale

2016-01-13 Thread Andreas Joseph Krogh
På onsdag 13. januar 2016 kl. 03:58:27, skrev John McKown <
john.archie.mck...@gmail.com <mailto:john.archie.mck...@gmail.com>>:
On Tue, Jan 12, 2016 at 4:09 PM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote:
På tirsdag 12. januar 2016 kl. 22:56:56, skrev John R Pierce <
pie...@hogranch.com <mailto:pie...@hogranch.com>>:
On 1/12/2016 1:20 PM, Andreas Joseph Krogh wrote:
 > I'm moving a > 500GB DB to another server which is initdb'ed with a
 > different locale (nb_NO.UTF-8 to get correct collation when ORDER BY).
 > Is there another option than pg_dump/restore for doing this?

 nope, because the text data has to be converted to the new encoding,
 indexes rebuilt since the collation order is different, etc.
 
I was pretty sure there wasn't, but had to ask before imposing downtime on our 
customers.
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 

 
​I install, maintain, and use PostgreSQL on Fedora Linux. But only as a type 
of "power" user. I'm not a DBA​. Nor do I have good knowledge of PostgreSQL 
internals. What I gather you are considering is something like:
 
server1: pg_dump one or more data bases to a file on server1 (or on an NFS / 
CIFS shared NAS box of some sort)
server1: transfer that file to server2 if necessary (i.e. if the file dumped 
on server1 is not readable directly on server2.
server2: pg_restore the data from the file.
 
What I am wondering is this: Is there some way to make the PostgreSQL instance 
on server2 be accessible, say via FDW, to PostgreSQL on server1? Or maybe vice 
versa. 
ref: http://www.postgresql.org/docs/current/static/postgres-fdw.html 
<http://www.postgresql.org/docs/current/static/postgres-fdw.html>
if so, would it be possible to do something like:
 
on server1:
 
CREATE EXTENSION postgres_fdw;
CREATE SERVER server2
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS(host 'a.b.c.d', port '5432', dbname 'bubba')
;
CREATE USER MAPPING FOR CURRENT_USER  -- or a specific local role
    SERVER server2
    OPTIONS(user 'remote_user', password 'password')
;
CREATE FOREIGN TABLE server2_sometable (
   -- duplicate definition of "sometable" on server2
) SERVER server2 OPTIONS(schema_name 'public', table_name 'sometable')
;
INSERT INTO server2_sometable (COL1, COL2, COL3, ...) SELECT COL1, COL2, COL3, 
... FROM sometable;
 
This is most likely a stupid thought. And I don't know if it would address the 
locale and collation issue or not. If nothing else, someone will explain 
(hopefully kindly) why this is a bad idea. Such as performance or some such 
thing.
 
An auxiliary thought, if the communications speed between server1 & server2 is 
"poor", would be to have two copies of PostgreSQL running on server1. The 
second Postgres would write to a filesystem on an SSD connected to Server1, 
either via SATA, eSATA, or maybe USB. I would hope that this would be faster 
than using a 1Gig (or even 10 Gig) IP connection. Once the copy is complete, 
stop the second Postgres instance, unmount the filesystem, move the SSD to 
"server2" hardware platform.
 
Just some wacky thoughts. They are right at home in my head.


 
Note that I'm searching for a solution for moving a single (out of many) 
database in a cluster, not just some tables.
This database contains about 400 tables, triggers, constraints (also 
exclusion-constraints) PL/pgSQL functions etc.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Moving a large DB (> 500GB) to another DB with different locale

2016-01-13 Thread Andreas Joseph Krogh
På onsdag 13. januar 2016 kl. 04:17:03, skrev Vick Khera <vi...@khera.org 
<mailto:vi...@khera.org>>:
  On Tue, Jan 12, 2016 at 4:20 PM, Andreas Joseph Krogh <andr...@visena.com 
<mailto:andr...@visena.com>> wrote: I'm moving a > 500GB DB to another server 
which is initdb'ed with a different locale (nb_NO.UTF-8 to get correct 
collation when ORDER BY).
Is there another option than pg_dump/restore for doing this?
 


 I recently used slony to move data stored in an SQL_ASCII db to one that was 
strict UTF-8. I wrote up a program that would fix any invalid byte sequences in 
the original db first. It went very well, and we had very minimal downtime.
 
It seems to me you could use slony to accomplish your goal as well of just 
copying the data into a new DB with the different collation setting. Once done, 
just stop your application for a minute or two to switch the DB it uses, and 
drop the slony replication. This process will work if you can assure that the 
dump/restore would have worked too. That is, there are no invalid data.

 
Seems like pglogical might be better? 
http://2ndquadrant.com/en/resources/pglogical/
It's available for 9.4 also.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: [GENERAL] Moving a large DB (> 500GB) to another DB with different locale

2016-01-13 Thread Andreas Joseph Krogh
På onsdag 13. januar 2016 kl. 19:18:20, skrev Jim Nasby <
jim.na...@bluetreble.com <mailto:jim.na...@bluetreble.com>>:
On 1/13/16 2:03 AM, Andreas Joseph Krogh wrote:
 > Seems like pglogical migth be better?
 > http://2ndquadrant.com/en/resources/pglogical/
 > It's available for 9.4 also.

 It would certainly be faster. It's also less tested than Slony is
 though, as it's fairly new. If it was me, I'd use pg_logical.
 --
 Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
 Experts in Analytics, Data Architecture and PostgreSQL
 Data in Trouble? Get it in Treble! http://BlueTreble.com
 
Where can I find more info about how to use and configure pg_logical to 
replicate a 9.4 DB to 9.5?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


[GENERAL] Moving a large DB (> 500GB) to another DB with different locale

2016-01-12 Thread Andreas Joseph Krogh
Hi all.
 
I'm moving a > 500GB DB to another server which is initdb'ed with a different 
locale (nb_NO.UTF-8 to get correct collation when ORDER BY).
Is there another option than pg_dump/restore for doing this?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>



