Re: [GENERAL] idle in transaction, why

2017-11-07 Thread Rob Sargent

> On Nov 7, 2017, at 12:16 AM, Thomas Kellerer <spam_ea...@gmx.net> wrote:
> 
> Rob Sargent schrieb am 06.11.2017 um 23:09:
>> Gosh I wish I could learn to proof-read my posts.
>> My support crew graciously set
>> 
>> idle_transaction_timeout = 1
>> 
>> Now to ponder if I need zero or some large number.
> 
> The unit of that setting is milliseconds (if no unit is specified). 
> zero disables that feature.
> 
> One millisecond seems like an awfully short period to allow a transaction to 
> be idle. 
> 
> I would figure values in "minutes" to be more realistic depending on the 
> workload and characteristics of the application. 
> 
> A transaction that has several seconds of "think time" between individual 
> statements doesn't seem that unrealistic. 
> 
> 
> Thomas
> 
I see I didn’t clarify that the timeout was set in the pgbouncer configuration. 
(I had shown upstream that it was NOT set for the  postgres server.) In 
pgbouncer, the unit is seconds.
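For the archives, the relevant pgbouncer.ini entry looks like the following (the 
300 is just an illustrative five-minute value, not a recommendation):

   [pgbouncer]
   idle_transaction_timeout = 300   ; seconds; 0 disables the check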



Re: [GENERAL] idle in transaction, why

2017-11-07 Thread Rob Sargent



On 11/07/2017 09:09 AM, Scott Marlowe wrote:

On Tue, Nov 7, 2017 at 7:44 AM, Rob Sargent <robjsarg...@gmail.com> wrote:



On Nov 7, 2017, at 12:16 AM, Thomas Kellerer <spam_ea...@gmx.net> wrote:

I would figure values in "minutes" to be more realistic depending on the 
workload and characteristics of the application.

A transaction that has several seconds of "think time" between individual 
statements doesn't seem that unrealistic.


I started with the default zero and the save went through perfectly. It takes 
ten minutes, though, so I imagine I’ll have a concurrency issue.

10 minutes is long-ish, but if it's not run all the time it may be
what you're stuck with. Idle in transaction doesn't necessarily mean
concurrency issues, as long as you're not operating on a whole table
other people need to update as well. I guess now's a good time to
profile what your code is doing and what's taking the longest, and see if
maybe you can split that big transaction up into bite-sized pieces.
Short story: The save operation in question is insert only. Done in a 
single transaction (currently).
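If I do end up chunking it as Scott suggests, I picture something along these 
lines. This is only a rough plain-JDBC sketch; the table, columns, domain class 
and batch size are illustrative, not what the code does today.

private static final int CHUNK_SIZE = 10_000;   // illustrative batch size

void saveSegmentsChunked(Connection conn, List<Segment> segments) throws SQLException {
    conn.setAutoCommit(false);
    String sql = "insert into segment (id, startbase, endbase) values (?, ?, ?)";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        int n = 0;
        for (Segment s : segments) {
            ps.setObject(1, s.getId());        // uuid key
            ps.setLong(2, s.getStartbase());
            ps.setLong(3, s.getEndbase());
            ps.addBatch();
            if (++n % CHUNK_SIZE == 0) {
                ps.executeBatch();
                conn.commit();                 // each chunk is its own transaction
            }
        }
        ps.executeBatch();                     // flush and commit the remainder
        conn.commit();
    }
}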


The last run generated 1.7M new 'segments', each of those grouped into 
one of 131K segmentsets (so another 1.7M members), with those 131K sets 
recorded as 'outputs'. In production we'll start 22 jobs, one per 
chromosome (no X/Y), for each pedigree.  We spread those across machines, 
starting at roughly the same time.  Analysis time is measured in days 
and is pretty much linear with chromosome size (and inverse to 
processor count), but I'm pretty sure at some point two of those will send 
their payloads within minutes of each other.  You know they will.


Do two sets of writes to the same table interfere with each other horribly?




Re: [GENERAL] idle in transaction, why

2017-11-07 Thread Rob Sargent


> On Nov 7, 2017, at 12:16 AM, Thomas Kellerer <spam_ea...@gmx.net> wrote:
> 
> Rob Sargent schrieb am 06.11.2017 um 23:09:
>> Gosh I wish I could learn to proof-read my posts.
>> My support crew graciously set
>> 
>> idle_transaction_timeout = 1
>> 
>> Now to ponder if I need zero or some large number.
> 
> The unit of that setting is milliseconds (if no unit is specified). 
> zero disables that feature.
> 
> One millisecond seems like an awfully short period to allow a transaction to 
> be idle. 
> 
> I would figure values in "minutes" to be more realistic depending on the 
> workload and characteristics of the application. 
> 
> A transaction that has several seconds of "think time" between individual 
> statements doesn't seem that unrealistic. 
> 
> 
> Thomas
> 
> 
> 
I started with the default zero and the save went through perfectly. It takes 
ten minutes, though, so I imagine I’ll have a concurrency issue.




Re: [GENERAL] idle in transaction, why

2017-11-06 Thread Rob Sargent



On 11/06/2017 02:38 PM, Merlin Moncure wrote:

On Mon, Nov 6, 2017 at 2:50 PM, Rob Sargent <robjsarg...@gmail.com> wrote:


On 11/06/2017 01:41 PM, Tom Lane wrote:

Rob Sargent <robjsarg...@gmail.com> writes:

idle_in_transaction_session_timeout | 0   | default |
|| A value of 0 turns off the timeout. | user

Meh.  I think we're barking up the wrong tree anyway: so far as I can
find, there is no error message reading 'idle transaction timeout'
in the existing PG sources (and I sure hope no committer would have
thought that such an ambiguous message text was satisfactory).
So I think your error is coming from client-side or third-party code.
What other moving parts have you got in there?

 regards, tom lane

The most likely culprit is JOOQ, which I chose as a learning experience
(normally I use ORM tools).  But that said, I just ran the same data into my
test env, (postgres 10.0 (real) on centos 6.9, ubuntu client) and all went
swimmingly.  It's a sizable payload (several batches of over 100K items,
deserialized from json) and takes 5 minutes to save.

I was hoping to blame the virt or the beta.  Not a good time to start 
doubting JOOQ.

I can't speak to JOOQ (who are fantastic postgres supporters BTW), but
in the java world this typically comes from one of two things:

1) you have a long-running in-transaction process that has very heavy
computation between sql statements.  this is a rare case
--or--
2) you are using connection pooling and the app sent a connection back into
the pool without having committed its transaction.

"2" is a common and dangerous bug.  It can happen due to a bug in
application code (most likely), the jdbc wrapping library code (less
likely), or the connection pooler itself if you're using one.  A
typical cause of application-side problems is manual transaction
management combined with uncaught exception paths where an error (say, a
duplicate key error) leaves the transaction open.  So investigate causes
like that first (database errors in the database log might be a helpful
clue) and go from there.   If the problem is within JOOQ, you ought to take
it up with them, which I encourage you to do, since I consider JOOQ to be a
wonderful treatment of SQL integration from the java perspective.
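
In rough JDBC terms (all names here are illustrative, not anyone's actual
code) the dangerous shape is something like:

void saveAll(DataSource ds) {
    try (Connection conn = ds.getConnection()) {   // checked out from the pool
        conn.setAutoCommit(false);
        insertSegments(conn);                      // throws, say, a duplicate-key SQLException
        conn.commit();
    } catch (SQLException e) {
        log.warn("save failed", e);                // swallowed: no rollback issued
    }
    // try-with-resources has now "closed" the connection, i.e. handed it back
    // to the pool; unless the pool rolls back on return, that session sits
    // idle in transaction until something times it out.
}

The cure is to roll back (or commit) on every path before the connection
goes back to the pool.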

merlin

Gosh I wish I could learn to proof-read my posts.
My support crew graciously set

   idle_transaction_timeout = 1

Now to ponder if I need zero or some large number.

Thanks again


Re: [GENERAL] idle in transaction, why

2017-11-06 Thread Rob Sargent



On 11/06/2017 02:38 PM, Merlin Moncure wrote:

On Mon, Nov 6, 2017 at 2:50 PM, Rob Sargent <robjsarg...@gmail.com> wrote:


On 11/06/2017 01:41 PM, Tom Lane wrote:

Rob Sargent <robjsarg...@gmail.com> writes:

idle_in_transaction_session_timeout | 0   | default |
|| A value of 0 turns off the timeout. | user

Meh.  I think we're barking up the wrong tree anyway: so far as I can
find, there is no error message reading 'idle transaction timeout'
in the existing PG sources (and I sure hope no committer would have
thought that such an ambiguous message text was satisfactory).
So I think your error is coming from client-side or third-party code.
What other moving parts have you got in there?

 regards, tom lane

The most likely culprit is JOOQ, which I chose as a learning experience
(normally I use ORM tools).  But that said, I just ran the same data into my
test env, (postgres 10.0 (real) on centos 6.9, ubuntu client) and all went
swimmingly.  It's a sizable payload (several batches of over 100K items,
deserialized from json) and takes 5 minutes to save.

I was hoping to blame the virt or the beta.  Not a good time to start 
doubting JOOQ.

I can't speak to JOOQ (who are fantastic postgres supporters BTW), but
in the java world this typically comes from one of two things:

1) you have a long-running in-transaction process that has very heavy
computation between sql statements.  this is a rare case
--or--
2) you are using connection pooling and the app sent a connection back into
the pool without having committed its transaction.

"2" is a common and dangerous bug.  It can happen due to a bug in
application code (most likely), the jdbc wrapping library code (less
likely), or the connection pooler itself if you're using one.  A
typical cause of application-side problems is manual transaction
management combined with uncaught exception paths where an error (say, a
duplicate key error) leaves the transaction open.  So investigate causes
like that first (database errors in the database log might be a helpful
clue) and go from there.   If the problem is within JOOQ, you ought to take
it up with them, which I encourage you to do, since I consider JOOQ to be a
wonderful treatment of SQL integration from the java perspective.

merlin


"2" definitely fits this bill.  The difference between test and prod is 
pgbouncer, which I've neglected to mention at all in this thread.  I do 
start a transaction in my code, a la:


public void writedb(DSLContext ctx) {
    logger.error("{}: start transaction at {}", getRunTag(),
                 System.currentTimeMillis());
    ctx.transaction(ltx -> {
        startProcess(ctx);
        writeSegments(ctx);
        finishProcess(ctx);
    });
    logger.error("{}: end transaction at {}", getRunTag(),
                 System.currentTimeMillis());
}

But I don't think this is out of the ordinary.  However, writing lists 
with up to 1,175,151 records might not be (two this size, two more at 131K). 
I'll take this up with JOOQ and pgbouncer.  (fasterxml is having trouble 
with this size too.  Not sure how I can break this up if that needs to be 
done; they're all generated from the same analysis run.)
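
One more thing I'll ask the JOOQ folks about: whether the calls inside that 
lambda should go through a context derived from ltx rather than the outer 
ctx, so they're guaranteed to run on the transaction's connection.  If I'm 
reading their transaction API correctly, that would look roughly like:

ctx.transaction(ltx -> {
    // DSL.using(ltx) yields a DSLContext bound to the transaction's
    // configuration, so these statements share the transaction's connection
    DSLContext txCtx = DSL.using(ltx);
    startProcess(txCtx);
    writeSegments(txCtx);
    finishProcess(txCtx);
});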


Thanks to all.






Re: [GENERAL] idle in transaction, why

2017-11-06 Thread Rob Sargent



On 11/06/2017 01:50 PM, Rob Sargent wrote:



On 11/06/2017 01:41 PM, Tom Lane wrote:

Rob Sargent <robjsarg...@gmail.com> writes:

   idle_in_transaction_session_timeout | 0   | default |
|| A value of 0 turns off the timeout. | user

Meh.  I think we're barking up the wrong tree anyway: so far as I can
find, there is no error message reading 'idle transaction timeout'
in the existing PG sources (and I sure hope no committer would have
thought that such an ambiguous message text was satisfactory).
So I think your error is coming from client-side or third-party code.
What other moving parts have you got in there?

regards, tom lane
The most likely culprit is JOOQ, which I chose as a learning 
experience (normally I use ORM tools).  But that said, I just ran the 
same data into my test env, (postgres 10.0 (real) on centos 6.9, 
ubuntu client) and all went swimmingly.  It's a sizable payload 
(several batches of over 100K items, deserialized from json) and takes 
5 minutes to save.


I was hoping to blame the virt or the beta.  Not a good time to start 
doubting JOOQ.

My bet is that those 'org.postgres' messages came from the jdbc driver.




Re: [GENERAL] idle in transaction, why

2017-11-06 Thread Rob Sargent



On 11/06/2017 01:41 PM, Tom Lane wrote:

Rob Sargent <robjsarg...@gmail.com> writes:

   idle_in_transaction_session_timeout | 0   | default |
|| A value of 0 turns off the timeout. | user

Meh.  I think we're barking up the wrong tree anyway: so far as I can
find, there is no error message reading 'idle transaction timeout'
in the existing PG sources (and I sure hope no committer would have
thought that such an ambiguous message text was satisfactory).
So I think your error is coming from client-side or third-party code.
What other moving parts have you got in there?

regards, tom lane
The most likely culprit is JOOQ, which I chose as a learning experience 
(normally I use ORM tools).  But that said, I just ran the same data 
into my test env, (postgres 10.0 (real) on centos 6.9, ubuntu client) 
and all went swimmingly.  It's a sizable payload (several batches of 
over 100K items, deserialized from json) and takes 5 minutes to save.


I was hoping to blame the virt or the beta.  Not a good time to start 
doubting JOOQ.





Re: [GENERAL] idle in transaction, why

2017-11-06 Thread Rob Sargent


On 11/06/2017 01:17 PM, Tom Lane wrote:

"David G. Johnston"  writes:

You should probably login as your application user and do "show
idle_in_transaction_session_timeout" to see what a clean session has for a
value and then figure out from there where that value is coming from.

You don't have to guess about the latter: the pg_settings view will tell
you exactly where the active value came from.  See the source, sourcefile,
sourceline columns.

regards, tom lane
select name, setting, source, sourcefile, sourceline, extra_desc, 
context from pg_settings where name ~ 'idle';
                name                 | setting | source  | sourcefile | sourceline |              extra_desc               | context
-------------------------------------+---------+---------+------------+------------+---------------------------------------+---------
 idle_in_transaction_session_timeout | 0       | default |            |            | A value of 0 turns off the timeout.   | user
 tcp_keepalives_idle                 | 7200    | default |            |            | A value of 0 uses the system default. | user
(2 rows)





Re: [GENERAL] idle in transaction, why

2017-11-06 Thread Rob Sargent


On 11/06/2017 01:09 PM, David G. Johnston wrote:
On Mon, Nov 6, 2017 at 12:32 PM, Rob Sargent <robjsarg...@gmail.com> wrote:


Using postgres 10-beta3 (hopefully 10.0 this week) on virtual
CentOS7 and this JDBC driver postgresql:42.1.4


The postgresql.conf file has

#idle_in_transaction_session_timeout = 0        # in milliseconds, 0 is disabled


​There are numerous places where default settings can be configured.

https://www.postgresql.org/docs/10/static/config-setting.html

You should probably login as your application user and do "show 
idle_in_transaction_session_timeout" to see what a clean session has 
for a value and then figure out from there where that value is coming 
from.


David J.


From logging in with the application role I get

   coon=> show idle_in_transaction_session_timeout;
    idle_in_transaction_session_timeout
   -------------------------------------
    0
   (1 row)



[GENERAL] idle in transaction, why

2017-11-06 Thread Rob Sargent
Using postgres 10-beta3 (hopefully 10.0 this week) on virtual CentOS7 
and this JDBC driver postgresql:42.1.4



The postgresql.conf file has

   #idle_in_transaction_session_timeout = 0        # in milliseconds, 0 is disabled

my db url has "?prepareThreshold=0" since I bump into "already defined" 
statements otherwise
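The full url is of this shape (host, port and database name are made up here):

   jdbc:postgresql://dbhost:5432/mydb?prepareThreshold=0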


but I'm getting

   org.jooq.exception.DataAccessException: SQL [select
   "projectfile"."id", "projectfile"."name", "projectfile"."filetype",
   "projectfile"."uri", "projectfile"."people_id" from "projectfile"
   where "projectfile"."uri" = ?]; ERROR: idle transaction timeout

   

   2017-11-06T11:32:20-07:00  -  -  - Caused by:
   org.postgresql.util.PSQLException: ERROR: idle transaction timeout
   2017-11-06T11:32:20-07:00  -  -  - at
   
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2477)
   2017-11-06T11:32:20-07:00  -  -  - at
   
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2190)
   2017-11-06T12:12:43-07:00  -  -  - at
   org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300)
   2017-11-06T12:12:43-07:00  -  -  - at
   org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
   2017-11-06T12:12:43-07:00  -  -  - at
   org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
   2017-11-06T12:12:43-07:00  -  -  - at
   
org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:169)
   2017-11-06T12:12:43-07:00  -  -  - at
   org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:158)

at various calls, sometimes the first one, sometimes several calls later 
(some selects, some inserts), all in a transaction initiated in my code. 
From what I can piece together from the bind values all the queries are 
sound - at least the selects expected to find things work as expected.



Where should I be looking?




Re: [GENERAL] gin index trouble

2017-11-03 Thread Rob Sargent



On 10/30/2017 10:56 AM, Peter Geoghegan wrote:

On Mon, Oct 30, 2017 at 9:45 AM, Rob Sargent <robjsarg...@gmail.com> wrote:

Peter, you beat me to the punch.  I was just about to say "Having read the
referenced message I thought I would add that we never delete from this
table."  In this particular case it was written to record by record, in a
previous execution and at the time of the error it was only being read.  (In
case you've been following, the failed execution would have added ~1M
"segments", each which references an entry in the gin'd table "probandsets"
- but like a rookie I'm looking up each probandset(2^16) individually.
Re-working that NOW.)

It's not surprising that only a SELECT statement could see this
problem. I guess that it's possible that only page deletions used for
the pending list are involved here.

I'm not sure how reliably you can recreate the problem, but if it
doesn't take too long then it would be worth seeing what effect
turning off the FASTUPDATE storage parameter for the GIN index has.
That could prevent the problem from recurring, and would support my
theory about what's up here. (It wouldn't fix the corruption, though.)

Of course, what I'd much prefer is a self-contained test case. But if
you can't manage that, or if reproducing the issue takes hours, then
this simpler experiment might be worthwhile.


My test database machine is:

   Not virtual
   Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz (quad core, hyperthreaded)
   MemTotal:   16272548 kB
   default postgres.conf from yum install postgresql-10*

I've loaded thrice the number of records (190K) into the problem table, 
but no sign yet of the problem. But unlike the production 
lookup-notfind-insert (anti)pattern, these were all loaded in a single 
transaction. I think the following query has to read the gin'd column of 
every record:


   select array_length(probands,1) as heads,
   count(*) as occurs
   from tld.probandset
   where probands @>
   '{65fe3b60-1c86-4b14-a85d-21abdf68f9e2,f0963403-3f3c-426d-a828-b5bfff914bb4}'
   group by array_length(probands,1)
   order by array_length(probands,1);

 heads | occurs
-------+--------
     2 |      1
     3 |     14
     4 |     91
     5 |    364
     6 |   1001
     7 |   2002
     8 |   3003
     9 |   3432
    10 |   3003
    11 |   2002
    12 |   1001
    13 |    364
    14 |     91
    15 |     14
    16 |      1
   (15 rows)

   Time: 17.125 ms

Happy as a clam.

I'll try a run of the antipattern.  I have NOT diddled FASTUPDATE at all.





Re: [GENERAL] query not scaling

2017-10-31 Thread Rob Sargent

On 10/31/2017 03:12 AM, Laurenz Albe wrote:

Rob Sargent wrote:

I think your biggest problem is the join condition
 on m.basepos between s.startbase and s.endbase

That forces a nested loop join, which cannot be performed efficiently.

Agree! 800,000 * 4,000 = 3,200,000,000.  It's just that I thought I had
corralled that problem with indexing, but apparently not.  I was hoping
some kind soul might point out a missing index or similar correction.  I
have completely reworked the process, but I'm not sure yet if it's correct.
(The slow answer is correct, once it comes in.)

You can create indexes that are useful for this query:

ON sui.segment(chrom, markerset_id)
ON sui.probandset(people_id)

But that probably won't make a big difference, because the sequential
scans take only a small fraction of your query time.

A little less than half of the query time is spent in the nested loop
join, and a little more than half of the time is spent doing the
GROUP BY.

Perhaps the biggest improvement you can easily make would be to
get rid of "numeric" for the computation.  I suspect that this is
where a lot of time is spent, since the hash aggregate is over
less than 15000 rows.

Unless you really need the precision of "numeric", try

CREATE OR REPLACE FUNCTION pv(l bigint, e bigint, g bigint, o int)
RETURNS double precision LANGUAGE sql AS
$$SELECT (g+e+o)::double precision / (l+e+g+o)::double precision$$;

Yours,
Laurenz Albe
In practice markersets are always aligned with one chromosome so I would 
not expect this to have an effect.  There's no constraint on this 
however, and there can be more than one markerset per chromosome.  I 
have played with indexing on segment.markerset_id.


In all the data sets used in the examples (runtimes, explains etc) there 
has been a single people_id across the existing segment data.  Down 
the road this of course will not be the case, and I can see the value of 
an index on probandset.people_id eventually.  I can certainly add it now 
for a test.  I'm currently writing a probandset loader hoping to get a 
test case for the problem with gin indexing mentioned up-thread.


I think I'm most surprised at the notion that the arithmetic is the 
problem and will happily test your suggestion to force floating point 
values.  The value can get small (10^-12 on a good day!) but we don't 
need many digits of precision.


Thanks





Re: [GENERAL] pg_audit to mask literal sql

2017-10-30 Thread Rob Sargent



On 10/30/2017 03:35 PM, John R Pierce wrote:

On 10/30/2017 10:55 AM, rakeshkumar464 wrote:

Is there a way in pgaudit to mask literal sqls like the below:

insert into table (col1,col2) values(1,2)
select * from table where col1 = 1

These sqls are typed by our QA folks using pgadmin. pgaudit records this
verbatim which runs afoul of our HIPAA requirement.  Prepared 
statements are

not an issue since pgaudit provides a way to suppress values.


if you have a HIPAA requirement that says 'dont run manual sql 
statements', then, well, DONT.


why are QA folks making changes on production databases, anyways?   
thats not within their domain.   QA should be working on development 
or staging databases.




I suspect the QA types are testing against production and using/seeing 
real names, etc. with queries which create /transitory/ tables.  I wonder 
if the QA folks have been HIPAA certified?  Probably better to get them 
redacted data for testing.





Re: [GENERAL] gin index trouble

2017-10-30 Thread Rob Sargent



On 10/30/2017 10:56 AM, Peter Geoghegan wrote:

On Mon, Oct 30, 2017 at 9:45 AM, Rob Sargent <robjsarg...@gmail.com> wrote:

Peter, you beat me to the punch.  I was just about to say "Having read the
referenced message I thought I would add that we never delete from this
table."  In this particular case it was written to record by record, in a
previous execution and at the time of the error it was only being read.  (In
case you've been following, the failed execution would have added ~1M
"segments", each which references an entry in the gin'd table "probandsets"
- but like a rookie I'm looking up each probandset(2^16) individually.
Re-working that NOW.)

It's not surprising that only a SELECT statement could see this
problem. I guess that it's possible that only page deletions used for
the pending list are involved here.

I'm not sure how reliably you can recreate the problem, but if it
doesn't take too long then it would be worth seeing what effect
turning off the FASTUPDATE storage parameter for the GIN index has.
That could prevent the problem from recurring, and would support my
theory about what's up here. (It wouldn't fix the corruption, though.)

Of course, what I'd much prefer is a self-contained test case. But if
you can't manage that, or if reproducing the issue takes hours, then
this simpler experiment might be worthwhile.

I can reload the gin'd table repeatedly in a dev environment. Does 
select * from <the table> order by <gin'd column> expose the corruption, or 
does the load itself necessarily fail at the moment of corruption?






Re: [GENERAL] gin index trouble

2017-10-30 Thread Rob Sargent



On 10/30/2017 10:32 AM, Peter Geoghegan wrote:

On Fri, Oct 27, 2017 at 3:15 PM, Rob Sargent <rsarg...@xmission.com> wrote:

I’ve hit this same message

Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN
page is of different type

in a couple of contexts and I’m starting to get worried.

I’ve rebuilt the index, but will that help?
Is there a way to see what the ‘different type’ is?
Is it caught/clean-up by vacuum analyse or some such?

Is there a lot of churn on this table? Do you either heavily update or
heavily delete rows in the table? Does vacuum tend to run on the table
rather frequently?

Peter, you beat me to the punch.  I was just about to say "Having read 
the referenced message I thought I would add that we never delete from 
this table."  In this particular case it was written to record by 
record, in a previous execution and at the time of the error it was only 
being read.  (In case you've been following, the failed execution would 
have added ~1M "segments", each which references an entry in the gin'd 
table "probandsets" - but like a rookie I'm looking up each 
probandset(2^16) individually. Re-working that NOW.)





Re: [GENERAL] gin index trouble

2017-10-30 Thread Rob Sargent



If you can make a test case that (eventually) hits that, we'd be
interested to see it ...



Any hint(s) on what might trigger this sort of thing?  I could duplicate 
the upload, but I doubt you want the 800K records, 200M input file even 
if it did regenerate the problem.


Would select * from <the table> order by <the gin'd column> show the message?

and I um, er, enabled gin on uuid by copying from a thread in this list, as 
follows:
create operator class _uuid_ops
default for type _uuid
using gin as
operator 1 &&(anyarray, anyarray)
,operator 2 @>(anyarray, anyarray)
,operator 3 <@(anyarray, anyarray)
,operator 4 =(anyarray, anyarray)
,function 1 uuid_cmp(uuid, uuid)
,function 2 ginarrayextract(anyarray, internal, internal)
,function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal)
,function 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal)
,storage uuid;

You should not have needed to do that, I think, as the standard
anyarray GIN opclass should've handled it.  Having said that,
I don't immediately see anything broken about this definition,
so it seems like it should've worked.


Good to hear.

Thanks.





[GENERAL] gin index trouble

2017-10-30 Thread Rob Sargent
I’ve hit this same message 
Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN page 
is of different type
in a couple of contexts and I’m starting to get worried. 
I’ve rebuilt the index, but will that help? 
Is there a way to see what the ‘different type’ is? 
Is it caught/clean-up by vacuum analyse or some such?

I’ve had good results using “<@" and “@>” and believe I've defended the use of 
an array, but I can’t lose three days' worth of simulations to this dang wrong 
sibling.

select version(); — will use production release of 10 next week.
                                                   version
------------------------------------------------------------------------------------------------------------
 PostgreSQL 10beta3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit


The only gin index I have is in this table definition:
\d sui.probandset
              Table "sui.probandset"
   Column    |       Type       | Modifiers
-------------+------------------+-----------
 id          | uuid             | not null
 name        | text             |
 probands    | uuid[]           | not null
 meioses     | integer          |
 min_kincoef | double precision |
 max_kincoef | double precision |
 people_id   | uuid             | not null
Indexes:
    "probandset_pkey" PRIMARY KEY, btree (id)
    "probandsetunique" gin (probands)
Check constraints:
    "sortedset" CHECK (issorteduuids(probands))
Foreign-key constraints:
    "probandset_people_id_fkey" FOREIGN KEY (people_id) REFERENCES base.people(id)
Referenced by:
    TABLE "sui.probandset_group_member" CONSTRAINT "probandset_group_member_member_id_fkey" FOREIGN KEY (member_id) REFERENCES sui.probandset(id)
    TABLE "sui.segment" CONSTRAINT "segment_probandset_id_fkey" FOREIGN KEY (probandset_id) REFERENCES sui.probandset(id)

and I um, er, enabled gin on uuid by copying from a thread in this list, as 
follows:
create operator class _uuid_ops
default for type _uuid
using gin as
operator 1 &&(anyarray, anyarray)
,operator 2 @>(anyarray, anyarray)
,operator 3 <@(anyarray, anyarray)
,operator 4 =(anyarray, anyarray)
,function 1 uuid_cmp(uuid, uuid)
,function 2 ginarrayextract(anyarray, internal, internal)
,function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal)
,function 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal)
,storage uuid;



Re: [GENERAL] query not scaling

2017-10-26 Thread Rob Sargent


On 10/26/2017 09:01 AM, Tom Lane wrote:

Laurenz Albe  writes:

Also, to have PostgreSQL inline the function, which would be good
for performance, it should be declared IMMUTABLE.

Actually, if you hope to have a SQL function be inlined, it's better
not to decorate it at all --- not with IMMUTABLE, and not with STRICT
either.  Both of those restrict the parser's ability to inline unless
it can prove the contained expression is equally immutable/strict.
With the default attributes of volatile/not strict, there's nothing
to prove.

(In any case, it's usually easy enough to tell from EXPLAIN output
whether inlining has happened.)

regards, tom lane


In another instance of the same schema, in the same database as the original
slow execution, I've loaded 823591 segments (though in this case all
of them are on one chromosome, one markerset) and 65K proband sets
using the same marker table as the slow(est) query.  In the fastest run,
there are only 46K segments for the given markerset.



QUERY PLAN
--
 HashAggregate  (cost=82122076.59..8215.35 rows=14876 width=48) 
(actual time=208203.091..208210.348 rows=14645 loops=1)
   Output: m.id, min(((1.0 * (((s.events_greater + s.events_equal) + 
0))::numeric) / s.events_less + s.events_equal) + s.events_greater) 
+ 0))::numeric))

   Group Key: m.id
   Buffers: shared hit=43209090
   ->  Nested Loop  (cost=3799.40..44686205.23 rows=1361304413 
width=40) (actual time=55.443..89684.451 rows=75577302 loops=1)

 Output: m.id, s.events_greater, s.events_equal, s.events_less
 Buffers: shared hit=43209090
 ->  Hash Join  (cost=3798.98..43611.56 rows=823591 width=32) 
(actual time=55.393..1397.509 rows=823591 loops=1)
   Output: s.events_greater, s.events_equal, s.events_less, 
s.startbase, s.endbase

   Inner Unique: true
   Hash Cond: (s.probandset_id = p.id)
   Buffers: shared hit=19222
   ->  Seq Scan on sui.segment s (cost=0.00..29414.86 
rows=823591 width=48) (actual time=0.017..669.915 rows=823591 loops=1)
 Output: s.id, s.chrom, s.markerset_id, 
s.probandset_id, s.startbase, s.endbase, s.firstmarker, s.lastmarker, 
s.events_less, s.events_equal, s.events_greater
 Filter: ((s.chrom = 22) AND (s.markerset_id = 
'edf95066-24d2-4ca1-bad6-aa850cc82fef'::uuid))

 Buffers: shared hit=17061
   ->  Hash  (cost=2979.99..2979.99 rows=65519 width=16) 
(actual time=55.272..55.272 rows=65519 loops=1)

 Output: p.id
 Buckets: 65536  Batches: 1  Memory Usage: 3584kB
 Buffers: shared hit=2161
 ->  Seq Scan on sui.probandset p 
(cost=0.00..2979.99 rows=65519 width=16) (actual time=0.007..33.582 
rows=65519 loops=1)

   Output: p.id
   Filter: (p.people_id = 
'9b2308b1-9659-4a2c-91ae-8f95cd0a90b3'::uuid)

   Buffers: shared hit=2161
 ->  Index Scan using marker_chrom_basepos_idx on base.marker 
m  (cost=0.42..37.67 rows=1653 width=20) (actual time=0.010..0.075 
rows=92 loops=823591)

   Output: m.id, m.name, m.chrom, m.basepos, m.alleles
   Index Cond: ((m.chrom = 22) AND (m.basepos >= 
s.startbase) AND (m.basepos <= s.endbase))

   Buffers: shared hit=43189868
 Planning time: 0.764 ms
 Execution time: 208214.816 ms
(30 rows)



Re: [GENERAL] query not scaling

2017-10-26 Thread Rob Sargent



On 10/26/2017 09:01 AM, Tom Lane wrote:

Laurenz Albe  writes:

Also, to have PostgreSQL inline the function, which would be good
for performance, it should be declared IMMUTABLE.

Actually, if you hope to have a SQL function be inlined, it's better
not to decorate it at all --- not with IMMUTABLE, and not with STRICT
either.  Both of those restrict the parser's ability to inline unless
it can prove the contained expression is equally immutable/strict.
With the default attributes of volatile/not strict, there's nothing
to prove.

(In any case, it's usually easy enough to tell from EXPLAIN output
whether inlining has happened.)

regards, tom lane

As to the explain analyze,

   could not receive data from server: Connection timed out
   Time: 7877340.565 ms

for the second time.  I had presumed at first that this had occurred 
during a network burp.  I'll try running it directly on the pg host.




Re: [GENERAL] query not scaling

2017-10-26 Thread Rob Sargent


> On Oct 26, 2017, at 1:02 AM, Laurenz Albe <laurenz.a...@cybertec.at> wrote:
> 
> Rob Sargent wrote:
>> I have a query I cannot tame and I'm wondering if there's an alternative
>> to the "between" clause I'm using.  Perhaps a custom type could do
>> better?  I've tried the "<@" operator and that changes the query plan
>> significantly but the execution cost/time is not improved.
>> 
>> Any suggestion or pointers much appreciated.
> 
> You really need EXPLAIN (ANALYZE, BUFFERS) output to be able to
> undersrand what is going on.
> 
> A couple of simple things to check:
> 
> - Have all tables been ANALYZED beforehand?
> - Are all optimizer database parameters identical?
> 
> Also, to have PostgreSQL inline the function, which would be good
> for performance, it should be declared IMMUTABLE.
> 
> Yours,
> Laurenz Albe
The explain analyze was (maybe is) still running, but without buffers. 
Thought of the immutable bit. Will be doing that test. 
All tables vacuumed and analyzed with each structural change. 
Thanks




[GENERAL] query not scaling

2017-10-25 Thread Rob Sargent

I have a query I cannot tame and I'm wondering if there's an alternative
to the "between" clause I'm using.  Perhaps a custom type could do
better?  I've tried the "<@" operator and that changes the query plan
significantly but the execution cost/time is not improved.

Any suggestion or pointers much appreciated.

Environment: Using a virtual CentOS Linux release 7.4.1708 (Core), 4
cores (2.3GHz), 8G RAM and postgres 10.0 (beta3); shared_buffers = 1GB,
work_mem = 2GB

Domain: (TL/DR) A "segment" is defined by a particular set of people
(probandset.id) plus a subset of markers (markerset.id, startmarker,
endmarker).  I need the minimum p-value for each marker in the set
across all segments matching the set and a specific set of people.  So a
given segment says "I cover all the markers from startbase to endbase"
and each marker has a specific base position (relative to a chromosome).
I'm after the smallest p-value for each marker across the set of
segments which include that marker (from the 'between' clause).

Context: I have the query in a function so the ids of all the
players are available to the following sql:

select m.id as mkrid
   , min(pv(s.events_less, s.events_equal, s.events_greater, 0)) as optval
from marker m join segment s on m.basepos between s.startbase and s.endbase
 and m.chrom = 1
 and s.chrom = 1
 and s.markerset_id = suppliedMarkersetId
join probandset r on s.probandset_id = r.id
 and r.people_id =  suppliedPeopleId
group by m.id

where the pv function is

create or replace function pv(l bigint, e bigint, g bigint, o int)
returns numeric
as
$$
select 1.0*(g+e+o)/(l+e+g+o);
$$
language sql
;


I have the identical schema in two databases (same pg instance) and the
table definitions involved are below.  In one schema there are 1.7M
records in segment and in the other there are 40M rows.  The marker
tables are much more similar, with 600K and 900K respectively.  The third
table, probandset, has 60 and 600 respectively.  On average there are
0.8M and 1.8M segments per markerset_id.

The explains: (fast (12sec), then slow (hours)).  The part which sticks
out to me is where the "between" gets used.  (I'm betting that
probandset is too small to matter.)  The slower explain plan is very
similar to what I saw originally in the now "fast" data set, and the
current indexing strategy comes largely from that performance work.

It looks like I'm getting a Cartesian product between the number of markers in a
set and the number of segments found: ten zeros at least.
  QUERY PLAN
--
 HashAggregate  (cost=291472.27..292040.58 rows=56831 width=48)
   Group Key: m.id
   ->  Nested Loop  (cost=3752.33..167295.52 rows=4515518 width=40)
 ->  Nested Loop  (cost=3751.90..17906.25 rows=715 width=32)
   ->  Seq Scan on probandset r  (cost=0.00..2.77 rows=4 width=16)
 Filter: (people_id = 
'4e3b9829-43a8-4f84-9df6-f120dc5b1a7e'::uuid)
   ->  Bitmap Heap Scan on segment s  (cost=3751.90..4473.96 
rows=191 width=48)
 Recheck Cond: ((probandset_id = r.id) AND (chrom = 1) AND 
(markerset_id = '61a7e5cb-b81d-42e4-9e07-6bd9c2fbe6d1'::uuid))
 ->  BitmapAnd  (cost=3751.90..3751.90 rows=191 width=0)
   ->  Bitmap Index Scan on useg  (cost=0.00..72.61 
rows=2418 width=0)
 Index Cond: ((probandset_id = r.id) AND (chrom 
= 1))
   ->  Bitmap Index Scan on segment_markerset_id_idx  
(cost=0.00..3676.23 rows=140240 width=0)
 Index Cond: (markerset_id = 
'61a7e5cb-b81d-42e4-9e07-6bd9c2fbe6d1'::uuid)
 ->  Index Scan using marker_chrom_basepos_idx on marker m  
(cost=0.42..145.79 rows=6315 width=20)
   Index Cond: ((chrom = 1) AND (basepos >= s.startbase) AND (basepos 
<= s.endbase))
(15 rows)


 QUERY PLAN
-
 HashAggregate  (cost=83131331.81..83132151.44 rows=81963 width=48)
   Group Key: m.id
   ->  Nested Loop  (cost=1907.38..70802659.35 rows=448315362 width=40)
 Join Filter: ((m.basepos >= s.startbase) AND (m.basepos <= s.endbase))
 ->  Bitmap Heap Scan on marker m  (cost=1883.64..11009.18 rows=81963 
width=20)
   Recheck Cond: (chrom = 1)
   ->  Bitmap Index Scan on marker_chrom_basepos_idx  
(cost=0.00..1863.15 rows=81963 width=0)
 Index Cond: (chrom = 1)
 ->  Materialize  (cost=23.74..181468.38 rows=49228 width=32)

Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Rob Sargent



On 10/13/2017 09:49 AM, Seamus Abshere wrote:

hey,

In the spreadsheet world, there is this concept of "shared strings," a
simple way of compressing spreadsheets when the data is duplicated in
many cells.

In my database, I have a table with >200 million rows and >300 columns
(all the households in the United States). For clarity of development
and debugging, I have not made any effort to normalize its contents, so
millions of rows have, for example, "SINGLE FAMILY RESIDENCE /
TOWNHOUSE" (yes, that whole string!) instead of some code representing
it.

Theoretically / blue sky, could there be a table or column type that
transparently handles "shared strings" like this, reducing size on disk
at the cost of lookup overhead for all queries?

(I guess maybe it's like TOAST, but content-hashed and de-duped and not
only for large objects?)

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere


What data type are these columns now?  I would be tempted to map the 
full strings to an abbreviation just so I didn't have to alter all the 
columns to an "id".  Optionally, place RI from those columns to the 
abbreviation dictionary table.  Just apply the translation as a last step 
in user-facing reports.  If you can map/abbreviate to 4 characters, 
you've approximated the disk size of an integer.





Re: [GENERAL] Inserting millions of record in a partitioned Table

2017-09-20 Thread Rob Sargent


On 09/20/2017 02:46 PM, Vick Khera wrote:
On Wed, Sep 20, 2017 at 10:10 AM, Job wrote:


We noticed that if we import directly into the global table it is
really, really slow.
Importing directly in the single partition is faster.


Do you have a rule or trigger on the main table to redirect to the 
partitions? You should expect that to take some extra time *per row*. 
Your best bet is to just import into the proper partition and make 
sure your application produces batch files that align with your 
partitions.


Either that or write a program that reads the data, determines the 
partition, and then inserts directly to it. It might be faster.


I wonder if this is a case of hurry up and wait.  A script which could 
load, say, 10 records at a time and, assuming that takes much less than one 
second, run once per second (waiting 1000 - runtime ms) would by now have 
done about a million records since the question was asked.
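
Very roughly, and purely as a sketch (batch size, timing and the helper names 
are all made up):

void trickleLoad() throws InterruptedException, SQLException {
    // insert a small batch directly into the correct partition, then sleep
    // out the remainder of the second before the next batch
    while (hasMoreRows()) {
        long start = System.currentTimeMillis();
        loadBatchIntoPartition(10);                      // ~10 rows per pass
        long remaining = 1000 - (System.currentTimeMillis() - start);
        if (remaining > 0) {
            Thread.sleep(remaining);
        }
    }
}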


Re: [GENERAL] looking for a globally unique row ID

2017-09-15 Thread Rob Sargent



On 09/15/2017 12:45 PM, Adam Brusselback wrote:


I cannot imagine a single postgres index covering more than one
physical table. Are you really asking for that?


While not available yet, that is a feature that has had discussion 
before.  Global indexes are what i've seen it called in those 
discussions.  One of the main use cases is to provide uniqueness 
across multiple tables, which would also allow things like foreign 
keys on partitioned tables.
I had a sneaking suspicion that partitioning would be the use-case, but 
clearly there's at least the 'notion' of a single entity.




Re: [GENERAL] looking for a globally unique row ID

2017-09-15 Thread Rob Sargent



Isn't this typically handled with an inheritance (parent-children)
setup?  MasterDocument has id, subtype and any common columns (create
date etc), then dependents use the same id from master to complete the
data for a given type.  This is really common in ORM tools.  It's not clear
from the description if the operations could be similarly handled
(operation id, operation type as master of 17 dependent
operationSpecifics; there is also the "Activity Model").

I do that, but may be I do that badly.

Currently I do have 6 levels of inheritance which partition my
document-class space. But I cannot see any way to have a unique index
(unique constraint) to cover all those partitions at once.

This is actually the core of my question: How to make one?


I cannot imagine a single postgres index covering more than one physical 
table. Are you really asking for that? Remember each dependent record 
has an entry in the master so the master guarantees a unique set of keys 
across all the dependents.  Now if you have enough documents you may get 
into partitioning but that's a separate issue.
How you model the work done on (or state transition of) those documents 
is a yet another design, but at least the work flow model can safely, 
consistently refer to the master table.







Re: [GENERAL] looking for a globally unique row ID

2017-09-14 Thread Rob Sargent



On 09/14/2017 02:39 PM, Rafal Pietrak wrote:


W dniu 14.09.2017 o 19:30, Rob Sargent pisze:


On 09/14/2017 11:11 AM, Rafal Pietrak wrote:

[--]

So I'm stuck with seriously incomplete solution.

that's why I have an impression, that I'm going into entirely wrong
direction here.



So you care if the same id is used in separate, unrelated tables? What's
your fear here?  And I completely get the confusion generated be the

You can call it fear, but I see it as an attempt for "schema/design
resilience".

Like with "unique constraint". You lay out the schema, and whatever bug
get planted into an application, the database does not let you put
inconsistent data there.

And answering your question, my problem is actually trivial. I need that
to systematically cover document identification within archive. There
are couple of document "Classes" I need to cover, their representation
in the DB requires different columns, so they are registered in
different table. I've started the project with one table for all those
classes, and quite soon it become monstrous.

So every document class got their own table. (and their private unique ID).

But problem rises when I need to referencing those documents along their
lifetime. Every class needs separate table for that. I cannot have a
single table of "history" (document-ID, dispatch-date, recepient), as
I'm not able to make FK from single table into multitude of tables (with
separate classes). So the forest of tables grows.

The above schema would get significantly simpler if only I could have a
"global primary key"... or something equivalent.

And as I said, this is not a problem of performance (or identification
space exhaustion). It's a problem of "schema explosion". Something that
actually is quite simple become complex (if not unmanageable) just
because I cannot see a good tool to manage the "globality".

I kick this problem for some time now - rewriting the basic schema
numerous times, and:

1. either I get monstrous "document master table" - which very
effectively leads to contents inconsistency (document get attributs
which dont belong to their classes)

2. or the schema explodes, as with documents stored in smaller (less
columns) specialized by-class table ... they all need their private
"managemnet tables" which FK into its respective document tables.

Both ways turned out to be quite expensive in maintenance.

Throwing actual numbers: 12 basic classes of documents; 17 tables
registering various operations document may undergo during its lifetime.
Variant (2) above make it 12*17 = 204 tables, which I'm currently
maintaining and it's too much. With variant (1) I simply wasn't able
to effectively keep document attributes consistent.

Thus I'm searching for tools (paradigms/sql-idioms) that would fit the
problem.


Isn't this typically handled with an inheritance (parent-children) 
setup?  MasterDocument has id, subtype and any common columns (create 
date etc), then dependents use the same id from master to complete the 
data for a given type.  This is really common in ORM tools.  It's not clear 
from the description if the operations could be similarly handled 
(operation id, operation type as master of 17 dependent 
operationSpecifics; there is also the "Activity Model").

same small integer being re-used in various context ("sample id" is the
bane for me). Could you use a sufficiently accurate time value?





But thank you all for answers.

-R








Re: [GENERAL] looking for a globally unique row ID

2017-09-14 Thread Rob Sargent



On 09/14/2017 11:11 AM, Rafal Pietrak wrote:


Not really.

As I said, I'm not looking for performance or "fair probability" of
planetary-wide uniqueness.

My main objective is the "guarantee". Which I've tried to indicate
referring to "future UPDATEs".

What I mean here is functionality similar to "primary key", or "unique
constraint". Whenever somebody (application, like faulty application
IMPORTANT!) tries to INSERT or UPDATE a not unique value there (which in
fact could possibly be generated earlier by UUID algorithms, or even a
sequence), if that value is among table that uses that (misterious)
"global primary key"; that application just fails the transaction like
any other "not unique" constraint failing.

That's the goal.

Multitude of tablas using a single sequence does not give that guarantee.

As I've said, a solution closest to my target is a separate table with
just one column of that "global primary key", which get inserted/updated
within trigger on insert/update of the "client tables" ... only I'm not
so sure how to "cleanly" manage multitude of tables using the same key
of that "global table of keys"... that is its "back references".

So I'm stuck with seriously incomplete solution.

that's why I have an impression, that I'm going into entirely wrong
direction here.


So you care if the same id is used in separate, unrelated tables? What's 
your fear here?  And I completely get the confusion generated by the 
same small integer being re-used in various contexts ("sample id" is the 
bane for me). Could you use a sufficiently accurate time value?






Re: [GENERAL] pglogical bidirectional replication of sequences

2017-09-01 Thread Rob Sargent



On 09/01/2017 02:29 AM, Peter J. Holzer wrote:

TLDR: Don't.

I'm currently conducting tests which should eventually lead to a 2 node
cluster with working bidirectional logical replication.

(Postgresql 9.6.4-1.pgdg90+1, pglogical 2.0.1-1.jessie+1 on Debian 9
(Stretch))

pglogical supports replication of sequences, and although the way it
does this suggests that it can't really work in both directions
(actually I'm sceptical that it works reliably in one direction), of
course I had to try it.

So I created a sequence on both nodes and called
select pglogical.replication_set_add_sequence('default', 'test_sequence');
on both nodes.

The result was ... interesting.

First I got the same sequence (1, 2, 3, 4, 5) on both nodes.

After a few seconds the replication kicked in, and then I got the same
value (1005) on both nodes most of the time, with a few variants (2005,
3005) thrown in.

In a word, the sequence was completely unusable.

Experiment completed, so I removed the sequence from the replication
set:

select pglogical.replication_set_remove_sequence('default', 'test_sequence');
on both nodes.

But the behaviour of the sequence doesn't change. It still returns 1005
most of the time, and sometimes 2005 or 3005. This is true even after
restarting both nodes.

Plus, I can't drop the sequence any more (as the user who created the
sequence):

wds=> drop sequence public.test_sequence ;
ERROR:  permission denied for schema pglogical

So, clearly, pglogical is still managing that sequence.

If I drop the sequence as postgres and then recreate it, it works
normally for some time (also the sequence on the other node now works
normally), but after some time, the replication kicks in again and the
sequence is stuck again at 1005.

So, is there a way to recover from this situation without drastic
measures like nuking the whole database.

 hp

I trust you mean don't use sequences





Re: [GENERAL] jdbc driver vis Release 10

2017-08-25 Thread Rob Sargent


> On Aug 25, 2017, at 5:55 PM, Dave Cramer <p...@fastcrypt.com> wrote:
> 
> There are newer versions out there!
> 
> Dave Cramer
> 
> da...@postgresintl.com
> www.postgresintl.com
> 
>> On 25 August 2017 at 19:53, Rob Sargent <robjsarg...@gmail.com> wrote:
>> 
>> 
>>> On 08/25/2017 05:34 PM, Dave Cramer wrote:
>>> The JDBC driver release is divorced from the server release.
>>> 
>>> Thanks
>>> 
>>> Dave Cramer
>>> 
>>> da...@postgresintl.com
>>> www.postgresintl.com
>>> 
>>>> On 23 August 2017 at 19:33, Rob Sargent <robjsarg...@gmail.com> wrote:
>>>> I see no mention of a new jdbc driver on the release notes for Beta 1.  
>>>> Does that mean there isn't one?
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> Yes, so I've learned.  I just ran my software using jdbc 9.4.1212 via jOOQ 
>> 3.9 against pg10-beta3 without a problem.  I do very vanilla stuff.
>>  
> 
Yes. I'll get there as needed but wanted to know if any piece needed changing 
NOW 

Re: [GENERAL] jdbc driver vis Release 10

2017-08-25 Thread Rob Sargent



On 08/25/2017 05:34 PM, Dave Cramer wrote:

The JDBC driver release is divorced from the server release.

Thanks

Dave Cramer

da...@postgresintl.com
www.postgresintl.com

On 23 August 2017 at 19:33, Rob Sargent <robjsarg...@gmail.com> wrote:


I see no mention of a new jdbc driver on the release notes for
Beta 1.  Does that mean there isn't one?






Yes, so I've learned.  I just ran my software using jdbc 9.4.1212 via 
jOOQ 3.9 against pg10-beta3 without a problem.  I do very vanilla stuff.




Re: [GENERAL] jdbc driver vis Release 10

2017-08-23 Thread Rob Sargent



On 08/23/2017 06:09 PM, David G. Johnston wrote:
On Wed, Aug 23, 2017 at 4:33 PM, Rob Sargent <robjsarg...@gmail.com> wrote:


I see no mention of a new jdbc driver on the release notes for
Beta 1.  Does that mean there isn't one?


​Whose release notes?  PostgreSQL Server?  I don't believe the server 
release notes ever talk about external projects.​


Its seems quite a few significant changes (including version 
numbering) have happened to the JDBC project in the past year but you 
will need to go directly to the project's site/GitHub to keep abreast 
of all of it.


David J.

Thanks


[GENERAL] jdbc driver vis Release 10

2017-08-23 Thread Rob Sargent
I see no mention of a new jdbc driver on the release notes for Beta 1.  
Does that mean there isn't one?







Re: [GENERAL] PostgreSQL with PowerBuilder, and Identity keys (serials)

2017-08-05 Thread Rob Sargent

> On Aug 5, 2017, at 3:12 PM, Dan Cooperstock at Software4Nonprofits wrote:
> 
> I’m on PostgreSQL 9.6, 64-bit Windows.
>  
> That really is the correct name for the sequence, because I’m not using 
> SERIAL. (I needed everything to match the naming in my existing DB I’m using 
> for the app, Firebird SQL, so the changes to make it work with either DB 
> would be as minimal as possible.) The setup of tables I’m using with this 
> sort of thing are like the following example:
>  
> CREATE SEQUENCE GEN_CATEGORY MINVALUE 0;
>  
> CREATE TABLE CATEGORY(
>   CATEGORYNUMSMALLINT NOT NULL DEFAULT NEXTVAL('GEN_CATEGORY'),
>   DESCRIPTIONVARCHAR(20) NOT NULL,
>   CONSTRAINT PK_CATEGORY PRIMARY KEY (CATEGORYNUM)
> );
>  
> So as you can see GEN_ plus the tablename is indeed correct. The default on 
> the CATEGORYNUM column is definitely working, which I tested with direct SQL 
> commands: after inserting a row (with the CATEGORYNUM not specified in the 
> INSERT), if I SELECT currval(‘GEN_CATEGORY’), it gives me the correct value, 
> which is also what got saved in that column.
>  
> 
> Dan Cooperstock
> DONATION and ACCOUNTS web site: http://www.Software4Nonprofits.com
> Email: i...@software4nonprofits.com
> Phone: 416-423-7722
> Mail: 57 Gledhill Ave., Toronto ON M4C 5K7, Canada
> 
> If you do not want to receive any further emails from Software4Nonprofits, 
> please reply to this email with the subject line "UNSUBSCRIBE", and we will 
> immediately remove you from our mailing list, if you are on it.
> 
>  
> From: Melvin Davidson [mailto:melvin6...@gmail.com] 
> Sent: August 5, 2017 4:46 PM
> To: Dan Cooperstock at Software4Nonprofits 
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] PostgreSQL with PowerBuilder, and Identity keys 
> (serials)
>  
> >GetIdentity="Select currval('GEN_')"
> 
> FYI, it would be helpful to specify the PostgreSQL version & O/S, but 
> generically speaking, in PostgreSQL, when you generate a sequence 
> by specifying serial as data type, the name takews the form of 
> tablename_columnname_seq, so in your case, try
> 
> https://www.postgresql.org/docs/9.4/static/datatype-numeric.html#DATATYPE-SERIAL
> 
> GetIdentity="Select currval('tablename_column_seq')".
> 
> BTW, in PostgreSQL, it is NOT recommended to use mixed case object names, as 
> all names are converted to lowercase unless enclosed in double quotes.
> 
>  
> On Sat, Aug 5, 2017 at 4:09 PM, Dan Cooperstock at Software4Nonprofits wrote:
>> I’m trying to get a Postgres DB version of an application I write in 
>> PowerBuilder working. The thing I’m stuck on is Identity keys – what you set 
>> up with the SERIAL attribute or SEQUENCEs / GENERATORs in Postgres.
>>  
>> I have the sequence set up and clearly working. And in PowerBuilder, I have 
>> added a section I found online to a file it uses to know how to deal with 
>> different aspects of different DBs (PBODBxxx.INI) for Postgres, including 
>> the following line for retrieving an identity key after saving a record, 
>> when the DB automatically fills in the value:
>>  
>> GetIdentity="Select currval('GEN_')"
>>  
>> That obviously depends on the generator being named “GEN_” plus the table’s 
>> name – which is true in our case.
>>  
>> But nothing like that is happening. Does anyone else have PostgresSQL 
>> working with PowerBuilder and identity keys, who can give me some pointers 
>> on how get this to work?
>>  
>> Thanks.
>>  
>> 
>> Dan Cooperstock
>> DONATION and ACCOUNTS web site: http://www.Software4Nonprofits.com
>> Email: i...@software4nonprofits.com
>> Phone: 416-423-7722
>> Mail: 57 Gledhill Ave., Toronto ON M4C 5K7, Canada
>> 
>> If you do not want to receive any further emails from Software4Nonprofits, 
>> please reply to this email with the subject line "UNSUBSCRIBE", and we will 
>> immediately remove you from our mailing list, if you are on it.
>> 
>>  
> 
> 
> 
> -- 
> Melvin Davidson
> I reserve the right to fantasize.  Whether or not you 
> wish to share my fantasy is entirely up to you. 

I don’t see that you posted any sort of error message?
Also don’t test that sequence more than 32K times, as you’ll certainly get a 
number too big for a smallint :)
Did your direct sql testing actually use ALL CAPS?
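For what it's worth, a quick way to see the case folding in action (a sketch, reusing the DDL from your post; run it in psql):

    -- unquoted identifiers are folded to lower case, so this creates gen_category
    CREATE SEQUENCE GEN_CATEGORY MINVALUE 0;
    SELECT nextval('GEN_CATEGORY');     -- works: the text is folded to gen_category
    SELECT currval('GEN_CATEGORY');     -- works, same folding
    SELECT currval('"GEN_CATEGORY"');   -- fails: no upper-case sequence exists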



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] bidirectional mapping?

2017-08-02 Thread Rob Sargent



On 08/02/2017 01:35 PM, John McKown wrote:
On Wed, Aug 2, 2017 at 10:55 AM, Chris Travers wrote:




On Wed, Aug 2, 2017 at 5:44 PM, John McKown wrote:

Is there a simple way to do bidirectional mapping of a table
with itself? I am thinking of a "spousal" type relationship,
where it is true that if A is spouse of B, then B is spouse of
A. I don't necessarily want "A" and "B" to be a monogamous
relationship because that is not always be true world wide.
The best I can come up with so far is something like:

CREATE TABLE forespouse (PERSON integer PRIMARY KEY,
SPOUSE integer UNIQUE
CHECK( PERSON != SPOUSE) -- sorry, can't marry self
);
CREATE UNIQUE INDEX ON forespouse(PERSON, SPOUSE);
CREATE UNIQUE INDEX ON forespouse(SPOUSE, PERSON);
-- I'm not sure that the above indices are needed.
CREATE VIEW backspouse AS SELECT SPOUSE, PERSON FROM forespouse;
CREATE VIEW spouse AS
 SELECT PERSON, SPOUSE FROM forespouse
 UNION
 SELECT SPOUSE, PERSON FROM backspouse
;


Usually the way I have done this is to normalise the
representation and use a table method for converting for joins. 
In other words:


create table marriage (party integer primary key, counterparty
integer unique, check (party < counterparty));

Not sure I agree with the uniqueness of the parties involved. Unique on 
(party, counterparty) isn't a sure thing if there's any temporal 
dimension involved, in which case I would prefer (id, party, counterparty).
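Something like this is what I had in mind (just a sketch; the date columns are made up to show the temporal angle):

    create table marriage (
        id           serial primary key,
        party        integer not null,
        counterparty integer not null,
        married_on   date,
        divorced_on  date,                -- null while the marriage is current
        check (party < counterparty)      -- normalise the ordering of the pair
    );
    -- at most one *current* marriage per normalised pair
    create unique index marriage_current_uq
        on marriage (party, counterparty)
        where divorced_on is null;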






Re: [GENERAL] dump to pg

2017-05-31 Thread Rob Sargent
> On May 31, 2017, at 9:27 AM, David G. Johnston  
> wrote:
> 
> On Wed, May 31, 2017 at 7:43 AM, Nicolas Paris  wrote:
> Hi,
> 
> I have dumps from oracle and microsoft sql server (no more details). Is it 
> possible to load them "directly" into postgres  (without oracle/mssql 
> license)?
> dump -> csv -> postgtres
> or something ?
> 
> 
> ​Asking the public mailing lists for PostgreSQL for an opinion on the the 
> legality of doing something like this is problematic​..
> 
> Technically it would depend a lot on exactly what you are trying to load.  If 
> you can go through CSV to do it then its probably just raw insert data and 
> has a decent chance of working as long as the schema is PostgreSQL has been 
> created to accept the incoming data.
> 
> David J.
> 
If the dumps are not ascii the ‘to csv’ step gets problematic if not impossible 
legally.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] all serial type was changed to 1

2017-05-01 Thread Rob Sargent

Every row?
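If it is the sequences themselves, a quick way to eyeball and repair them (a sketch; the sequence and table names here are made up):

    -- list the sequences
    SELECT relname FROM pg_class WHERE relkind = 'S';
    -- check one of them
    SELECT last_value, is_called FROM some_table_id_seq;
    -- and, if it really has fallen behind its table, bump it back up
    SELECT setval('some_table_id_seq', (SELECT max(id) FROM some_table));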


On 05/01/2017 05:17 PM, Max Wang wrote:


Sorry. I mean all tables’ id column were reset to 1.

Thanks.

*From:*Melvin Davidson [mailto:melvin6...@gmail.com]
*Sent:* Tuesday, 2 May 2017 9:14 AM
*To:* Max Wang 
*Cc:* Adrian Klaver ; 
pgsql-general@postgresql.org

*Subject:* Re: [GENERAL] all serial type was changed to 1

On Mon, May 1, 2017 at 7:08 PM, Max Wang wrote:


Hi Adrian,

Only sequences (id) reset to 1.

Regards,
Max

-Original Message-
From: Adrian Klaver [mailto:adrian.kla...@aklaver.com
]
Sent: Monday, 1 May 2017 11:30 PM
To: Max Wang >;
pgsql-general@postgresql.org 
Subject: Re: [GENERAL] all serial type was changed to 1

On 04/30/2017 10:51 PM, Max Wang wrote:
> Hi All,
>
>
>
> We have a PostgreSQL database. There are 26 tables and we use serial
> type as primary key.  We had a insert error as "duplicate key value
> violates unique constraint, DETAIL:  Key (id)=(1) already
exists." one
> weeks ago. I checked and found all tables' id were reset to 1.

So to be clear:

Every row in each of the 26 tables has an id of 1?

or

Do you mean the sequences where reset to 1?

>
>
>
> I checked database log and did not find any useful information. 
I am

> not sure why this happen. The only script which connect to this
> database is a Python script and only do normal
insert/update/delete actions.
>
>
>
> Please give me some suggestions if you happen to know something
about
> this issue. I appreciate any feedback you might have.
>
>
>
> I am very new to PostgreSQL and this mail list. Please let me
know if
> I did not something wrong.
>
>
>
> Thank you.
>
>
>
> Regards,
>
> Max
>


--
Adrian Klaver
adrian.kla...@aklaver.com 


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



*But as Adrian asked, was the sequence reset to 1 for ALL tables 
sequences or just 1?*


--

*Melvin Davidson*
*I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you. *





[GENERAL] Cannot spot the difference in two queries

2017-04-27 Thread Rob Sargent
I have three very similar functions, two of which are fast and one is 
slow, and I cannot explain the slowness of the third. All of them, with 
the correct arguments, should return the same "optimal set". I present 
one of the two fast versions and the slow one, in whole and in part. I'm 
using postgres 9.6 on Ubuntu 16.04 desktop (16G memory).


I'm confused because the bulk of the difference in the sql is in the 
"with" clause, and those are equally quick and return the same dataset; 
to my eye the processing after that is the same as sql, but clearly 
not to the server.  I'm missing something obvious, I'm sure, but whatever 
it is causes a nasty Cartesian product which then has to be sorted out.


(And I get that mm.markerset_id is not needed in the where clause)

Any insights appreciated.
Thanks,
rjs

I have not included table defs, thinking that since they're the same for 
both queries ...


Fast function

   create or replace function seg.optimal_pvalue_set(people_name text,
   markers_name text, chr int)
   returns table (segment_id uuid, optval numeric, firstbase int) as
   $$
   declare
  mkset uuid;
  rcount int;
   begin
  select id into mkset from seg.markerset where name = markers_name
   and chrom = chr;
   --
  create temp table optmarkers on commit drop as
  with segset as (
  select s.id
 , s.chrom
 , s.markerset_id
 , s.startbase
 , s.endbase
 , ((s.events_equal + s.events_greater)/(1.0 *
   (s.events_less + s.events_equal + s.events_greater))) as pval
  from seg.segment s
   join seg.probandset i on s.probandset_id = i.id
   join (select people_id, array_agg(person_id) as persons
 from seg.people_member
 group by people_id) as pa on i.probands <@ pa.persons
   join seg.people o on pa.people_id = o.id
  where
   s.markerset_id = mkset
   and o.name = people_name
  )
  select m.id as mkrid
 , min(ss.pval) as optval
  from segset ss
   join seg.markerset_member mm on ss.markerset_id =
   mm.markerset_id
   join seg.marker m on mm.member_id = m.id
  where
   m.basepos between ss.startbase and ss.endbase
   and m.chrom = ss.chrom
   and mm.markerset_id = mkset   --
   'b474655c-80d2-47e7-bcb5-c65245195888'
  group by m.id;
   --
  get diagnostics rcount = ROW_COUNT;
  raise notice '% segments to optimize', rcount;
   --
  return query
  select s.id as segment_id, o.optval, min(m.basepos) as firstbase
  from optmarkers o
   join seg.marker m on o.mkrid = m.id
   join seg.markerset_member mm on m.id = mm.member_id
   join seg.segment s on mm.markerset_id = s.markerset_id
  where mm.markerset_id = mkset
and m.basepos between s.startbase and s.endbase
and ((s.events_equal + s.events_greater)/(1.0 *
   (s.events_less + s.events_equal + s.events_greater))) = o.optval
  group by s.id, o.optval
  order by firstbase;
   end;
   $$ language plpgsql;

   -- timing the CTE base
   --with, the first
  select count(s.id)
 -- s.id
 -- , s.chrom
 -- , s.markerset_id
 -- , s.startbase
 -- , s.endbase
 -- , ((s.events_equal + s.events_greater)/(1.0 *
   (s.events_less + s.events_equal + s.events_greater))) as pval
  from seg.segment s
   join seg.probandset i on s.probandset_id = i.id
   join (select people_id, array_agg(person_id) as persons
 from seg.people_member
 group by people_id) as pa on i.probands <@ pa.persons
   join seg.people o on pa.people_id = o.id
  where
   s.markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea'
   --mkset
   and o.name = '1151704'; -- people_name
 count
   ---
 30762
   (1 row)

   --explain analyze of Full CTE portion (that which constructs the
   temp table in prep for final query)
   --manually supplying the needed ids and such.
   QUERY PLAN
   
--
 GroupAggregate  (cost=15412.11..15413.24 rows=65 width=48) (actual
   time=5462.143..6042.901 rows=52196 loops=1)
   Group Key: m.id
   CTE segset
 ->  Nested Loop  (cost=4820.07..6024.99 rows=310 width=76)
   (actual time=11.854..449.869 rows=30762 loops=1)
   ->  Seq Scan on probandset i  (cost=0.00..21.27 rows=1
   width=16) (actual time=0.007..0.286 rows=57 loops=1)

Re: [GENERAL] referential integrity between elements of an array and another table?

2017-04-19 Thread Rob Sargent



On 04/19/2017 01:13 PM, Henry M wrote:
I was just reading this question on reddit (the text duplicated 
below). I was wondering if there is an approach for handling array 
foreign key references. I am interested in the answer since I started 
using array fields as well. Thank you.



- below is the message from the reddit poster:

https://www.reddit.com/r/PostgreSQL/comments/66a74t/question_referential_integrity_between_elements/ 



First off, thanks for taking the time. I don't see any "weekly help 
threads" or anything so I hope it's okay I made a post.


/The Problem/

I'm extending our CRM database in order to manage the ISO 
documentation of the company for which I work. We need, as part of 
ISO, to keep track of project development meetings and attendees.


Currently, I have a "meetings" table with an attendee list as an array 
of integers, each element corresponding to the id field in the 
"person" table.


How do I enforce referential integrity between elements of this array 
and the id field in "person"? All I could find was a proposed change 
for 9.2 or 9.3 that got shot down...


Please help. I don't want to do a join table.

Unless you have thousands of attendees per meeting and everyone in 
multiple meetings per day, I would encourage a join table.  What's your 
hesitation about doing so?  Note I'm an array user myself, so not against 
arrays per se, but there should be a justification for denormalizing 
(depending on your definition thereof).
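If it helps, the join table is only this much DDL (a sketch; I'm guessing at the poster's table and column names):

    CREATE TABLE meeting_attendee (
        meeting_id integer NOT NULL REFERENCES meetings (id),
        person_id  integer NOT NULL REFERENCES person (id),
        PRIMARY KEY (meeting_id, person_id)
    );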


Re: [GENERAL] Error During PostGIS Build From Source on Linux

2017-04-12 Thread Rob Sargent



On 04/12/2017 12:01 PM, Adrian Klaver wrote:

On 04/12/2017 10:14 AM, Osahon Oduware wrote:

Hi All,

I am working on a Linux OS (Centos 6.5). I built GDAL successfully from
source as below:
./configure --prefix=/path/to/gdal
make
make install

Next I built PostgreSQL successfully from source as below:
./configure --prefix=/path/to/pgsql
make
make install

I attempted to configure PostGIS from source as below:
./configure --with-pgconfig=/home/path/to/
pg_config --with-gdalconfig=/path/to/gdal
--with-geosconfig=/path/to/geos-config
--with-xml2config=/path/to/xml2-config --with-projdir=/path/to/proj
--with-gui --with-raster --with-topology

but I get the following error:
configure: error: PostGIS raster requires OGR to be enabled in GDAL. Use
--without-raster to build without raster support.

I sure need the raster support to be enabled in PostGIS. Could anyone
help me out with this.



Use the Postgres YUM repos:

https://www.postgresql.org/download/linux/redhat/

Been  years since I did this but you could (re)build GDAL with OGR as 
requested in the error message.

rjs


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is there a point to having both a normal gist index and an exclude index?

2017-04-05 Thread Rob Sargent



On 04/05/2017 12:04 PM, Bruno Wolff III wrote:

On Wed, Apr 05, 2017 at 00:05:31 -0400,
 Tom Lane  wrote:

Bruno Wolff III  writes:

... I create both a normal gist index and an exclude index using the
following:
CREATE INDEX contains ON iplocation USING gist (network inet_ops);
ALTER TABLE iplocation
  ADD CONSTRAINT overlap EXCLUDE USING gist (network inet_ops WITH &&);



But I am wondering if it is useful to have the normal gist index for
finding netblocks containing a specific IP address, as it seems like 
the

exclude index should be usable for that as well.


No, that manually-created index is completely redundant with the
constraint index.


Thanks.

P.S. Using spgist with version 10 for the exclude index is much faster 
than using gist in 9.6. I have run the index creation for as long as 6 
hours and it hasn't completed with 9.6. It took less than 10 minutes 
to create it in 10. For this project using 10 isn't a problem and I'll 
be doing that.




That's an incredible difference.  Is it believable? Same resource, etc?



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Equivalent function not found for ERROR_PROCEDURE(),ERROR_LINE(),ERROR_MESSAGE().

2017-03-28 Thread Rob Sargent

On 03/27/2017 04:59 AM, MAJUMDER, SAYAN wrote:


Hi,

I am new to postgresql and presently we are migrating from sql server 
to postgresql.


We have certain functions in sql server such as 
ERROR_PROCEDURE(),ERROR_LINE(),ERROR_MESSAGE(). I am unable to find 
any equivalent


functions in postgresql. Kindly help me. Thanks in advance. Its urgent.

Thanks and regards,

Sayan Majumder.

This message contains information that may be privileged or 
confidential and is the property of the Capgemini Group. It is 
intended only for the person to whom it is addressed. If you are not 
the intended recipient, you are not authorized to read, print, retain, 
copy, disseminate, distribute, or use this message or any part 
thereof. If you receive this message in error, please notify the 
sender immediately and delete all copies of this message.




Seems you didn't like the "RAISE" option you were given on Stack 
Overflow.  You might explain what those functions do to allow postgres 
people to recognize any potential match.
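For the record, the closest equivalents live in plpgsql's exception handling; something along these lines (a sketch, 9.2 or later):

    DO $$
    DECLARE
        v_state   text;
        v_msg     text;
        v_context text;
    BEGIN
        PERFORM 1 / 0;                        -- provoke an error
    EXCEPTION WHEN OTHERS THEN
        GET STACKED DIAGNOSTICS
            v_state   = RETURNED_SQLSTATE,
            v_msg     = MESSAGE_TEXT,         -- roughly ERROR_MESSAGE()
            v_context = PG_EXCEPTION_CONTEXT; -- function name and line, roughly ERROR_PROCEDURE()/ERROR_LINE()
        RAISE NOTICE 'state %, message %, context %', v_state, v_msg, v_context;
    END;
    $$;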


Re: [GENERAL] import CSV file to a table

2017-03-08 Thread Rob Sargent



On 03/08/2017 09:52 AM, Karl Czajkowski wrote:

On Mar 08, Rob Sargent modulated:


Yes Karl, I agree.  I admitted as much.  But if it's clean, as in
free of quoted commas, life is much simpler.  I've lost sight of
whether or not the OP knows his situation w.r.t. this.  The awk
line will tell him, and for a one-off load this can make a world of
difference in complexity - two bash lines and a COPY.


Maybe I didn't understand your awk? I thought it was counting commas
in lines.  This isn't the same as counting commas in records.

 this,is,record,one
 "this,,","is
 ,,record","two
 ,,,"

this has three commas on each line and definitely is not suitable
for naive CSV handling.


Karl
In essence it does count commas, plus one :).  NF is the number of 
fields defined by commas, so one more field than the number of commas. If you 
think/hope the file is simple and well formatted, this is a pretty quick 
check. But if you're looking for a general solution, you need a real csv 
parser.  I recall being quite surprised and amused to learn there is an 
actual standard for csv format (RFC 4180). (Naturally if you have a parser to 
hand, you don't need the awk line.)



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Unable to start postgresql

2017-03-08 Thread Rob Sargent




Meant to ask before, can you show the command you are using to connect?


My memory says OP didn't use --host, which often leads to trying the 
socket.  Do we know that's enabled in pg_hba?



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] import CSV file to a table

2017-03-08 Thread Rob Sargent



On 03/08/2017 09:36 AM, Karl Czajkowski wrote:

I believe that in its fully glory, you cannot reliably locate CSV
record boundaries except by parsing each field in order including
quote processing.  Individual records may have arbitrary numbers of
field and record separator characters within the values.

Karl


On Mar 08, Rob Sargent modulated:

Since bash has been bandied about in this thread I presume awk is
available.  Here's how I would check just how 'csv'ish the incoming
file is.
...
Yes Karl, I agree.  I admitted as much.  But if it's clean, as in free 
of quoted commas, life is much simpler.  I've lost sight of whether 
or not the OP knows his situation w.r.t. this.  The awk line will 
tell him, and for a one-off load this can make a world of difference in 
complexity - two bash lines and a COPY.




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] import CSV file to a table

2017-03-08 Thread Rob Sargent
Since bash has been bandied about in this thread I presume awk is 
available.  Here's how I would check just how 'csv'ish the incoming file is.


   awk -F"," '{a[NF]++} END {for (i in a) printf "%d lines have %d fields(columns)\n", a[i], i}' csvfilename

If this doesn't produce one line, you have to suspect quoted values 
including commas (or whatever separator you chose); then you need a real 
csv parser.


If it's just one line, I'd use cut to get rid of unwanted columns, then let 
COPY do its thing.
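The COPY step itself is then just (a sketch; table, columns and path are hypothetical):

    -- client-side file, via psql:
    \copy target_table (col_a, col_b, col_c) FROM 'cleaned.csv' WITH (FORMAT csv, HEADER true)
    -- or, if the file lives on the server and you are superuser:
    -- COPY target_table (col_a, col_b, col_c) FROM '/path/cleaned.csv' WITH (FORMAT csv, HEADER true);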



On 03/08/2017 09:13 AM, Karl Czajkowski wrote:

On Mar 08, John McKown modulated:
...

​I agree. I went with a "pure BASH" approach because it is what the
user asked for & I wasn't sure what language she might be comfortable
with. I use PERL a lot. Or maybe I should say that I abuse PERL a lot.
Such as a PERL script with writes out another PERL script, based on
some input files & parameters, then runs the just written PERL script,
which does the load into a PostgreSQL database (multiple tables). Ya, a
bit perverted.​


Well, you could follow a similar meta-programming/code-generating
pattern to have the BASH script output a single SQL file to run with
psql. You could even generate PL/pgsql code to defer more data
processing to the database itself.

I think the only robust "pure BASH" approach is to use a temporary
table, so you aren't trying to parse CSV content in BASH. Using csvkit
sounds good if you can introduce these third-party dependencies.

With the temporary table, you can use SQL for most validation or data
interrogation, but you need to know at least enough schema information
in advance to form the COPY statement. Parsing the CSV header row to
plan your work puts you right back to requiring a robust CSV parser
unless you can constrain your input scenarios to only handle very
trivial headers.

If you play games with a defaulting serial column and fixed column
names like "id, c1, c2, ..., cN" for the temporary table, you might
use the id column as a DB-assigned "row number" during COPY and
validation. In this case, you could even tell Postgres there is no
header, and then let it parse the header as another data record so you
can use SQL statements to determine the actual header names and
ordering in the input.  But this still requires knowing the column
count in advance of the COPY.

I also think using something like Python with structured data
processing would be wiser, unless you know enough about the schema in
advance to avoid any CSV parsing on the client side.


Karl






Re: [GENERAL] pg_xlog keeps growing

2017-02-27 Thread Rob Sargent

> On Feb 27, 2017, at 8:33 PM, dhanuj hippie  wrote:
> 
> Hi,
> 
> I have a postgres cluster running in hot_standby. I see the pg_xlog is 
> growing over time (may files of size 16 MB each). The replication lag is very 
> less ~2kB, and never goes into a bad state.
> I'm manually resetting this once a while using pg_resetxlog command.
> Is there a way to understand why my system keeps running into this problem ?
> 
> Thanks

wal_keep_segments is set to ???

(and version = ???, etc)
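Beyond that, the usual WAL-pinning suspects are worth a look (a sketch; the slots query needs 9.4 or later):

    SHOW wal_keep_segments;
    SHOW archive_mode;
    SHOW archive_command;      -- a failing archive_command keeps segments around
    SELECT slot_name, active, restart_lsn FROM pg_replication_slots;  -- idle slots do too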

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is table not found?

2017-01-31 Thread Rob Sargent
You must quote it as "Raum".
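Unquoted names are folded to lower case, which is why the error mentions "raum"; something like this should work:

    SELECT * FROM "Raum";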

> On Jan 31, 2017, at 6:45 AM, Egon Frerich  wrote:
> 
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> I created a db 'Hausrat' with a table "Raum". If I look in pgadmin I
> find the table. In the sql field is shown:
> 
>> -- Table: "Raum"
>> 
>> -- DROP TABLE "Raum";
>> 
>> CREATE TABLE "Raum"
>> (
>>  "RaumID" serial NOT NULL, -- Automatisch vergebenes
> Identifizierungsmerkmal für den Raum
>>  "Raum" character varying(15), -- Bezeichnung des Raums
>>  CONSTRAINT "Raum_pkey" PRIMARY KEY ("RaumID")
>> )
>> WITH (
>>  OIDS=FALSE
>> );
>> ALTER TABLE "Raum"
>>  OWNER TO egon;
>> COMMENT ON COLUMN "Raum"."RaumID" IS 'Automatisch vergebenes
> Identifizierungsmerkmal für den Raum';
>> COMMENT ON COLUMN "Raum"."Raum" IS 'Bezeichnung des Raums';
>> 
>> 
>> -- Index: "Raum_RaumID_idx"
>> 
>> -- DROP INDEX "Raum_RaumID_idx";
>> 
>> CREATE INDEX "Raum_RaumID_idx"
>>  ON "Raum"
>>  USING btree
>>  ("RaumID");
>> 
> But psql tells me "Raum" is not existent:
> 
>> egon@xfEinzel ~ $ psql Hausrat
>> psql (9.3.15)
>> Type "help" for help.
>> 
>> Hausrat=# SELECT * FROM Raum;
>> ERROR:  relation "raum" does not exist
>> LINE 1: SELECT * FROM Raum;
>>  ^
>> Hausrat=#
> 
> Why?
> 
> Egon
> 
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v2.0.22 (GNU/Linux)
> 
> iQIbBAEBAgAGBQJYkJTwAAoJECc7nbY+bg5uiRQP9iC4UtbHpPvdvV796fxcMSFp
> dncPHzkTKHvcEh3V9t/Xn/BzEHBDAO6RO8jmU18Ja9f/0nOyNXrWxal0OJZXint5
> l3/qRKbekkl7zfogysh4XgZQkpWwsJcYDGoG5tCUQm8TJ3sCk9a9+mbkDhy0Vsev
> MPeyYl/fYvlTBkiRmlPZTOX7YjRzeryKXguB3PBke3Vo6SZ1sFWnMjJ7bc2b
> 4cW9hmInmwXZ4EBOfcUW4QnfM0mgHBMGxJwYVQfeC6fQeqT8emh0KQLqozjFe/tk
> KNKDU0RebcrBgXP/lIGI34XahZW+WOdLli/t8wBUNiINruY2FlizuH6Uuak3tLjC
> cD7vi0SzNK2YFr9Rozw8ow9WVxSfjWBLiOA1GkFdhxKb80NLHthzo8oIVaCqz0bj
> 71kA0ewgZ9NMay6ch1VjqSPPFhAZHc1Ho3xIAa0BsZpvEdccDBsL+yk/6DGDYua9
> 5oT1p6CZqfDJqxEIoUiNaJEKm3An5ySN8hHn527/apG9yA9QMh5qJPHA0wRqtLxN
> hNSxugKhS6UOw/Wivbx0OIhN8jqBv4m22UgO9pFGgxHLs1hheSTMUSbExFCLuK+z
> sS1Kw9syplk+lFTzK6mqNpr3BQ6v2fmkPmRTZoID4e9T3DY8Bna2JXG2U1QGEzwa
> kwpJOMAvY3DDPv3pIK8=
> =i8FY
> -END PGP SIGNATURE-
> 
> 
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] update error with serializable

2017-01-20 Thread Rob Sargent



On 01/20/2017 10:05 AM, Kevin Grittner wrote:

On Fri, Jan 20, 2017 at 4:44 AM, Tom DalPozzo  wrote:


I've two threads countinuously updataing rows in the same table.
Each one does: BEGIN, UPDATE,UPDATECOMMIT
There can't be two active transactions updating the same row (my
bug apart but I don't think so).
I'm using default_transaction_isolation = 'serializable'
I get "could not serialize access due to read/write dependencies
among transactions"
I din't expect to see it, hence there must be something  in
postgresql theory that I haven't understood well and I'd like a
clarification.

Most likely one or both transactions have have updated 3 or more
tuples on a single page, causing the tuple locks for the
transaction on that page to be combined into a single page lock for
that transaction.  This is intended to prevent the memory required
for tracking predicate locks from growing too large.  This
threshold of 3 per page was entirely arbitrary and always seen as
something which could and should be improved someday.  That might
happen for version 10 (expected to be released next year), since a
patch has been submitted to make that configurable.

https://www.postgresql.org/message-id/flat/d8joa0eh9yw@dalvik.ping.uio.no#d8joa0eh9yw@dalvik.ping.uio.no

If you are able to build from source, you might want to test the
efficacy of the patch for your situation.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Configurable or dynamic?  Wouldn't something related to tuples per page 
(and maybe fillfactor) do the trick?




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Materialized view vs. view

2017-01-10 Thread Rob Sargent

> 
> 
> "A rose by any other name would still smell as sweet”.
Actually there’s no “still” in that line, if you’re quoting Shakespeare.  And 
the full “That which we call a rose …” is truly appropriate here.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY: row is too big

2017-01-05 Thread Rob Sargent



On 01/05/2017 11:46 AM, Adrian Klaver wrote:

On 01/05/2017 08:31 AM, Rob Sargent wrote:



On 01/05/2017 05:44 AM, vod vos wrote:

I finally figured it out as follows:

1. modified the corresponding data type of the columns to the csv file

2. if null values existed, defined the data type to varchar. The null
values cause problem too.

so 1100 culumns work well now.

This problem wasted me three days. I have lots of csv data to COPY.



Yes, you cost yourself a lot of time by not showing the original table
definition into which you were trying insert data.


Given that the table had 1100 columns I am not sure I wanted to see it:)

Still the OP did give it to us in description:

https://www.postgresql.org/message-id/15969913dd3.ea2ff58529997.7460368287916683127%40zoho.com 

"I create a table with 1100 columns with data type of varchar, and 
hope the COPY command will auto transfer the csv data that contains 
some character and date, most of which are numeric."


In retrospect what I should have pressed for was a more complete 
description of the data. I underestimated this description:


"And some the values in the csv file contain nulls, do this null 
values matter? "



My apologies for missing that.  Was sure there would be room for some 
normalization but so be it: OP's happy, I'm happy





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres 9.6.1 big slowdown by upgrading 8.4.22

2017-01-05 Thread Rob Sargent



On 01/05/2017 10:18 AM, Job wrote:

Hello guys,
a very strange thing: after upgrading from 8.4.22 to 9.6.1 I noticed, 
under heavy benchmarks, a real slowdown of PostgreSQL 9.6.1, with 
the machine really "without breath".

By going back to PostgreSQL 8.4.22, everything works fine again.
With three days of investigation, I came to a function with lots of 
joins and conditions.

Only by removing this condition:
"exists ( select 1 from gruorari where 
gruorari.idgrucate=grucategorie.id and ( (('{'||gg_sett||'}')::int[] 
&& array[EXTRACT(DOW FROM NOW())::int])='t' and  now()::time between 
gruorari.dalle::time and gruorari.alle::time) )"

The benchmarks with PostgreSQL 9.6.1 are now very fast.
The table mentioned by the query has these indexes:
 id        | numeric(1000,1)   | not null default function_get_next_sequence('gruorari_id_seq'::text)
 idgrucate | numeric(1000,1)   |
 dalle | character varying |
 alle  | character varying |
 gg_sett   | character varying |
 azione| character varying |
Indexes:
"keygruorari" PRIMARY KEY, btree (id)
"alle_idx" btree (alle)
"dalle_idx" btree (dalle)
"gg_sett_idx" btree (gg_sett)
"idgrucate_idx" btree (idgrucate)
What is strange is that with the 8.4.22 version there is no problem, but 
there is something that does not sit well with the 9.6.1 version.

Could you please help me?

THANK YOU!
/F

You might want to include the query plans from each server.
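Something like this, run on both servers, would show where the plans diverge (a sketch, trimmed down to the correlated exists you removed):

    EXPLAIN ANALYZE
    SELECT count(*)
    FROM grucategorie g
    WHERE exists (SELECT 1 FROM gruorari o WHERE o.idgrucate = g.id);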



Re: [GENERAL] COPY: row is too big

2017-01-05 Thread Rob Sargent



On 01/05/2017 05:44 AM, vod vos wrote:

I finally figured it out as follows:

1. modified the corresponding data type of the columns to the csv file

2. if null values existed, defined the data type to varchar. The null 
values cause problem too.


so 1100 culumns work well now.

This problem wasted me three days. I have lots of csv data to COPY.


Yes, you cost yourself a lot of time by not showing the original table 
definition into which you were trying insert data.


Re: [GENERAL] Index impact on update?

2017-01-04 Thread Rob Sargent



On 01/04/2017 09:59 AM, Israel Brewster wrote:

Short version:
Do indexes impact the speed of an UPDATE, even when the indexed 
columns aren't changing?


Details:
I have a table containing geographical data (Latitude, longitude, and 
elevation) with 406,833,705 records. The Latitude and Longitude 
columns are indexed. In order to better utilize the data, I've been 
looking into PostGIS, and decided I wanted to add a "Location" column 
with PostGIS type "GEOGRAPHY(point)". I then tried to populate it from 
the existing latitude/longitude data using the following query:


UPDATE data SET 
location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||' 
'||lat::text||')');


I expected this update to take quite a while, since it has 406 million 
rows to update, but at this point it's been over 19 hours since I 
started the query, and it still hasn't completed.


I'm wondering if the presence of the indexes could be slowing things 
down even though the indexed columns aren't being updated? Would I be 
better off canceling the update query, dropping the indexes, and 
trying again? Or is more likely that the update query is "almost" 
done, and it would be better to just let it run it's course? Or is 
there an even better option, such as perhaps exporting the data, 
adding the additional column in a text editor, and re-importing the 
data with a COPY command?


Thanks for any feedback/advice you can offer!
I would use a "create table redo as select *, 
ST_GeographyFromText('SRID=4326;POINT('||lng::text||' 
'||lat::text||')') as location from original;" then index that and drop the original.  
Or just "create table location as select your_id, 
ST_GeographyFromText('SRID=4326;POINT('||lng::text||' 
'||lat::text||')') as location from original;" using whatever id you have for the original 
tuple (if it's not just lat+lon), and join or view as necessary after 
indexing.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY: row is too big

2017-01-02 Thread Rob Sargent


> On Jan 2, 2017, at 10:13 AM, Adrian Klaver  wrote:
> 
>> On 01/02/2017 09:03 AM, vod vos wrote:
>> You know, the csv file was exported from other database of a machine, so
>> I really dont want to break it for it is a hard work. Every csv file
>> contains headers and values. If I redesign the table, then I have to cut
>> all the csv files into pieces one by one. 
> 
> If it helps:
> 
> http://csvkit.readthedocs.io/en/latest/tutorial/1_getting_started.html#csvcut-data-scalpel
>> 
>> 
>>  On 星期一, 02 一月 2017 08:21:29 -0800 *Tom Lane
>> * wrote 
>> 
>>vod vos > writes:
>>> When I copy data from csv file, a very long values for many
>>columns (about 1100 columns). The errors appears:
>>> ERROR: row is too big: size 11808, maximum size 8160
>> 
>>You need to rethink your table schema so you have fewer columns.
>>Perhaps you can combine some of them into arrays, for example.
>>JSON might be a useful option, too.
>> 
>>regards, tom lane
>> 
>> 
>>-- 
>>Sent via pgsql-general mailing list (pgsql-general@postgresql.org
>>)
>>To make changes to your subscription:
>>http://www.postgresql.org/mailpref/pgsql-general
>> 
>> 
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
Perhaps this is your opportunity to correct someone else's mistake. You need to 
show the table definition to convince us that it cannot be improved. That it 
may be hard work really doesn't mean it's not the right path. 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Rob Sargent
I would hope Postgres core folk take no more than a nanosecond to reject the 
idea that they work on an IDE. Focus on reading and writing faster and faster 
ACID all the while. 

> On Dec 29, 2016, at 5:32 PM, Tim Uckun  wrote:
> 
> Honestly I don't even like JS. Having said that I am not too crazy about 
> PL-PGSQL either. I am willing to put up with either given that they are 
> supported widely in default installs of postgres in AWS, Linux and MacOSX,
> 
> As I said before, I think posgres gives a unique and underutilized language 
> platform. You can code in different languages, it has a good variety of built 
> in types, and of course you get persistance and caching built in!  Using 
> DBLINK you might even be able to separate out your code from the bulk of your 
> data in another database. Postgres all the way down!
> 
> It's fun to play around with.  There is a lot of missing pieces though. A 
> good IDE like thing would be good, version control would be nice, deeper 
> namespacing (hierarchical schemas?), easier testing etc would go a long way. 
> 
> Thanks for all the input guys! 
> 
>> On Fri, Dec 30, 2016 at 12:14 AM, Ivan Sergio Borgonovo 
>>  wrote:
>> On 12/29/2016 10:35 AM, Pavel Stehule wrote:
>> 
>>> 2016-12-29 10:03 GMT+01:00 Tim Uckun:
>>> 
>>> I think it's awesome that postgres allows you to code in different
>>> languages like this. It really is a unique development environment
>>> and one that is overlooked as a development platform.  It would be
>>> nice if more languages were delivered in the default package
>>> especially lua, V8 and mruby.
>>> 
>>> 
>>> It is about dependencies and maintenance. There are not too much people
>>> who has good experience with C embedding Lua, V8 and others. Any people
>>> who can do some work are welcome.
>>> 
>>> The living outside main package has disadvantages - only enthusiast
>>> knows about it, but some advantages too - you are not fixed on
>>> PostgreSQL development cycle, and development can be faster.
>> 
>> I'll add my 2 cents.
>> 
>> Postgresql and in general SQL are about integrity and coherency.
>> Checking coherency is much easier with strict data type.
>> PL/PGSQL gives you that, JS is far far away from that.
>> 
>> Postgresql is a very flexible database and you can stretch it to do "MEAN 
>> like"[1] stuff but that's going to increase your "impedance mismatch".
>> 
>> If you think there is some space for JS in your application stack that's 
>> nearer to the client rather than to the DB.
>> Or possibly you need to do "MEAN like" stuff but you don't want to install 
>> another "database".
>> 
>> As other said using stored procedures is a two edged sword.
>> It can decouple DB schema from the application or it can increase the 
>> coupling.
>> Choosing JS for performance in the stored procedure realm is going to 
>> encourage coupling and make scalability harder and it is going to become a 
>> mess when you'll need to refactor.
>> 
>> [1] https://en.wikipedia.org/wiki/MEAN_(software_bundle)
>> 
>> -- 
>> Ivan Sergio Borgonovo
>> http://www.webthatworks.it http://www.borgonovo.net
>> 
>> 
>> 
>> 
>> -- 
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
> 


Re: [GENERAL] About the MONEY type

2016-12-18 Thread Rob Sargent

> On Dec 18, 2016, at 5:23 PM, Gavin Flower  
> wrote:
> 
> On 18/12/16 12:25, Bruce Momjian wrote:
>> On Wed, Nov 30, 2016 at 01:35:12PM -0800, John R Pierce wrote:
>>> note, btw, TIMESTAMP WITH TIME ZONE doesn't actually store the timezone...
>>> rather, it converts it to an internal representation of GMT, and then 
>>> converts
>>> it back to display time at the client's current (or specified) time zone.
>> Right, TIMESTAMP WITH TIME ZONE converts the timestamp value to the
>> local time zone on output.  Imagine a monetary type that converted the
>> money amount to local currency on output --- that would be cool.
>> 
> Hmm...
> 
> Would need to know the appropriate conversion rate. the 2 obvious 
> dates/times, on entry and now, may neither be the one wanted.
> 
> Also, often the buy/sell conversion rates are not the same!
> 
> Am sure there also other issues.
> 
> I don't think automatic conversion is as easy as you make it out to be.
> 
> 
> 
> Cheers,
> 
> Gavin
> 
Bets on how serious Mr. Pierce was are currently trending in Vegas.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] huge table occupation after updates

2016-12-10 Thread Rob Sargent

> On Dec 10, 2016, at 10:01 AM, Tom DalPozzo <t.dalpo...@gmail.com> wrote:
> 
> 2016-12-10 16:36 GMT+01:00 Rob Sargent <robjsarg...@gmail.com>:
> 
> > On Dec 10, 2016, at 7:27 AM, Tom DalPozzo <t.dalpo...@gmail.com> wrote:
> >
> > Hi,
> > I'd like to do that! But my DB must be crash proof! Very high reliability 
> > is a must.
> > I also use sycn replication.
> > Regards
> > Pupillo
> >
> >
> >
> >
> > Are each of the updates visible to a user or read/analyzed by another 
> > activity?  If not you can do most of the update in memory and flush a 
> > snapshot periodically to the database.
> >
> >
> 
> This list discourages top posting. You’re asked to place your reply at the 
> bottom
> 
> You haven’t laid out your application architecture (how many clients, who 
> is reading who is writing, etc). Caching doesn’t mean your database is any 
> less crash proof.  At that rate of activity, depending on architecture, you 
> could lose updates in all sorts of crash scenarios.
> 
> ​As for crash proof, I meant that once my client app is told that her update 
> request was committed, it mustn't get lost (hdd failure apart of course). And 
> I can't wait to flush the cache before telling to the app :"committed".
> ​I can replicate also the cache on the standby PC of course. ​ 
> Regards
> Pupillo
> 
> 
> 
> ​ 
> 

OK, clientA sends an update; you commit and tell clientA "committed". clientB 
updates the same record; do you tell clientA of clientB’s update?
Are the two updates cumulative or destructive?
Can you report all updates done by clientA?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] huge table occupation after updates

2016-12-10 Thread Rob Sargent

> On Dec 10, 2016, at 7:27 AM, Tom DalPozzo  wrote:
> 
> Hi,
> I'd like to do that! But my DB must be crash proof! Very high reliability is 
> a must.
> I also use sycn replication.
> Regards
> Pupillo
> 
> 
> 
> 
> Are each of the updates visible to a user or read/analyzed by another 
> activity?  If not you can do most of the update in memory and flush a 
> snapshot periodically to the database.
> 
> 

This list discourages top posting. You’re asked to place your reply at the 
bottom

You haven’t laid out your application architecture (how many clients, who is 
reading who is writing, etc). Caching doesn’t mean your database is any less 
crash proof.  At that rate of activity, depending on architecture, you could 
lose updates in all sorts of crash scenarios.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] huge table occupation after updates

2016-12-10 Thread Rob Sargent

> On Dec 10, 2016, at 6:25 AM, Tom DalPozzo  wrote:
> 
> Hi,
> you're right, VACUUM FULL  recovered the space, completely.
> So, at this point I'm worried about my needs.
> I cannot issue vacuum full as I read it locks the table.
> In my DB, I (would) need to have a table with one bigint id field+ 10 bytea 
> fields, 100 bytes long each (more or less, not fixed). 
> 5/1 rows maximum, but let's say 5000.
> As traffic I can suppose 1 updates per row per day (spread over groups of 
> hours; each update involving two of those fields, randomly. 
> Also rows are chosen randomly (in my test I used a block of 2000 just to try 
> one possibility).
> So, it's a total of 50 millions updates per day, hence (50millions * 100 
> bytes *2 fields updated) 10Gbytes net per day.
> I'm afraid it's not possible, according to my results.
> Reagrds
> Pupillo
> 

Are each of the updates visible to a user or read/analyzed by another activity? 
 If not you can do most of the update in memory and flush a snapshot 
periodically to the database.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] high transaction rate

2016-12-07 Thread Rob Sargent



On 12/07/2016 03:32 PM, John R Pierce wrote:

On 12/7/2016 2:23 PM, Rob Sargent wrote:

How does your reply change, if at all, if:
 - Fields not indexed
 - 5000 hot records per 100K records (millions of records total)
 - A dozen machines writing 1 update per 10 seconds (one machine 
writing every 2 mins)

 - - each to a different "5000"
  or (two modes of operation)
 - - each to same "5000"

My guess this would be slow enough even in the second mode?  Or at 
this rate and style should I care?
Sorry for taking this off from OP's point 


thats 1 update of 5000 records every 2 minutes per each of 12 client 
hosts?   thats still a fair amount of tuples/second and in a table 
with millions of records, the vacuum will have a lot more to go through.


9.6 has some potentially significant enhancements in how autovacuum 
operates with respect to incrementally freezing blocks.



if you think your update patterns can take advantage of HOT, its a 
good idea to set the FILL_FACTOR of the table prior to populating it, 
maybe to 50% ?  this will make the initial table twice as large, but 
provide freespace in every block for these in-block HOT operations.


for a table that large, you'll definitely need to crank up the 
aggressiveness of autovacuum if you hope to keep up with that number 
of stale tuples distributed among millions of records.




Much appreciated - endOfOffTopic :)
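One last bit for the archives before I drop it - the fillfactor suggestion spelled out (a sketch; the table name is made up):

    -- leave half of each heap page free so HOT updates can stay in-block
    CREATE TABLE hot_counters (
        id    bigint PRIMARY KEY,
        val_a bigint,
        val_b bigint
    ) WITH (fillfactor = 50);
    -- or, on an existing table (only affects newly written pages until a rewrite):
    ALTER TABLE hot_counters SET (fillfactor = 50);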


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] high transaction rate

2016-12-07 Thread Rob Sargent


On 12/07/2016 09:58 AM, John R Pierce wrote:

On 12/7/2016 8:47 AM, Rob Sargent wrote:
Please tell me that in this case, updating 2 (big)integer columns 
does not generate dead tuples (i.e. does not involve an insert/delete 
pair).


if the fields being updated aren't indexed, and there's free tuple 
space that has already been vacuumed in the same block, then the 
update is done via "HOT" (or is it HEAT?) within the same block... but 
with 1000s of updates per second to the same 500 rows ?  odds of 
autovacuum keeping up are sketchy..otherwise, all updates are 
insert/delete operations due to the requirements of MVCC

How does your reply change, if at all, if:
 - Fields not indexed
 - 5000 hot records per 100K records (millions of records total)
 - A dozen machines writing 1 update per 10 seconds (one machine 
writing every 2 mins)

 - - each to a different "5000"
  or (two modes of operation)
 - - each to same "5000"

My guess this would be slow enough even in the second mode?  Or at this 
rate and style should I care?

Sorry for taking this off from OP's point




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken

2016-12-07 Thread Rob Sargent



On 12/07/2016 02:06 PM, Kevin Grittner wrote:

On Wed, Dec 7, 2016 at 7:33 AM, Michael Sheaver  wrote:

I would like to echo the sentiment on collation and expand it to
character sets in general. When issues with them come up, they do
take an incredible amount of time and effort to resolve, and are
one of my own biggest pain points when dealing with databases and
datasets from other sources.

I would be much happier if most developers understood the
difference between a character set (e.g., Unicode) and a character
encoding scheme (e.g., UTF-8 or UTF-16) and how the two concepts
relate.  If we reached a point where most DBAs understood the point
of being able to set a client_encoding that is different from the
server_encoding, I think I would need to pop the cork on some
champagne.

Hm.  Maybe a topic for a blog post

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



+1 - Confuses me every time
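The knob in question, for anyone who hasn't bumped into it (the encoding names below are just examples):

    SHOW server_encoding;             -- fixed per database at CREATE DATABASE time
    SHOW client_encoding;             -- per session
    SET client_encoding TO 'LATIN1';  -- server now converts to/from its own encoding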


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] high transaction rate

2016-12-07 Thread Rob Sargent
2.  Accumulation of dead tuples leading to what should be very short 
operations taking longer.


No idea of that is helpful but where I would probably start



Please tell me that in this case, updating 2 (big)integer columns does 
not generate dead tuples (i.e. does not involve an insert/delete pair).




Re: [GENERAL] Query help

2016-10-04 Thread Rob Sargent

> On Oct 4, 2016, at 9:31 PM, Bret Stern  
> wrote:
> 
> Good evening,
> I'm curious about a way to ask the following question of my vendors
> table.
> 
> psuedo1 "select all vendors which exist in BUR and EBC and SNJ"
> 
> and
> psuedo2 "select all vendors which DO NOT exist in all three show rooms
> 
> 
> The data is from a Sage accounting system which I pull out and place in a
> pg db. What we are trying to do is identify which vendors are defined in
> all three of our showrooms, and vice-versa, which ones are not.
> 
> ap_vendors table
>   company_code character varying(10) NOT NULL,
>   ap_division_no character varying(2) NOT NULL,
>   vendor_no character varying(7) NOT NULL,
>   terms_code character varying(2),
>   vendor_name character varying(30),
>   address_line1 character varying(30),
>   address_line2 character varying(30),
>   address_line3 character varying(30),
>   city character varying(20),
>   state character varying(2),
>   zip_code character varying(10),
>   telephone_no character varying(17),
>   fax_no character varying(17),
>   CONSTRAINT ap_vendors_pk PRIMARY KEY (company_code, ap_division_no, 
> vendor_no)
> 
> sample records:
> "BUR";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";"92831";"''";"''"
> "BUR";"00";"AGORA";"30";"Agora Natural 
> Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
> "BUR";"00";"AKDOP";"30";"AKDO 
> Pacific";"''";"''";"''";"''";"''";"94545";"''";"''"
> "EBC";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";"92831";"''";"''"
> "EBC";"00";"AGORA";"30";"Agora Natural 
> Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
> "EBC";"00";"ARIZ01";"30";"Arizona 
> Tile";"''";"''";"''";"''";"''";"94550";"''";"''"
> "SNJ";"00";"AKDOP";"30";"AKDO 
> Pacific";"''";"''";"''";"''";"''";"94545";"''";"''"
> "SNJ";"00";"AGORA";"30";"Agora Natural 
> Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
> "SNJ";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";"92831";"''";"''"
> 
> What I need is a query which I can modify to return only vendors which exists
> in all three company_code columns ( BUR, EBC, SNJ) (they represent showroom 
> location)
> 
> eg; exists in BUR, EBC, SNJ
> ADEXU
> AGORA
> 
> OR
> 
> modify the query to return only the vendors which DO NOT exist in all
> three showrooms based on the first column company_code
> 
> eg; 
> AKDOP only exists in BUR and SNJ
> ARIZ01only exists in EBC
> 
> Thanks
> Bret
> 
> 
Not sure I like the schema, but 
select vendor_no, count(*) from ap_vendors group by vendor_no having count(*) = 3;

and maybe having count(*) < 3 is your second answer.
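And if you want to see which showrooms the stragglers are in, something like (a sketch):

    SELECT vendor_no,
           string_agg(DISTINCT company_code, ',' ORDER BY company_code) AS showrooms
    FROM ap_vendors
    GROUP BY vendor_no
    HAVING count(DISTINCT company_code) < 3;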



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incrementally refreshed materialized view

2016-09-26 Thread Rob Sargent

Of course 9.5 is the current release so the answer is Yes, since 9.5

On 09/26/2016 12:29 PM, Rakesh Kumar wrote:


*Does PG support INCREMENTAL MV ? Looks like not (until 9.5)*





Re: [GENERAL] Chante domain type - Postgres 9.2

2016-09-26 Thread Rob Sargent



On 09/26/2016 08:14 AM, Adrian Klaver wrote:

On 09/26/2016 06:54 AM, Thomas Kellerer wrote:

Rakesh Kumar schrieb am 26.09.2016 um 15:08:
You sound like you think that varchar(50) is somehow cheaper than 
text.


The biggest impediment to text cols in other RDBMS  is no index 
allowed.
If PG has an elegant solution to that, then yes I see the point made 
by the

original poster.


Don't confuse Postgres' "text" data type with "text" in other DBMS.


Just be aware that layers above the database often do not understand 
that and will see text as a memo field. For instance in Django a text 
field will get rendered as a Textarea widget whereas a varchar field 
will be rendered as an TextInput widget. You can override that, but it 
is extra work. Luckily Postgres has the notion of an unbounded varchar:


https://www.postgresql.org/docs/9.5/static/datatype-character.html

".. If character varying is used without length specifier, the type 
accepts strings of any size. The latter is a PostgreSQL extension."


This allows you to have 'text' without it being called text, as stated 
below.




There is no difference whatsoever between varchar and text in Postgres.











Does that trick remove the overhead (length check) Tom mentioned upstream?






Re: [GENERAL] Request to share information regarding postgresql pg_xlog file.

2016-09-15 Thread Rob Sargent

> On Sep 15, 2016, at 1:20 AM, Yogesh Sharma 
>  wrote:
> 
> Dear John and all,
>  
> >8.1 has been obsolete and unsupported for about 6 years now.8.1.18 was 
> >released in 2009, the final 8.1.23 release was in 2010, after which it was 
> >>dropped.
> Yes, we understood your point.
> But we require some information related to this rpm.
>  
> >These errors suggest disk file corruption, this can occur from unreliable 
> >storage, undetected memory errors, and other such things.
> How we can verify what is actual problem in system?
>  
> Also please share some information related to below.
> we tried to stop the postgresql but it couldn’t stop and timout after 60 sec.
> please confirm below message in postgre logs.
> FATAL:  terminating connection due to administrator command
>  
>  
> Regards,
> Yogesh
>  
> From: pgsql-general-ow...@postgresql.org 
> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
> Sent: Thursday, September 15, 2016 11:28 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Request to share information regarding postgresql 
> pg_xlog file.
>  
> On 9/14/2016 10:09 PM, Yogesh Sharma wrote:
> Thanks for your support and suggestion.
> We are using below postgresql rpm.
> postgresql-8.1.18-2.1
> 
> thats not the full RPM name, thats just the version.
> 
> 8.1 has been obsolete and unsupported for about 6 years now.8.1.18 was 
> released in 2009, the final 8.1.23 release was in 2010, after which it was 
> dropped.
> 
> current releases are 9.1 (soon to be obsoletted), 9.2, 9.3, 9.4, and 9.5, 
> with 9.6 in release candidate state.
> 
> 
> CONTEXT:  writing block 53 of relation 1663/16385/280951
> ERROR:  could not open relation 1663/16385/280951: No such file or directory
> 
> 
> These errors suggest disk file corruption, this can occur from unreliable 
> storage, undetected memory errors, and other such things.
> 
> 
> 
>  
> 
> -- 
> john r pierce, recycling bits in santa cruz
What operating system is this running on? John is most likely correct: the disk 
is not healthy. How you deal with that depends on your OS.
Are you looking for the rpm for that version? Or do you have some other reason 
for asking about the rpm, versus questions about the postgres version?
This list requests that you “bottom post”, i.e. add your comments to the bottom, 
not the top. (I don’t like it, but that’s the protocol here.)




Re: [GENERAL] Sequential vs. random values - number of pages in B-tree

2016-08-23 Thread Rob Sargent


On 08/23/2016 08:34 AM, Francisco Olarte wrote:

On Tue, Aug 23, 2016 at 4:28 PM, Rob Sargent <robjsarg...@gmail.com> wrote:

On 08/23/2016 07:44 AM, Francisco Olarte wrote:

On Tue, Aug 23, 2016 at 2:26 PM, pinker <pin...@onet.eu> wrote:

I am just surprised by the order of magnitude in the difference though. 2
and 27 minutes that's the huge difference... I did another, simplified
test,
to make sure there is no duplicates and the only difference between both
sets is the order:

...

INSERT INTO t_sequential SELECT * FROM source_sequential;
102258,949 ms
INSERT INTO t_random SELECT * FROM source_random;
1657575,699 ms

If I read correctly, you are getting 100s/10Mkeys=10us/key in
sequential, and 165 in random.

I'm not surprissed at all. I've got greater differences on a memory
tree, sorted insertion can be easily optimized to be very fast. AS an
example, sequential insertion can easily avoid moving data while
filling the pages and, with a little care, it can also avoid some of
them when splitting. I'm not current with the current postgres
details, but it does not surprise me they have big optimizations for
this, especially when index ordered insertion is quite common in
things like bulk loads or timestamped log lines.

And if each insert is in a separate transaction, does this still hold true?

What are you referring to by 'this'? ( BTW, bear in mind one
transaction needs at least a disk flush, and, if done via network, at
least one RTT, so I doubt you can achieve 10us/transaction unless you
have very special conditions ).

Francisco Olarte.
By 'this' I was referring to the optimizations mentioned, and am 
wondering if they hold true under user load.  Much magic can happen in 
a custom data load, but do these optimizations apply to an application 
loading single (or perhaps several) records per transaction?  Does one, 
in that scenario, not suffer any consequence for continuously loading 
one side of the tree (the rightmost node)?





Re: [GENERAL] Sequential vs. random values - number of pages in B-tree

2016-08-23 Thread Rob Sargent



On 08/23/2016 07:44 AM, Francisco Olarte wrote:

Hi pinker:

On Tue, Aug 23, 2016 at 2:26 PM, pinker  wrote:

I am just surprised by the order of magnitude in the difference though. 2
and 27 minutes that's the huge difference... I did another, simplified test,
to make sure there is no duplicates and the only difference between both
sets is the order:

...

INSERT INTO t_sequential SELECT * FROM source_sequential;
102258,949 ms
INSERT INTO t_random SELECT * FROM source_random;
1657575,699 ms

If I read correctly, you are getting 100s/10Mkeys=10us/key in
sequential, and 165 in random.

I'm not surprissed at all. I've got greater differences on a memory
tree, sorted insertion can be easily optimized to be very fast. AS an
example, sequential insertion can easily avoid moving data while
filling the pages and, with a little care, it can also avoid some of
them when splitting. I'm not current with the current postgres
details, but it does not surprise me they have big optimizations for
this, especially when index ordered insertion is quite common in
things like bulk loads or timestamped log lines.

Francisco Olarte.



And if each insert is in a separate transaction, does this still hold true?






Re: [GENERAL] Detecting if current transaction is modifying the database

2016-08-05 Thread Rob Sargent

On 08/05/2016 02:15 PM, Christian Ohler wrote:

On Fri, Aug 5, 2016 at 12:55 PM, Rob Sargent <robjsarg...@gmail.com> wrote:

What sort of interface are you looking for.  Where/When would you grab the 
information? Do what with it?  Log triggers are the typical pattern here (with 
packages just for that sort of thing).

I'm looking for a statement (or sequence of statements) that, when run
within a transaction, tells the client if any writes are happening in
that transaction – basically an interface similar to my proposed
solution.  I have some database wrapper code on the client that passes
along arbitrary statements to Postgres, and I'd like for that wrapper
code to be able to determine whether the current transaction is a
write (for various purposes including logging).  It would be nice not
to make the client-side wrapper code dependent on instrumentation of
the database schema itself.

What can you tell me about my proposed solution?  Does it do what I
describe I want from it?  Are there limitations I should be aware of?
At what point do you intend to inform the client that the db will be 
(significantly) altered? Your planned call is within the transaction 
and presumably late in the sequence (so the locks have been created). 
Not sure if your client can see any results until after the transaction 
has been largely executed.  Does that matter?
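
For concreteness, here is how I read your proposed check running inside the open 
transaction (table and values are made up):

BEGIN;
UPDATE foo SET bar = 1 WHERE id = 42;          -- some pending write

SELECT count(*) > 0 AS has_pending_writes      -- the pg_locks check you quoted
FROM pg_locks
WHERE pid = pg_backend_pid()
  AND locktype = 'transactionid'
  AND mode = 'ExclusiveLock'
  AND granted;

COMMIT;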






Re: [GENERAL] Detecting if current transaction is modifying the database

2016-08-05 Thread Rob Sargent



On 08/05/2016 01:48 PM, Christian Ohler wrote:
Thanks, fair point.  I should have mentioned that I know about 
triggers but was hoping to find a less invasive mechanism (IIUC, I'd 
have to install a trigger on every table) – it seems to me that 
Postgres should just be able to tell me whether COMMIT will do 
anything, it obviously has to track that somehow (or some 
approximation of it).


Another thing I should have mentioned is that I don't consider 
incrementing a sequence to be a modification.



On Fri, Aug 5, 2016 at 12:35 PM, Alex Ignatov 
> wrote:


Hi! Make trigger function

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




On Fri, Aug 5, 2016 at 10:25 PM +0300, "Christian Ohler"
> wrote:

Hi,

I'm trying to find a way to have Postgres tell me if the
current transaction would modify database if I committed it
now.  I can live with a conservative approximation (sometimes
– ideally, rarely – get a "yes" even though nothing would be
modified, but never get a "no" even though there are pending
modifications).  It's acceptable (probably even desirable) if
a no-op write operation like "UPDATE foo SET bar = 1 WHERE bar
= 1" is considered a modification.

(The use case is an audit log mechanism vaguely similar to
pgMemento.)


This sentence from
https://www.postgresql.org/docs/9.5/static/view-pg-locks.html
 :

> If a permanent ID is assigned to the transaction (which
normally happens
> only if the transaction changes the state of the database),
it also holds
> an exclusive lock on its permanent transaction ID until it ends.

makes me think that I can perhaps do it as follows:

SELECT count(*) FROM pg_locks WHERE pid=pg_backend_pid() AND
locktype='transactionid' AND mode='ExclusiveLock' AND granted;

Is that right?  "Permanent transaction ID" refers to the XID,
correct?  Are there other, better ways?  Are there ways to
avoid false positives due to temp tables?

Thanks in advance,
Christian.


What sort of interface are you looking for?  Where/when would you grab 
the information? Do what with it?  Log triggers are the typical pattern 
here (with packages just for that sort of thing).
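
By way of illustration, a minimal log-trigger sketch (all names here are 
hypothetical, and a real audit package would record far more than this):

CREATE TABLE audit_log (
    tbl        text,
    op         text,
    changed_at timestamptz DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (tbl, op) VALUES (TG_TABLE_NAME, TG_OP);
    RETURN NULL;   -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_audit
    AFTER INSERT OR UPDATE OR DELETE ON foo
    FOR EACH STATEMENT EXECUTE PROCEDURE log_change();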




Re: [GENERAL] pasting a lot of commands to psql

2016-07-07 Thread Rob Sargent
On the wild chance you're an emacs user: if the section is in a 
".sql" buffer, Ctrl-C Ctrl-B will send the entire buffer.  Not sure if 
there's a practical limit on the number of lines/statements, but I've 
used this with a couple of "pages" worth of sql.


On 07/07/2016 11:48 AM, Dmitry Shalashov wrote:

Oh sweet! Thanks!


Dmitry Shalashov, surfingbird.ru  & relap.io 



2016-07-07 20:39 GMT+03:00 Scott Mead >:




On Thu, Jul 7, 2016 at 1:30 PM, Dmitry Shalashov
> wrote:

Hi Sameer,

I am trying to copy-paste (and execute) random snippets of SQL
to psql console.

There is another ways to do it, which do not involve
copy-paste, but I am wondering why is copy-paste won't work.
What exactly is happening there...


Dmitry Shalashov, surfingbird.ru  &
relap.io 

2016-07-07 20:26 GMT+03:00 Sameer Kumar
>:



On Fri, 8 Jul 2016, 1:23 a.m. Dmitry Shalashov,
> wrote:

Hi everyone.

Let say that I have some sql file with like hundred of
simple statements in it. I `cat` it, copy it to
buffer, go to my beloved psql and insert it there.
But somewhere after first few lines it screws over:

b2b=> BEGIN;
b2b=> INSERT INTO oko_topsites
VALUES('russian_federation','ati.su
',0,NULL,5);
INSERT INTO oko_topsites
VALUES('russian_federation'b2b=> INSERT INTO
oko_topsites VALUES('russian_federation','atn.ua
',0,NULL,5);
b2b=> INSERT INTO oko_topsites
VALUES('russian_federation','audit-it.ru
',0,NULL,5);
b2b=> INSERT INTO oko_topsites
VALUES('russian_federation','aup.ru
',0,NULL,5);
'
b2b=> INSERT INTO oko_topsites
VALUES('russian_federation','autocentre.ua
',0,NULL,5);

b2b=> INSERT INTO oko_topsites
VALUES('russian_federation','autodoc.ru
',0,NULL,5);

b2b=> INSERT INTO oko_topsites
VALUES('russian_federation','calend.ru '

b2b(> INSERT INTO oko_topsites
VALUES('russian_federation','calorizator.ru
',0,NULL,5)

Unclosed quotes, unclosed parenthesis - anyway it wont
work.

How to safely insert big number of statements to psql
at once?
I am aware about "execute this file" \i option of
psql, that is not the answer I am looking for, thanks :-)


My personal favorite for this exact thing is to use '\e'

  When you are in psql, if you \e (on *nix) it will open a temp
file in whatever your $EDITOR variable is set ( I use vim).  Paste
your data, then save-close the file.  It will put you back into
psql and execute the command for you.

--Scott


What are you exactly aiming to do?

Have you tried -
psql  < myfile



Dmitry Shalashov, surfingbird.ru
 & relap.io 

-- 
--

Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com





-- 
--

Scott Mead
Sr. Architect
/OpenSCG /
http://openscg.com






Re: [GENERAL] pg_dump fundenental question

2016-07-05 Thread Rob Sargent



On 07/05/2016 10:54 AM, David G. Johnston wrote:
On Tue, Jul 5, 2016 at 10:54 AM, J. Cassidy >wrote:


Hello all,

I have hopefully an "easy" question.

If I issue the pg_dump command with no switches or options i.e.

/usr/local/pgsql/bin/pg_dump -v dbabcd > /data3TB/Backup/dbabcd

Is their any "default" compression involved or not? Does pg_dump
talk to zlib during the dump process given that I have not
specified any compression on the
command line? (see above).

​
​ In general the easiest way to answer the question of whether 
something (well, a text file at least) is compressed or not is to 
attempt to open it in a text editor - if you can read the contents​

​ its not compressed.

Reading the Description section at the following location provides the 
answer you seek - no, there is no compression by default.​


​https://www.postgresql.org/docs/9.5/static/app-pgdump.html
​David J.

I like the 'file' command, and now that MS is shipping bash, it should be 
available to all.


Re: [GENERAL] Stored procedure version control

2016-07-01 Thread Rob Sargent



On 07/01/2016 06:17 PM, Jim Nasby wrote:

On 6/30/16 9:16 AM, Merlin Moncure wrote:

It's not really necessary to create version down scripts.  In five
years of managing complex database environments we've never had to
roll a version back and likely never will; in the event of a disaster
it's probably better to restore from backup anyways.


I'm surprised no one has mentioned http://sqitch.org. It makes it very 
easy to manage migrations, as well as creating downgrade scripts (if 
you use rework, and put each object into it's own file).


I do agree that down scripts are pretty over-rated as long as you have 
good test practices (as in, database unit tests). In 9 years in an 
environment where downtime was 6 figures per hour I only had 1 or 2 
deployments that had problems, and never bad enough to consider 
reverting.
I've found down scripts quite useful in development.  One does have to 
know how to neuter them on the way to production, however.








Re: [GENERAL] How to pass jsonb and inet arguments to a stored function with JDBC?

2016-06-14 Thread Rob Sargent


> On Jun 14, 2016, at 7:33 AM, Alexander Farber  
> wrote:
> 
> Dear PostgreSQL users,
> 
> I have a stored procedure defined as:
> 
> CREATE OR REPLACE FUNCTION words_merge_users(
> IN in_users jsonb,
> IN in_ip inet,
> OUT out_uid integer)
> RETURNS integer AS
> $func$
> ...
> $func$ LANGUAGE plpgsql;
> 
> Which I can successfully call at psql 9.5.3 prompt:
> 
> # SELECT out_uid FROM 
> words_merge_users('[{"sid":"abcde","auth":"","social":2,"given":"Abcde2","female":0,"stamp":1450102880}]'::jsonb,
>  '0.0.0.0'::inet);
> 
>  out_uid
> -
>1
> (1 row)
> 
> However an SQLException is thrown, when trying to call the same function via 
> JDBC driver 9.4.1208.jre7:
> 
> ERROR: function words_merge_users(character varying, character varying) does 
> not exist   
> Hint: No function matches the given name and argument types. You might need 
> to add explicit type casts.
> 
> Being a JDBC newbie I wonder, how to perform the call properly in Java?
> 
> PreparedStatement st = mDatabase.prepareStatement("SELECT out_uid 
> FROM words_merge_users(?, ?)");
> 
> String str1 = JSON.toString(users); 
> String str2 = 
> mSession.getRemoteAddress().getAddress().getHostAddress(); 
> 
> st.setString(1, str1); // 
> {"social":1,"auth":"","action":"login","users":[{"given":"alex","social":1,"auth":"","sid":"12345"}],"sid":"12345"}
> st.setString(2, str2); // "127.0.0.1"
> 
> ResultSet rs = st.executeQuery();
> while (rs.next()) {
> System.out.println(rs.getString(1));
> }
> 
> rs.close();
> st.close();
> 
> Thank you for any hints
> Alex

Probably a permission or ownership issue. Are you using the same credentials for 
jdbc and psql?
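
That said, the hint in the error ("You might need to add explicit type casts") 
suggests the driver is sending both parameters as varchar. One hedged workaround, 
not necessarily the fix here, is to put the casts in the statement text itself:

SELECT out_uid FROM words_merge_users(?::jsonb, ?::inet);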



Re: [GENERAL] Automate copy - Postgres 9.2

2016-06-08 Thread Rob Sargent

> On Jun 8, 2016, at 8:04 PM, Patrick B  wrote:
> 
> 
> 
> 2016-06-09 13:58 GMT+12:00 John R Pierce  >:
> On 6/8/2016 6:47 PM, Patrick B wrote:
> 
> 21 is the number of IDS that I wanna perform that COPY command
> 
> that didn't answer my question.  if you call your function like SELECT 
> myfunction(21);   as you showed, where are those 21 ID's coming from?
> 
> 
> I don't know what u need, mate:
> 
> 
> CREATE or REPLACE FUNCTION function(integer)
> RETURNS void AS $$
> 
> declare
>crtRow record;
> begin
>   FOR crtRow in EXECUTE 'select DISTINCT(account_id) from backup_table 
> WHERE migrated = 1 AND account_id IN '|| $1
> 
>   LOOP
> COPY
> (SELECT * FROM backup_table WHERE migrated = 1 AND account_id = 
> crtRow.account_id)
> TO '/var/lib/pgsql/' || crtrow.account_id || '.csv';
> end loop;
> end
> 
> $$ language 'plpgsql';
> 
> 
> Isn't this what u wanna know?
> select DISTINCT(account_id) from backup_table WHERE migrated = 1

So you either want to call your function in a loop, or drop the passed-in id and just 
do all the accounts at once; take your pick.
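
If you go the second route, a sketch (assuming the table from your quoted function; 
note that COPY's filename has to be a literal, hence the EXECUTE/format):

CREATE OR REPLACE FUNCTION export_migrated_accounts()
RETURNS void AS $$
DECLARE
    crtRow record;
BEGIN
    FOR crtRow IN
        SELECT DISTINCT account_id FROM backup_table WHERE migrated = 1
    LOOP
        EXECUTE format(
            'COPY (SELECT * FROM backup_table WHERE migrated = 1 AND account_id = %s) TO %L',
            crtRow.account_id,
            '/var/lib/pgsql/' || crtRow.account_id || '.csv');
    END LOOP;
END;
$$ LANGUAGE plpgsql;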



Re: [GENERAL] 20160417105248.d20dcefed39b5d9031c6b...@potentialtech.com

2016-04-17 Thread Rob Sargent

> On Apr 17, 2016, at 12:41 PM, Sergei Agalakov  
> wrote:
> 
> I know about DBSteward. I don't like to bring PHP infrastructure only to be 
> able to compare two dumps,
> and to deal with potential bugs in the third party tools. The pg_dump in 
> other hand is always here, and is always trusted.
> SQLWorkbench/J also can compare two schemas, and requires only Java. Again, I 
> trust pg_dump more.
> http://www.sql-workbench.net/
> 
> May be pg_dump was never INTENDED to generate the dump files with the 
> determined order of the statements,
> but it CAN do it with the minor changes, and be more useful to 
> administrators. Why rely on the third party tools
> for the tasks that can be done with the native, trusted tools?
> 
> Sergei
Does it matter if they differ if you cannot recreate the correct one exactly 
from source-controlled DDL?  Or know how they are supposed to differ if this 
is a migration point?





Re: [GENERAL] Non-default postgresql.conf values to log

2016-04-07 Thread Rob Sargent


> On Apr 7, 2016, at 8:26 AM, Alex Ignatov  wrote:
> 
> 
> 
>> On 07.04.2016 16:53, Tom Lane wrote:
>> Alex Ignatov  writes:
>>> My question is: is there any option(s) to log non-default
>>> postgresql.conf values to log file?
>> No, but you can easily find all the non-default settings by querying
>> the pg_settings view.
>> 
>>regards, tom lane
> 
> Yeah, i know it but pg_settings is available only after pg is ready to accept 
> client connections.
> Also in some cases log file is placed on another server say, syslog. And if 
> your pg server is not available you cant say anything about your pg_settings.

At that point you're looking in a file: log file or config file. Does it matter 
which?
> -- 
> Alex Ignatov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
> 
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general




Re: [GENERAL] Table size for partitioned setup

2016-03-28 Thread Rob Sargent



On 03/28/2016 02:55 PM, Mat Arye wrote:

This will run on EC2 (or other cloud service) machines and on ssds.
Right now runs on m4.4xlarge with 64GiB of ram.
Willing to pay for beefy instances if it means better performance.


On Mon, Mar 28, 2016 at 4:49 PM, Rob Sargent <robjsarg...@gmail.com 
<mailto:robjsarg...@gmail.com>> wrote:




On 03/28/2016 02:41 PM, Mat Arye wrote:

Hi All,

I am writing a program that needs time-series-based insert
mostly workload. I need to make the system scaleable with many
thousand of inserts/s. One of the techniques I plan to use is
time-based table partitioning and I am trying to figure out
how large to make my time tables.

Does anybody have any hints on optimal table sizes either in
terms of rows or in terms of size? Any rule of thumbs I can
use for table size in relation to amount of memory on the
machine? Is the size of the index more important than the size
of the table (if queries mostly use indexes)?

Basically, I am asking for pointers about how to think about
this problem and any experiences people have had.

Thanks,
Mat

P.S. I am aware of limits listed here:
http://www.postgresql.org/about/. I am asking about practical
size limits for performance consideration.

Your current hardware, or hardware budget might play into the answer.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org

<mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Those who supply real answers on this list um, er, discourage 
top-posting. (Not my fave, but there you go)





Re: [GENERAL] Table size for partitioned setup

2016-03-28 Thread Rob Sargent



On 03/28/2016 02:41 PM, Mat Arye wrote:

Hi All,

I am writing a program that needs time-series-based insert mostly 
workload. I need to make the system scaleable with many thousand of 
inserts/s. One of the techniques I plan to use is time-based table 
partitioning and I am trying to figure out how large to make my time 
tables.


Does anybody have any hints on optimal table sizes either in terms of 
rows or in terms of size? Any rule of thumbs I can use for table size 
in relation to amount of memory on the machine? Is the size of the 
index more important than the size of the table (if queries mostly use 
indexes)?


Basically, I am asking for pointers about how to think about this 
problem and any experiences people have had.


Thanks,
Mat

P.S. I am aware of limits listed here: 
http://www.postgresql.org/about/. I am asking about practical size 
limits for performance consideration.

Your current hardware, or hardware budget might play into the answer.





Re: [GENERAL] Partitioning and ORM tools

2016-03-22 Thread Rob Sargent



On 03/22/2016 03:00 PM, Joshua D. Drake wrote:

On 03/22/2016 01:50 PM, CS DBA wrote:


Understood, was just wondering if there is a way to cause the child
table insert results to be returned to the ORM/Application instead of
the master/base table insert


Insert into the child table directly based on the partition rules.

JD


I would think the ORM (as yet undefined) would want to think in terms of 
the parent table and not know about the physical schema details.
Can the client not be written to check only for errors, versus checking for 
non-zero inserts?
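
For context, the usual reason for the "INSERT 0 0" is the routing trigger returning 
NULL, something like this sketch (table and column names invented):

CREATE OR REPLACE FUNCTION route_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.created_at >= DATE '2016-01-01' THEN
        INSERT INTO measurements_2016 VALUES (NEW.*);
    ELSE
        INSERT INTO measurements_old VALUES (NEW.*);
    END IF;
    RETURN NULL;   -- nothing lands in the parent table, so the client sees INSERT 0 0
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_route
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE route_insert();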






Re: [GENERAL] Partitioning and ORM tools

2016-03-22 Thread Rob Sargent



On 03/22/2016 12:55 PM, Melvin Davidson wrote:
Your problem seems strange as it has never been previously reported 
for anyone else that has _successfully_ set up partioning.
Perhaps is you provide just a little bit more detail we might be able 
to help you.

Useful and needed information would be:
1. Version of PostgreSQL
2. Operating System
3. Table structure for partitioned table
4. Trigger function and trigger used for insert
5. The actual insert statement.

On Tue, Mar 22, 2016 at 2:40 PM, CS DBA > wrote:


Hi All;

we setup partitioning for a large table but had to back off
because the return status (i.e: "INSERT 0 1") returns "INSERT 0 0"
when inserting into the partitioned table which causes the ORM
tool to assume the insert inserted 0 rows. Is there a standard /
best practices work around for this?

Thanks in advance




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org

)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




And which ORM are you using?

I take it the problem is that the ORM is saying "zero rows inserted" and 
that's, um, er, upsetting the client (which might decide to retry and 
then generate an error for a non-unique key or some such noise).


rjs



--
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.




Re: [GENERAL] Let's Do the CoC Right

2016-01-22 Thread Rob Sargent



On 01/22/2016 03:53 PM, Andrew Sullivan wrote:
This is why I posted all that stuff about what the IETF does some 
while ago. There is definitely more than one way to do this. Best 
regards, A 
Just a gut feeling, but I think this thread had driven the rest of the 
regulars to drink at a bar without wifi






Re: [GENERAL] Support for BDR in 9.5?

2016-01-08 Thread Rob Sargent



On 01/08/2016 10:39 AM, Andrew Biggs (adb) wrote:
Can anyone tell me if PostgreSQL 9.5 supports (either natively or by 
extension) the BDR functionality?


I tried it out and ran into issues, but it could well have been I was 
doing something wrong.


Thanks!
Andrew


I'm sure those who might be able to help you would be interested in 
what, exactly, it was you tried.





Re: [GENERAL] Convert 2 foreign key values within the same SELECT

2016-01-04 Thread Rob Sargent



On 01/04/2016 12:36 PM, gvim wrote:
I have a query which successfully retrieves id values "me" and "you" 
when 2 planetary values are supplied:


SELECT l.me_id AS me, l.you_id AS you,
a.l AS left, a.aspect, a.r AS right, l.id AS link_id, 
c.comment

FROM aspects a, links_aspects la, links l, syn_comments c
WHERE (a.r = 'Venus' AND a.l = 'Venus')
  AND a.type = 'synastry'
  AND la.aspect_id = a.id
  AND la.id = c.links_aspects_id
  AND la.link_id = l.id

 me  | you | left  | aspect | right | link_id |  comment
-+-+---++---+-+--- 


 365 | 371 | Venus | cnj| Venus |  49 | Garry/Dom Venus/Venus syn
(1 row)


. however, l.me_id and l.you_id are foreign keys referencing a 
users table from which I would like to retrieve concat(u.first_name, ' 
', u.surname) AS name for l.me_id and l.you_id. It seems simple but I 
can't think of a JOIN which will do the trick.


gvim



Join to the users table twice, once for "you" and once for "me"?
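
Something along these lines (a sketch; the users table name and its 
id/first_name/surname columns are assumed from your description):

SELECT concat(me.first_name, ' ', me.surname)   AS me_name,
       concat(you.first_name, ' ', you.surname) AS you_name,
       a.l AS left, a.aspect, a.r AS right, l.id AS link_id, c.comment
FROM aspects a
JOIN links_aspects la ON la.aspect_id = a.id
JOIN links l          ON la.link_id = l.id
JOIN syn_comments c   ON c.links_aspects_id = la.id
JOIN users me         ON me.id = l.me_id
JOIN users you        ON you.id = l.you_id
WHERE a.r = 'Venus' AND a.l = 'Venus' AND a.type = 'synastry';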




Re: [GENERAL] Generating an XSD file from an existing database

2015-12-06 Thread Rob Sargent

> On Dec 6, 2015, at 2:30 PM, Blake McBride  wrote:
> 
> Greetings,
> 
> I need an XML specification for my exiting schema - an XSD file.  Reading the 
> docs I see schema_to_xml but I do not understand:
> 
> A.  Will that produce the XSD file I seek?
> 
> B.  I do not understand the documentation.  Can I do this through psql?  I 
> need an exact example if possible.
> 
> Thank you.
> 
> Blake McBride
> 
Yes, those functions are available through psql:
select schema_to_xml()
and
select schema_to_xmlschema() are what you’re after.

Redirect the output of each into separate files.
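
In full form they take a few arguments (schema name, whether to emit nulls, 
tableforest, target namespace), so a hedged psql session might look like:

\o schema.xml
SELECT schema_to_xml('public', true, false, '');
\o schema.xsd
SELECT schema_to_xmlschema('public', true, false, '');
\o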






Re: [GENERAL] Recursive Arrays 101

2015-11-05 Thread Rob Sargent

On 11/05/2015 11:08 AM, Gavin Flower wrote:

On 06/11/15 04:33, Rob Sargent wrote:

On 11/05/2015 04:56 AM, Achilleas Mantzios wrote:

On 04/11/2015 17:53, Rob Sargent wrote:

On 11/04/2015 03:03 AM, Achilleas Mantzios wrote:
Sorry for being kind of late to the party (I was in 2015.PgConf.EU 
!!), and not having read
most of the replies, what we have been successfully doing for this 
problem for our app

is do it this way :
parents int[] -- where parents stores the path from the node to 
the root of the tree

and then have those indexes :
btree (first(parents))
btree (level(parents)) -- length
btree (last(parents))
gin (parents gin__int_ops) -- the most important

This has been described as "genealogical tree" approach, and works 
very good, IMHO much better

than nested sets.

Is there a more complete description of this approach available?  
By the title one might assume could be applied to populations as 
opposed to phylogeny (the OP's use case). Does it deal with 
consanguinity?  Does it perform well going "up" the tree (which is 
of course branched at every level)?


From here https://en.wikipedia.org/wiki/Phylogenetic_tree I assume 
that phylogenetic trees are normal
trees, and I see no reason why not be modeled with the genealogical 
approach described. The earliest paper

I based my work on was :
https://www.google.com/url?sa=t=j==s=web=2=0CCUQFjABahUKEwiR6auUlvnIAhXGvhQKHVyDA-s=https%3A%2F%2Fdownload.samba.org%2Fpub%2Funpacked%2Fldb%2Fldb_sqlite3%2Ftrees.ps=AFQjCNEktJsibP435MBki5cdGmO_CzKmwg=I9yC_tpyeWrEueDJTXbyAA=bv.106674449,d.d24=rja 



Finding the root is O(1). Going "up" the tree or finding common 
ancestry is reduced to the problem
of finding overlap/intersections/contains/contained between 
postgresql arrays.


The indexes, functions and operators provided by contrib/intarray 
were a basic element for the success of this

approach.

Going "up" a genealogy to me means getting two parents, four 
grandparents, 8 great grandparents etc.  On a good day, at least when 
there are no loops.  This isn't, to my understanding, how phylogeny 
works (but my genetics degree was thirty year ago) so perhaps I'm 
still confused by the titles used.  And certainly not to say that 
your approach isn't what the OP really needs!



You're actually going 'DOWN' the tree, in terms of how trees are used 
in computer science & graph theory!


See http://www.mathcove.net/petersen/lessons/get-lesson?les=32


Cheers,
Gavin



Fine.  Be that way :)  Still the question of loops/consanguinity?







Re: [GENERAL] Recursive Arrays 101

2015-11-05 Thread Rob Sargent

On 11/05/2015 04:56 AM, Achilleas Mantzios wrote:

On 04/11/2015 17:53, Rob Sargent wrote:

On 11/04/2015 03:03 AM, Achilleas Mantzios wrote:
Sorry for being kind of late to the party (I was in 2015.PgConf.EU 
!!), and not having read
most of the replies, what we have been successfully doing for this 
problem for our app

is do it this way :
parents int[] -- where parents stores the path from the node to the 
root of the tree

and then have those indexes :
btree (first(parents))
btree (level(parents)) -- length
btree (last(parents))
gin (parents gin__int_ops) -- the most important

This has been described as "genealogical tree" approach, and works 
very good, IMHO much better

than nested sets.

Is there a more complete description of this approach available?  By 
the title one might assume could be applied to populations as opposed 
to phylogeny (the OP's use case).  Does it deal with consanguinity?  
Does it perform well going "up" the tree (which is of course branched 
at every level)?


From here https://en.wikipedia.org/wiki/Phylogenetic_tree I assume 
that phylogenetic trees are normal
trees, and I see no reason why not be modeled with the genealogical 
approach described. The earliest paper

I based my work on was :
https://www.google.com/url?sa=t=j==s=web=2=0CCUQFjABahUKEwiR6auUlvnIAhXGvhQKHVyDA-s=https%3A%2F%2Fdownload.samba.org%2Fpub%2Funpacked%2Fldb%2Fldb_sqlite3%2Ftrees.ps=AFQjCNEktJsibP435MBki5cdGmO_CzKmwg=I9yC_tpyeWrEueDJTXbyAA=bv.106674449,d.d24=rja

Finding the root is O(1). Going "up" the tree or finding common 
ancestry is reduced to the problem
of finding overlap/intersections/contains/contained between postgresql 
arrays.


The indexes, functions and operators provided by contrib/intarray were 
a basic element for the success of this

approach.

Going "up" a genealogy to me means getting two parents, four 
grandparents, 8 great grandparents etc.  On a good day, at least when 
there are no loops.  This isn't, to my understanding, how phylogeny 
works (but my genetics degree was thirty year ago) so perhaps I'm still 
confused by the titles used.  And certainly not to say that your 
approach isn't what the OP really needs!





Re: [GENERAL] Recursive Arrays 101

2015-11-04 Thread Rob Sargent

On 11/04/2015 03:03 AM, Achilleas Mantzios wrote:
Sorry for being kind of late to the party (I was in 2015.PgConf.EU 
!!), and not having read
most of the replies, what we have been successfully doing for this 
problem for our app

is do it this way :
parents int[] -- where parents stores the path from the node to the 
root of the tree

and then have those indexes :
btree (first(parents))
btree (level(parents)) -- length
btree (last(parents))
gin (parents gin__int_ops) -- the most important

This has been described as "genealogical tree" approach, and works 
very good, IMHO much better

than nested sets.

Is there a more complete description of this approach available?  By the 
title one might assume it could be applied to populations as opposed to 
phylogeny (the OP's use case).  Does it deal with consanguinity?  Does 
it perform well going "up" the tree (which is of course branched at 
every level)?
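
For anyone trying to picture it, my reading of the layout being described, as a 
sketch (using stock array operators and expression indexes in place of the 
first/level/last helpers mentioned above):

CREATE EXTENSION IF NOT EXISTS intarray;

CREATE TABLE node (
    id      integer PRIMARY KEY,
    parents integer[] NOT NULL          -- path from this node back to the root
);

CREATE INDEX node_root_idx  ON node ((parents[1]));
CREATE INDEX node_depth_idx ON node ((array_length(parents, 1)));
CREATE INDEX node_gin_idx   ON node USING gin (parents gin__int_ops);

-- ancestry questions become array containment/overlap tests, e.g.
SELECT * FROM node WHERE parents @> ARRAY[42];   -- everything under node 42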


Re: [GENERAL] Taking lot time

2015-11-03 Thread Rob Sargent

On 11/03/2015 07:42 AM, Ramesh T wrote:

I have a Query it taking a lot of time to fetch results
so,explain query gave

"Hash Join  (cost=55078.00..202405.95 rows=728275 width=418)"
"  Hash Cond: (itd.tran_id = iad._adj__id)"
"  ->  Seq Scan on inv_detail itd  (cost=0.00..40784.18 rows=731029 
width=95)"

"Filter: (event_type = ANY ('{21,22,3,5}'::integer[]))"
"  ->  Hash  (cost=20590.78..20590.78 rows=610978 width=331)"
"->  Seq Scan on inv_adj  iad  (cost=0.00..20590.78 
rows=610978 width=331)"


Can you Please let me know wt happen in query..?wt should i do..
I could be way off base (again) but I think the first thing you do is 
supply the query (sql).


Re: [GENERAL] Hierarchical Query Question (PHP)

2015-10-30 Thread Rob Sargent

On 10/30/2015 04:10 PM, David Blomstrom wrote:
Just so I understand what's going on, I can create a lookup table by 
pasting this code...


create table taxon (
  taxonid serial,
  descr text
);
create table gz_life_mammals (
  id serial,
  taxonid integer, -- use the lookup table
  parentid integer -- use the lookup table
);

...into pgAdmin III, right? (I can't use the shell/terminal at the 
moment, and it will be at least a few weeks before I can get it 
fixed.) And this script will create TWO tables - gz_life_mammals and a 
matching "lookup table"?


If I'm following correctly it will create two tables; the first is the 
definition of the lookup table.  You should add an index on its taxonid (if 
'serial' doesn't imply that - it doesn't).


In the second table (gz_life_mammals) have taxonid and parentid as 
"foreign keys" to taxon.taxonid.


The gui side of pgAdmin3 should help out here, but it's been quite a 
while since I was there.






Re: [GENERAL] Hierarchical Query Question (PHP)

2015-10-30 Thread Rob Sargent

On 10/30/2015 04:55 PM, David Blomstrom wrote:
The field descr would presumably hold the values I originally had in 
the field Taxon - e.g. the names of various taxons, like 'Mammalia' 
and 'Canis-lupus.' The field id is just a numerical key, and I already 
have the numerical values for parentid.


But what am I supposed to put in the field taxonid?
Hm, you might want to take a step back.  What does the parentid field 
you already have represent?  Is it your mysql version of the data?  Does 
it point to an id of a taxon from that version?  If so you need to 
redefine taxon's taxonid as plain integer, not serial, and load that table 
from your csv dump of mysql.


Otherwise I suspect you will need to redo your parentids.


Re: [GENERAL] Hierarchical Query Question (PHP)

2015-10-30 Thread Rob Sargent

On 10/30/2015 04:55 PM, David Blomstrom wrote:
The field descr would presumably hold the values I originally had in 
the field Taxon - e.g. the names of various taxons, like 'Mammalia' 
and 'Canis-lupus.' The field id is just a numerical key, and I already 
have the numerical values for parentid.


But what am I supposed to put in the field taxonid?
I'm going to shut up, because the OP might be better served by a single 
voice and others on this thread are surely more capable.




Re: [GENERAL] Hierarchical Query Question (PHP)

2015-10-30 Thread Rob Sargent

On 10/30/2015 04:38 PM, David Blomstrom wrote:

Ah, yes - "Execute SQL." It created the table this time. Awesome.

One other question - when I close the SQL window, it asks me if I want 
to save the file. Is there any special reason for saving it? It looks 
like it simply saved a copy of the query I executed.


On Fri, Oct 30, 2015 at 3:36 PM, Adrian Klaver 
> wrote:


On 10/30/2015 03:29 PM, David Blomstrom wrote:

Thanks for the tips.

In pgAdmin III, I can create a table step-by-step by choosing
Edit >
Object > New Table

But is there a pace for me to past in a block of code that
creates the
table with just one process? If I click on the SQL icon, a SQL
window
opens up, and I can past the code into SQL Editor, but I don't
know how
to "activate" it; pressing ENTER does nothing.


http://pgadmin.org/docs/1.20/query.html

The green arrow, menu Query --> Execute or F5, take your pick.



I'm following a pgAdmin tutorial, but this page appears to
focus on the
terminal instead...

http://pgadmin.org/docs/1.4/pg/tutorial-table.html



-- 
Adrian Klaver

adrian.kla...@aklaver.com 




--
David Blomstrom
Writer & Web Designer (Mac, M$ & Linux)
www.geobop.org 
This goes back to the issue of "scripting".  Must of us would rather 
save the sql in a file and have it to re-run, edit, re-run as needed.  
Then add it version control.  But that takes a working terminal.




Re: [GENERAL] Domain check constraint not honored?

2015-10-29 Thread Rob Sargent

On 10/29/2015 12:29 PM, Eric Schwarzenbach wrote:


Thank you! (Slapping head)
Your regexp seems to do the trick.

On 10/29/2015 01:49 PM, Rob Sargent wrote:

On 10/29/2015 11:41 AM, Eric Schwarzenbach wrote:
I have created a custom type as a domain based on text, which adds a 
check constraint using a regexp to limit it to containing digits and 
'.'. However I am finding I can add values with other characters to 
a column of this type. Is this to be expected for some reason?


Or alternately, did I define the constraint wrong somehow? It is 
defined thus:


CREATE DOMAIN hierpath AS text
CHECK(
   VALUE ~ '[0-9.]+'
);

Checking the docs I'm pretty sure the '.' doesn't need escaping but 
I also tried a test leaving it out ( '[0-9]+') and the result is the 
same. It lets me store letters in a column defined to be of this type.


The version() function tells me
"PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc 
(Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit"


Thanks,
Eric


I think you regexp is too weak.  So long as the value has a digit or 
period, it's good.

'^[0-9.]+$' might work



Cool.

Note that this is a bottom-post forum and unfortunately so imho. Please 
don't top post.


Also thought I should mention that there is an ip address type if that's 
what you're trying to accomplish.


Re: [GENERAL] Domain check constraint not honored?

2015-10-29 Thread Rob Sargent

On 10/29/2015 04:29 PM, Eric Schwarzenbach wrote:

On 10/29/2015 03:44 PM, Alvaro Herrera wrote:
segment needing to be filled with zeros to a fixed length.) (Also 
FWIW, the latest version of this regexp is now '^([0-9]+.)*[0-9]+$')




Cheers,

Eric



So it can start with a dot, but not end with one?




Re: [GENERAL] Domain check constraint not honored?

2015-10-29 Thread Rob Sargent

On 10/29/2015 11:41 AM, Eric Schwarzenbach wrote:
I have created a custom type as a domain based on text, which adds a 
check constraint using a regexp to limit it to containing digits and 
'.'. However I am finding I can add values with other characters to a 
column of this type. Is this to be expected for some reason?


Or alternately, did I define the constraint wrong somehow? It is 
defined thus:


CREATE DOMAIN hierpath AS text
CHECK(
   VALUE ~ '[0-9.]+'
);

Checking the docs I'm pretty sure the '.' doesn't need escaping but I 
also tried a test leaving it out ( '[0-9]+') and the result is the 
same. It lets me store letters in a column defined to be of this type.


The version() function tells me
"PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 
4.8.2-19ubuntu1) 4.8.2, 64-bit"


Thanks,
Eric


I think your regexp is too weak.  So long as the value has a digit or 
period somewhere, it's good.

'^[0-9.]+$' might work
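
A quick way to convince yourself of the difference, for what it's worth:

SELECT 'abc123' ~ '[0-9.]+'   AS unanchored,  -- true: some digit/period appears
       'abc123' ~ '^[0-9.]+$' AS anchored;    -- false: whole value must match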

