Re: strange slow query performance

2019-01-17 Thread Ben Snaidero
On Thu, Jan 17, 2019 at 4:13 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Thu, Jan 17, 2019 at 9:19 AM Ben Snaidero 
> wrote:
> > Any ideas as to why this is happening?
>
> Not really, I would expect roughly double execution time, not an
> exponential increase.  Still not experienced enough to diagnose with
> what has been provided but I will suggest you provide the version that
> you are running these queries against and confirm that the results are
> consistent across multiple runs to discount the possibility that
> locking or other transient work is involved.
>
> Curious if increasing work_mem helps but it's mostly a trial-and-error
> thing for me (I wouldn't expect it to)...
>
> David J.
>

The query was tested multiple times with similar results.  I also tried
increasing work_mem, with the same results.  Memory usage did not increase
or even come close to the 10MB setting.  The query does seem to occupy one
full CPU for the duration of the run, though.  The query is running on
Windows with Postgres 9.6.7.


Re: Refining query statement

2019-01-17 Thread Rich Shepard

On Thu, 17 Jan 2019, David G. Johnston wrote:


I would advise changing Contacts to "Activities" as the former can readily
be interpreted (and is in the wild) as both "an instance of contacting a
person" and "the person at the organization who is being contacted" (i.e.,
your People class).


David,

  That's why I originally named the Contacts table Activities and the People
table Contacts. :-) I'll change it back again since there's now a People
table.

Many thanks!

Rich




Re: Refining query statement

2019-01-17 Thread David G. Johnston
On Thu, Jan 17, 2019 at 3:44 PM Rich Shepard  wrote:
> FROM People AS p
>   JOIN Organizations AS o ON p.org_id = o.org_id
>   JOIN Contacts AS c ON c.person_id = p.person_id

I would advise changing Contacts to "Activities" as the former can
readily be interpreted (and is in the wild) as both "an instance of
contacting a person" and "the person at the organization who is being
contacted" (i.e., your People class).

David J.



Re: Refining query statement

2019-01-17 Thread Rich Shepard

On Thu, 17 Jan 2019, Adrian Klaver wrote:


I would think the active flag would be on People or Organizations. Then you
can eliminate them from the query results before you ever get to the contact
history.


Adrian,

Excellent point. I don't know why I put active in the contacts table as it
does make more sense to put it in the people table (as some clients have
multiple sites, each with its own contact person).


Also not sure how you got the above result when the query showed:
WHERE c.active = TRUE AND c.next_contact <= 'today'::date
Or did the history results come from an undisclosed query?


I don't recall which query I used for the MWE. I'll move the active column
to the new table, as well as those values, and keep working on learning how
to formulate complex, multi-table queries.

Also, moving the active column to People, where there is only one row per
person (rather than multiple contact rows per person), removes the need to
find only the most recent row on the 'many' side of the one-to-many
relationship.

Thanks for the advice,

Rich




Re: Refining query statement

2019-01-17 Thread Adrian Klaver

On 1/17/19 2:44 PM, Rich Shepard wrote:

On Thu, 17 Jan 2019, Adrian Klaver wrote:


Got to thinking more and realized the answer depends on what you want the
query to produce. Can you let us know what it is you are trying to pull out
with the query?


Adrian, et al.,

Took your advice and re-thought what I need the query to return. This
allowed me to realize that I don't need a separate contact history query as
I can incorporate it in a single query. The goal and pseudocode are:

Purpose: List all contact information and contact history for active people
where next contact date is less than today.

For each person select person_id, lname, fname, and direct_phone from
People.

For each person get the org_name from Organizations.

For each person get contact history in date order and next contact date from
Contacts where active = True.

Order by next contact dates in ascending order.

Query code:

SELECT p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, c.cont_date,
    c.cont_type, c.notes, c.next_contact, c.comment
FROM People AS p
  JOIN Organizations AS o ON p.org_id = o.org_id
  JOIN Contacts AS c ON c.person_id = p.person_id
WHERE c.active = TRUE AND c.next_contact <= 'today'::date
GROUP BY o.org_name, p.person_id, c.person_id, c.cont_date, c.cont_type,
   c.next_contact
ORDER BY p.person_id, c.next_contact

The results are not correctly grouped or ordered; I'll work on fixing these
issues.

The other issue that needs fixing is identifying the most recent 'active'
value in the Contacts table for each person_id and including that person in
the results only when active = 't'. Here's a MWE of a redacted set of rows
that should not have been returned by the query:

 person_id | cont_date  | cont_type  | notes                              | active | next_contact | comment
-----------+------------+------------+------------------------------------+--------+--------------+---------
        36 | 2018-12-12 | Conference | Notes on this conversation.        | t      | 2018-12-17   |
        36 | 2018-12-17 | Phone      | Left message asking for a meeting. | t      | 2019-01-03   |
        36 | 2019-01-03 | Phone      | Left another message.              | t      | 2019-01-07   |
        36 | 2019-01-07 | Phone      | Going into a meeting.              | t      | 2019-01-14   |
        36 | 2019-01-15 | Phone      | Will call when/if.                 | f      | infinity     |
(5 rows)

Because the most recent row's active value is 'f', these rows should not be
included as there is no next contact date. I'll keep reading to look for the
answer, and pointers will be helpful.


I would think the active flag would be on People or Organizations. Then you 
can eliminate them from the query results before you ever get to the 
contact history.


Also not sure how you got the above result when the query showed:

WHERE c.active = TRUE AND c.next_contact <= 'today'::date

Or did the history results come from an undisclosed query?



Regards,

Rich





--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Refining query statement

2019-01-17 Thread Rich Shepard

On Thu, 17 Jan 2019, Adrian Klaver wrote:


Got to thinking more and realized the answer depends on what you want the
query to produce. Can you let us know what it is you are trying to pull out
with the query?


Adrian, et al.,

Took your advice and re-thought what I need the query to return. This
allowed me to realize that I don't need a separate contact history query as
I can incorporate it in a single query. The goal and pseudocode are:

Purpose: List all contact information and contact history for active people
where next contact date is less than today.

For each person select person_id, lname, fname, and direct_phone from People.

For each person get the org_name from Organizations.

For each person get contact history in date order and next contact date from
Contacts where active = True.

Order by next contact dates in ascending order.

Query code:

SELECT p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, c.cont_date,
   c.cont_type, c.notes, c.next_contact, c.comment
FROM People AS p
 JOIN Organizations AS o ON p.org_id = o.org_id
 JOIN Contacts AS c ON c.person_id = p.person_id
WHERE c.active = TRUE AND c.next_contact <= 'today'::date
GROUP BY o.org_name, p.person_id, c.person_id, c.cont_date, c.cont_type,
  c.next_contact
ORDER BY p.person_id, c.next_contact

The results are not correctly grouped or ordered; I'll work on fixing these
issues.

The other issue that needs fixing is identifying the most recent 'active'
value in the Contacts table for each person_id and including that person in
the results only when active = 't'. Here's a MWE of a redacted set of rows
that should not have been returned by the query:

 person_id | cont_date  | cont_type  | notes                              | active | next_contact | comment
-----------+------------+------------+------------------------------------+--------+--------------+---------
        36 | 2018-12-12 | Conference | Notes on this conversation.        | t      | 2018-12-17   |
        36 | 2018-12-17 | Phone      | Left message asking for a meeting. | t      | 2019-01-03   |
        36 | 2019-01-03 | Phone      | Left another message.              | t      | 2019-01-07   |
        36 | 2019-01-07 | Phone      | Going into a meeting.              | t      | 2019-01-14   |
        36 | 2019-01-15 | Phone      | Will call when/if.                 | f      | infinity     |
(5 rows)


Because the most recent row's active value is 'f', these rows should not be
included as there is no next contact date. I'll keep reading to look for the
answer, and pointers will be helpful.
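
One possible shape for this, as a sketch: it assumes the most recent Contacts
row for a person is the one with the greatest cont_date, and it reuses the
table and column names from the query above (everything else is illustrative):

SELECT p.person_id, p.lname, p.fname, p.direct_phone, o.org_name,
       c.cont_date, c.cont_type, c.notes, c.next_contact, c.comment
FROM People AS p
  JOIN Organizations AS o ON p.org_id = o.org_id
  JOIN Contacts AS c ON c.person_id = p.person_id
  JOIN LATERAL (                      -- latest Contacts row for this person
        SELECT c2.active, c2.next_contact
        FROM Contacts AS c2
        WHERE c2.person_id = p.person_id
        ORDER BY c2.cont_date DESC
        LIMIT 1
       ) AS latest ON true
WHERE latest.active
  AND latest.next_contact <= current_date
ORDER BY p.person_id, c.cont_date;

This keeps a person's full contact history only when that person's latest row
is still active and has a next contact date of today or earlier.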

Regards,

Rich



Re: strange slow query performance

2019-01-17 Thread David G. Johnston
On Thu, Jan 17, 2019 at 9:19 AM Ben Snaidero  wrote:
> Any ideas as to why this is happening?

Not really, I would expect roughly double execution time, not an
exponential increase.  Still not experienced enough to diagnose with
what has been provided but I will suggest you provide the version that
you are running these queries against and confirm that the results are
consistent across multiple runs to discount the possibility that
locking or other transient work is involved.

Curious if increasing work_mem helps but it's mostly a trial-and-error
thing for me (I wouldn't expect it to)...

David J.



Re: Varlena with recursive data structures?

2019-01-17 Thread Michel Pelletier
Hi Karl,

I'm going down this road myself.  In addition to the files Tom Lane pointed
out there is also some helpful documentation here:

https://www.postgresql.org/docs/current/storage-toast.html#STORAGE-TOAST-INMEMORY

On Wed, Jan 16, 2019 at 2:09 PM Sam Patterson  wrote:

> Hi all,
>
> I've recently started developing an extension for Postgres for which I'll
> need to create a new variable-length base type. The type will require a
> tree-like structure in order to parse sufficiently, which of course
> probably means having some sort of recursive data structure, like a struct
> that has members which are pointers to itself for child nodes. After doing
> some research, specifically looking at how other variable-length data types
> store their data, it seems almost all of them store the data in a binary
> representation, using bit masks and offsets etc in order to store/access
> the data whilst having an in-memory representation that's used to
> manipulate the data.
>
> I presume the reason for using this approach is that all the data in a
> varlena type has to be contiguous, and the moment you start using pointers
> this is no longer possible. So my question is, given a structure that looks
> something like this,
>
> typedef struct Node
> {
> char *data;
> struct Node *left;
> struct Node *right;
> } Node;
>
> am I right in saying that I wouldn't be able to store that representation
> on-disk, but instead I'd have to transform it into some binary
> representation and back again when writing/reading respectively, are there
> any alternatives?
>
> Regards,
>
> Karl
>
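
To answer the question directly: yes, the pointer-based form cannot be stored
as-is; it has to be flattened into one contiguous chunk. Below is a minimal,
standalone C sketch of that idea (not PostgreSQL-specific: the varlena header,
alignment and TOAST details are left out, and all names are illustrative).
Child pointers are replaced by byte offsets into the same buffer:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* In-memory, pointer-based tree (as in the question). */
typedef struct Node
{
    char        *data;
    struct Node *left;
    struct Node *right;
} Node;

/* Flat, contiguous record: children are byte offsets into the same buffer.
   0 means "no child", which is safe because offset 0 always holds the root
   and the root is never anyone's child.  Padding/alignment is ignored. */
typedef struct FlatNode
{
    int32_t left_off;
    int32_t right_off;
    int32_t data_len;
    char    data[];     /* data_len bytes follow, not NUL-terminated */
} FlatNode;

/* Bytes needed to flatten the whole subtree rooted at n. */
static size_t
flat_size(const Node *n)
{
    if (n == NULL)
        return 0;
    return sizeof(FlatNode) + strlen(n->data)
           + flat_size(n->left) + flat_size(n->right);
}

/* Serialize the subtree rooted at n into base, advancing *cursor;
   returns the byte offset of the record written for n. */
static int32_t
flatten(const Node *n, char *base, size_t *cursor)
{
    if (n == NULL)
        return 0;

    size_t    off = *cursor;
    size_t    len = strlen(n->data);
    FlatNode *fn = (FlatNode *) (base + off);

    *cursor += sizeof(FlatNode) + len;
    fn->data_len = (int32_t) len;
    memcpy(fn->data, n->data, len);
    fn->left_off = flatten(n->left, base, cursor);
    fn->right_off = flatten(n->right, base, cursor);
    return (int32_t) off;
}

int
main(void)
{
    Node   leaf = {"leaf", NULL, NULL};
    Node   root = {"root", &leaf, NULL};
    size_t size = flat_size(&root);
    char  *buf = malloc(size);
    size_t cursor = 0;

    flatten(&root, buf, &cursor);

    /* Reading the flat form back means following offsets, not pointers. */
    FlatNode *r = (FlatNode *) buf;
    FlatNode *l = (FlatNode *) (buf + r->left_off);

    printf("root: %.*s, left child: %.*s (%zu bytes total)\n",
           (int) r->data_len, r->data, (int) l->data_len, l->data, size);
    free(buf);
    return 0;
}

In an actual extension the same buffer would sit behind the varlena header
(palloc'd and sized with SET_VARSIZE), and the type's input/output or
send/receive functions would perform this transform in both directions.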


Re: strange slow query performance

2019-01-17 Thread Laurenz Albe
Ben Snaidero wrote:
> The following query runs as expected.
> 
> explain analyze SELECT MainTable.gid AS MainTable_gid,MainTable.datetime AS 
> MainTable_datetime,MainTable.objectid AS MainTable_objectid,MainTable.type AS 
> MainTable_type FROM MainTable  
>WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM ( 
> nodeobject NE INNER JOIN (   Node N1NE INNER JOIN Node N2NE   ON 
> ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight)))   ON NE.inodeid 
> = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) 
>  AND objectid=3161;
>   
>  QUERY PLAN   
>  
> -
>  Index Scan using ix_MainTable_objectid_datetime on MainTable  
> (cost=3254.91..3264.39 rows=1 width=32) (actual time=33.094..33.094 rows=0 
> loops=1)
>Index Cond: ((objectid = ANY ($3)) AND (objectid = 3161))
>InitPlan 1 (returns $3)
>  ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual 
> time=0.403..26.147 rows=19042 loops=1)
> [...]
>  Planning time: 5.693 ms
>  Execution time: 33.383 ms
> (15 rows)
> 
> But when we add a second condition to the where clause it causes the query 
> performance to drop dramatically.  Values in "any(array" do not make a 
> difference.  In this example they are the same, but even for different values 
> performance is still poor.  Any ideas as to why this is happening?
> 
> explain analyze SELECT MainTable.gid AS MainTable_gid,MainTable.datetime AS 
> MainTable_datetime,MainTable.objectid AS MainTable_objectid,MainTable.type AS 
> MainTable_type FROM MainTable  
>WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM ( 
> nodeobject NE INNER JOIN (   Node N1NE INNER JOIN Node N2NE   ON 
> ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight)))   ON NE.inodeid 
> = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) 
>  AND objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM ( 
> nodeobject NE INNER JOIN (   Node N1NE INNER JOIN Node N2NE   ON 
> ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight)))   ON NE.inodeid 
> = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) AND objectid=3161;
> 
>   
>   QUERY PLAN  
>   
>  
> --
>  Index Scan using ix_MainTable_objectid_datetime on MainTable  
> (cost=6509.66..6534.02 rows=1 width=32) (actual time=16442.004..16442.004 
> rows=0 loops=1)
>Index Cond: ((objectid = ANY ($3)) AND (objectid = ANY ($7)) AND (objectid 
> = 3161))
>InitPlan 1 (returns $3)
>  ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual 
> time=0.438..28.484 rows=19042 loops=1)
> [...]
>InitPlan 2 (returns $7)
>  ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual 
> time=0.056..11.786 rows=19042 loops=1)
> [...]
>  Planning time: 4.860 ms
>  Execution time: 16442.462 ms
> (26 rows)

Not sure what is going on, but why don't you put the condition
"WHERE ne.objectid=3161" into the subselects?  Then there should be at most
one result row, and I can imagine that things will become faster.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
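
For illustration, a sketch of that rewrite applied to the first query (same
tables and values as above, with the nested join syntax unnested for
readability; untested, so only the shape of the idea):

SELECT MainTable.gid AS MainTable_gid, MainTable.datetime AS MainTable_datetime,
       MainTable.objectid AS MainTable_objectid, MainTable.type AS MainTable_type
FROM MainTable
WHERE objectid = ANY(ARRAY(
          SELECT NE.objectid
          FROM nodeobject NE
               INNER JOIN Node N1NE ON NE.inodeid = N1NE.iId
               INNER JOIN Node N2NE ON N1NE.iLeft >= N2NE.iLeft
                                   AND N1NE.iLeft <= N2NE.iRight
          WHERE N2NE.iId = ANY(ARRAY[14438,10814])
            AND NE.objectid = 3161))      -- condition pushed into the subselect
  AND objectid = 3161;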




Need aws_oracle_ext.systimestamp function definition for postgres

2019-01-17 Thread Ahtesham Karajgi
Hi all,

I am working on a migration from Oracle to PostgreSQL using the AWS SCT.

I have converted 823 tables out of 866 to Postgres. However, for a few
tables I am facing challenges in conversion due to the
"aws_oracle_ext.systimestamp" function.

Below is the error for your reference.

ERROR: function aws_oracle_ext.systimestamp() does not exist
Hint: No function matches the given name and argument types. You might need
to add explicit type casts.

I am searching for the function definition for
aws_oracle_ext.systimestamp, similar to the definition below for
"aws_oracle_ext.sysdate".

CREATE OR REPLACE FUNCTION aws_oracle_ext.sysdate()
RETURNS timestamp without time zone AS
$BODY$
DECLARE
    l_var1 interval;
BEGIN
    l_var1 := '0 hour';  /* Please type your value instead of 0 */
    RETURN (clock_timestamp()::TIMESTAMP(0) WITHOUT TIME ZONE) + l_var1;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;


Thank you in advance.
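
(For reference, the actual definition shipped by the SCT extension pack may
differ, but an emulation following the same pattern as the sysdate() function
above could look roughly like this; Oracle's SYSTIMESTAMP returns a timestamp
with time zone, for which clock_timestamp() is the closest match:)

CREATE OR REPLACE FUNCTION aws_oracle_ext.systimestamp()
RETURNS timestamp with time zone AS
$BODY$
BEGIN
    /* Adjust by an interval offset here if needed, as in sysdate() above. */
    RETURN clock_timestamp();
END;
$BODY$ LANGUAGE plpgsql VOLATILE;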


Re: Refining query statement

2019-01-17 Thread Rich Shepard

On Thu, 17 Jan 2019, Adrian Klaver wrote:


It would produce results, so yes it would work. The question is: are they the
results you want?


Adrian,

  To which the answer is no as I just discovered.


The above would return anything with a next_contact less than today. That
could extend backwards to some undetermined point in the past, depending
on how diligent you were about updating next_contact. Before you go much
further I would look over the suggestions posted and then sit down with
pen and paper and figure out what it is you want to track and how you are
going to maintain that tracking. Then draw up an outline of how you will
achieve that in code.


  I have that tracking defined (who is active and needs a contact today or
earlier) and am now working on how to code it so only the last contact for a
person is examined to determine if the active column = true. I'll ponder
that.

Thanks,

Rich




Re: Refining query statement

2019-01-17 Thread Adrian Klaver

On 1/17/19 10:01 AM, Rich Shepard wrote:

On Thu, 17 Jan 2019, Adrian Klaver wrote:


Seems to me a boolean field of name active to denote contacts you need to
keep up with is in order. Then make the next_contact field NOT NULL and
replace the current NULL values with 'infinity':



WHERE COALESCE(next_contact, 'infinity') BETWEEN '01/01/2019'::date AND
'today'::date


Adrian,

   Having added a new column, 'active' with boolean values (and a default of
true), would this WHERE clause work:

WHERE active = TRUE and next_contact <= 'today'::date


It would produce results, so yes it would work.

The question is: are they the results you want?

The above would return anything with a next_contact less than today. 
That could extend backwards to some undetermined point in the past, 
depending on how diligent you were about updating next_contact. Before you 
go much further I would look over the suggestions posted and then sit 
down with pen and paper and figure out what it is you want to track and how 
you are going to maintain that tracking. Then draw up an outline of how 
you will achieve that in code.




?

Regards,

Rich




--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Refining query statement

2019-01-17 Thread Rich Shepard

On Thu, 17 Jan 2019, David G. Johnston wrote:


Off the top of my head (and this is a model I am quite familiar with
even if I'm doing this email at speed):

I'd suggest an actual activity table:


David,

  Adrian's suggestion of a Contacts table column called 'active' having a
boolean data type seems to me to be more compact. Testing active = true as a
condition in the WHERE clause will display only those People with active
contact needs.

Thanks very much,

Rich



Re: Postgres Automated Failover

2019-01-17 Thread AI Rumman
Thanks, I'll check it out.

On Thu, Jan 17, 2019 at 9:06 AM Jehan-Guillaume (ioguix) de Rorthais <
iog...@free.fr> wrote:

> On Thu, 17 Jan 2019 08:32:48 -0500
> AI Rumman  wrote:
>
> > Hi,
> >
> > I am planning to use Postgresql with TimescaleDb extension. I have to
> > design a system similar to AWS RDS which supports automated failover,
> > transparent minor version upgrades etc.
> > In early days, I worked with PgPool II to enable heartbeat between the
> > Postgres servers. Is there any new tool to achieve the same by which I
> can
> > design a transparent failover system for Postgres?
>
> There are multiple HA solutions, but none offer "transparent failover".
> They are just automated failover. A rolled-back transaction caused by a
> failover will never be transparent from the application's point of view.
>
> Look at Patroni or PAF, depending on what you want to achieve.
>


Re: Refining query statement

2019-01-17 Thread Rich Shepard

On Thu, 17 Jan 2019, Adrian Klaver wrote:


Seems to me a boolean field of name active to denote contacts you need to
keep up with is in order. Then make the next_contact field NOT NULL and
replace the current NULL values with 'infinity':



WHERE COALESCE(next_contact, 'infinity') BETWEEN '01/01/2019'::date AND
'today'::date


Adrian,

  Having added a new column, 'active' with boolean values (and a default of
true), would this WHERE clause work:

WHERE active = TRUE and next_contact <= 'today'::date

?

Regards,

Rich



Re: Refining query statement

2019-01-17 Thread Rich Shepard

On Thu, 17 Jan 2019, Adrian Klaver wrote:


To be clear the next-activity date = next_contact in the database, correct?


Adrian,

  Yes. I've renamed the Activities table to Contacts and the Contacts table
to People.


NULL basically means unknown, so having it stand for something is a bit of
a stretch. Seems to me a boolean field of name active to denote contacts
you need to keep up with is in order. Then make the next_contact field NOT
NULL and replace the current NULL values with 'infinity':


  This makes good sense.


As to current situation:

1) Use my previous suggestion.


  I did so, but will modify it as you suggest:
2) WHERE COALESCE(next_contact, 'infinity') BETWEEN '01/01/2019'::date AND 
'today'::date


Thanks again,

Rich



Re: Refining query statement

2019-01-17 Thread David G. Johnston
On Thu, Jan 17, 2019 at 10:07 AM Rich Shepard  wrote:
>The direct answer is that a completed activity has a row with either a
> future next-activity date or a null (which is the case when the status of
> that organization or contact is 'no further contact'.)

Off the top of my head (and this is a model I am quite familiar with
even if I'm doing this email at speed):

I'd suggest an actual activity table:

activity (
created_on date not null,
due_on date not null,
performed_on date null,
contact_id bigint not null references contact (contact_id)
);

contact (
contact_id bigserial primary key,
told_me_to_go_pound_sand boolean default false
);

Now, an exception report can be made for every contact where
"told_me_to_go_pound_sand" is false and there is no activity record
where performed_on is null (and, conversely, where pound sand is true
and an open record with performed_on null still exists).

The report you want is basically everything on activity where
performed_on is null and due_on is today or earlier.

A partial unique index can be made on activity (contact_id) where
performed_on is null, to ensure that at most one open activity
exists for each contact.

David J.
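
As a runnable sketch of that model (the surrogate key on activity, the
defaults, and the index name are assumptions beyond the outline above):

CREATE TABLE contact (
    contact_id               bigserial PRIMARY KEY,
    told_me_to_go_pound_sand boolean NOT NULL DEFAULT false
);

CREATE TABLE activity (
    activity_id  bigserial PRIMARY KEY,
    contact_id   bigint NOT NULL REFERENCES contact (contact_id),
    created_on   date NOT NULL DEFAULT current_date,
    due_on       date NOT NULL,
    performed_on date
);

-- At most one open (not yet performed) activity per contact.
CREATE UNIQUE INDEX activity_one_open_per_contact
    ON activity (contact_id)
    WHERE performed_on IS NULL;

-- "Contacts to make today": open activities due today or earlier.
SELECT a.contact_id, a.due_on
  FROM activity AS a
 WHERE a.performed_on IS NULL
   AND a.due_on <= current_date
 ORDER BY a.due_on;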



Re: Refining query statement

2019-01-17 Thread Adrian Klaver

On 1/17/19 9:07 AM, Rich Shepard wrote:

On Thu, 17 Jan 2019, David G. Johnston wrote:


Yes...though now it just sounds like a flawed data model.


David,

   This is what I thought.

How stuck are you in that regard? Those "future" contacts should have their
own records and not be derived via an optional field on an existing
record.


   My goal is to make a functioning business tracking application for my
consulting services. Almost all my prior postgres databases hold
environmental data for statistical and spatio-temporal analyses so writing a
business application is a new experience for me and I want to get it
correct.


Put differently, how do you know which activities are completed and
which are not?


   The direct answer is that a completed activity has a row with either a
future next-activity date or a null (which is the case when the status of
that organization or contact is 'no further contact'.)


To be clear the next-activity date = next_contact in the database, correct?

NULL basically means unknown, so having it stand for something is a bit 
of a stretch. Seems to me a boolean field of name active to denote 
contacts you need to keep up with is in order. Then make the 
next_contact field NOT NULL and replace the current NULL values with 
'infinity':


https://www.postgresql.org/docs/10/datatype-datetime.html

8.5.1.4. Special Values


As to current situation:

1) Use my previous suggestion.

or

2) WHERE COALESCE(next_contact, 'infinity') BETWEEN '01/01/2019'::date 
AND 'today'::date


https://www.postgresql.org/docs/10/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL



   I should rename the Contacts table as People and the Activities table as
Contacts. The original names came from a sales management system I used as a
design guide, but they're probably confusing to others as well as to me. :-)


   I can provide my current schema (eight tables) to the list (perhaps as an
attachment), an individual, or put it on a cloud site and pass the URL.

Thanks,

Rich




--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Refining query statement

2019-01-17 Thread Rich Shepard

On Thu, 17 Jan 2019, David G. Johnston wrote:


Yes...though now it just sounds like a flawed data model.


David,

  This is what I thought.


How stuck are you in that regard? Those "future" contacts should have their
own records and not be derived via an optional field on an existing
record.


  My goal is to make a functioning business tracking application for my
consulting services. Almost all my prior postgres databases hold
environmental data for statistical and spatio-temporal analyses so writing a
business application is a new experience for me and I want to get it
correct.


Put differently, how do you know which activities are completed and
which are not?


  The direct answer is that a completed activity has a row with either a
future next-activity date or a null (which is the case when the status of
that organization or contact is 'no further contact'.)

  I should rename the Contacts table as People and the Activities table as
Contacts. The original names came from a sales management system I used as a
design guide, but they're probably confusing to others as well as to me. :-)

  I can provide my current schema (eight tables) to the list (perhaps as an
attachment), an individual, or put it on a cloud site and pass the URL.

Thanks,

Rich



Re: Refining query statement

2019-01-17 Thread David G. Johnston
On Thu, Jan 17, 2019 at 9:47 AM Rich Shepard  wrote:
> What I want is a list of contacts to make today. This includes ones that
> should have been made earlier but weren't and excludes earlier contacts that
> have no scheduled next contact (therefore, the nulls.).
>
> Does this clarify what I'm asking of the data?

Yes...though now it just sounds like a flawed data model.  How stuck
are you in that regard?  Those "future" contacts should have their own
records and not be derived via an optional field on an existing
record.

Put differently, how do you know which activities are completed and
which are not?

Dave



Re: Refining query statement

2019-01-17 Thread Rich Shepard

On Thu, 17 Jan 2019, Adrian Klaver wrote:


Got to thinking more and realized the answer depends on what you want the
query to produce. Can you let us know what it is you are trying to pull out
with the query?


Adrian,

Certainly. Over breakfast I realized the same thing: the existing SELECT
query is not asking for the correct data.

As I replied in response to David's message, what I want is a list of
contacts to make today. This includes ones that should have been made
earlier but weren't and excludes earlier contacts that have no scheduled
next contact (therefore, the nulls.).

Thanks,

Rich




Re: Refining query statement

2019-01-17 Thread Rich Shepard

On Thu, 17 Jan 2019, David G. Johnston wrote:


Why is next_contact allowed to be null?


David,

There are a number of reasons. The prospect might have retired, told me to
pound sand, or put off a decision.


Your concept of "most current row" is strictly based upon next_contact so
if next_contact is null it is impossible to know whether it is the most
current row (consider that two activities could be missing the
next_contact date - which of those is the most current?)


That's quite true. I realize I've been asking the question incorrectly.

What I want is a list of contacts to make today. This includes ones that
should have been made earlier but weren't and excludes earlier contacts that
have no scheduled next contact (therefore, the nulls.).

Does this clarify what I'm asking of the data?

Thanks,

Rich




Re: Refining query statement

2019-01-17 Thread Adrian Klaver

On 1/17/19 8:14 AM, Rich Shepard wrote:

On Tue, 15 Jan 2019, Thomas Kellerer wrote:

   select distinct on (C.contact_id) C.contact_id, C.lname, C.fname,
  C.direct_phone, O.org_name, A.next_contact
   from Contacts as C
 join Organizations as O on C.org_id = O.org_id
 join Activities as A on C.contact_id = A.contact_id
   where A.next_contact <= 'today'
 and A.next_contact > '2018-12-31'
   order by c.contact_id, a.next_contact DESC;


Combining this with Adrian's advice to use BETWEEN I have this statement
that almost works:

SELECT DISTINCT ON (c.contact_id) c.contact_id, c.lname, c.fname,
       c.direct_phone, o.org_name, a.next_contact
FROM Contacts AS c
   JOIN Organizations AS o ON c.org_id = o.org_id
   JOIN Activities AS a ON c.contact_id = a.contact_id
WHERE next_contact BETWEEN '01/01/2019'::date AND 'today'::date
ORDER BY c.contact_id, a.next_contact DESC;


It fails when the most recent next_contact column in Activities is NULL and
an earlier row has a non-NULL date in the specified range.

I tried specifying max(a.next_contact) and added GROUP BY, but the result
set then returned the same o.org_name for every row.

The WHERE clause needs to exclude a contact_id where the most current row in
Activities has NULL for the next_contact column. I've tried a few ideas but
none work so I need to learn the proper syntax, and I don't find that in
Rick van der Lans' or Joe Celko's books I have.


Got to thinking more and realized the answer depends on what you want 
the query to produce. Can you let us know what it is you are trying to pull
out with the query?




Looking forward to learning,

Rich





--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Refining query statement

2019-01-17 Thread David G. Johnston
On Thu, Jan 17, 2019 at 9:14 AM Rich Shepard  wrote:
> ORDER BY c.contact_id, a.next_contact DESC;

> The WHERE clause needs to exclude a contact_id where the most current row in
> Activities has NULL for the next_contact column.

Why is next_contact allowed to be null?

Your concept of "most current row" is strictly based upon next_contact
so if next_contact is null it is impossible to know whether it is the
most current row (consider that two activities could be missing the
next_contact date - which of those is the most current?)

David J.



strange slow query performance

2019-01-17 Thread Ben Snaidero
The following query runs as expected.

explain analyze SELECT MainTable.gid AS MainTable_gid,MainTable.datetime AS
MainTable_datetime,MainTable.objectid AS MainTable_objectid,MainTable.type
AS MainTable_type FROM MainTable
   WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM (
nodeobject NE INNER JOIN (   Node N1NE INNER JOIN Node N2NE   ON
((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight)))   ON
NE.inodeid = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814])))
 AND objectid=3161;

   QUERY PLAN

-
 Index Scan using ix_MainTable_objectid_datetime on MainTable
(cost=3254.91..3264.39 rows=1 width=32) (actual time=33.094..33.094 rows=0
loops=1)
   Index Cond: ((objectid = ANY ($3)) AND (objectid = 3161))
   InitPlan 1 (returns $3)
 ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual
time=0.403..26.147 rows=19042 loops=1)
   ->  Nested Loop  (cost=0.81..1784.15 rows=618 width=4) (actual
time=0.249..10.485 rows=2082 loops=1)
 ->  Index Scan using pk_node on node n2ne
(cost=0.41..16.80 rows=2 width=8) (actual time=0.103..0.233 rows=2 loops=1)
   Index Cond: (iid = ANY ('{14438,10814}'::integer[]))
 ->  Index Only Scan using ix_node on node n1ne
(cost=0.41..880.59 rows=309 width=8) (actual time=0.074..5.022 rows=1041
loops=2)
   Index Cond: ((ileft >= n2ne.ileft) AND (ileft <=
n2ne.iright))
   Heap Fetches: 30
   ->  Index Only Scan using ix_nodeobject_inodeid_objectid on
nodeobject ne  (cost=0.29..1.81 rows=57 width=8) (actual time=0.002..0.007
rows=9 loops=2082)
 Index Cond: (inodeid = n1ne.iid)
 Heap Fetches: 13973
 Planning time: 5.693 ms
 Execution time: 33.383 ms
(15 rows)

But when we add a second condition to the where clause it causes the query
performance to drop dramatically.  Values in "any(array" do not make a
difference.  In this example they are the same, but even for different
values performance is still poor.  Any ideas as to why this is
happening?

explain analyze SELECT MainTable.gid AS MainTable_gid,MainTable.datetime AS
MainTable_datetime,MainTable.objectid AS MainTable_objectid,MainTable.type
AS MainTable_type FROM MainTable
   WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM (
nodeobject NE INNER JOIN (   Node N1NE INNER JOIN Node N2NE   ON
((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight)))   ON
NE.inodeid = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814])))
 AND objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM (
nodeobject NE INNER JOIN (   Node N1NE INNER JOIN Node N2NE   ON
((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight)))   ON
NE.inodeid = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) AND
objectid=3161;


QUERY PLAN


--
-
 Index Scan using ix_MainTable_objectid_datetime on MainTable
(cost=6509.66..6534.02 rows=1 width=32) (actual time=16442.004..16442.004
rows=0 loops=1)
   Index Cond: ((objectid = ANY ($3)) AND (objectid = ANY ($7)) AND
(objectid = 3161))
   InitPlan 1 (returns $3)
 ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual
time=0.438..28.484 rows=19042 loops=1)
   ->  Nested Loop  (cost=0.81..1784.15 rows=618 width=4) (actual
time=0.172..7.141 rows=2082 loops=1)
 ->  Index Scan using pk_node on node n2ne
(cost=0.41..16.80 rows=2 width=8) (actual time=0.046..0.111 rows=2 loops=1)
   Index Cond: (iid = ANY ('{14438,10814}'::integer[]))
 ->  Index Only Scan using ix_node on node n1ne
(cost=0.41..880.59 rows=309 width=8) (actual time=0.073..3.438 rows=1041
loops=2)
   Index Cond: ((ileft >= n2ne.ileft) AND (ileft <=
n2ne.iright))
   Heap Fetches: 30
   ->  Index Only Scan using ix_nodeobject_inodeid_objectid on
nodeobject ne  (cost=0.29..1.81 rows=57 width=8) (actual time=0.003..0.010
rows=9 loops=2082)
 Index Cond: (inodeid = n1ne.iid)
 Heap Fetches: 13973
   InitPlan 2 (returns $7)
 ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual
time=0.056..11.786 rows=19042 loops=1)
   ->  Nested Loop  (cost=0.81..1784.15 rows=618 width=4) (actual
time=0.034..1.343 rows=2082 loops=1)
 ->  Index Scan using pk_node on node n2ne_1
(cost=0.41..16.80 rows=2 width=8) (actual time=0.007..0.016 rows=2 loops=1)
   Index Cond: (iid = ANY ('{14438,10814}'::integer[]))
 ->  Index Only Scan using ix_node on node n1ne_1
(cost=0.41..880.59 rows=309 width=8) (actual time=0.012..0.581 

Re: pgbouncer

2019-01-17 Thread Nicola Contu
> If I set 3, and I tried to connect from 4 shells, I am still able to
connect.

That would be the point - pgbouncer is sharing the 3 connections it
keeps with PostgreSQL between the 4 client connections made to it.

Mmh, my pool_mode is per session. The 4 sessions were active, not running any
query, but connected to the shell.
So that's the source of my doubt.

On Thu, Jan 17, 2019 at 4:32 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Thu, Jan 17, 2019 at 8:18 AM Nicola Contu 
> wrote:
> >
> > Hello,
> > I am a bit confused about the settings in pgbouncer
> >
> > What's exactly the pool_size?
>
> Roughly, the number of open connections pgbouncer will keep to PostgreSQL.
>
> > If I set 3, and I tried to connect from 4 shells, I am still able to
> connect.
>
> That would be the point - pgbouncer is sharing the 3 connections it
> keeps with PostgreSQL between the 4 client connections made to it.
>
> > Same thing for max_db_connections. I set this to 1 and I am able to
> connect from 2 shells.
>
> Same as pool_size but basically a fail-safe since pools are
> per-user/per-database while this limit is per-database only.
>
> > This is kind of confusing and I'm not really clear after reading the
> > documentation.
>
> For each setting you need to understand whether it limits
> pgbouncer->PostgreSQL or client->pgbouncer
>
> Configurations in [databases] limit the former; [pgbouncer] options
> either provide defaults for the [databases] or limit clients.
>
> > [databases]
> > cmdwela = host=127.0.0.1 port=5432 dbname=cmdwela auth_user=pgbouncer
> pool_size=120
> > cicciopasticcio = host=127.0.0.1 port=5432 dbname=cicciopasticcio
> auth_user=pgbouncer pool_size=2
> >
> > [pgbouncer]
> > listen_port = 6543
> > listen_addr = *
> > auth_type = md5
> > auth_file = /etc/pgbouncer/users.txt
> > auth_query = select uname,phash from user_lookup($1)
> > logfile = /var/log/pgbouncer.log
> > pidfile = /home/postgres/pgbouncer.pid
> > admin_users = admin
> > user=postgres
>
> > max_db_connections=1
> So one open connection max per database/user pool but it is shared -
> i.e., actively executing queries running in parallel are limited to
> this number.
>
> > max_client_conn=5
> > I just want to limit connections from the app etc.
>
> That limit is 5
>
> David J.
>


Re: Refining query statement

2019-01-17 Thread Rich Shepard

On Tue, 15 Jan 2019, Thomas Kellerer wrote:


   select distinct on (C.contact_id) C.contact_id, C.lname, C.fname, 
C.direct_phone, O.org_name, A.next_contact
   from Contacts as C
 join Organizations as O on C.org_id = O.org_id
 join Activities as A on C.contact_id = A.contact_id
   where A.next_contact <= 'today'
 and A.next_contact > '2018-12-31'
   order by c.contact_id, a.next_contact DESC;


Combining this with Adrian's advice to use BETWEEN I have this statement
that almost works:

SELECT DISTINCT ON (c.contact_id) c.contact_id, c.lname, c.fname, 
c.direct_phone, o.org_name, a.next_contact
FROM Contacts AS c
  JOIN Organizations AS o ON c.org_id = o.org_id
  JOIN Activities AS a ON c.contact_id = a.contact_id 
WHERE next_contact BETWEEN '01/01/2019'::date AND 'today'::date 
ORDER BY c.contact_id, a.next_contact DESC;


It fails when the most recent next_contact column in Activities is NULL and
an earlier row has a non-NULL date in the specified range.

I tried specifying max(a.next_contact) and added GROUP BY, but the result
set then returned the same o.org_name for every row.

The WHERE clause needs to exclude a contact_id where the most current row in
Activities has NULL for the next_contact column. I've tried a few ideas but
none work so I need to learn the proper syntax, and I don't find that in
Rick van der Lans' or Joe Celko's books I have.

Looking forward to learning,

Rich



Re: pgbouncer

2019-01-17 Thread Fabio Pardi
David,

Are you sure? The behaviour I experienced is different from what you described.

On 17/01/2019 16:32, David G. Johnston wrote:
> On Thu, Jan 17, 2019 at 8:18 AM Nicola Contu  wrote:
>>
>> Hello,
>> I am a bit confused about the settings in pgbouncer
>>
>> What's exactly the pool_size?
> 
> Roughly, the number of open connections pgbouncer will keep to PostgreSQL.
> 
>> If I set 3, and I tried to connect from 4 shells, I am still able to connect.
> 
> That would be the point - pgbouncer is sharing the 3 connections it
> keeps with PostgreSQL between the 4 client connections made to it.


pgbouncer will spawn connections until it reaches the maximum allowed.
Connections go 1:1 between pgbouncer and the client, and are not 'shared'.

pool_size is the size of the pool that pgbouncer keeps open for you. If not 
set, the default_pool_size is used.


If you check the open connections to your host, you can verify it 

netstat --tcp -n | grep ... 

will show ESTABLISHED connections = pool_size


Client connections will be accepted by pgbouncer until pool_size is reached.

After that, pgbouncer can spawn more connections only if reserve_pool_size
allows it, and only after a wait of reserve_pool_timeout.

When the ceiling is reached (meaning you have opened pool_size +
reserve_pool_size connections), client connections will enter a queue.

Entering the queue does not mean being serviced by the database; it means
waiting your turn. If you connect manually, you will notice an open
connection (the socket is open) but no database shell.
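
For illustration, a minimal pgbouncer.ini fragment showing those knobs
together: a base pool of 3 server connections per database/user pair, 2 extra
reserve connections, and a 5 second wait before the reserve kicks in (the
values are only examples):

[pgbouncer]
default_pool_size = 3
reserve_pool_size = 2
reserve_pool_timeout = 5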


regards,

fabio pardi

> 
>> Same thing for max_db_connections. I set this to 1 and I am able to connect 
>> from 2 shells.
> 
> Same as pool_size but basically a fail-safe since pools are
> per-user/per-database while this limit is per-database only.
> 
>> This is kind of confusing and I'm not really clear after reading the 
>> documentation.
> 
> For each setting you need to understand whether it limits
> pgbouncer->PostgreSQL or client->pgbouncer
> 
> Configurations in [databases] limit the former; [pgbouncer] options
> either provide defaults for the [databases] or limit clients.
> 
>> [databases]
>> cmdwela = host=127.0.0.1 port=5432 dbname=cmdwela auth_user=pgbouncer 
>> pool_size=120
>> cicciopasticcio = host=127.0.0.1 port=5432 dbname=cicciopasticcio 
>> auth_user=pgbouncer pool_size=2
>>
>> [pgbouncer]
>> listen_port = 6543
>> listen_addr = *
>> auth_type = md5
>> auth_file = /etc/pgbouncer/users.txt
>> auth_query = select uname,phash from user_lookup($1)
>> logfile = /var/log/pgbouncer.log
>> pidfile = /home/postgres/pgbouncer.pid
>> admin_users = admin
>> user=postgres
> 
>> max_db_connections=1
> So one open connection max per database/user pool but it is shared -
> i.e., actively executing queries running in parallel are limited to
> this number.
> 
>> max_client_conn=5
>> I just want to limit connections from the app etc.
> 
> That limit is 5
> 
> David J.
> 



Re: Identifying comments [ANSWERED]

2019-01-17 Thread David G. Johnston
On Thu, Jan 17, 2019 at 8:39 AM Rich Shepard  wrote:
>
> On Thu, 17 Jan 2019, Rich Shepard wrote:
>
> > I want only to confirm that I can use /* ... */ for multiline comments in my
> > DDL and DML scripts.
>
> Oops! Section 4.1.5 tells me that I can.
>

You could also just try it...

select 1 /* comment */ AS col_one

executes just fine...
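
A multi-line version works the same way, and per section 4.1.5 these
bracketed comments may even nest:

select 1 /* a comment
            spanning several lines,
            /* with a nested comment inside */
            still inside the outer comment */ AS col_one;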

David J.



Re: Identifying comments [ANSWERED]

2019-01-17 Thread Rich Shepard

On Thu, 17 Jan 2019, Rich Shepard wrote:


I want only to confirm that I can use /* ... */ for multiline comments in my
DDL and DML scripts.


Oops! Section 4.1.5 tells me that I can.

Apologies to all,

Rich



Identifying comments

2019-01-17 Thread Rich Shepard

I know that "--" has been the single-line comment identifier since SQL was
implemented by IBM on mainframes using Hollerith cards. Today, some RDBMSs
also allow the C comment indentifier pair, /* ... */.

The postgres10 manual when searched for 'comments' presents the double
hyphens and, in Section 4.1.3, on the top line of page 39, I read that "--
and /* cannot appear anywhere in an operator name, since they will be taken
as the start of a comment."

I want only to confirm that I can use /* ... */ for multiline comments in my
DDL and DML scripts.

TIA,

Rich




Re: pgbouncer

2019-01-17 Thread David G. Johnston
On Thu, Jan 17, 2019 at 8:18 AM Nicola Contu  wrote:
>
> Hello,
> I am a bit confused about the settings in pgbouncer
>
> What's exactly the pool_size?

Roughly, the number of open connections pgbouncer will keep to PostgreSQL.

> If I set 3, and I tried to connect from 4 shells, I am still able to connect.

That would be the point - pgbouncer is sharing the 3 connections it
keeps with PostgreSQL between the 4 client connections made to it.

> Same thing for max_db_connections. I set this to 1 and I am able to connect 
> from 2 shells.

Same as pool_size but basically a fail-safe since pools are
per-user/per-database while this limit is per-database only.

> This is kind of confusing and I'm not really clear after reading the 
> documentation.

For each setting you need to understand whether it limits
pgbouncer->PostgreSQL or client->pgbouncer

Configurations in [databases] limit the former; [pgbouncer] options
either provide defaults for the [databases] or limit clients.

> [databases]
> cmdwela = host=127.0.0.1 port=5432 dbname=cmdwela auth_user=pgbouncer 
> pool_size=120
> cicciopasticcio = host=127.0.0.1 port=5432 dbname=cicciopasticcio 
> auth_user=pgbouncer pool_size=2
>
> [pgbouncer]
> listen_port = 6543
> listen_addr = *
> auth_type = md5
> auth_file = /etc/pgbouncer/users.txt
> auth_query = select uname,phash from user_lookup($1)
> logfile = /var/log/pgbouncer.log
> pidfile = /home/postgres/pgbouncer.pid
> admin_users = admin
> user=postgres

> max_db_connections=1
So one open connection max per database/user pool but it is shared -
i.e., actively executing queries running in parallel are limited to
this number.

> max_client_conn=5
> I just want to limit connections from the app etc.

That limit is 5

David J.



pgbouncer

2019-01-17 Thread Nicola Contu
Hello,
I am a bit confused about the settings in pgbouncer

What exactly is the pool_size?
If I set it to 3 and try to connect from 4 shells, I am still able to
connect.
Same thing for max_db_connections. I set this to 1 and I am able to connect
from 2 shells.

This is kind of confusing and I'm not really clear after reading the
documentation.

This is my config.

[databases]
cmdwela = host=127.0.0.1 port=5432 dbname=cmdwela auth_user=pgbouncer
pool_size=120
cicciopasticcio = host=127.0.0.1 port=5432
dbname=cicciopasticcio auth_user=pgbouncer pool_size=2

[pgbouncer]
listen_port = 6543
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/users.txt
auth_query = select uname,phash from user_lookup($1)
logfile = /var/log/pgbouncer.log
pidfile = /home/postgres/pgbouncer.pid
admin_users = admin
user=postgres
max_db_connections=1
log_connections=0
log_disconnections=0
max_client_conn=5


I just want to limit connections from the app etc.

Thanks


Re: Barman versus pgBackRest

2019-01-17 Thread Achilleas Mantzios

On 17/1/19 4:12 μ.μ., Jehan-Guillaume (ioguix) de Rorthais wrote:

On Thu, 17 Jan 2019 13:09:18 +0200
Achilleas Mantzios  wrote:


Hello,

One strong point of barman IMHO is transparently converting an incremental
backup to a full backup for retention purposes, so retention specification is
far more liberal than with pgbackrest, and configuring for incremental backup
does not pose any limitations to the schedule of backups. In our environment
our net connection to the remote site (repo) is extremely lousy, (although
within Switzerland if that makes any difference), so with pgbackrest a full
backup of our 1.3TB db, would take about 2 days ,lets say set in cron weekly
on sunday, (Sunday 1:00->Tuesday),  then I would have to take incr backups
from Wednesday->Saturday. And we would have to also take a full backup next
Sunday. With pgbarman we had to set : reuse_backup = link retention_policy =
RECOVERY WINDOW OF 14 DAYS and just perform regular backups. So for barman
every backup can be used as a base for the next backup, which achieves fast
backups and reduced disk space. In pgbackrest one has to be explicit about
the retention of both full backups and diff backups. it would be nice in
pgbackrest to only have incremental backups and let the system do the
necessary management (converting incremental to full) transparently and
asynchronously, e.g.via cron. I have read about the --repo-hardlink option. "
This gives the appearance that each backup is a full backup at the
file-system level " So could we just take a first full backup and then switch
permanently to incr backups?

Funnily enough, I opened an issue a month ago about this feature. If you want
to join the brainstorming, discuss, add some opinions and thoughts, see
https://github.com/pgbackrest/pgbackrest/issues/644

Surely I enjoyed your convo there!





--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt




RE: Postgres Automated Failover

2019-01-17 Thread ROS Didier
Hi
For PostgreSQL Automatic Failover, we are using repmgr too.

Best Regards

Didier ROS
Expertise SGBD
DS IT/IT DMA/Solutions Groupe EDF/Expertise Applicative - SGBD

-Message d'origine-
De : f.pa...@portavita.eu [mailto:f.pa...@portavita.eu] 
Envoyé : jeudi 17 janvier 2019 15:29
À : pgsql-general@lists.postgresql.org
Objet : Re: Postgres Automated Failover

Hi,

In my opinion repmgr is worth a look.

https://repmgr.org/

regards,

fabio pardi


On 17/01/2019 14:32, AI Rumman wrote:
> Hi,
> 
> I am planning to use Postgresql with TimescaleDb extension. I have to design 
> a system similar to AWS RDS which supports automated failover, transparent 
> minor version upgrades etc.
> In early days, I worked with PgPool II to enable heartbeat between the 
> Postgres servers. Is there any new tool to achieve the same by which I can 
> design a transparent failover system for Postgres?
> 
> Also is there any multi-master Postgresql solution? Is Postgres-XC production 
> ready?
> 
> Any suggestions is appreciated. 
> 
> Thanks.






Re: Postgres Automated Failover

2019-01-17 Thread Fabio Pardi
Hi,

In my opinion repmgr is worth a look.

https://repmgr.org/

regards,

fabio pardi


On 17/01/2019 14:32, AI Rumman wrote:
> Hi,
> 
> I am planning to use Postgresql with TimescaleDb extension. I have to design 
> a system similar to AWS RDS which supports automated failover, transparent 
> minor version upgrades etc.
> In early days, I worked with PgPool II to enable heartbeat between the 
> Postgres servers. Is there any new tool to achieve the same by which I can 
> design a transparent failover system for Postgres?
> 
> Also is there any multi-master Postgresql solution? Is Postgres-XC production 
> ready?
> 
> Any suggestions is appreciated. 
> 
> Thanks.



Re: Barman versus pgBackRest

2019-01-17 Thread Jehan-Guillaume (ioguix) de Rorthais
On Thu, 17 Jan 2019 13:09:18 +0200
Achilleas Mantzios  wrote:

> Hello,
> 
> One strong point of barman IMHO is transparently converting an incremental
> backup to a full backup for retention purposes, so retention specification is
> far more liberal than with pgbackrest, and configuring for incremental backup
> does not pose any limitations to the schedule of backups. In our environment
> our net connection to the remote site (repo) is extremely lousy, (although
> within Switzerland if that makes any difference), so with pgbackrest a full
> backup of our 1.3TB db, would take about 2 days ,lets say set in cron weekly
> on sunday, (Sunday 1:00->Tuesday),  then I would have to take incr backups
> from Wednesday->Saturday. And we would have to also take a full backup next
> Sunday. With pgbarman we had to set : reuse_backup = link retention_policy =
> RECOVERY WINDOW OF 14 DAYS and just perform regular backups. So for barman
> every backup can be used as a base for the next backup, which achieves fast
> backups and reduced disk space. In pgbackrest one has to be explicit about
> the retention of both full backups and diff backups. it would be nice in
> pgbackrest to only have incremental backups and let the system do the
> necessary management (converting incremental to full) transparently and
> asynchronously, e.g.via cron. I have read about the --repo-hardlink option. "
> This gives the appearance that each backup is a full backup at the
> file-system level " So could we just take a first full backup and then switch
> permanently to incr backups?

Funnily enough, I opened an issue a month ago about this feature. If you want
to join the brainstorming, discuss, add some opinions and thoughts, see
https://github.com/pgbackrest/pgbackrest/issues/644



Re: Postgres Automated Failover

2019-01-17 Thread Jehan-Guillaume (ioguix) de Rorthais
On Thu, 17 Jan 2019 08:32:48 -0500
AI Rumman  wrote:

> Hi,
> 
> I am planning to use Postgresql with TimescaleDb extension. I have to
> design a system similar to AWS RDS which supports automated failover,
> transparent minor version upgrades etc.
> In early days, I worked with PgPool II to enable heartbeat between the
> Postgres servers. Is there any new tool to achieve the same by which I can
> design a transparent failover system for Postgres?

There are multiple HA solutions, but none offer "transparent failover". They
are just automated failover. A rolled-back transaction caused by a failover
will never be transparent from the application's point of view.

Look at Patroni or PAF, depending on what you want to achieve.



Re: Postgres Automated Failover

2019-01-17 Thread Deepika S Gowda
There is Postgres multi-master replication. Please explore BDR
multi-master.

On Thu, Jan 17, 2019 at 7:03 PM AI Rumman  wrote:

> Hi,
>
> I am planning to use Postgresql with TimescaleDb extension. I have to
> design a system similar to AWS RDS which supports automated failover,
> transparent minor version upgrades etc.
> In early days, I worked with PgPool II to enable heartbeat between the
> Postgres servers. Is there any new tool to achieve the same by which I can
> design a transparent failover system for Postgres?
>
> Also is there any multi-master Postgresql solution? Is Postgres-XC
> production ready?
>
> Any suggestions is appreciated.
>
> Thanks.
>


Postgres Automated Failover

2019-01-17 Thread AI Rumman
Hi,

I am planning to use Postgresql with TimescaleDb extension. I have to
design a system similar to AWS RDS which supports automated failover,
transparent minor version upgrades etc.
In early days, I worked with PgPool II to enable heartbeat between the
Postgres servers. Is there any new tool to achieve the same by which I can
design a transparent failover system for Postgres?

Also is there any multi-master Postgresql solution? Is Postgres-XC
production ready?

Any suggestions is appreciated.

Thanks.


Re: Barman versus pgBackRest

2019-01-17 Thread Achilleas Mantzios

Hello,

One strong point of barman IMHO is transparently converting an incremental backup to a full backup for retention purposes, so retention specification is far more liberal than with pgbackrest, and 
configuring for incremental backup does not pose any limitations to the schedule of backups.
In our environment our net connection to the remote site (repo) is extremely lousy (although within Switzerland, if that makes any difference), so with pgbackrest a full backup of our 1.3TB db would 
take about 2 days. Let's say it is set in cron weekly on Sunday (Sunday 1:00 -> Tuesday); then I would have to take incr backups from Wednesday -> Saturday, and we would also have to take a full backup the next 
Sunday.

With pgbarman we had to set :
reuse_backup = link
retention_policy = RECOVERY WINDOW OF 14 DAYS
and just perform regular backups. So for barman every backup can be used as a 
base for the next backup, which achieves fast backups and reduced disk space.
In pgbackrest one has to be explicit about the retention of both full backups 
and diff backups.
It would be nice in pgbackrest to only have incremental backups and let the 
system do the necessary management (converting incremental to full) 
transparently and asynchronously, e.g. via cron.
I have read about the --repo-hardlink option.
"
This gives the appearance that each backup is a full backup at the file-system 
level
"
So could we just take a first full backup and then switch permanently to incr 
backups?
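
For comparison, the explicit pgBackRest retention settings referred to above
look roughly like this in pgbackrest.conf, keeping the two most recent full
backups and up to six differentials (the numbers are only examples):

[global]
repo1-retention-full=2
repo1-retention-diff=6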

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



Re: Weird behaviour of ROLLUP/GROUPING

2019-01-17 Thread Guillaume Lelarge
On Thu, Jan 17, 2019 at 8:27 AM Andrew Gierth
wrote:

> > "Guillaume" == Guillaume Lelarge  writes:
>
>  >> I will see about fixing this, somehow.
>
>  Guillaume> Thanks a lot.
>
> I've committed a fix (to all supported branches, since this bug actually
> precedes the addition of GROUPING SETS and can be triggered with a
> simple GROUP BY if you try hard enough). The regression test says it
> works now, but it'd be good if you could try it again on REL_11_STABLE
> (at commit e74d8c5085 or later) to check that it fixes your case.
>
>
I checked on REL9_6_STABLE and REL_11_STABLE, and it works great. Thanks a
lot for the quick fix!


-- 
Guillaume.