Re: seq vs index scan in join query

2017-11-29 Thread legrand legrand
Hi,

Could you give us the partition (range values) and index definitions for
the result table?

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: SV: Refreshing materialized views

2017-11-29 Thread Ben Primrose
It may be simpler to just run the query from the materialized view
definition as the user who will refresh the materialized view.

On Tue, Nov 28, 2017 at 10:30 PM, Daevor The Devoted 
wrote:

>
>
> On 28 Nov 2017 5:18 pm, "Tom Lane"  wrote:
>
> Henrik Uggla  writes:
> > The underlying tables are foreign tables. The user has been mapped to a
> foreign user with select permission. I have no problem selecting from the
> foreign tables or the materialized views.
>
> [ shrug... ] WFM; if I can select from the foreign table then I can make
> a materialized view that selects from it, and that refreshes without
> complaint.  Admittedly, getting things set up to select from the foreign
> table is trickier than it sounds: your local user needs SELECT on the
> foreign table plus a mapping to some remote userid, and *on the remote
> server* that remote userid needs SELECT on whatever the foreign table
> is referencing.  I'm guessing you messed up one of these components.
>
> regards, tom lane
>
> Hendrik, perhaps an easy way to check out Tom's suggestion is to create a
> very simple materialized view that selects just from one of the foreign
> tables, then attempt the REFRESH. If that works, then keep adding more
> tables from your original materialized view until you have found the
> problem.
> Basically, reduce the problem to the simplest case, and if that works,
> then keep adding to it until you hit the problem. You may still not know
> why the problem is happening, but you'll at least know where to focus any
> further investigation.
>
> Kind regards,
> Daevor, The Devoted
>
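For reference, Tom's checklist might look like this in SQL (server, table, and role names are all made up):

```sql
-- On the local server: the refreshing role needs SELECT on the foreign
-- table plus a user mapping to some remote login.
GRANT SELECT ON ft_orders TO mv_owner;
CREATE USER MAPPING FOR mv_owner SERVER remote_srv
    OPTIONS (user 'remote_login', password 'secret');

-- On the remote server: the mapped login needs SELECT on whatever the
-- foreign table references.
GRANT SELECT ON orders TO remote_login;
```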



-- 


*Ben Primrose  |  Postgres DBA  |  TraceLink Inc.*
400 Riverpark Dr.
Floor 2, Suite 200
North Reading, MA 01864
o: +1.978.396.6507
e: bprimr...@tracelink.com


Re: vacuumdb fails with error pg_statistic_relid_att_inh_index constraint violation after upgrade to 9.6

2017-11-29 Thread Swapnil Vaze
Hello,
Thanks for the reply.
We deleted those rows, re-ran VACUUM and ANALYZE, and it worked fine.

Thanks,
Swapnil Vaze


SV: SV: Refreshing materialized views

2017-11-29 Thread Henrik Uggla
I finally managed to sort out all needed permissions and mappings.
Thanks for all replies!

cheers
Henrik

From: Ben Primrose
Sent: 29 November 2017 12:42:56
To: Daevor The Devoted
Cc: hendrik.ug...@kristianstad.se; pgsql-general@lists.postgresql.org
Subject: Re: SV: Refreshing materialized views





Re: large numbers of inserts out of memory strategy

2017-11-29 Thread Ted Toth
On Tue, Nov 28, 2017 at 9:59 PM, Tom Lane  wrote:
> Brian Crowell  writes:
>> On Tue, Nov 28, 2017 at 12:38 PM, Tomas Vondra wrote:
>>> So what does the script actually do? Because psql certainly is not
>>> running pl/pgsql procedures on its own. We need to understand why
>>> you're getting OOM in the first place - just inserts alone should not
>>> cause failures like that. Please show us more detailed explanation of
>>> what the load actually does, so that we can try reproducing it.
>
>> Perhaps the script is one giant insert statement?
>
> It's pretty clear from the memory map that the big space consumption
> is inside a single invocation of a plpgsql function:
>
> SPI Proc: 2464408024 total in 279 blocks; 1672 free (1 chunks); 
> 2464406352 used
>   PL/pgSQL function context: 537911352 total in 74 blocks; 2387536 free 
> (4 chunks); 535523816 used
>
> So whatever's going on here, there's more to it than a giant client-issued
> INSERT (or COPY), or for that matter a large number of small ones.  What
> would seem to be required is a many-megabyte-sized plpgsql function body
> or DO block.
>
> Actually, the truly weird thing about that map is that the "PL/pgSQL
> function context" seems to be a child of a "SPI Proc" context, whereas
> it's entirely clear from the code that it ought to be a direct child of
> TopMemoryContext.  I have no idea how this state of affairs came to be,
> and am interested to find out.
>
> regards, tom lane


Yes I did generate 1 large DO block:

DO $$
DECLARE thingid bigint; thingrec bigint; thingdataid bigint;
BEGIN
INSERT INTO thing
(ltn,classification,machine,source,thgrec,flags,serial,type) VALUES
('T007336','THING',0,1025,7336,7,'XXX869977564',1) RETURNING id,thgrec
INTO thingid,thingrec;
INSERT INTO recnum_thing (recnum,thing_id) VALUES (thingrec,thingid);
INSERT INTO thingstatus
(thing_id,nrpts,rmks_cs,force_type_id,ftn_cs,force_code,arr_cs,mask,toi_state,plot_id,signa_cs,lastchange,des_cs,rig_cs,ownship,correlation,maxrpts,rtn_cs,ctc_cs,group_mask,dep_cs)
VALUES 
(thingid,121,'{0,0,0,0}',440,0,23,0,0,0,'{23,-1,3803,3805,-1,-1,0,6}',0,1509459164,0,0,0,0,1000,0,0,0,0);
INSERT INTO thinger
(thing_id,spe_key,cse_unc,lat_spd,cov,dtg,lng,spd,ave_spd,cse,tol,nrpts,lat,alpha,sigma,spd_unc,lng_spd)
VALUES 
(thingid,-1,0.0,-6.58197336634e-08,'{4.27624291532e-09,0.0,3.07802916488e-09,0.0,4.27624291532e-09,0.0,3.07802916488e-09,4.16110417234e-08,0.0,4.16110417234e-08}',1509459163,2.21596980095,0.000226273215958,1.0,0.0,0.1000149,121,0.584555745125,10.0,4.23079740131e-08,0.0,-2.4881907e-10);
INSERT INTO thingdata
(thing_id,category,db_num,xref,org_type,trademark,shortname,fcode,platform,callsign,type,orig_xref,shipclass,home_base,uic,service,di,lngfixed,hull,precision,alert,flag,besufx,name,mmsi,catcode,ntds,imo,pn_num,chxref,threat,sconum,latfixed,db_type,pif,echelon,jtn,quantity,overwrite)
VALUES 
(thingid,'XXX','','','','','004403704','23','','','','','UNEQUATED','','','','',0.0,'','{0,0,0,0,0}','','KS','','UNKNOWN','004403704','','','','','','AFD','',0.0,3,'','',0,0,0)
RETURNING id INTO thingdataid;
INSERT INTO thingnum (thingdata_id,thgnum,state,dtg,cmd) VALUES
(thingdataid,'013086',0,1502970401,'FOO');



END $$;

Should I limit the number of 'thing' inserts within a single DO block,
or wrap each 'thing' insert in its own DO block?
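One possible sketch of the chunking approach (the helper names are made up; render_inserts stands in for the per-'thing' INSERT statements above):

```python
import io

CHUNK_SIZE = 1000  # 'things' per DO block; tune to available memory


def render_inserts(thing):
    # Placeholder for the six related INSERT statements shown above.
    return f"-- INSERTs for thing {thing}\n"


def emit_chunked_sql(things, out):
    """Write one self-contained DO block per CHUNK_SIZE things, so the
    server never has to parse a single hundreds-of-megabytes statement."""
    for start in range(0, len(things), CHUNK_SIZE):
        out.write("DO $$\n"
                  "DECLARE thingid bigint; thingrec bigint; thingdataid bigint;\n"
                  "BEGIN\n")
        for thing in things[start:start + CHUNK_SIZE]:
            out.write(render_inserts(thing))
        out.write("END $$;\n")


buf = io.StringIO()
emit_chunked_sql(list(range(2500)), buf)  # 2500 things -> 3 DO blocks
```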



Re: large numbers of inserts out of memory strategy

2017-11-29 Thread Tom Lane
Ted Toth  writes:
> On Tue, Nov 28, 2017 at 9:59 PM, Tom Lane  wrote:
>> So whatever's going on here, there's more to it than a giant client-issued
>> INSERT (or COPY), or for that matter a large number of small ones.  What
>> would seem to be required is a many-megabyte-sized plpgsql function body
>> or DO block.

> Yes I did generate 1 large DO block:

Apparently by "large" you mean "hundreds of megabytes".  Don't do that,
at least not on a machine that hasn't got hundreds of megabytes to spare.
The entire thing has to be sucked into memory and parsed before anything
will happen.

regards, tom lane



Re: seq vs index scan in join query

2017-11-29 Thread Marti Raudsepp
Hi

On Wed, Nov 29, 2017 at 8:55 AM, Emanuel Alvarez  
wrote:
> on the other hand, if we disable sequential scans (SET enable_seqscan
> = 0), we see than not only the query runs faster but the cost seems to
> be lower, as seen in the query plan [2].

True, the cost of the scan itself is lower, but together with
hashjoin/nestloop, the total cost of plan [2] is higher.

This is a wild guess but...

-> Index Scan using keywords_pkey on keywords
   Buffers: shared hit=284808 read=4093
vs
-> Seq Scan on keywords
   Buffers: shared read=36075

Looks like the index scan's advantage in this example is a much higher
cache hit ratio (despite touching so many more pages) and PostgreSQL
is underestimating it.

Have you tuned the effective_cache_size setting? A good starting point
is half the total RAM in your machine. It would be interesting to see
how high you need to set it for the planner to switch to the index
scan plan.
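For example, one might experiment per-session along these lines (the value is illustrative, not a recommendation):

```sql
SET effective_cache_size = '8GB';  -- try roughly half the machine's RAM
EXPLAIN (ANALYZE, BUFFERS)
SELECT keywords.strategy_id, results.position, results.created_at
FROM results
JOIN keywords ON results.keyword_id = keywords.id
WHERE results.account_id = 1
  AND results.created_at >= '2017-10-25 00:00:00'
  AND results.created_at <= '2017-11-10 23:59:59';
```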

Regards,
Marti Raudsepp



pg_replication_slots

2017-11-29 Thread Nicola Contu
Hello,
I just have few questions on the replication slots :

- is it possible to get size of the slot?
- if the slave is down, the table grows, when the slave comes up again,
will the table be flushed after pushing wals?
- will they impact performances on the master?

I'm just worried about the size.

Thank you


Re: large numbers of inserts out of memory strategy

2017-11-29 Thread Steven Lembark

> > what tools / languages are you using?  
> 
> I'm using python to read binary source files and create the text files
> containing the SQL. Then I'm running psql -f <file>.

Then chunking the input should be trivial.
There are a variety of techniques you can use to things like disable
indexes during loading, etc. Maybe load them into temp tables and 
then insert the temp's into the destination tables. The point is to 
amortize the memory load over the entire load period.
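A staging-table variant of that advice might look like this (file path and column list hypothetical):

```sql
-- Load into an unindexed temp table first, then move the rows over in a
-- single set operation; the temp table disappears at session end.
CREATE TEMP TABLE thing_stage (LIKE thing INCLUDING DEFAULTS);
\copy thing_stage (ltn, classification, machine, source, thgrec, flags, serial, type) from 'things.csv' csv
INSERT INTO thing (ltn, classification, machine, source, thgrec, flags, serial, type)
SELECT ltn, classification, machine, source, thgrec, flags, serial, type
FROM thing_stage;
```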

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com   +1 888 359 3508



Re: large numbers of inserts out of memory strategy

2017-11-29 Thread Steven Lembark

> I'm pretty new to postgres so I haven't changed any configuration
> setting and the log is a bit hard for me to make sense of :(

Diving into the shark tank is a helluva way to learn how to swim :-)

Are you interested in finding doc's on how to deal with the tuning?


-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com   +1 888 359 3508



Re: seq vs index scan in join query

2017-11-29 Thread Laurenz Albe
Emanuel Alvarez wrote:
> the problematic query looks like this:
> 
> SELECT keywords.strategy_id, results.position, results.created_at FROM results
>   JOIN  keywords ON results.keyword_id = keywords.id
>   WHERE results.account_id = 1
>  AND results.created_at >= '2017-10-25 00:00:00.00'
>  AND results.created_at <= '2017-11-10 23:59:59.99';
> 
> 
> as you can see in the query plan [1] a sequential scan is preferred.
> as we understand it, this happens because the number of rows returned
> from results is too large. if we reduce this number by either
> selecting a smaller created_at range, or another account_id with fewer
> keywords, the planner falls back to an index scan, confirming that the
> number of rows returned from results has a direct influence in this
> choice.
> 
> on the other hand, if we disable sequential scans (SET enable_seqscan
> = 0), we see than not only the query runs faster but the cost seems to
> be lower, as seen in the query plan [2].

The optimizer is right here.

Even though your second execution without sequential scans ran faster,
it is worse.

That is because the execution with the sequential scan touched
26492  + 80492 = 106984 blocks, while the second execution touched
311301 + 48510 = 359811 blocks, more than three times as many.

The second execution was just lucky because most of these blocks were
already cached, and it had to read only half as many blocks from disk.

If you repeat the execution a couple of times, you should see that
the execution using the sequential scans becomes faster.


You can boost performance even more by increasing work_mem
so that the hash can be created in memory.

Yours,
Laurenz Albe








Re: seq vs index scan in join query

2017-11-29 Thread Andres Freund
On 2017-11-29 18:17:18 +0100, Laurenz Albe wrote:
> That is because the execution with the sequential scan touched
> 26492  + 80492 = 106984 blocks, while the second execution touched
> 311301 + 48510 = 359811 blocks, more than three times as many.

That's not necessarily so. What those count are buffer *accesses*,
*not* the number of distinct blocks accessed. You'll very commonly have
more buffer accesses in index scans but still fewer total reads because a
lot of those accesses will be reads previously done in the same
scan. Just imagine a scan of an index with a leaf page pointing to 100
tuples of the same value - that'd result in at least 100 buffer
accesses, but it'd be highly likely that they'll be in cache.

Greetings,

Andres Freund



Re: large numbers of inserts out of memory strategy

2017-11-29 Thread Rory Campbell-Lange
On 28/11/17, Rob Sargent (robjsarg...@gmail.com) wrote:
> 
> On 11/28/2017 10:50 AM, Ted Toth wrote:
> > On Tue, Nov 28, 2017 at 11:19 AM, Rob Sargent  wrote:
> > > > On Nov 28, 2017, at 10:17 AM, Ted Toth  wrote:
> > > > 
> > > > I'm writing a migration utility to move data from non-rdbms data
> > > > source to a postgres db. Currently I'm generating SQL INSERT
> > > > statements involving 6 related tables for each 'thing'. With 100k or
> > > > more 'things' to migrate I'm generating a lot of statements and when I
> > > > try to import using psql postgres fails with 'out of memory' when
> > > > running on a Linux VM with 4G of memory. If I break it into smaller
> > > > chunks, say ~50K statements, then the import succeeds. I can change my
> > > > migration utility to generate multiple files each with a limited
> > > > number of INSERTs to get around this issue but maybe there's
> > > > another/better way?

> > > what tools / languages ate you using?

> > I'm using python to read binary source files and create the text files
> > containing the SQL. Then I'm running psql -f <file>.

> If you're going out to the file system, I would use COPY of csv files (if
> number of records per table is non-trivial).  Any bulk loading python
> available?

psycopg2 has a copy_from function and (possibly pertinent in this case) a
copy_expert function which allows the read buffer size to be specified.
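A minimal sketch of that route (connection details and table/column names hypothetical; only the buffer-building part is live here):

```python
import io


def rows_to_copy_buffer(rows):
    """Serialize rows into COPY text format (tab-delimited, \\N for NULL)."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(r"\N" if v is None else str(v) for v in row) + "\n")
    buf.seek(0)
    return buf


buf = rows_to_copy_buffer([("T007336", 7336), ("T007337", None)])

# With a live connection, copy_expert streams the buffer; its size
# argument sets the read buffer size mentioned above:
#
# import psycopg2
# conn = psycopg2.connect("dbname=target")
# with conn.cursor() as cur:
#     cur.copy_expert("COPY thing (ltn, thgrec) FROM STDIN", buf, size=65536)
# conn.commit()
```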



Re: seq vs index scan in join query

2017-11-29 Thread Laurenz Albe
Andres Freund wrote:
> On 2017-11-29 18:17:18 +0100, Laurenz Albe wrote:
> > That is because the execution with the sequential scan touched
> > 26492  + 80492 = 106984 blocks, while the second execution touched
> > 311301 + 48510 = 359811 blocks, more than three times as many.
> 
> That's not necessarily said. What those count are buffer *accesses*,
> *not* the number of distinct blocks accessed. You'll very commonly have
> more buffer accesses in indexscans but still fewer total reads because a
> lot of those accesses will be reads previously done in the same
> scan. Just imagine a scan of an index with a leaf page pointing to 100
> tuples of the same value - that'd result in at least a 100 buffer
> accesses, but it'd be highly likely that they'll be in cache.

Thanks for explaining that.

Yours,
Laurenz Albe



Partition pruning / agg push down for star schema in pg v11

2017-11-29 Thread legrand legrand
Hello,

Working on Oracle migration POCs, I'm very interested in v11 and declarative
partitioning optimizations.

I have a typical star schema, having dimension tables "product", "calendar"
and "country" and a fact table "sales". 
This fact table is partitionned by time (range by month) and country (list).

Will a query like:

select product.name, calendar.month, sum(sales.net_price) 
from sales
 inner join product on (product.id = sales.product_id)
 inner join country on (country.id = sales.country_id)
 inner join calendar on (calendar.id = sales.calendar_id)
where 
 country.name = 'HERE'
 and calendar.year = '2017'
group by product.name,calendar.month

be able to identify the needed partitions?

nb: the query has predicates on dimension tables not on columns used for
fact table partitioning:
- country.name vs sales.country_id,
- calendar.year vs sales.calendar_id.


Second question: will some of the aggregation be pushed down to the fact table?

Something like
select product.name,calendar.month,agg.sum_net 
from
  (select product_id,calendar_id,sum(net_price) as sum_net
   from sales
inner join country  on (country.id = sales.country_id)
inner join calendar on (calendar.id = sales.calendar_id)
   where 
country.name = 'HERE'
and calendar.year = '2017') agg
 inner join product on (product.id = agg.product_id)
 inner join calendar on (calendar.id = agg.calendar_id)
group by product.name,calendar.month


Thanks in advance (commitfest or patch references are welcome)

Regards
PAscal  








Re: seq vs index scan in join query

2017-11-29 Thread Emanuel Alvarez
Thank you all for your responses!

On Wed, Nov 29, 2017 at 7:31 AM, legrand legrand
 wrote:
> Hi,
>
> Could you give us the partition (range values) and index definitions for
> the result table?

We partition by month; partitions usually start on the 20th of each month
(the date on which we partitioned the table). For the tables in question
the constraints look like this:
"constraint_37" CHECK (created_at >= '2017-10-21 00:00:00'::timestamp
without time zone AND created_at < '2017-11-20 00:00:00'::timestamp
without time zone)
"constraint_110" CHECK (created_at >= '2017-11-20 00:00:00'::timestamp
without time zone AND created_at < '2017-12-20 00:00:00'::timestamp
without time zone)

Indexes are:
"results_account_id_created_at_idx" btree (account_id, created_at DESC)
"results_id_idx" btree (id)
"results_keyword_id_created_at_idx" btree (keyword_id, created_at DESC)

On Wed, Nov 29, 2017 at 11:57 AM, Marti Raudsepp  wrote:
> -> Index Scan using keywords_pkey on keywords
>Buffers: shared hit=284808 read=4093
> vs
> -> Seq Scan on keywords
>Buffers: shared read=36075
>
> Looks like the index scan's advantage in this example is a much higher
> cache hit ratio (despite touching so many more pages) and PostgreSQL
> is underestimating it.

Interesting, we were completely missing this.

> Have you tuned the effective_cache_size setting? A good starting point
> is half the total RAM in your machine. It would be interesting to see
> how high you need to set it for the planner to switch to the index
> scan plan.

We did this, and although it has an effect it's rapidly shadowed by
the size of the result set. For example, setting it to 10GB is OK for one
month's worth of data, but it falls back to a seq scan for two months of
data. Although we might be able to live with that for now.

On Wed, Nov 29, 2017 at 2:17 PM, Laurenz Albe  wrote:
> The optimizer is right here.

Never doubted it :)

> Even though your second execution without sequential scans ran faster,
> it is worse.
>
> That is because the execution with the sequential scan touched
> 26492  + 80492 = 106984 blocks, while the second execution touched
> 311301 + 48510 = 359811 blocks, more than three times as many.
>
> The second execution was just lucky because most of these blocks were
> already cached, and it had to read only half as many blocks from disk.
>
> If you repeat the execution a couple of times, you should see that
> the execution using the sequential scans becomes faster.

In a live environment, execution times for sequential scans are always
slower, though they do vary in time. The reason for this is that the
partition for last month's results is accessed frequently, and as such
is kept in the cache; while the other two tables, keywords and
keyword_data, are accessed sparsely, and mostly through their indexes.
This way, indexes have a much bigger chance of being cached in memory
than other parts of the table. In other words: the second execution is
always lucky.

Is this interpretation correct? Is there any option to deal with this
issue? Besides adding more RAM.

We could, theoretically, partition keyword_data as it's also
time-based, but it's not big enough to justify partitioning. It's also not
small enough to be sequentially scanned all the time.


> You can boost performance even more by increasing work_mem
> so that the hash can be created in memory.

This is interesting, and has a positive effect on our queries. We are
currently testing a combination of work_mem with effective_cache_size
settings, though I'm afraid refactoring the query will be inevitable.

Thank you!



Re: seq vs index scan in join query

2017-11-29 Thread Jeff Janes
On Tue, Nov 28, 2017 at 10:55 PM, Emanuel Alvarez 
wrote:

> hi all,
>
> we're in the process of optimizing some queries and we've noted a case
> where the planner prefers a sequential scan instead of using an index,
> while the index scan is actually much faster. to give you some
> context: we have two main tables, keywords and results. keywords has
> approximately 700,000 rows; while results holds approximately one row
> per keyword per day (roughly 70m at the moment, not all keywords are
> active at any given day). results is currently partitioned by
> (creation) time. it's also worth noting that we use SSDs in our
> servers, and have random_page_cost set to 1.
>
>
> the problematic query looks like this:
>
> SELECT keywords.strategy_id, results.position, results.created_at FROM
> results
>   JOIN  keywords ON results.keyword_id = keywords.id
>   WHERE results.account_id = 1
>  AND results.created_at >= '2017-10-25 00:00:00.00'
>  AND results.created_at <= '2017-11-10 23:59:59.99';
>
>
> as you can see in the query plan [1] a sequential scan is preferred.
>

I would say the preference is not for the seq scan, but rather for the hash
join.  If the seq scan couldn't be fed into a hash join, it would not look
very favorable.

I think hash joins are a bit optimistic on how much cpu time they think
they use building the hash table.  You can probably get better plans for
this type of query by increasing cpu_tuple_cost to 0.02 or 0.03.  That
works because the hash join over the seq scan has to scan 700,000 tuples to
build the hash table, which is then probed only 70,000 times, while the
nested loop index scan just probes the 70,000 rows it needs directly and
ignores the other 90%.
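That experiment can be done per-session (0.01 is the default):

```sql
SET cpu_tuple_cost = 0.02;   -- default is 0.01; penalizes big hash-table builds
-- re-run EXPLAIN (ANALYZE, BUFFERS) on the query above and compare plans
RESET cpu_tuple_cost;        -- back to the default
```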

...


>
> on the other hand, if we disable sequential scans (SET enable_seqscan
> = 0), we see than not only the query runs faster but the cost seems to
> be lower, as seen in the query plan [2].
>

The costs for plan 2 doesn't look lower to me.  196754.90 > 120421.32


>
> in this example the gain isn't much: ~0.5s. but when we add a
> second join table with additional keyword data the planner still
> prefers a sequential scan on a table that has +6m rows. query looks
> like this:
>
> SELECT keywords.strategy_id, results.position, results.created_at,
> keyword_data.volume FROM results
>   JOIN  keywords ON results.keyword_id = keywords.id
>   JOIN keyword_data ON keywords.keyword_data_id = keyword_data.id
>   WHERE results.account_id = 1
>  AND results.created_at >= '2017-10-25 00:00:00.00'
>  AND results.created_at <= '2017-11-19 23:59:59.99';
>
>
> in this case query takes up to 8s, query plan can be found in [3].
> obviously dataset has to be large to prefer a sequential on a 6m rows
> table. similarly, reducing the created_at range or using an account_id
> with fewer keywords makes the planner prefer index scan, accelerating
> the query considerably.
>

If that is the query you are really concerned about, we would have to see
the faster plan for that query.  (Or better yet, keep the created_at range
the same, and set enable_hashjoin to off to get it to switch plans).

This looks like is a very skewed query.  keyword_data has 10 rows for every
row in keywords, yet adding a join to keyword_data doesn't increase the
number of rows returned by the query at all.  That is weird, isn't it?

For what it's worth, in my hands on your simpler query it likes to sort the
70,000 qualifying rows from the "results" table, then do a merge join against
the index on keywords.  And it truly is the faster option.  I have to set
enable_mergejoin=off before I can get either of your plans.  Once I do, the
nested loop does seem to be faster than the hash join but not by the two
fold that you see, and they jump around quite a bit from run to run.

Cheers,

Jeff


Re: pg_replication_slots

2017-11-29 Thread Emanuel Alvarez
On Wed, Nov 29, 2017 at 12:39 PM, Nicola Contu  wrote:
> Hello,
Hi!

> I just have few questions on the replication slots :
>
> - is it possible to get size of the slot?
I use something like this to know where my slot is at:
  SELECT pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)
FROM pg_replication_slots;

pg_current_xlog_location() will give you the current log location on the
master, and you can get all the info about a slot from the
pg_replication_slots relation. In this case we query for restart_lsn,
which holds the slot's restart location. pg_xlog_location_diff() is a
convenience function which returns the difference between two
transaction log locations (of type pg_lsn). pg_lsn represents byte
locations, so the number returned by this function is the difference
in bytes between the current log position and the slot restart location,
the closest approximation of a slot's size.

> - if the slave is down, the table grows, when the slave comes up again, will
> the table be flushed after pushing wals?

Yes. Actually, checkpoints should continue to be created on the
master; the only difference is that WAL segments are kept at least from
the slot restart location onward. Once your slave is back up and all
WALs have been transferred, old segments will be deleted.

> - will they impact performances on the master?

Slots themselves shouldn't have any appreciable impact on performance.
Unless your slave is down for a long period of time, or is configured
wrongly so that it doesn't use the slot, or your master server is very
restricted in disk space, you shouldn't have any problems, and you'll
see trouble coming before it affects the system. You might, however, see
the issues common to increasing the WAL level, namely increased IO. A
good practice is to store your WAL files on a separate disk to take
advantage of parallel IO. But this shouldn't even be necessary unless
you notice something.

Also, unrelated to slots themselves, streaming replication has the
added impact of transferring WALs across the network. Again, this
isn't usually an issue, especially if you use asynchronous replication.

> I'm just worried about the size.

If your slave is fast enough to keep up with the master you shouldn't
even notice that replication is going on. Just in case, I have a check
that runs the query from my answer to the first question every couple
of minutes and sends me a notification if the value gets too high.
99.99% of the time it's zero, but we have had instances where the slave
went down and the disk on the master started to fill up. Just watch out
for those cases; in an emergency you can simply delete the slot and all
the WAL segments will be reclaimed seamlessly.

> Thank you

Hope the answers are useful.

Regards,
Ema



Searching for big differences between values

2017-11-29 Thread Durumdara
Hello!

Some of our users have made mistakes entering prices (stock values).

I need to search for big differences between values.
For example:

20
21
21,5
30
28
..
46392 <-
46392 <-

But it could be:

42300
43100
44000
43800
65000 <-
42100

A human eye could spot these values, but we need to check 30,000
articles and 450,000 values.

Do you have any idea how to do this with SQL?

In most cases the first values are OK; in a later interval (after some
date) they could be wrong...

I don't know how to define the solution precisely, but I think PostgreSQL
has an intelligent solution for this problem.

We need to search for elements whose value is bigger than base price *
tolerance.
But the base price is calculated dynamically from the lower values... The
tolerance is lower for higher base prices.
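One way to sketch that in SQL (table and column names are guesses): compare each price against the running average of the article's earlier prices, and flag rows that exceed it by a tolerance factor.

```sql
SELECT article_id, created_at, price
FROM (
    SELECT article_id, created_at, price,
           avg(price) OVER (PARTITION BY article_id
                            ORDER BY created_at
                            ROWS BETWEEN UNBOUNDED PRECEDING
                                     AND 1 PRECEDING) AS base_price
    FROM prices
) p
WHERE base_price IS NOT NULL            -- skip the first value per article
  AND price > base_price * 1.5;         -- tolerance factor; use a smaller
                                        -- factor for high base prices
```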

Thank you for any help!

Best regards
   dd