Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-10-12 Thread Tom Lane
Haribabu kommi haribabu.ko...@huawei.com writes:
 To handle the above case, instead of directly resetting the dead tuples to
 zero, how about removing the exact number of dead tuples
 from the table stats? With this approach vacuum gets triggered
 more frequently, thus reducing the bloat.

This does not seem like a very good idea as-is, because it will mean that
n_dead_tuples can diverge arbitrarily far from reality over time, as a
result of accumulation of errors.  It also doesn't seem like a very good
idea that VACUUM sets n_live_tuples while only adjusting n_dead_tuples
incrementally; ideally those counters should move in the same fashion.
In short, I think this patch will create at least as many problems as
it fixes.

What would make more sense to me is for VACUUM to estimate the number
of remaining dead tuples somehow and send that in its message.  However,
since the whole point here is that we aren't accounting for transactions
that commit while VACUUM runs, it's not very clear how to do that.

Another way to look at it is that we want to keep any increments to
n_dead_tuples that occur after VACUUM takes its snapshot.  Maybe we could
have VACUUM copy the n_dead_tuples value as it exists when VACUUM starts,
and then send that as the value to subtract when it's done?
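A minimal sketch of that idea (the struct and field names here are
illustrative stand-ins, not the actual pgstat message layout):

#include <stdint.h>

typedef unsigned int Oid;

/* illustrative stand-ins for the pgstat structures */
typedef struct
{
    Oid     m_tableoid;
    int64_t m_live_tuples;          /* VACUUM's fresh estimate */
    int64_t m_dead_tuples_seen;     /* n_dead_tuples as of VACUUM start */
} MsgVacuumSketch;

typedef struct
{
    int64_t n_live_tuples;
    int64_t n_dead_tuples;
} TabEntrySketch;

/*
 * On receipt, subtract only the dead tuples VACUUM could have seen, so
 * that increments which arrived while VACUUM ran are preserved instead
 * of being wiped out by a reset to zero.
 */
static void
recv_vacuum_sketch(const MsgVacuumSketch *msg, TabEntrySketch *tab)
{
    tab->n_live_tuples = msg->m_live_tuples;
    tab->n_dead_tuples -= msg->m_dead_tuples_seen;
    if (tab->n_dead_tuples < 0)
        tab->n_dead_tuples = 0;
}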

regards, tom lane




Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-12 Thread Sameer Thakur
 This paragraph reads a bit strange to me:

 +  A statistics session is the time period when statistics are gathered by
 statistics collector
 +  without being reset. So a statistics session continues across normal
 shutdowns,
 +  but whenever statistics are reset, like during a crash or upgrade, a new
 time period
 +  of statistics collection commences i.e. a new statistics session.
 +  The query_id value generation is linked to statistics session to
 emphasize the fact
 +  that whenever statistics are reset,the query_id for the same queries will
 also change.

 "time period when"?  Shouldn't that be "time period during which"?
 Also, doesn't a new statistics session start when a stats reset is
 invoked by the user?  The bit after "commences" appears correct (to me,
 not a native by any means) but seems also a bit strange.

I have tried to rephrase this. Hopefully it is less confusing:

A statistics session refers to the time period during which statement
statistics are gathered by the statistics collector. A statistics session
persists across normal shutdowns. Whenever statistics are reset, as during
a crash or upgrade, a new statistics session starts. The query_id value
generation is linked to the statistics session to emphasize that whenever
statistics are reset, the query_id for the same queries will also change.

regards
Sameer





Re: [HACKERS] Patch for fast gin cache performance improvement

2013-10-12 Thread Ian Link

I think it is desirable that this patch be resubmitted for the next
CommitFest for further review and the testing mentioned above.  So I'd like
to mark this patch as "Returned with Feedback".  Is that OK?

Sounds like a good idea. Thanks for the review!
Ian Link



   	   
Etsuro Fujita wrote on Thursday, October 10, 2013 1:01 AM:
Ian Link wrote:
Although I asked this question, I've reconsidered these
parameters, and it seems that they not only make the code
rather complex but are also a little confusing to users.  So I'd like to
propose introducing only one parameter:
fast_cache_size.  Users that give weight to update performance
from the fast update technique would set this parameter to a large value,
while users that give weight not only to update performance but also to
search performance would set it to a small value.  What do you think about
this?
I think it makes sense to maintain this separation. If the user doesn't need
a per-index setting, they don't have to use the parameter. Since the parameter
is off by default, they don't even need to worry about it.
There might as well be one parameter for users that don't need fine-grained
control. We can document this and I don't think it will be confusing to the
user.
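For illustration, a minimal sketch of how the two knobs could combine
(gin_fast_limit and fast_cache_size are the names proposed in this thread;
the surrounding logic is only an assumption, not the patch's actual code):

/* a negative reloption value means "not set" in this sketch */
static long
effective_fast_limit(long gin_fast_limit, long fast_cache_size)
{
    if (fast_cache_size >= 0)
        return fast_cache_size;     /* per-index setting wins */
    return gin_fast_limit;          /* otherwise fall back to the GUC */
}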

OK, though I'd like to hear the opinion of others.

4. In my understanding, a small value of
gin_fast_limit/fast_cache_size leads to an increase in GIN search
performance but a decrease in GIN update
performance.  Am I right?  If so, I think the tradeoff should be noted in
the documentation.
I believe this is correct.

5. The following documents in Chapter 57. GIN Indexes need to be updated:
* 57.3.1. GIN Fast Update Technique
* 57.4. GIN Tips and Tricks
Sure, I can add something.

6. I would like to see the results for the additional test cases
(tsvectors).
I don't really have any good test cases for this available, and I have
very limited time for postgres at the moment. Feel free to create a test
case, but I don't believe I can at the moment. Sorry!

Unfortunately, I don't have much time to do such a thing, though I think we
should do it.  (In addition, we should do another performance test to make
clear the influence on fast-update performance of introducing these
parameters, which would be required to determine their default values.)

7. The commented-out elog() code should be removed.
Sorry about that, I shouldn't have submitted the patch with those still there.

I should have a new patch soonish, hopefully. Thanks for your feedback!

I think it is desirable that this patch be resubmitted for the next
CommitFest for further review and the testing mentioned above.  So I'd like
to mark this patch as "Returned with Feedback".  Is that OK?

Thanks,

Best regards,
Etsuro Fujita



   	   
Ian Link wrote on Monday, September 30, 2013 3:09 PM:

Hi Etsuro,
Sorry for the delay but I have been very busy with work. I have been
away from postgres for a while, so I will need a little time to review
the code and make sure I give you an informed response. I'll get back
to you as soon as I am able. Thanks for understanding.
Ian Link 


  
   	   
Etsuro Fujita wrote on Friday, September 27, 2013 2:24 AM:
I wrote:
I had a look over this patch.  I think this patch is interesting and very
useful.
Here are my review points:

8. I think there are no issues in this patch.  However, I have one question:
how does this patch work in the case where gin_fast_limit/fast_cache_size =
0?  In this case, in my understanding, this patch inserts new entries into
the pending list temporarily and immediately moves them to the main GIN
data structure using ginInsertCleanup().  Am I right?  If so, that is
obviously inefficient.

Sorry, there are incorrect expressions above.  I mean gin_fast_limit > 0 and
fast_cache_size = 0.

Although I asked this question, I've reconsidered these parameters, and it
seems that they not only make the code rather complex but are also a little
confusing to users.  So I'd like to propose introducing only one parameter:
fast_cache_size.  Users that give weight to update performance from the
fast update technique would set this parameter to a large value, while
users that give weight not only to update performance but also to search
performance would set it to a small value.  What do you think about this?

Thanks,

Best regards,
Etsuro Fujita



   	   
Etsuro Fujita wrote on Thursday, September 26, 2013 6:02 AM:
Hi Ian,

This patch contains a performance improvement for the fast gin cache. As you
may know, the performance of the fast gin cache decreases with its size.
Currently, the size of the fast gin cache is tied to work_mem. The size of
work_mem can often be quite high. The large size of work_mem is inappropriate
for the fast gin cache size. Therefore, we created a separate cache size
called
gin_fast_limit. This global variable controls the size of the fast gin cache,
independently of work_mem. Currently, the default 

Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-12 Thread Magnus Hagander
On Oct 12, 2013 2:13 AM, MauMau maumau...@gmail.com wrote:

 From: Bruce Momjian br...@momjian.us

 On Thu, Oct 10, 2013 at 11:01:52PM +0900, MauMau wrote:

 Although this is not directly related to memory, could you set
 max_prepared_transactions = max_connections at initdb time?  People
 must feel frustrated when they can't run applications on a Java or
 .NET application server and notice that they have to set
 max_prepared_transactions and restart PostgreSQL.  This is far from
 friendly.


 I think the problem is that many users don't need prepared transactions
 and therefore don't want the overhead.  Is that still accurate?


 I'm not sure if many use XA features, but I saw the questions and answers
a few times, IIRC.  In the trouble situation, PostgreSQL outputs an
intuitive message like "increase max_prepared_transactions", so many users
might have been able to change the setting and solve the problem
themselves without asking for help, feeling stress like "Why do I have to
set this?"  For example, max_prepared_transactions is called a "hideous
creature" on the following page:

 https://community.jboss.org/wiki/InstallPostgreSQLOnFedora?_sscc=t

Anybody who follows that page is screwed anyway. I notice they recommend
running regular VACUUM FULL across the whole database, so it's obvious they
know nothing about postgresql. There's nothing we can do about what people
write on random pages around the Internet.

 According to the below page, the amount of memory consumed for this is
(770 + 270 * max_locks_per_transaction) * max_prepared_transactions.
 With the default settings of max_connections=100 and
max_locks_per_transaction=64, this works out to (770 + 270 * 64) * 100
bytes, or roughly 1.8MB.  So the overhead is negligible.

You are assuming memory is the only overhead. I don't think it is.

 If the goal is to make PostgreSQL friendly enough to run smoothly without
frustration from the start, rather than perfect tuning, I think
max_prepared_transactions = max_connections is an easy and good item.  If
the goal is limited to auto-tuning memory sizes, this improvement can be
treated separately.

Frankly, I think we'd help 1000 times more users if we enabled a few wal
senders by default and bumped the wal level, mainly so they could run
one-off base backups. That's used by orders of magnitude more users than XA.

/Magnus


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-12 Thread Magnus Hagander
On Oct 11, 2013 10:23 PM, Josh Berkus j...@agliodbs.com wrote:

 On 10/11/2013 01:11 PM, Bruce Momjian wrote:
  In summary, I think we need to:
 
  *  decide on new defaults for work_mem and maintenance_work_mem
  *  add an initdb flag to allow users/packagers to set shared_buffers?
  *  add an autovacuum_work_mem setting?
  *  change the default for temp_buffers?

 If we're changing defaults, bgwriter_lru_maxpages and vacuum_cost_limit
 could also use a bump; those thresholds were set for servers with < 1GB
 of RAM.

Uh, those are there to limit io and not memory, right? More memory isn't
the reason to increase them, more io is. For people deploying on modern
server hardware, yes, it's often low, but for all those deploying in
virtualized environments with io performance reminding you of the 1990s,
I'm not so sure it is...

/Magnus


Re: [HACKERS] GIN improvements part 1: additional information

2013-10-12 Thread Alexander Korotkov
On Sat, Oct 12, 2013 at 1:55 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 10.10.2013 13:57, Heikki Linnakangas wrote:
  On 09.10.2013 02:04, Tomas Vondra wrote:
  On 8.10.2013 21:59, Heikki Linnakangas wrote:
  On 08.10.2013 17:47, Alexander Korotkov wrote:
  Hi, Tomas!
 
  On Sun, Oct 6, 2013 at 3:58 AM, Tomas Vondra t...@fuzzy.cz wrote:
 
  I've attempted to rerun the benchmark tests I did a few weeks ago,
  but I got repeated crashes when loading the data (into a table with
  a tsvector+gin index).
 
  Right before a crash, there's this message in the log:
 
    PANIC:  not enough space in leaf page!
 
 
  Thanks for testing. Heikki's version of the patch doesn't work for me
  either, even on much simpler examples. I can try to get it working if
  he answers my question about the GinDataLeafPageGetPostingList* macros.
 
  The new macros in that patch version were quite botched. Here's a new
  attempt.
 
  Nope, still the same errors :-(
 
  PANIC:  not enough space in leaf page!
  LOG:  server process (PID 29722) was terminated by signal 6: Aborted
  DETAIL:  Failed process was running: autovacuum: ANALYZE public.messages
 
  I've continued hacking away at the patch, here's yet another version.
  I've done a lot of cleanup and refactoring to make the code more
  readable (I hope). I'm not sure what caused the panic you saw, but it's
  probably fixed now.  Let me know if not.

 Yup, this version fixed the issues. I haven't been able to do any
 benchmarks yet; all I have is some basic stats:

                |   HEAD   |  patched
 ========================================
 load duration  |  1084 s  |   1086 s
 subject index  |    96 MB |    96 MB
 body index     |  2349 MB |  2051 MB

 So there's virtually no difference in speed (which is expected, AFAIK)
 and the large index on full message bodies is significantly smaller.


Yes, there should be no significant difference in speed. But the difference
in index sizes seems too small. Could you share a database dump somewhere?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Compression of full-page-writes

2013-10-12 Thread Amit Kapila
On Fri, Oct 11, 2013 at 10:36 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-10-11 09:22:50 +0530, Amit Kapila wrote:
 I think it will be difficult to prove, with any compression
 algorithm, that it compresses in most scenarios.
 In many cases the WAL will not be reduced
 and tps can also come down if the data is non-compressible, because
 any compression algorithm will have to try to compress the data and will
 burn some cpu for that, which in turn will reduce tps.

 Then those concepts maybe aren't such a good idea after all. Storing
 lots of compressible data in an uncompressed fashion isn't an all that
 common usecase. I most certainly don't want postgres to optimize for
 blank padded data, especially if it can hurt other scenarios. Just not
 enough benefit.
 That said, I actually have relatively high hopes for compressing full
 page writes. There often enough is lot of repetitiveness between rows on
 the same page that it should be useful outside of such strange
 scenarios. But maybe pglz is just not a good fit for this, it really
 isn't a very good algorithm in this day and age.

Do you think that if WAL reduction or performance with another
compression algorithm (for ex. snappy) is better, then the chances of
getting the new compression algorithm into postgresql will be higher?
Wouldn't it be okay if we had a GUC to enable it and a pluggable
api for calling the compression method? With this we could even include
other compression algorithms if they prove to be good, and reduce this
patch's dependency on the inclusion of new compression methods in
postgresql.
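
To make the pluggable-api idea concrete, a rough sketch (all names here are
hypothetical; nothing like this exists in postgresql today):

#include <stddef.h>

/*
 * Hypothetical hook table for a pluggable full-page-write compressor;
 * a GUC would select an implementation by name.  compress() returns
 * the compressed size, or 0 if the data did not compress.
 */
typedef struct FpwCompressor
{
    const char *name;           /* e.g. "pglz", "snappy" */
    size_t      (*compress) (const char *src, size_t srclen,
                             char *dst, size_t dstcap);
    size_t      (*decompress) (const char *src, size_t srclen,
                               char *dst, size_t dstcap);
} FpwCompressor;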

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GIN improvements part 1: additional information

2013-10-12 Thread Tomas Vondra
On 12.10.2013 12:11, Alexander Korotkov wrote:
 On Sat, Oct 12, 2013 at 1:55 AM, Tomas Vondra t...@fuzzy.cz wrote:
 
  Yup, this version fixed the issues. I haven't been able to do any
  benchmarks yet; all I have is some basic stats:
 
                 |   HEAD   |  patched
  ========================================
  load duration  |  1084 s  |   1086 s
  subject index  |    96 MB |    96 MB
  body index     |  2349 MB |  2051 MB
 
 So there's virtually no difference in speed (which is expected, AFAIK)
 and the large index on full message bodies is significantly smaller.
 
 
  Yes, there should be no significant difference in speed. But the difference
  in index sizes seems too small. Could you share a database dump somewhere?

Turns out that if I do VACUUM FULL after loading the data (a sequence of
INSERT commands), the index sizes drop significantly.

               |   HEAD   |  patched
========================================
subject index  |    42 MB |    15 MB
body index     |   624 MB |   328 MB

So there's a significant improvement, as expected. I'm wondering if the
bloat is expected too? Is that the consequence of incremental index
updates vs. rebuilding the whole index at once during VACUUM FULL?

Tomas




Re: [HACKERS] background workers, round three

2013-10-12 Thread Kohei KaiGai
I briefly checked these patches. Let me add some comments.

* terminate-worker-v1.patch
TerminateBackgroundWorker() turns on the slot->terminate flag under an
LW_SHARED lock. Is that reasonable, given that the only possible caller
is the background worker process itself?

* ephemeral-precious-v1.patch
AtEOXact_BackgroundWorker() is located alongside the other AtEOXact_*
routines. Doesn't that make resource management complicated?
If the main process goes into its error handler while the worker
process is still running healthily, the worker may continue to calculate
something and put its results on the shared memory segment, even
though the main process has asked the postmaster to kill it.

All the ResourceOwnerRelease() callbacks run prior to
AtEOXact_BackgroundWorker(), so it is hard to release resources
still in use by a background worker: the worker keeps running
healthily until it receives the termination signal, which is sent later.
In addition, it makes the implementation complicated if we need to
design background workers to release resources if and when they are
terminated. I don't think it is a good coding style if we have to
release resources in different locations depending on context.

So, I'd like to propose adding a new invocation point of
ResourceOwnerRelease() after all the AtEOXact_* jobs, with a
new label, something like RESOURCE_RELEASE_FINAL.
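
For reference, against the existing ResourceReleasePhase enum in resowner.h
that would look like this (RESOURCE_RELEASE_FINAL being the proposed
addition, not an existing value):

typedef enum
{
    RESOURCE_RELEASE_BEFORE_LOCKS,
    RESOURCE_RELEASE_LOCKS,
    RESOURCE_RELEASE_AFTER_LOCKS,
    RESOURCE_RELEASE_FINAL      /* proposed: runs after all AtEOXact_* jobs */
} ResourceReleasePhase;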

In addition, AtEOXact_BackgroundWorker() does not synchronize with the
termination of the background worker processes being killed.
Although it depends on the situation, I think it would be a good idea
to wait for the worker processes to actually terminate, to ensure that
the resources to be released are handed back to the main process if the
above ResourceOwnerRelease() is to do that job.

Thanks,

2013/10/11 Robert Haas robertmh...@gmail.com:
 On Fri, Oct 11, 2013 at 9:27 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Finally I got the chance to put my hands on this code. Really sorry
 for the late reply.

 Thanks for the review.  I'll respond to this in more detail later, but
 to make a long story short, I'm looking to apply
 terminate-worker-v1.patch (possibly with modifications after going
 over your review comments), but I'm not feeling too good any more
 about ephemeral-precious-v1.patch, because my experience with those
 facilities has so far proved unsatisfying.  I think I'd like to
 withdraw the latter patch pending further study.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company





-- 
KaiGai Kohei kai...@kaigai.gr.jp




[HACKERS] removing old ports and architectures

2013-10-12 Thread Andres Freund
Hi,

As discussed in 20130926225545.gb26...@awork2.anarazel.de and on quite
a few other occasions, there are quite a few scalability improvements we
could make if we had cross-platform support for atomic
operations. Providing that is a fair bit of work for every
architecture/compiler/OS, so I think it's a reasonable thing to remove
unused things first.
Alternatively we can maintain parallel code paths for platforms with and
without atomic-ops support, but that seems like a bad
idea from a complexity and testing perspective, given that all even
remotely current platforms should support them.
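
(For concreteness, the kind of primitive at stake is a compare-and-swap;
on gcc this is available as a builtin since the 4.x series. A sketch, not
proposed postgres code:)

#include <stdint.h>

/* CAS via gcc's __sync builtin (available since gcc 4.1); ports that
 * cannot provide an equivalent are what make this painful. */
static inline int
atomic_cas_u32(volatile uint32_t *ptr, uint32_t expected, uint32_t newval)
{
    return __sync_bool_compare_and_swap(ptr, expected, newval);
}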

I think we should remove support for the following ports:
- IRIX
- UnixWare
- Tru64

None of those is still relevant.

I think we should remove support for the following architectures:
- VAX
- univel (s_lock support remaining)
- sinix (s_lock support remaining)
- sun3 (I think it's just s_lock support remaining)
- natsemi 32k
- superH
- ALPHA (big pain in the ass to get right, nobody uses it anymore)
- m68k (doesn't have a usable CAS on later iterations like ColdFire)
- M32R (no userspace CAS afaics)
- mips for anything but gcc > 4.4, using gcc's atomics support
- s390 for anything but gcc > 4.4, using gcc's atomics support
- 32bit/v9 sparc (doesn't have proper atomics, old)

Possibly:
- all mips
- PA-RISC. I think Tom was the remaining user there? Maybe just !gcc.

Any arguments against?

Last round of discussion of removing dead ports:
1335292179.13481.4.ca...@vanquo.pezone.net

Discusses the state of spinlocks and barriers on various platforms:
20130920151110.ga8...@awork2.anarazel.de

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-12 Thread Tomas Vondra
On 11.10.2013 13:42, Huchev wrote:
 
 gettimeofday(&start, NULL);
 for (i = 0; i < VALUES; i++) {
 state = XXH32_init(result);
 XXH32_update(state, &i, 4);
 XXH32_digest(state);
 }
 gettimeofday(&end, NULL);
 
 
 This code is using the update variant, which is only useful when dealing
 with very large amounts of data which can't fit into a single block of
 memory. This is obviously overkill for a 4-bytes-only test: 3 function
 calls, a malloc, intermediate data bookkeeping, etc.
 
 To hash a single block of data, it's better to use the simpler (and faster)
 variant XXH32():
 
 gettimeofday(&start, NULL);
 for (i = 0; i < VALUES; i++) { XXH32(&i, 4, result); }
 gettimeofday(&end, NULL);
 
 You'll probably get better results by an order of magnitude. For better
 results still, you could even inline it (yes, for such a short loop with
 almost no work to do, it makes a very noticeable difference).

Not really. Even with this change it's slightly slower than crc32, at
least with the 32-bit integers. With 64-bit integers it's about 2x as
fast. But even then it's like ~1% of the total runtime, so any
improvements here are not really changing anything.

The inlining is not a good idea IMHO, because that'd be very different
from the actual usage (there won't be such a tight loop). OTOH I'm not
sure the compiler doesn't already inline it as an optimization.

 That being said, it's true that these advanced hash algorithms only
 shine with a big enough amount of data to hash. Hashing a 4-byte
 value into a 4-byte hash is a bit of a limited exercise. There is no
 pigeonhole issue. A simple multiplication by a 32-bit prime would
 fare well enough and result in zero collisions.

Agreed. I'll revisit this if/when I need to support larger data types
in this aggregate.
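
For reference, the prime-multiplication scheme mentioned above is as simple
as this (a sketch; the constant is xxhash's PRIME32_1, chosen here purely
for illustration):

#include <stdint.h>

/* Multiplying by an odd 32-bit constant is a bijection modulo 2^32,
 * so every distinct 4-byte key gets a distinct hash: zero collisions. */
static inline uint32_t
hash_u32(uint32_t key)
{
    return key * 2654435761u;
}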

Tomas




Re: [HACKERS] removing old ports and architectures

2013-10-12 Thread Peter Geoghegan
On Sat, Oct 12, 2013 at 5:46 PM, Andres Freund and...@2ndquadrant.com wrote:
 I think we should remove support for the following ports:
 - IRIX
 - UnixWare
 - Tru64

 None of those is still relevant.

Seems reasonable.

 I think we should remove support for the following architectures:
 - VAX

Agreed.

 - univel (s_lock support remaining)
 - sinix (s_lock support remaining)
 - sun3 (I think it's just s_lock support remaining)
 - natsemi 32k

I don't know enough about these, which doesn't bode well for them.

 - superH

SuperH isn't dead, but it is only used for very small embedded
systems, I think (mostly microcontrollers). So maybe.

 - ALPHA (big pain in the ass to get right, nobody uses it anymore)

Yes, for many years now ALPHA has only been useful as a way of
illustrating just how bad CPU memory-operation reordering can
get. So I quite agree.

 - m68k (doesn't have a usable CAS on later iterations like ColdFire)

It does seem like Motorola 68k is vanishingly close to dead.

 - M32R (no userspace CAS afaics)
 - mips for anything but gcc > 4.4, using gcc's atomics support
 - s390 for anything but gcc > 4.4, using gcc's atomics support
 - 32bit/v9 sparc (doesn't have proper atomics, old)

Not so sure about these.

 Possibly:
 - all mips
 - PA-RISC. I think Tom was the remaining user there? Maybe just !gcc.

I think we should think hard about removing support for MIPS. A lot of
Chinese chip manufacturers have licensed MIPS technology in just the
last couple of years, so there is plenty of it out there; I'd be
slightly concerned that the proposed restrictions on MIPS would be
onerous. Much of this is the kind of hardware that a person might
plausibly want to run Postgres on.

-- 
Peter Geoghegan

