Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2011-12-20 Thread Tomas Vondra
On 20.12.2011 11:20, Magnus Hagander wrote: 2011/12/20 Tomas Vondra t...@fuzzy.cz: I haven't updated the docs yet - let's see if the patch is acceptable at all first. Again, without having reviewed the code, this looks like a feature we'd want, so please add some docs, and then submit

[HACKERS] WIP: explain analyze with 'rows' but not timing

2011-12-22 Thread Tomas Vondra
Hi all, most of the time I use auto_explain, all I need is duration of the query and the plan with estimates and actual row counts. And it would be handy to be able to catch long running queries with estimates that are significantly off (say 100x lower or higher compared to actual row numbers).

Re: [HACKERS] WIP: explain analyze with 'rows' but not timing

2011-12-23 Thread Tomas Vondra
On 23.12.2011 14:57, Robert Haas wrote: 2011/12/22 Tomas Vondra t...@fuzzy.cz: The gettimeofday() calls are not exactly cheap in some cases, so why to pay that price when all you need is the number of rows? Fair point. The patch attached does this: 1) adds INSTRUMENT_ROWS, a new

Re: [HACKERS] WIP: explain analyze with 'rows' but not timing

2011-12-23 Thread Tomas Vondra
On 23.12.2011 16:14, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: One thing I'm wondering about is that the InstrumentOptions are not exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to redefine

Re: [HACKERS] WIP: explain analyze with 'rows' but not timing

2011-12-23 Thread Tomas Vondra
Dne 23.12.2011 22:37, Pavel Stehule napsal(a): 2011/12/23 Tom Lanet...@sss.pgh.pa.us: Tomas Vondrat...@fuzzy.cz writes: The motivation for this patch was that collection timing data often causes performance issues and in some cases it's not needed. But is this true for row counts? Perhaps

Re: [HACKERS] patch: ALTER TABLE IF EXISTS

2012-01-02 Thread Tomas Vondra
On 2 Leden 2012, 14:11, Pavel Stehule wrote: Hello this is relative simple patch that add possibility to skip noexisting tables. It is necessary for silent cleaning when dump is loaded. Just a curious question - what use case is solved by this? Under what circumstances you get an ALTER TABLE

Re: [HACKERS] pgstat wait timeout

2012-01-04 Thread Tomas Vondra
On 4 Leden 2012, 13:17, pratikchirania wrote: I have installed RAMdisk and pointed the parameter: #stats_temp_directory = 'B:\pg_stat_tmp' I also tried #stats_temp_directory = 'B:/pg_stat_tmp' But, still there is no file created in the RAM disk. The previous stat file is touched even after

[HACKERS] easy way of copying regex_t ?

2012-01-05 Thread Tomas Vondra
Hi all, I've been working on moving an extension that allows to move the ispell dictionaries to shared segment. It's almost complete, the last FIXME is about copying regex_t structure (stored in AFFIX). According to regex.h the structure is fairly complex and not exactly easy to understand, so

Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-01-13 Thread Tomas Vondra
On 13.1.2012 18:07, Josh Berkus wrote: Eric's review follows: Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from 2012-01-12 git checkout. Patch applied fine. 'make check' results in failures when this patch is put into place. 6 of 128

Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2012-01-17 Thread Tomas Vondra
On 20.12.2011 19:59, Tomas Vondra wrote: On 20.12.2011 11:20, Magnus Hagander wrote: 2011/12/20 Tomas Vondra t...@fuzzy.cz: I haven't updated the docs yet - let's see if the patch is acceptable at all first. Again, without having reviewed the code, this looks like a feature we'd want, so

Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2012-01-21 Thread Tomas Vondra
On 20 Leden 2012, 13:23, Magnus Hagander wrote: On Tue, Jan 17, 2012 at 21:39, Tomas Vondra t...@fuzzy.cz wrote: On 20.12.2011 19:59, Tomas Vondra wrote: On 20.12.2011 11:20, Magnus Hagander wrote: 2011/12/20 Tomas Vondra t...@fuzzy.cz: I haven't updated the docs yet - let's see if the patch

Re: [HACKERS] Finer Extension dependencies

2012-01-21 Thread Tomas Vondra
On 21 Leden 2012, 18:20, Dimitri Fontaine wrote: Hi, Thank you for reviewing this patch! Hitoshi Harada umi.tan...@gmail.com writes: Next, some questions: - Why is the finer dependency needed? Do you have tangible example that struggles with the dependency granularity? I feel so good

Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2012-01-21 Thread Tomas Vondra
On 21 Leden 2012, 18:13, Magnus Hagander wrote: On Sat, Jan 21, 2012 at 18:02, Tomas Vondra t...@fuzzy.cz wrote: On 20 Leden 2012, 13:23, Magnus Hagander wrote: I'm wondering if this (and also my deadlocks stats patch that's int he queue) should instead of inventing new pgstats messages, add

Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2012-01-26 Thread Tomas Vondra
On 26 Leden 2012, 14:44, Magnus Hagander wrote: On Sat, Jan 21, 2012 at 20:12, Tomas Vondra t...@fuzzy.cz wrote: On 21 Leden 2012, 18:13, Magnus Hagander wrote: On Sat, Jan 21, 2012 at 18:02, Tomas Vondra t...@fuzzy.cz wrote: Though I just looked at the tabstat code again, and we already split

Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-03 Thread Tomas Vondra
On 3 Únor 2012, 17:12, Robert Haas wrote: On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra t...@fuzzy.cz wrote: Fixed. The default value of TIMING option did not work as intended, it was set to true even for plain EXPLAIN

Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-03 Thread Tomas Vondra
On 3.2.2012 18:28, Robert Haas wrote: On Fri, Feb 3, 2012 at 12:08 PM, Tomas Vondra t...@fuzzy.cz wrote: I don't think changing the EXPLAIN syntax this way is a good idea. I think that one option should not silently enable/disable others, so ROWS should not enable ANALYZE automatically. I

Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-02-03 Thread Tomas Vondra
On 3.2.2012 22:51, Robert Haas wrote: On Fri, Feb 3, 2012 at 2:56 PM, Tomas Vondra t...@fuzzy.cz wrote: OK, thanks for the explanation. I don't like the idea of subsets as it IMHO makes it less obvious what options are enabled. For example this EXPLAIN (ROWS) query... does not immediately

Re: [HACKERS] performance-test farm

2012-03-05 Thread Tomas Vondra
On 12.5.2011 08:54, Greg Smith wrote: Tomas Vondra wrote: The idea is that buildfarm systems that are known to have a) reasonable hardware and b) no other concurrent work going on could also do performance tests. The main benefit of this approach is it avoids duplicating all of the system

[HACKERS] patch for a locale-specific bug in regression tests (REL9_1_STABLE)

2012-03-06 Thread Tomas Vondra
Hi, I've noticed a locale-specific bug in regression tests, I discovered thanks to the new magpie buildfarm member (testing cs_CZ locale). The problem is in foreign_data where the output is sorted by a column, and cs_CZ behaves differently from C and en_US. More precisely, in C it's true that

Re: [HACKERS] patch for a locale-specific bug in regression tests (REL9_1_STABLE)

2012-03-07 Thread Tomas Vondra
On 7.3.2012 17:56, Robert Haas wrote: On Tue, Mar 6, 2012 at 1:59 PM, Tomas Vondra t...@fuzzy.cz wrote: I've noticed a locale-specific bug in regression tests, I discovered thanks to the new magpie buildfarm member (testing cs_CZ locale). The problem is in foreign_data where the output

Re: [HACKERS] patch for a locale-specific bug in regression tests (REL9_1_STABLE)

2012-03-07 Thread Tomas Vondra
On 7.3.2012 21:39, Robert Haas wrote: On Wed, Mar 7, 2012 at 3:08 PM, Tomas Vondra t...@fuzzy.cz wrote: On 7.3.2012 17:56, Robert Haas wrote: On Tue, Mar 6, 2012 at 1:59 PM, Tomas Vondra t...@fuzzy.cz wrote: I've noticed a locale-specific bug in regression tests, I discovered thanks

[HACKERS] proposal : cross-column stats

2010-12-11 Thread Tomas Vondra
Hi everyone, one of the ssesion I've attended on PgDay last week was Heikki's session about statistics in PostgreSQL. One of the issues he mentioned (and one I regularly run into) is the absence of cross-column stats. When the columns are not independent, this usually result in poor estimates

Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
Hi! Dne 12.12.2010 15:17, Martijn van Oosterhout napsal(a): Lets talk about one special case - I'll explain how the proposed solution works, and then I'll explain how to make it more general, what improvements are possible, what issues are there. Anyway this is by no means a perfect or

Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
Dne 12.12.2010 15:43, Heikki Linnakangas napsal(a): The classic failure case has always been: postcodes and city names. Strongly correlated, but in a way that the computer can't easily see. Yeah, and that's actually analogous to the example I used in my presentation. The way I think of

Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
Dne 12.12.2010 17:33, Florian Pflug napsal(a): On Dec12, 2010, at 15:43 , Heikki Linnakangas wrote: The way I think of that problem is that once you know the postcode, knowing the city name doesn't add any information. The postcode implies the city name. So the selectivity for postcode = ?

Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
Dne 13.12.2010 01:05, Robert Haas napsal(a): This is a good idea, but I guess the question is what you do next. If you know that the applicability is 100%, you can disregard the restriction clause on the implied column. And if it has no implicatory power, then you just do what we do now.

Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
P(A|B) = P(A and B) / P(B). Well, until this point we've discussed failure cases involving 'AND' conditions. What about 'OR' conditions? I think the current optimizer computes the selectivity as 's1+s2 - s1*s2' (at least that's what I found in backend/optimizer/path/clausesel.c:630). Sometimes

Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
Dne 13.12.2010 03:00, Robert Haas napsal(a): Well, the question is what data you are actually storing. It's appealing to store a measure of the extent to which a constraint on column X constrains column Y, because you'd only need to store O(ncolumns^2) values, which would be reasonably

Re: [HACKERS] proposal : cross-column stats

2010-12-13 Thread Tomas Vondra
Dne 13.12.2010 16:34, Tom Lane napsal(a): Tomas Vondra t...@fuzzy.cz writes: Well, until this point we've discussed failure cases involving 'AND' conditions. What about 'OR' conditions? I think the current optimizer computes the selectivity as 's1+s2 - s1*s2' (at least that's what I found

Re: [HACKERS] proposal : cross-column stats

2010-12-13 Thread Tomas Vondra
Dne 13.12.2010 16:38, Tom Lane napsal(a): The reason that this wasn't done years ago is precisely that nobody's figured out how to do it with a tolerable amount of stats data and a tolerable amount of processing time (both at ANALYZE time and during query planning). It's not hard to see what

Re: [HACKERS] proposal : cross-column stats

2010-12-13 Thread Tomas Vondra
Dne 13.12.2010 18:59, Joshua Tolley napsal(a): On Sun, Dec 12, 2010 at 07:10:44PM -0800, Nathan Boley wrote: Another quick note: I think that storing the full contingency table is wasteful since the marginals are already stored in the single column statistics. Look at copulas [2] ( FWIW I

Re: [HACKERS] proposal : cross-column stats

2010-12-13 Thread Tomas Vondra
Dne 13.12.2010 22:50, Josh Berkus napsal(a): Tomas, (a) find out what statistics do we need to collect and how to use it (b) implement a really stupid inefficient solution (c) optimize in iterations, i.e. making it faster, consuming less space etc. I'll suggest again how to

Re: [HACKERS] proposal : cross-column stats

2010-12-17 Thread Tomas Vondra
Dne 12.12.2010 15:43, Heikki Linnakangas napsal(a): On 12.12.2010 15:17, Martijn van Oosterhout wrote: On Sun, Dec 12, 2010 at 03:58:49AM +0100, Tomas Vondra wrote: Very cool that you're working on this. +1 Lets talk about one special case - I'll explain how the proposed solution works

Re: [HACKERS] proposal : cross-column stats

2010-12-17 Thread Tomas Vondra
Hi, I've read about 10 papers about estimation this week, and I have gained some insight. I'm not going to describe each of the papers here, I plan to set up a wiki page about cross column statistics http://wiki.postgresql.org/wiki/Cross_Columns_Stats and I'll put the list of papers and some

Re: [HACKERS] proposal : cross-column stats

2010-12-17 Thread Tomas Vondra
Dne 17.12.2010 19:58, Robert Haas napsal(a): I haven't read the paper yet (sorry) but just off the top of my head, one possible problem here is that our n_distinct estimates aren't always very accurate, especially for large tables. As we've discussed before, making them accurate requires

Re: [HACKERS] proposal : cross-column stats

2010-12-17 Thread Tomas Vondra
Dne 17.12.2010 22:24, Tom Lane napsal(a): That seems likely to be even more unreliable than our existing single-column estimates :-( regards, tom lane Well, yes :-( I guess this is a place where we could use a multi-column index, if it contains all the interesting

Re: [HACKERS] proposal : cross-column stats

2010-12-17 Thread Tomas Vondra
Dne 17.12.2010 22:41, Heikki Linnakangas napsal(a): On 17.12.2010 23:13, Tomas Vondra wrote: Dne 17.12.2010 19:58, Robert Haas napsal(a): I haven't read the paper yet (sorry) but just off the top of my head, one possible problem here is that our n_distinct estimates aren't always very

Re: [HACKERS] proposal : cross-column stats

2010-12-18 Thread Tomas Vondra
Dne 18.12.2010 16:59, Florian Pflug napsal(a): On Dec18, 2010, at 08:10 , t...@fuzzy.cz wrote: On Dec17, 2010, at 23:12 , Tomas Vondra wrote: Well, not really - I haven't done any experiments with it. For two columns selectivity equation is (dist(A) * sel(A) + dist(B) * sel(B)) / (2

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-18 Thread Tomas Vondra
Dne 12.12.2010 03:47, Robert Haas napsal(a): On Sat, Dec 11, 2010 at 4:40 PM, t...@fuzzy.cz wrote: Hello you have to respect pg coding style: a) not too long lines b) not C++ line comments OK, thanks for the notice. I've fixed those two problems. Please add this to the currently-open

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-19 Thread Tomas Vondra
Dne 19.12.2010 17:26, Tom Lane napsal(a): That seems like quite a bizarre definition. What I was envisioning was that we'd track only the time of the last whole-database stats reset. Well, but that does not quite work. I need is to know whether the snapshot is 'consistent' i.e. whether I can

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-19 Thread Tomas Vondra
Dne 19.12.2010 19:13, Jim Nasby napsal(a): Is there a reason this info needs to be tracked in the stats table? I know it's the most obvious place, but since we're worried about the size of it, what about tracking it in pg_class or somewhere else? I guess this is the best place for this kind of

Re: [HACKERS] proposal : cross-column stats

2010-12-19 Thread Tomas Vondra
Dne 19.12.2010 21:21, Simon Riggs napsal(a): On Mon, 2010-12-13 at 10:38 -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Sun, Dec 12, 2010 at 9:16 PM, Tomas Vondra t...@fuzzy.cz wrote: The proposed solution is based on contingency tables, built for selected groups

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-19 Thread Tomas Vondra
Dne 19.12.2010 20:28, Tom Lane napsal(a): Tomas Vondra t...@fuzzy.cz writes: Dne 19.12.2010 17:26, Tom Lane napsal(a): That seems like quite a bizarre definition. What I was envisioning was that we'd track only the time of the last whole-database stats reset. Well, but that does not quite

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-19 Thread Tomas Vondra
Dne 19.12.2010 23:58, Tom Lane napsal(a): Tomas Vondra t...@fuzzy.cz writes: Plus I won't have time to write the other patch for at least a week, so let's see whether there are serious objections agains the current patch. If you think this objection is not serious, you're mistaken. I know

Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread Tomas Vondra
Dne 21.12.2010 16:54, Florian Pflug napsal(a): Hmmm, maybe we could give this possibility (to identify two separate groups of columns) to the user. So instead of 'buid stats for (A,B,C)' the user would say 'build stats for (A,B) and (C)' - this actually represents apriori knowledge of

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-23 Thread Tomas Vondra
Dne 20.12.2010 00:03, Tom Lane napsal(a): I wrote: That is not the number of interest. The number of interest is that it's 8 bytes added onto a struct that currently contains 11 of 'em; in other words a 9% increase in the size of the stats file, and consequently about a 9% increase in the

Re: [HACKERS] proposal : cross-column stats

2010-12-23 Thread Tomas Vondra
Dne 21.12.2010 16:54, Florian Pflug napsal(a): I think that maybe it'd be acceptable to scan a large portion of the table to estimate dist(A,B), *if* we must only do so only once in a while. But even with a full scan, how would we arrive at an estimate for dist(A,B)? Keeping all values in

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-23 Thread Tomas Vondra
Dne 23.12.2010 20:09, Robert Haas napsal(a): 2010/12/23 Tomas Vondra t...@fuzzy.cz: Dne 20.12.2010 00:03, Tom Lane napsal(a): I wrote: That is not the number of interest. The number of interest is that it's 8 bytes added onto a struct that currently contains 11 of 'em; in other words a 9

Re: [HACKERS] proposal : cross-column stats

2010-12-23 Thread Tomas Vondra
Dne 21.12.2010 19:03, Tomas Vondra napsal(a): My plan for the near future (a few weeks) is to build a simple 'module' with the ability to estimate the number of rows for a given condition. This could actually be quite useful as a stand-alone contrib module, as the users often ask how to get

Re: [HACKERS] proposal : cross-column stats

2010-12-24 Thread Tomas Vondra
Dne 24.12.2010 13:15, t...@fuzzy.cz napsal(a): 2010/12/24 Florian Pflug f...@phlo.org: On Dec23, 2010, at 20:39 , Tomas Vondra wrote: I guess we could use the highest possible value (equal to the number of tuples) - according to wiki you need about 10 bits per element with 1% error

Re: [HACKERS] proposal : cross-column stats

2010-12-24 Thread Tomas Vondra
Dne 24.12.2010 13:37, Florian Pflug napsal(a): On Dec24, 2010, at 11:23 , Nicolas Barbier wrote: 2010/12/24 Florian Pflug f...@phlo.org: On Dec23, 2010, at 20:39 , Tomas Vondra wrote: I guess we could use the highest possible value (equal to the number of tuples) - according to wiki

Re: [HACKERS] proposal : cross-column stats

2010-12-24 Thread Tomas Vondra
Dne 24.12.2010 04:41, Florian Pflug napsal(a): The filter size could be derived from the table's statistics target, or be otherwise user-definable. We could also auto-resize once it gets too full. But still, that all seems awfully complex :-( Using a statistics target is a good idea I think. I

Re: [HACKERS] proposal : cross-column stats

2010-12-27 Thread Tomas Vondra
Dne 24.12.2010 13:37, Florian Pflug napsal(a): On Dec24, 2010, at 11:23 , Nicolas Barbier wrote: 2010/12/24 Florian Pflug f...@phlo.org: On Dec23, 2010, at 20:39 , Tomas Vondra wrote: I guess we could use the highest possible value (equal to the number of tuples) - according to wiki

[HACKERS] estimating # of distinct values

2010-12-27 Thread Tomas Vondra
Hi everyone, about two weeks ago I've started a thread about cross-column stats. One of the proposed solutions is based on number of distinct values, and the truth is the current distinct estimator has some problems. I've done some small research - I have read about 20 papers on this, and I'd

Re: [HACKERS] estimating # of distinct values

2010-12-27 Thread Tomas Vondra
Dne 27.12.2010 22:46, Robert Haas napsal(a): 2010/12/27 Tomas Vondra t...@fuzzy.cz: But even though these disadvantages, there really is no other way to enhance the estimates. I don't think this should be a default behavior - just as in case of cross-column stats this should

Re: [HACKERS] estimating # of distinct values

2010-12-27 Thread Tomas Vondra
Dne 28.12.2010 00:04, Kevin Grittner napsal(a): Tom Lane t...@sss.pgh.pa.us wrote: Well, first, those scans occur only once every few hundred million transactions, which is not likely a suitable timescale for maintaining statistics. I was assuming that the pass of the entire table was

Re: [HACKERS] estimating # of distinct values

2010-12-30 Thread Tomas Vondra
Dne 30.12.2010 15:43, Florian Pflug napsal(a): On Dec27, 2010, at 23:49 , Kevin Grittner wrote: Robert Haas robertmh...@gmail.com wrote: With respect to (b), I think I'd need to see a much more detailed design for how you intend to make this work. Off the top of my head there seems to be

Re: [HACKERS] estimating # of distinct values

2011-01-07 Thread Tomas Vondra
Dne 7.1.2011 20:56, Jim Nasby napsal(a): On Jan 7, 2011, at 5:32 AM, t...@fuzzy.cz wrote: Another thing I'm not sure about is where to store those intermediate stats (used to get the current estimate, updated incrementally). I was thinking about pg_stats but I'm not sure it's the right place -

Re: [HACKERS] estimating # of distinct values

2011-01-15 Thread Tomas Vondra
Hi, a short update regarding the ndistinct stream estimators - I've implemented the estimators described in the papers I've mentioned in my previous posts. If you want try it, the sources are available at github, at http://tvondra.github.com/pg_estimator (ignore the page, I have to update it,

Re: [HACKERS] estimating # of distinct values

2011-01-17 Thread Tomas Vondra
Dne 9.1.2011 13:58, Jim Nasby napsal(a): A resource fork? Not sure what you mean, could you describe it in more detail? Ooops, resource forks are a filesystem thing; we call them relation forks. From src/backend/storage/smgr/README: OK, I think using relation forks seems like a good

Re: [HACKERS] estimating # of distinct values

2011-01-18 Thread Tomas Vondra
Dne 18.1.2011 18:12, Robert Haas napsal(a): On Tue, Jan 18, 2011 at 4:53 AM, t...@fuzzy.cz wrote: So the most important question is how to intercept the new/updated rows, and where to store them. I think each backend should maintain it's own private list of new records and forward them only

Re: [HACKERS] estimating # of distinct values

2011-01-19 Thread Tomas Vondra
Dne 19.1.2011 20:25, Robert Haas napsal(a): On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote: Yes, I was aware of this problem (amount of memory consumed with lots of updates), and I kind of hoped someone will come up with a reasonable solution. As far as I can see

Re: [HACKERS] estimating # of distinct values

2011-01-19 Thread Tomas Vondra
Dne 19.1.2011 20:46, Tom Lane napsal(a): Robert Haas robertmh...@gmail.com writes: ... I guess I'm just saying I'd think really, really hard before abandoning the idea of periodic sampling. You have to get an awful lot of benefit out of those cross-column stats to make it worth paying a

Re: [HACKERS] estimating # of distinct values

2011-01-19 Thread Tomas Vondra
Dne 19.1.2011 23:44, Nathan Boley napsal(a): 1) The distribution of values in a table is rarely pathological, and usually follows one of several common patterns. ( IOW, we have good heuristics ) Not true. You're missing the goal of this effort - to get ndistinct estimate for combination of

Re: [HACKERS] estimating # of distinct values

2011-01-20 Thread Tomas Vondra
Dne 20.1.2011 03:06, Nathan Boley napsal(a): And actually it does not depend on ndistinct for the columns only, it depends on ndistinct estimates for the combination of columns. So improving the ndistinct estimates for columns is just a necessary first step (and only if it works reasonably

Re: [HACKERS] estimating # of distinct values

2011-01-20 Thread Tomas Vondra
Dne 20.1.2011 03:36, Robert Haas napsal(a): On Wed, Jan 19, 2011 at 5:13 PM, Tomas Vondra t...@fuzzy.cz wrote: Regarding the crash scenario - if the commit fails, just throw away the local estimator copy, it's not needed. I'm not sure how to take care of the case when commit succeeds

Re: [HACKERS] estimating # of distinct values

2011-01-20 Thread Tomas Vondra
Dne 20.1.2011 09:10, Heikki Linnakangas napsal(a): It seems that the suggested multi-column selectivity estimator would be more sensitive to ndistinct of the individual columns. Is that correct? How is it biased? If we routinely under-estimate ndistinct of individual columns, for example, does

Re: [HACKERS] estimating # of distinct values

2011-01-20 Thread Tomas Vondra
Dne 20.1.2011 11:05, Csaba Nagy napsal(a): Hi Tomas, On Wed, 2011-01-19 at 23:13 +0100, Tomas Vondra wrote: No, the multi-column statistics do not require constant updating. There are cases where a sampling is perfectly fine, although you may need a bit larger sample. Generally if you can

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2011-02-03 Thread Tomas Vondra
Dne 30.1.2011 23:22, Robert Haas napsal(a): On Thu, Dec 23, 2010 at 2:41 PM, Tomas Vondra t...@fuzzy.cz wrote: OK, so here goes the simplified patch - it tracks one reset timestamp for a background writer and for each database. I think you forgot the attachment. Yes, I did. Thanks

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2011-02-04 Thread Tomas Vondra
Dne 4.2.2011 03:37, Greg Smith napsal(a): Thinking I should start with why I think this patch is neat...most of the servers I deal with are up 24x7 minus small amounts of downtime, presuming everyone does their job right that is. In that environment, having a starting timestamp for when the

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2011-02-06 Thread Tomas Vondra
On 6.2.2011 08:17, Greg Smith wrote: Below is what changed since the last posted version, mainly as feedback for Tomas: -Explained more clearly that pg_stat_reset and pg_stat_reset_single_counters will both touch the database reset time, and that it's initialized upon first connection to

Re: [HACKERS] autovacuum stress-testing our system

2012-11-18 Thread Tomas Vondra
Hi! On 26.9.2012 19:18, Jeff Janes wrote: On Wed, Sep 26, 2012 at 9:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Excerpts from Euler Taveira's message of mié sep 26 11:53:27 -0300 2012: On 26-09-2012 09:43, Tomas Vondra wrote: 5) splitting

Re: [HACKERS] autovacuum stress-testing our system

2012-11-18 Thread Tomas Vondra
On 26.9.2012 18:29, Tom Lane wrote: What seems to me like it could help more is fixing things so that the autovac launcher needn't even launch a child process for databases that haven't had any updates lately. I'm not sure how to do that, but it probably involves getting the stats collector

Re: [HACKERS] pg_trgm partial-match

2012-11-18 Thread Tomas Vondra
a seq scan (as expected). Do you expect to update the docs too? IMHO it's worth mentioning that the pg_trgm can handle even patterns shorter than 2 chars ... regards Tomas Vondra -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http

Re: [HACKERS] too much pgbench init output

2012-11-19 Thread Tomas Vondra
On 19.11.2012 11:59, Jeevan Chalke wrote: Hi, I gone through the discussion for this patch and here is my review: The main aim of this patch is to reduce the number of log lines. It is also suggested to use an options to provide the interval but few of us are not much agree on it. So

Re: [HACKERS] WIP: index support for regexp search

2012-11-19 Thread Tomas Vondra
On 19.11.2012 22:58, Alexander Korotkov wrote: Hi! New version of patch is attached. Changes are following: 1) Right way to convert from pg_wchar to multibyte. 2) Optimization of producing CFNA-like graph on trigrams (produce smaller, but equivalent, graphs in less time). 3) Comments and

Re: [HACKERS] WIP: store additional info in GIN index

2012-12-01 Thread Tomas Vondra
On 18.11.2012 22:54, Alexander Korotkov wrote: Hackers, Patch completely changes storage in posting lists and leaf pages of posting trees. It uses varbyte encoding for BlockNumber and OffsetNumber. BlockNumber are stored incremental in page. Additionally one bit of OffsetNumber is reserved

Re: [HACKERS] autovacuum stress-testing our system

2012-12-02 Thread Tomas Vondra
On 21.11.2012 19:02, Robert Haas wrote: On Sun, Nov 18, 2012 at 5:49 PM, Tomas Vondra t...@fuzzy.cz wrote: The two main changes are these: (1) The stats file is split into a common db file, containing all the DB Entries, and per-database files with tables/functions. The common file

Re: [HACKERS] WIP: store additional info in GIN index

2012-12-04 Thread Tomas Vondra
On 4.12.2012 20:12, Alexander Korotkov wrote: Hi! On Sun, Dec 2, 2012 at 5:02 AM, Tomas Vondra t...@fuzzy.cz mailto:t...@fuzzy.cz wrote: I've tried to apply the patch with the current HEAD, but I'm getting segfaults whenever VACUUM runs (either called directly or from autovac

Re: [HACKERS] WIP: store additional info in GIN index

2012-12-05 Thread Tomas Vondra
On 5.12.2012 09:10, Alexander Korotkov wrote: On Wed, Dec 5, 2012 at 1:56 AM, Tomas Vondra t...@fuzzy.cz mailto:t...@fuzzy.cz wrote: Thanks for bug report. It is fixed in the attached patch. Hi, I gave it another try and this time it went fine - I didn't get any segfault when loading

Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-06 Thread Tomas Vondra
On 6.12.2012 05:47, Shigeru Hanada wrote: On Mon, Nov 12, 2012 at 4:36 AM, Tomas Vondra t...@fuzzy.cz wrote: Hi, I've prepared a slightly updated patch, based on the previous review. See it attached. All changes in v3 patch seem good, however I found some places which requires cosmetic

Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread Tomas Vondra
Hi, On 6.12.2012 23:45, MauMau wrote: From: Tom Lane t...@sss.pgh.pa.us Well, that's unfortunate, but it's not clear that automatic recovery is possible. The only way out of it would be if an undamaged copy of the segment was in pg_xlog/ ... but if I recall the logic correctly, we'd not

Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-08 Thread Tomas Vondra
On 8.12.2012 15:26, Andres Freund wrote: On 2012-12-06 23:38:59 +0100, Tomas Vondra wrote: I've re-run the tests with the current patch on my home workstation, and the results are these (again 10k tables, dropped either one-by-one or in batches of 100). 1) unpatched dropping one-by-one

Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-08 Thread Tomas Vondra
On 8.12.2012 15:49, Tomas Vondra wrote: On 8.12.2012 15:26, Andres Freund wrote: On 2012-12-06 23:38:59 +0100, Tomas Vondra wrote: I've re-run the tests with the current patch on my home workstation, and the results are these (again 10k tables, dropped either one-by-one or in batches of 100

Re: [HACKERS] review: pgbench - aggregation of info written into log

2012-12-08 Thread Tomas Vondra
On 8.12.2012 16:33, Andres Freund wrote: Hi Tomas, On 2012-11-27 14:55:59 +0100, Pavel Stehule wrote: attached is a v4 of the patch. There are not many changes, mostly some simple tidying up, except for handling the Windows. After a quick look I am not sure what all the talk about windows

Re: [HACKERS] review: pgbench - aggregation of info written into log

2012-12-08 Thread Tomas Vondra
Hi, attached is a v5 of this patch. Details below: On 8.12.2012 16:33, Andres Freund wrote: Hi Tomas, On 2012-11-27 14:55:59 +0100, Pavel Stehule wrote: attached is a v4 of the patch. There are not many changes, mostly some simple tidying up, except for handling the Windows. After a

Re: [HACKERS] too much pgbench init output

2012-12-08 Thread Tomas Vondra
On 20.11.2012 08:22, Jeevan Chalke wrote: Hi, On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra t...@fuzzy.cz mailto:t...@fuzzy.cz wrote: On 19.11.2012 11:59, Jeevan Chalke wrote: Hi, I gone through the discussion for this patch and here is my review

Re: [HACKERS] New statistics for WAL buffer dirty writes

2012-12-09 Thread Tomas Vondra
On 29.10.2012 04:58, Satoshi Nagayasu wrote: 2012/10/24 1:12, Alvaro Herrera wrote: Satoshi Nagayasu escribi�: With this patch, walwriter process and each backend process would sum up dirty writes, and send it to the stat collector. So, the value could be saved in the stat file, and could be

Re: [HACKERS] CommitFest #3 and upcoming schedule

2012-12-09 Thread Tomas Vondra
On 9.12.2012 16:56, Simon Riggs wrote: On 16 November 2012 07:20, Greg Smith g...@2ndquadrant.com wrote: Project guidelines now ask each patch submitter to review patches of the same number and approximate complexity as they submit. If you've submitted some items to the CommitFest, please

Re: [HACKERS] CommitFest #3 and upcoming schedule

2012-12-09 Thread Tomas Vondra
On 9.12.2012 22:41, Jeff Janes wrote: On Sun, Dec 9, 2012 at 10:47 AM, Tomas Vondra t...@fuzzy.cz wrote: IMHO many of the patches that are currently marked as needs review and have no reviewers, were actually reviewed or are being discussed thoroughly on the list, but this information

Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-10 Thread Tomas Vondra
Dne 10.12.2012 16:38, Andres Freund napsal: On 2012-12-08 17:07:38 +0100, Tomas Vondra wrote: I've done some test and yes - once there are other objects the optimization falls short. For example for tables with one index, it looks like this: 1) unpatched one by one: 28.9 s 100 batches

Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-16 Thread Tomas Vondra
On 8.12.2012 03:08, Jeff Janes wrote: On Thu, Dec 6, 2012 at 3:52 PM, Tomas Vondra t...@fuzzy.cz wrote: Hi, On 6.12.2012 23:45, MauMau wrote: From: Tom Lane t...@sss.pgh.pa.us Well, that's unfortunate, but it's not clear that automatic recovery is possible. The only way out of it would

Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-16 Thread Tomas Vondra
Hi, I've updated the patch to include the optimization described in the previous post, i.e. if the number of relations is below a certain threshold, use a simple for loop, for large numbers of relations use bsearch calls. This is done by a new constant BSEARCH_LIMIT, which is set to 10 in the

Re: [HACKERS] too much pgbench init output

2012-12-16 Thread Tomas Vondra
interval I'm still not convinced there should be yet another know for tuning the log interval - opinions? Tomas On 11.12.2012 10:23, Jeevan Chalke wrote: On Sun, Dec 9, 2012 at 8:11 AM, Tomas Vondra t...@fuzzy.cz mailto:t...@fuzzy.cz wrote: On 20.11.2012 08:22, Jeevan Chalke wrote

Re: [HACKERS] system administration functions with hardcoded superuser checks

2012-12-18 Thread Tomas Vondra
On 18.12.2012 18:38, Pavel Stehule wrote: 2012/12/18 Peter Eisentraut pete...@gmx.net: There are some system administration functions that have hardcoded superuser checks, specifically: pg_reload_conf pg_rotate_logfile Some of these are useful in monitoring or maintenance tools, and the

Re: [HACKERS] too much pgbench init output

2012-12-19 Thread Tomas Vondra
On 19.12.2012 06:30, Jeevan Chalke wrote: On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra t...@fuzzy.cz mailto:t...@fuzzy.cz wrote: Hi, attached is a new version of the patch that (a) converts the 'log_step_seconds' variable to a constant (and does not allow

Re: [HACKERS] system administration functions with hardcoded superuser checks

2012-12-19 Thread Tomas Vondra
On 19.12.2012 07:34, Magnus Hagander wrote: On Wed, Dec 19, 2012 at 1:58 AM, Tomas Vondra t...@fuzzy.cz wrote: On 18.12.2012 18:38, Pavel Stehule wrote: 2012/12/18 Peter Eisentraut pete...@gmx.net: There are some system administration functions that have hardcoded superuser checks

[HACKERS] strange OOM errors with EXECUTE in PL/pgSQL

2012-12-19 Thread Tomas Vondra
Hi, one of our local users reported he's getting OOM errors on 9.2, although on 9.1 the code worked fine. Attached is a simple test-case that should give you an OOM error almost immediately. What it does: 1) creates a simple table called test with one text column. 2) creates a plpgsql function

Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-19 Thread Tomas Vondra
On 19.12.2012 02:18, Andres Freund wrote: On 2012-12-17 00:31:00 +0100, Tomas Vondra wrote: I think except of the temp buffer issue mentioned below its ready. -DropRelFileNodeAllBuffers(RelFileNodeBackend rnode) +DropRelFileNodeAllBuffers(RelFileNodeBackend * rnodes, int nnodes

Re: [HACKERS] strange OOM errors with EXECUTE in PL/pgSQL

2012-12-19 Thread Tomas Vondra
On 20.12.2012 02:29, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: What it does: 1) creates a simple table called test with one text column. 2) creates a plpgsql function with one parameter, and all that function does is passing the parameter to EXECUTE 3) calls the function

  1   2   3   4   5   6   7   8   9   10   >