Re: [HACKERS] In-Memory Columnar Store

2013-12-11 Thread Pavel Stehule
It is an interesting idea. For me, the significant information is the comparison
itself: it suggests we are doing something significantly wrong. A memory engine
should naturally be faster, but I don't think it can be 1000x faster.

Yesterday we did some tests which showed that our hashing is not effective for
large tables (5GB). Disabling hash join and using merge join instead
increased speed 2x.
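
For anyone who wants to repeat that kind of comparison, a minimal sketch; the
table and join below are hypothetical placeholders, not our actual test:

-- Compare planner choices on a large join; big_fact/big_dim are illustrative names.
EXPLAIN ANALYZE
SELECT count(*) FROM big_fact f JOIN big_dim d ON d.id = f.dim_id;

SET enable_hashjoin = off;   -- push the planner towards a merge join
EXPLAIN ANALYZE
SELECT count(*) FROM big_fact f JOIN big_dim d ON d.id = f.dim_id;

RESET enable_hashjoin;
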
On 9. 12. 2013 20:41 "knizhnik"  wrote:
>
> Hello!
>
> I want to announce my implementation of an In-Memory Columnar Store extension
> for PostgreSQL:
>
>  Documentation: http://www.garret.ru/imcs/user_guide.html
>  Sources: http://www.garret.ru/imcs-1.01.tar.gz
>
> Any feedback, bug reports and suggestions are welcome.
>
> The vertical representation of data is stored in PostgreSQL shared memory.
> This is why it is important to be able to utilize all available physical
> memory.
> Nowadays servers with a TB or more of RAM are not something exotic, especially
> in the financial world.
> But there is a limitation in Linux with standard 4kb pages on the maximal
> size of a mapped memory segment: 256Gb.
> It is possible to overcome this limitation either by creating multiple
> segments - but that requires too many changes in the PostgreSQL memory manager -
> or by just setting the MAP_HUGETLB flag (assuming that huge pages were
> allocated in the system).
>
> I found several messages related to the MAP_HUGETLB flag; the most recent
> one was from 21 November:
> http://www.postgresql.org/message-id/20131125032920.ga23...@toroid.org
>
> I wonder what is the current status of this patch?
>
>
>
>
>
>


Re: [HACKERS] plpgsql_check_function - rebase for 9.3

2013-12-11 Thread Pavel Stehule
I dislike it. A too-early check means issues with temporary tables and
many new dependencies between functions in complex projects. It is something
we don't want.
On 12. 12. 2013 5:30 "Amit Kapila"  wrote:

> On Wed, Dec 11, 2013 at 10:40 AM, Tom Lane  wrote:
> > Amit Kapila  writes:
> >> On Tue, Dec 10, 2013 at 12:15 PM, Pavel Stehule <
> pavel.steh...@gmail.com> wrote:
> >>> Now, PG has no any tool for checking dependency between functions and
> other
> >>> objects.
> >
> >> Isn't that already done for SQL function's (fmgr_sql_validator)?
> >
> > Pavel's point is that the only way to find out if the validator will fail
> > is to run it and see if it fails; and even if it does, how much will you
> > know about why?
>
> One of the important things users are interested in knowing at the time of
> function creation is whether any objects (table/view/sequence ...) used in the
> function body are missing; the reason, I think, is that they don't want such
> things to come up during execution.
>
> A similar thing happens for prepared statements in PostgreSQL: already at
> parse-message time both the syntax check and the semantic check are done (the
> latter ensures the statement is meaningful, e.g. that the objects and columns
> used in the statement exist).
>
> Just as we do checks beyond the syntax check when a prepared statement is
> created, the same should be considered meaningful at function creation time.
>
> As you mentioned, there are checks (like dependency, mutual recursion) which
> are difficult or not feasible to perform in the current design, but that will
> equally be the case at the first execution of the function. So is it not better
> to do what is feasible during function creation rather than leaving most of the
> checking to the execution phase?
>
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: [HACKERS] pgbench with large scale factor

2013-12-11 Thread Tatsuo Ishii
Fabien,

>> Included is a proposed fix for this (also fixing the weird "remaining"
>> part). If there's no objection, I will commit it.
> 
> Looks ok, but I would consider switching to "double" instead of
> "int64".

Assuming you are talking about the "remaining sec" part, I agree. Here is
the revised patch.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 2c96fae..00374d8 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -1720,11 +1720,11 @@ init(bool is_no_vacuum)
 			INSTR_TIME_SUBTRACT(diff, start);
 
 			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
-			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+			remaining_sec = ((double)scale * naccounts - j) * elapsed_sec / j;
 
 			fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
 	j, (int64) naccounts * scale,
-	(int) (((int64) j * 100) / (naccounts * scale)),
+	(int) (((int64) j * 100) / (naccounts * (int64)scale)),
 	elapsed_sec, remaining_sec);
 		}
 		/* let's not call the timing for each row, but only each 100 rows */
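
For what it's worth, the negative percentage and the bogus remaining time both
come from 32-bit overflow: scale and naccounts are plain ints, and with
naccounts = 100000 their product no longer fits in an int32 once the scale
factor passes roughly 21474. A quick illustration of the arithmetic in SQL,
where the overflow raises an error instead of silently wrapping as it does in C:

-- int4 * int4 stays int4; 22000 * 100000 = 2.2e9 exceeds 2^31 - 1.
SELECT 22000 * 100000;            -- ERROR:  integer out of range
-- Widening one operand first, as the casts in the patch do, avoids it.
SELECT 22000::bigint * 100000;    -- 2200000000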



Re: [HACKERS] [GENERAL] Case sensitivity

2013-12-11 Thread Dev Kumkar
+ hackers


On Thu, Dec 12, 2013 at 12:34 PM, Dev Kumkar wrote:

> On Wed, Dec 11, 2013 at 9:47 PM, Dev Kumkar wrote:
>
>> Actually, for searches lower() will work.
>> But the other important aspect is inserts, which would result in 2 rows if
>> the values are 'A' and 'a'. The intent here is to have it case insensitive.
>>
>> With CITEXT it will update the same row, and that works.
>> CITEXT is an alternative, but I was wondering if there is any other
>> solution/setting available while creating the database.
>>
>> Also, does fetching CITEXT via JDBC work the same way as fetching/setting string
>> values? Any quick comments here.
>>
>> http://techie-experience.blogspot.in/2013/04/hibernate-supporting-case-insensitive.html
>>
>> Regards...
>>
>
> Also, if the key columns are CITEXT, are there any performance issues with
> indexes?
>
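
For reference, a minimal sketch of the two approaches mentioned above (table and
column names are illustrative only):

-- citext: comparisons and unique indexes become case-insensitive.
CREATE EXTENSION IF NOT EXISTS citext;
CREATE TABLE accounts (login citext PRIMARY KEY, note text);
INSERT INTO accounts VALUES ('A', 'first');
INSERT INTO accounts VALUES ('a', 'second');   -- fails: duplicate key
SELECT * FROM accounts WHERE login = 'a';      -- finds the 'A' row

-- lower(): works for searches, but plain text columns still allow 'A' and 'a'
-- unless the unique index is defined on the lower-cased expression.
CREATE TABLE accounts2 (login text, note text);
CREATE UNIQUE INDEX ON accounts2 (lower(login));
SELECT * FROM accounts2 WHERE lower(login) = lower('A');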


Re: [HACKERS] pgbench with large scale factor

2013-12-11 Thread Fabien COELHO


Hello Tatsuo,


BTW, I saw this with 9.3.2's pgbench:

23930 of 38 tuples (-48%) done (elapsed 226.86 s, remaining -696.10 
s).

-48% does not seem to be quite correct to me...


Included is a proposed fix for this (also fixing the weird "remaining"
part). If there's no objection, I will commit it.


Looks ok, but I would consider switching to "double" instead of "int64".

--
Fabien.




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-12-11 Thread Haribabu kommi
On 06 December 2013 11:57 Amit Kapila wrote:
> On Fri, Nov 29, 2013 at 6:55 PM, Haribabu kommi
>  wrote:
> > On 29 November 2013 12:00 Amit Kapila wrote:
> >> On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
> >>  wrote:
> >> Few questions about your latest patch:
> >> a. Is there any reason why you are doing estimation of dead tuples
> >> only for Autovacuum and not for Vacuum.
> >
> > No, changed.
> >
> >> /* clear and get the new stats for calculating proper dead tuples */
> >> pgstat_clear_snapshot(); tabentry =
> >> pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
> >> b. In the above code, to get latest data you are first clearing
> >> snapshot and then calling pgstat function. It will in turn perform
> I/O
> >> (read of stats file) and send/receive message from stats
> >> collector to ensure it can read latest data. I think it will add
> overhead
> >> to Vacuum, especially if 'nkeep' calculated in function
> >> lazy_scan_heap() can serve the purpose. In my simple test[1], I
> >> observed
> >> that value of keep can serve the purpose.
> >>
> >> Can you please once try the test on 'nkeep' approach patch.
> >
> > Using the nkeep and snapshot approach, I ran the test for 40 mins
> with
> > a high analyze_threshold and results are below.
> >
> > Auto vacuum count   Bloat size
> > Master   11  220MB
> > Patched_nkeep  14  215MB
> > Patched_snapshot   18  198MB
> >
> > Both the approaches are showing good improvement in the test.
> > Updated patches, test script and configuration is attached in the
> mail.
> 
> I think updating dead tuple count using nkeep is good idea as similar
> thing is done for Analyze as well in acquire_sample_rows().
> One minor point: I think it is better to log dead tuples in the below error
> message:
> ereport(LOG,
> (errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
> "pages: %d removed, %d remain\n"
> "tuples: %.0f removed, %.0f remain\n"
> 
> "tuples: %.0f removed, %.0f remain, %.0f dead\n"
> 
> 
> About your test: how do you collect the data by running this script? Are
> you manually stopping it after 40 mins? I ran it for more than
> an hour and the final result didn't come.
> As I mentioned to you last time, please simplify your test; for another
> person it is difficult, in its current form, to make meaning out of it.
> Write comments on top of it in step form to explain what exactly it is
> doing and how to take data using it (for example, do I need to wait
> till the script ends; how long can this test take to complete).

A simplified test and an updated patch that take care of the above comments are
attached to the mail.
I was not able to reduce the test duration, but I changed the test so that it
automatically exits after a 45-minute run.
Please check the vacuum_test.sh file for more details on running the test.

                Auto vacuum count   Bloat size
Master          15                  220MB
Patched_nkeep   18                  213MB
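
For anyone rerunning the test, the same two numbers can be watched from psql
using the standard statistics views; the table name below is a placeholder for
whatever the test script creates:

-- Dead-tuple estimate, autovacuum count and on-disk size for the test table.
SELECT relname,
       n_dead_tup,
       autovacuum_count,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
  FROM pg_stat_user_tables
 WHERE relname = 'test_table';   -- illustrative name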

Please let me know your suggestions.

Regards,
Hari babu.



vacuum_fix_v7_nkeep.patch
Description: vacuum_fix_v7_nkeep.patch


test_script.tar.gz
Description: test_script.tar.gz



Re: [HACKERS] In-Memory Columnar Store

2013-12-11 Thread knizhnik

Thank you very much for reporting the problem.
And sorry for this bug and the lack of negative tests.

An attempt to access a non-existent value causes autoloading of data from the
table into the columnar store (because the autoload property is enabled by default),
and since this entry is not present in the table, the code falls
into infinite recursion.
A patched version of IMCS is available at
http://www.garret.ru/imcs-1.01.tar.gz


I am going to place IMCS under version control now. Just looking for a
proper place for the repository...



On 12/12/2013 04:06 AM, desmodemone wrote:




2013/12/9 knizhnik 

Hello!

I want to announce my implementation of an In-Memory Columnar Store
extension for PostgreSQL:

 Documentation: http://www.garret.ru/imcs/user_guide.html
 Sources: http://www.garret.ru/imcs-1.01.tar.gz

Any feedback, bug reports and suggestions are welcome.

The vertical representation of data is stored in PostgreSQL shared memory.
This is why it is important to be able to utilize all available
physical memory.
Nowadays servers with a TB or more of RAM are not something exotic,
especially in the financial world.
But there is a limitation in Linux with standard 4kb pages on the
maximal size of a mapped memory segment: 256Gb.
It is possible to overcome this limitation either by creating
multiple segments - but that requires too many changes in the PostgreSQL
memory manager - or by just setting the MAP_HUGETLB flag (assuming that
huge pages were allocated in the system).

I found several messages related to the MAP_HUGETLB flag; the most
recent one was from 21 November:
http://www.postgresql.org/message-id/20131125032920.ga23...@toroid.org

I wonder what is the current status of this patch?









Hello,
   excellent work! I began testing and it's very fast;
by the way, I found a strange case of an "endless" query with CPU at 100%
when the value used as a filter does not exist:


I am testing with Postgres 9.3.1 on Debian, and I used default values
for the extension except memory (512MB).


how to recreate the test case :

## create a table :

create table endless ( col1 int , col2 char(30) , col3 int ) ;

## insert some values:

insert into endless values ( 1, 'ahahahaha', 3);

insert into endless values ( 2, 'ghghghghg', 4);

## create the column store objects:

select cs_create('endless','col1','col2');
 cs_create
---

(1 row)

## try and test column store :

select cs_avg(col3) from  endless_get('ahahahaha');
 cs_avg

  3
(1 row)

select cs_avg(col3) from  endless_get('ghghghghg');
 cs_avg

  4
(1 row)

## now select with a value that does not exist :

select cs_avg(col3) from  endless_get('testing');

# and now it starts to loop on the CPU and seems to never end; I had to
terminate the backend


Bye

Mat




Re: [HACKERS] pgbench with large scale factor

2013-12-11 Thread Tatsuo Ishii
> BTW, I saw this with 9.3.2's pgbench:
> 
> 23930 of 38 tuples (-48%) done (elapsed 226.86 s, remaining 
> -696.10 s).
> 
> -48% does not seem to be quite correct to me...

Included is a proposed fix for this (also fixing the weird "remaining"
part). If there's no objection, I will commit it.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 2c96fae..28ab52f 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -1720,11 +1720,11 @@ init(bool is_no_vacuum)
 			INSTR_TIME_SUBTRACT(diff, start);
 
 			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
-			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+			remaining_sec = ((int64)scale * naccounts - j) * elapsed_sec / j;
 
 			fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
 	j, (int64) naccounts * scale,
-	(int) (((int64) j * 100) / (naccounts * scale)),
+	(int) (((int64) j * 100) / (naccounts * (int64)scale)),
 	elapsed_sec, remaining_sec);
 		}
 		/* let's not call the timing for each row, but only each 100 rows */



Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2013-12-11 Thread Amit Kapila
On Thu, Dec 12, 2013 at 3:43 AM, Peter Eisentraut  wrote:
> This patch fails the regression tests; see attachment.

   Thanks for reporting the diffs. The reason for the failures is that
decoding for the tuple is still not done, as mentioned in the Notes section of the
mail
   (http://www.postgresql.org/message-id/caa4ek1jeuby16uwrda2tabkk+rlrl3giyyqy1mvh_6cthmd...@mail.gmail.com)

   However, to keep the patch sane, I will do that and post an
updated patch. But I think the main point of the new approach at this
   stage is to get feedback on whether such an optimization is acceptable
for worst-case scenarios, and if not, whether we can get this done
   with a table-level or GUC option.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2013-12-11 Thread Amit Kapila
On Wed, Dec 11, 2013 at 8:31 PM, MauMau  wrote:
>>> From: "Amit Kapila" 
>
> I re-considered that.  As you suggested, I think I'll do as follows.  Would
> this be OK?
>
> [pg_ctl.c]
> evtHandle = RegisterEventSource(NULL, *event_source ? event_source :
>"PostgreSQL " PG_MAJORVERSION);
>
>
> [guc.c]
> {"event_source", PGC_POSTMASTER, LOGGING_WHERE,
> ...
> "PostgreSQL " PG_MAJORVERSION,
> NULL, NULL, NULL

   This is similar to what I had in mind.

>
> [elog.c]
> Writing the default value in this file was redundant, because event_source
> cannot be NULL.  So change
>
>
> evtHandle = RegisterEventSource(NULL, event_source ? event_source :
> "PostgreSQL");
>
> to
>
> evtHandle = RegisterEventSource(NULL, event_source);

I think this change might not be safe, as write_eventlog() gets called
from write_stderr(), which might get called
before postgresql.conf is read; so the existing code in PG might be there
to handle that case or some other similar
situation where event_source is still not set. Have you confirmed
that it is never the case that
event_source is not set when control reaches this function?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Extra functionality to createuser

2013-12-11 Thread Amit Kapila
On Wed, Dec 11, 2013 at 6:23 PM, Robert Haas  wrote:
> On Tue, Dec 10, 2013 at 9:55 AM, Amit Kapila  wrote:
>> Okay, the new way for syntax suggested by Peter has simplified the problem.
>> Please find the updated patch and docs for multiple -g options.
>
> Committed.

Thank you.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] plpgsql_check_function - rebase for 9.3

2013-12-11 Thread Amit Kapila
On Wed, Dec 11, 2013 at 10:40 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Tue, Dec 10, 2013 at 12:15 PM, Pavel Stehule  
>> wrote:
>>> Now, PG has no any tool for checking dependency between functions and other
>>> objects.
>
>> Isn't that already done for SQL function's (fmgr_sql_validator)?
>
> Pavel's point is that the only way to find out if the validator will fail
> is to run it and see if it fails; and even if it does, how much will you
> know about why?

   One of the important things users are interested in knowing at the time of
function creation is whether any objects (table/view/sequence ...) used in the
function body are missing; the reason, I think, is that they don't want such
things to come up during execution.

   A similar thing happens for prepared statements in PostgreSQL: already at
parse-message time both the syntax check and the semantic check are done (the
latter ensures the statement is meaningful, e.g. that the objects and columns
used in the statement exist).

   Just as we do checks beyond the syntax check when a prepared statement is
created, the same should be considered meaningful at function creation time.

   As you mentioned, there are checks (like dependency, mutual recursion) which
are difficult or not feasible to perform in the current design, but that will
equally be the case at the first execution of the function. So is it not better
to do what is feasible during function creation rather than leaving most of the
checking to the execution phase?
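
   To make the contrast concrete, here is a minimal sketch (the object names
are made up purely for illustration): a prepared statement is parse-analyzed
immediately, while a PL/pgSQL function body is only syntax-checked at creation,
so a missing table only surfaces at execution.

-- A prepared statement is parse-analyzed right away, so a missing table
-- is reported immediately:
PREPARE p AS SELECT * FROM missing_table;
-- ERROR:  relation "missing_table" does not exist

-- A PL/pgSQL body is only syntax-checked at creation time; the missing
-- table is not noticed until the function is executed:
CREATE FUNCTION f() RETURNS bigint LANGUAGE plpgsql AS $$
BEGIN
  RETURN (SELECT count(*) FROM missing_table);
END;
$$;            -- succeeds
SELECT f();    -- ERROR:  relation "missing_table" does not exist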


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pgbench with large scale factor

2013-12-11 Thread Tatsuo Ishii
>> I noticed that "pgbench -s scale_factor" where scale_factor is larger
>> than 20,000 (SCALE_32BIT_THRESHOLD) creates pgbench_accounts table containing
>> 0 rows without any complaint. Is there any reason for this?
> 
> Oops. It turned out that this was a bug in pgbench prior to 9.3. Sorry for the noise.

BTW, I saw this with 9.3.2's pgbench:

23930 of 38 tuples (-48%) done (elapsed 226.86 s, remaining -696.10 
s).

-48% does not seem to be quite correct to me...

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] Changeset Extraction Interfaces

2013-12-11 Thread Robert Haas
On Wed, Dec 4, 2013 at 7:15 PM, Andres Freund  wrote:
> There's basically three major 'verbs' that can be performed on a
> stream, currently named (walsender names):
> * INIT_LOGICAL_REPLICATION "name" "output_plugin"
> * START_LOGICAL_REPLICATION "name" last_received ("option_name" value,...)
> * FREE_LOGICAL_REPLICATION "name"
>
> The SQL variant currently has:
> * init_logical_replication(name, plugin)
> * start_logical_replication(name, stream_upto, options[])
> * stop_logical_replication(name)
>
> You might have noticed the slight inconsistency...

I think this naming is probably not the greatest.  When I hear "init",
I don't think "permanently allocate a resource that will never be
released unless I explicitly throw it away", and when I hear "stop", I
don't think "free that resource".  I suggest naming based around
create/drop, register/unregister, or acquire/release.  Since, as
previously noted, I'm gunning for these slots to apply to ordinary
replication as well, I kind of like ACQUIRE_REPLICATION_SLOT and
RELEASE_REPLICATION_SLOT.  If we're going to make them specific to
logical replication, then your suggestion of CREATE_DECODING_SLOT and
DROP_DECODING_SLOT, or maybe ACQUIRE_DECODING_SLOT and
RELEASE_DECODING_SLOT, sounds fine.

It also strikes me that just as it's possible to stream WAL without
allocating a slot first (since we don't at present have slots),
perhaps it ought also to be possible to stream logical replication
data without acquiring a slot first.  You could argue that it was a
mistake not to introduce slots in the first place, but the stateless
nature of WAL streaming definitely has some benefits, and it's unclear
to me why you shouldn't be able to do the same thing with logical
decoding.

> b) Decide which of the SQL functions should be in a contrib module, and
>which in core. Currently init_logical_replication() and
>stop_logical_replication() are in core, whereas
>start_logical_replication() is in the 'test_logical_decoding'
>extension. The reasoning behind that is that init/stop ones are
>important to the DBA and the start_logical_replication() SRF isn't
>all that useful in the real world because our SRFs don't support
>streaming changes out.

Seems pretty arbitrary to me.  If start_logical_replication() is
usable with any output plugin, then it ought to be in core.  I think
the name isn't great, though; the actual functionality seems to be
more or less decode-from-last-position-up-to-present, which doesn't
sound much like "start".
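
For concreteness, a sketch of how the SQL variant quoted above might be
exercised; the function names and argument lists are the ones listed in the
quoted message, while the slot name, plugin name, the 'now' position and the
empty options array are placeholders rather than a settled API:

-- Allocate a slot tied to an output plugin (slot and plugin names are placeholders).
SELECT * FROM init_logical_replication('my_slot', 'my_output_plugin');

-- ... some write workload happens here ...

-- Decode from the last confirmed position up to the given point;
-- 'now' and the empty options array are placeholders for stream_upto/options[].
SELECT * FROM start_logical_replication('my_slot', 'now', ARRAY[]::text[]);

-- Release the slot when it is no longer needed.
SELECT stop_logical_replication('my_slot');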

> c) Which data-types does start_logical_replication() return. Currently
> it's OUT location text, OUT xid bigint, OUT data text. Making the 'data'
> column text has some obvious disadvantages though - there's obvious
> usecases for output plugins that return binary data. But making it bytea
> sucks, because the output is harder to read by default...

I think having two functions might be sensible.  I'm not sure what
happens if the text function is used and the plugin outputs something
that's not valid in the database encoding, though.  I guess you'd
better check for that and error out.

> d) is my main question, and Robert, Peter G. and I previously argued
> about it a fair bit. I know of the following alternatives:
>
> I) The output plugin that's specified in INIT_LOGICAL_REPLICATION is
> actually a library name, and we simply lookup the fixed symbol names in
> it. That's what currently implemented.
> The advantage is that it's pretty easy to implement, works on a HS
> standby without involving the primary, and doesn't have a problem if the
> library is used in shared_preload_library.
> The disadvantages are: All output plugins need to be shared libraries
> and there can only be one output plugin per shared library (although you
> could route differently, via options, but ugh).

I still dislike this.

> II) Keep the output plugin a library, but only lookup a
> _PG_init_output_plugin() which registers/returns the callbacks. Pretty
> much the same tradeoffs as I)
>
> III) Keep the output plugin a library, but simply rely on _PG_init()
> calling a function to register all callbacks. Imo it's worse than I) and
> II) because it basically prohibits using the library in
> shared_preload_libraries as well, because then its _PG_init() doesn't
> get called when starting to stream, and another library might have
> registered other callbacks.

I don't understand the disadvantage that you're describing here.  What
I'm imagining is that you have some struct that looks like this:

struct output_plugin
{
char name[64];
void (*begin)(args);
void (*change)(args);
void (*commit)(args);
};

Now you provide a function RegisterOutputPlugin(output_plugin *).  If
there are any output plugins built into core, core will call
RegisterOutputPlugin once for each one.  If a shared library
containing an output plugin is loaded, the libraries _PG_init function
does the same thing.  When someone tries to use a plugin, they ask for
it b

Re: [HACKERS] SSL: better default ciphersuite

2013-12-11 Thread Tom Lane
Peter Eisentraut  writes:
> Any other opinions on this out there?  All instances of other
> SSL-enabled servers out there, except nginx, default to some variant of
> DEFAULT:!LOW:... or HIGH:MEDIUM:  The proposal here is essentially
> to disable MEDIUM ciphers by default, which is explicitly advised
> against in the Postfix and Dovecot documentation, for example.

Doesn't seem like a great idea then.  I assume that if left to its own
devices, PG presently selects some MEDIUM-level cipher by default?  If so,
it sounds like this change amounts to imposing a performance penalty for
SSL connections by fiat.  On the other hand, if we select a HIGH cipher by
default, then aren't we just refusing to let clients who explicitly ask
for a MEDIUM cipher have one?  Either way, I'd want to see a pretty darn
airtight rationale for that, and there sure isn't one in this thread
so far.

The part of the patch that removes @STRENGTH seems plausible, though,
if Marko is correct that that's effectively overriding a hand-tailored
ordering.

In the end I wonder why our default isn't just "DEFAULT".  Anybody who
thinks that's an inappropriate default should be lobbying the OpenSSL
folk, not us, I should think.

regards, tom lane




Re: [HACKERS] pgbench with large scale factor

2013-12-11 Thread Tatsuo Ishii
> I noticed that "pgbench -s scale_factor" where scale_factor is larger
> than 20,000 (SCALE_32BIT_THRESHOLD) creates pgbench_accounts table containing
> 0 rows without any complaint. Is there any reason for this?

Oops. It turned out that this was a bug in pgbench prior to 9.3. Sorry for the noise.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




[HACKERS] pgbench with large scale factor

2013-12-11 Thread Tatsuo Ishii
I noticed that "pgbench -s scale_factor" where scale_factor is larger
than 20,000 (SCALE_32BIT_THRESHOLD) creates pgbench_accounts table containing
0 rows without any complaint. Is there any reason for this?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] SSL: better default ciphersuite

2013-12-11 Thread Peter Eisentraut
On Fri, 2013-11-29 at 18:43 +0200, Marko Kreen wrote:
> Well, we should - the DEFAULT is clearly a client-side default
> for compatibility only.  No server should ever run with it.

Any other opinions on this out there?  All instances of other
SSL-enabled servers out there, except nginx, default to some variant of
DEFAULT:!LOW:... or HIGH:MEDIUM:  The proposal here is essentially
to disable MEDIUM ciphers by default, which is explicitly advised
against in the Postfix and Dovecot documentation, for example.





[HACKERS] [BUG] Archive recovery failure on 9.3+.

2013-12-11 Thread Kyotaro HORIGUCHI
Hello, we happened to see a server crash on archive recovery under
certain conditions.

After the TLI has been incremented, there can be a case where the WAL
file for the older timeline is archived, but the one with the same
segment id for the newer timeline is not. Archive recovery then fails
with a PANIC error like the following:

| PANIC: record with zero length at 0/1820D40

A replay script is attached. This issue occurred on 9.4dev and 9.3.2,
and not on 9.2.6 and 9.1.11. The latter two search pg_xlog for the
TLI before trying the archive for older TLIs.

This occurs while fetching the checkpoint redo record in archive
recovery:

> if (checkPoint.redo < RecPtr)
> {
>   /* back up to find the record */
>   record = ReadRecord(xlogreader, checkPoint.redo, PANIC, false);

And this is caused by the segment file for the older timeline in the
archive directory being preferred to the one for the newer timeline in
pg_xlog.

Looking into pg_xlog before trying the older TLIs in the archive, as
9.2 and earlier do, fixes this issue. The attached patch is one possible
solution for 9.4dev.

The attached files are:

 - recvtest.sh: Replay script. Steps 1 and 2 create the condition
   and step 3 causes the issue.

 - archrecvfix_20131212.patch: The patch that fixes the issue. With it,
   archive recovery reads pg_xlog before trying an older TLI in the
   archive, similarly to 9.1 and earlier.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#/bin/bash

ROOT=`pwd`
PGDATA=$ROOT/test/data
ARCHDIR=$ROOT/test/arc

if [ ! -d $ARCHDIR -o ! -d $PGDATA ]; then
echo "$PGDATA and/or $ARCHDIR not found"
exit
fi

echo "### EMPTY ARCHIVE DIRECTORY ###"
if [ -d $ARCHDIR ]; then rm -f $ARCHDIR/*; fi

echo "### EMPTY PGDATA DIRECTORY ###"
if [ -d $PGDATA ]; then rm -r $PGDATA/*; fi

echo "### DO INITDB ###"
initdb -D $PGDATA > /dev/null

echo "### set up postgresql.conf ###"
cat >> $PGDATA/postgresql.conf < $PGDATA/recovery.conf = 0)
 	return true;	/* success! */
 



Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing

2013-12-11 Thread Andres Freund
On 2013-12-11 22:08:41 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-12-11 14:00:05 -0300, Alvaro Herrera wrote:
> > > Andres Freund wrote:
> > > > On 2013-12-09 19:14:58 -0300, Alvaro Herrera wrote:
> > > > > ! if (ISUPDATE_from_mxstatus(members[i].status) &&
> > > > > ! !TransactionIdDidAbort(members[i].xid))#
> > > > 
> > > > It makes me wary to see a DidAbort() without a previous InProgress()
> > > > call.  Also, after we crashed, doesn't DidAbort() possibly return
> > > > false for transactions that were in progress before we crashed? At
> > > > least that's how I always understood it, and how tqual.c is written.
> > > 
> > > Yes, that's correct.  But note that here we're not doing a tuple
> > > liveliness test, which is what tqual.c is doing.  What we do with this
> > > info is to keep the Xid as part of the multi if it's still running or
> > > committed.  We also keep it if the xact crashed, which is fine because
> > > the Xid will be removed by some later step.  If we know for certain that
> > > the update Xid is aborted, then we can ignore it, but this is just an
> > > optimization and not needed for correctness.
> > 
> > But why deviate that way? It doesn't seem to save us much?
> 
> Well, it does save something -- there not being a live update means we
> are likely to be able to invalidate the Xmax completely if there are no
> lockers; and even in the case where there are lockers, we will be able
> to set LOCK_ONLY which means faster access in several places.

What I mean is that we could just query TransactionIdIsInProgress() like
usual. In most of the cases it will be very cheap because of the
RecentXmin() check at its beginning.

> > I am not really sure what you are talking about. That you cannot
> > properly decode records before they have been processed by XLogInsert()?
> > If so, yes, that's pretty clear and I am pretty sure it will break in
> > lots of places if you try?
> 
> Well, not sure about "lots of places".  The only misbehavior I have seen
> is in those desc routines.  Of course, the redo routines might also
> fail, but then there's no way for them to be running ...

Hm. I would guess that e.g. display xl_xact_commit fails majorly.

> > > Right now there is one case in this code that returns
> > > FRM_INVALIDATE_XMAX when it's not strictly necessary, i.e. it would also
> > > work to keep the Multi as is and return FRM_NOOP instead; and it also
> > > returns FRM_NOOP in one case when we could return FRM_INVALIDATE_XMAX
> > > instead.  Neither does any great damage, but there is a consideration
> > > that future examiners of the tuple would have to resolve the MultiXact
> > > by themselves (==> performance hit).  On the other hand, returning
> > > INVALIDATE causes the block to be dirtied, which is undesirable if not
> > > already dirty.
> > 
> > Otherwise it will be marked dirty the next time reads the page, so I
> > don't think this is problematic.
> 
> Not necessarily.  I mean, if somebody sees a multi, they might just
> resolve it to its members and otherwise leave the page alone.  Or in
> some cases not even resolve to members (if it's LOCK_ONLY and old enough
> to be behind the oldest visible multi).

But the work has to be done anyway, even if possibly slightly later?

> > > !  * Tell caller to set 
> > > HEAP_XMAX_COMMITTED hint while we have
> > > !  * the Xid in cache.  Again, this is 
> > > just an optimization, so
> > > !  * it's not a problem if the 
> > > transaction is still running and
> > > !  * in the process of committing.
> > > !  */
> > > ! if (TransactionIdDidCommit(update_xid))
> > > ! update_committed = true;
> > > ! 
> > > ! update_xid = newmembers[i].xid;
> > > ! }
> > 
> > I don't think the conclusions here are correct - we might be setting
> > HEAP_XMAX_COMMITTED a smudge to early that way. If the updating
> > transaction is in progress, there's the situation that we have updated
> > the clog, but have not yet removed ourselves from the procarray. I.e. a
> > situation in which TransactionIdIsInProgress() and
> > TransactionIdDidCommit() both return true. Afaik it is only correct to
> > set HEAP_XMAX_COMMITTED once TransactionIdIsInProgress() returns false.
> 
> Hmm ... Is there an actual difference?  I mean, a transaction that
> marked itself as committed in pg_clog cannot return to any other state,
> regardless of what happens elsewhere.

But it could lead to other transactions seeing the row as committed, even
though it isn't really yet.
tqual.c sayeth:
 * NOTE: must check TransactionIdIsInProgress (which looks in PGXACT array)
 * before TransactionIdDidCommit/TransactionIdDidAbort (which look in
 * pg_clog).  Otherwise we

Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing

2013-12-11 Thread Alvaro Herrera
Andres Freund wrote:
> On 2013-12-11 14:00:05 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > On 2013-12-09 19:14:58 -0300, Alvaro Herrera wrote:

> > > > Andres mentioned the idea of sharing some code between
> > > > heap_prepare_freeze_tuple and heap_tuple_needs_freeze, but I haven't
> > > > explored that.
> > > 
> > > My idea would just be to have heap_tuple_needs_freeze() call
> > > heap_prepare_freeze_tuple() and check whether it returns true. Yes,
> > > that's slightly more expensive than the current
> > > heap_tuple_needs_freeze(), but it's only called when we couldn't get a
> > > cleanup lock on a page, so that seems ok.
> > 
> > Doesn't seem a completely bad idea, but let's leave it for a separate
> > patch.  This should be changed in master only IMV anyway, while the rest
> > of this patch is to be backpatched to 9.3.
> 
> I am not so sure it shouldn't be backpatched together with this. We now
> have similar complex logic in both functions.

Any other opinions on this?


> > > > !   if (ISUPDATE_from_mxstatus(members[i].status) &&
> > > > !   !TransactionIdDidAbort(members[i].xid))#
> > > 
> > > It makes me wary to see a DidAbort() without a previous InProgress()
> > > call.  Also, after we crashed, doesn't DidAbort() possibly return
> > > false for transactions that were in progress before we crashed? At
> > > least that's how I always understood it, and how tqual.c is written.
> > 
> > Yes, that's correct.  But note that here we're not doing a tuple
> > liveliness test, which is what tqual.c is doing.  What we do with this
> > info is to keep the Xid as part of the multi if it's still running or
> > committed.  We also keep it if the xact crashed, which is fine because
> > the Xid will be removed by some later step.  If we know for certain that
> > the update Xid is aborted, then we can ignore it, but this is just an
> > optimization and not needed for correctness.
> 
> But why deviate that way? It doesn't seem to save us much?

Well, it does save something -- there not being a live update means we
are likely to be able to invalidate the Xmax completely if there are no
lockers; and even in the case where there are lockers, we will be able
to set LOCK_ONLY which means faster access in several places.


> > > > if (tuple->t_infomask & HEAP_MOVED_OFF)
> > > > !   frz->frzflags |= XLH_FREEZE_XVAC;
> > > > else
> > > > !   frz->frzflags |= XLH_INVALID_XVAC;
> > > >   
> > > 
> > > Hm. Isn't this case inverted? I.e. shouldn't you set XLH_FREEZE_XVAC and
> > > XLH_INVALID_XVAC exactly the other way round? I really don't understand
> > > the moved in/off, since the code has been gone longer than I've followed
> > > the code...
> > 
> > Yep, fixed.
> 
> I wonder how many of the HEAP_MOVED_* cases around are actually
> correct... What was the last version those were generated? 8.4?

8.4, yeah, before VACUUM FULL got rewritten.  I don't think anybody
tests these code paths, because it involves databases that were upgraded
straight from 8.4 and which in their 8.4 time saw VACUUM FULL executed.

I think we should be considering removing these things, or at least have
some mechanism to ensure they don't survive from pre-9.0 installs.

> > (I was toying with the "desc"
> > code because it misbehaves when applied on records as they are created,
> > as opposed to being applied on records as they are replayed.  I'm pretty
> > sure everyone already knows about this, and it's the reason why
> > everybody has skimped from examining arrays of things stored in followup
> > data records.  I was naive enough to write code that tries to decode the
> > followup record that contains the members of the multiaxct we're
> > creating, which works fine during replay but gets them completely wrong
> > during regular operation.  This is the third time I'm surprised by this
> > misbehavior; blame my bad memory for not remembering that it's not
> > supposed to work in the first place.)
> 
> I am not really sure what you are talking about. That you cannot
> properly decode records before they have been processed by XLogInsert()?
> If so, yes, that's pretty clear and I am pretty sure it will break in
> lots of places if you try?

Well, not sure about "lots of places".  The only misbehavior I have seen
is in those desc routines.  Of course, the redo routines might also
fail, but then there's no way for them to be running ...

> > Right now there is one case in this code that returns
> > FRM_INVALIDATE_XMAX when it's not strictly necessary, i.e. it would also
> > work to keep the Multi as is and return FRM_NOOP instead; and it also
> > returns FRM_NOOP in one case when we could return FRM_INVALIDATE_XMAX
> > instead.  Neither does any great damage, but there is a consideration
> > that future examiners of the tuple would have to resolve the MultiXact
> > by themselves (==> performance hit)

Re: [HACKERS] Optimize kernel readahead using buffer access strategy

2013-12-11 Thread Claudio Freire
On Wed, Dec 11, 2013 at 3:14 AM, KONDO Mitsumasa
 wrote:
>
>> enable_readahead=os|fadvise
>>
>> with os = on, fadvise = off
>
> Hmm. fadvise is method and is not a purpose. So I consider another idea of
> this GUC.


Yeah, I was thinking of opening the door for readahead=aio, but
whatever is clearer than on-off would work ;)




Re: [HACKERS] logical changeset generation v6.8

2013-12-11 Thread Andres Freund
Hi,

Lots of sensible comments removed, I plan to make changes to
address them.

On 2013-12-10 22:17:44 -0500, Robert Haas wrote:
> - I think this needs SGML documentation, same kind of thing we have
> for background workers, except probably significantly more.  A design
> document with ASCII art in a different patch does not constitute
> usable documentation.  I think it would fit best under section VII,
> internals, perhaps entitled "Writing a Logical Decoding Plugin".
> Looking at it, I rather wonder if the "Background Worker Processes"
> ought to be there as well, rather than under section V, server
> programming.

Completely agreed it needs that. I'd like the UI decisions in
http://www.postgresql.org/message-id/20131205001520.ga8...@awork2.anarazel.de
resolved before though.

> +   logical_xmin =
> +   ((volatile LogicalDecodingCtlData*) LogicalDecodingCtl)->xmin;
> 
> Ugly.

We really should have a macro for this...

> Message style.  I suggest treating a short write as ENOSPC; there is
> precedent elsewhere.
> 
> I don't think there's much point in including "remapping" in all of
> the error messages.  It adds burden for translators and users won't
> know what a remapping file is anyway.

It helps in locating which part of the code caused a problem. I utterly
hate to get reports with error messages that I can't correlate to a
source file. Yes, I know that verbose error output exists, but usually
you don't get it that way... That said, I'll try to make the messages
simpler.

> +   /* use *GetUpdateXid to correctly deal with multixacts */
> +   xmax = HeapTupleHeaderGetUpdateXid(new_tuple->t_data);
>
> I don't feel enlightened by that comment.

Well, it will return the update xid of a tuple locked with NO KEY SHARE
and updated (with NO KEY UPDATE). I.e. 9.3+ foreign key locking stuff.

> +   if (!TransactionIdIsNormal(xmax))
> +   {
> +   /*
> +* no xmax is set, can't have any permanent ones, so
> this check is
> +* sufficient
> +*/
> +   }
> +   else if (HEAP_XMAX_IS_LOCKED_ONLY(new_tuple->t_data->t_infomask))
> +   {
> +   /* only locked, we don't care */
> +   }
> +   else if (!TransactionIdPrecedes(xmax, cutoff))
> +   {
> +   /* tuple has been deleted recently, log */
> +   do_log_xmax = true;
> +   }
> 
> Obfuscated.  Rewrite without empty blocks.

I don't understand why having an empty block is less clear than
including a condition in several branches? Especially if the individual
conditions might need to be documented?

> +   /*
> +* Write out buffer everytime we've too many in-memory entries.
> +*/
> +   if (state->rs_num_rewrite_mappings >= 1000 /* arbitrary number */)
> +   logical_heap_rewrite_flush_mappings(state);
> 
> What happens if the number of items per hash table entry is small but
> the number of entries is large?

rs_num_rewrite_mappings is the overall number of in-memory mappings, not
the number of per-entry mappings. That means we flush, even if all
entries have only a couple of mappings, as soon as 1000 in memory
entries have been collected. A bit simplistic, but seems okay enough?

> +   /* XXX: should we warn about such files? */
> 
> Nah.

Ok, will remove that comment from a couple locations then...

> +   /* XXX: we could replay the transaction and
> prepare it as well. */
> 
> Should we do that?

It would allow a neat feature, namely using 2PC to make sure that a
transaction commits on all the nodes connected using changeset
extraction. But I think that's a feature for later. Its use would have
to be optional anyway.


> All right, I'm giving up for now.  These patches need a LOT of work on
> comments and documentation to be committable, even if the underlying
> architecture is basically sound.

> +* clog. If we're doing logical replication we can't do that though, 
> so
> +* hold the lock for a moment longer.
> 
> ...because why?

That's something pretty icky imo. But more in the existing HS code than
in this. Without the changes in the locking we can have the situation
that transactions are marked as running in the xl_running_xact record,
but are actually already committed. There's some code for HS that tries
to cope with that situation but I don't trust it very much, and it'd be
more complicated to make it work for logical decoding. I could reproduce
problems for the latter without those changes.

I'll add a comment explaining this.

> I'm still unhappy that we're introducing logical decoding slots but no
> analogue for physical replication.  If we had the latter, would it
> share meaningful amounts of structure with this?

Yes, I think we could mostly reuse it, we'd probably want to add a field
or two more (application_name, sync_prio?). I have been wondering
whether some of the code in replication/logical/logical.c shouldn't be
in rep

Re: [HACKERS] In-Memory Columnar Store

2013-12-11 Thread desmodemone
2013/12/9 knizhnik 

> Hello!
>
> I want to announce my implementation of an In-Memory Columnar Store extension
> for PostgreSQL:
>
>  Documentation: http://www.garret.ru/imcs/user_guide.html
>  Sources: http://www.garret.ru/imcs-1.01.tar.gz
>
> Any feedback, bug reports and suggestions are welcome.
>
> The vertical representation of data is stored in PostgreSQL shared memory.
> This is why it is important to be able to utilize all available physical
> memory.
> Nowadays servers with a TB or more of RAM are not something exotic, especially
> in the financial world.
> But there is a limitation in Linux with standard 4kb pages on the maximal size
> of a mapped memory segment: 256Gb.
> It is possible to overcome this limitation either by creating multiple
> segments - but that requires too many changes in the PostgreSQL memory manager -
> or by just setting the MAP_HUGETLB flag (assuming that huge pages were allocated
> in the system).
>
> I found several messages related to the MAP_HUGETLB flag; the most recent
> one was from 21 November:
> http://www.postgresql.org/message-id/20131125032920.ga23...@toroid.org
>
> I wonder what is the current status of this patch?
>
>
>
>
>
>
>


Hello,
   excellent work! I began testing and it's very fast; by the
way, I found a strange case of an "endless" query with CPU at 100% when the
value used as a filter does not exist:

I am testing with Postgres 9.3.1 on Debian, and I used default values for
the extension except memory (512MB).

how to recreate the test case :

## create a table :

create table endless ( col1 int , col2 char(30) , col3 int ) ;

## insert some values:

insert into endless values ( 1, 'ahahahaha', 3);

insert into endless values ( 2, 'ghghghghg', 4);

## create the column store objects:

select cs_create('endless','col1','col2');
 cs_create
---

(1 row)

## try and test column store :

select cs_avg(col3) from  endless_get('ahahahaha');
 cs_avg

  3
(1 row)

select cs_avg(col3) from  endless_get('ghghghghg');
 cs_avg

  4
(1 row)

## now select with a value that does not exist :

select cs_avg(col3) from  endless_get('testing');

# and now it starts to loop on the CPU and seems to never end; I had to
terminate the backend

Bye

Mat


Re: [HACKERS] Reference to parent query from ANY sublink

2013-12-11 Thread Tom Lane
Antonin Houska  writes:
> debug_print_plan output contains
> :grpColIdx 2
> in the AGG node.

Hm, that means there's only one grouping column (and it's the second
tlist entry of the child plan node).  So that seems conclusive that
the unique-ification is being done wrong.  It's not very clear why
though.  It doesn't seem like your patch is doing anything that
would directly affect that.

For comparison purposes, using the patch I just posted, I get
this description of a correct plan:

regression=# explain verbose SELECT *
  FROM SUBSELECT_TBL upper
  WHERE (f1, f2::float) IN
(SELECT f2, f3 FROM SUBSELECT_TBL);
 QUERY PLAN 


 Hash Join  (cost=41.55..84.83 rows=442 width=16)
   Output: upper.f1, upper.f2, upper.f3
   Hash Cond: ((upper.f1 = subselect_tbl.f2) AND ((upper.f2)::double precision 
= subselect_tbl.f3))
   ->  Seq Scan on public.subselect_tbl upper  (cost=0.00..27.70 rows=1770 
width=16)
 Output: upper.f1, upper.f2, upper.f3
   ->  Hash  (cost=38.55..38.55 rows=200 width=12)
 Output: subselect_tbl.f2, subselect_tbl.f3
 ->  HashAggregate  (cost=36.55..38.55 rows=200 width=12)
   Output: subselect_tbl.f2, subselect_tbl.f3
   Group Key: subselect_tbl.f2, subselect_tbl.f3
   ->  Seq Scan on public.subselect_tbl  (cost=0.00..27.70 
rows=1770 width=12)
 Output: subselect_tbl.f1, subselect_tbl.f2, 
subselect_tbl.f3
(12 rows)

so it's unique-ifying on both f2 and f3, which is clearly necessary
for executing the IN with a plain join.

regards, tom lane




Re: [HACKERS] Reference to parent query from ANY sublink

2013-12-11 Thread Antonin Houska
On 12/11/2013 10:15 PM, Tom Lane wrote:
> 
> FWIW, that plan isn't obviously wrong; if it is broken, most likely the
> reason is that the HashAggregate is incorrectly unique-ifying the lower
> table.  (Unfortunately, EXPLAIN doesn't show enough about the HashAgg
> to know what it's doing exactly.)  The given query is, I think, in
> principle equivalent to
> 
>  SELECT ...
>   FROM SUBSELECT_TBL upper
>   WHERE (f1, f2::float) IN
> (SELECT f2, f3 FROM SUBSELECT_TBL);
> 
> and if you ask unmodified HEAD to plan that you get
> 
>  Hash Join  (cost=41.55..84.83 rows=442 width=16)
>Hash Cond: ((upper.f1 = subselect_tbl.f2) AND ((upper.f2)::double 
> precision = subselect_tbl.f3))
>->  Seq Scan on subselect_tbl upper  (cost=0.00..27.70 rows=1770 width=16)
>->  Hash  (cost=38.55..38.55 rows=200 width=12)
>  ->  HashAggregate  (cost=36.55..38.55 rows=200 width=12)
>->  Seq Scan on subselect_tbl  (cost=0.00..27.70 rows=1770 
> width=12)

Before I opened your mail, I also recalled the technique that I noticed
in the planner code, to evaluate SEMI JOIN as INNER JOIN with the RHS
uniquified, so I also thought it could be about the uniquification.

> which is the same thing at the visible level of detail ... but this
> version computes the correct result.  The cost of the HashAggregate is
> estimated higher, though, which suggests that maybe it's distinct'ing on
> two columns where the bogus plan only does one.

debug_print_plan output contains
:grpColIdx 2
in the AGG node. I think this corresponds to the join condition, which
IMO should be
(upper.f1 = subselect_tbl.f2)
while the other condition was not in the list of join clauses and
therefore ignored for the uniquification's purpose.

And gdb tells me that create_unique_path() never gets more than 1
clause. I can't tell whether it should do just for this special purpose.

> Not sure about where Antonin's patch is going off the rails.  I suspect
> it's too simple somehow, but it's also possible that it's OK and the
> real issue is some previously undetected bug in LATERAL processing.

So far I have no idea how to achieve such conditions without this patch.
Thanks for your comments.

// Antonin Houska (Tony)






Re: [HACKERS] Reference to parent query from ANY sublink

2013-12-11 Thread Tom Lane
Kevin Grittner  writes:
> Tom Lane  wrote:
>> FWIW, that plan isn't obviously wrong; if it is broken, most
>> likely the reason is that the HashAggregate is incorrectly
>> unique-ifying the lower table.  (Unfortunately, EXPLAIN doesn't
>> show enough about the HashAgg to know what it's doing exactly.)

> Yeah, I found myself wishing for an EXPLAIN option that would show
> that.

It's not hard to do ... how about the attached?

I chose to print grouping keys for both Agg and Group nodes, and to
show them unconditionally.  There's some case maybe for only including
them in verbose mode, but since sort keys are shown unconditionally,
it seemed more consistent this way.

regards, tom lane

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index bd5428d..9969a25 100644
*** a/src/backend/commands/explain.c
--- b/src/backend/commands/explain.c
*** static void show_sort_keys(SortState *so
*** 76,84 
  			   ExplainState *es);
  static void show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
  	   ExplainState *es);
! static void show_sort_keys_common(PlanState *planstate,
! 	  int nkeys, AttrNumber *keycols,
! 	  List *ancestors, ExplainState *es);
  static void show_sort_info(SortState *sortstate, ExplainState *es);
  static void show_hash_info(HashState *hashstate, ExplainState *es);
  static void show_instrumentation_count(const char *qlabel, int which,
--- 76,88 
  			   ExplainState *es);
  static void show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
  	   ExplainState *es);
! static void show_agg_keys(AggState *astate, List *ancestors,
! 			  ExplainState *es);
! static void show_group_keys(GroupState *gstate, List *ancestors,
! ExplainState *es);
! static void show_sort_group_keys(PlanState *planstate, const char *qlabel,
! 	 int nkeys, AttrNumber *keycols,
! 	 List *ancestors, ExplainState *es);
  static void show_sort_info(SortState *sortstate, ExplainState *es);
  static void show_hash_info(HashState *hashstate, ExplainState *es);
  static void show_instrumentation_count(const char *qlabel, int which,
*** ExplainNode(PlanState *planstate, List *
*** 1341,1347 
--- 1345,1358 
  		   planstate, es);
  			break;
  		case T_Agg:
+ 			show_agg_keys((AggState *) planstate, ancestors, es);
+ 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
+ 			if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ 		   planstate, es);
+ 			break;
  		case T_Group:
+ 			show_group_keys((GroupState *) planstate, ancestors, es);
  			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
  			if (plan->qual)
  show_instrumentation_count("Rows Removed by Filter", 1,
*** show_sort_keys(SortState *sortstate, Lis
*** 1693,1701 
  {
  	Sort	   *plan = (Sort *) sortstate->ss.ps.plan;
  
! 	show_sort_keys_common((PlanState *) sortstate,
! 		  plan->numCols, plan->sortColIdx,
! 		  ancestors, es);
  }
  
  /*
--- 1704,1712 
  {
  	Sort	   *plan = (Sort *) sortstate->ss.ps.plan;
  
! 	show_sort_group_keys((PlanState *) sortstate, "Sort Key",
! 		 plan->numCols, plan->sortColIdx,
! 		 ancestors, es);
  }
  
  /*
*** show_merge_append_keys(MergeAppendState 
*** 1707,1720 
  {
  	MergeAppend *plan = (MergeAppend *) mstate->ps.plan;
  
! 	show_sort_keys_common((PlanState *) mstate,
! 		  plan->numCols, plan->sortColIdx,
! 		  ancestors, es);
  }
  
  static void
! show_sort_keys_common(PlanState *planstate, int nkeys, AttrNumber *keycols,
! 	  List *ancestors, ExplainState *es)
  {
  	Plan	   *plan = planstate->plan;
  	List	   *context;
--- 1718,1773 
  {
  	MergeAppend *plan = (MergeAppend *) mstate->ps.plan;
  
! 	show_sort_group_keys((PlanState *) mstate, "Sort Key",
! 		 plan->numCols, plan->sortColIdx,
! 		 ancestors, es);
  }
  
+ /*
+  * Show the grouping keys for an Agg node.
+  */
  static void
! show_agg_keys(AggState *astate, List *ancestors,
! 			  ExplainState *es)
! {
! 	Agg		   *plan = (Agg *) astate->ss.ps.plan;
! 
! 	if (plan->numCols > 0)
! 	{
! 		/* The key columns refer to the tlist of the child plan */
! 		ancestors = lcons(astate, ancestors);
! 		show_sort_group_keys(outerPlanState(astate), "Group Key",
! 			 plan->numCols, plan->grpColIdx,
! 			 ancestors, es);
! 		ancestors = list_delete_first(ancestors);
! 	}
! }
! 
! /*
!  * Show the grouping keys for a Group node.
!  */
! static void
! show_group_keys(GroupState *gstate, List *ancestors,
! ExplainState *es)
! {
! 	Group	   *plan = (Group *) gstate->ss.ps.plan;
! 
! 	/* The key columns refer to the tlist of the child plan */
! 	ancestors = lcons(gstate, ancestors);
! 	show_sort_group_keys(outerPlanState(gstate), "Group Key",
! 		 plan->numCols, plan->grpColIdx,
! 		 ancestors, es);
! 	ancestors = list_delete_first(ancestors);
! }
! 
! /*
!  * Common co

Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing

2013-12-11 Thread Andres Freund
On 2013-12-11 14:00:05 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-12-09 19:14:58 -0300, Alvaro Herrera wrote:
> > I don't so much have a problem with exporting CreateMultiXactId(), just
> > with exporting it under its current name. It's already quirky to have
> > both MultiXactIdCreate and CreateMultiXactId() in multixact.c but
> > exporting it imo goes to far.
> 
> MultiXactidCreateFromMembers(int, MultiXactMembers *) ?

Works for me.

> > > Andres mentioned the idea of sharing some code between
> > > heap_prepare_freeze_tuple and heap_tuple_needs_freeze, but I haven't
> > > explored that.
> > 
> > My idea would just be to have heap_tuple_needs_freeze() call
> > heap_prepare_freeze_tuple() and check whether it returns true. Yes,
> > that's slightly more expensive than the current
> > heap_tuple_needs_freeze(), but it's only called when we couldn't get a
> > cleanup lock on a page, so that seems ok.
> 
> Doesn't seem a completely bad idea, but let's leave it for a separate
> patch.  This should be changed in master only IMV anyway, while the rest
> of this patch is to be backpatched to 9.3.

I am not so sure it shouldn't be backpatched together with this. We now
have similar complex logic in both functions.

> > > ! if (ISUPDATE_from_mxstatus(members[i].status) &&
> > > ! !TransactionIdDidAbort(members[i].xid))
> > 
> > It makes me wary to see a DidAbort() without a previous InProgress()
> > call.  Also, after we crashed, doesn't DidAbort() possibly return
> > false for transactions that were in progress before we crashed? At
> > least that's how I always understood it, and how tqual.c is written.
> 
> Yes, that's correct.  But note that here we're not doing a tuple
> liveliness test, which is what tqual.c is doing.  What we do with this
> info is to keep the Xid as part of the multi if it's still running or
> committed.  We also keep it if the xact crashed, which is fine because
> the Xid will be removed by some later step.  If we know for certain that
> the update Xid is aborted, then we can ignore it, but this is just an
> optimization and not needed for correctness.

But why deviate that way? It doesn't seem to save us much?

> One interesting bit is that we might end up creating singleton
> MultiXactIds when freezing, if there's no updater and there's a running
> locker.  We could avoid this (i.e. mark the tuple as locked by a single
> Xid) but it would complicate FreezeMultiXactId's API and it's unlikely
> to occur with any frequency anyway.

Yea, that seems completely fine.

> > I don't think there's a need to check for
> > TransactionIdIsCurrentTransactionId() - vacuum can explicitly *not* be
> > run inside a transaction.
> 
> Keep in mind that freezing can also happen for tuples handled during a
> table-rewrite operation such as CLUSTER.

Good point.

> > >   if (tuple->t_infomask & HEAP_MOVED_OFF)
> > > ! frz->frzflags |= XLH_FREEZE_XVAC;
> > >   else
> > > ! frz->frzflags |= XLH_INVALID_XVAC;
> > >   
> > 
> > Hm. Isn't this case inverted? I.e. shouldn't you set XLH_FREEZE_XVAC and
> > XLH_INVALID_XVAC exactly the other way round? I really don't understand
> > the moved in/off, since the code has been gone longer than I've followed
> > the code...
> 
> Yep, fixed.

I wonder how many of the HEAP_MOVED_* cases around are actually
correct... What was the last version those were generated? 8.4?

> (I was toying with the "desc"
> code because it misbehaves when applied on records as they are created,
> as opposed to being applied on records as they are replayed.  I'm pretty
> sure everyone already knows about this, and it's the reason why
> everybody has skimped from examining arrays of things stored in followup
> data records.  I was naive enough to write code that tries to decode the
> followup record that contains the members of the multixact we're
> creating, which works fine during replay but gets them completely wrong
> during regular operation.  This is the third time I'm surprised by this
> misbehavior; blame my bad memory for not remembering that it's not
> supposed to work in the first place.)

I am not really sure what you are talking about. That you cannot
properly decode records before they have been processed by XLogInsert()?
If so, yes, that's pretty clear and I am pretty sure it will break in
lots of places if you try?

> Right now there is one case in this code that returns
> FRM_INVALIDATE_XMAX when it's not strictly necessary, i.e. it would also
> work to keep the Multi as is and return FRM_NOOP instead; and it also
> returns FRM_NOOP in one case when we could return FRM_INVALIDATE_XMAX
> instead.  Neither does any great damage, but there is a consideration
> that future examiners of the tuple would have to resolve the MultiXact
> by themselves (==> performance hit).  On the other hand, returning
> INVALIDATE c

Re: [HACKERS] Why standby.max_connections must be higher than primary.max_connections?

2013-12-11 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 1:17 PM, Robert Haas  wrote:
> Because the KnownAssignedXIDs and lock tables on the standby need to
> be large enough to contain the largest snapshot and greatest number of
> AccessExclusiveLocks that could exist on the master at any given time.

Right. Initially during the development of Hot Standby, it looked like
the "max_connections >= master's" requirement on standbys wasn't going
to be necessary, or could be avoided. However, Simon gave up on that
idea on pragmatic grounds here:

http://www.postgresql.org/message-id/1252002165.2889.467.camel@ebony.2ndQuadrant

I'd thought about revisiting this myself, but I think that the impetus
to do so is lessened by recent work on logical replication.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Josh Berkus
On 12/11/2013 02:39 PM, Martijn van Oosterhout wrote:
> In this discussion we've mostly used block = 1 postgresql block of 8k. 
> But when reading from a disk once you've read one block you can
> basically read the following ones practically for free.
> 
> So I wonder if you could make your sampling read always 16 consecutive
> blocks, but then use 25-50% of the tuples.  That way you get many more
> tuples for the same amount of disk I/O seeks..

Yeah, that's what I meant by "tune this for the FS".   We'll probably
have to test a lot of different "block sizes" on different FSes before
we arrive at a reasonable size, and even then I'll bet we have to offer
a GUC.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why standby.max_connections must be higher than primary.max_connections?

2013-12-11 Thread satoshi yamada
>> Hello hackers.
>>
>> I am struggling to understand why standby.max_connections must be higher
>> than
>> primary.max_connections. Does someone know the reason why?
>
> Because the KnownAssignedXIDs and lock tables on the standby need to
> be large enough to contain the largest snapshot and greatest number of
> AccessExclusiveLocks that could exist on the master at any given time.
>
>> I think this restriction obstructs making a flexible load balancing.
>> I'd like to make the standby database usable for load balancing. Of course
>> the role of a standby server is different from that of a master server,
>> so I think it is natural to want to set standby.max_connections less
>> than
>> primary.max_connections.
>
> Your load balancer should probably have a configuration setting that
> controls how many connections it will try to make to the database, and
> you can set that to a lower value than max_connections.
>
Hi Robert.

I understand the reason why standby.max_connections must be higher
than primary.max_connections.

I'll try to restrict sessions with the load balancer.

Thanks.


2013/12/11 Robert Haas 

> On Tue, Dec 10, 2013 at 3:34 AM, 山田聡  wrote:
> > Hello hackers.
> >
> > I am struggling to understand why standby.max_connections must be higher
> > than
> > primary.max_connections. Does someone know the reason why?
>
> Because the KnownAssignedXIDs and lock tables on the standby need to
> be large enough to contain the largest snapshot and greatest number of
> AccessExclusiveLocks that could exist on the master at any given time.
>
> > I think this restriction obstructs making a flexible load balancing.
> > I'd like to make the standby database usable for load balancing. Of course
> > the role of a standby server is different from that of a master server,
> > so I think it is natural to want to set standby.max_connections less
> > than
> > primary.max_connections.
>
> Your load balancer should probably have a configuration setting that
> controls how many connections it will try to make to the database, and
> you can set that to a lower value than max_connections.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Martijn van Oosterhout
On Thu, Dec 12, 2013 at 07:22:59AM +1300, Gavin Flower wrote:
> Surely we want to sample a 'constant fraction' (obviously, in
> practice you have to sample an integral number of rows in a page!)
> of rows per page? The simplest way, as Tom suggests, is to use all
> the rows in a page.
> 
> However, if you wanted the same number of rows from a greater number
> of pages, you could (for example) select a quarter of the rows from
> each page.  In which case, when this is a fractional number: take
the integral number of rows, plus one extra row with a probability
> equal to the fraction (here 0.25).

In this discussion we've mostly used block = 1 postgresql block of 8k. 
But when reading from a disk once you've read one block you can
basically read the following ones practically for free.

So I wonder if you could make your sampling read always 16 consecutive
blocks, but then use 25-50% of the tuples.  That way you get many more
tuples for the same amount of disk I/O seeks..
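
A rough, self-contained sketch of that idea in plain C (illustration only, not
PostgreSQL code; the table geometry, chunk length, and 25% keep-rate below are
made-up values): each random seek reads a run of consecutive blocks, and only a
fraction of the tuples in each block is kept.

/*
 * Toy model of "seek once, read 16 consecutive blocks, keep ~25% of the
 * tuples".  Not PostgreSQL code; all constants are illustrative.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TOTAL_BLOCKS    100000  /* hypothetical table size in blocks */
#define TUPLES_PER_BLK  60      /* hypothetical tuples per block */
#define CHUNK           16      /* consecutive blocks read per seek */
#define KEEP_PERCENT    25      /* tuples kept from each block */
#define TARGET_TUPLES   30000   /* desired sample size */

int
main(void)
{
    long    sampled = 0;
    long    seeks = 0;
    long    blocks_read = 0;

    srandom((unsigned int) time(NULL));

    while (sampled < TARGET_TUPLES)
    {
        /* one seek lands on a random chunk-aligned starting block */
        long    start = (random() % (TOTAL_BLOCKS / CHUNK)) * CHUNK;
        long    block;

        seeks++;
        for (block = start; block < start + CHUNK; block++)
        {
            int     t;

            blocks_read++;      /* blocks after the first are nearly free */
            for (t = 0; t < TUPLES_PER_BLK; t++)
            {
                /* keep roughly KEEP_PERCENT of the tuples on this block */
                if (random() % 100 < KEEP_PERCENT)
                    sampled++;
            }
        }
    }

    printf("sampled %ld tuples using %ld seeks (%ld blocks read)\n",
           sampled, seeks, blocks_read);
    return 0;
}

The point is just that one seek now yields many more candidate tuples than
sampling a single block, at the price of the within-chunk correlation being
discussed elsewhere in this thread.
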

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Greg Stark
I think we're all wet here. I don't see any bias towards larger or smaller
rows. Larger tuples will be spread over a larger number of pages, but there
will be fewer of them on any one page. The average effect should be the same.

Smaller values might have a higher variance with block based sampling than
larger values. But that actually *is* the kind of thing that Simon's
approach of just compensating with later samples can deal with.

-- 
greg


Re: [HACKERS] preserving forensic information when we freeze

2013-12-11 Thread Alvaro Herrera
Andres Freund wrote:

> What's your plan to commit this? I'd prefer to wait till Alvaro's
> freezing changes get in, so his patch will look the same in HEAD and
> 9.3. But I think he plans to commit soon.

Yes, I do.  I'm waiting on feedback on the patch I posted this
afternoon, so if there's nothing more soon I will push it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time-Delayed Standbys

2013-12-11 Thread Fabrízio de Royes Mello
On Wed, Dec 11, 2013 at 7:47 PM, Andres Freund 
wrote:
>
> I don't think that position has any merit, sorry: Think about the way
> this stuff gets setup. The user creates a new basebackup (pg_basebackup,
> manual pg_start/stop_backup, shutdown primary). Then he creates a
> recovery conf by either starting from scratch, using
> --write-recovery-conf or by copying recovery.conf.sample. In none of
> these cases delay will be configured.
>

Ok.


> So, with that in mind, the only way it could have been configured is by
> the user *explicitly* writing it into recovery.conf. And now you want
> to react to this explicit step by just *silently* ignoring the setting
> based on some random criteria (arguments have been made about
> hot_standby=on/off, standby_mode=on/off which aren't directly
> related). Why on earth would that be a usability improvement?
>
> Also, you seem to assume there's no point in configuring it for any of
> hot_standby=off, standby_mode=off, recovery_target=*. Why? There's
> usecases for all of them:
> * hot_standby=off: Makes delay useable with wal_level=archive (and thus
>   a lower WAL volume)
> * standby_mode=off: Configurations that use tools like pg_standby and
>   similar simply don't need standby_mode=on. If you want to trigger
>   failover from within the restore_command you *cannot* set it.
> * recovery_target_*: It can still make sense if you use
>   pause_at_recovery_target.
>
> In which scenarios does your restriction actually improve anything?
>

Given your arguments I'm forced to review my understanding of the problem.
You are absolutely right in your assertions. I was not seeing the scenario
from this perspective.

Anyway, we need to improve the docs; any suggestions?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Peter Geoghegan
On Wed, Dec 11, 2013 at 1:28 PM, Simon Riggs  wrote:
> But nobody has given a sensible answer to my questions, other than to
> roll out the same general points again. In practice, it's a knob that
> does not do very much of use for us. At best it is addressing the
> symptoms, not the cause. IMHO.

It's just a usability feature. It lessens the intellectual overhead of
managing maintenance_work_mem. I understand that it isn't very
impressive from a technical point of view. However, in many
environments, it actually will make a significant difference, because
non-autovacuum maintenance operations are very rare compared to
autovacuum workers vacuuming, and therefore I can now afford to be
much less conservative in setting maintenance_work_mem globally on
each of 8 Postgres instances hosted on a single box. These are
precisely the kinds of Postgres instances where users are very
unlikely to have heard of maintenance_work_mem at all. These users are
not even using an admin tool in many cases, preferring to rely on ORM
migrations.  Having said that, it's also something I'll find useful on
a day to day basis, because it's a chore to set maintenace_work_mem
manually, and sometimes I forget to do so, particularly when under
pressure to relieve a production performance issues on a random
customer's database.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 reference constraint regression

2013-12-11 Thread Alvaro Herrera
Daniel Wood wrote:
> In 9.3 I can delete the parent of a parent-child relation if the child row
> is an uncommitted insert and I first update the parent.

Interesting test case, thanks.  I traced through it and promptly noticed
that the problem is that HeapTupleSatisfiesUpdate is failing to consider
the case that there are foreign lockers when a tuple was created by our
own transaction.  In the past, a tuple created by our own transaction
can't possibly be seen by other transactions -- so it wouldn't be
possible for them to lock them.  But this is no longer the case, because
an UPDATE can carry forward the locks from the previous version of the
tuple.

The fix for the immediate bug is to add some code to HTSU so that it
checks for locks by other transactions even when the tuple was created
by us.  I haven't looked at the other tqual routines yet, but I imagine
they will need equivalent fixes.

One thought on the whole HTSU issue is that it's a bit strange that it
is returning HeapTupleBeingUpdated when the tuple is actually only
locked; and thus the callers can update the tuple in that case anyway.
While I haven't explored the idea fully, perhaps we should consider
adding a new return code to HTSU, with the meaning "this tuple is not
updated, but it is locked by someone".  Then it's the caller's
responsibility to check the lock for conflicts, and perhaps add more
locks.  When BeingUpdated is returned, the tuple cannot possibly be
updated.  This may help disentangle the mess in the heapam.c routines
a bit ... or it may not (at least heap_lock_tuple will still need to
consider doing some stuff in the BeingUpdated case, so it's unlikely
that we will see much simplification there).  Another consideration is
that perhaps it's too late for the 9.3 series for a change this large.
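
To make that distinction concrete, here is a toy, self-contained sketch with
deliberately hypothetical names (it is not the real HTSU_Result enum or
HeapTupleSatisfiesUpdate): the only point is that callers could branch on
"only locked" separately from "really being updated".

/*
 * Hypothetical illustration of the proposed extra return code; the names
 * below are stand-ins, not PostgreSQL declarations.
 */
#include <stdio.h>

typedef enum
{
    TOY_TUPLE_MAY_BE_UPDATED,   /* no conflict: go ahead and update */
    TOY_TUPLE_INVISIBLE,
    TOY_TUPLE_SELF_UPDATED,
    TOY_TUPLE_UPDATED,          /* updated by a committed transaction */
    TOY_TUPLE_BEING_UPDATED,    /* a concurrent updater really exists */
    TOY_TUPLE_ONLY_LOCKED       /* proposed: locked by someone, not updated */
} toy_htsu_result;

/* Sketch of how an update path might branch on the result. */
static void
handle_result(toy_htsu_result res)
{
    switch (res)
    {
        case TOY_TUPLE_MAY_BE_UPDATED:
            printf("proceed with the update\n");
            break;
        case TOY_TUPLE_ONLY_LOCKED:
            printf("tuple is merely locked: check the lock for conflicts, "
                   "possibly wait, then update\n");
            break;
        case TOY_TUPLE_BEING_UPDATED:
            printf("a real concurrent update is in progress: wait and re-check\n");
            break;
        default:
            printf("other outcome (invisible, self-updated, or updated)\n");
            break;
    }
}

int
main(void)
{
    handle_result(TOY_TUPLE_ONLY_LOCKED);
    return 0;
}
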

I will start working on a patch for this tomorrow.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2013-12-11 Thread Peter Eisentraut
This patch fails the regression tests; see attachment.
*** 
/var/lib/jenkins/jobs/postgresql_commitfest_world/workspace/src/test/regress/expected/arrays.out
2013-08-24 01:24:42.0 +
--- 
/var/lib/jenkins/jobs/postgresql_commitfest_world/workspace/src/test/regress/results/arrays.out
 2013-12-07 16:21:48.0 +
***
*** 495,508 
  (11 rows)
  
  SELECT * FROM array_op_test WHERE i <@ '{38,34,32,89}' ORDER BY seqno;
!  seqno |   i   |  
   t  
! 
---+---+
! 40 | {34}  | 
{AA10611,AAA1205,AAA50956,31334,A70466,81587,AAA74623}
! 74 | {32}  | 
{1729,A22860,AA99807,A17383,AAA67062,AAA15165,AAA50956}
! 98 | {38,34,32,89} | 
{AA71621,8857,AAA65037,31334,AA48845}
!101 | {}| {}
! (4 rows)
! 
  SELECT * FROM array_op_test WHERE i = '{}' ORDER BY seqno;
   seqno | i  | t  
  ---++
--- 495,501 
  (11 rows)
  
  SELECT * FROM array_op_test WHERE i <@ '{38,34,32,89}' ORDER BY seqno;
! ERROR:  compressed data is corrupt
  SELECT * FROM array_op_test WHERE i = '{}' ORDER BY seqno;
   seqno | i  | t  
  ---++
***
*** 622,632 
  (0 rows)
  
  SELECT * FROM array_op_test WHERE i <@ '{}' ORDER BY seqno;
!  seqno | i  | t  
! ---++
!101 | {} | {}
! (1 row)
! 
  SELECT * FROM array_op_test WHERE i = '{NULL}' ORDER BY seqno;
   seqno |   i|   t
  ---++
--- 615,621 
  (0 rows)
  
  SELECT * FROM array_op_test WHERE i <@ '{}' ORDER BY seqno;
! ERROR:  compressed data is corrupt
  SELECT * FROM array_op_test WHERE i = '{NULL}' ORDER BY seqno;
   seqno |   i|   t
  ---++
***
*** 644,654 
  (0 rows)
  
  SELECT * FROM array_op_test WHERE i <@ '{NULL}' ORDER BY seqno;
!  seqno | i  | t  
! ---++
!101 | {} | {}
! (1 row)
! 
  SELECT * FROM array_op_test WHERE t @> '{72908}' ORDER BY seqno;
   seqno |   i   |  
   t
  
  
---+---+
--- 633,639 
  (0 rows)
  
  SELECT * FROM array_op_test WHERE i <@ '{NULL}' ORDER BY seqno;
! ERROR:  compressed data is corrupt
  SELECT * FROM array_op_test WHERE t @> '{72908}' ORDER BY seqno;
   seqno |   i   |  
   t
  
  
---+---+

==

*** 
/var/lib/jenkins/jobs/postgresql_commitfest_world/workspace/src/test/regress/expected/polymorphism.out
  2013-11-13 20:20:40.0 +
--- 
/var/lib/jenkins/jobs/postgresql_commitfest_world/workspace/src/test/regress/results/polymorphism.out
   2013-12-07 16:21:55.0 +
***
*** 638,648 
  -- check that we can apply functions taking ANYARRAY to pg_stats
  select distinct array_ndims(histogram_bounds) from pg_stats
  where histogram_bounds is not null;
!  array_ndims 
! -
!1
! (1 row)
! 
  -- such functions must protect themselves if varying element type isn't OK
  -- (WHERE clause here is to avoid possibly getting a collation error instead)
  select max(histogram_bounds) from pg_stats where tablename = 'pg_am';
--- 638,644 
  -- check that we can apply functions taking ANYARRAY to pg_stats
  select distinct array_ndims(histogram_bounds) from pg_stats
  where histogram_bounds is not null;
! ERROR:  compressed data is corrupt
  -- such functions must protect themselves if varying element type isn't OK
  -- (WHERE clause here is to avoid possibly getting a collation error instead)
  select max(histogram_bounds) from pg_stats where tablename = 'pg_am';

==


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cube extension kNN support

2013-12-11 Thread Peter Eisentraut
cube.c: In function ‘g_cube_distance’:
cube.c:1453:2: warning: ‘retval’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
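
For what it's worth, the general shape of that warning and of the usual fix
looks like the generic example below (not the actual g_cube_distance code):
if some branch can reach the return without assigning the variable, either
initialize it up front or make the default branch handle the case explicitly.

/*
 * Generic -Wmaybe-uninitialized illustration; not cube.c.  Without the
 * "= 0.0" initializer (or an exhaustive switch), gcc cannot prove that
 * retval is assigned on every path and warns.
 */
#include <stdio.h>

static double
distance_for_strategy(int strategy, double a, double b)
{
    double  retval = 0.0;       /* initialize so no path returns garbage */

    switch (strategy)
    {
        case 1:
            retval = a - b;
            break;
        case 2:
            retval = b - a;
            break;
        default:
            /* unknown strategy: report it instead of returning junk */
            fprintf(stderr, "unrecognized strategy number: %d\n", strategy);
            break;
    }
    return retval;
}

int
main(void)
{
    printf("%.1f\n", distance_for_strategy(1, 5.0, 2.0));
    return 0;
}
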


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let us fix the documentation

2013-12-11 Thread AK
Oops, did not notice that - thank you!



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Let-us-fix-the-documentation-tp5782999p5783002.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-11 Thread Andres Freund
On 2013-12-11 14:17:27 -0500, Robert Haas wrote:
> > But in which cases would that actually be slower? There'll be no
> > additional code executed if the hint bits for frozen are set, and in
> > case not it will usually save us an external function call to
> > TransactionIdPrecedes().
> 
> Dunno.  It's at least more code generation.

IMO it looks cleaner and is less error prone, so an additional
instruction or two in the slow path doesn't seem a high price. And we're
really not talking about more.

But I won't do more than roll my eyes loudly if you decide to go ahead
with your version as long as you change rewriteheap.c accordingly.

> >>> > I think once we have this we should start opportunistically try to
> >> > freeze tuples during vacuum using OldestXmin instead of FreezeLimit if
> >> > the page is already dirty.
> >>
> >> Separate patch, but yeah, something like that.  If we have to mark the
> >> page all-visible, we might as well freeze it while we're there.  We
> >> should think about how it interacts with Heikki's freeze-without-write
> >> patch though.
> >
> > Definitely separate yes. And I agree, it's partially moot if Heikki's
> > patch gets in, but I am not sure it will make it into 9.4. There seems
> > to be quite some work left.
> 
> I haven't heard anything further from Heikki, so I'm thinking we
> should proceed with this approach.

Yea, I think by now it's pretty unlikely to get into 9.4. I hope for
early 9.5 tho.

What's your plan to commit this? I'd prefer to wait till Alvaro's
freezing changes get in, so his patch will look the same in HEAD and
9.3. But I think he plans to commit soon.

> If we also
> handle the vacuum-dirtied-it-already case as you propose here, I think
> we'd have quite a respectable improvement in vacuum behavior for 9.4,
> even without Heikki's stuff.

Yes, and it seems like a realistic goal, so let's go for it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let us fix the documentation

2013-12-11 Thread Erikjan Rijkers
On Wed, December 11, 2013 22:51, AK wrote:
> The following url seems to be slightly incorrect:
>
> http://www.postgresql.org/docs/9.3/static/sql-prepare.html
>
> PREPARE usrrptplan (int) AS
> SELECT * FROM users u, logs l WHERE u.usrid=$1 AND u.usrid=l.usrid
> AND l.date = $2;
> EXECUTE usrrptplan(1, current_date);
>
> I guess the first line of the example should be:
>
> PREPARE usrrptplan (int, date) AS
>
> What do you think?
>

read the next line:

"Note that the data type of the second parameter is not specified, so it is 
inferred from the context in which $2 is used."

(So the example is correct as is)




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Let us fix the documentation

2013-12-11 Thread AK
The following url seems to be slightly incorrect:

http://www.postgresql.org/docs/9.3/static/sql-prepare.html

PREPARE usrrptplan (int) AS
SELECT * FROM users u, logs l WHERE u.usrid=$1 AND u.usrid=l.usrid
AND l.date = $2;
EXECUTE usrrptplan(1, current_date);

I guess the first line of the example should be:

PREPARE usrrptplan (int, date) AS

What do you think?



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Let-us-fix-the-documentation-tp5782999.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reference to parent query from ANY sublink

2013-12-11 Thread Kevin Grittner
Tom Lane  wrote:

> FWIW, that plan isn't obviously wrong; if it is broken, most
> likely the reason is that the HashAggregate is incorrectly
> unique-ifying the lower table.  (Unfortunately, EXPLAIN doesn't
> show enough about the HashAgg to know what it's doing exactly.)

Yeah, I found myself wishing for an EXPLAIN option that would show
that.

> The cost of the HashAggregate is estimated higher, though, which
> suggests that maybe it's distinct'ing on two columns where the
> bogus plan only does one.

FWIW, I noticed that the actual row counts suggested that, too.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time-Delayed Standbys

2013-12-11 Thread Andres Freund
On 2013-12-11 16:37:54 -0200, Fabrízio de Royes Mello wrote:
> On Wed, Dec 11, 2013 at 6:27 AM, Simon Riggs  wrote:
> > >> I think this feature will be used in a lot of scenarios in
> > >> which PITR is currently used.
> > >
> > > We have to judge which is better: getting something potentially useful or
> > > protecting against mistakes.
> > > And we had better wait for the author's comment...
> >
> > I'd say just document that it wouldn't make sense to use it for PITR.
> >
> > There may be some use case we can't see yet, so specifically
> > prohibiting a use case that is not dangerous seems too much at this
> > point. I will no doubt be reminded of these words in the future...

> I tend to agree with Simon, but I confess that I don't like delaying a
> server with standby_mode = 'off'.

> The main goal of this patch is to delay the Streaming Replication, so if the
> slave server isn't a hot standby I think it makes no sense to delay it.

> Mitsumasa suggested adding "StandbyModeRequested" to the conditional branch to
> skip this situation. I agree with him!

I don't think that position has any merit, sorry: Think about the way
this stuff gets setup. The user creates a new basebackup (pg_basebackup,
manual pg_start/stop_backup, shutdown primary). Then he creates a
recovery conf by either starting from scratch, using
--write-recovery-conf or by copying recovery.conf.sample. In none of
these cases delay will be configured.

So, with that in mind, the only way it could have been configured is by
the user *explicitly* writing it into recovery.conf. And now you want
to react to this explicit step by just *silently* ignoring the setting
based on some random criteria (arguments have been made about
hot_standby=on/off, standby_mode=on/off which aren't directly
related). Why on earth would that be a usability improvement?

Also, you seem to assume there's no point in configuring it for any of
hot_standby=off, standby_mode=off, recovery_target=*. Why? There's
usecases for all of them:
* hot_standby=off: Makes delay useable with wal_level=archive (and thus
  a lower WAL volume)
* standby_mode=off: Configurations that use tools like pg_standby and
  similar simply don't need standby_mode=on. If you want to trigger
  failover from within the restore_command you *cannot* set it.
* recovery_target_*: It can still make sense if you use
  pause_at_recovery_target.

In which scenarios does your restriction actually improve anything?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Simon Riggs
On 11 December 2013 19:54, Josh Berkus  wrote:
> On 12/11/2013 11:37 AM, Simon Riggs wrote:
>> On 11 December 2013 17:57, Robert Haas  wrote:
>>
>>> Extensive testing will be needed to prove
>>> that the new algorithm doesn't perform worse than the current
>>> algorithm in any important cases.
>>
>> Agreed, but the amount of testing seems equivalent in both cases,
>> assuming we weren't going to skip it for this patch.
>
> No performance testing is required for this patch.  The effect of memory
> limits on vacuum are already well-known and well-understood.

Yes, I wrote the patch that you use to tune autovacuum. Not everybody
agreed with me then either about whether we need it, so I'm used to
people questioning my thinking and am happy when people do.


>> With considerable regret, I don't see how this solves the problem at
>> hand. We can and should do better.
>
> I strongly disagree.  The problem we are dealing with currently is that
> two resource limits which should have *always* been independent of each
> other are currently conflated into a single GUC variable.  This forces
> users to remember to set maintenance_work_mem interactively every time
> they want to run a manual VACUUM, because the setting in postgresql.conf
> is needed to tune autovacuum.

I understand the general argument and it sounds quite cool, I agree. I
am all for user control.

But nobody has given a sensible answer to my questions, other than to
roll out the same general points again. In practice, its a knob that
does not do very much of use for us. At best it is addressing the
symptoms, not the cause. IMHO.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Gavin Flower

On 12/12/13 09:12, Gavin Flower wrote:

On 12/12/13 08:39, Gavin Flower wrote:

On 12/12/13 08:31, Kevin Grittner wrote:

Gavin Flower  wrote:


For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes,
using 400 byte pages.  In the pathologically worst case, assuming
maximum packing density and no page has both types: the large rows would
occupy 500 pages and the smaller rows 50 pages. So if one selected 11
pages at random, you get about 10 pages of large rows and about one for
small rows!

With 10 * 2 = 20 large rows, and 1 * 20 = 20 small rows.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sorry, I've simply come up with well argued nonsense!

Kevin, you're dead right.


Cheers,
Gavin



I looked at:
http://www.postgresql.org/docs/current/interactive/storage-page-layout.html 

this says that each row has an overhead, which suggests there should 
be a bias towards small rows.


There must be a lot of things going on, that I'm simply not aware of, 
that affect sampling bias...



Cheers,
Gavin


Ignoring overheads per row and other things... There will be a biasing
effect when the distribution of sizes is not symmetric.  For example:
when the majority of rows have sizes greater than the arithmetic mean, 
then most samples will be biased towards larger rows. Similarly there 
could be a bias towards smaller rows when most rows are smaller than the 
arithmetic mean.  Yes, I did think about this in depth - but it is way 
too complicated to attempt to quantify the bias, because it depends on 
too many factors (even just limiting it to the distribution of row sizes).


So apart from the volatility of the table and the pattern of
insertions/updates/deletes, there will be a bias depending on the
distribution of values in the table.


So I despair that a simple, elegant and practical algorithm will ever be
found.


Therefore, I expect the best answer is probably some kind of empirical 
adaptive approach - which I think has already been suggested.



Cheers,
Gavin




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reference to parent query from ANY sublink

2013-12-11 Thread Tom Lane
Kevin Grittner  writes:
> Kevin Grittner  wrote:
>> I applied it to master and ran the regression tests, and one of
>> the subselect tests failed.
>> 
>> This query:
>> 
>> SELECT '' AS six, f1 AS "Correlated Field", f3 AS "Second Field"
>>    FROM SUBSELECT_TBL upper
>>    WHERE f1 IN
>>  (SELECT f2 FROM SUBSELECT_TBL WHERE CAST(upper.f2 AS float) = f3);

>> [ ... ] during the `make check` or `make install-check` [ ... ]
>> is missing the last two rows.  Oddly, if I go into the database
>> later and try it, the rows show up.  It's not immediately
>> apparent to me what's wrong.

> Using the v2 patch, with the default statistics from table
> creation, the query modified with an alias of "lower" for the
> second reference, just for clarity, yields a plan which generates
> incorrect results:

>  Hash Join  (cost=37.12..80.40 rows=442 width=12) (actual time=0.059..0.064 rows=3 loops=1)
>    Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = lower.f2))
>    ->  Seq Scan on subselect_tbl upper  (cost=0.00..27.70 rows=1770 width=16) (actual time=0.006..0.007 rows=8 loops=1)
>    ->  Hash  (cost=34.12..34.12 rows=200 width=12) (actual time=0.020..0.020 rows=5 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 1kB
>          ->  HashAggregate  (cost=32.12..34.12 rows=200 width=12) (actual time=0.014..0.018 rows=6 loops=1)
>                ->  Seq Scan on subselect_tbl lower  (cost=0.00..27.70 rows=1770 width=12) (actual time=0.002..0.004 rows=8 loops=1)
>  Total runtime: 0.111 ms

FWIW, that plan isn't obviously wrong; if it is broken, most likely the
reason is that the HashAggregate is incorrectly unique-ifying the lower
table.  (Unfortunately, EXPLAIN doesn't show enough about the HashAgg
to know what it's doing exactly.)  The given query is, I think, in
principle equivalent to

 SELECT ...
  FROM SUBSELECT_TBL upper
  WHERE (f1, f2::float) IN
(SELECT f2, f3 FROM SUBSELECT_TBL);

and if you ask unmodified HEAD to plan that you get

 Hash Join  (cost=41.55..84.83 rows=442 width=16)
   Hash Cond: ((upper.f1 = subselect_tbl.f2) AND ((upper.f2)::double precision = subselect_tbl.f3))
   ->  Seq Scan on subselect_tbl upper  (cost=0.00..27.70 rows=1770 width=16)
   ->  Hash  (cost=38.55..38.55 rows=200 width=12)
         ->  HashAggregate  (cost=36.55..38.55 rows=200 width=12)
               ->  Seq Scan on subselect_tbl  (cost=0.00..27.70 rows=1770 width=12)

which is the same thing at the visible level of detail ... but this
version computes the correct result.  The cost of the HashAggregate is
estimated higher, though, which suggests that maybe it's distinct'ing on
two columns where the bogus plan only does one.

Not sure about where Antonin's patch is going off the rails.  I suspect
it's too simple somehow, but it's also possible that it's OK and the
real issue is some previously undetected bug in LATERAL processing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Josh Berkus
On 12/11/2013 12:40 PM, Tom Lane wrote:
> Josh Berkus  writes:
>> And, for that matter, accepting this patch by no means blocks doing
>> something more sophisticated in the future.
> 
> Yeah.  I think the only real argument against it is "do we really need
> yet another knob?".  Since Josh, who's usually the voicer of that
> argument, is for this one, I don't have a problem with it.

This passes the "is it a chronic problem not to have a knob for this?" test.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Tom Lane
Josh Berkus  writes:
> And, for that matter, accepting this patch by no means blocks doing
> something more sophisticated in the future.

Yeah.  I think the only real argument against it is "do we really need
yet another knob?".  Since Josh, who's usually the voicer of that
argument, is for this one, I don't have a problem with it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension Templates S03E11

2013-12-11 Thread Robert Haas
On Wed, Dec 11, 2013 at 2:49 PM, Dimitri Fontaine
 wrote:
> Robert Haas  writes:
>>> You've got that backwards.  We do have the luxury of rejecting new
>>> features until people are generally satisfied that the basic design is
>>> right.  There's no overlord decreeing that this must be in 9.4.
>>
>> I strongly agree.  PostgreSQL has succeeded because we try not to do
>> things at all until we're sure we know how to do them right.
>
> I still agree to the principle, or I wouldn't even try. Not in details,
> because the current design passed all the usual criteria a year ago.
>
>   http://www.postgresql.org/message-id/6466.1354817...@sss.pgh.pa.us
>
>> I can certainly understand Dimitri's frustration, in that he's written
>> several versions of this patch and none have been accepted.  But what
>
> The design was accepted, last year. It took a year to review it, which
> is fair enough, only to find new problems again. Circles at their best.
> You just said on another thread that perfect is the enemy of good. What
> about applying the same line of thoughts to this patch?

Sure.  For every patch that gets submitted, we have to decide whether
it represents an improvement over where we are today, or not.  For the
record:

1. The patch you're talking about is 2 or 3 orders of magnitude less
complicated than this one, and it is pretty easy to see that it will
not paint us into a corner.  It also happens to fix what I've long
considered a deficiency in PostgreSQL.  I think it is legitimate for
me to have more confidence in that patch than this one.

2. I explicitly did not review this patch for the precise reason that
I thought it needed a fresh pair of eyes.  You and I have not always
seen eye to eye on this and other extension-related patches, and I
don't really want to be the guy who shoots down your patches every
year.  I was prepared to sit still for this if Stephen felt it was OK.
 But after both Stephen and Heikki expressed concerns about the
approach, I decided to say that I found those concerns valid also.

I happen to think that Stephen did a very good job of explaining why
blobs in the catalog could be a very bad plan.  Specifically, I
believe he mentioned that it creates a dump/restore hazard.  If a new
keyword gets added in a new server version, a logical dump of the
extension made by a new pg_dump against the old server version will
restore properly on the new version.  Dumping and restoring the blobs
and re-executing them on the new version may fail.  I had not thought of this
issue when this was last discussed, or at least I don't remember
having thought of it, and based on Tom's having endorsed the previous
design, I'm guessing he didn't think of it at the time, either.

I think that Stephen's other points about duplicating data, etc. are
somewhat valid as well, but I am particularly sensitive about dump and
restore hazards.  I don't think you will find any time in the history
of this project when I endorsed any change that would create more of
them or declined to endorse fixing them, and if you do it was probably
in my first year of involvement when I did not understand so well as I
do now how much pain such problems create.  Users are remarkably
skilled at finding these bugs; it's been less then two weeks since we
fixed the most recent one; and they cause a lot of pan.  The only
saving grace is that, up until now, we've pretty much always been able
to patch them by changing pg_dump(all).  The problems that this patch
would create can't be fixed that way, though: you'd have to manually
hack up the blobs stored in the catalog, or manually edit the
dumpfile.  That's not good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Gavin Flower

On 12/12/13 08:39, Gavin Flower wrote:

On 12/12/13 08:31, Kevin Grittner wrote:

Gavin Flower  wrote:


For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes,
using 400 byte pages.  In the pathologically worst case, assuming
maximum packing density and no page has both types: the large rows would
occupy 500 pages and the smaller rows 50 pages. So if one selected 11
pages at random, you get about 10 pages of large rows and about one for
small rows!

With 10 * 2 = 20 large rows, and 1 * 20 = 20 small rows.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sorry, I've simply come up with well argued nonsense!

Kevin, you're dead right.


Cheers,
Gavin



I looked at:
http://www.postgresql.org/docs/current/interactive/storage-page-layout.html
this says that each row has an overhead, which suggests there should be 
a bias towards small rows.


There must be a lot of things going on, that I'm simply not aware of, 
that affect sampling bias...



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reference to parent query from ANY sublink

2013-12-11 Thread Kevin Grittner
Kevin Grittner  wrote:

> I applied it to master and ran the regression tests, and one of
> the subselect tests failed.
>
> This query:
>
> SELECT '' AS six, f1 AS "Correlated Field", f3 AS "Second Field"
>   FROM SUBSELECT_TBL upper
>   WHERE f1 IN
> (SELECT f2 FROM SUBSELECT_TBL WHERE CAST(upper.f2 AS float) = f3);

> [ ... ] during the `make check` or `make install-check` [ ... ]
> is missing the last two rows.  Oddly, if I go into the database
> later and try it, the rows show up.  It's not immediately
> apparent to me what's wrong.

Using the v2 patch, with the default statistics from table
creation, the query modified with an alias of "lower" for the
second reference, just for clarity, yields a plan which generates
incorrect results:

 Hash Join  (cost=37.12..80.40 rows=442 width=12) (actual time=0.059..0.064 rows=3 loops=1)
   Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = lower.f2))
   ->  Seq Scan on subselect_tbl upper  (cost=0.00..27.70 rows=1770 width=16) (actual time=0.006..0.007 rows=8 loops=1)
   ->  Hash  (cost=34.12..34.12 rows=200 width=12) (actual time=0.020..0.020 rows=5 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  HashAggregate  (cost=32.12..34.12 rows=200 width=12) (actual time=0.014..0.018 rows=6 loops=1)
               ->  Seq Scan on subselect_tbl lower  (cost=0.00..27.70 rows=1770 width=12) (actual time=0.002..0.004 rows=8 loops=1)
 Total runtime: 0.111 ms

As soon as there is a VACUUM and/or ANALYZE it generates a plan
more like what the OP was hoping for:

 Hash Semi Join  (cost=1.20..2.43 rows=6 width=12) (actual time=0.031..0.036 rows=5 loops=1)
   Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = lower.f2))
   ->  Seq Scan on subselect_tbl upper  (cost=0.00..1.08 rows=8 width=16) (actual time=0.004..0.007 rows=8 loops=1)
   ->  Hash  (cost=1.08..1.08 rows=8 width=12) (actual time=0.012..0.012 rows=7 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on subselect_tbl lower  (cost=0.00..1.08 rows=8 width=12) (actual time=0.003..0.005 rows=8 loops=1)
 Total runtime: 0.074 ms

By comparison, without the patch this is the plan:

 Seq Scan on subselect_tbl upper  (cost=0.00..5.59 rows=4 width=12) (actual time=0.022..0.037 rows=5 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 3
   SubPlan 1
     ->  Seq Scan on subselect_tbl lower  (cost=0.00..1.12 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=8)
           Filter: ((upper.f2)::double precision = f3)
           Rows Removed by Filter: 4
 Total runtime: 0.066 ms

When I run the query with fresh statistics and without EXPLAIN both
ways, the unpatched is consistently about 10% faster.

So pulling up the subquery can yield an incorrect plan, and even
when it yields the "desired" plan with the semi-join it is
marginally slower than using the subplan, at least for this small
data set.  I think it's an interesting idea, but it still needs
work.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension Templates S03E11

2013-12-11 Thread Stephen Frost
Dimitri,

* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
> Stephen Frost  writes:
> > The extra catalog tables which store SQL scripts in text columns is one
> > of my main objections to the as-proposed Extension Templates.  I view
> > those scripts as a poor man's definition of database objects which are
> > defined properly in the catalog already.
> 
> I have a very hard time to understand this objection.
> 
> PL/SQL functions are just a SQL script stored as-is in the catalogs.
> That applies the same way to any other PL language too, with scripts
> stored as-is in the catalogs in different languages.

Sure- but in those cases only the actual function (which is, by
definition, for an *interpreted* language..) is stored as text, not
the definition of the function (eg: the CREATE FUNCTION statement), nor
all of the metadata, dependency information, etc.  Also, what you're
proposing would result in having *both* in the same catalog- the
canonical form defined in pg_proc and friends, and the SQL text blob in
the extension template catalog and I simply do not see value in that.

> So while I hear your objection to the "script in catalog" idea Stephen,
> I think we should move forward. We don't have the luxury of only
> applying patches where no compromise has to be made, where everyone is
> fully happy with the solution we find as a community.

I understand that you wish to push this forward regardless of anyone's
concerns.  While I appreciate your frustration and the time you've spent
on this, that isn't going to change my opinion of this approach.

> >  The other big issue is that
> > there isn't an easy way to see how we could open up the ability to
> > create extensions to non-superusers with this approach.
> 
> The main proposal here is to only allow the owner of a template to
> install it as an extension. For superusers, we can implement the needed
> SET ROLE command automatically in the CREATE EXTENSION command.
> 
> Is there another security issue that this “same role” approach is not
> solving? I don't think so.

This isn't kind, and for that I'm sorry, but this feels, to me, like a
very hand-wavey "well, I think this would solve all the problems" answer
to the concerns raised.  I can't answer offhand if this would really
solve all of the issues because I've not tried to implement it or test
it out, but I tend to doubt that it would.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Josh Berkus
On 12/11/2013 11:37 AM, Simon Riggs wrote:
> On 11 December 2013 17:57, Robert Haas  wrote:
>
>> Extensive testing will be needed to prove
>> that the new algorithm doesn't perform worse than the current
>> algorithm in any important cases.
>
> Agreed, but the amount of testing seems equivalent in both cases,
> assuming we weren't going to skip it for this patch.

No performance testing is required for this patch.  The effect of memory
limits on vacuum are already well-known and well-understood.

> With considerable regret, I don't see how this solves the problem at
> hand. We can and should do better.

I strongly disagree.  The problem we are dealing with currently is that
two resource limits which should have *always* been independent of each
other are currently conflated into a single GUC variable.  This forces
users to remember to set maintenance_work_mem interactively every time
they want to run a manual VACUUM, because the setting in postgresql.conf
is needed to tune autovacuum.

In other words, we are having an issue with *non-atomic data*, and this
patch partially fixes that.

Would it be better to have an admissions-control policy engine for
launching autovacuum which takes into account available RAM, estimated
costs of concurrent vacuums, current CPU activity, and which tables are
in cache?  Yes.  And if you started on that now, you might have it ready
for 9.5.

And, for that matter, accepting this patch by no means blocks doing
something more sophisticated in the future.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Robert Haas
On Wed, Dec 11, 2013 at 2:37 PM, Simon Riggs  wrote:
> On 11 December 2013 17:57, Robert Haas  wrote:
>> Extensive testing will be needed to prove
>> that the new algorithm doesn't perform worse than the current
>> algorithm in any important cases.
>
> Agreed, but the amount of testing seems equivalent in both cases,
> assuming we weren't going to skip it for this patch.
>
> Let me repeat the question, so we are clear...
> In what circumstances will the memory usage from multiple concurrent
> VACUUMs become a problem? In those circumstances, reducing
> autovacuum_work_mem will cause more passes through indexes, dirtying
> more pages and elongating the problem workload. I agree that multiple
> concurrent VACUUMs could be a problem but this
> doesn't solve that, it just makes things worse.

That's not the problem the patch is designed to solve.  It's intended
for the case where you want to allow more or less memory to autovacuum
than you do for index builds.  There's no principled reason that
anyone should want those things to be the same.  It is not difficult
to imagine situations in which you would want one set to a very
different value than the other.  In particular it seems quite likely
to me that the amount of memory appropriate for index builds might be
vastly more than is needed by autovacuum.  For example, in a
data-warehousing environment where updates are rare but large index
builds by the system's sole user are frequent, someone might want to
default index builds to 64GB of RAM (especially after Noah's patch to
allow huge allocations for the tuple array while sorting) but only
need 256MB for autovacuum.

In general, I'm reluctant to believe that Peter proposed this patch
just for fun.  I assume this is a real-world problem that Heroku
encounters in their environment.  If not, well then that's different.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension Templates S03E11

2013-12-11 Thread Dimitri Fontaine
Robert Haas  writes:
>> You've got that backwards.  We do have the luxury of rejecting new
>> features until people are generally satisfied that the basic design is
>> right.  There's no overlord decreeing that this must be in 9.4.
>
> I strongly agree.  PostgreSQL has succeeded because we try not to do
> things at all until we're sure we know how to do them right.

I still agree to the principle, or I wouldn't even try. Not in details,
because the current design passed all the usual criteria a year ago.

  http://www.postgresql.org/message-id/6466.1354817...@sss.pgh.pa.us

> I can certainly understand Dimitri's frustration, in that he's written
> several versions of this patch and none have been accepted.  But what

The design was accepted, last year. It took a year to review it, which
is fair enough, only to find new problems again. Circles at their best.
You just said on another thread that perfect is the enemy of good. What
about applying the same line of thought to this patch?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] -d option for pg_isready is broken

2013-12-11 Thread Tom Lane
Robert Haas  writes:
> On Wed, Dec 11, 2013 at 2:29 PM, Tom Lane  wrote:
>> More generally, if we do go over in 9.4 to the position that PQhost
>> reports the host parameter and nothing but, I'm not sure that introducing
>> a third behavior into the back branches is something that anybody will
>> thank us for.

> It doesn't seem very plausible to say that we're just going to
> redefine it that way, unless we're planning to bump the soversion.

Well, we didn't bump the soversion (nor touch the documentation)
in commit f6a756e4, which is basically what I'm suggesting we ought
to revert.  It was nothing but a quick hack at the time, and hindsight
is saying it was a bad idea.  Admittedly, it was long enough ago that
there might be some grandfather status attached to the current behavior;
but that argument can't be made for changing its behavior still further.

> But maybe we should decide what we *are* going to do in master first,
> before deciding what to back-patch.

Right.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Gavin Flower

On 12/12/13 08:31, Kevin Grittner wrote:

Gavin Flower  wrote:


For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes,
using 400 byte pages.  In the pathologically worst case, assuming
maximum packing density and no page has both types: the large rows would
occupy  500 pages and the smaller rows 50 pages. So if one selected 11
pages at random, you get about 10 pages of large rows and about one for
small rows!

With 10 * 2 = 20 large rows, and 1 * 20 = 20 small rows.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sorry, I've simply come up with well argued nonsense!

Kevin, you're dead right.


Cheers,
Gavin




Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 4:48 PM, Peter Geoghegan  wrote:
> Why would I even mention that to a statistician? We want guidance. But
> yes, I bet I could give a statistician an explanation of statistics
> target that they'd understand without too much trouble.

Actually, I think that if we told a statistician about the statistics
target, his or her response would be: why would you presume to know
ahead of time what statistics target is going to be effective? I
suspect that the basic problem is that it isn't adaptive. I think that
if we could somehow characterize the quality of our sample as we took
it, and then cease sampling when we reached a certain degree of
confidence in its quality, that would be helpful. It might not even
matter that the sample was clustered from various blocks.
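For reference, a minimal SQL sketch of how the fixed, non-adaptive target is
specified today (table and column names here are illustrative):

-- per-column override of the statistics target
ALTER TABLE t ALTER COLUMN i SET STATISTICS 500;
-- default used by ANALYZE when no per-column target is set
SET default_statistics_target = 100;
ANALYZE t;

Either way the number is picked up front rather than derived from the data,
which is exactly the non-adaptive part.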


-- 
Peter Geoghegan




Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Gavin Flower

On 12/12/13 08:14, Gavin Flower wrote:

On 12/12/13 07:22, Gavin Flower wrote:

On 12/12/13 06:22, Tom Lane wrote:

I wrote:
Hm.  You can only take N rows from a block if there actually are at least
N rows in the block.  So the sampling rule I suppose you are using is
"select up to N rows from each sampled block" --- and that is going to
favor the contents of blocks containing narrower-than-average rows.

Oh, no, wait: that's backwards.  (I plead insufficient caffeine.)
Actually, this sampling rule discriminates *against* blocks with
narrower rows.  You previously argued, correctly I think, that
sampling all rows on each page introduces no new bias because row
width cancels out across all sampled pages.  However, if you just
include up to N rows from each page, then rows on pages with more
than N rows have a lower probability of being selected, but there's
no such bias against wider rows.  This explains why you saw smaller
values of "i" being undersampled.

Had you run the test series all the way up to the max number of
tuples per block, which is probably a couple hundred in this test,
I think you'd have seen the bias go away again.  But the takeaway
point is that we have to sample all tuples per page, not just a
limited number of them, if we want to change it like this.

regards, tom lane


Surely we want to sample a 'constant fraction' (obviously, in 
practice you have to sample an integral number of rows in a page!) of 
rows per page? The simplest way, as Tom suggests, is to use all the 
rows in a page.


However, if you wanted the same number of rows from a greater number 
of pages, you could (for example) select a quarter of the rows from 
each page.  In which case, when this is a fractional number: take the 
integral number of rows, plus one extra row with a probability equal 
to the fraction (here 0.25).


Either way, if it is determined that you need N rows, then keep 
selecting pages at random (but never use the same page more than 
once) until you have at least N rows.



Cheers,
Gavin




Yes, the fraction/probability could actually be one of: 0.25, 0.50, or 0.75.

But there is a bias introduced by the arithmetic average size of the
rows in a page: block sampling favours large rows, because they occupy
a larger proportion of the pages.


For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes, 
using 400 byte pages.  In the pathologically worst case, assuming 
maximum packing density and no page has both types: the large rows 
would occupy  500 pages and the smaller rows 50 pages. So if one 
selected 11 pages at random, you get about 10 pages of large rows and 
about one for small rows!  In practice, it would be much less extreme 
- for a start, not all blocks will be fully packed, most blocks would 
have both types of rows, and there is usually greater variation in row 
size - but still a bias towards sampling larger rows.  So somehow, 
this bias needs to be counteracted.



Cheers,
Gavin

Actually, I just thought of a possible way to overcome the bias towards 
large rows.


1. Calculate (a rough estimate may be sufficient, if not too 'rough')
   the size of the smallest row.

2. Select a page at random (never selecting the same page twice)

3. Then select rows at random within the page (never selecting the same
   row twice).  For each row selected, accept it with a probability
   equal to (size of smallest row)/(size of selected row).  I think you'll
   find that will almost completely offset the bias towards larger rows!
   (A rough SQL sketch of this acceptance test follows below.)

4. If you do not have sufficient rows, and you still have pages not yet
   selected, goto 2

Note that it will be normal for some pages not to have any rows 
selected, especially for large tables!
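A rough SQL sketch of the acceptance test in step 3, ignoring the
page-selection steps and assuming the smallest row size has already been
estimated at 32 bytes (table name and size are illustrative):

-- accept each candidate row with probability
-- (size of smallest row) / (size of this row);
-- pg_column_size(t.*) reports the stored size of the whole row
SELECT *
FROM t
WHERE random() < 32::float8 / pg_column_size(t.*);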



Cheers,
Gavin

 P.S.
I really need to stop thinking about this problem, and get on with my 
assigned project!!!





Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Simon Riggs
On 11 December 2013 17:57, Robert Haas  wrote:

> Extensive testing will be needed to prove
> that the new algorithm doesn't perform worse than the current
> algorithm in any important cases.

Agreed, but the amount of testing seems equivalent in both cases,
assuming we weren't going to skip it for this patch.

Let me repeat the question, so we are clear...
In what circumstances will the memory usage from multiple concurrent
VACUUMs become a problem? In those circumstances, reducing
autovacuum_work_mem will cause more passes through indexes, dirtying
more pages and elongating the problem workload. I agree that multiple
concurrent VACUUMs could be a problem but this
doesn't solve that, it just makes things worse.

It looks like the *only* time this parameter would have any effect is
when it will make matters worse.

With considerable regret, I don't see how this solves the problem at
hand. We can and should do better.
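As a rough way to put numbers on the memory question above - assuming about
6 bytes of maintenance memory per dead-tuple pointer, which is a
simplification - something like this shows what individual workers could ask
for:

SELECT relname,
       n_dead_tup,
       pg_size_pretty(n_dead_tup * 6) AS approx_memory_for_one_index_pass
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;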

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] -d option for pg_isready is broken

2013-12-11 Thread Robert Haas
On Wed, Dec 11, 2013 at 2:29 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Dec 11, 2013 at 2:10 PM, Tom Lane  wrote:
>>> Robert Haas  writes:
>>>> Well, returning /tmp on Windows is just stupid.  I don't see why we
>>>> should feel bad about changing that.  A bug is a bug.
>
>>> What I was suggesting was we should take out the pgunixsocket fallback,
>>> not make it even more complicated.  That probably implies that we need
>>> still another accessor function to get the socket path.
>
>> Well, I guess.  I have a hard time seeing whatever rejiggering we want
>> to do in master as a reason not to back-patch that fix, though.
>
> I guess as long as the pgunixsocket thing is in there, it makes sense
> to substitute DefaultHost for it on Windows, but are we sure that's
> something to back-patch?

Well, it seems like a clear case of returning a ridiculous value, but
I'm willing to be talked out of it if someone can explain how it would
break things.  I guess it's possible someone could have code out there
that tests for the exact value /tmp and does something based on that,
but that seems a stretch - and if they did have such code, it would
probably just handle it by substituting localhost anyway.

> Right now, as I was saying, PQhost is in some gray area where it's not too
> clear what its charter is.  It's not "what was the host parameter", for
> sure, but we haven't tried to make it an accurate description of the
> connection either.  It's a bit less accurate on Windows than elsewhere,
> but do we want to risk breaking anything to only partially resolve that?

I guess it depends on how risky we think it is.

> More generally, if we do go over in 9.4 to the position that PQhost
> reports the host parameter and nothing but, I'm not sure that introducing
> a third behavior into the back branches is something that anybody will
> thank us for.

It doesn't seem very plausible to say that we're just going to
redefine it that way, unless we're planning to bump the soversion.
But maybe we should decide what we *are* going to do in master first,
before deciding what to back-patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Peter Geoghegan
On Wed, Dec 11, 2013 at 7:41 AM, Simon Riggs  wrote:
> That's about 2-3 days work and I know Peter can hack it. So the
> situation is not perfection-sought-blocking-good, this is more like
> fairly poor solution being driven through when a better solution is
> available within the time and skills available.

I think that that's a very optimistic assessment of the amount of work
required. Even by the rose-tinted standards of software project time
estimation. A ton of data is required to justify fundamental
infrastructural changes like that.

-- 
Peter Geoghegan




[HACKERS] 9.3 reference constraint regression

2013-12-11 Thread Daniel Wood
In 9.3 I can delete the parent of a parent-child relation if the child row
is an uncommitted insert and I first update the parent.

USER1:
drop table child;
drop table parent;
create table parent (i int, c char(3));
create unique index parent_idx on parent (i);
insert into parent values (1, 'AAA');
create table child (i int references parent(i));

USER2:
BEGIN;
insert into child values (1);

USER1:
BEGIN;
update parent set c=lower(c);
delete from parent;
COMMIT;

USER2:
COMMIT;

Note that the problem also happens if the update is "set i=i".  I was
expecting this update to block, as the UPDATE is on a "unique index" "that
can be used in a foreign key".  The "i=i" update should get an UPDATE lock
and not a "NO KEY UPDATE" lock, as I believe the c=... update does.


Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Kevin Grittner
Gavin Flower  wrote:

> For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes,
> using 400 byte pages.  In the pathologically worst case, assuming
> maximum packing density and no page has both types: the large rows would
> occupy  500 pages and the smaller rows 50 pages. So if one selected 11
> pages at random, you get about 10 pages of large rows and about one for
> small rows!

With 10 * 2 = 20 large rows, and 1 * 20 = 20 small rows.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] -d option for pg_isready is broken

2013-12-11 Thread Tom Lane
Robert Haas  writes:
> On Wed, Dec 11, 2013 at 2:10 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> Well, returning /tmp on Windows is just stupid.  I don't see why we
>>> should feel bad about changing that.  A bug is a bug.

>> What I was suggesting was we should take out the pgunixsocket fallback,
>> not make it even more complicated.  That probably implies that we need
>> still another accessor function to get the socket path.

> Well, I guess.  I have a hard time seeing whatever rejiggering we want
> to do in master as a reason not to back-patch that fix, though.

I guess as long as the pgunixsocket thing is in there, it makes sense
to substitute DefaultHost for it on Windows, but are we sure that's
something to back-patch?

Right now, as I was saying, PQhost is in some gray area where it's not too
clear what its charter is.  It's not "what was the host parameter", for
sure, but we haven't tried to make it an accurate description of the
connection either.  It's a bit less accurate on Windows than elsewhere,
but do we want to risk breaking anything to only partially resolve that?

More generally, if we do go over in 9.4 to the position that PQhost
reports the host parameter and nothing but, I'm not sure that introducing
a third behavior into the back branches is something that anybody will
thank us for.

regards, tom lane




Re: [HACKERS] preserving forensic information when we freeze

2013-12-11 Thread Heikki Linnakangas

On 12/11/2013 09:17 PM, Robert Haas wrote:

On Thu, Nov 21, 2013 at 4:51 PM, Andres Freund  wrote:

On 2013-11-21 15:59:35 -0500, Robert Haas wrote:

Separate patch, but yeah, something like that.  If we have to mark the
page all-visible, we might as well freeze it while we're there.  We
should think about how it interacts with Heikki's freeze-without-write
patch though.


Definitely separate yes. And I agree, it's partially moot if Heikki's
patch gets in, but I am not sure it will make it into 9.4. There seems
to be quite some work left.


I haven't heard anything further from Heikki, so I'm thinking we
should proceed with this approach.


+1. It seems unlikely that my patch is going to make it into 9.4.

- Heikki




Re: [HACKERS] -d option for pg_isready is broken

2013-12-11 Thread Robert Haas
On Wed, Dec 11, 2013 at 2:10 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Dec 11, 2013 at 11:24 AM, Tom Lane  wrote:
>>> In general, I think the definition of these query functions ought to
>>> be "what was the value of this parameter when the connection was made".
>>> As such, I'm not even sure that the pgunixsocket behavior that's in
>>> PQhost now is a good idea, much less that we should extend that hack
>>> to cover DefaultHost.
>
>> Well, returning /tmp on Windows is just stupid.  I don't see why we
>> should feel bad about changing that.  A bug is a bug.
>
> What I was suggesting was we should take out the pgunixsocket fallback,
> not make it even more complicated.  That probably implies that we need
> still another accessor function to get the socket path.

Well, I guess.  I have a hard time seeing whatever rejiggering we want
to do in master as a reason not to back-patch that fix, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Gavin Flower

On 12/12/13 07:22, Gavin Flower wrote:

On 12/12/13 06:22, Tom Lane wrote:

I wrote:
Hm.  You can only take N rows from a block if there actually are at least
N rows in the block.  So the sampling rule I suppose you are using is
"select up to N rows from each sampled block" --- and that is going to
favor the contents of blocks containing narrower-than-average rows.

Oh, no, wait: that's backwards.  (I plead insufficient caffeine.)
Actually, this sampling rule discriminates *against* blocks with
narrower rows.  You previously argued, correctly I think, that
sampling all rows on each page introduces no new bias because row
width cancels out across all sampled pages.  However, if you just
include up to N rows from each page, then rows on pages with more
than N rows have a lower probability of being selected, but there's
no such bias against wider rows.  This explains why you saw smaller
values of "i" being undersampled.

Had you run the test series all the way up to the max number of
tuples per block, which is probably a couple hundred in this test,
I think you'd have seen the bias go away again.  But the takeaway
point is that we have to sample all tuples per page, not just a
limited number of them, if we want to change it like this.

regards, tom lane


Surely we want to sample a 'constant fraction' (obviously, in practice 
you have to sample an integral number of rows in a page!) of rows per 
page? The simplest way, as Tom suggests, is to use all the rows in a 
page.


However, if you wanted the same number of rows from a greater number 
of pages, you could (for example) select a quarter of the rows from 
each page.  In which case, when this is a fractional number: take the 
integral number of rows, plus one extra row with a probability equal to 
the fraction (here 0.25).


Either way, if it is determined that you need N rows, then keep 
selecting pages at random (but never use the same page more than once) 
until you have at least N rows.



Cheers,
Gavin




Yes, the fraction/probability could actually be one of: 0.25, 0.50, or 0.75.

But there is a bias introduced by the arithmetic average size of the
rows in a page: block sampling favours large rows, because they occupy
a larger proportion of the pages.


For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes, 
using 400 byte pages.  In the pathologically worst case, assuming 
maximum packing density and no page has both types: the large rows would 
occupy  500 pages and the smaller rows 50 pages. So if one selected 11 
pages at random, you get about 10 pages of large rows and about one for 
small rows!  In practice, it would be much less extreme - for a start, 
not all blocks will be fully packed, most blocks would have both types 
of rows, and there is usually greater variation in row size - but still 
a bias towards sampling larger rows.  So somehow, this bias needs to be 
counteracted.



Cheers,
Gavin






Re: [HACKERS] preserving forensic information when we freeze

2013-12-11 Thread Robert Haas
On Thu, Nov 21, 2013 at 4:51 PM, Andres Freund  wrote:
> On 2013-11-21 15:59:35 -0500, Robert Haas wrote:
>> > * Should HeapTupleHeaderXminFrozen also check for FrozenTransactionId?
>> >   It seems quite possible that people think they've delt with frozen
>> >   xmin entirely after checking, but they still might get
>> >   FrozenTransactionId back in a pg_upgraded cluster.
>>
>> The reason I originally wrote the patch the way I did, rather than the
>> way that you prefer, is that it minimizes the number of places where
>> we might perform extra tests that are known not to be needed in
>> context.  These code paths are hot.
>
> The patch as sent shouldn't really do that in any of paths I know to be
> hot - it uses *RawXmin() there.
>
>> If you do this sort of thing,  then after macro expansion we may end up with 
>> a lot of things like:
>> (flags & FROZEN) || (rawxid == 2) ? 2 : rawxid.  I want to avoid that.
>
> But in which cases would that actually be slower? There'll be no
> additional code executed if the hint bits for frozen are set, and in
> case not it will usually safe us an external function call to
> TransactionIdPrecedes().

Dunno.  It's at least more code generation.

>>  That macros is intended, specifically, to be a test for flag bits,
>> and I think it should do precisely that.  If that's not what you want,
>> then don't use that macro.
>
> That's a fair argument. Although there's several HeapTupleHeader* macros
> that muck with stuff besides infomask.

Sure, but that doesn't mean they ALL have to.

>> > * Existing htup_details boolean checks contain an 'Is', but
>> >   HeapTupleHeaderXminCommitted, HeapTupleHeaderXminInvalid,
>> >   HeapTupleHeaderXminFrozen don't contain any verb. Not sure.
>>
>> We could say XminIsCommitted, XminIsInvalid, XminIsFrozen, etc.  I
>> don't particularly care for it, but I can see the argument for it.
>
> I don't have a clear preference either, I just noticed the inconsistency
> and wasn't sure whether it was intentional.

It was intentional enough.  :-)

>> > I think once we have this we should start opportunistically trying to
>> > freeze tuples during vacuum using OldestXmin instead of FreezeLimit if
>> > the page is already dirty.
>>
>> Separate patch, but yeah, something like that.  If we have to mark the
>> page all-visible, we might as well freeze it while we're there.  We
>> should think about how it interacts with Heikki's freeze-without-write
>> patch though.
>
> Definitely separate yes. And I agree, it's partially moot if Heikki's
> patch gets in, but I am not sure it will make it into 9.4. There seems
> to be quite some work left.

I haven't heard anything further from Heikki, so I'm thinking we
should proceed with this approach.  It seems to be the path of least
resistance, if not essential, for making CLUSTER freeze everything
automatically, a change almost everyone seems to really want.  Even if
we did have Heikki's stuff, making cluster freeze more aggressively is
still a good argument for doing this.  The pages can then be marked
all-visible (something Bruce is working on) and never need to be
revisited.  Without this, I don't think we can get there.  If we also
handle the vacuum-dirtied-it-already case as you propose here, I think
we'd have quite a respectable improvement in vacuum behavior for 9.4,
even without Heikki's stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] -d option for pg_isready is broken

2013-12-11 Thread Tom Lane
Robert Haas  writes:
> On Wed, Dec 11, 2013 at 11:24 AM, Tom Lane  wrote:
>> In general, I think the definition of these query functions ought to
>> be "what was the value of this parameter when the connection was made".
>> As such, I'm not even sure that the pgunixsocket behavior that's in
>> PQhost now is a good idea, much less that we should extend that hack
>> to cover DefaultHost.

> Well, returning /tmp on Windows is just stupid.  I don't see why we
> should feel bad about changing that.  A bug is a bug.

What I was suggesting was we should take out the pgunixsocket fallback,
not make it even more complicated.  That probably implies that we need
still another accessor function to get the socket path.

regards, tom lane




Re: [HACKERS] Extension Templates S03E11

2013-12-11 Thread Robert Haas
On Wed, Dec 11, 2013 at 10:43 AM, Tom Lane  wrote:
>> So while I hear your objection to the "script in catalog" idea Stephen,
>> I think we should move forward. We don't have the luxury of only
>> applying patches where no compromise has to be made, where everyone is
>> fully happy with the solution we find as a community.
>
> You've got that backwards.  We do have the luxury of rejecting new
> features until people are generally satisfied that the basic design is
> right.  There's no overlord decreeing that this must be in 9.4.

I strongly agree.  PostgreSQL has succeeded because we try not to do
things at all until we're sure we know how to do them right.
Sometimes we lag behind in features or performance as a result of that
- but the upside is that when we say something now works, it does.
Moreover, it means that the number of bad design decisions we're left
to support off into eternity is comparatively small.  Those things are
of great benefit to our community.

I can certainly understand Dimitri's frustration, in that he's written
several versions of this patch and none have been accepted.  But what
that means is that none of those approaches have consensus behind
them, which is another way of saying that, as a community, we really
*don't* know the best way to solve this problem, and our community
policy in that situation is to take no action until we do.  I've
certainly had my own share of disappointments about patches I've
written which I believed, and in some cases still believe, to be
really good work, and I'd really like to be able to force them
through.  But that's not how it works.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] In-Memory Columnar Store

2013-12-11 Thread Kevin Grittner
"k...@rice.edu"  wrote:

> The question I have, which applies to the matview support as
> well, is "How can we transparently substitute usage of the
> in-memory columnar store/matview in a SQL query?".

My take on that regarding matviews is:

(1)  It makes no sense to start work on this without a far more
sophisticated concept of matview "freshness" (or "staleness", as
some products prefer to call it).

(2)  Work on query rewrite to use sufficiently fresh matviews to
optimize the execution of a query and work on "freshness" tracking
are orthogonal to work on incremental maintenance.

I have no plans to work on either matview freshness or rewrite, as
there seems to be several years worth of work to get incremental
maintenance up to a level matching other products.  I welcome
anyone else to take on those other projects.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] stats for network traffic WIP

2013-12-11 Thread Tom Lane
Atri Sharma  writes:
> On Wed, Dec 11, 2013 at 11:12 PM, Peter Eisentraut  wrote:
>> Is there a reason why you can't get this directly from the OS?

> I would say that it's more of a convenience to track the usage directly
> from the database instead of setting up OS infrastructure to store it.

The thing that I'm wondering is why the database would be the right place
to be measuring it at all.  If you've got a network usage problem,
aggregate usage across everything on the server is probably what you
need to be worried about, and PG can't tell you that.

regards, tom lane




Re: [HACKERS] Completing PL support for Event Triggers

2013-12-11 Thread Peter Eisentraut
On 12/11/13, 5:06 AM, Dimitri Fontaine wrote:
> Peter Eisentraut  writes:
>> I think you are mistaken.  My patch includes all changes between your v1
>> and v2 patch.
> 
> I mistakenly remembered that we did remove all the is_event_trigger
> business from the plperl patch too, when it's not the case. Sorry about
> this confusion.
> 
> My vote is for “ready for commit” then.

PL/Perl was committed.

Please update the commit fest entry with your plans about PL/Python.
(Returned with Feedback or move to next CF or close and create a
separate entry?)





Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-11 Thread Tom Lane
Josh Berkus  writes:
> However, it would really be useful to have an extra tag (in addition to
> the ERROR or FATAL) for "If you're seeing this message, something has
> gone seriously wrong on the server."  Just stuff like corruption
> messages, backend crashes, etc.

Right, we've discussed that idea elsewhere; there's a basically orthogonal
classification that needs to happen.  Pretty much all PANICs are high
priority from a DBA's perspective, but only a subset of either FATAL or
ERROR are.  Somebody needs to do the legwork to determine just what kind
of classification scheme we want and propose at least an initial set of
ereports to be so marked.

One thought I had was that we could probably consider the default behavior
(in the absence of any call of an explicit criticality-marking function)
to be like this:
for ereport(), it's critical if a PANIC and otherwise not
for elog(), it's critical if >= ERROR level, otherwise not.
The rationale for this is that we generally use elog for
not-supposed-to-happen cases, so those are probably interesting.
If we start getting complaints about some elog not being so interesting,
we can convert it to an ereport so it can include an explicit marking
call.

regards, tom lane




Re: [HACKERS] -d option for pg_isready is broken

2013-12-11 Thread Robert Haas
On Wed, Dec 11, 2013 at 11:24 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Dec 11, 2013 at 9:35 AM, Andres Freund  
>> wrote:
>>> One use case is accessing a particular host when using DNS round robin
>>> to standbys in combination with SSL.
>
>> Ah, interesting point.  And it's not inconceivable that some
>> application out there could be using PQhost() to retrieve the host
>> from an existing connection object and reusing that value for a new
>> connection, in which case redefining it to sometimes return hostaddr
>> would break things.  So I think we shouldn't do that.
>
> I think the only reasonable way to fix this is to improve the logic
> in psql, not turn PQhost() into a mess with no understandable definition.
> If that means we need to add a separate PQhostaddr() query function,
> so be it.  We won't be able to fix the complained-of bug in back branches,
> but I'd rather live with that (it's basically just cosmetic anyway)
> than risk introducing perhaps-not-so-cosmetic bugs into other existing
> applications.

I can't argue with that.

> In general, I think the definition of these query functions ought to
> be "what was the value of this parameter when the connection was made".
> As such, I'm not even sure that the pgunixsocket behavior that's in
> PQhost now is a good idea, much less that we should extend that hack
> to cover DefaultHost.

Well, returning /tmp on Windows is just stupid.  I don't see why we
should feel bad about changing that.  A bug is a bug.

> There is room also for a function defined as "give me a textual
> description of what I'm connected to", which is not meant to reflect any
> single connection parameter but rather the total behavior.  Right now
> I think PQhost is on the borderline of doing this instead of just
> reporting the "host" parameter, but I think rather than pushing it
> across that border we'd be better off to invent a function explicitly
> charged with doing that.  That would give us room to do something
> actually meaningful with host+hostaddr cases, for instance.  I think
> really what ought to be printed in such cases is something like
> "host-name (address IP-address)"; leaving out the former would be
> unhelpful but leaving out the latter is outright misleading.
>
> On the other hand, I'm not sure how much of a translatability problem
> it'd be to wedge such a description into a larger message.  Might be
> better to just put the logic into psql.

libpq needs to expose enough functionality to make this simple for
psql, but psql should be the final arbiter of the output format.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v6.8

2013-12-11 Thread Robert Haas
On Wed, Dec 11, 2013 at 11:25 AM, Andres Freund  wrote:
> On 2013-12-10 19:11:03 -0500, Robert Haas wrote:
>> Committed #1 (again).  Regarding this:
>>
>> +   /* XXX: we could also do this unconditionally, the space is used 
>> anyway
>> +   if (copy_oid)
>> +   HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp));
>>
>> I would like to put in a big +1 for doing that unconditionally.  I
>> didn't make that change before committing, but I think it'd be a very
>> good idea.
>
> Patch attached.

Committed with kibitzing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Time-Delayed Standbys

2013-12-11 Thread Fabrízio de Royes Mello
On Wed, Dec 11, 2013 at 6:27 AM, Simon Riggs  wrote:
>
> On 11 December 2013 06:36, KONDO Mitsumasa
>  wrote:
>
> >> I think this feature will be used in a lot of scenarios in
> >> which PITR is currently used.
> >
> > We have to judge which is better, we get something potential or to
protect
> > stupid.
> > And we had better to wait author's comment...
>
> I'd say just document that it wouldn't make sense to use it for PITR.
>
> There may be some use case we can't see yet, so specifically
> prohibiting a use case that is not dangerous seems too much at this
> point. I will no doubt be reminded of these words in the future...
>

Hi all,

I tend to agree with Simon, but I confess that I don't like delaying a
server with standby_mode = 'off'.

The main goal of this patch is to delay Streaming Replication, so if the
slave server isn't a hot standby, I think it makes no sense to delay it.

Mitsumasa suggested adding "StandbyModeRequested" to the conditional branch to
skip this situation. I agree with him!

And I'll change 'recoveryDelay' (functions, variables) to 'standbyDelay'.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Josh Berkus
On 12/11/2013 09:57 AM, Robert Haas wrote:
> I don't agree with that assessment.  Anything that involves changing
> the scheduling of autovacuum is a major project that will legitimately
> provoke much controversy.  Extensive testing will be needed to prove
> that the new algorithm doesn't perform worse than the current
> algorithm in any important cases.  I have my doubts about whether that
> can be accomplished in an entire release cycle, let alone 2-3 days.
> In contrast, the patch proposed does something that is easy to
> understand, clearly safe, and an improvement over what we have now.

+1

There is an inherent tuning and troubleshooting challenge in anything
involving a feedback loop.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] In-Memory Columnar Store

2013-12-11 Thread Merlin Moncure
On Wed, Dec 11, 2013 at 10:08 AM, knizhnik  wrote:
> 1. Calls in PL/pgSQL are very slow - about 1-2 microseconds on my computer.
> Just defining an insertion per-row trigger with an empty procedure increases the
> time of inserting 6 million records twice - from 7 to 15 seconds. If the trigger
> procedure is not empty, then the time increases in proportion to the number of
> calls performed.
> In my case, inserting the data and propagating it into the columnar store using a
> trigger takes about 80 seconds. But if I first load the data without triggers into a
> PostgreSQL table and then
> insert it into the columnar store using a load function (implemented in C), then
> the time is 7+9=16 seconds.

Yeah. For this problem, we unfortunately either have to try to
use standard SQL functions in such a way that they support inlining (this
is mostly a black art, and fragile), or move logic out of the function
and into the query via things like window functions, or just deal with
the performance hit.  Postgres-flavored SQL is pretty much the most
productive language on the planet AFAIC, but the challenge is always
performance, performance.
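A minimal sketch of the inlining point, with illustrative names: a
single-expression SQL-language function like the first one can usually be
inlined into the calling query, while the plpgsql equivalent is a real
function call per row:

CREATE FUNCTION net_price(numeric) RETURNS numeric AS
$$ SELECT $1 * 0.8 $$ LANGUAGE sql IMMUTABLE;

CREATE FUNCTION net_price_plpgsql(numeric) RETURNS numeric AS
$$ BEGIN RETURN $1 * 0.8; END; $$ LANGUAGE plpgsql IMMUTABLE;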

Down the line, I am optimistic that per-call function overhead can be
optimized, probably by expanding what can be inlined somehow.  The
problem is that this requires cooperation from the language executors,
which is not currently possible through the SPI interface, so I really
don't know.

> Certainly I realize that plpgsql is an interpreted language. But, for example,
> Python, which is also interpreted, is able to do 100 times more calls per second.
> Unfortunately the profiler doesn't show any single bottleneck - it looks like the
> long call time is caused by the large overhead of initializing and resetting the
> memory context and copying argument data.
>
> 2. Inefficient implementation of expanding composite-type columns using the
> (foo()).* clause. In this case function foo() will be invoked as many times
> as there are fields in the returned composite type. Even when placing the
> call in the FROM list (thanks to lateral joins in 9.3), PostgreSQL still
> sometimes performs redundant calls, which can be avoided using the hack of
> adding an "OFFSET 1" clause.

Yeah, this is a long-standing headache.  LATERAL mostly deals with this,
but most cases (even pre-9.3) can be worked around one way or
another.
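For the archives, a sketch of the two spellings (foo and t are illustrative):
the first form can call foo() once per output column, while the LATERAL form
calls it once per input row:

SELECT (foo(t.x)).* FROM t;

SELECT f.*
FROM t, LATERAL foo(t.x) AS f;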

> 3. The 256Gb limit for the size of a shared memory segment on Linux.

I figure this will be solved fairly soon. It's a nice problem to have.

merlin




Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Gavin Flower

On 12/12/13 06:22, Tom Lane wrote:

I wrote:

Hm.  You can only take N rows from a block if there actually are at least
N rows in the block.  So the sampling rule I suppose you are using is
"select up to N rows from each sampled block" --- and that is going to
favor the contents of blocks containing narrower-than-average rows.

Oh, no, wait: that's backwards.  (I plead insufficient caffeine.)
Actually, this sampling rule discriminates *against* blocks with
narrower rows.  You previously argued, correctly I think, that
sampling all rows on each page introduces no new bias because row
width cancels out across all sampled pages.  However, if you just
include up to N rows from each page, then rows on pages with more
than N rows have a lower probability of being selected, but there's
no such bias against wider rows.  This explains why you saw smaller
values of "i" being undersampled.

Had you run the test series all the way up to the max number of
tuples per block, which is probably a couple hundred in this test,
I think you'd have seen the bias go away again.  But the takeaway
point is that we have to sample all tuples per page, not just a
limited number of them, if we want to change it like this.

regards, tom lane



Hmm...

In my previous reply (which hasn't shown up yet)
I realized I made a mistake!

The fraction/probability could be any of 0.25, 0.50, and 0.75.


Cheers,
Gavin




Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Gavin Flower

On 12/12/13 06:22, Tom Lane wrote:

I wrote:

Hm.  You can only take N rows from a block if there actually are at least
N rows in the block.  So the sampling rule I suppose you are using is
"select up to N rows from each sampled block" --- and that is going to
favor the contents of blocks containing narrower-than-average rows.

Oh, no, wait: that's backwards.  (I plead insufficient caffeine.)
Actually, this sampling rule discriminates *against* blocks with
narrower rows.  You previously argued, correctly I think, that
sampling all rows on each page introduces no new bias because row
width cancels out across all sampled pages.  However, if you just
include up to N rows from each page, then rows on pages with more
than N rows have a lower probability of being selected, but there's
no such bias against wider rows.  This explains why you saw smaller
values of "i" being undersampled.

Had you run the test series all the way up to the max number of
tuples per block, which is probably a couple hundred in this test,
I think you'd have seen the bias go away again.  But the takeaway
point is that we have to sample all tuples per page, not just a
limited number of them, if we want to change it like this.

regards, tom lane


Surely we want to sample a 'constant fraction' (obviously, in practice 
you have to sample an integral number of rows in a page!) of rows per 
page? The simplest way, as Tom suggests, is to use all the rows in a page.


However, if you wanted the same number of rows from a greater number of 
pages, you could (for example) select a quarter of the rows from each 
page.  In which case, when this is a fractional number: take the 
integral number of rows, plus one extra row with a probability equal to 
the fraction (here 0.25).


Either way, if it is determined that you need N rows, then keep 
selecting pages at random (but never use the same page more than once) 
until you have at least N rows.



Cheers,
Gavin





Re: [HACKERS] stats for network traffic WIP

2013-12-11 Thread Atri Sharma
On Wed, Dec 11, 2013 at 11:12 PM, Peter Eisentraut  wrote:
> On 12/10/13, 5:08 PM, Tom Lane wrote:
>> Having said that, I can't get very excited about this feature anyway,
>> so I'm fine with rejecting the patch.  I'm not sure that enough people
>> care to justify any added overhead at all.  The long and the short of
>> it is that network traffic generally is what it is, for any given query
>> workload, and so it's not clear what's the point of counting it.
>
> Also, if we add this, the next guy is going to want to add CPU
> statistics, memory statistics, etc.
>
> Is there a reason why you can't get this directly from the OS?

I would say that it's more of a convenience to track the usage directly
from the database instead of setting up OS infrastructure to store it.

That said, it should be possible to do it directly at the OS level. Can
we think of adding this to pgtop, though?

I am just musing here.

Regards,

Atri




Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Robert Haas
On Wed, Dec 11, 2013 at 10:41 AM, Simon Riggs  wrote:
> It looks fairly easy to estimate the memory needed for an auto vacuum,
> since we know the scale factor and the tuple estimate. We can then use
> the memory estimate to alter the scheduling of work. And/or we can use
> actual memory usage and block auto vac workers if they need more
> memory than is currently available because of other workers.
>
> We would still benefit from a new parameter in the above sketch, but
> it would achieve something useful in practice.
>
> That's about 2-3 days work and I know Peter can hack it. So the
> situation is not perfection-sought-blocking-good, this is more like
> fairly poor solution being driven through when a better solution is
> available within the time and skills available.

I don't agree with that assessment.  Anything that involves changing
the scheduling of autovacuum is a major project that will legitimately
provoke much controversy.  Extensive testing will be needed to prove
that the new algorithm doesn't perform worse than the current
algorithm in any important cases.  I have my doubts about whether that
can be accomplished in an entire release cycle, let alone 2-3 days.
In contrast, the patch proposed does something that is easy to
understand, clearly safe, and an improvement over what we have now.

Quite apart from the amount of development time required, I think that
the idea that we would use the availability of memory to schedule work
is highly suspect.  You haven't given any details on what you think
that algorithm might look like, and I doubt that any simple solution
will do.  If running more autovacuum workers drives the machine into
swap, then we shouldn't, but we have no way of calculating what size
memory allocation will cause that to happen.  But NOT running
autovacuum workers isn't safe either, because it could cause table
bloat that then drives the machine into swap.  We have no way of
knowing whether that will happen either.  So I think your contention
that we have the necessary information available to make an
intelligent decision is incorrect.

Regardless, whether or not a more complex change is within Peter's
technical capabilities is utterly irrelevant to whether we should
adopt the proposed patch.  Your phrasing seems to imply that you would
not ask such a thing of a less-talented individual, and I think that
is utterly wrong.  Peter's technical acumen does not give us the right
to ask him to write a more complex patch as a condition of getting a
simpler one accepted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Problem with displaying "wide" tables in psql

2013-12-11 Thread Sergey Muraviov
Hi.

I've improved the patch.
It works in expanded mode when either the format option is set to wrapped
(\pset format wrapped), or we have no pager, or the pager doesn't chop long
lines (so you can still use the trick).
The target output width is taken from either the columns option (\pset columns 70),
the environment variable $COLUMNS, or the terminal size.
And it's also compatible with any border style (\pset border 0|1|2).

Here are some examples:

postgres=# \x 1
postgres=# \pset format wrapped
postgres=# \pset border 0
postgres=# select * from wide_table;
* Record 1

value afadsafasd fasdf asdfasd fsad fas df sadf sad f sadf  sadf sa df
sadfsadfa
  sd fsad fsa df sadf asd fa sfd sadfsadf asdf sad f sadf sad fadsf
* Record 2

value afadsafasd fasdf asdfasd

postgres=# \pset border 1
postgres=# \pset columns 70
postgres=# select * from wide_table;
-[ RECORD 1 ]-
value | afadsafasd fasdf asdfasd fsad fas df sadf sad f sadf  sadf sa
  | df sadfsadfasd fsad fsa df sadf asd fa sfd sadfsadf asdf sad f
  |  sadf sad fadsf
-[ RECORD 2 ]-
value | afadsafasd fasdf asdfasd

postgres=# \pset border 2
postgres=# \pset columns 60
postgres=# select * from wide_table;
+-[ RECORD 1 ]-+
| value | afadsafasd fasdf asdfasd fsad fas df sadf sad f  |
|   | sadf  sadf sa df sadfsadfasd fsad fsa df sadf as |
|   | d fa sfd sadfsadf asdf sad f sadf sad fadsf  |
+-[ RECORD 2 ]-+
| value | afadsafasd fasdf asdfasd |
+---+--+

Regards,
Sergey


2013/12/10 Jeff Janes 

> On Mon, Dec 2, 2013 at 10:45 PM, Sergey Muraviov <
> sergey.k.murav...@gmail.com> wrote:
>
>> Hi.
>>
>> Psql definitely have a problem with displaying "wide" tables.
>> Even in expanded mode, they look horrible.
>> So I tried to solve this problem.
>>
>
> I get compiler warnings:
>
> print.c: In function 'print_aligned_vertical':
> print.c:1238: warning: ISO C90 forbids mixed declarations and code
> print.c: In function 'print_aligned_vertical':
> print.c:1238: warning: ISO C90 forbids mixed declarations and code
>
> But I really like this and am already benefiting from it.  No point in
> having the string of hyphens between every record wrap to be 30 lines long
> just because one field somewhere down the list does so.  And configuring
> the pager isn't much of a solution because the pager doesn't know that the
> hyphens are semantically different than the other stuff getting thrown at
> it.
>
> Cheers,
>
> Jeff
>
>
>



-- 
Best regards,
Sergey Muraviov
From be9f01777599dc5e84c417e5cae56459677a88d4 Mon Sep 17 00:00:00 2001
From: Sergey Muraviov 
Date: Wed, 11 Dec 2013 20:17:26 +0400
Subject: [PATCH] wrapped tables in expanded mode

---
 src/bin/psql/print.c | 123 ---
 1 file changed, 118 insertions(+), 5 deletions(-)

diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
index 736225c..4c37f7d 100644
--- a/src/bin/psql/print.c
+++ b/src/bin/psql/print.c
@@ -124,6 +124,7 @@ const printTextFormat pg_utf8format =
 
 /* Local functions */
 static int	strlen_max_width(unsigned char *str, int *target_width, int encoding);
+static bool IsWrappingNeeded(const printTableContent *cont, bool is_pager);
 static void IsPagerNeeded(const printTableContent *cont, const int extra_lines, bool expanded,
 			  FILE **fout, bool *is_pager);
 
@@ -1234,6 +1235,45 @@ print_aligned_vertical(const printTableContent *cont, FILE *fout)
 			fprintf(fout, "%s\n", cont->title);
 	}
 
+	if (IsWrappingNeeded(cont, is_pager))
+	{
+		int output_columns = 0;
+		/*
+		 * Choose target output width: \pset columns, or $COLUMNS, or ioctl
+		 */
+		if (cont->opt->columns > 0)
+			output_columns = cont->opt->columns;
+		else
+		{
+			if (cont->opt->env_columns > 0)
+output_columns = cont->opt->env_columns;
+#ifdef TIOCGWINSZ
+			else
+			{
+struct winsize screen_size;
+
+if (ioctl(fileno(stdout), TIOCGWINSZ, &screen_size) != -1)
+	output_columns = screen_size.ws_col;
+			}
+#endif
+		}
+
+		output_columns -= hwidth;
+
+		if (opt_border == 0)
+			output_columns -= 1;
+		else
+		{
+			output_columns -= 3; /* -+- */
+
+			if (opt_border > 1) 
+output_columns -= 4; /* +--+ */
+		}
+
+		if ((output_columns > 0) && (dwidth > output_columns))
+			dwidth = output_columns;
+	}
+
 	/* print records */
 	for (i = 0, ptr = cont->cells; *ptr; i++, ptr++)
 	{
@@ -1294,12 +1334,49 @@ print_aligned_vertical(const printTableContent *cont, FILE *fout)
 
 			if (!dcomplete)
 			{
-if (opt_border < 2)
-	fprintf(fout, "%s\n", dlineptr[line_count].ptr);
+if (dlineptr[line_count].width > dwidth)
+{
+	int offset = 0;
+	int chars_to_output = dlineptr[line_count].width;
+	while (chars_to_output > 0)
+	{
+		int target_width, bytes_to_output;
+
+	

Re: [HACKERS] stats for network traffic WIP

2013-12-11 Thread Peter Eisentraut
On 12/10/13, 5:08 PM, Tom Lane wrote:
> Having said that, I can't get very excited about this feature anyway,
> so I'm fine with rejecting the patch.  I'm not sure that enough people
> care to justify any added overhead at all.  The long and the short of
> it is that network traffic generally is what it is, for any given query
> workload, and so it's not clear what's the point of counting it.

Also, if we add this, the next guy is going to want to add CPU
statistics, memory statistics, etc.

Is there a reason why you can't get this directly from the OS?




Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-11 Thread Josh Berkus
On 12/11/2013 08:48 AM, Tom Lane wrote:
> The fundamental problem IMO is that you want to complicate the definition
> of what these things mean as a substitute for DBAs learning something
> about Postgres.  That seems like a fool's errand from here.  They're going
> to have to learn what FATAL means sooner or later, and making it more
> complicated just raises the height of that barrier.

I don't think it works to change the NOTICE/ERROR/FATAL tags; for one
thing, I can hear the screaming about people's log scripts from here.

However, it would really be useful to have an extra tag (in addition to
the ERROR or FATAL) for "If you're seeing this message, something has
gone seriously wrong on the server."  Just stuff like corruption
messages, backend crashes, etc.

Otherwise we're requiring users to come up with an alphabet soup of
regexes to filter out the noise error messages from the really, really
important ones.  Speaking as someone who does trainings for new DBAs,
the part where I do "what to look for in the logs" requires over an hour
and still doesn't cover everything. And doesn't internationalize. That's
nasty.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-11 Thread MauMau

From: "Kevin Grittner" 

5. FATAL: terminating walreceiver process due to administrator
command
6. FATAL: terminating background worker \"%s\" due to
administrator command


Those are client connections and their backends terminated for a
reason other than the client side of the connection requesting it.
If we don't classify those as FATAL then the definition of FATAL
becomes much more fuzzy. What would you define it to mean?


I'm sorry to cause you trouble, but my understanding is that those are not 
client connections.  They are just background server processes; walreceiver 
is started by the startup process, and background workers are started by 
extension modules.  Am I misunderstanding something?


According to Table 18-1 in the manual:

http://www.postgresql.org/docs/current/static/runtime-config-logging.html

the definition of FATAL is:

FATAL
Reports an error that caused the current session to abort.

This does not apply to the above messages, because there is no error.  The 
DBA just shut down the database server, and the background processes 
terminated successfully.


If some message output is desired, LOG's definition seems the nearest:

LOG
Reports information of interest to administrators, e.g., checkpoint 
activity.


So, I thought "ereport(LOG, ...); proc_exit(0)" is more appropriate than 
ereport(FATAL, ...).  Is this so strange?


Regards
MauMau





Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Tom Lane
I wrote:
> Hm.  You can only take N rows from a block if there actually are at least
> N rows in the block.  So the sampling rule I suppose you are using is
> "select up to N rows from each sampled block" --- and that is going to
> favor the contents of blocks containing narrower-than-average rows.

Oh, no, wait: that's backwards.  (I plead insufficient caffeine.)
Actually, this sampling rule discriminates *against* blocks with
narrower rows.  You previously argued, correctly I think, that
sampling all rows on each page introduces no new bias because row
width cancels out across all sampled pages.  However, if you just
include up to N rows from each page, then rows on pages with more
than N rows have a lower probability of being selected, but there's
no such bias against wider rows.  This explains why you saw smaller
values of "i" being undersampled.

Had you run the test series all the way up to the max number of
tuples per block, which is probably a couple hundred in this test,
I think you'd have seen the bias go away again.  But the takeaway
point is that we have to sample all tuples per page, not just a
limited number of them, if we want to change it like this.

regards, tom lane




Re: [HACKERS] Why the buildfarm is all pink

2013-12-11 Thread Tom Lane
Andres Freund  writes:
> On 2013-12-11 10:07:19 -0500, Tom Lane wrote:
>> Do you remember offhand where the failures are?

> No, but they are easy enough to reproduce. Out of 10 runs, I've attached
> the one with the most failures and checked that it seems to contain all
> the failures from other runs. All of them probably could be fixed by
> moving things around, but I am not sure how maintainable that approach
> is :/

Thanks for doing the legwork.  These all seem to be cases where the
planner decided against doing an index-only scan on tenk1, which is
presumably because its relallvisible fraction is too low.  But these are
later in the test series than the "vacuum analyze tenk1" that's currently
present in create_index, and most of them are even later than the
database-wide VACUUM in sanity_check.  So those vacuums are failing to
mark the table as all-visible, even though it's not changed since the COPY
test.  This seems odd.  Do you know why your slave server is holding back
the xmin horizon so much?

After looking at this, I conclude that moving the vacuums earlier would
probably make things worse not better, because the critical interval seems
to be from the "COPY TO tenk1" command to the vacuum command.  So the idea
of putting vacuums into the COPY test is a bad one, and I'll proceed with
the patch I posted yesterday for moving the ANALYZE steps around.  I think
fixing what you're seeing is going to be a different issue.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-11 Thread MauMau

From: "Andres Freund" 

On 2013-12-12 00:31:25 +0900, MauMau wrote:

5. FATAL:  terminating walreceiver process due to administrator command
6. FATAL:  terminating background worker \"%s\" due to administrator 
command


Those are important if they happen outside a shutdown. So, if you really
want to remove them from there, you'd need to change the signalling to
handle the cases differently.


How are they important?  If someone mistakenly sends SIGTERM to walreceiver 
or a background worker, it is automatically relaunched later by the postmaster 
or the startup process, like other background processes.  But other background 
processes such as walsender, bgwriter, etc. don't emit FATAL messages.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-11 Thread Tom Lane
"MauMau"  writes:
> I agree that #1-#3 are of course reasonable when there's any client the user 
> runs.  The problem is that #1 (The database system is starting up) is output 
> in the server log by pg_ctl.  In that case, there's no client the user is 
> responsible for.  Why does a new DBA have to be worried about that FATAL 
> message?  He didn't do anything wrong.

FATAL doesn't mean "the DBA did something wrong".  It means "we terminated
a client session".

The fundamental problem IMO is that you want to complicate the definition
of what these things mean as a substitute for DBAs learning something
about Postgres.  That seems like a fool's errand from here.  They're going
to have to learn what FATAL means sooner or later, and making it more
complicated just raises the height of that barrier.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why the buildfarm is all pink

2013-12-11 Thread Tom Lane
Kevin Grittner  writes:
> Kevin Grittner  wrote:
>> Tom Lane  wrote:
>>> I haven't touched matview.sql here; that seems like a distinct issue.

>> I'll fix that.

> Done.

Thanks.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.8

2013-12-11 Thread Andres Freund
On 2013-12-10 19:11:03 -0500, Robert Haas wrote:
> Committed #1 (again).  Regarding this:
> 
> +   /* XXX: we could also do this unconditionally, the space is used 
> anyway
> +   if (copy_oid)
> +   HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp));
> 
> I would like to put in a big +1 for doing that unconditionally.  I
> didn't make that change before committing, but I think it'd be a very
> good idea.

Patch attached.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From 47c93d8e7afbcaef268de66246571cdc2f134c97 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 11 Dec 2013 17:20:27 +0100
Subject: [PATCH] Dep't of second thoughts: Always include oids in WAL logged
 replica identities.

Since replica identities are logged using the normal format for heap
tuples, the space for oids in WITH OIDS tables was already used, so
there's little point in only including the oid if it is included in
the configured IDENTITY.

Per comment from Robert Haas.
---
 src/backend/access/heap/heapam.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 249fffe..e647453 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6638,7 +6638,6 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
 	TupleDesc	idx_desc;
 	char		replident = relation->rd_rel->relreplident;
 	HeapTuple	key_tuple = NULL;
-	bool		copy_oid = false;
 	bool		nulls[MaxHeapAttributeNumber];
 	Datum		values[MaxHeapAttributeNumber];
 	int			natt;
@@ -6698,7 +6697,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
 		int attno = idx_rel->rd_index->indkey.values[natt];
 
 		if (attno == ObjectIdAttributeNumber)
-			copy_oid = true;
+			/* copied below */
+			;
 		else if (attno < 0)
 			elog(ERROR, "system column in index");
 		else
@@ -6709,8 +6709,12 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
 	*copy = true;
 	RelationClose(idx_rel);
 
-	/* XXX: we could also do this unconditionally, the space is used anyway */
-	if (copy_oid)
+	/*
+	 * Always copy oids if the table has them, even if not included in the
+	 * index. The space in the logged tuple is used anyway, so there's little
+	 * point in not including the information.
+	 */
+	if (relation->rd_rel->relhasoids)
 		HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp));
 
 	/*
-- 
1.8.5.rc2.dirty


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] In-Memory Columnar Store

2013-12-11 Thread knizhnik

Hi,

It depends on what you mean by "transparently substitute".
If you want to be able to execute standard SQL queries against the columnar 
store, that seems to be impossible without rewriting the executor.
Instead I provide another approach, based on calling standard functions which 
operate not on scalar types but on timeseries.


For example instead of standard SQL

select sum(ClosePrice) from Quote;

I will have to write:

select cs_sum(ClosePrice) from Quote_get();

It looks similar but not quite the same.
And for more complex queries the difference is larger.
For example the query

select sum(score*volenquired)/sum(volenquired) from DbItem group by 
(trader,desk,office);


can be written as

select agg_val,cs_cut(group_by,'c22c30c10') from
  (select (cs_project_agg(ss1.*)).* from
    (select (s1).sum/(s2).sum,(s1).groups from DbItem_get() q,
            cs_hash_sum(q.score*q.volenquired, q.trader||q.desk||q.office) s1,
            cs_hash_sum(q.volenquired, q.trader||q.desk||q.office) s2) ss1) ss2;


Looks too complex, doesn't it?
But the first two lines only perform the reverse mapping from the 
vertical data representation back to normal horizontal tuples.
The good thing is that this query executes more than 1000 times 
faster (with default PostgreSQL configuration parameters, except 
shared_buffers, which was set large enough to fit all data in memory).

On 12/11/2013 07:14 PM, k...@rice.edu wrote:

On Mon, Dec 09, 2013 at 11:40:41PM +0400, knizhnik wrote:

Hello!

I want to annouce my implementation of In-Memory Columnar Store
extension for PostgreSQL:

  Documentation: http://www.garret.ru/imcs/user_guide.html
  Sources: http://www.garret.ru/imcs-1.01.tar.gz

Any feedbacks, bug reports and suggestions are welcome.

Vertical representation of data is stored in PostgreSQL shared memory.
This is why it is important to be able to utilize all available
physical memory.

Hi,

This is very neat! The question I have, which applies to the matview
support as well, is "How can we transparently substitute usage of the
in-memory columnar store/matview in a SQL query?".

Regards,
Ken




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] -d option for pg_isready is broken

2013-12-11 Thread Tom Lane
Robert Haas  writes:
> On Wed, Dec 11, 2013 at 9:35 AM, Andres Freund  wrote:
>> One use case is accessing a particular host when using DNS round robin
>> to standbys in combination with SSL.

> Ah, interesting point.  And it's not inconceivable that some
> application out there could be using PQhost() to retrieve the host
> from an existing connection object and reusing that value for a new
> connection, in which case redefining it to sometimes return hostaddr
> would break things.  So I think we shouldn't do that.

I think the only reasonable way to fix this is to improve the logic
in psql, not turn PQhost() into a mess with no understandable definition.
If that means we need to add a separate PQhostaddr() query function,
so be it.  We won't be able to fix the complained-of bug in back branches,
but I'd rather live with that (it's basically just cosmetic anyway)
than risk introducing perhaps-not-so-cosmetic bugs into other existing
applications.

In general, I think the definition of these query functions ought to
be "what was the value of this parameter when the connection was made".
As such, I'm not even sure that the pgunixsocket behavior that's in
PQhost now is a good idea, much less that we should extend that hack
to cover DefaultHost.

There is room also for a function defined as "give me a textual
description of what I'm connected to", which is not meant to reflect any
single connection parameter but rather the total behavior.  Right now
I think PQhost is on the borderline of doing this instead of just
reporting the "host" parameter, but I think rather than pushing it
across that border we'd be better off to invent a function explicitly
charged with doing that.  That would give us room to do something
actually meaningful with host+hostaddr cases, for instance.  I think
really what ought to be printed in such cases is something like
"host-name (address IP-address)"; leaving out the former would be
unhelpful but leaving out the latter is outright misleading.

On the other hand, I'm not sure how much of a translatability problem
it'd be to wedge such a description into a larger message.  Might be
better to just put the logic into psql.
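
For illustration, here is a rough sketch of what that psql-side logic might 
look like, using only existing libpq calls (PQhost() and PQconninfo()); the 
helper name and output format are made up:

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

/*
 * Build a human-readable description of the connection target, e.g.
 * "host-name (address IP-address)" when both host and hostaddr were given.
 * Purely illustrative, not a proposed libpq API.
 */
static void
describe_connection(PGconn *conn, char *buf, size_t buflen)
{
	const char *host = PQhost(conn);
	const char *hostaddr = NULL;
	PQconninfoOption *opts = PQconninfo(conn);
	PQconninfoOption *o;

	for (o = opts; o && o->keyword; o++)
	{
		if (strcmp(o->keyword, "hostaddr") == 0 && o->val && o->val[0])
			hostaddr = o->val;
	}

	if (host && host[0] && hostaddr)
		snprintf(buf, buflen, "%s (address %s)", host, hostaddr);
	else if (hostaddr)
		snprintf(buf, buflen, "%s", hostaddr);
	else
		snprintf(buf, buflen, "%s", (host && host[0]) ? host : "local socket");

	PQconninfoFree(opts);
}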

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] In-Memory Columnar Store

2013-12-11 Thread knizhnik

Hello!

The implementation of IMCS itself took me about two months (including testing 
and writing documentation).
But a large part of the code had previously been written by me for other 
projects, so I could reuse it.
Most of the time was spent on integrating this code with 
PostgreSQL (I was not so familiar with it before).


Certainly the columnar store implementations for Oracle (Oracle Database 
In-Memory Option), DB2 (BLU Acceleration), ... are more convenient for 
users: they can execute normal SQL queries and do not require users to 
learn new functions and a new approach. But that requires a complete redesign 
of the query engine (or an alternative implementation of it), which I was not 
able to do.


This is why I try to provide the advantages of vertical data representation 
(vector operations, parallel execution, data skipping) as well as the 
advantages of fast access to in-memory data as a standard PostgreSQL 
extension. There are obviously some limitations, and queries look more 
complicated than in standard SQL...


On the other hand, it is possible to write queries which are hard 
to express in standard SQL.
For example, calculating split-adjusted prices cannot be done in SQL 
without using stored procedures.
To make usage of the IMCS functions as simple as possible, I defined a large 
number of operators for the most popular operations.

For example Volume-Weighted-Average-Price can be calculated just as:

select Volume//Close as VWAP from Quote_get();

It is even shorter than the analogous SQL statement:

select sum(Close*Volume)/sum(Volume) as VWAP from Quote;


Concerning integration with PostgreSQL, there were several problems. 
Some of them seem to have no easy solution, but others are IMHO 
imperfections in PostgreSQL which I hope will be fixed someday:


1. Calls in PL/pgSQL are very slow - about 1-2 microseconds on my 
computer. Just defining a per-row insert trigger with an empty procedure 
doubles the time to insert 6 million records - from 7 to 15 
seconds. If the trigger procedure is not empty, then the time increases 
in proportion to the number of performed calls.
In my case, inserting data and propagating it to the columnar store using 
a trigger takes about 80 seconds. But if I first load the data without 
triggers into a PostgreSQL table and then
insert it into the columnar store using a load function (implemented in C), 
the total time is 7+9=16 seconds.


Certainly I realize that plpgsql is an interpreted language. But Python, 
which is also interpreted, can do 100 times more calls per 
second.
Unfortunately the profiler doesn't show any single bottleneck - it looks like 
the long call time is caused by the large overhead of initializing and 
resetting the memory context and copying argument data.


2. Inefficient implementation of expanding composite type columns using 
the (foo()).* clause. In this case function foo() will be invoked as many 
times as there are fields in the returned composite type. Even when 
placing the call in the FROM list (thanks to lateral joins in 9.3), 
PostgreSQL still sometimes performs redundant calls, which can be avoided 
with the hack of adding an "OFFSET 1" clause.


3. The 256Gb limit on mapped shared memory segment size on Linux.

Concerning the last problem - I have included in the IMCS distribution a much 
simpler patch which just sets the MAP_HUGETLB flag when

a) it is defined in the system headers
b) the requested memory size is larger than 256Gb

In this case, right now PostgreSQL will just fail to start.
But certainly it would be more correct to enable this flag through a 
configuration parameter, because large pages can minimize MMU overhead 
and so increase speed even if the size of used memory is less than 256Gb 
(this is why Oracle uses them widely).
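
Just to illustrate the idea, a simplified sketch of that approach (not the 
actual patch; the 256Gb threshold check and the function name are only 
illustrative, and huge pages must already be allocated in the system or 
mmap() will fail):

#include <stddef.h>
#include <sys/mman.h>

/*
 * Map an anonymous shared segment, adding MAP_HUGETLB only when the
 * platform defines it and the segment would exceed the 256Gb limit of
 * ordinary 4kb pages.  The caller must check for MAP_FAILED.
 */
static void *
map_shared_segment(size_t size)
{
	int		flags = MAP_SHARED | MAP_ANONYMOUS;

#ifdef MAP_HUGETLB
	if (size > ((size_t) 256 << 30))	/* larger than 256Gb */
		flags |= MAP_HUGETLB;
#endif

	return mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
}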







On 12/11/2013 06:33 PM, Merlin Moncure wrote:

On Mon, Dec 9, 2013 at 1:40 PM, knizhnik  wrote:

Hello!

I want to annouce my implementation of In-Memory Columnar Store extension
for PostgreSQL:

  Documentation: http://www.garret.ru/imcs/user_guide.html
  Sources: http://www.garret.ru/imcs-1.01.tar.gz

Any feedbacks, bug reports and suggestions are welcome.

Vertical representation of data is stored in PostgreSQL shared memory.
This is why it is important to be able to utilize all available physical
memory.
Now servers with Tb or more RAM are not something exotic, especially in
financial world.
But there is limitation in Linux with standard 4kb pages  for maximal size
of mapped memory segment: 256Gb.
It is possible to overcome this limitation either by creating multiple
segments - but it requires too much changes in PostgreSQL memory manager.
Or just set MAP_HUGETLB flag (assuming that huge pages were allocated in the
system).

I found several messages related with MAP_HUGETLB flag, the most recent one
was from 21 of November:
http://www.postgresql.org/message-id/

Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-11 Thread MauMau

From: "Tom Lane" 

Jim Nasby  writes:

On 12/9/13 5:56 PM, Tom Lane wrote:

How so?  "FATAL" means "an error that terminates your session", which
is exactly what these are.


Except in these cases the user never actually got a working session; 
their request was denied.


To be clear, from the client standpoint it's certainly fatal, but not 
from the server's point of view. This is fully expected behavior as far 
as the server is concerned. (Obviously it might be an error that caused 
the shutdown/recovery, but that's something different.)


Right, but as already pointed out in this thread, these messages are
worded from the client's point of view.  "The client never got a working
connection" seems to me to be an empty distinction.  If you got SIGTERM'd
before you could issue your first query, should that not be FATAL because
you'd not gotten any work done?

More generally, we also say FATAL for all sorts of entirely routine
connection failures, like wrong password or mistyped user name.  People
don't seem to have a problem with those.  Even if some do complain,
the costs of changing that behavior after fifteen-years-and-counting
would certainly exceed any benefit.


I agree that #1-#3 are of course reasonable when there's any client the user 
runs.  The problem is that #1 (The database system is starting up) is output 
in the server log by pg_ctl.  In that case, there's no client the user is 
responsible for.  Why does a new DBA have to be worried about that FATAL 
message?  He didn't do anything wrong.


I thought adding "options='-c log_min_messages=PANIC'" to the connection 
string for PQping() in pg_ctl.c would suppress the message, but it didn't. 
The reason is that connection options take effect in PostgresMain(), which 
runs after the FATAL condition is checked in ProcessStartupPacket().  Do you 
think there is any good solution?
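
For reference, this is roughly what I tried (simplified; the helper name is 
made up):

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Append the options parameter to the conninfo string passed to PQping().
 * This does not help, because the FATAL is raised in
 * ProcessStartupPacket(), before the options are applied.
 */
static PGPing
ping_quietly(const char *base_conninfo)
{
	char	conninfo[1024];

	snprintf(conninfo, sizeof(conninfo),
			 "%s options='-c log_min_messages=PANIC'", base_conninfo);
	return PQping(conninfo);
}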


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >