Re: [HACKERS] Parallel Seq Scan

2015-04-04 Thread David Rowley
So I've just finished reading the impressive 244 emails (so far) about
Parallel Seq scan, and I've had a quick skim over the latest patch.

It's quite exciting to think that one day we'll have parallel query in
PostgreSQL, but I have to say that I think there's a major point about
the proposed implementation that seems to have been forgotten, and I
can't help but think the feature won't get far off the ground unless
more thought goes into it.

On 11 February 2015 at 09:56, Andres Freund and...@2ndquadrant.com wrote:

 I think we're getting to the point where having a unique mapping from
 the plan to the execution tree is proving to be rather limiting
 anyway. Check for example discussion about join removal. But even for
 current code, showing only the custom plans for the first five EXPLAIN
 EXECUTEs is pretty nasty (Try explaining that to somebody who doesn't know
 pg internals. Their looks are worth gold and can kill you at the same
 time) and should be done differently.


Going over the previous emails in this thread, I see that it has been a long
time since anyone discussed how we might decide, at planning time, how many
workers should be used for the query. From the emails I don't recall anyone
proposing a good way to do this, and I for one can't see how it is at all
possible to do at planning time.

I think that the planner should know nothing of parallel query at all, and
the planner quite possibly should go completely unmodified for this patch.
One major problem I can see is that, given a query such as:

SELECT * FROM million_row_product_table WHERE category = 'ELECTRONICS';

Where we have a non-unique index on category, some plans that might be
considered are:

1. Index scan on the category index to get all rows matching 'ELECTRONICS'.
2. Sequential scan on the table, filtering matching rows.
3. Parallel plan which performs a series of partial sequential scans, pulling
out all matching rows.

I really think that if we end up choosing things like plan 3 when plan 2 was
thrown out because of its cost, then we'll end up consuming more CPU and
I/O than we can possibly justify. The environmentalist in me screams
that this is wrong. What if we kicked off 128 worker processes on some
high-end hardware to do this? I certainly wouldn't want to pay the power
bill. I understand there's costing built in that would perhaps stop this,
but I still think it's wrong-headed, and that we should still choose the
fastest non-parallel plan and only consider parallelising it later.

Instead what I think should happen is:

The following link has been seen before on this thread, but I'll post it
again:
http://docs.oracle.com/cd/A57673_01/DOC/server/doc/A48506/pqoconce.htm

There's one key sentence in there that should not be ignored:

"It is important to note that the query is parallelized dynamically at
execution time."

"Dynamically at execution time"... I think this also needs to happen in
PostgreSQL. If we attempt to do this parallel stuff at plan time, and we
happen to plan at some quiet period, or perhaps worse, some application's
start-up process happens to PREPARE a load of queries while the database is
nice and quiet, then quite possibly we'll end up with some highly parallel
queries. Then perhaps, by the time these queries are actually executed, the
server is very busy... Things will fall apart quite quickly due to the
masses of IPC and context switches that would be going on.

I completely understand that this parallel query stuff is all quite new to
us and we're likely still trying to nail down the correct infrastructure
for it to work well, which is why I'm proposing that the planner should
know nothing of parallel query; instead I think it should work more along
these lines:

* Planner should be completely oblivious to what parallel query is.
* Before executor startup the plan is passed to a function which decides if
we should parallelise it, and does so if the plan meets the correct
requirements. This should likely have a very fast exit path such as:

if (root_plan->total_cost < parallel_query_cost_threshold)
    return;  /* the query is not expensive enough to attempt to make parallel */

The above check would give us almost zero overhead for small, low-cost
queries.

This function would likely also have some sort of logic to determine
whether the server has enough spare resources at the current point in time
to allow queries to be parallelised (likely not something that needs to be
nailed down for a first implementation).
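
Pulling those two checks together, the entry point might look something
like the following. This is a minimal sketch only: every name here (the
function, the threshold GUC, the resource check) is hypothetical, invented
purely to illustrate the proposal, not taken from the patch:

static void
consider_parallelising(PlannedStmt *stmt)
{
	/* Fast exit: cheap queries are never worth the IPC overhead. */
	if (stmt->planTree->total_cost < parallel_query_cost_threshold)
		return;

	/* Optional: back off when the server is already busy. */
	if (!server_has_spare_resources())	/* hypothetical */
		return;

	/* ... traverse the plan and inject Funnel nodes where safe ... */
}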

* The plan should then be completely traversed, node by node, to determine
which nodes can be made parallel. This would likely require an interface
function for each node type which returns true or false depending on
whether it's safe to parallelise. For seq scan this could be a simple test
to see if we're scanning a temp table (a toy sketch of this traversal
follows below).
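
To make the safety walk concrete, here's a standalone toy sketch — all of
these type and function names are invented for illustration; the real thing
would work over PostgreSQL's Plan nodes:

#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_SEQSCAN, TOY_SORT, TOY_FUNCSCAN } ToyTag;

typedef struct ToyPlan
{
	ToyTag		tag;
	bool		scans_temp_table;	/* meaningful only for TOY_SEQSCAN */
	struct ToyPlan *lefttree;
	struct ToyPlan *righttree;
} ToyPlan;

/* Per-node answer to "is it safe to parallelise this node?" */
static bool
toy_node_parallel_safe(const ToyPlan *node)
{
	switch (node->tag)
	{
		case TOY_SEQSCAN:
			return !node->scans_temp_table;	/* temp tables are backend-local */
		case TOY_SORT:
			return true;
		case TOY_FUNCSCAN:
			return false;	/* the function might not be parallel-safe */
	}
	return false;
}

/* A subtree can be parallelised only if every node in it says yes. */
static bool
toy_subtree_parallel_safe(const ToyPlan *node)
{
	if (node == NULL)
		return true;
	return toy_node_parallel_safe(node) &&
		   toy_subtree_parallel_safe(node->lefttree) &&
		   toy_subtree_parallel_safe(node->righttree);
}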

* Before any changes are made to the plan, a complete copy of it should be
made.

* Funnel nodes could then be injected below the last 

Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom ... btw, has anyone noticed that this patch broke hamerkop and
 Tom bowerbird?  Or at least, it's hard to see what other recent commit
 Tom would explain the failures they're showing.

Now that Robert committed the fix for 64bit Datum w/o USE_FLOAT8_BYVAL,
bowerbird seems fixed (hamerkop hasn't run yet).

I see nothing in the win32 stuff that tries to define USE_FLOAT8_BYVAL
on 64-bit windows, is this just an oversight or does it not actually
work there? or is it for on-disk compatibility with 32-bit windows?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Compile warnings on OSX 10.10 clang 6.0

2015-04-04 Thread Michael Paquier
On Sat, Apr 4, 2015 at 6:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Peter Eisentraut pete...@gmx.net writes:
 These warnings also happen with older versions of clang.  No idea how
 to fix yet.  I'm thinking that clang should be fixed, because these
 warnings are stupid.

 Yeah, they're utterly stupid; whoever put them in obviously doesn't
 have a clue about typical Makefile construction.  I wonder if next
 we'll see complaints about unnecessary -D or -I switches.

 Having said that, I did look awhile ago about how we might get rid of
 them, and it seems not easy; for starters we would need to drop the
 assumption that CFLAGS can always be included when linking.  Also,
 AFAICT -pthread sometimes *is* required when linking; so it's
 not even very obvious when to suppress the switch, even if we could
 do so without wholesale rearrangement of our FLAGS handling.

 On the other hand, there's often more than one way to skin a cat.
 It occurred to me that maybe we could just turn off this class of warning,
 and after some experimentation I found out that
 -Wno-unused-command-line-argument does that, at least in the version
 of clang that Apple's currently shipping.

 Who's for enabling that if the compiler takes it?

Yes, please. I always found those pthread warnings annoying.
-- 
Michael




Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  For those following along at home, the failures are on these queries:

  SELECT 1.1 AS two UNION SELECT 2.2;
  SELECT 1.1 AS two UNION SELECT 2;
  SELECT 1 AS two UNION SELECT 2.2;
  SELECT 1.1 AS three UNION SELECT 2 UNION ALL SELECT 2;

  In each case, the expected result is with the values in ascending
  numerical order.  In each case, the 1 or 1.1 value which ought to
  appear before 2 or 2.2 instead appears after it.  Strictly speaking,
  this is not the wrong answer to the query, and could be perhaps
  explained by the planner choosing a hash aggregate rather than a sort
  + unique plan.  But this patch doesn't change the planner at all, so
  the plan should be the same as it has always been.

 Tom Yeah.  We could add an EXPLAIN to make certain, perhaps, but since
 Tom none of the other critters are failing I doubt this explanation.

This failure in the union.sql test is exactly the one that happens if
you build with --disable-float8-byval, for what that's worth.

Does the windows build perhaps not define USE_FLOAT8_BYVAL? That would
explain the breakage.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] TABLESAMPLE patch

2015-04-04 Thread Amit Kapila
On Fri, Apr 3, 2015 at 3:06 AM, Petr Jelinek p...@2ndquadrant.com wrote:

 Hi,

 so here is version 11.


Thanks, the patch looks much better, but I think still a few more
things need to be discussed/fixed.

1.
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause


Why do you want to allow func_arg_list?

Basically, if the user tries to pass multiple arguments in the
TABLESAMPLE method's clause, like (10,20), then I think
that should be caught at the grammar level as an error.

The SQL:2003 spec says that the argument to REPEATABLE and to the
TABLESAMPLE method is the same: <numeric value expression>.

It seems to me that you want to allow it to make this extensible
to user-defined tablesample methods, but I'm not sure it is
right to use func_arg_list for this, because sample method
arguments can have a different definition than function arguments.
Even if we want to keep sample method arguments the same as
function arguments, that sounds like a separate discussion.

In general, I feel you already have good basic infrastructure for
supporting other sample methods, but it is better to keep the new
DDL for doing so as a separate patch from this one, as that
way we can reduce the scope of this patch. OTOH if you or others
feel that it is mandatory to have new DDL support for other
tablesample methods, then we have to deal with this now.

2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id = 2;
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...

postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);

Isn't the TABLESAMPLE clause supposed to work with Update/Delete
statements?


3.
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
..
+ parser_errposition(pstate,
+ exprLocation((Node *) r)));

Better to align exprLocation() with pstate.

4.
SampleTupleVisible()
{
..
else
{
	/* No pagemode, we have to check the tuple itself. */
	Snapshot	snapshot = scan->rs_snapshot;
	Buffer		buffer = scan->rs_cbuf;

	bool		visible =
		HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
..
}

I think it is better to check PageIsAllVisible() in the above
code before the visibility check, as that can avoid visibility checks.
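
For illustration, the reordering suggested here would look roughly like the
following — a sketch only, reusing the variable names from the patch
fragment above together with the existing PageIsAllVisible()/BufferGetPage()
macros:

		/* Skip the per-tuple check when the whole page is known all-visible. */
		if (PageIsAllVisible(BufferGetPage(scan->rs_cbuf)))
			visible = true;
		else
			visible = HeapTupleSatisfiesVisibility(tuple, scan->rs_snapshot,
												   scan->rs_cbuf);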

5.
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
				 List *sampleargs)
{
..
if (con->val.type == T_Null)
	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("REPEATABLE clause must be NOT NULL numeric value"),
			 parser_errposition(pstate, con->location)));
..
}

InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
{
..
if (fcinfo.argnull[1])
	ereport(ERROR,
			(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
			 errmsg("REPEATABLE clause cannot be NULL")));
..
}

I think it would be better if we could have the same error message,
and probably the same error code, in both of the above cases.

6.
extern TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
 List *sampleargs)

It is better to expose function (usage of extern) via header file.
Is there a need to mention extern here?
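
In other words (a trivial sketch of the usual C convention): keep the
extern on the declaration in the header, and drop it from the definition
in the .c file:

/* in the header (wherever the declaration lives) */
extern TableSampleClause *ParseTableSample(ParseState *pstate,
										   char *samplemethod,
										   Node *repeatable,
										   List *sampleargs);

/* in the .c file: no extern needed on the definition */
TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
				 List *sampleargs)
{
	...
}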

7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}

What is the reason for coercing the value of the REPEATABLE clause to
INT4OID when a numeric value is expected for the clause?  If the user
gives the input value -2.3, it will generate a seed that doesn't seem
right.


8.
+DATA(insert OID = 3295 (  tsm_system_init		PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 2281 23 700 _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 (  tsm_bernoulli_init	PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 2281 23 700 _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));

The datatype for the second argument is kept as 700 (Float4); shouldn't
it be 1700 (Numeric)?

9.
postgres=# explain SELECT t1.id FROM test_tablesample as t1 TABLESAMPLE SYSTEM (10),
test_tablesample as t2 TABLESAMPLE BERNOULLI (20);
                                 QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..4.05 rows=100 width=4)
   ->  Sample Scan on test_tablesample t1  (cost=0.00..0.01 rows=1 width=4)
   ->  Sample Scan on test_tablesample t2  (cost=0.00..4.02 rows=2 width=0)
(3 rows)

Isn't it better to display the sample method name in EXPLAIN?
I think it can help in case of join queries.
We could use the format below:
Sample Scan (System) on test_tablesample ...

10. Bernoulli.c

+/* State */
+typedef struct
+{
+	uint32		seed;		/* random seed */
+	BlockNumber startblock;	/* starting block, we use ths for syncscan support */

Typo: s/ths/this


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-04-04 Thread Pavel Stehule
2015-04-04 15:29 GMT+02:00 Haribabu Kommi kommi.harib...@gmail.com:

 On Sat, Apr 4, 2015 at 4:19 PM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
  Hi
 
  2015-03-31 14:38 GMT+02:00 Haribabu Kommi kommi.harib...@gmail.com:
 
  keyword_databases - The database name can be "all", "replication",
  "sameuser", "samerole" and "samegroup".
  keyword_roles - The role can be "all" and a group name prefixed with
  "+".
 
 
  I am not very happy about name - but I have no better idea :( - maybe
  database_mask, user_mask
 

 How about database_keywords and user_keywords as column names?


I am thinking it is better than keyword_databases - it is more logical.
But maybe it is a question for mathematicians.

Some other variants are database_filters and user_filters, or
database_predicates and user_predicates,

but those are not terribly nice either.

Regards

Pavel



 
 
  The rest of the database and role names are treated as normal database
  and role names.
 
  2. Added the code changes to identify the names with quoted.
 
 
  Is it a good idea? Database names are not quoted anywhere else (in the
  system catalogs). So now we cannot use a join to this view.
 
  postgres=# select (databases)[1] from pg_hba_conf ;
   databases
  -----------
   "omega 2"

  (4 rows)

  postgres=# select datname from pg_database ;
    datname
  -----------
   template1
   template0
   postgres
   omega 2
  (4 rows)
 
  I dislike this - we know the name must be quoted in the file (without
  that, the file would be incorrect). And if you need quotes, there is the
  function quote_ident. If we used quotes everywhere, then it would be ok,
  but not now. Please remove it. Moreover, it is not necessary when you use
  the keyword columns.

 Thanks. The special handling of quotes for pg_hba_conf is not required.
 I will keep the behaviour similar to the other catalog tables.
 I will remove the quote support in the next version.

 Regards,
 Hari Babu
 Fujitsu Australia



Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-04-04 Thread Haribabu Kommi
On Sat, Apr 4, 2015 at 4:19 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 Hi

 2015-03-31 14:38 GMT+02:00 Haribabu Kommi kommi.harib...@gmail.com:

 keyword_databases - The database name can be "all", "replication",
 "sameuser", "samerole" and "samegroup".
 keyword_roles - The role can be "all" and a group name prefixed with "+".


 I am not very happy about name - but I have no better idea :( - maybe
 database_mask, user_mask


How about database_keywords and user_keywords as column names?



 The rest of the database and role names are treated as normal database
 and role names.

 2. Added the code changes to identify the names with quoted.


 Is it a good idea? Database names are not quoted anywhere else (in the
 system catalogs). So now we cannot use a join to this view.

 postgres=# select (databases)[1] from pg_hba_conf ;
  databases
 -----------
  "omega 2"

 (4 rows)

 postgres=# select datname from pg_database ;
   datname
 -----------
  template1
  template0
  postgres
  omega 2
 (4 rows)

 I dislike this - we know the name must be quoted in the file (without
 that, the file would be incorrect). And if you need quotes, there is the
 function quote_ident. If we used quotes everywhere, then it would be ok,
 but not now. Please remove it. Moreover, it is not necessary when you use
 the keyword columns.

Thanks. The special handling of quotes for pg_hba_conf is not required.
I will keep the behaviour similar to the other catalog tables.
I will remove the quote support in the next version.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-04 Thread Sawada Masahiko
On Wed, Mar 11, 2015 at 11:46 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Tue, Mar 10, 2015 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote:
 Sawada,

 * Sawada Masahiko (sawada.m...@gmail.com) wrote:
 Thank you for your review!
 Attached file is the latest version (without the documentation patch;
 I am making it now.)
 As per discussion, there is no change regarding superuser permissions.

 Ok.  Here's another review.


 Thank you for your review!

 diff --git a/src/backend/catalog/system_views.sql 
 b/src/backend/catalog/system_views.sql
 index 5e69e2b..94ee229 100644
 --- a/src/backend/catalog/system_views.sql
 +++ b/src/backend/catalog/system_views.sql
 @@ -414,6 +414,11 @@ CREATE RULE pg_settings_n AS

  GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;

 +CREATE VIEW pg_file_settings AS
 +   SELECT * FROM pg_show_all_file_settings() AS A;
 +
 +REVOKE ALL on pg_file_settings FROM public;
 +

 This suffers from a lack of pg_dump support, presuming that the
 superuser is entitled to change the permissions on this function.  As it
 turns out though, you're in luck in that regard since I'm working on
 fixing that for a mostly unrelated patch.  Any feedback you have on what
 I'm doing to pg_dump here:

 http://www.postgresql.org/message-id/20150307213935.go29...@tamriel.snowman.net

 Would certainly be appreciated.


 Thank you for the info.
 I will read the discussion and send the feedbacks.

 diff --git a/src/backend/utils/misc/guc-file.l 
 b/src/backend/utils/misc/guc-file.l
 [...]
 +	 * Calculate size of guc_array and allocate it. From the second time we
 +	 * allocate memory, we should free the old allocated memory.
 +	 */
 +	guc_array_size = num_guc_file_variables * sizeof(struct ConfigFileVariable);
 +	if (!guc_file_variables)
 +	{
 +		/* For the first call */
 +		guc_file_variables = (ConfigFileVariable *) guc_malloc(FATAL, guc_array_size);
 +	}
 +	else
 +	{
 +		guc_array = guc_file_variables;
 +		for (item = head; item; item = item->next, guc_array++)
 +		{
 +			free(guc_array->name);
 +			free(guc_array->value);
 +			free(guc_array->filename);

 It's a minor nit-pick, as the below loop should handle anything we care
 about, but I'd go ahead and reset guc_array->sourceline to be -1 or
 something too, just to show that everything's being handled here with
 regard to cleanup.  Or perhaps just add a comment saying that the
 sourceline for all currently valid entries will be updated.

 Fixed.


 +	guc_file_variables = (ConfigFileVariable *) guc_realloc(FATAL, guc_file_variables, guc_array_size);
 +	}

 Nasby made a comment, I believe, that we might actually be able to use
 memory contexts here, which would certainly be much nicer as then you'd
 be able to just throw away the whole context and create a new one.  Have
 you looked at that approach at all?  Offhand, I'm not sure if it'd work
 or not (I have my doubts..) but it seems worthwhile to consider.


 I successfully used palloc() and pfree() to allocate and free
 guc_file_variables, but these variables seem to be freed already by the
 time I get the values of the pg_file_settings view via SQL.
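
For reference, the context-based approach mentioned above might look
roughly like this — a sketch only; the context name and static variable are
invented here, and allocating in a context that outlives the current query
is what would avoid the premature-free problem just described:

static MemoryContext GucFileVariablesContext = NULL;

	/* Throw the previous parse results away wholesale, then allocate anew. */
	if (GucFileVariablesContext == NULL)
		GucFileVariablesContext =
			AllocSetContextCreate(TopMemoryContext,
								  "GUC file variables",
								  ALLOCSET_DEFAULT_MINSIZE,
								  ALLOCSET_DEFAULT_INITSIZE,
								  ALLOCSET_DEFAULT_MAXSIZE);
	else
		MemoryContextReset(GucFileVariablesContext);

	guc_file_variables = (ConfigFileVariable *)
		MemoryContextAlloc(GucFileVariablesContext,
						   num_guc_file_variables * sizeof(ConfigFileVariable));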

 Otherwise, it seems like this would address my previous concerns that
 this would end up leaking memory, and even if it's a bit slower than one
 might hope, it's not an operation we should be doing very frequently.

 --- a/src/backend/utils/misc/guc.c
 +++ b/src/backend/utils/misc/guc.c
 [...]
  /*
 + * Return the total number of GUC File variables
 + */
 +int
 +GetNumConfigFileOptions(void)
 +{
 + return num_guc_file_variables;
 +}

 What's the point of this function..?  I'm not convinced it's the best
 idea, but it certainly looks like the function and the variable are only
 used from this file anyway?

 It's unnecessary.
 Fixed.

 +	if (call_cntr < max_calls)
 +	{
 +		char	   *values[NUM_PG_FILE_SETTINGS_ATTS];
 +		HeapTuple	tuple;
 +		Datum		result;
 +		ConfigFileVariable conf;
 +		char		buffer[256];

 Isn't that buffer a bit.. excessive in size?

 Fixed.


 diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
 index d3100d1..ebb96cc 100644
 --- a/src/include/utils/guc.h
 +++ b/src/include/utils/guc.h
 @@ -133,6 +133,14 @@ typedef struct ConfigVariable
   struct ConfigVariable *next;
  } ConfigVariable;

 +typedef struct ConfigFileVariable
 +{
 +	char	   *name;
 +	char	   *value;
 +	char	   *filename;
 +	int			sourceline;
 +} ConfigFileVariable;
 +
 [...]
 +extern int   GetNumConfigFileOptions(void);

 These aren't actually used anywhere else, are they?  Not sure that
 there's any need to expose them beyond guc.c when that's the only place
 they're used.


 Fixed.

 This will also need a
 REVOKE EXECUTE on pg_show_all_file_settings() FROM public;

 Added.


I added 

Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  I see nothing in the win32 stuff that tries to define
  USE_FLOAT8_BYVAL on 64-bit windows, is this just an oversight or
  does it not actually work there? or is it for on-disk compatibility
  with 32-bit windows?

 Tom That flag doesn't affect on-disk compatibility.

pg_type.typbyval ?

But I don't know if anything else differs between 32-bit and 64-bit
windows that would impact on compatibility.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] TABLESAMPLE patch

2015-04-04 Thread Petr Jelinek

On 04/04/15 14:57, Amit Kapila wrote:


1.
+tablesample_clause:
+TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause


Why do you want to allow func_arg_list?

Basically if user tries to pass multiple arguments in
TABLESAMPLE method's clause like (10,20), then I think
that should be caught in grammer level as an error.


It will be reported as an error during parse analysis if the TABLESAMPLE
method does not accept two parameters - the same as when an expression
uses the wrong type, for example.




The SQL:2003 spec says that the argument to REPEATABLE and to the
TABLESAMPLE method is the same: <numeric value expression>.

It seems to me that you want to allow it to make it extendable
to user defined Tablesample methods, but not sure if it is
right to use func_arg_list for the same because sample method
arguments can have different definition than function arguments.
Now even if we want to keep sample method arguments same as
function arguments that sounds like a separate discussion.



Yes, I want extensibility here. And I think the tablesample method
arguments are the same thing as function arguments, given that in the end
they are arguments for the init function of the tablesample method.


I would be ok with just expr_list; naming parameters here isn't overly
important, but the ability to have different types and numbers of
parameters for custom TABLESAMPLE methods *is* important.



In general, I feel you already have good basic infrastructure for
supportting other sample methods, but it is better to keep the new
DDL's for doing the same as a separate patch than this patch, as that
way we can reduce the scope of this patch, OTOH if you or others
feel that it is mandatory to have new DDL's support for other
tablesample methods then we have to deal with this now itself.


Well, I did attach it as a separate diff mainly for that reason. I agree
that the DDL does not have to be committed immediately with the rest of the
patch (although it's the simplest part of the patch, IMHO).




2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id = 2;
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...

postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);

Isn't TABLESAMPLE clause suppose to work with Update/Delete
statements?



It's supported in the FROM part of UPDATE and the USING part of DELETE. I
think that's sufficient.


The standard is somewhat useless for UPDATE and DELETE, as it only defines
quite limited syntax there. From what I've seen when doing research, MSSQL
also only supports it in their equivalent of the FROM/USING list, Oracle
does not seem to support their SAMPLING clause outside of SELECTs at all,
and if I read the cryptic DB2 manual correctly, I think they don't support
it outside of (sub)SELECTs either.



4.
SampleTupleVisible()
{
..
else
{
	/* No pagemode, we have to check the tuple itself. */
	Snapshot	snapshot = scan->rs_snapshot;
	Buffer		buffer = scan->rs_cbuf;

	bool		visible =
		HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
..
}

I think it is better to check PageIsAllVisible() in the above
code before the visibility check, as that can avoid visibility checks.


It's probably even better to do that one level up in samplenexttup() and
only call SampleTupleVisible() if the page is not all-visible
(PageIsAllVisible() is cheap).




6.
extern TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
List *sampleargs)

It is better to expose function (usage of extern) via header file.
Is there a need to mention extern here?



Eh, stupid copy-paste error when copying function name from header to 
actual source file.



7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}

What is the reason for coercing the value of the REPEATABLE clause to
INT4OID when a numeric value is expected for the clause?  If the user
gives the input value -2.3, it will generate a seed that doesn't seem
right.



Because REPEATABLE is a numeric expression, it can produce whatever
number, but we need int4 internally (well, float4 would also work, the
code would just be slightly uglier). And we do this type of coercion even
for table data (you can insert -2.3 into an integer column and it will
work), so I don't see what's wrong with it here.




8.
+DATA(insert OID = 3295 (  tsm_system_init		PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 2281 23 700 _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 (  tsm_bernoulli_init	PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 2281 23 700 _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));

The datatype for the second argument is kept as 700 (Float4); shouldn't
it be 1700 (Numeric)?



Why is that? Given the sampling error, I think float4 is enough for
specifying the percentage, and it makes the

Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom ... btw, has anyone noticed that this patch broke hamerkop and
  Tom bowerbird?  Or at least, it's hard to see what other recent commit
  Tom would explain the failures they're showing.

 Now that Robert committed the fix for 64bit Datum w/o USE_FLOAT8_BYVAL,
 bowerbird seems fixed (hamerkop hasn't run yet).

 I see nothing in the win32 stuff that tries to define USE_FLOAT8_BYVAL
 on 64-bit windows, is this just an oversight or does it not actually
 work there? or is it for on-disk compatibility with 32-bit windows?

That flag doesn't affect on-disk compatibility.  It could certainly break
third-party extensions, but we accept the same hazard on non-Windows with
equanimity.  I suspect this point simply wasn't revisited when we added
support for 64-bit Windows.

Having said that, I'm fine with leaving this as-is, if only because
it means we're exercising the --disable-float8-byval code paths in
the buildfarm ;-)

regards, tom lane




Re: [HACKERS] Compile warnings on OSX 10.10 clang 6.0

2015-04-04 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 On Sat, Apr 4, 2015 at 6:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 It occurred to me that maybe we could just turn off this class of warning,
 and after some experimentation I found out that
 -Wno-unused-command-line-argument does that, at least in the version
 of clang that Apple's currently shipping.
 
 Who's for enabling that if the compiler takes it?

 Yes, please. I always found those pthread warnings annoying.

After a bit more experimentation I found out that for both gcc and clang
(at least in the versions I'm using, on RHEL6 and Yosemite), you can
write -Wno-anythingatall and the compiler will not complain about it.
(And how did *that* get by the bozo who put in this warning, I wonder.)
So that means that if we just add the obvious test

  PGAC_PROG_CC_CFLAGS_OPT([-Wno-unused-command-line-argument])

then we will end up including that in CFLAGS on pretty much every
platform, whether or not there's an actual problem to solve.

gcc *does* complain about -Wunused-command-line-argument, so a possible
answer is to test for that and then add the other to CFLAGS.  That seems
kinda grotty though, does anyone have another way?

regards, tom lane




Re: [HACKERS] PATCH: nbklno in _hash_splitbucket unused (after ed9cc2b)

2015-04-04 Thread Tomas Vondra

On 04/05/15 01:26, Petr Jelinek wrote:

On 05/04/15 01:03, Tomas Vondra wrote:

Hi there,

the changes in _hash_splitbucket() made nblkno unused when compiled
without asserts. Attached is a simple fix to get rid of the warnings.



This has been already fixed, see b7e1652d5 .


D'oh! Sorry for the noise. Apparently my git repo was not properly
synced with upstream, so I hadn't seen the commit :-/


regards
Tomas




[HACKERS] File count restriction of directory limits number of relations inside a database.

2015-04-04 Thread sudalai
Hi,
     We have a requirement to have multiple schemas in a database. Each
schema will have all application tables and hold a set of users' data. We
have segmented data across different schemas, and we dynamically add schemas
as users sign up. So the table files created across all schemas reside in a
single folder under data/base/16546/***. We can't find any restriction on
the number of relations (tables, indexes) in a database in PostgreSQL, but
the number of files in a folder is restricted in the file system we
currently use, to 32k/64K (ext2 / ext4). To overcome this we created folders
inside the database folder and associated them as tablespaces for each
schema, but that breaks streaming replication on the standby server, which
we currently use.
To overcome this we changed the postgresql-9.4.0 source code to create a
subdirectory "my_<oid of tablespace>" inside the location given in the
CREATE TABLESPACE statement and take that location as the tablespace
location. Now we specify one common location to create multiple tablespaces,
on both slave and master, because PostgreSQL creates each tablespace in its
own subdirectory ("my_<oid of tablespace>").
Since I'm not an expert in PostgreSQL, I can't be sure that this will not
affect other functionality of Postgres. Please help me; I changed the method
named create_tablespace_directories in the file tablespace.c.

Here's the modified method.

static void
create_tablespace_directories(const char *location, const Oid tablespaceoid)
{
	char	   *linkloc;
	char	   *location_with_version_dir;
	char	   *newlocation;
	struct stat st;

	linkloc = psprintf("pg_tblspc/%u", tablespaceoid);
	newlocation = psprintf("%s/my_%d", location, tablespaceoid);
	location_with_version_dir = psprintf("%s/%s", newlocation,
										 TABLESPACE_VERSION_DIRECTORY);

	/*
	 * Attempt to coerce target directory to safe permissions.  If this
	 * fails, it doesn't exist or has the wrong owner.
	 */
	if (chmod(location, S_IRWXU) != 0)
	{
		if (errno == ENOENT)
			ereport(ERROR,
					(errcode(ERRCODE_UNDEFINED_FILE),
					 errmsg("directory \"%s\" does not exist", location),
					 InRecovery ? errhint("Create this directory for "
										  "the tablespace before "
										  "restarting the server.") : 0));
		else
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not set permissions on directory "
							"\"%s\": %m", location)));
	}

	ereport(WARNING, (errmsg("Trying to create : \"%s\"", newlocation)));

	if (mkdir(newlocation, S_IRWXU) < 0)
	{
		if (errno == EEXIST)
			ereport(ERROR,
					(errcode(ERRCODE_OBJECT_IN_USE),
					 errmsg("directory \"%s\" already in use as a tablespace",
							newlocation)));
		else
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not create directory \"%s\": %m",
							newlocation)));
	}

	if (InRecovery)
	{
		/*
		 * Our theory for replaying a CREATE is to forcibly drop the target
		 * subdirectory if present, and then recreate it.  This may be more
		 * work than needed, but it is simple to implement.
		 */
		if (stat(location_with_version_dir, &st) == 0 && S_ISDIR(st.st_mode))
		{
			if (!rmtree(location_with_version_dir, true))
				/* If this failed, mkdir() below is going to error. */
				ereport(WARNING,
						(errmsg("some useless files may be left behind in "
								"old database directory \"%s\"",
								location_with_version_dir)));
		}
	}

	/*
	 * The creation of the version directory prevents more than one
	 * tablespace in a single location.
	 */
	if (mkdir(location_with_version_dir, S_IRWXU) < 0)
	{
		if (errno == EEXIST)
			ereport(ERROR,
					(errcode(ERRCODE_OBJECT_IN_USE),
					 errmsg("directory \"%s\" already in use as a tablespace",
							location_with_version_dir)));
		else
			ereport(ERROR,

Re: [HACKERS] Freeze avoidance of very large table.

2015-04-04 Thread Jim Nasby

On 4/3/15 12:59 AM, Sawada Masahiko wrote:

+			case HEAPTUPLE_LIVE:
+			case HEAPTUPLE_RECENTLY_DEAD:
+			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			case HEAPTUPLE_DELETE_IN_PROGRESS:
+				if (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
+											  mxactcutoff, &frozen[nfrozen]))
+					frozen[nfrozen++].offset = offnum;
+				break;


This doesn't seem safe enough to me. Can't there be tuples that are 
still new enough that they can't be frozen, and are still live? I don't 
think it's safe to leave tuples as dead either, even if they're hinted. 
The hint may not be written. Also, the patch seems to be completely 
ignoring actually freezing the toast relation; I can't see how that's 
actually safe.


I'd feel a heck of a lot safer if any time heap_prepare_freeze_tuple 
returned false we did a second check on the tuple to ensure it was truly 
frozen.


Somewhat related... instead of forcing the freeze to happen 
synchronously, can't we set this up so a table is in one of three 
states? Read/Write, Read Only, Frozen. AT_SetReadOnly and 
AT_SetReadWrite would simply change to the appropriate state, and all 
the vacuum infrastructure would continue to process those tables as it 
does today. lazy_vacuum_rel would become responsible for tracking if 
there were any non-frozen tuples if it was also attempting a freeze. If 
it discovered there were none, AND the table was marked as ReadOnly, 
then it would change the table state to Frozen and set relfrozenxid = 
InvalidTransactionId and relminxid = InvalidMultiXactId. AT_SetReadWrite 
could change relfrozenxid to its own Xid as an optimization. Doing it 
that way leaves all the complicated vacuum code in one place, and would 
eliminate concerns about race conditions with still running 
transactions, etc.
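
A minimal sketch of the state being described, where every name is
hypothetical, invented just to pin the proposal down:

typedef enum RelWriteState
{
	RELSTATE_READ_WRITE,	/* normal: vacuum/freeze behave as today */
	RELSTATE_READ_ONLY,		/* no new writes; vacuum may promote to FROZEN */
	RELSTATE_FROZEN			/* verified all-frozen; freeze scans can skip it */
} RelWriteState;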


BTW, you also need to put things in place to ensure it's impossible to 
unfreeze a tuple in a relation that's marked ReadOnly or Frozen. I'm not 
sure what the right way to do that would be.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-04 Thread Jim Nasby

On 4/4/15 5:10 PM, Jim Nasby wrote:

On 4/3/15 12:59 AM, Sawada Masahiko wrote:

+			case HEAPTUPLE_LIVE:
+			case HEAPTUPLE_RECENTLY_DEAD:
+			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			case HEAPTUPLE_DELETE_IN_PROGRESS:
+				if (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
+											  mxactcutoff, &frozen[nfrozen]))
+					frozen[nfrozen++].offset = offnum;
+				break;


This doesn't seem safe enough to me. Can't there be tuples that are
still new enough that they can't be frozen, and are still live? I don't
think it's safe to leave tuples as dead either, even if they're hinted.
The hint may not be written. Also, the patch seems to be completely
ignoring actually freezing the toast relation; I can't see how that's
actually safe.

I'd feel a heck of a lot safer if any time heap_prepare_freeze_tuple
returned false we did a second check on the tuple to ensure it was truly
frozen.

Somewhat related... instead of forcing the freeze to happen
synchronously, can't we set this up so a table is in one of three
states? Read/Write, Read Only, Frozen. AT_SetReadOnly and
AT_SetReadWrite would simply change to the appropriate state, and all
the vacuum infrastructure would continue to process those tables as it
does today. lazy_vacuum_rel would become responsible for tracking if
there were any non-frozen tuples if it was also attempting a freeze. If
it discovered there were none, AND the table was marked as ReadOnly,
then it would change the table state to Frozen and set relfrozenxid =
InvalidTransactionId and relminxid = InvalidMultiXactId. AT_SetReadWrite
could change relfrozenxid to its own Xid as an optimization. Doing it
that way leaves all the complicated vacuum code in one place, and would
eliminate concerns about race conditions with still running
transactions, etc.

BTW, you also need to put things in place to ensure it's impossible to
unfreeze a tuple in a relation that's marked ReadOnly or Frozen. I'm not
sure what the right way to do that would be.


Answering my own question... I think visibilitymap_clear() would be the 
right place. AFAICT this is basically as critical as clearing the VM, 
and that function has the Relation, so it can see what mode the relation 
is in.


There is another possibility here, too. We can completely divorce a 
ReadOnly mode (which I think is useful for other things besides 
freezing) from the question of whether we need to force-freeze a 
relation if we create a FrozenMap, similar to the visibility map. This 
has the added advantage of helping freeze scans on relations that are 
not ReadOnly in the case of tables that are insert-mostly or any other 
pattern where most pages stay all-frozen.


Prior to the visibility map this would have been a rather daunting 
project, but I believe this could piggyback on the VM code rather 
nicely. Anytime you clear the VM you clearly must clear the FrozenMap as 
well. The logic for setting the FM is clearly different, but that would 
be entirely self-contained to vacuum. Unlike the VM, I don't see any 
point to marking special bits in the page itself for FM.
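
Sketched out, the invariant is just the following, where frozenmap_clear()
is a hypothetical FM counterpart to the existing visibilitymap_clear():

	/* Whenever a heap page is dirtied, both maps must be invalidated. */
	visibilitymap_clear(rel, heapBlk, vmbuffer);
	frozenmap_clear(rel, heapBlk, fmbuffer);	/* hypothetical FM call */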


It would be nice if each bit in the FM covered multiple pages, but that 
can be optimized later.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] COALESCE() query yield different result with MJ vs. NLJ/HJ

2015-04-04 Thread Tom Lane
Qingqing Zhou zhouqq.postg...@gmail.com writes:
 [ this fails: ]
 set enable_mergejoin=on; set enable_nestloop=off; set enable_hashjoin=off;
 explain analyze select a, b from t1 left join t2 on coalesce(a, 1) =
 coalesce(b,1) where (coalesce(b,1)) > 0

Ugh.  The core of the problem is a mistaken assumption that b below the
outer join means the same thing as b above it.  I've suspected for years
that the planner might someday have to explicitly distinguish the two
meanings, but at least up to now we've not really gotten burnt by failing
to make the distinction.

 A possible explanation is that in fix_join_expr_mutator(), we optimize
 for the case that if a child node already computes an expression, the
 upper node shall reuse it. In MJ, as coalesce() is already computed in
 the sort node, the NULL is directly used by ExecQual(0) for the join
 filter.
 If we take out this optimization the problem is solved, but that looks
 like overkill. What's a better fix?

Indeed, removing that optimization altogether seems likely to break
things, not to mention being pretty inefficient.  Maybe pending a
proper fix (which I'm afraid will entail major planner redesign)
we could refuse to match anything more complex than a Var or
PlaceHolderVar if it's bubbling up from the nullable side of an
outer join.
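
A rough sketch of that stopgap inside fix_join_expr_mutator() — the
nullable_side flag is hypothetical, invented here just to mark expressions
coming up from the nullable side of an outer join:

	if (context->nullable_side &&
		!IsA(node, Var) &&
		!IsA(node, PlaceHolderVar))
		return expression_tree_mutator(node, fix_join_expr_mutator,
									   (void *) context);	/* recurse, don't match */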

regards, tom lane




Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Robert Haas
On Sat, Apr 4, 2015 at 10:27 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I see nothing in the win32 stuff that tries to define USE_FLOAT8_BYVAL
 on 64-bit windows, is this just an oversight or does it not actually
 work there? or is it for on-disk compatibility with 32-bit windows?

 That flag doesn't affect on-disk compatibility.  It could certainly break
 third-party extensions, but we accept the same hazard on non-Windows with
 equanimity.  I suspect this point simply wasn't revisited when we added
 support for 64-bit Windows.

 Having said that, I'm fine with leaving this as-is, if only because
 it means we're exercising the --disable-float8-byval code paths in
 the buildfarm ;-)

But... are we?  I mean, I don't see anything in the Windows code that
defines USE_FLOAT8_BYVAL.

(Admittedly, if we're not, I have no theory for why that patch fixed
anything, but all the same I don't see where it's getting defined.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Apr 4, 2015 at 10:27 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Having said that, I'm fine with leaving this as-is, if only because
 it means we're exercising the --disable-float8-byval code paths in
 the buildfarm ;-)

 But... are we?  I mean, I don't see anything in the Windows code that
 defines USE_FLOAT8_BYVAL.

Precisely --- it *isn't* getting set, even on 64-bit Windows where it
could be.

regards, tom lane




Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Andrew Dunstan


On 04/04/2015 10:27 AM, Tom Lane wrote:

Andrew Gierth and...@tao11.riddles.org.uk writes:

Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom ... btw, has anyone noticed that this patch broke hamerkop and
  Tom bowerbird?  Or at least, it's hard to see what other recent commit
  Tom would explain the failures they're showing.
Now that Robert committed the fix for 64bit Datum w/o USE_FLOAT8_BYVAL,
bowerbird seems fixed (hamerkop hasn't run yet).
I see nothing in the win32 stuff that tries to define USE_FLOAT8_BYVAL
on 64-bit windows, is this just an oversight or does it not actually
work there? or is it for on-disk compatibility with 32-bit windows?

That flag doesn't affect on-disk compatibility.  It could certainly break
third-party extensions, but we accept the same hazard on non-Windows with
equanimity.  I suspect this point simply wasn't revisited when we added
support for 64-bit Windows.

Having said that, I'm fine with leaving this as-is, if only because
it means we're exercising the --disable-float8-byval code paths in
the buildfarm ;-)





This seems quite wrong. If we want those paths tested we should ensure 
that buildfarm members are set up with that explicit setting.


I think not making this the default for 64 bit MSVC builds was simply an 
error of omission.


The attached patch should set float8byval as the default on 64 bit MSVC 
builds and error out if it is explicitly set on 32 bit platforms.


cheers

andrew
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 714585f..764ba1e 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -25,11 +25,18 @@ sub _new
 		platform   => undef, };
 	bless($self, $classname);
 
+	$self->DeterminePlatform();
+	my $bits = $self->{platform} eq 'Win32' ? 32 : 64;
+
 	# integer_datetimes is now the default
 	$options->{integer_datetimes} = 1
 	  unless exists $options->{integer_datetimes};
 	$options->{float4byval} = 1
 	  unless exists $options->{float4byval};
+	$options->{float8byval} = ($bits == 64)
+	  unless exists $options->{float8byval};
+	die "float8byval not permitted on 32 bit platforms"
+	  if $options->{float8byval} && $bits == 32;
 	if ($options->{xml})
 	{
 		if (!($options->{xslt} && $options->{iconv}))
@@ -56,8 +63,6 @@ sub _new
 	die "Bad wal_segsize $options->{wal_segsize}"
 	  unless grep { $_ == $options->{wal_segsize} } (1, 2, 4, 8, 16, 32, 64);
 
-	$self->DeterminePlatform();
-
 	return $self;
 }
 
diff --git a/src/tools/msvc/config_default.pl b/src/tools/msvc/config_default.pl
index e4d4810..0bee0c0 100644
--- a/src/tools/msvc/config_default.pl
+++ b/src/tools/msvc/config_default.pl
@@ -6,7 +6,10 @@ our $config = {
 	asserts => 0,    # --enable-cassert
 	  # integer_datetimes=1,   # --enable-integer-datetimes - on is now default
 	  # float4byval=1, # --disable-float4-byval, on by default
-	  # float8byval=0, # --disable-float8-byval, off by default
+
+	  # float8byval= $platformbits == 64, # --disable-float8-byval,
+	  # off by default on 32 bit platforms, on by default on 64 bit platforms
+
 	  # blocksize = 8, # --with-blocksize, 8kB by default
 	  # wal_blocksize = 8, # --with-wal-blocksize, 8kB by default
 	  # wal_segsize = 16,  # --with-wal-segsize, 16MB by default



Re: [HACKERS] File count restriction of directory limits number of relations inside a database.

2015-04-04 Thread Tomas Vondra
Hi,

On 4.4.2015 21:24, sudalai wrote:
 Hi,
 We have a requirement to have multiple schemas in a database.Each 
 schema will have all application tables and hold a set of users
 data. We have segmented data across different schemas. We
 dynamically increase the schemas as in when user signs up. So the
 table files created across all schemas resides in a single folder
 under data/base/16546/***. We can't find any restriction to number of
 relation(table,index) on a database in postgresql.

PostgreSQL is using 32-bit object IDs, so technically you can have up to
2^32 objects (tables, indexes). You'll certainly run into other issues
before hitting this limit, though.

 But Number of files in a folder in linux is restricted in our file
 system what we use currently 32k/64K (ext2 / ext4). To overcome this
 we create a folders inside a database folder and associate as table
 space for each schema but it breaks the streaming replication in
 standby server which we use currently.

I'm pretty sure ext4 allows more directory entries. For example I just
did this on my ext4 partition, within a new directory:

  $ for i in `seq 1 1000000`; do touch $i.file; done

and guess what, I do have 1M files there:

  $ ls | wc -l
  1000000

So either you're using some old version of ext4, or you've used some
strange parameters when creating the filesystem.

 To overcome this We changed the postgresql-9.4.0 source code to
 create a sub directory my_oid of tablespace inside the location
 given in create tablespace statement and take that location as
 tablespace location. Now we specify one common location to create
 multiple tablespaces, on both slave and master because postgresql
 creates a tablespace on subdirectory(my_oid of tablespace).

 Since I'm not an expert in postgresql, i can't assure that it will 
 not affect other functionality of postgres. Please help me, i
 changed the method named create_tablespace_directories in a file
 tablespace.c.

This seems extremely foolish, IMNSHO.

Firstly, I'm not convinced there actually is a problem - you haven't
posted any error messages demonstrating the existence of such ext4
limits, you only claimed they exist. I've created a directory on ext4
with 1M files in it, without any problem.

Secondly, even if there is such a problem on your system, it's most likely
caused by improper options used when creating the ext4 filesystem. Solving
this at the PostgreSQL level is a poor solution.

And finally - no one is going to confirm that your changes are safe,
without a detailed review of your patch. But that would take a lot of
time, and it seems you're solving an artificial problem, so I'd be
surprised if anyone invests the time into reviewing this.

In other words, you should probably fix the filesystem configuration,
not PostgreSQL code.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Peter Geoghegan
On Sat, Apr 4, 2015 at 4:37 PM, Andrew Dunstan and...@dunslane.net wrote:
 This seems quite wrong. If we want those paths tested we should ensure that
 buildfarm members are set up with that explicit setting.

 I think not making this the default for 64 bit MSVC builds was simply an
 error of omission.

+1. I don't feel that having every optimization available on Windows
is worth bending over backwards for, but this omission is kind of
silly.

-- 
Peter Geoghegan




Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Andrew Dunstan


On 04/04/2015 04:27 PM, Robert Haas wrote:

On Sat, Apr 4, 2015 at 10:27 AM, Tom Lane t...@sss.pgh.pa.us wrote:

I see nothing in the win32 stuff that tries to define USE_FLOAT8_BYVAL
on 64-bit windows, is this just an oversight or does it not actually
work there? or is it for on-disk compatibility with 32-bit windows?

That flag doesn't affect on-disk compatibility.  It could certainly break
third-party extensions, but we accept the same hazard on non-Windows with
equanimity.  I suspect this point simply wasn't revisited when we added
support for 64-bit Windows.

Having said that, I'm fine with leaving this as-is, if only because
it means we're exercising the --disable-float8-byval code paths in
the buildfarm ;-)

But... are we?  I mean, I don't see anything in the Windows code that
defines USE_FLOAT8_BYVAL.

(Admittedly, if we're not, I have no theory for why that patch fixed
anything, but all the same I don't see where it's getting defined.)



See src/tools/msvc/Solution.pm

cheers

andrew




Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Robert Haas
On Apr 4, 2015, at 4:37 PM, Andrew Dunstan and...@dunslane.net wrote:
 The attached patch should set float8byval as the default on 64 bit MSVC 
 builds and error out if it is explicitly set on 32 bit platforms.

+1 for changing this.

...Robert



[HACKERS] PATCH: nbklno in _hash_splitbucket unused (after ed9cc2b)

2015-04-04 Thread Tomas Vondra

Hi there,

the changes in _hash_splitbucket() made nblkno unused when compiled 
without asserts. Attached is a simple fix to get rid of the warnings.


regards
Tomas
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 9a77945..e32c4fc 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -764,7 +764,6 @@ _hash_splitbucket(Relation rel,
   uint32 lowmask)
 {
 	BlockNumber oblkno;
-	BlockNumber nblkno;
 	Buffer		obuf;
 	Page		opage;
 	Page		npage;
@@ -781,8 +780,7 @@ _hash_splitbucket(Relation rel,
 	opage = BufferGetPage(obuf);
 	oopaque = (HashPageOpaque) PageGetSpecialPointer(opage);
 
-	nblkno = start_nblkno;
-	Assert(nblkno == BufferGetBlockNumber(nbuf));
+	Assert(start_nblkno == BufferGetBlockNumber(nbuf));
 	npage = BufferGetPage(nbuf);
 
 	/* initialize the new bucket's primary page */



Re: [HACKERS] Freeze avoidance of very large table.

2015-04-04 Thread Jeff Janes
On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 4/3/15 12:59 AM, Sawada Masahiko wrote:

 +			case HEAPTUPLE_LIVE:
 +			case HEAPTUPLE_RECENTLY_DEAD:
 +			case HEAPTUPLE_INSERT_IN_PROGRESS:
 +			case HEAPTUPLE_DELETE_IN_PROGRESS:
 +				if (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
 +											  mxactcutoff, &frozen[nfrozen]))
 +					frozen[nfrozen++].offset = offnum;
 +				break;


 This doesn't seem safe enough to me. Can't there be tuples that are still
 new enough that they can't be frozen, and are still live?


Yep.  I've set a table to read only while it contained unfreezable tuples,
and the tuples remain unfrozen yet the read-only action claims to have
succeeded.



 Somewhat related... instead of forcing the freeze to happen synchronously,
 can't we set this up so a table is in one of three states? Read/Write, Read
 Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to the
 appropriate state, and all the vacuum infrastructure would continue to
 process those tables as it does today. lazy_vacuum_rel would become
 responsible for tracking if there were any non-frozen tuples if it was also
 attempting a freeze. If it discovered there were none, AND the table was
 marked as ReadOnly, then it would change the table state to Frozen and set
 relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId.
 AT_SetReadWrite could change relfrozenxid to its own Xid as an
 optimization. Doing it that way leaves all the complicated vacuum code in
 one place, and would eliminate concerns about race conditions with still
 running transactions, etc.


+1 here as well.  I might want to set tables to read only for reasons
other than avoiding repeated freezing (after all, the name of the command
suggests it is a general-purpose thing), and I wouldn't want to
automatically trigger a vacuum freeze in the process.

Cheers,

Jeff


Re: [HACKERS] PATCH: nbklno in _hash_splitbucket unused (after ed9cc2b)

2015-04-04 Thread Petr Jelinek

On 05/04/15 01:03, Tomas Vondra wrote:

Hi there,

the changes in _hash_splitbucket() made nblkno unused when compiled
without asserts. Attached is a simple fix to get rid of the warnings.



This has been already fixed, see b7e1652d5 .


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PATCH: adaptive ndistinct estimator v4

2015-04-04 Thread Tomas Vondra

Hi,

On 04/03/15 15:46, Greg Stark wrote:

  The simple workaround for this was adding a fallback to GEE when f[1]
or f[2] is 0. GEE is another estimator described in the paper, behaving
much better in those cases.

For completeness, what's the downside in just always using GEE?


That's a good question.

GEE is the estimator with minimal average error, as defined in Theorem 1 
in that paper. The exact formulation of the theorem is a bit complex, 
but it essentially says that knowing just the sizes of the data set and 
sample, there's an accuracy limit.
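
(For reference, and restating from memory of the Charikar et al. paper
rather than quoting it: with N rows in the table, n rows in the sample, and
f_j the number of values seen exactly j times in the sample, GEE has the
closed form

    GEE = sqrt(N/n) * f_1 + sum_{j >= 2} f_j

i.e. sample singletons are scaled up by sqrt(N/n) and everything seen twice
or more is counted once; that sqrt(N/n) factor is also the ratio-error
bound from Theorem 1.)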


Or put another way, it's possible to construct the data set so that the 
estimator gives you estimates with error exceeding some limit (with a 
certain probability).


Knowledge of how much the data set is skewed gives us opportunity to 
improve the estimates by choosing an estimator performing better with 
such data sets. The problem is we don't know the skew - we can only 
estimate it from the sample, which is what the hybrid estimators do.


The AE estimator (presented in the paper and implemented in the patch) 
is an example of such hybrid estimators, but based on my experiments it 
does not work terribly well with one particular type of skew that I'd 
expect to be relatively common (few very common values, many very rare 
values).


Luckily, GEE performs pretty well in this case, but we can use the AE 
otherwise (ISTM it gives really good estimates).


But of course - there'll always be data sets that are poorly estimated 
(pretty much as Theorem 1 in the paper says). It'd be nice to do more 
testing on real-world data sets, to see if this performs better or worse 
than our current estimator.


regards
Tomas

