Re: [HACKERS] PATCH: nbklno in _hash_splitbucket unused (after ed9cc2b)

2015-04-04 Thread Tomas Vondra

On 04/05/15 01:26, Petr Jelinek wrote:

On 05/04/15 01:03, Tomas Vondra wrote:

Hi there,

the changes in _hash_splitbucket() made nblkno unused when compiled
without asserts. Attached is a simple fix to get rid of the warnings.



This has already been fixed; see commit b7e1652d5.


D'oh! Sorry for the noise. Apparently my git repo was not properly 
synced with upstream, so I hadn't seen the commit :-/


regards
Tomas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: adaptive ndistinct estimator v4

2015-04-04 Thread Tomas Vondra

Hi,

On 04/03/15 15:46, Greg Stark wrote:

 > The simple workaround for this was adding a fallback to GEE when f[1]
or f[2] is 0. GEE is another estimator described in the paper, behaving
much better in those cases.

For completeness, what's the downside in just always using GEE?


That's a good question.

GEE is the estimator with minimal average error, as defined in Theorem 1 
of that paper. The exact formulation of the theorem is a bit complex, 
but it essentially says that knowing only the sizes of the data set and 
the sample, there's a hard limit on the achievable accuracy.
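
For reference, the GEE formula itself is quite simple (quoting from
memory, so please double-check the paper):

    D_GEE = sqrt(N/r) * f[1] + sum(i>=2) f[i]

where N is the table size, r the sample size, and f[i] the number of
values that appear exactly i times in the sample. The sqrt(N/r)
multiplier on the singleton count is what gives the worst-case
guarantee.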


Or put another way, it's possible to construct a data set on which any 
given estimator produces estimates with error exceeding that limit (with 
a certain probability).
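
(If memory serves, the bound is roughly that any estimator must, on some
input, suffer a ratio error of at least sqrt((N-r)/(2r) * ln(1/delta))
with probability delta - but check the paper for the exact form.)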


Knowing how skewed the data set is gives us an opportunity to improve 
the estimates by choosing an estimator that performs better on such data 
sets. The problem is we don't know the skew - we can only estimate it 
from the sample, which is what the hybrid estimators do.


The AE estimator (presented in the paper and implemented in the patch) 
is an example of such a hybrid estimator, but based on my experiments it 
does not work terribly well with one particular type of skew that I'd 
expect to be relatively common (a few very common values, many very rare 
values).


Luckily, GEE performs pretty well in that case, and we can use AE 
otherwise (ISTM it gives really good estimates).
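
In pseudo-C, the fallback is roughly this (a sketch of the workaround
quoted above, not the actual patch code):

    /* AE needs f[1] and f[2]; fall back to GEE when either is zero */
    if (f[1] == 0 || f[2] == 0)
        estimate = gee_estimate(N, r, f);
    else
        estimate = ae_estimate(N, r, f);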


But of course - there'll always be data sets that are poorly estimated 
(pretty much as Theorem 1 in the paper says). It'd be nice to do more 
testing on real-world data sets, to see if this performs better or worse 
than our current estimator.


regards
Tomas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: nbklno in _hash_splitbucket unused (after ed9cc2b)

2015-04-04 Thread Petr Jelinek

On 05/04/15 01:03, Tomas Vondra wrote:

Hi there,

the changes in _hash_splitbucket() made nblkno unused when compiled
without asserts. Attached is a simple fix to get rid of the warnings.



This has already been fixed; see commit b7e1652d5.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-04 Thread Jeff Janes
On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby  wrote:

> On 4/3/15 12:59 AM, Sawada Masahiko wrote:
>
>> +   case HEAPTUPLE_LIVE:
>> +   case HEAPTUPLE_RECENTLY_DEAD:
>> +   case HEAPTUPLE_INSERT_IN_PROGRESS:
>> +   case HEAPTUPLE_DELETE_IN_PROGRESS:
>> +       if (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
>> +                                     mxactcutoff, &frozen[nfrozen]))
>> +           frozen[nfrozen++].offset = offnum;
>> +       break;
>>
>
> This doesn't seem safe enough to me. Can't there be tuples that are still
> new enough that they can't be frozen, and are still live?


Yep.  I've set a table to read only while it contained unfreezable tuples,
and the tuples remained unfrozen, yet the read-only action claimed to have
succeeded.



> Somewhat related... instead of forcing the freeze to happen synchronously,
> can't we set this up so a table is in one of three states? Read/Write, Read
> Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to the
> appropriate state, and all the vacuum infrastructure would continue to
> process those tables as it does today. lazy_vacuum_rel would become
> responsible for tracking if there were any non-frozen tuples if it was also
> attempting a freeze. If it discovered there were none, AND the table was
> marked as ReadOnly, then it would change the table state to Frozen and set
> relfrozenxid = InvalidTransactionId and relminmxid = InvalidMultiXactId.
> AT_SetReadWrite could change relfrozenxid to its own Xid as an
> optimization. Doing it that way leaves all the complicated vacuum code in
> one place, and would eliminate concerns about race conditions with still
> running transactions, etc.
>

+1 here as well.  I might want to set tables to read only for reasons other
than avoiding repeated freezing (after all, the name of the command
suggests it is a general-purpose thing), and I wouldn't want to
automatically trigger a vacuum freeze in the process.

Cheers,

Jeff


[HACKERS] PATCH: nbklno in _hash_splitbucket unused (after ed9cc2b)

2015-04-04 Thread Tomas Vondra

Hi there,

the changes in _hash_splitbucket() made nblkno unused when compiled 
without asserts. Attached is a simple fix to get rid of the warnings.


regards
Tomas
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 9a77945..e32c4fc 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -764,7 +764,6 @@ _hash_splitbucket(Relation rel,
   uint32 lowmask)
 {
 	BlockNumber oblkno;
-	BlockNumber nblkno;
 	Buffer		obuf;
 	Page		opage;
 	Page		npage;
@@ -781,8 +780,7 @@ _hash_splitbucket(Relation rel,
 	opage = BufferGetPage(obuf);
 	oopaque = (HashPageOpaque) PageGetSpecialPointer(opage);
 
-	nblkno = start_nblkno;
-	Assert(nblkno == BufferGetBlockNumber(nbuf));
+	Assert(start_nblkno == BufferGetBlockNumber(nbuf));
 	npage = BufferGetPage(nbuf);
 
 	/* initialize the new bucket's primary page */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COALESCE() query yield different result with MJ vs. NLJ/HJ

2015-04-04 Thread Tom Lane
Qingqing Zhou  writes:
> [ this fails: ]
> set enable_mergejoin=on; set enable_nestloop=off; set enable_hashjoin=off;
> explain analyze select a, b from t1 left join t2 on coalesce(a, 1) =
> coalesce(b, 1) where (coalesce(b, 1)) > 0;

Ugh.  The core of the problem is a mistaken assumption that "b" below the
outer join means the same thing as "b" above it.  I've suspected for years
that the planner might someday have to explicitly distinguish the two
meanings, but at least up to now we've not really gotten burnt by failing
to make the distinction.

> A possible explanation is that in fix_join_expr_mutator(), we optimize
> for the case that if a child node already computes an expression, the
> upper node reuses it. In MJ, as coalesce() is already computed in the
> sort node, the NULL is directly used by ExecQual(>0) for the join
> filter.
> If we take out this optimization the problem is solved, but that looks
> like overkill. What's a better fix?

Indeed, removing that optimization altogether seems likely to break
things, not to mention being pretty inefficient.  Maybe pending a
proper fix (which I'm afraid will entail major planner redesign)
we could refuse to match anything more complex than a Var or
PlaceHolderVar if it's bubbling up from the nullable side of an
outer join.
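
A sketch of what that restriction might look like (hypothetical code,
just to illustrate - expr_is_nullable_below_join() doesn't exist as
such, and the itlist plumbing is simplified vs. the real
fix_join_expr_context):

	/* in fix_join_expr_mutator(), before matching against child tlists */
	if (IsA(node, Var) || IsA(node, PlaceHolderVar) ||
		!expr_is_nullable_below_join(node, context))
		newvar = search_indexed_tlist_for_non_var(node, itlist, newvarno);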

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-04 Thread Jim Nasby

On 4/4/15 5:10 PM, Jim Nasby wrote:

On 4/3/15 12:59 AM, Sawada Masahiko wrote:

+	case HEAPTUPLE_LIVE:
+	case HEAPTUPLE_RECENTLY_DEAD:
+	case HEAPTUPLE_INSERT_IN_PROGRESS:
+	case HEAPTUPLE_DELETE_IN_PROGRESS:
+		if (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
+									  mxactcutoff, &frozen[nfrozen]))
+			frozen[nfrozen++].offset = offnum;
+		break;


This doesn't seem safe enough to me. Can't there be tuples that are
still new enough that they can't be frozen, and are still live? I don't
think it's safe to leave tuples as dead either, even if they're hinted.
The hint may not be written. Also, the patch seems to be completely
ignoring actually freezing the toast relation; I can't see how that's
actually safe.

I'd feel a heck of a lot safer if any time heap_prepare_freeze_tuple
returned false we did a second check on the tuple to ensure it was truly
frozen.

Somewhat related... instead of forcing the freeze to happen
synchronously, can't we set this up so a table is in one of three
states? Read/Write, Read Only, Frozen. AT_SetReadOnly and
AT_SetReadWrite would simply change to the appropriate state, and all
the vacuum infrastructure would continue to process those tables as it
does today. lazy_vacuum_rel would become responsible for tracking if
there were any non-frozen tuples if it was also attempting a freeze. If
it discovered there were none, AND the table was marked as ReadOnly,
then it would change the table state to Frozen and set relfrozenxid =
InvalidTransactionId and relminmxid = InvalidMultiXactId. AT_SetReadWrite
could change relfrozenxid to its own Xid as an optimization. Doing it
that way leaves all the complicated vacuum code in one place, and would
eliminate concerns about race conditions with still running
transactions, etc.
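
For illustration, the three states might look something like this (names
invented here, not from any patch):

typedef enum RelWriteState
{
	RELSTATE_READ_WRITE,	/* normal: vacuum/freeze behaves as today */
	RELSTATE_READ_ONLY,		/* no new writes; vacuum may promote to FROZEN */
	RELSTATE_FROZEN			/* fully frozen; relfrozenxid can be invalid */
} RelWriteState;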

BTW, you also need to put things in place to ensure it's impossible to
unfreeze a tuple in a relation that's marked ReadOnly or Frozen. I'm not
sure what the right way to do that would be.


Answering my own question... I think visibilitymap_clear() would be the 
right place. AFAICT this is basically as critical as clearing the VM, 
and that function has the Relation, so it can see what mode the relation 
is in.
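
Roughly, the hook might look like this (frozenmap_clear() is
hypothetical; only the visibility map function exists today):

void
visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer buf)
{
	/* ... existing code clearing the all-visible bit ... */

	/* whatever clears all-visible must also clear all-frozen */
	frozenmap_clear(rel, heapBlk);
}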


There is another possibility here, too. If we create a FrozenMap, similar 
to the visibility map, we can completely divorce a ReadOnly mode (which I 
think is useful for other things besides freezing) from the question of 
whether we need to force-freeze a relation. This has the added advantage 
of helping freeze scans on relations that are not ReadOnly, in the case 
of tables that are insert-mostly or any other pattern where most pages 
stay all-frozen.


Prior to the visibility map this would have been a rather daunting 
project, but I believe this could piggyback on the VM code rather 
nicely. Anytime you clear the VM you clearly must clear the FrozenMap as 
well. The logic for setting the FM is clearly different, but that would 
be entirely self-contained to vacuum. Unlike the VM, I don't see any 
point to marking special bits in the page itself for FM.


It would be nice if each bit in the FM covered multiple pages, but that 
can be optimized later.
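
E.g. something as simple as this would do (hypothetical macros):

#define FM_PAGES_PER_BIT	32
#define HEAPBLK_TO_FMBIT(blk)	((blk) / FM_PAGES_PER_BIT)

at the cost of only being able to set a bit once all 32 pages are frozen.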

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-04 Thread Jim Nasby

On 4/3/15 12:59 AM, Sawada Masahiko wrote:

+	case HEAPTUPLE_LIVE:
+	case HEAPTUPLE_RECENTLY_DEAD:
+	case HEAPTUPLE_INSERT_IN_PROGRESS:
+	case HEAPTUPLE_DELETE_IN_PROGRESS:
+		if (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
+									  mxactcutoff, &frozen[nfrozen]))
+			frozen[nfrozen++].offset = offnum;
+		break;


This doesn't seem safe enough to me. Can't there be tuples that are 
still new enough that they can't be frozen, and are still live? I don't 
think it's safe to leave tuples as dead either, even if they're hinted. 
The hint may not be written. Also, the patch seems to be completely 
ignoring actually freezing the toast relation; I can't see how that's 
actually safe.


I'd feel a heck of a lot safer if any time heap_prepare_freeze_tuple 
returned false we did a second check on the tuple to ensure it was truly 
frozen.


Somewhat related... instead of forcing the freeze to happen 
synchronously, can't we set this up so a table is in one of three 
states? Read/Write, Read Only, Frozen. AT_SetReadOnly and 
AT_SetReadWrite would simply change to the appropriate state, and all 
the vacuum infrastructure would continue to process those tables as it 
does today. lazy_vacuum_rel would become responsible for tracking if 
there were any non-frozen tuples if it was also attempting a freeze. If 
it discovered there were none, AND the table was marked as ReadOnly, 
then it would change the table state to Frozen and set relfrozenxid = 
InvalidTransactionId and relminmxid = InvalidMultiXactId. AT_SetReadWrite 
could change relfrozenxid to its own Xid as an optimization. Doing it 
that way leaves all the complicated vacuum code in one place, and would 
eliminate concerns about race conditions with still running 
transactions, etc.


BTW, you also need to put things in place to ensure it's impossible to 
unfreeze a tuple in a relation that's marked ReadOnly or Frozen. I'm not 
sure what the right way to do that would be.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Robert Haas
On Apr 4, 2015, at 4:37 PM, Andrew Dunstan  wrote:
> The attached patch should set float8byval as the default on 64 bit MSVC 
> builds and error out if it is explicitly set on 32 bit platforms.

+1 for changing this.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Tom Lane
Robert Haas  writes:
> On Sat, Apr 4, 2015 at 10:27 AM, Tom Lane  wrote:
>> Having said that, I'm fine with leaving this as-is, if only because
>> it means we're exercising the --disable-float8-byval code paths in
>> the buildfarm ;-)

> But... are we?  I mean, I don't see anything in the Windows code that
> defines USE_FLOAT8_BYVAL.

Precisely --- it *isn't* getting set, even on 64-bit Windows where it
could be.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Peter Geoghegan
On Sat, Apr 4, 2015 at 4:37 PM, Andrew Dunstan  wrote:
> This seems quite wrong. If we want those paths tested we should ensure that
> buildfarm members are set up with that explicit setting.
>
> I think not making this the default for 64 bit MSVC builds was simply an
> error of omission.

+1. I don't feel that having every optimization available on Windows
is worth bending over backwards for, but this omission is kind of
silly.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Andrew Dunstan


On 04/04/2015 04:27 PM, Robert Haas wrote:

On Sat, Apr 4, 2015 at 10:27 AM, Tom Lane  wrote:

I see nothing in the win32 stuff that tries to define USE_FLOAT8_BYVAL
on 64-bit windows, is this just an oversight or does it not actually
work there? or is it for on-disk compatibility with 32-bit windows?

That flag doesn't affect on-disk compatibility.  It could certainly break
third-party extensions, but we accept the same hazard on non-Windows with
equanimity.  I suspect this point simply wasn't revisited when we added
support for 64-bit Windows.

Having said that, I'm fine with leaving this as-is, if only because
it means we're exercising the --disable-float8-byval code paths in
the buildfarm ;-)

But... are we?  I mean, I don't see anything in the Windows code that
defines USE_FLOAT8_BYVAL.

(Admittedly, if we're not, I have no theory for why that patch fixed
anything, but all the same I don't see where it's getting defined.)



See src/tools/msvc/Solution.pm

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Andrew Dunstan


On 04/04/2015 10:27 AM, Tom Lane wrote:

Andrew Gierth  writes:

"Tom" == Tom Lane  writes:
  Tom> ... btw, has anyone noticed that this patch broke hamerkop and
  Tom> bowerbird?  Or at least, it's hard to see what other recent commit
  Tom> would explain the failures they're showing.
Now that Robert committed the fix for 64bit Datum w/o USE_FLOAT8_BYVAL,
bowerbird seems fixed (hamerkop hasn't run yet).
I see nothing in the win32 stuff that tries to define USE_FLOAT8_BYVAL
on 64-bit windows, is this just an oversight or does it not actually
work there? or is it for on-disk compatibility with 32-bit windows?

That flag doesn't affect on-disk compatibility.  It could certainly break
third-party extensions, but we accept the same hazard on non-Windows with
equanimity.  I suspect this point simply wasn't revisited when we added
support for 64-bit Windows.

Having said that, I'm fine with leaving this as-is, if only because
it means we're exercising the --disable-float8-byval code paths in
the buildfarm ;-)





This seems quite wrong. If we want those paths tested we should ensure 
that buildfarm members are set up with that explicit setting.


I think not making this the default for 64 bit MSVC builds was simply an 
error of omission.


The attached patch should set float8byval as the default on 64 bit MSVC 
builds and error out if it is explicitly set on 32 bit platforms.


cheers

andrew
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 714585f..764ba1e 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -25,11 +25,18 @@ sub _new
 		platform   => undef, };
 	bless($self, $classname);
 
+	$self->DeterminePlatform();
+	my $bits = $self->{platform} eq 'Win32' ? 32 : 64;
+
 	# integer_datetimes is now the default
 	$options->{integer_datetimes} = 1
 	  unless exists $options->{integer_datetimes};
 	$options->{float4byval} = 1
 	  unless exists $options->{float4byval};
+	$options->{float8byval} = ($bits == 64)
+	  unless exists $options->{float8byval};
+	die "float8byval not permitted on 32 bit platforms"
+	  if  $options->{float8byval} && $bits == 32;
 	if ($options->{xml})
 	{
 		if (!($options->{xslt} && $options->{iconv}))
@@ -56,8 +63,6 @@ sub _new
 	die "Bad wal_segsize $options->{wal_segsize}"
 	  unless grep { $_ == $options->{wal_segsize} } (1, 2, 4, 8, 16, 32, 64);
 
-	$self->DeterminePlatform();
-
 	return $self;
 }
 
diff --git a/src/tools/msvc/config_default.pl b/src/tools/msvc/config_default.pl
index e4d4810..0bee0c0 100644
--- a/src/tools/msvc/config_default.pl
+++ b/src/tools/msvc/config_default.pl
@@ -6,7 +6,10 @@ our $config = {
 	asserts => 0,# --enable-cassert
 	  # integer_datetimes=>1,   # --enable-integer-datetimes - on is now default
 	  # float4byval=>1, # --disable-float4-byval, on by default
-	  # float8byval=>0, # --disable-float8-byval, off by default
+
+	  # float8byval=> $platformbits == 64, # --disable-float8-byval,
+	  # off by default on 32 bit platforms, on by default on 64 bit platforms
+
 	  # blocksize => 8, # --with-blocksize, 8kB by default
 	  # wal_blocksize => 8, # --with-wal-blocksize, 8kB by default
 	  # wal_segsize => 16,  # --with-wal-segsize, 16MB by default

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] File count restriction of directory limits number of relations inside a database.

2015-04-04 Thread Tomas Vondra
Hi,

On 4.4.2015 21:24, sudalai wrote:
> Hi,
> We have a requirement to have multiple schemas in a database. Each
> schema will have all application tables and hold a set of users'
> data. We have segmented data across different schemas, and we
> dynamically add schemas as users sign up. So the table files created
> across all schemas reside in a single folder under
> data/base/16546/***. We can't find any restriction on the number of
> relations (tables, indexes) in a database in PostgreSQL.

PostgreSQL is using 32-bit object IDs, so technically you can have up to
2^32 objects (tables, indexes). You'll certainly run into other issues
before hitting this limit, though.

> But the number of files in a folder on Linux is restricted by the file
> system we currently use to 32k/64k (ext2/ext4). To overcome this we
> created folders inside the database folder and associated them as
> tablespaces for each schema, but that breaks streaming replication on
> the standby server, which we currently use.

I'm pretty sure ext4 allows more directory entries. For example I just
did this on my ext4 partition, within a new directory:

  $ for i in `seq 1 1000000`; do touch $i.file; done

and guess what, I do have 1M files there:

  $ ls | wc -l
  1000000

So either you're using some old version of ext4, or you've used some
strange parameters when creating the filesystem.

> To overcome this we changed the postgresql-9.4.0 source code to
> create a subdirectory "my_<tablespaceoid>" inside the location
> given in the CREATE TABLESPACE statement and take that location as
> the tablespace location. Now we specify one common location to create
> multiple tablespaces, on both slave and master, because PostgreSQL
> creates each tablespace in a subdirectory (my_<tablespaceoid>).
>
> Since I'm not an expert in PostgreSQL, I can't assure that it will
> not affect other functionality of Postgres. Please help me, I
> changed the method named "create_tablespace_directories" in the file
> "tablespace.c".

This seems extremely foolish, IMNSHO.

Firstly, I'm not convinced there actually is a problem - you haven't
posted any error messages demonstrating the existence of such ext4
limits, you only claimed they exist. I've created a directory on ext4
with 1M files in it, without any problem.

Secondly, even if there is such a problem on your system, it's most likely
due to improper options used when creating the ext4 filesystem. Solving
this at the PostgreSQL level is a poor solution.

And finally - no one is going to confirm that your changes are safe,
without a detailed review of your patch. But that would take a lot of
time, and it seems you're solving an artificial problem, so I'd be
surprised if anyone invests the time into reviewing this.

In other words, you should probably fix the filesystem configuration,
not PostgreSQL code.

-- 
Tomas Vondra  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Robert Haas
On Sat, Apr 4, 2015 at 10:27 AM, Tom Lane  wrote:
>> I see nothing in the win32 stuff that tries to define USE_FLOAT8_BYVAL
>> on 64-bit windows, is this just an oversight or does it not actually
>> work there? or is it for on-disk compatibility with 32-bit windows?
>
> That flag doesn't affect on-disk compatibility.  It could certainly break
> third-party extensions, but we accept the same hazard on non-Windows with
> equanimity.  I suspect this point simply wasn't revisited when we added
> support for 64-bit Windows.
>
> Having said that, I'm fine with leaving this as-is, if only because
> it means we're exercising the --disable-float8-byval code paths in
> the buildfarm ;-)

But... are we?  I mean, I don't see anything in the Windows code that
defines USE_FLOAT8_BYVAL.

(Admittedly, if we're not, I have no theory for why that patch fixed
anything, but all the same I don't see where it's getting defined.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] File count restriction of directory limits number of relations inside a database.

2015-04-04 Thread sudalai
Hi,
 We have a requirement to have multiple schemas in a database. Each
schema will have all application tables and hold a set of users' data. We
have segmented data across different schemas, and we dynamically add
schemas as users sign up. So the table files created across all schemas
reside in a single folder under data/base/16546/***. We can't find any
restriction on the number of relations (tables, indexes) in a database in
PostgreSQL.
But the number of files in a folder on Linux is restricted by the file
system we currently use to 32k/64k (ext2/ext4). To overcome this we created
folders inside the database folder and associated them as tablespaces for
each schema, but that breaks streaming replication on the standby server,
which we currently use.
To overcome this we changed the postgresql-9.4.0 source code to create a
subdirectory "my_<tablespaceoid>" inside the location given in the CREATE
TABLESPACE statement and take that location as the tablespace location. Now
we specify one common location to create multiple tablespaces, on both slave
and master, because PostgreSQL creates each tablespace in a subdirectory
(my_<tablespaceoid>).
Since I'm not an expert in PostgreSQL, I can't assure that it will not
affect other functionality of Postgres. Please help me; I changed the method
named "create_tablespace_directories" in the file "tablespace.c".

Here's the modified method.

static void
create_tablespace_directories(const char *location, const Oid tablespaceoid)
{
	char	   *linkloc;
	char	   *location_with_version_dir;
	char	   *newlocation;
	struct stat st;

	linkloc = psprintf("pg_tblspc/%u", tablespaceoid);
	/* the per-tablespace subdirectory under the common location */
	newlocation = psprintf("%s/my_%d", location, tablespaceoid);
	location_with_version_dir = psprintf("%s/%s", newlocation,
										 TABLESPACE_VERSION_DIRECTORY);

	/*
	 * Attempt to coerce target directory to safe permissions.  If this
	 * fails, it doesn't exist or has the wrong owner.
	 */
	if (chmod(location, S_IRWXU) != 0)
	{
		if (errno == ENOENT)
			ereport(ERROR,
					(errcode(ERRCODE_UNDEFINED_FILE),
					 errmsg("directory \"%s\" does not exist", location),
					 InRecovery ? errhint("Create this directory for the tablespace before "
										  "restarting the server.") : 0));
		else
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not set permissions on directory \"%s\": %m",
							location)));
	}

	ereport(WARNING, (errmsg("Trying to create : \"%s\"", newlocation)));

	if (mkdir(newlocation, S_IRWXU) < 0)
	{
		if (errno == EEXIST)
			ereport(ERROR,
					(errcode(ERRCODE_OBJECT_IN_USE),
					 errmsg("directory \"%s\" already in use as a tablespace",
							newlocation)));
		else
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not create directory \"%s\": %m",
							newlocation)));
	}

	if (InRecovery)
	{
		/*
		 * Our theory for replaying a CREATE is to forcibly drop the target
		 * subdirectory if present, and then recreate it.  This may be more
		 * work than needed, but it is simple to implement.
		 */
		if (stat(location_with_version_dir, &st) == 0 && S_ISDIR(st.st_mode))
		{
			if (!rmtree(location_with_version_dir, true))
				/* If this failed, mkdir() below is going to error. */
				ereport(WARNING,
						(errmsg("some useless files may be left behind in old database directory \"%s\"",
								location_with_version_dir)));
		}
	}

	/*
	 * The creation of the version directory prevents more than one
	 * tablespace in a single location.
	 */
	if (mkdir(location_with_version_dir, S_IRWXU) < 0)
	{
		if (errno == EEXIST)
			ereport(ERROR,
					(errcode(ERRCODE_OBJECT_IN_USE),
					 errmsg("directory \"%s\" already in use as a tablespace",
							location_with_version_dir)));
		else
			ereport(ERROR,
 

Re: [HACKERS] Compile warnings on OSX 10.10 clang 6.0

2015-04-04 Thread Tom Lane
Michael Paquier  writes:
> On Sat, Apr 4, 2015 at 6:21 AM, Tom Lane  wrote:
>> It occurred to me that maybe we could just turn off this class of warning,
>> and after some experimentation I found out that
>> "-Wno-unused-command-line-argument" does that, at least in the version
>> of clang that Apple's currently shipping.
>> 
>> Who's for enabling that if the compiler takes it?

> Yes, please. I always found those pthread warnings annoying.

After a bit more experimentation I found out that for both gcc and clang
(at least in the versions I'm using, on RHEL6 and Yosemite), you can
write "-Wno-anythingatall" and the compiler will not complain about it.
(And how did *that* get by the bozo who put in this warning, I wonder.)
So that means that if we just add the obvious test

  PGAC_PROG_CC_CFLAGS_OPT([-Wno-unused-command-line-argument])

then we will end up including that in CFLAGS on pretty much every
platform, whether or not there's an actual problem to solve.

gcc *does* complain about -Wunused-command-line-argument, so a possible
answer is to test for that and then add the other to CFLAGS.  That seems
kinda grotty, though; does anyone have another way?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-04-04 Thread Petr Jelinek

On 04/04/15 14:57, Amit Kapila wrote:


1.
+tablesample_clause:
+TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause


Why do you want to allow func_arg_list?

Basically if the user tries to pass multiple arguments in the
TABLESAMPLE method's clause, like (10,20), then I think
that should be caught at the grammar level as an error.


It will be reported as an error during parse analysis if the TABLESAMPLE 
method does not accept two parameters, the same as when an expression 
uses the wrong type, for example.




The SQL:2003 spec says that the argument to REPEATABLE and the TABLESAMPLE
method is the same 

It seems to me that you want to allow it to make it extensible
to user-defined TABLESAMPLE methods, but I'm not sure it is
right to use func_arg_list for this, because sample method
arguments can have a different definition than function arguments.
And even if we want to keep sample method arguments the same as
function arguments, that sounds like a separate discussion.



Yes, I want extensibility here. And I think the tablesample method 
arguments are the same thing as function arguments, given that in the end 
they are arguments for the init function of the tablesample method.


I would be OK with just expr_list; naming parameters here isn't overly 
important, but the ability to have different types and numbers of 
parameters for custom TABLESAMPLE methods *is* important.



In general, I feel you already have good basic infrastructure for
supporting other sample methods, but it is better to keep the new
DDLs for doing this as a separate patch, as that
way we can reduce the scope of this patch. OTOH, if you or others
feel that it is mandatory to have DDL support for other
tablesample methods, then we have to deal with this now.


Well, I did attach it as a separate diff mainly for that reason. I agree 
that the DDL does not have to be committed immediately with the rest of 
the patch (although it's the simplest part of the patch IMHO).




2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id
= 2;
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...

postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);

Isn't the TABLESAMPLE clause supposed to work with UPDATE/DELETE
statements?



It's supported in the FROM part of UPDATE and the USING part of DELETE. I 
think that's sufficient.


The standard is somewhat useless for UPDATE and DELETE, as it only defines 
quite limited syntax there. From what I've seen while doing research, 
MSSQL also only supports it in their equivalent of the FROM/USING list, 
Oracle does not seem to support their SAMPLE clause outside of SELECTs 
at all, and if I read the cryptic DB2 manual correctly, they don't 
support it outside of (sub)SELECTs either.



4.
SampleTupleVisible()
{
..
	else
	{
		/* No pagemode, we have to check the tuple itself. */
		Snapshot	snapshot = scan->rs_snapshot;
		Buffer		buffer = scan->rs_cbuf;

		bool		visible =
			HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
..
}

I think it is better to check PageIsAllVisible() in the above
code before the visibility check, as that can avoid visibility checks.


It's probably even better to do that one level up in samplenexttup() 
and only call SampleTupleVisible() if the page is not all-visible 
(PageIsAllVisible() is cheap).
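
Something like this in the sampling scan's tuple loop (a sketch only -
the SampleTupleVisible() argument list here is approximate, not the
patch's exact signature):

	/* an all-visible page means every tuple on it is visible */
	if (!PageIsAllVisible(page) &&
		!SampleTupleVisible(tuple, offnum, scan))
		continue;		/* tuple not visible, try the next candidate */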




6.
extern TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
List *sampleargs)

It is better to expose the function (usage of extern) via the header file.
Is there a need to mention extern here?



Eh, stupid copy-paste error when copying function name from header to 
actual source file.



7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}

What is the reason for coercing the value of the REPEATABLE clause to
INT4OID when a numeric value is expected for the clause?  If the user
gives an input value of -2.3, it will generate a seed, which doesn't
seem right.



Because REPEATABLE is a numeric expression, it can produce any number, 
but we need int4 internally (well, float4 would also work, the code would 
just be slightly uglier). And we do this type of coercion even for 
table data (you can insert -2.3 into an integer column and it will work), 
so I don't see what's wrong with it here.




8.
+DATA(insert OID = 3295 (  tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 (  tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));

The data type for the second argument is kept as 700 (float4); shouldn't
it be 1700 (numeric)?



Why is that? Given the sampling error, I think float4 is enough for 
specifying the percentage, and it makes the calculat

Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> I see nothing in the win32 stuff that tries to define
 >> USE_FLOAT8_BYVAL on 64-bit windows, is this just an oversight or
 >> does it not actually work there? or is it for on-disk compatibility
 >> with 32-bit windows?

 Tom> That flag doesn't affect on-disk compatibility.

pg_type.typbyval ?

But I don't know if anything else differs between 32-bit and 64-bit
Windows that would impact compatibility.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> ... btw, has anyone noticed that this patch broke hamerkop and
>  Tom> bowerbird?  Or at least, it's hard to see what other recent commit
>  Tom> would explain the failures they're showing.

> Now that Robert committed the fix for 64bit Datum w/o USE_FLOAT8_BYVAL,
> bowerbird seems fixed (hamerkop hasn't run yet).

> I see nothing in the win32 stuff that tries to define USE_FLOAT8_BYVAL
> on 64-bit windows, is this just an oversight or does it not actually
> work there? or is it for on-disk compatibility with 32-bit windows?

That flag doesn't affect on-disk compatibility.  It could certainly break
third-party extensions, but we accept the same hazard on non-Windows with
equanimity.  I suspect this point simply wasn't revisited when we added
support for 64-bit Windows.
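
(For context, the flag gates things like this in postgres.h:

#ifdef USE_FLOAT8_BYVAL
#define Float8GetDatumFast(X)	Float8GetDatum(X)
#define Int64GetDatumFast(X)	Int64GetDatum(X)
#else
#define Float8GetDatumFast(X)	PointerGetDatum(&(X))
#define Int64GetDatumFast(X)	PointerGetDatum(&(X))
#endif

i.e. with the flag set, a float8 travels inside the 8-byte Datum itself;
without it, it is passed by reference.)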

Having said that, I'm fine with leaving this as-is, if only because
it means we're exercising the --disable-float8-byval code paths in
the buildfarm ;-)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-04-04 Thread Pavel Stehule
2015-04-04 15:29 GMT+02:00 Haribabu Kommi :

> On Sat, Apr 4, 2015 at 4:19 PM, Pavel Stehule 
> wrote:
> > Hi
> >
> > 2015-03-31 14:38 GMT+02:00 Haribabu Kommi :
> >>
> >> keyword_databases - The database name can be "all", "replication",
> >> sameuser", "samerole" and "samegroup".
> >> keyword_roles - The role can be "all" and a group name prefixed with
> "+".
> >
> >
> > I am not very happy about name - but I have no better idea :( - maybe
> > "database_mask", "user_mask"
> >
>
> How about database_keywords and user_keywords as column names?
>

I think it is better than keyword_databases - it is more logical.
But maybe it is a question for mathematicians.

Some other variants: "database_filters" and "user_filters", or
"database_predicates" and "user_predicates" -

but those are not terribly nice either.

Regards

Pavel


>
> >>
> >>
> >> The rest of the database and role names are treated as normal database
> >> and role names.
> >>
> >> 2. Added the code changes to identify the names with quoted.
> >
> >
> > Is it a good idea? Database names are not quoted anywhere else (in the
> > system catalog). So now we cannot use a join against this view.
> >
> > postgres=# select (databases)[1] from pg_hba_conf ;
> >  databases
> > ---
> >  "omega 2"
> >
> > (4 rows)
> >
> > postgres=# select datname from pg_database ;
> >   datname
> > ---
> >  template1
> >  template0
> >  postgres
> >  omega 2
> > (4 rows)
> >
> > I dislike this - we know that the name must be quoted in the file
> > (without it, the file would be incorrect). And if you need quotes, there
> > is the function quote_ident. If we used quotes everywhere else, then it
> > would be OK, but not now.
> > Please, remove it. Moreover, it is not necessary when you use the
> > "keyword" columns.
>
> Thanks. The special handling of quotes for pg_hba_conf is not required.
> I will keep the behaviour similar to the other catalog tables.
> I will remove the quote support in the next version.
>
> Regards,
> Hari Babu
> Fujitsu Australia
>


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-04-04 Thread Haribabu Kommi
On Sat, Apr 4, 2015 at 4:19 PM, Pavel Stehule  wrote:
> Hi
>
> 2015-03-31 14:38 GMT+02:00 Haribabu Kommi :
>>
>> keyword_databases - The database name can be "all", "replication",
>> sameuser", "samerole" and "samegroup".
>> keyword_roles - The role can be "all" and a group name prefixed with "+".
>
>
> I am not very happy about name - but I have no better idea :( - maybe
> "database_mask", "user_mask"
>

How about database_keywords and user_keywords as column names?

>>
>>
>> The rest of the database and role names are treated as normal database
>> and role names.
>>
>> 2. Added the code changes to identify the names with quoted.
>
>
> Is it a good idea? Database names are not quoted anywhere else (in the
> system catalog). So now we cannot use a join against this view.
>
> postgres=# select (databases)[1] from pg_hba_conf ;
>  databases
> ---
>  "omega 2"
>
> (4 rows)
>
> postgres=# select datname from pg_database ;
>   datname
> ---
>  template1
>  template0
>  postgres
>  omega 2
> (4 rows)
>
> I dislike this - we know that the name must be quoted in the file (without
> it, the file would be incorrect). And if you need quotes, there is the
> function quote_ident. If we used quotes everywhere else, then it would be
> OK, but not now. Please, remove it. Moreover, it is not necessary when you
> use the "keyword" columns.

Thanks. The special handling of quotes for pg_hba_conf is not required.
I will keep the behaviour similar to the other catalog tables.
I will remove the quote support in the next version.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-04 Thread Sawada Masahiko
On Wed, Mar 11, 2015 at 11:46 PM, Sawada Masahiko  wrote:
> On Tue, Mar 10, 2015 at 12:08 PM, Stephen Frost  wrote:
>> Sawada,
>>
>> * Sawada Masahiko (sawada.m...@gmail.com) wrote:
>>> Thank you for your review!
>>> Attached file is the latest version (without document patch. I making it 
>>> now.)
>>> As per discussion, there is no change regarding of super user permission.
>>
>> Ok.  Here's another review.
>>
>
> Thank you for your review!
>
>>> diff --git a/src/backend/catalog/system_views.sql 
>>> b/src/backend/catalog/system_views.sql
>>> index 5e69e2b..94ee229 100644
>>> --- a/src/backend/catalog/system_views.sql
>>> +++ b/src/backend/catalog/system_views.sql
>>> @@ -414,6 +414,11 @@ CREATE RULE pg_settings_n AS
>>>
>>>  GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;
>>>
>>> +CREATE VIEW pg_file_settings AS
>>> +   SELECT * FROM pg_show_all_file_settings() AS A;
>>> +
>>> +REVOKE ALL on pg_file_settings FROM public;
>>> +
>>
>> This suffers from a lack of pg_dump support, presuming that the
>> superuser is entitled to change the permissions on this function.  As it
>> turns out though, you're in luck in that regard since I'm working on
>> fixing that for a mostly unrelated patch.  Any feedback you have on what
>> I'm doing to pg_dump here:
>>
>> http://www.postgresql.org/message-id/20150307213935.go29...@tamriel.snowman.net
>>
>> Would certainly be appreciated.
>>
>
> Thank you for the info.
> I will read the discussion and send the feedbacks.
>
>>> diff --git a/src/backend/utils/misc/guc-file.l 
>>> b/src/backend/utils/misc/guc-file.l
>> [...]
>>> +  * Calculate size of guc_array and allocate it. From the second time
>>> +  * we allocate memory, we should free the previously allocated memory.
>>> +  */
>>> + guc_array_size = num_guc_file_variables * sizeof(struct 
>>> ConfigFileVariable);
>>> + if (!guc_file_variables)
>>> + {
>>> + /* For the first call */
>>> + guc_file_variables = (ConfigFileVariable *) guc_malloc(FATAL, 
>>> guc_array_size);
>>> + }
>>> + else
>>> + {
>>> + guc_array = guc_file_variables;
>>> + for (item = head; item; item = item->next, guc_array++)
>>> + {
>>> + free(guc_array->name);
>>> + free(guc_array->value);
>>> + free(guc_array->filename);
>>
>> It's a minor nit-pick, as the below loop should handle anything we care
>> about, but I'd go ahead and reset guc_array->sourceline to be -1 or
>> something too, just to show that everything's being handled here with
>> regard to cleanup.  Or perhaps just add a comment saying that the
>> sourceline for all currently valid entries will be updated.
>
> Fixed.
>
>>
>>> + guc_file_variables = (ConfigFileVariable *) 
>>> guc_realloc(FATAL, guc_file_variables, guc_array_size);
>>> + }
>>
>> Nasby made a comment, I believe, that we might actually be able to use
>> memory contexts here, which would certainly be much nicer as then you'd
>> be able to just throw away the whole context and create a new one.  Have
>> you looked at that approach at all?  Offhand, I'm not sure if it'd work
>> or not (I have my doubts..) but it seems worthwhile to consider.
>>
>
> I tried using palloc() and pfree() to allocate and free guc_file_variables,
> but these variables seem to be freed already by the time I get the value of
> the pg_file_settings view via SQL.
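
FWIW, a long-lived context along these lines might work better than bare
palloc(), which is freed with the per-query context (context name
invented here, just a sketch):

	static MemoryContext file_settings_cxt = NULL;

	if (file_settings_cxt == NULL)
		file_settings_cxt = AllocSetContextCreate(TopMemoryContext,
												  "file settings",
												  ALLOCSET_DEFAULT_MINSIZE,
												  ALLOCSET_DEFAULT_INITSIZE,
												  ALLOCSET_DEFAULT_MAXSIZE);
	else
		MemoryContextReset(file_settings_cxt);	/* drop the old entries */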
>
>> Otherwise, it seems like this would address my previous concerns that
>> this would end up leaking memory, and even if it's a bit slower than one
>> might hope, it's not an operation we should be doing very frequently.
>>
>>> --- a/src/backend/utils/misc/guc.c
>>> +++ b/src/backend/utils/misc/guc.c
>> [...]
>>>  /*
>>> + * Return the total number of GUC File variables
>>> + */
>>> +int
>>> +GetNumConfigFileOptions(void)
>>> +{
>>> + return num_guc_file_variables;
>>> +}
>>
>> What's the point of this function..?  I'm not convinced it's the best
>> idea, but it certainly looks like the function and the variable are only
>> used from this file anyway?
>
> It's unnecessary.
> Fixed.
>
>>> + if (call_cntr < max_calls)
>>> + {
>>> + char   *values[NUM_PG_FILE_SETTINGS_ATTS];
>>> + HeapTuple   tuple;
>>> + Datum   result;
>>> + ConfigFileVariable conf;
>>> + char        buffer[256];
>>
>> Isn't that buffer a bit.. excessive in size?
>
> Fixed.
>
>>
>>> diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
>>> index d3100d1..ebb96cc 100644
>>> --- a/src/include/utils/guc.h
>>> +++ b/src/include/utils/guc.h
>>> @@ -133,6 +133,14 @@ typedef struct ConfigVariable
>>>   struct ConfigVariable *next;
>>>  } ConfigVariable;
>>>
>>> +typedef struct ConfigFileVariable
>>> +{
>>> + char    *name;
>>> + char    *value;
>>> + char    *filename;
>>> + int     sourceline;
>>> +} ConfigFileVariable;
>>> +
>> [...]
>>> +extern int 

Re: [HACKERS] TABLESAMPLE patch

2015-04-04 Thread Amit Kapila
On Fri, Apr 3, 2015 at 3:06 AM, Petr Jelinek  wrote:
>
> Hi,
>
> so here is version 11.
>

Thanks, patch looks much better, but I think still few more
things needs to discussed/fixed.

1.
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause


Why do you want to allow func_arg_list?

Basically if the user tries to pass multiple arguments in the
TABLESAMPLE method's clause, like (10,20), then I think
that should be caught at the grammar level as an error.

The SQL:2003 spec says that the argument to REPEATABLE and the TABLESAMPLE
method is the same 

It seems to me that you want to allow it to make it extensible
to user-defined TABLESAMPLE methods, but I'm not sure it is
right to use func_arg_list for this, because sample method
arguments can have a different definition than function arguments.
And even if we want to keep sample method arguments the same as
function arguments, that sounds like a separate discussion.

In general, I feel you already have good basic infrastructure for
supporting other sample methods, but it is better to keep the new
DDLs for doing this as a separate patch, as that
way we can reduce the scope of this patch. OTOH, if you or others
feel that it is mandatory to have DDL support for other
tablesample methods, then we have to deal with this now.

2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id =
2;
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...

postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);

Isn't the TABLESAMPLE clause supposed to work with UPDATE/DELETE
statements?


3.
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
..
+ parser_errposition(pstate,
+ exprLocation((Node *) r)));

Better to align exprLocation() with pstate

4.
SampleTupleVisible()
{
..
	else
	{
		/* No pagemode, we have to check the tuple itself. */
		Snapshot	snapshot = scan->rs_snapshot;
		Buffer		buffer = scan->rs_cbuf;

		bool		visible =
			HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
..
}

I think it is better to check PageIsAllVisible() in the above
code before the visibility check, as that can avoid visibility checks.

5.
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
				 List *sampleargs)
{
..
	if (con->val.type == T_Null)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("REPEATABLE clause must be NOT NULL numeric value"),
				 parser_errposition(pstate, con->location)));
..
}

InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
{
..
	if (fcinfo.argnull[1])
		ereport(ERROR,
				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
				 errmsg("REPEATABLE clause cannot be NULL")));
..
}

I think it would be better if we had the same error message
and probably the same error code in both of the above cases.

6.
extern TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
 List *sampleargs)

It is better to expose the function (usage of extern) via the header file.
Is there a need to mention extern here?

7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}

What is the reason for coercing the value of the REPEATABLE clause to INT4OID
when a numeric value is expected for the clause?  If the user gives the
input value as -2.3, it will generate a seed, which doesn't seem
right.


8.
+DATA(insert OID = 3295 (  tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 (  tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));

The data type for the second argument is kept as 700 (float4); shouldn't
it be 1700 (numeric)?

9.
postgres=# explain SELECT t1.id FROM test_tablesample as t1 TABLESAMPLE
SYSTEM (
10), test_tablesample as t2 TABLESAMPLE BERNOULLI (20);
 QUERY PLAN

 Nested Loop  (cost=0.00..4.05 rows=100 width=4)
   ->  Sample Scan on test_tablesample t1  (cost=0.00..0.01 rows=1 width=4)
   ->  Sample Scan on test_tablesample t2  (cost=0.00..4.02 rows=2 width=0)
(3 rows)

Isn't it better to display the sample method name in EXPLAIN?
I think it can help in the case of join queries.
We can use below format to display:
Sample Scan (System) on test_tablesample ...

10. Bernoulli.c

+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber startblock; /* starting block, we use ths for syncscan
support */

typo.
/ths/this


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-04-04 Thread David Rowley
So I've just finished reading the impressive 244 emails (so far) about
Parallel Seq Scan, and I've had a quick skim over the latest patch.

It's quite exciting to think that one day we'll have parallel query in
PostgreSQL, but I have to say that I think there's a major point
about the proposed implementation that seems to have been forgotten,
and I can't help but think this won't get far off the ground
unless more thought goes into it.

On 11 February 2015 at 09:56, Andres Freund  wrote:

> I think we're getting to the point where having a unique mapping from
> the plan to the execution tree is proving to be rather limiting
> anyway. Check for example discussion about join removal. But even for
> current code, showing only the custom plans for the first five EXPLAIN
> EXECUTEs is pretty nasty (Try explain that to somebody that doesn't know
> pg internals. Their looks are worth gold and can kill you at the same
> time) and should be done differently.
>
>
Going over the previous emails in this thread, I see that it has been a long
time since anyone discussed how we might decide at planning
time how many workers should be used for the query. From the emails, I
don't recall anyone proposing a good idea about how this might be done, and
I for one can't see how this is at all possible to do at planning time.

I think that the planner should know nothing of parallel query at all, and
the planner quite possibly should go completely unmodified for this patch.
One major problem I can see is that, given a query such as:

SELECT * FROM million_row_product_table WHERE category = 'ELECTRONICS';

Where we have a non-unique index on category, some plans which may be
considered might be:

1. Index scan on the category index to get all rows matching 'ELECTRONICS'
2. Sequential scan on the table, filtering matching rows.
3. Parallel plan which performs a series of partial sequential scans,
pulling out all matching rows.

I really think that if we end up choosing things like plan 3, when plan 2 was
thrown out because of its cost, then we'll end up consuming more CPU and
I/O than we can possibly justify using. The environmentalist in me screams
that this is wrong. What if we kicked off 128 worker processes on some
high-end hardware to do this? I certainly wouldn't want to pay the power
bill. I understand there's costing built in to perhaps stop this, but I
still think it's wrong-headed, and we need to still choose the fastest
non-parallel plan and only consider parallelising that later.

Instead what I think should happen is:

The following link has been seen before on this thread, but I'll post it
again:
http://docs.oracle.com/cd/A57673_01/DOC/server/doc/A48506/pqoconce.htm

There's one key sentence in there that should not be ignored:

"It is important to note that the query is parallelized dynamically at
execution time."

"dynamically at execution time"... I think this also needs to happen in
PostgreSQL. If we attempt to do this parallel stuff at plan time, and we
happen to plan at some quiet period, or perhaps worse, some application's
start-up process happens to PREPARE a load of queries when the database is
nice and quite, then quite possibly we'll end up with some highly parallel
queries. Then perhaps come the time these queries are actually executed the
server is very busy... Things will fall apart quite quickly due to the
masses of IPC and context switches that would be going on.

I completely understand that this parallel query stuff is all quite new to
us all and we're likely still trying to nail down the correct
infrastructure for it to work well. This is why I'm proposing that the
planner should know nothing of parallel query; instead I think it should
work more along the lines of:

* Planner should be completely oblivious to what parallel query is.
* Before executor startup the plan is passed to a function which decides if
we should parallelise it, and does so if the plan meets the correct
requirements. This should likely have a very fast exit path such as:

if (plan->total_cost < parallel_query_cost_threshold)
	return;		/* the query is not expensive enough to attempt to make
				   parallel */

The above check would give us almost zero overhead for small, low-cost
queries.

This function would likely also have some sort of logic in order to
determine if the server has enough spare resource at the current point in
time to allow queries to be parallelised (Likely this is not too important
to nail this down for a first implementation).

* The plan should then be completely traversed node by node to determine
which nodes can be made parallel. This would likely require an interface
function to each node which returns true or false, depending on if it's
safe to parallelise. For seq scan this could be a simple test to see if
we're scanning a temp table.

* Before any changes are made to the plan, a complete copy of it should be
made.

* Funnel nodes could then be injected below the last node in 

Re: [HACKERS] Abbreviated keys for Numeric

2015-04-04 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> ... btw, has anyone noticed that this patch broke hamerkop and
 Tom> bowerbird?  Or at least, it's hard to see what other recent commit
 Tom> would explain the failures they're showing.

Now that Robert committed the fix for 64bit Datum w/o USE_FLOAT8_BYVAL,
bowerbird seems fixed (hamerkop hasn't run yet).

I see nothing in the win32 stuff that tries to define USE_FLOAT8_BYVAL
on 64-bit windows, is this just an oversight or does it not actually
work there? or is it for on-disk compatibility with 32-bit windows?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers