Re: [HACKERS] Tracking wait event for latches

2016-09-25 Thread Thomas Munro
On Sat, Sep 24, 2016 at 1:44 AM, Michael Paquier
 wrote:
> After digesting all the comments given, I have produced the patch
> attached that uses the following categories:
> +const char *const WaitEventNames[] = {
> +   /* activity */
> +   "ArchiverMain",
> +   "AutoVacuumMain",
> +   "BgWriterHibernate",
> +   "BgWriterMain",
> +   "CheckpointerMain",
> +   "PgStatMain",
> +   "RecoveryWalAll",
> +   "RecoveryWalStream",
> +   "SysLoggerMain",
> +   "WalReceiverMain",
> +   "WalSenderMain",
> +   "WalWriterMain",
> +   /* client */
> +   "SecureRead",
> +   "SecureWrite",
> +   "SSLOpenServer",
> +   "WalReceiverWaitStart",
> +   "WalSenderWaitForWAL",
> +   "WalSenderWriteData",
> +   /* Extension */
> +   "Extension",
> +   /* IPC */
> +   "BgWorkerShutdown",
> +   "BgWorkerStartup",
> +   "ExecuteGather",
> +   "MessageQueueInternal",
> +   "MessageQueuePutMessage",
> +   "MessageQueueReceive",
> +   "MessageQueueSend",
> +   "ParallelFinish",
> +   "ProcSignal",
> +   "ProcSleep",
> +   "SyncRep",
> +   /* timeout */
> +   "BaseBackupThrottle",
> +   "PgSleep",
> +   "RecoveryApplyDelay",
> +};
> I have moved WalSenderMain as it tracks a main loop, so it was strange
> to not have it in Activity. I also kept SecureRead and SecureWrite
> because this is referring to the functions of the same name. For the
> rest, I got convinced by what has been discussed and it makes things
> clearer. My head is not spinning anymore when I try to keep in sync
> the list of names and its associated enum, which is good. I have as
> well rewritten the docs to follow that.

-extern int WaitEventSetWait(WaitEventSet *set, long timeout,
WaitEvent *occurred_events, int nevents);
-extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitEventSetWait(WaitEventSet *set, long timeout,
+  WaitEvent *occurred_events, int nevents,
+  uint8 classId, uint16 eventId);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout,
+  uint8 classId, uint16 eventId);
 extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents,
-  pgsocket sock, long timeout);
+  pgsocket sock, long timeout, uint8 classId,
+  uint16 eventId);

+ WaitLatch(MyLatch, WL_LATCH_SET, 0, WAIT_IPC, WE_MQ_RECEIVE);

If the class really is strictly implied by the WaitEventIdentifier,
then do we really need to supply it everywhere when calling the
various wait functions?  That's going to be quite a few functions:
WaitLatch, WaitLatchOrSocket, WaitEventSetWait for now, and if some
more patches out there have legs then also ConditionVariableWait,
BarrierWait, and possibly further kinds of wait points.  And then all
their call sites will have to supply the wait class ID, even
though it is implied by the other ID.  Perhaps that array that
currently holds the names should instead hold { classId, displayName }
so we could just look it up?
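For illustration, a minimal sketch of that lookup-table idea; the struct
and the WAIT_* class constants here are hypothetical, not from the patch:

typedef struct WaitEventEntry
{
    uint8       classId;        /* e.g. WAIT_ACTIVITY, WAIT_CLIENT, WAIT_IPC */
    const char *displayName;    /* name shown in pg_stat_activity */
} WaitEventEntry;

static const WaitEventEntry WaitEventEntries[] = {
    /* activity */
    {WAIT_ACTIVITY, "ArchiverMain"},
    {WAIT_ACTIVITY, "WalWriterMain"},
    /* client */
    {WAIT_CLIENT, "SecureRead"},
    /* IPC */
    {WAIT_IPC, "MessageQueueReceive"},
    /* ... remaining events elided ... */
};

/* Callers would then pass only the event ID; the class is implied: */
/* WaitLatch(MyLatch, WL_LATCH_SET, 0, WE_MQ_RECEIVE); */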

-- 
Thomas Munro
http://www.enterprisedb.com




[HACKERS] "Re: Question about grant create on database and pg_dump/pg_dumpall

2016-09-25 Thread Rafia Sabih
On Tue, Jul 5, 2016 at 06:39 AM, Haribabu Kommi
kommi(dot)haribabu(at)gmail(dot)com wrote:

Still I feel the GRANT statements should be present, as the CREATE DATABASE
statement is generated only with the -C option. So the attached patch
produces the GRANT statements based on the -x option.


The attached patch does the job fine. However, I am a little skeptical
about this addition, since the documentation of pg_dump clearly states that
it does not restore global objects, so why expect this? This addition makes
pg_dump -C somewhat special, as it now restores these GRANT statements. The
patch is helpful only if we consider the popular dump-restore method
mentioned in the thread
(https://www.postgresql.org/message-id/E1VYMqi-0001P4-P4%40wrigleys.postgresql.org),
namely pg_dumpall -g followed by individual pg_dump runs.

Regards,
Rafia Sabih
EnterpriseDB:: http://www.enterprisedb.com


Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2016-09-25 Thread Thomas Munro
On Mon, Sep 26, 2016 at 5:11 PM, Thomas Munro  wrote:

> It seems that the version of docbook that you get if you follow the
> instructions[1] ...
>

And I mean these instructions: [1]
https://www.postgresql.org/docs/devel/static/docguide-toolsets.html

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2016-09-25 Thread Thomas Munro
On Mon, Sep 26, 2016 at 1:18 PM, Thomas Munro  wrote:

> On Mon, Sep 19, 2016 at 4:02 PM, David Fetter  wrote:
> >
> > [training_wheels_004.patch]
>
> openjade:filelist.sgml:144:16:E: character "_" invalid: only parameter
> literal, "CDATA", "ENDTAG", "MD", "MS", "PI", "PUBLIC", "SDATA",
> "STARTTAG", "SYSTEM" and parameter separators allowed
> openjade:contrib.sgml:138:2:W: cannot generate system identifier for
> general entity "require"
>
> The documentation doesn't build here, I think because require_where is
> not an acceptable entity name.  It works for me if I change the
> underscore to a minus in various places.


It seems that the version of docbook that you get if you follow the
instructions[1] on CentOS is OK with the underscore in entity names, but
the version you get if you follow the instructions for macOS + MacPorts
doesn't like it.  I didn't investigate exactly which component or version
was behind that, but it's clear that other entity names use hyphens instead
of underscores, so I would recommend making this change to your patch so we
don't change the version requirements for building the docs:

diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 7ca62a4..48ca717 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -135,7 +135,7 @@ CREATE EXTENSION module_name FROM unpackaged;
- &require_where;
+ &require-where;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 8079cbd..4552273 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -141,7 +141,7 @@
-<!ENTITY require_where SYSTEM "require_where.sgml">
+<!ENTITY require-where SYSTEM "require_where.sgml">

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2016-09-25 Thread Rushabh Lathia
On Mon, Sep 26, 2016 at 5:48 AM, Thomas Munro  wrote:

> On Mon, Sep 19, 2016 at 4:02 PM, David Fetter  wrote:
> >
> > [training_wheels_004.patch]
>
> openjade:filelist.sgml:144:16:E: character "_" invalid: only parameter
> literal, "CDATA", "ENDTAG", "MD", "MS", "PI", "PUBLIC", "SDATA",
> "STARTTAG", "SYSTEM" and parameter separators allowed
> openjade:contrib.sgml:138:2:W: cannot generate system identifier for
> general entity "require"
>
> The documentation doesn't build here, I think because require_where is
> not an acceptable entity name.  It works for me if I change the
> underscore to a minus in various places.  That fixes these errors:
>
> + 
> +  Here is an example showing how to set up a database cluster with
> +  require_where.
> +
> +$ psql -U postgres
> +# SHOW shared_preload_libraries; /* Make sure not to clobber
> something by accident */
> +
> +If you found something,
> +# ALTER SYSTEM SET
> shared_preload_libraries='the,stuff,you,found,require_where';
> +
> +Otherwise,
> +# ALTER SYSTEM SET shared_preload_libraries='require_where';
> +
> +Then restart PostgreSQL
> +
> + 
>
> Could use a full stop (period) on the end of that sentence.  Also it
> shouldn't be inside the "screen" tags.  Maybe "If you found
> something," and "Otherwise," shouldn't be either, or should somehow be
> marked up so as not to appear to be text from the session.
>
> postgres=# delete from foo;
> ERROR:  DELETE requires a WHERE clause
> HINT:  To delete all rows, use "WHERE true" or similar.
>
> Maybe one of those messages could use some indication of where this is
> coming from, for surprised users encountering this non-standard
> behaviour for the first time?
>
>
+1.

I think the hint message should be clearer about where it's coming from.
Maybe it could specify the GUC name, or suggest that if you really want to
use DELETE without a WHERE clause, you can turn this GUC off, or something
along those lines.
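Something along these lines, say (a sketch; the GUC name
require_where.delete is only illustrative):

ereport(ERROR,
        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
         errmsg("DELETE requires a WHERE clause when \"require_where.delete\" is set"),
         errhint("To delete all rows, use \"WHERE true\" or similar, "
                 "or disable \"require_where.delete\".")));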


> FWIW I saw something similar enforced globally by the DBA team at a
> large company with many database users.  I think experienced users
> probably initially felt mollycoddled when they first encountered the
> error but I'm sure that some were secretly glad of its existence from
> time to time...  I think it's a useful feature for users who want it,
> and a nice little demonstration of how extensible Postgres is.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
>
>



-- 
Rushabh Lathia


Re: [HACKERS] Use of SizeOfIptrData - is that obsolete?

2016-09-25 Thread Pavan Deolasee
On Fri, Sep 23, 2016 at 12:04 AM, Tom Lane  wrote:

>
> I thought removing the comment altogether was not appropriate, because
> it remains true that you want to work really hard to ensure that
> sizeof(ItemPointerData) is 6.  We're just giving up on pretense of support
> for compilers that don't believe that


Makes sense.


> .  I'm half tempted to introduce a
> StaticAssert about it, but refrained for the moment.
>
>
I also thought about that and it probably makes sense, at least to see how
the buildfarm behaves. One reason to do so is that I did not find any
discussion or evidence of why the SizeOfIptrData magic is no longer
necessary, and to check whether it was an unintentional change at some
point.


> > While htup.h refactoring happened in 9.5, I don't see any point in back
> > patching this.
>
> Agreed.  Pushed to HEAD only.
>

Thanks.


Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Possibly too stringent Assert() in b-tree code

2016-09-25 Thread Amit Kapila
On Mon, Sep 26, 2016 at 8:54 AM, Amit Kapila  wrote:
> On Sun, Sep 25, 2016 at 9:31 PM, Tom Lane  wrote:
>> Amit Kapila  writes:
>>> On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane  wrote:
 This is clearly an oversight in Simon's patch fafa374f2, which introduced
 this code without any consideration for the possibility that the page
 doesn't have a valid special area. ...
 but I'm not very clear on whether this is a safe fix, mainly because
 I don't understand what the reuse WAL record really accomplishes.
 Maybe we need to instead generate a reuse record with some special
 transaction ID indicating worst-case assumptions?
>>
>>> I think it is basically to ensure that the queries on standby should
>>> not be considering the page being recycled as valid and if there is
>>> any pending query to which this page is visible, cancel it.  If this
>>> understanding is correct, then I think the fix you are proposing is
>>> correct.
>>
>> After thinking about it some more, I do not understand why we are emitting
>> WAL here at all in *any* case, either for a new page or for a deleted one
>> that we're about to recycle.  Why wouldn't the appropriate actions have
>> been taken when the page was deleted, or even before that when its last
>> tuple was deleted?
>>
>
> It seems to me that we do take actions for conflict resolution during
> the page deletion (that looks to be covered by XLOG_HEAP2_CLEANUP_INFO
> which we emit in vacuum), but not sure if that is sufficient.
> Consider a case where the new transaction is started on standby after
>

By "new transaction" here, I mean some newer snapshot with a valid
MyPgXact->xmin.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Possibly too stringent Assert() in b-tree code

2016-09-25 Thread Amit Kapila
On Sun, Sep 25, 2016 at 9:31 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane  wrote:
>>> This is clearly an oversight in Simon's patch fafa374f2, which introduced
>>> this code without any consideration for the possibility that the page
>>> doesn't have a valid special area. ...
>>> but I'm not very clear on whether this is a safe fix, mainly because
>>> I don't understand what the reuse WAL record really accomplishes.
>>> Maybe we need to instead generate a reuse record with some special
>>> transaction ID indicating worst-case assumptions?
>
>> I think it is basically to ensure that the queries on standby should
>> not be considering the page being recycled as valid and if there is
>> any pending query to which this page is visible, cancel it.  If this
>> understanding is correct, then I think the fix you are proposing is
>> correct.
>
> After thinking about it some more, I do not understand why we are emitting
> WAL here at all in *any* case, either for a new page or for a deleted one
> that we're about to recycle.  Why wouldn't the appropriate actions have
> been taken when the page was deleted, or even before that when its last
> tuple was deleted?
>

It seems to me that we do take actions for conflict resolution during
page deletion (that looks to be covered by XLOG_HEAP2_CLEANUP_INFO,
which we emit in vacuum), but I am not sure that is sufficient.
Consider a case where a new transaction is started on the standby after
page deletion and its xmin still precedes the value of the xact on the
page; such transactions must be cancelled before we can reuse the page.  I
think the fact that before recycling the page we ensure that no
transaction is interested in that page on the master indicates something
similar is required on the standby as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Supporting huge pages on Windows

2016-09-25 Thread Tsunakawa, Takayuki
Hello,

The attached patch implements huge_pages on Windows.  I'll add this to the 
CommitFest.

The performance improvement was about 2% with the following select-only 
pgbench.  The scaling factor is 200, where the database size is roughly 3GB.  I 
ran the benchmark on my Windows 10 PC with 6 CPU cores and 16GB of RAM.

  pgbench -c18 -j6 -T60 -S bench

Before running pgbench, I used pg_prewarm to cache all pgbench tables and 
indexes (excluding the history table) in the 4GB shared buffers.  The averages 
of running pgbench three times are:

  huge_pages=off: 70412 tps
  huge_pages=on : 72100 tps

The purpose of the pg_ctl.c modification is to retain the "Lock pages in
memory" Windows user right in postgres.  That user right is necessary for
the large-page support.  The current pg_ctl removes all privileges when
spawning postgres, which is overkill.  The system administrator should be
able to assign appropriate privileges to the PostgreSQL service account.
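For reference, a minimal sketch of how a large-page allocation generally
works on Windows (illustrative only, not the patch itself, which presumably
uses the file-mapping equivalent for shared memory): the request must be a
multiple of GetLargePageMinimum(), and the process token needs the
SeLockMemoryPrivilege granted by that user right.

#include <windows.h>

/* Allocate private memory backed by large pages; NULL on failure. */
static void *
alloc_large_pages(SIZE_T size)
{
    SIZE_T  minlp = GetLargePageMinimum();

    if (minlp == 0)
        return NULL;            /* large pages not supported */

    /* round the request up to a multiple of the large-page size */
    size = ((size + minlp - 1) / minlp) * minlp;

    return VirtualAlloc(NULL, size,
                        MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                        PAGE_READWRITE);
}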

Credit: This patch is based on an earlier one by Thomas Munro.


Regards
Takayuki Tsunakawa



win_large_page.patch
Description: win_large_page.patch



Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-09-25 Thread Tsunakawa, Takayuki
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Thomas Munro
> Another database vendor recommends granting SeLockMemoryPrivilege so that
> it can use large pages on Windows when using several GB of buffer pool.
> I wonder if that might help Postgres on Windows.  This could be useful as
> a starting point to test that theory:
> 
> https://www.postgresql.org/message-id/CAEepm%3D075-bgHi_VDt4SCAmt%2Bo_
> %2B1XaRap2zh7XwfZvT294oHA%40mail.gmail.com

Sorry for my late reply, and thank you.  I've created a patch based on yours, 
and I'll submit it shortly in a separate thread.

Regards
Takayuki Tsunakawa




Re: [HACKERS] Stopping logical replication protocol

2016-09-25 Thread Craig Ringer
On 26 Sep. 2016 02:16, "Vladimir Gordiychuk"  wrote:
>
> >It looks like you didn't import my updated patches, so I've rebased your
new patches on top of them.
> Yes, I forgot it, sorry. Thanks for your fixes.
>
> >I did't see you explain why this was removed:
>
> -/* fast path */
> -/* Try to flush pending output to the client */
> -if (pq_flush_if_writable() != 0)
> -WalSndShutdown();
> -
> -if (!pq_is_send_pending())
> -return;
>
> I removed it because, while decoding a long transaction, we always left
the function via the fast path and never received messages from the client.
Now we always enter the 'for' loop and execute
> /* Check for input from the client */ ProcessRepliesIfAny();

Makes sense.
> >Some of the changes to pg_recvlogical look to be copied from
> >receivelog.c, most notably the functions ProcessKeepalive and
> >ProcessXLogData .
>
> While refactoring the huge function I also noticed it, but decided not to
include additional changes in an already huge patch. I only split the
method that does everything into a few small functions.

Good decision. This improves things already and makes future refactoring
easier.

> >I was evaluating the tests and I don't think they actually demonstrate
> >that the patch works. There's nothing done to check that
> >pg_recvlogical exited because of client-initiated CopyDone. While
> >looking into that I found that it also never actually hits
> >ProcessCopyDone(...) because libpq handles the CopyDone reply from the
> >server its self and treats it as end-of-stream. So the loop in
> >StreamLogicalLog will just end and ProcessCopyDone() is dead code.
>
> The main idea was to stop logical replication as fast as possible,
because if stopping replication takes more than one second it becomes
painful.

You should rely on timing in tests as little as possible. Some of the test
hosts are VERY slow. If possible, something deterministic should be used.

> Increasing this timeout to 3 or 5 seconds hides the problems this PR
tries to fix, which is why I think these lines should be restored to their
state from the previous patch.

Per above I don't agree.

>
> ```
> ok($spendTime <= 5, # allow extra time for slow machines
> "pg_recvlogical exited promptly on signal when decoding");
>
> ok((time() - $cancelTime) <= 3, # allow extra time for slow machines
> "pg_recvlogical exited promptly on sigint when idle"
> );
>
> ```
>
> Also I do not understand why we do
>
> $proc->pump while $proc->pumpable;
>
> after sending a signal to the process? Why can we not just wait for the
replication slot to stop?

Because verbose output can produce a lot of writes. If we don't pump the
buffer, pg_recvlogical could block writing on its socket.  Also, it lets us
capture stderr, which we need in order to observe that pg_recvlogical
actually ended the copy on user command rather than just because it
received all the available input.


Re: [HACKERS] store narrow values in hash indexes?

2016-09-25 Thread Robert Haas
On Sat, Sep 24, 2016 at 1:03 AM, Amit Kapila  wrote:
> On Sat, Sep 24, 2016 at 1:02 AM, Robert Haas  wrote:
>> Currently, hash indexes always store the hash code in the index, but
>> not the actual Datum.  It's recently been noted that this can make a
>> hash index smaller than the corresponding btree index would be if the
>> column is wide.  However, if the index is being built on a fixed-width
>> column with a typlen <= sizeof(Datum), we could store the original
>> value in the hash index rather than the hash code without using any
>> more space.  That would complicate the code, but I bet it would be
>> faster: we wouldn't need to set xs_recheck, we could rule out hash
>> collisions without visiting the heap, and we could support index-only
>> scans in such cases.
>
> What exactly do you mean by Datum?  Is it for datatypes that fit into 64
> bits, like integer?

Yeah, I mean whatever is small enough to fit into the space currently
being used to store the hashcode, along with any accompanying padding
bytes that we can also use.
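A sketch of the gating test that implies (names illustrative; the real
change would also need catalog and opclass plumbing):

#include "postgres.h"
#include "catalog/pg_attribute.h"

/*
 * Could this attribute's values be stored in the index in place of the
 * hash code?  They must be fixed-width, passed by value, and no wider
 * than a Datum.
 */
static bool
hash_value_fits_inline(Form_pg_attribute attr)
{
    return attr->attbyval &&
        attr->attlen > 0 &&
        attr->attlen <= (int) sizeof(Datum);
}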

> I think if we are able to support index only scans
> for hash indexes for some data types, that will be a huge plus.
> Surely, there is some benefit without index only scans as well, which
> is we can avoid recheck, but not sure if that alone can give us any
> big performance boost.  As, you say, it might lead to some
> complication in code, but I think it is worth trying.

Yeah, the recheck is probably not that expensive if we have to
retrieve the heap page anyway.

> Won't it add some requirements for pg_upgrade as well?

I have nothing to add to what Bruce already said.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] VACUUM's ancillary tasks

2016-09-25 Thread Thomas Munro
On Mon, Aug 29, 2016 at 1:26 PM, Vik Fearing  wrote:
> While doing this, I noticed that ALTER TABLE should also re-analyze the
> table for obvious reasons, so I have that one set
> "changes_since_analyze" to the number of rows rewritten as well.

I noticed that ATExecAlterColumnType does this:

    /*
     * Drop any pg_statistic entry for the column, since it's now wrong type
     */
    RemoveStatistics(RelationGetRelid(rel), attnum);

What if there is no rewrite, because the type is binary compatible
(ATColumnChangeRequiresRewrite returns false)?  Then I think your patch
won't attract an autovacuum daemon to ANALYZE the table and produce new
statistics, because it won't touch changes_since_analyze.  I wonder if we
should simply keep the stats if no rewrite is required.  Aren't the
statistical properties of binary-compatible types also compatible?
Alternatively, we should consider bumping changes_since_analyze in this
case too, if you're going to do it in the rewrite case.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] Allowing GIN array_ops to work on anyarray

2016-09-25 Thread Enrique Meneses
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

I am resending this due to an earlier error message from the mailing list:
-
These are my comments for the review of 
https://commitfest.postgresql.org/10/708/ 

I built and installed (make world / make install-world) github branch 
REL9_6_STABLE and applied the patch (patch -p1 < 
patch/gin-true-anyarray-opclass-1.patch).

I then upgraded my development database (PostgreSQL 9.5) using pg_upgrade.
This database has one table with a UUID array GIN index. The database was
upgraded correctly to PostgreSQL 9.6 and I was able to successfully connect
to it from a web application which uses the database. There were no
conflicts, so I expect others to be able to upgrade without issues.

I then dropped the database and re-created it... I made sure that I no longer 
used my prior operator class definition. I re-started my web application and 
confirmed it works. This verifies the feature works as designed.

The following is no longer required:

CREATE OPERATOR CLASS _uuid_ops DEFAULT
  FOR TYPE _uuid USING gin AS
  OPERATOR 1 &&(anyarray, anyarray),
  OPERATOR 2 @>(anyarray, anyarray),
  OPERATOR 3 <@(anyarray, anyarray),
  OPERATOR 4 =(anyarray, anyarray),
  FUNCTION 1 uuid_cmp(uuid, uuid),
  FUNCTION 2 ginarrayextract(anyarray, internal, internal),
  FUNCTION 3 ginqueryarrayextract(anyarray, internal, smallint, internal, 
internal, internal, internal),
  FUNCTION 4 ginarrayconsistent(internal, smallint, anyarray, integer, 
internal, internal, internal, internal),
  STORAGE uuid;

I also ran "make installcheck-world" and all the tests passed.

I am not sure what "Spec compliant" means... so I did not mark it as tested
or passed. What should I do to validate that this change is "Spec
compliant"?


The new status of this patch is: Waiting on Author



Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2016-09-25 Thread Thomas Munro
On Mon, Sep 26, 2016 at 1:18 PM, Thomas Munro
 wrote:
> underscore to a minus in various places.  That fixes these errors:

Correction: s/these errors:/the above errors./

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2016-09-25 Thread Thomas Munro
On Mon, Sep 19, 2016 at 4:02 PM, David Fetter  wrote:
>
> [training_wheels_004.patch]

openjade:filelist.sgml:144:16:E: character "_" invalid: only parameter
literal, "CDATA", "ENDTAG", "MD", "MS", "PI", "PUBLIC", "SDATA",
"STARTTAG", "SYSTEM" and parameter separators allowed
openjade:contrib.sgml:138:2:W: cannot generate system identifier for
general entity "require"

The documentation doesn't build here, I think because require_where is
not an acceptable entity name.  It works for me if I change the
underscore to a minus in various places.  That fixes these errors:

+ 
+  Here is an example showing how to set up a database cluster with
+  require_where.
+
+$ psql -U postgres
+# SHOW shared_preload_libraries; /* Make sure not to clobber
something by accident */
+
+If you found something,
+# ALTER SYSTEM SET
shared_preload_libraries='the,stuff,you,found,require_where';
+
+Otherwise,
+# ALTER SYSTEM SET shared_preload_libraries='require_where';
+
+Then restart PostgreSQL
+
+ 

Could use a full stop (period) on the end of that sentence.  Also it
shouldn't be inside the "screen" tags.  Maybe "If you found
something," and "Otherwise," shouldn't be either, or should somehow be
marked up so as not to appear to be text from the session.

postgres=# delete from foo;
ERROR:  DELETE requires a WHERE clause
HINT:  To delete all rows, use "WHERE true" or similar.

Maybe one of those messages could use some indication of where this is
coming from, for surprised users encountering this non-standard
behaviour for the first time?

FWIW I saw something similar enforced globally by the DBA team at a
large company with many database users.  I think experienced users
probably initially felt mollycoddled when they first encountered the
error but I'm sure that some were secretly glad of its existence from
time to time...  I think it's a useful feature for users who want it,
and a nice little demonstration of how extensible Postgres is.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [GENERAL] C++ port of Postgres

2016-09-25 Thread Thomas Munro
On Thu, Sep 1, 2016 at 1:41 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> [trimmed cc list because of big attachments]
>
> On 8/16/16 4:22 PM, Jim Nasby wrote:
> > Joy, do you have an idea what a *minimally invasive* patch for C++
> > support would look like? That's certainly the first step here.
>
> I developed a minimally invasive patch for C++ support a few years ago
> shortly after I wrote that blog post.  Since there appears to have been
> some interest here now, I have updated that and split it up into logical
> chunks.
>
> So here you go.
>

I looked at a random selection of these patches this morning.

> 0001-Fix-use-of-offsetof.patch

I agree that this is already invalid C because it's not constant.  It is
accepted by GCC's C front end no matter what switches you give it and
clearly many other compilers too, but not clang's C front end with
"-pedantic".

> 0002-Use-return-instead-of-exit-in-configure.patch

Makes sense.  Any reason not to #include <stdlib.h> instead like you did
for the 0003 patch?

> 0003-Add-missing-include-files-to-configure-tests.patch

Makes sense.

> 0014-Set-up-for-static-asserts-in-C.patch

+#if __cpp_static_assert >= 201411
+#define _Static_assert(condition, errmessage) static_assert(condition, errmessage)

Don't you mean 201103L?  C++11 introduced two-argument static_assert, not
C++14.

> 0021-Workaround-for-using-typdef-ed-ints-in-loops.patch
>
> Types made from int don't have a ++ operator in C++, so they can't be
> used in for loops without further work.

ForkNumber is not a typedef'd int: it's an enum.  Since it's called a "fork
*number*" and clearly treated as a number in various places including loops
that want to increment it, perhaps it really should be a typedef of an int,
instead of an enum.  Then either macros or an enum ForkNumberEnum could
define the names (you can assign enums to ints, and compare enums and ints,
you just can't assign ints to enums, so it'd be no problem to have typedef
int ForkNumber and then enum ForkNumberEnum { ... } to define the friendly
names, but never actually use the type ForkNumberEnum).
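Concretely, a sketch of that suggestion (the typedef/enum split is
hypothetical; the fork names are the existing ones):

typedef int ForkNumber;         /* plain int, so fork++ also works in C++ */

enum ForkNumberEnum             /* friendly names only; never used as a type */
{
    InvalidForkNumber = -1,
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,
    VISIBILITYMAP_FORKNUM,
    INIT_FORKNUM
};

#define MAX_FORKNUM INIT_FORKNUM

/* for (ForkNumber fork = 0; fork <= MAX_FORKNUM; fork++) ... */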

> 0023-Add-C-linkage-to-replacement-declaration-of-fdatasyn.patch

-extern int fdatasync(int fildes);
+extern "C" int fdatasync(int fildes);

Doesn't this need to be made conditional on __cplusplus?
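Presumably something like this, so the header still compiles as plain C:

#ifdef __cplusplus
extern "C" int fdatasync(int fildes);
#else
extern int fdatasync(int fildes);
#endif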

> 0024-Make-inet_net_-to-C-linkage.patch

Same.

> 0025-Add-C-linkage-to-functions-exported-by-plugins.patch

-extern void _PG_init(void);
+extern "C" void _PG_init(void);

Why this way in some places...

+extern "C" {
 extern void _PG_init(void);
+}

... and this way in single-function declaration cases in other places?

-char   *widget_out(WIDGET *widget);
+char   *widget_out(WIDGET * widget);

Noise.

> 0027-Hack-Disable-volatile-that-causes-mysterious-compile.patch
>
> Subject: [PATCH 27/27] Hack: Disable volatile that causes mysterious
> compiler errors

I don't grok the reason for the volatile qualifier in this code in the
first place, but if it actually does something useful, here's one way to
fix it.
The specific reason for the error reported by GCC is that QueuePosition (a
struct) is copied by assignment:

  pos = oldpos = QUEUE_BACKEND_POS(MyBackendId);

That means that it invokes the default assignment operator, and you can't
do that with a volatile and a non-volatile object either way around
according to C++ (though clang seems less fussy than GCC in this case).
You could try to untangle that by supplying a suitably qualified explicit
member operator:

#ifdef __cplusplus
    QueuePosition& operator=(const volatile QueuePosition& other)
    {
        page = other.page;
        offset = other.offset;
        return *this;
    }
#endif

But that doesn't help with assignments going the other way.  How about
just defining a macro in the style of the existing macros and then using
that in place of all those incompatible assignment operations:

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 716f1c3..469018f 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -194,6 +194,12 @@ typedef struct QueuePosition
(x).offset = (z); \
} while (0)

+#define COPY_QUEUE_POS(x, y) \
+   do { \
+   (x).page = (y).page; \
+   (x).offset = (y).offset; \
+   } while (0)
+
 #define QUEUE_POS_EQUAL(x,y) \
 ((x).page == (y).page && (x).offset == (y).offset)

@@ -1757,7 +1763,8 @@ asyncQueueReadAllNotifications(void)
LWLockAcquire(AsyncQueueLock, LW_SHARED);
/* Assert checks that we have a valid state entry */
Assert(MyProcPid == QUEUE_BACKEND_PID(MyBackendId));
-   pos = oldpos = QUEUE_BACKEND_POS(MyBackendId);
+   COPY_QUEUE_POS(pos, QUEUE_BACKEND_POS(MyBackendId));
+   COPY_QUEUE_POS(oldpos, pos);
head = QUEUE_HEAD;
LWLockRelease(AsyncQueueLock);

@@ -1861,7 +1868,7 @@ asyncQueueReadAllNotifications(void)
{
/* Update shared state */

Re: [HACKERS] PATCH: two slab-like memory allocators

2016-09-25 Thread Petr Jelinek
On 25/09/16 22:17, Tomas Vondra wrote:
> On 09/25/2016 08:48 PM, Petr Jelinek wrote:
> 
>> Slab:
>> In general it seems understandable, the initial description helps to
>> understand what's happening well enough.
>>
>> One thing I don't understand, however, is why the freelist is both an
>> array and a doubly linked list, and why there is a new implementation of
>> said doubly linked list given that we have dlist.
>>
> 
> Hmm, perhaps that should be explained better.
> 
> In AllocSet, we only have a global freelist of chunks, i.e. we have a
> list of free chunks for each possible size (there's 11 sizes, starting
> with 8 bytes and then doubling the size). So freelist[0] is a list of
> free 8B chunks, freelist[1] is a list of free 16B chunks, etc.
> 
> In Slab, the freelist has two levels - first there's a bitmap on each
> block (which is possible, as the chunks have constant size), tracking
> which chunks of that particular block are free. This makes it trivial to
> check that all chunks on the block are free, and free the whole block
> (which is impossible with AllocSet).
> 
> Second, the freelist at the context level tracks blocks with a given
> number of free chunks - so freelist[0] tracks completely full blocks,
> freelist[1] is a list of blocks with 1 free chunk, etc. This is used to
> reuse space on almost full blocks first, in the hope that some of the
> less full blocks will get completely empty (and freed to the OS).
> 
> Is that clear?

Ah okay, makes sense. The documentation of this could be improved then,
though, as it's all squashed into a single sentence that wasn't quite clear
to me.

> 
>>> +/*
>>> + * SlabChunk
>>> + *The prefix of each piece of memory in an SlabBlock
>>> + *
>>> + * NB: this MUST match StandardChunkHeader as defined by
>>> utils/memutils.h.
>>> + */
>>
>> Is this true? Why? And if it is then why doesn't the SlabChunk
>> actually match the StandardChunkHeader?
> 
> It is true, a lot of stuff in MemoryContext infrastructure relies on
> that. For example when you do pfree(ptr), we actually do something like
> 
>header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader))
> 
> to get the chunk header - which includes pointer to the memory context
> and other useful stuff.
> 
> This also means we can put additional fields before StandardChunkHeader
> as that does not break this pointer arithmetic, i.e. SlabChunkData is
> effectively defined like this:
> 
> typedef struct SlabChunkData
> {
> /* block owning this chunk */
> void *block;
> 
> /* standard header */
> StandardChunkHeader header;
> } SlabChunkData;
> 

Yes, but your struct then does not match StandardChunkHeader exactly, so
it should be explained in more detail (aset.c, where this comment is also
present, has a struct that matches StandardChunkHeader, so it's sufficient
there).

>>
>>> +static void
>>> +SlabCheck(MemoryContext context)
>>> +{
>>> +/* FIXME */
>>> +}
>>
>> Do you plan to implement this interface?
>>
> 
> Yes, although I'm not sure what checks should go there. The only thing I
> can think of right now is checking that the number of free chunks on a
> block (according to the bitmap) matches the freelist index.
> 

Yeah, this context does not seem like it needs much checking. The
freelist vs. free-chunks check sounds OK to me. I guess GenSlab will then
call the checks for the underlying contexts.

>>> +#define SLAB_DEFAULT_BLOCK_SIZE    8192
>>> +#define SLAB_LARGE_BLOCK_SIZE      (8 * 1024 * 1024)
>>
>> I am guessing this is based on max_cached_tuplebufs? Maybe these
>> could be written in the same style?
>>
> 
> Not sure I understand what you mean by "based on"? I don't quite
> remember how I came up with those constants, but I guess 8kB and 8MB
> seemed like good values.
> 
> Also, what style you mean? I've used the same style as for ALLOCSET_*
> constants in the same file.
> 

I mean using 8 * 1024 for SLAB_DEFAULT_BLOCK_SIZE so that it's more
readable.  ALLOCSET_* does that too (with the exception of
ALLOCSET_SEPARATE_THRESHOLD; I have no idea why it's different from the
rest of the code).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Optimization for lazy_scan_heap

2016-09-25 Thread Rahila Syed
Hello,

Some initial comments on optimize_lazy_scan_heap_v2.patch.

>-   while (next_unskippable_block < nblocks)
>+   while (next_unskippable_block < nblocks &&
>+  !FORCE_CHECK_PAGE(next_unskippable_block))

Don't we need a similar check of !FORCE_CHECK_PAGE(next_unskippable_block)
in the code below, which appears at line 556 of vacuumlazy.c?

next_unskippable_block = 0;
if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
while (next_unskippable_block < nblocks)


>   {
>if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
>break;
>+   n_all_frozen++;
>}
>else
>{
>if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
>break;
>+
>+   /* Count the number of all-frozen pages */
>+   if (vmstatus & VISIBILITYMAP_ALL_FROZEN)
>+   n_all_frozen++;
>}

I think it's better to have just one place where we increment
n_all_frozen, before the if-else block.

>}
>vacuum_delay_point();
>next_unskippable_block++;
>+   n_skipped++;
>}
>}

Also, instead of incrementing n_skipped every time, it can be set to
'next_unskippable_block' or 'next_unskippable_block - blkno'
once at the end of the loop.
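Putting both suggestions together, a rough sketch of the loop (variable
names follow the patch; this relies on the invariant that an all-frozen
page is also all-visible):

while (next_unskippable_block < nblocks &&
       !FORCE_CHECK_PAGE(next_unskippable_block))
{
    uint8   vmstatus;

    vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
                                        &vmbuffer);

    /* count all-frozen pages in one place, before the branch */
    if (vmstatus & VISIBILITYMAP_ALL_FROZEN)
        n_all_frozen++;

    if (aggressive)
    {
        if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
            break;
    }
    else
    {
        if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
            break;
    }

    vacuum_delay_point();
    next_unskippable_block++;
}

/* set once, instead of incrementing on every iteration */
n_skipped = next_unskippable_block - blkno;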

Thank you,
Rahila Syed



On Thu, Aug 25, 2016 at 11:52 AM, Masahiko Sawada 
wrote:

> On Thu, Aug 25, 2016 at 1:41 AM, Anastasia Lubennikova
>  wrote:
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, passed
> > Implements feature:   not tested
> > Spec compliant:   not tested
> > Documentation:not tested
> >
> > Hi,
> > I haven't tested the performance yet, but the patch itself looks pretty
> good
> > and reasonable improvement.
> > I have a question about removing the comment. It seems to be really
> tricky
> > moment. How do we know that all-frozen block hasn't changed since the
> > moment we checked it?
>
> I think that we don't need to check whether the all-frozen block has
> changed since we checked it.
> Even if the all-frozen block has changed after we saw it as an
> all-frozen page, we can update the relfrozenxid.
> Because any new XIDs just added to that page are newer than the
> GlobalXmin we computed.
>
> Am I missing something?
>
> In this patch, since we count the number of all-frozen pages even
> during a non-aggressive scan, I thought that we don't need to check
> whether these blocks were all-frozen pages.
>
> > I'm going to test the performance this week.
> > I wonder if you could send a test script or describe the steps to test
> it?
>
> This optimization reduces the number of visibility map scans when the
> table is almost all-visible or all-frozen.
> So it would be effective for a very large table (more than several
> hundred GB) whose pages are almost all all-visible or all-frozen.
>
> For example,
> 1. Create very large table.
> 2. Execute VACUUM FREEZE to freeze all pages of table.
> 3. Measure the execution time of VACUUM FREEZE.
> I hope that the second VACUUM FREEZE become fast a little.
>
> I modified the comment, and attached v2 patch.
>
> Regards,
>
> --
> Masahiko Sawada
>
>
>
>


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-09-25 Thread Tomas Vondra

On 09/25/2016 08:48 PM, Petr Jelinek wrote:

Hi Tomas,

On 02/08/16 17:44, Tomas Vondra wrote:


This patch actually includes two new memory allocators (not one). Very
brief summary (for more detailed explanation of the ideas, see comments
at the beginning of slab.c and genslab.c):

Slab

* suitable for fixed-length allocations (other pallocs fail)
* much simpler than AllocSet (no global freelist management etc.)
* free space is tracked per block (using a simple bitmap)
* which allows freeing the block once all chunks are freed (AllocSet
will hold the memory forever, in the hope of reusing it)

GenSlab
---
* suitable for non-fixed-length allocations, but with chunks of mostly
the same size (initially unknown, the context will tune itself)
* a combination AllocSet and Slab (or a sequence of Slab allocators)
* the goal is to do most allocations in Slab context
* there's always a single 'current' Slab context, and every now and and
then it's replaced with a new generation (with the chunk size computed
from recent requests)
* the AllocSet context is used for chunks too large for current Slab



So it's just a wrapper around the other two allocators to make this
use case easier to handle. Do you expect there will eventually be use for
GenSlab outside of the reorderbuffer?



Yes, you might say it's just a wrapper around the other two allocators,
but it *also* includes the logic of recomputing chunk size etc.

I haven't thought about other places that might benefit from these new 
allocators very much - in general, it's useful for places that produce a 
stream of equally-sized items (GenSlab relaxes this), that are pfreed() 
in ~FIFO manner (i.e. roughly in the order of allocation).
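For anyone wanting to try it, the intended usage pattern is roughly the
following (the constructor name and signature are illustrative, not the
final API):

/* one slab per fixed-size object type */
MemoryContext slab = SlabContextCreate(CurrentMemoryContext,
                                       "change slab",
                                       SLAB_DEFAULT_BLOCK_SIZE,
                                       sizeof(ReorderBufferChange));

ReorderBufferChange *change = (ReorderBufferChange *)
    MemoryContextAlloc(slab, sizeof(ReorderBufferChange));
/* ... use change ... */
pfree(change);      /* a block is freed once all its chunks are free */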




So none of this is meant as a universal replacement of AllocSet,
but in the suitable cases the results seem really promising. For
example for the simple test query in [1], the performance
improvement is this:

    N |   master |  patched
  ----+----------+---------
    1 |    100ms |    100ms
    5 |  15000ms |    350ms
   10 | 146000ms |    700ms
   20 |        ? |   1400ms

That's a fairly significant improvement, and the submitted version
of the patches should perform even better (~2x, IIRC).



I agree that it improves performance quite nicely and that the
reorderbuffer could use this.

About the code. I am not quite sure that this needs to be split into
two patches especially if 1/3 of the second patch is the removal of
the code added by the first one and otherwise it's quite small and
straightforward. That is unless you expect the GenSlab to not go in.



I don't know - it seemed natural to first introduce the Slab, as it's 
easier to discuss it separately, and it works for 2 of the 3 contexts 
needed in reorderbuffer.


GenSlab is an improvement of Slab, or rather based on it, so that it 
works for the third context. And it introduces some additional ideas 
(particularly the generational design, etc.)


Of course, none of this means it has to be committed in two separate 
chunks, or that I don't expect GenSlab to get committed ...



Slab:
In general it seems understandable, the initial description helps to
understand what's happening well enough.

One thing I don't understand, however, is why the freelist is both an
array and a doubly linked list, and why there is a new implementation of
said doubly linked list given that we have dlist.



Hmm, perhaps that should be explained better.

In AllocSet, we only have a global freelist of chunks, i.e. we have a 
list of free chunks for each possible size (there's 11 sizes, starting 
with 8 bytes and then doubling the size). So freelist[0] is a list of 
free 8B chunks, freelist[1] is a list of free 16B chunks, etc.


In Slab, the freelist has two levels - first there's a bitmap on each 
block (which is possible, as the chunks have constant size), tracking 
which chunks of that particular block are free. This makes it trivial to 
check that all chunks on the block are free, and free the whole block 
(which is impossible with AllocSet).


Second, the freelist at the context level tracks blocks with a given 
number of free chunks - so freelist[0] tracks completely full blocks, 
freelist[1] is a list of blocks with 1 free chunk, etc. This is used to 
reuse space on almost full blocks first, in the hope that some of the 
less full blocks will get completely empty (and freed to the OS).


Is that clear?
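For concreteness, a minimal sketch of the two levels (illustrative field
and type names, not the patch's actual definitions):

typedef struct SlabBlockSketch
{
    struct SlabBlockSketch *prev;   /* links within one freelist entry */
    struct SlabBlockSketch *next;
    int         nfree;              /* number of free chunks on this block */
    /* a bitmap with one bit per chunk follows the header */
} SlabBlockSketch;

typedef struct SlabContextSketch
{
    MemoryContextData header;       /* standard memory-context fields */
    Size        chunkSize;          /* the single fixed chunk size */
    int         chunksPerBlock;
    /* freelist[i]: blocks with exactly i free chunks; [0] = full blocks */
    SlabBlockSketch **freelist;     /* chunksPerBlock + 1 entries */
} SlabContextSketch;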


+/*
+ * SlabContext is our standard implementation of MemoryContext.
+ *


Really?



Meh, that's clearly a bogus comment.


+/*
+ * SlabChunk
+ * The prefix of each piece of memory in an SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ */


Is this true? Why? And if it is then why doesn't the SlabChunk
actually match the StandardChunkHeader?


It is true, a lot of stuff in MemoryContext infrastructure relies on 
that. For example when you do pfree(ptr), we actually do something like


   header = (StandardChunkHeader *) (ptr - sizeof(StandardChunkHeader));

Re: [HACKERS] Better tracking of free space during SP-GiST index build

2016-09-25 Thread Tomas Vondra

On 09/25/2016 08:33 PM, Oleg Bartunov wrote:

On Sat, Sep 24, 2016 at 11:32 PM, Tomas Vondra
 wrote:

On 09/22/2016 07:37 PM, Tom Lane wrote:


Tomas Vondra  writes:


... I've tried increasing the cache size to 768
entries, with vast majority of them (~600) allocated to leaf pages.
Sadly, this seems to only increase the CREATE INDEX duration a bit,
without making the index significantly smaller (still ~120MB).



Yeah, that's in line with my results: not much further gain from a
larger cache.  Though if you were testing with the same IRRExplorer
data, it's not surprising that our results would match.  Would be
good to try some other cases...



Agreed, but I don't have any other data sets at hand. One possibility would
be to generate something randomly (e.g. it's not particularly difficult to
generate random IP addresses), but I'd much rather use some real-world data
sets.


Tomas, I have one real dataset, which I used for testing spgist
(https://www.postgresql.org/message-id/caf4au4zxd2xov0a__fu7xohxsiwjzm1z2xhs-ffat1dzb9u...@mail.gmail.com)
Let me know if you need it.



Sure, that would be useful.

I think it would be useful to make a repository of such data sets, so that
patch authors & reviewers can get a reasonable collection of data sets
if needed, instead of scrambling every time. Opinions?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: two slab-like memory allocators

2016-09-25 Thread Petr Jelinek
Hi Tomas,

On 02/08/16 17:44, Tomas Vondra wrote:
> 
> This patch actually includes two new memory allocators (not one). Very
> brief summary (for more detailed explanation of the ideas, see comments
> at the beginning of slab.c and genslab.c):
> 
> Slab
> 
> * suitable for fixed-length allocations (other pallocs fail)
> * much simpler than AllocSet (no global freelist management etc.)
> * free space is tracked per block (using a simple bitmap)
> * which allows freeing the block once all chunks are freed (AllocSet
> will hold the memory forever, in the hope of reusing it)
> 
> GenSlab
> ---
> * suitable for non-fixed-length allocations, but with chunks of mostly
> the same size (initially unknown, the context will tune itself)
> * a combination AllocSet and Slab (or a sequence of Slab allocators)
> * the goal is to do most allocations in Slab context
> * there's always a single 'current' Slab context, and every now and and
> then it's replaced with a new generation (with the chunk size computed
> from recent requests)
> * the AllocSet context is used for chunks too large for current Slab
> 

So it's just a wrapper around the other two allocators to make this
use case easier to handle. Do you expect there will eventually be use for
GenSlab outside of the reorderbuffer?

> So none of this is meant as a universal replacement of AllocSet, but in
> the suitable cases the results seem really promising. For example for
> the simple test query in [1], the performance improvement is this:
> 
>     N |   master |  patched
>   ----+----------+---------
>     1 |    100ms |    100ms
>     5 |  15000ms |    350ms
>    10 | 146000ms |    700ms
>    20 |        ? |   1400ms
> 
> That's a fairly significant improvement, and the submitted version of
> the patches should perform even better (~2x, IIRC).
> 

I agree that it improves performance quite nicely and that the
reorderbuffer could use this.

About the code. I am not quite sure that this needs to be split into two
patches especially if 1/3 of the second patch is the removal of the code
added by the first one and otherwise it's quite small and
straightforward. That is unless you expect the GenSlab to not go in.

Slab:
In general it seems understandable, the initial description helps to
understand what's happening well enough.

One thing I don't understand, however, is why the freelist is both an
array and a doubly linked list, and why there is a new implementation of
said doubly linked list given that we have dlist.

> +/*
> + * SlabContext is our standard implementation of MemoryContext.
> + *

Really?

> +/*
> + * SlabChunk
> + *   The prefix of each piece of memory in an SlabBlock
> + *
> + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
> + */

Is this true? Why? And if it is then why doesn't the SlabChunk actually
match the StandardChunkHeader?

> +#define SlabPointerIsValid(pointer) PointerIsValid(pointer)

What's the point of this given that it's defined in the .c file?


> +static void *
> +SlabAlloc(MemoryContext context, Size size)
> +{
> + Slab        set = (Slab) context;
> + SlabBlock   block;
> + SlabChunk   chunk;
> + int idx;
> +
> + AssertArg(SlabIsValid(set));
> +
> + Assert(size == set->chunkSize);

I wonder if there should be stronger protection (ie, elog) for the size
matching.


> +static void *
> +SlabRealloc(MemoryContext context, void *pointer, Size size)
> +{
> + Slab        set = (Slab) context;
> +
> + /* can't do actual realloc with slab, but let's try to be gentle */
> + if (size == set->chunkSize)
> + return pointer;
> +
> + /* we can't really do repalloc with this allocator */
> + Assert(false);

This IMHO should definitely be elog.
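Something like this, say (a sketch; the message wording is illustrative):

/* we can't really do repalloc with this allocator */
elog(ERROR, "slab allocator does not support resizing chunks");
return NULL;                /* keep compiler quiet */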


> +static void
> +SlabCheck(MemoryContext context)
> +{
> + /* FIXME */
> +}

Do you plan to implement this interface?

> +#define SLAB_DEFAULT_BLOCK_SIZE  8192
> +#define SLAB_LARGE_BLOCK_SIZE(8 * 1024 * 1024)

I am guessing this is based on max_cached_tuplebufs? Maybe these could
be written in the same style?

GenSlab:

Since this is a relatively simple wrapper, it looks mostly OK to me. The
only issue I have here is that I am not quite sure about those FIXME
functions (Free, Realloc, GetChunkSpace). It's slightly weird to call into
mcxt, but I guess the alternative there is to error out, so this is
probably preferable. I would want to hear other opinions here.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Better tracking of free space during SP-GiST index build

2016-09-25 Thread Oleg Bartunov
On Sat, Sep 24, 2016 at 11:32 PM, Tomas Vondra
 wrote:
> On 09/22/2016 07:37 PM, Tom Lane wrote:
>>
>> Tomas Vondra  writes:
>>
>>> ... I've tried increasing the cache size to 768
>>> entries, with vast majority of them (~600) allocated to leaf pages.
>>> Sadly, this seems to only increase the CREATE INDEX duration a bit,
>>> without making the index significantly smaller (still ~120MB).
>>
>>
>> Yeah, that's in line with my results: not much further gain from a
>> larger cache.  Though if you were testing with the same IRRExplorer
>> data, it's not surprising that our results would match.  Would be
>> good to try some other cases...
>>
>
> Agreed, but I don't have any other data sets at hand. One possibility would
> be to generate something randomly (e.g. it's not particularly difficult to
> generate random IP addresses), but I'd much rather use some real-world data
> sets.

Tomas, I have one real dataset, which I used for testing spgist
(https://www.postgresql.org/message-id/caf4au4zxd2xov0a__fu7xohxsiwjzm1z2xhs-ffat1dzb9u...@mail.gmail.com)
Let me know if you need it.

>
>>>
>>>
>>> One thing I'd change is making the SpGistLUPCache dynamic, i.e.
>>> storing the size and lastUsedPagesMap on the meta page. That
>>> should allow us resizing the cache and tweak lastUsedPagesMap in
>>> the future.
>>
>>
>> Yeah, probably a good idea. I had thought of bumping
>> SPGIST_MAGIC_NUMBER again if we want to revisit the cache size; but
>> keeping it as a separate field won't add noticeable cost, and it
>> might save some trouble.
>>
>
> I see you plan to track only the cache size, while I proposed to track also
> the map, i.e. number of pages per category. I think that'd useful in case we
> come up with better values (e.g. more entries for leaf pages), or even
> somewhat adaptive way.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>




Re: [HACKERS] Stopping logical replication protocol

2016-09-25 Thread Vladimir Gordiychuk
>It looks like you didn't import my updated patches, so I've rebased your
new patches on top of them.
Yes, I forgot it, sorry. Thanks for your fixes.

>I did't see you explain why this was removed:

-/* fast path */
-/* Try to flush pending output to the client */
-if (pq_flush_if_writable() != 0)
-WalSndShutdown();
-
-if (!pq_is_send_pending())
-return;

I removed it because, while decoding a long transaction, we always left
the function via the fast path and never received messages from the client.
Now we always enter the 'for' loop and execute
/* Check for input from the client */ ProcessRepliesIfAny();


>Some of the changes to pg_recvlogical look to be copied from
>receivelog.c, most notably the functions ProcessKeepalive and
>ProcessXLogData .

While refactoring the huge function I also noticed it, but decided not to
include additional changes in an already huge patch. I only split the
method that does everything into a few small functions.

>I was evaluating the tests and I don't think they actually demonstrate
>that the patch works. There's nothing done to check that
>pg_recvlogical exited because of client-initiated CopyDone. While
>looking into that I found that it also never actually hits
>ProcessCopyDone(...) because libpq handles the CopyDone reply from the
>server its self and treats it as end-of-stream. So the loop in
>StreamLogicalLog will just end and ProcessCopyDone() is dead code.

The main idea was to stop logical replication as fast as possible, because
if stopping replication takes more than one second it becomes painful.
That's why my tests did not check the pg_recvlogical output, only the time
spent stopping replication (the CopyDone exchange).
Increasing this timeout to 3 or 5 seconds hides the problems this PR tries
to fix, which is why I think these lines should be restored to their state
from the previous patch.

```
ok($spendTime <= 5, # allow extra time for slow machines
"pg_recvlogical exited promptly on signal when decoding");

ok((time() - $cancelTime) <= 3, # allow extra time for slow machines
"pg_recvlogical exited promptly on sigint when idle"
);

```

Also I do not understand why we do

$proc->pump while $proc->pumpable;

after sending a signal to the process? Why can we not just wait for the
replication slot to stop?



2016-09-23 8:01 GMT+03:00 Craig Ringer :

> On 19 September 2016 at 07:12, Vladimir Gordiychuk 
> wrote:
> > New patch in attach. 0001 and 0002 without changes.
> > 0003 - patch contain improvements for pg_recvloginca, now pg_recvloginca
> > after receive SIGINT will send CopyDone package to postgresql and wait
> from
> > server CopyDone. For backward compatible after repeat send SIGINT
> > pg_recvloginca will continue immediately without wait CopyDone from
> server.
> > 0004 patch contain regression tests on scenarios that fix 0001 and 0002
> > patches.
>
> Great.
>
> Thanks for this.
>
>
> First observation (which I'll fix in updated patch):
>
>
>
> It looks like you didn't import my updated patches, so I've rebased
> your new patches on top of them.
>
> Whitespace issues:
>
> $ git am ~/Downloads/0003-Add-ability-for-pg_recvlogical-safe-stop-
> replication.patch
> Applying: Add ability for pg_recvlogical safe stop replication
> /home/craig/projects/2Q/postgres/.git/rebase-apply/patch:140: indent
> with spaces.
>msgBuf + hdr_len + bytes_written,
> /home/craig/projects/2Q/postgres/.git/rebase-apply/patch:550: indent
> with spaces.
> /* Backward compatible, allow force interrupt logical replication
> /home/craig/projects/2Q/postgres/.git/rebase-apply/patch:551: indent
> with spaces.
>  * after second SIGINT without wait CopyDone from server
> /home/craig/projects/2Q/postgres/.git/rebase-apply/patch:552: indent
> with spaces.
>  */
> warning: 4 lines add whitespace errors.
>
>
> Remember to use "git log --check" before sending patches.
>
> Also, comment style, please do
>
> /* this */
>
> or
>
> /*
>  * this
>  */
>
> not
>
> /* this
>  */
>
>
>
>
> I didn't see you explain why this was removed:
>
> -/* fast path */
> -/* Try to flush pending output to the client */
> -if (pq_flush_if_writable() != 0)
> -WalSndShutdown();
> -
> -if (!pq_is_send_pending())
> -return;
>
>
>
> I fixed a warning introduced here:
>
>
> pg_recvlogical.c: In function ‘ProcessXLogData’:
> pg_recvlogical.c:289:2: warning: ISO C90 forbids mixed declarations
> and code [-Wdeclaration-after-statement]
>   int bytes_left = msgLength - hdr_len;
>   ^
>
>
> Some of the changes to pg_recvlogical look to be copied from
> receivelog.c, most notably the functions ProcessKeepalive and
> ProcessXLogData.  I thought that rather than copying them, shouldn't
> the existing ones be modified to meet your needs? But it looks like
> the issue is that a fair chunk of the rest of pg_recvlogical doesn't
> re-use code from receivelog.c either; for example, pg_recvlogical's
> sendFeedback differs from receivelog.c's sendFeedback. So
> 

Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-09-25 Thread David Steele
On 9/3/16 8:36 AM, Michael Paquier wrote:
>
> Attached is a new series:

* [PATCH 1/8] Refactor SHA functions and move them to src/common/

I'd like to see more code comments in sha.c (though I realize this was
copied directly from pgcrypto.)

I tested by building with and without --with-openssl and running make
check for the project as a whole and the pgcrypto extension.

I notice that the copyright from pgcrypto/sha1.c was carried over but
not the copyright from pgcrypto/sha2.c.  I'm no expert on how this
works, but I believe the copyright from sha2.c must be copied over.

Also, are there any plans to expose these functions directly to the user
without loading pgcrypto?  Now that the functionality is in core it
seems that would be useful.  In addition, it would make this patch stand
on its own rather than just being a building block.

* [PATCH 2/8] Move encoding routines to src/common/

I wonder if it is confusing to have two of encode.h/encode.c.  Perhaps
they should be renamed to make them distinct?

* [PATCH 3/8] Switch password_encryption to a enum

Does not apply on HEAD (98c2d3332):

error: patch failed: src/backend/commands/user.c:139
error: src/backend/commands/user.c: patch does not apply
error: patch failed: src/include/commands/user.h:15
error: src/include/commands/user.h: patch does not apply

From here on I used 39b691f251 for review and testing.

It seems you are keeping on/off for backwards compatibility, but shouldn't
the default now be "md5"?

-#password_encryption = on
+#password_encryption = on  # on, off, md5 or plain
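
Keeping the old spellings while defaulting to md5 would presumably just
mean extra rows in the enum's options table, something like this
hypothetical sketch (names illustrative, not necessarily the patch's
actual code):

```
/* Hypothetical options table for password_encryption as an enum GUC. */
static const struct config_enum_entry password_encryption_options[] = {
    {"plain", PASSWORD_TYPE_PLAINTEXT, false},
    {"md5", PASSWORD_TYPE_MD5, false},
    {"off", PASSWORD_TYPE_PLAINTEXT, true},  /* backwards compatibility */
    {"on", PASSWORD_TYPE_MD5, true},         /* backwards compatibility */
    {NULL, 0, false}
};
```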

* [PATCH 4/8] Refactor decision-making of password encryption into a
single routine

+++ b/src/backend/commands/user.c
+   new_record[Anum_pg_authid_rolpassword - 1] =
+   CStringGetTextDatum(encrypted_passwd);

pfree(encrypted_passwd) here or let it get freed with the context?

* [PATCH 5/8] Create generic routine to fetch password and valid until
values for a role

Couldn't md5_crypt_verify() be made more general and take the hash type?
 For instance, password_crypt_verify() with the last param as the new
password type enum.

* [PATCH 6/8] Support for SCRAM-SHA-256 authentication

+++ b/contrib/passwordcheck/passwordcheck.c
+   case PASSWORD_TYPE_SCRAM:
+   /* unfortunately not much can be done here */
+   break;

Why can't we at least do the same check as md5 to make sure the username
was not used as the password?
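
The md5 branch only guesses: it hashes the username as if it were the
password and compares against the stored hash. Something analogous seems
possible for SCRAM if the verifier can be rebuilt from its stored salt and
iteration count. A rough sketch, where scram_verify_plain_password() is a
hypothetical helper that the patch is not known to provide:

```
/* Sketch of a check_password hook fragment; "password" is the stored
 * string, "username" the role name, and "encrypted" a scratch buffer,
 * as in passwordcheck.c.  Helper and enum names are illustrative. */
case PASSWORD_TYPE_MD5:
    /* guess: is the stored hash md5(username used as password)? */
    if (!pg_md5_encrypt(username, username, strlen(username), encrypted))
        elog(ERROR, "password encryption failed");
    if (strcmp(password, encrypted) == 0)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PASSWORD),
                 errmsg("password must not contain user name")));
    break;

case PASSWORD_TYPE_SCRAM:
    /* hypothetical: rebuild the verifier with its own salt/iterations */
    if (scram_verify_plain_password(username, username, password))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PASSWORD),
                 errmsg("password must not contain user name")));
    break;
```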

+++ b/src/backend/libpq/auth.c
+* without relying on the length word, but we hardly care about protocol
+* version or older anymore.)

Do you mean protocol version 2 or older?

+++ b/src/backend/libpq/crypt.c
return STATUS_ERROR;/* empty password */
+

Looks like a stray LF.

+++ b/src/backend/parser/gram.y
+   SAVEPOINT SCHEMA SCRAM SCROLL SEARCH SECOND_P SECURITY SELECT SEQUENCE

Doesn't this belong in patch 7?  Even in patch 7 it doesn't appear that
SCRAM is a keyword since the protocol specified after USING is quoted.

I tested this patch using both md5 and scram and was able to get both of
them to working separately.

However, it doesn't look like they can be used in conjunction, since the
pg_hba.conf entry must specify either md5 or scram (though the database
can easily contain a mixture).  This would probably make a migration
very unpleasant.

Is there any chance of a mixed mode that will allow new passwords to be
set as scram while still honoring the old md5 passwords? Or does that
cause too many complications with the protocol?

* [PATCH 7/8] Add clause PASSWORD val USING protocol to CREATE/ALTER ROLE

+++ b/doc/src/sgml/ref/create_role.sgml
+Sets the role's password using the wanted protocol.

How about "Sets the role's password using the requested procotol."

+an unencrypted password.  If the presented password string is already
+in MD5-encrypted or SCRAM-encrypted format, then it is stored encrypted
+as-is.

How about, "If the password string is..."

* [PATCH 8/8] Add regression tests for passwords

OK.

On the whole I find this patch set easier to digest than what was
submitted for 9.6.  It is more targeted but still provides very valuable
functionality.

I'm a bit concerned that a mixture of md5/scram could cause confusion
and think this may warrant discussion somewhere in the documentation
since the idea is for users to migrate from md5 to scram.

-- 
-David
da...@pgmasters.net




[HACKERS] CommitFest 2016-09 status summary

2016-09-25 Thread Fabrízio de Royes Mello
Hi all,

The current status summary is:

Needs review: 65
Waiting on author: 34
Ready for Committer: 9
Committed: 88
Moved to next CF: 1
Rejected: 11
Returned with feedback: 11
TOTAL: 219

The current progress is ~51%. We're in the last week of this CF, and we
are still left with about 30% of patches needing review and about 16%
waiting on the author to take action. So if you have time, please consider
reviewing at least one patch. We need your help!

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Possibly too stringent Assert() in b-tree code

2016-09-25 Thread Tom Lane
Amit Kapila  writes:
> On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane  wrote:
>> This is clearly an oversight in Simon's patch fafa374f2, which introduced
>> this code without any consideration for the possibility that the page
>> doesn't have a valid special area. ...
>> but I'm not very clear on whether this is a safe fix, mainly because
>> I don't understand what the reuse WAL record really accomplishes.
>> Maybe we need to instead generate a reuse record with some special
>> transaction ID indicating worst-case assumptions?

> I think it is basically to ensure that queries on the standby do not
> consider the page being recycled as valid, and that if there is any
> pending query to which this page is visible, it is canceled.  If this
> understanding is correct, then I think the fix you are proposing is
> correct.

After thinking about it some more, I do not understand why we are emitting
WAL here at all in *any* case, either for a new page or for a deleted one
that we're about to recycle.  Why wouldn't the appropriate actions have
been taken when the page was deleted, or even before that when its last
tuple was deleted?

In other words, if I'm left to fix it, I will do so by reverting
fafa374f2 lock, stock, and barrel.  I think it was ill-considered and
unnecessary.

regards, tom lane




Re: [HACKERS] Development build with uuid-ossp support - macOS

2016-09-25 Thread Tom Lane
Enrique M  writes:
> Is there a way to build postgresql and install with uuid-ossp without
> having to build the documentation? I don't really need the documentation
> for my test.

Use "make all contrib" instead of "make world".

regards, tom lane




Re: [HACKERS] Development build with uuid-ossp support - macOS

2016-09-25 Thread Enrique M
I am going to try and switch to macports instead... I see the documentation
provides the macports command for installing the toolset...
https://www.postgresql.org/docs/9.6/static/docguide-toolsets.html  Thank
you.

On Sat, Sep 24, 2016 at 4:52 PM Enrique M 
wrote:

> I am trying to do a macOS build of postgresql (9.6 stable branch from
> github) with the uuid-ossp contrib by typing "make world" but it fails due
> to an openjade error (I did install openjade using homebrew using this
> setup https://github.com/petere/homebrew-sgml).
>
> Is there a way to build postgresql and install with uuid-ossp without
> having to build the documentation? I don't really need the documentation
> for my test.
>
> Thank you,
>
> Enrique
>
>
>


Re: [HACKERS] pg_basebackup creates a corrupt file for pg_stat_tmp and pg_replslot on a backup location

2016-09-25 Thread Ashutosh Sharma
Hi Peter,

> I just wanted to update you: I have taken this commit fest entry patch
> to review because I think it will be addressed as part of "Exclude
> additional directories in pg_basebackup", which I'm also reviewing.
> Therefore, I'm not actually planning on discussing this patch further.
> Please correct me if this assessment does not match your expectations.

Thanks for the update. I am absolutely OK with it. I feel it would be
a good idea to review "Exclude additional directories in
pg_basebackup", which also addresses the issue I reported.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-25 Thread Mithun Cy
I have some more comments on libpq-failover-8.patch

+ /* FIXME need to check that port is numeric */

Is this still applicable?
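
If it is, the check seems small enough to add, e.g. (a sketch, not code
from the patch):

```
#include <stdbool.h>

/* Return true if "port" is a non-empty, purely numeric string. */
static bool
port_is_numeric(const char *port)
{
    if (port == NULL || *port == '\0')
        return false;

    for (const char *p = port; *p != '\0'; p++)
    {
        if (*p < '0' || *p > '9')
            return false;
    }
    return true;
}
```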

1)

+ /*
+ * if there is more than one host in the connect string and
+ * target_server_type is not specified explicitely, set
+ * target_server_type to "master"
+ */

I think we need a comment explaining why we change the default value based
on the number of elements in the connection string: why is the default not
"any" when the node count > 1?
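
That is, the rule in question is roughly the following (a sketch of the
described behavior; conn and nhosts are illustrative names, not the
patch's actual variables):

```
/*
 * Sketch of the defaulting rule being questioned: with several hosts
 * and no explicit target_server_type, assume the user wants the
 * master; with a single host, any server will do.  This choice is
 * what deserves an explanatory comment in the patch.
 */
if (conn->target_server_type == NULL)
    conn->target_server_type = strdup(nhosts > 1 ? "master" : "any");
```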


2)

+ /* loop over all the host specs in the node variable */
+ for (node = nodes; node->host != NULL || node->port != NULL; node++)
  {

I think that instead of the loop we could move this code into a separate
function, say pg_add_to_addresslist(host, port, ); that would help remove
temporary variables like "struct node info" and several heap calls around
it.


3)

+static char *
+get_port_from_addrinfo(struct addrinfo * ai)
+{
+ char port[6];
+
+ sprintf(port, "%d", htons(((struct sockaddr_in *) ai->ai_addr)->sin_port));
+
+ return strdup(port);
+}

This function needs some comments. Also, we use strdup in many places, but
nowhere do we handle memory allocation failure.
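
For illustration, a commented version that documents the failure mode
might look like this (a sketch only, not the patch's code):

```
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <netdb.h>

/* Return the port of "ai" as a freshly allocated string, or NULL on
 * out-of-memory; the caller must check for NULL and free the result. */
static char *
get_port_from_addrinfo(const struct addrinfo *ai)
{
    char        port[6];    /* max "65535" plus terminating NUL */

    snprintf(port, sizeof(port), "%d",
             ntohs(((const struct sockaddr_in *) ai->ai_addr)->sin_port));

    return strdup(port);    /* NULL here must be treated as OOM */
}
```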


4)

+ /*
+ * consult connection options and check if RO connection is OK RO
+ * connection is OK if readonly connection is explicitely
+ * requested or if replication option is set, or just one host is
+ * specified in the connect string.
+ */
+ if (conn->pghost == NULL || !conn->target_server_type ||
+ strcmp(conn->target_server_type, "any") == 0)

The comments are not in sync with the code.

On Wed, Sep 14, 2016 at 12:28 AM, Robert Haas  wrote:

> On Fri, Sep 9, 2016 at 4:49 AM, Victor Wagner  wrote:
> > Random permutation is much more computationally complex than random
> > picking.
>
> It really isn't.  The pseudocode given at
> https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle is all of 4
> lines long, and one of those lines is a comment.  The C implementation
> is longer, but not by much.
>
> Mind you, I haven't read the patch, so I don't know whether using
> something like this would make the code simpler or more complicated.
> However, if the two modes of operation are (1) try all hosts in random
> order and (2) try all hosts in the listed order, it's worth noting
> that the code for the second thing can be used to implement the first
> thing just by shuffling the list before you begin.
>
> > So, using random permutation instead  of random pick wouln't help.
> > And keeping somewhere number of elements in the list wouldn't help
> > either unless we change linked list to completely different data
> > structure which allows random access.
>
> Is there a good reason not to use an array rather than a linked list?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
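
For reference, the shuffle Robert mentions really is tiny; a generic C
sketch (not code from the patch, and with rand()'s modulo bias ignored
for brevity):

```
#include <stdlib.h>

/* Fisher-Yates: walk backwards, swapping each slot with a uniformly
 * chosen slot at or before it; every permutation is equally likely. */
static void
shuffle_hosts(char **hosts, int n)
{
    for (int i = n - 1; i > 0; i--)
    {
        int     j = rand() % (i + 1);   /* 0 <= j <= i */
        char   *tmp = hosts[i];

        hosts[i] = hosts[j];
        hosts[j] = tmp;
    }
}
```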



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-09-25 Thread Ashutosh Sharma
Hi All,

> I forgot to mention that Ashutosh has tested this patch for a day
> using Jeff's tool and he didn't find any problem.  Also, he has found
> a way to easily reproduce the problem.  Ashutosh, can you share your
> changes to the script using which you have reproduced the problem?

I made slight changes to Jeff Janes's patch (crash_REL10.patch) to
reproduce the issue he reported earlier [1]. I changed crash_REL10.patch
so that a torn page write happens only for a bitmap page in a hash index.
As described by Amit in [2] & [3], we were not logging a full page image
for the bitmap page, and that is why we were not able to restore the
bitmap page when it was torn, which would eventually result in the error
below.

38422  01000 2016-09-19 16:25:50.061 PDT:WARNING:  page verification
failed, calculated checksum 65067 but expected 21260
38422  01000 2016-09-19 16:25:50.061 PDT:CONTEXT:  xlog redo at
3F/22053B50 for Hash/ADD_OVFL_PAGE: bmsize 4096, bmpage_found T
38422  XX001 2016-09-19 16:25:50.071 PDT:FATAL:  invalid page in block
9 of relation base/16384/17334

Jeff Janes's original patch had the following condition for a torn page write:

if (JJ_torn_page > 0 && counter++ > JJ_torn_page && !RecoveryInProgress())
    nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ/3);

I added some more conditions to the above if() to ensure that the issue
associated with the bitmap page is reproduced easily:

if (JJ_torn_page > 0 && counter++ > JJ_torn_page &&
    !RecoveryInProgress() &&
    bucket_opaque->hasho_page_id == HASHO_PAGE_ID &&
    bucket_opaque->hasho_flag & LH_BITMAP_PAGE)
    nbytes = FileWrite(v->mdfd_vfd, buffer, 24);

Also, please note that I am allowing only 24 bytes of data, i.e. just the
page header (sizeof(PageHeaderData) is 24 bytes), to be written to the
disk file, so that the issue is observed even if only a single overflow
page is added to the hash index table.

Note: You can find Jeff Janes's torn-page-write patch at [4].

[1] - 
https://www.postgresql.org/message-id/CAMkU%3D1zGcfNTWWxnud8j_p0vb1ENGcngkwqZgPUEnwZmAN%2BXQw%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1LmQZGnYhSHXDDCOsSb_0U-gsxReEmSDRgCZr%3DAdKbTEg%40mail.gmail.com
[3] - 
https://www.postgresql.org/message-id/CAA4eK1%2Bk9tGPw7vOACRo%2B4h5n%3DutHnSuGgMoJrLANybAjNBW9w%40mail.gmail.com
[4] - 
https://www.postgresql.org/message-id/CAMkU%3D1xRt8jBBB7g_7K41W00%3Dbr9UrxMVn_rhWhKPLaHfEdM5A%40mail.gmail.com

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Hash Indexes

2016-09-25 Thread Mark Kirkwood



On 25/09/16 18:18, Amit Kapila wrote:

On Sat, Sep 24, 2016 at 10:49 PM, Greg Stark  wrote:

On Thu, Sep 22, 2016 at 3:23 AM, Tom Lane  wrote:

But to kick the hash AM as such to the curb is to say
"sorry, there will never be O(1) index lookups in Postgres".

Well there's plenty of halfway solutions for that. We could move hash
indexes to contrib or even have them in core as experimental_hash or
unlogged_hash until the day they achieve their potential.

We definitely shouldn't discourage people from working on hash indexes


Okay, but to me it appears that naming it experimental_hash or moving it
to contrib could discourage people, or at the very least make them less
motivated.  Thinking along those lines a year or so back might have been
a wise direction, but now that a lot of work has already been done for
hash indexes (patches to make them WAL-enabled, more concurrent and
performant, plus a pageinspect module are available) and still more is in
progress, that sounds like a step backward rather than a step forward.



+1

I think so too - I've seen many email threads over the years on this
list that essentially state "we need hash indexes WAL logged to make
progress with them"... and Amit et al have done this (more than this
obviously - made 'em better too), and I'm astonished that folk are
suggesting anything other than 'commit this great patch now!'...


regards

Mark

