Re: BUG #15548: Unaccent does not remove combining diacritical characters

2019-02-16 Thread Ramanarayana
Hi Hugh,

The patch I submitted was tested in both Python 2 and 3, and it worked for
me. The single line of code added in the patch runs only in Python 3, so I
don't think it can break Python 2. I would like to see the error you got in
Python 2. Good to know the reported issue is a valid one on Windows. I tested
your patch as well, and it is also working fine.
-- 
Cheers
Ram 4.0


Re: Actual Cost

2019-02-16 Thread Donald Dong
On Feb 16, 2019, at 9:31 PM, Tom Lane  wrote:
> 
> Donald Dong  writes:
>> On Feb 16, 2019, at 6:44 PM, Tomas Vondra wrote:
>>> I don't quite understand what is meant by "actual cost metric" and/or
>>> how is that different from running EXPLAIN ANALYZE.
> 
>> Here is an example:
> 
>> Hash Join  (cost=3.92..18545.70 rows=34 width=32) (actual cost=3.92..18500 
>> time=209.820..1168.831 rows=47 loops=3)
> 
>> Now we have the actual time. Time can have a high variance (a change
>> in system load, or just noises), but I think the actual cost would be
>> less likely to change due to external factors.
> 
> I'm with Tomas: you have not explained what you think those
> numbers mean.

Yeah, I was considering the actual cost to be the output of the cost
model given the actual rows and pages after we instrument the
execution: plug in values that are no longer estimates.

For a hash join, we could use the actual inner_rows_total to get the
actual cost. For a seqscan, we can use the actual rows to get the
actual CPU cost.
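
(For illustration, a minimal sketch of that idea with entirely hypothetical
names -- this is not PostgreSQL's actual API: feed the same kind of formula
the planner uses for a seqscan with the instrumented counts instead of the
estimates.)

/*
 * Hypothetical sketch only: recompute a seqscan's run cost from the counts
 * measured during execution.  ActualScanStats, the parameter names, and the
 * function itself are illustrative, not existing PostgreSQL code, and the
 * formula omits qual evaluation costs for brevity.
 */
typedef struct ActualScanStats
{
	double		actual_tuples;	/* rows the scan actually visited */
	double		actual_pages;	/* pages the scan actually read */
} ActualScanStats;

static double
seqscan_actual_cost(const ActualScanStats *stats,
					double seq_page_cost,	/* e.g. 1.0 */
					double cpu_tuple_cost)	/* e.g. 0.01 */
{
	/* same shape as the planner's estimate, fed with measured counts */
	return seq_page_cost * stats->actual_pages +
		cpu_tuple_cost * stats->actual_tuples;
}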

regards,
Donald Dong




Re: Actual Cost

2019-02-16 Thread Tom Lane
Donald Dong  writes:
> On Feb 16, 2019, at 6:44 PM, Tomas Vondra wrote:
>> I don't quite understand what is meant by "actual cost metric" and/or
>> how is that different from running EXPLAIN ANALYZE.

> Here is an example:

> Hash Join  (cost=3.92..18545.70 rows=34 width=32) (actual cost=3.92..18500 
> time=209.820..1168.831 rows=47 loops=3)

> Now we have the actual time. Time can have a high variance (a change
> in system load, or just noises), but I think the actual cost would be
> less likely to change due to external factors.

I'm with Tomas: you have not explained what you think those
numbers mean.

regards, tom lane



Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

2019-02-16 Thread Noah Misch
On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
> On further study, I was able to reproduce data loss

> Fixes are available:
> 
> a. Fix the rounding in SimpleLruTruncate().  (The patch I posted upthread is
>wrong; I will correct it in a separate message.)

Here's a corrected version.  I now delete a segment only if both its first
page and its last page are considered to precede the cutoff; see the new
comment at SlruMayDeleteSegment().
commit 09393a1 (HEAD)
Author: Noah Misch 
AuthorDate: Sat Feb 16 20:02:51 2019 -0800
Commit: Noah Misch 
CommitDate: Sat Feb 16 20:02:51 2019 -0800

Don't round SimpleLruTruncate() cutoffPage values.

Every core SLRU wraps around.  The rounding did not account for that; in
rare cases, it permitted deletion of the most recently-populated page of
SLRU data.  This closes a rare opportunity for data loss, which
manifested as "could not access status of transaction" errors.  If a
user's physical replication primary logged ": apparent wraparound"
messages, the user should rebuild that primary's standbys regardless of
symptoms.  At less risk is a cluster having emitted "not accepting
commands" errors or "must be vacuumed" warnings at some point.  One can
test a cluster for this data loss by running VACUUM FREEZE in every
database.  Back-patch to 9.4 (all supported versions).

Discussion: https://postgr.es/m/20190202083822.gc32...@gust.leadboat.com

diff --git a/src/backend/access/transam/slru.c 
b/src/backend/access/transam/slru.c
index 3623352..71e29b9 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1171,11 +1171,6 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
int slotno;
 
/*
-* The cutoff point is the start of the segment containing cutoffPage.
-*/
-   cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
-   /*
 * Scan shared memory and remove any pages preceding the cutoff page, to
 * ensure we won't rewrite them later.  (Since this is normally called 
in
 * or just after a checkpoint, any dirty pages should have been flushed
@@ -1221,8 +1216,11 @@ restart:;
 * Hmm, we have (or may have) I/O operations acting on the 
page, so
 * we've got to wait for them to finish and then start again. 
This is
 * the same logic as in SlruSelectLRUPage.  (XXX if page is 
dirty,
-* wouldn't it be OK to just discard it without writing it?  
For now,
-* keep the logic the same as it was.)
+* wouldn't it be OK to just discard it without writing it?
+* SlruMayDeleteSegment() uses a stricter qualification, so we 
might
+* not delete this page in the end; even if we don't delete it, 
we
+* won't have cause to read its data again.  For now, keep the 
logic
+* the same as it was.)
 */
if (shared->page_status[slotno] == SLRU_PAGE_VALID)
SlruInternalWritePage(ctl, slotno, NULL);
@@ -1313,18 +1311,40 @@ restart:
 }
 
 /*
+ * Determine whether a segment is okay to delete.
+ *
+ * segpage is the first page of the segment, and cutoffPage is the oldest (in
+ * PagePrecedes order) page in the SLRU containing still-useful data.  Since
+ * every core PagePrecedes callback implements "wrap around", check the
+ * segment's first and last pages:
+ *
+ * first<cutoff && last<cutoff: yes
+ * first<cutoff && last>=cutoff: no; cutoff falls inside this segment
+ * first>=cutoff && last<cutoff: no; wrap point falls inside this segment
+ * first>=cutoff && last>=cutoff: no; every page of this segment is too young
+ */
+static bool
+SlruMayDeleteSegment(SlruCtl ctl, int segpage, int cutoffPage)
+{
+   int seg_last_page = segpage + 
SLRU_PAGES_PER_SEGMENT - 1;
+
+   Assert(segpage % SLRU_PAGES_PER_SEGMENT == 0);
+
+   return (ctl->PagePrecedes(segpage, cutoffPage) &&
+   ctl->PagePrecedes(seg_last_page, cutoffPage));
+}
+
+/*
  * SlruScanDirectory callback
- * This callback reports true if there's any segment prior to the 
one
- * containing the page passed as "data".
+ * This callback reports true if there's any segment wholly prior 
to the
+ * one containing the page passed as "data".
  */
 bool
 SlruScanDirCbReportPresence(SlruCtl ctl, char *filename, int segpage, void 
*data)
 {
int cutoffPage = *(int *) data;
 
-   cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
-
-   if (ctl->PagePrecedes(segpage, cutoffPage))
+   if (SlruMayDeleteSegment(ctl, segpage, cutoffPage))
return true;/* found one; don't iterate any 
more */
 
return false;   /* keep going */
@@ -1339,7 +1359,7 @@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename, 
int segpage, void *data)
 {
int cutoffPage = 

Re: Actual Cost

2019-02-16 Thread Donald Dong
On Feb 16, 2019, at 6:44 PM, Tomas Vondra wrote:
> 
> On 2/17/19 3:40 AM, David Fetter wrote:
>> 
>> As someone not volunteering to do any of the work, I think it'd be a
>> nice thing to have.  How large an effort would you guess it would be
>> to build a proof of concept?
> 
> I don't quite understand what is meant by "actual cost metric" and/or
> how is that different from running EXPLAIN ANALYZE.

Here is an example:

Hash Join  (cost=3.92..18545.70 rows=34 width=32) (actual cost=3.92..18500 
time=209.820..1168.831 rows=47 loops=3)

Now we have the actual time. Time can have a high variance (a change
in system load, or just noise), but I think the actual cost would be
less likely to change due to external factors.

On 2/17/19 3:40 AM, David Fetter wrote:
> On Sat, Feb 16, 2019 at 03:10:44PM -0800, Donald Dong wrote:
>> Hi,
>> 
>> When explaining a query, I think knowing the actual rows and pages
>> in addition to the operation type  (e.g seqscan) would be enough to
>> calculate the actual cost. The actual cost metric could be useful
>> when we want to look into how off is the planner's estimation, and
>> the correlation between time and cost. Would it be a feature worth
>> considering?
> 
> As someone not volunteering to do any of the work, I think it'd be a
> nice thing to have.  How large an effort would you guess it would be
> to build a proof of concept?

Intuitively it does not feel very complicated to me, but I think the
interface when we're planning (PlannerInfo, RelOptInfo) is different
from the interface when we're explaining a query (QueryDesc). Since
I'm very new to this, if I do it myself it would probably take many
iterations to get to the right point. But still, I'm happy to work on
a proof of concept if no one else wants to work on it.

regards,
Donald Dong


Re: Early WIP/PoC for inlining CTEs

2019-02-16 Thread Andreas Karlsson

On 16/02/2019 22.14, Tom Lane wrote:

I was expecting more controversy ... pushed that way.


Thanks! And thanks to everyone else who worked on this patch!

Andreas



Re: Actual Cost

2019-02-16 Thread Tomas Vondra
On 2/17/19 3:40 AM, David Fetter wrote:
> On Sat, Feb 16, 2019 at 03:10:44PM -0800, Donald Dong wrote:
>> Hi,
>>
>> When explaining a query, I think knowing the actual rows and pages
>> in addition to the operation type  (e.g seqscan) would be enough to
>> calculate the actual cost. The actual cost metric could be useful
>> when we want to look into how off is the planner's estimation, and
>> the correlation between time and cost. Would it be a feature worth
>> considering?
> 
> As someone not volunteering to do any of the work, I think it'd be a
> nice thing to have.  How large an effort would you guess it would be
> to build a proof of concept?
> 

I don't quite understand what is meant by "actual cost metric" and/or
how is that different from running EXPLAIN ANALYZE.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Actual Cost

2019-02-16 Thread David Fetter
On Sat, Feb 16, 2019 at 03:10:44PM -0800, Donald Dong wrote:
> Hi,
> 
> When explaining a query, I think knowing the actual rows and pages
> in addition to the operation type  (e.g seqscan) would be enough to
> calculate the actual cost. The actual cost metric could be useful
> when we want to look into how off is the planner's estimation, and
> the correlation between time and cost. Would it be a feature worth
> considering?

As someone not volunteering to do any of the work, I think it'd be a
nice thing to have.  How large an effort would you guess it would be
to build a proof of concept?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: BUG #15548: Unaccent does not remove combining diacritical characters

2019-02-16 Thread Hugh Ranalli
On Mon, 11 Feb 2019 at 15:57, Ramanarayana  wrote:

> Hi Hugh,
>
> I tested the script in Python 2.7 and it works perfectly. The problem is in
> Python 3.7 (and maybe only on Windows, as you were not getting the issue),
> and I was getting the following error
>
> UnicodeEncodeError: 'charmap' codec can't encode character '\u0100' in
> position 0: character maps to <undefined>
>
>  I went through the python script and found that the stdout encoding is
> set to utf-8 only  if python version is <=2.
>
> I have made the same change for Python version 3 as well. Please find the
> patch for the same. Let me know if it makes sense.
>
> Regards,
> Ram
>

Hi Ram,
I took a look at this, and unfortunately the proposed fix breaks Python 2
(sys.stdout.encoding isn't a writable attribute in Python 2) :-(. I've
attached a patch which is compatible with both versions, and have confirmed
that the output is identical across Python 2 and 3, and across both Windows
and Linux once the difference in line endings is accounted for.

I've also opened the Unicode data file in UTF-8 and added a "with" block
which ensures we close the file when we are done with it. The change makes
the Python2 compatibility a little more complex (2 blocks to remove), but
it's the cleanest I could achieve.

The attached patch goes on top of patch 02 (not on top of the broken,
committed 03). I'm hoping that's not a problem. If it is, let me know and
I'll factor out the changes.

Please let me know if you have any questions.

Best wishes,
Hugh
diff --git a/contrib/unaccent/generate_unaccent_rules.py b/contrib/unaccent/generate_unaccent_rules.py
index 4419a77..7a0a96e 100644
--- a/contrib/unaccent/generate_unaccent_rules.py
+++ b/contrib/unaccent/generate_unaccent_rules.py
@@ -32,9 +32,15 @@
 # The approach is to be Python3 compatible with Python2 "backports".
 from __future__ import print_function
 from __future__ import unicode_literals
+# END: Python 2/3 compatibility - remove when Python 2 compatibility dropped
+
+import argparse
 import codecs
+import re
 import sys
+import xml.etree.ElementTree as ET
 
+# BEGIN: Python 2/3 compatibility - remove when Python 2 compatibility dropped
 if sys.version_info[0] <= 2:
 # Encode stdout as UTF-8, so we can just print to it
 sys.stdout = codecs.getwriter('utf8')(sys.stdout)
@@ -45,12 +51,9 @@ if sys.version_info[0] <= 2:
 # Python 2 and 3 compatible bytes call
 def bytes(source, encoding='ascii', errors='strict'):
 return source.encode(encoding=encoding, errors=errors)
+else:
 # END: Python 2/3 compatibility - remove when Python 2 compatibility dropped
-
-import re
-import argparse
-import sys
-import xml.etree.ElementTree as ET
+sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
 
 # The ranges of Unicode characters that we consider to be "plain letters".
 # For now we are being conservative by including only Latin and Greek.  This
@@ -61,8 +64,25 @@ PLAIN_LETTER_RANGES = ((ord('a'), ord('z')), # Latin lower case
(0x03b1, 0x03c9), # GREEK SMALL LETTER ALPHA, GREEK SMALL LETTER OMEGA
(0x0391, 0x03a9)) # GREEK CAPITAL LETTER ALPHA, GREEK CAPITAL LETTER OMEGA
 
+# Combining marks follow a "base" character, and result in a composite
+# character. Example: "U&'A\0300'" produces "À". There are three types of
+# combining marks: enclosing (Me), non-spacing combining (Mn), spacing
+# combining (Mc). We identify the ranges of marks we feel safe removing.
+# References:
+#   https://en.wikipedia.org/wiki/Combining_character
+#   https://www.unicode.org/charts/PDF/U0300.pdf
+#   https://www.unicode.org/charts/PDF/U20D0.pdf
+COMBINING_MARK_RANGES = ((0x0300, 0x0362),  # Mn: Accents, IPA
+ (0x20dd, 0x20E0),  # Me: Symbols
+ (0x20e2, 0x20e4),) # Me: Screen, keycap, triangle
+
 def print_record(codepoint, letter):
-print (chr(codepoint) + "\t" + letter)
+if letter:
+output = chr(codepoint) + "\t" + letter
+else:
+output = chr(codepoint)
+
+print(output)
 
 class Codepoint:
 def __init__(self, id, general_category, combining_ids):
@@ -70,6 +90,16 @@ class Codepoint:
 self.general_category = general_category
 self.combining_ids = combining_ids
 
+def is_mark_to_remove(codepoint):
+"""Return true if this is a combining mark to remove."""
+if not is_mark(codepoint):
+return False
+
+for begin, end in COMBINING_MARK_RANGES:
+if codepoint.id >= begin and codepoint.id <= end:
+return True
+return False
+
 def is_plain_letter(codepoint):
 """Return true if codepoint represents a "plain letter"."""
 for begin, end in PLAIN_LETTER_RANGES:
@@ -206,21 +236,22 @@ def main(args):
 charactersSet = set()
 
 # read file UnicodeData.txt
-unicodeDataFile = open(args.unicodeDataFilePath, 'r')
-
-# read everything we need into memory
-for line in 

Re: Early WIP/PoC for inlining CTEs

2019-02-16 Thread Tomas Vondra



On 2/16/19 10:14 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 2/14/19 8:22 PM, Merlin Moncure wrote:
>>> On Thu, Feb 14, 2019 at 10:02 AM Alvaro Herrera
>>>  wrote:
 On 2019-Feb-14, Peter Eisentraut wrote:
> If we're not really planning to add any more options, I'd register a
> light vote for MATERIALIZED.  It reads easier, seems more grammatically
> correct, and uses an existing word.
> 
 +1 for MATERIALIZED, as I proposed in
 https://postgr.es/m/20170503173305.fetj4tz7kd56tjlr@alvherre.pgsql
> 
>>> Seconded!
> 
>> +1 to MATERIALIZED too
> 
> I was expecting more controversy ... pushed that way.
> 

As you wish. I withdraw my previous vote and I propose

   AS [NOT] MATERIALIZED [yes|false]

;-)

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



CPU costs of random_zipfian in pgbench

2019-02-16 Thread Tomas Vondra
Hi,

I'm trying to use random_zipfian() for benchmarking of skewed data sets,
and I ran head-first into an issue with rather excessive CPU costs.

Consider an example like this:

  pgbench -i -s 1 test

  pgbench -s 1 -f zipf.sql -T 30 test

where zipf.sql does this:

  \SET id random_zipfian(1, 10 * :scale, 0.1)
  BEGIN;
  SELECT * FROM pgbench_accounts WHERE aid = :id;
  END;

Unfortunately, this produces something like this:

  transaction type: zipf.sql
  scaling factor: 1
  query mode: simple
  number of clients: 1
  number of threads: 1
  duration: 30 s
  number of transactions actually processed: 1
  latency average = 43849.143 ms
  tps = 0.022805 (including connections establishing)
  tps = 0.022806 (excluding connections establishing)

which is somewhat ... not great, I guess. This happens because
generalizedHarmonicNumber() does this:

for (i = n; i > 1; i--)
ans += pow(i, -s);

where n happens to be 10 (range passed to random_zipfian), so
the loop takes quite a bit of time. So much that we only ever complete a
single transaction, because this work happens in the context of the
first transction, and so it counts into the 30-second limit.

The docs actually do mention performance of this function:

The function's performance is poor for parameter values close and
above 1.0 and on a small range.

But that does not seem to cover the example I just posted, because 0.1
is not particularly close to 1.0 (or above 1.0), and 1e9 values hardly
counts as "small range".

I see this part of random_zipfian comes from "Quickly Generating
Billion-Record Synthetic Databases" which deals with generating data
sets, where wasting a bit of time is not a huge deal. But when running
benchmarks it matters much more. So maybe there's a disconnect here?

Interestingly enough, I don't see this issue on values above 1.0, no
matter how close to 1.0 those are. Although the throughput seems lower
than with uniform access, so this part of random_zipfian is not quite
cheap either.

Now, I don't know what to do about this. Ideally, we'd have a faster
algorithm to generate zipfian distributions - I don't know if such a thing
exists, though. Or maybe we could precompute the distribution first, not
counting it into the benchmark duration.
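
(As an illustration of the "precompute first" idea -- a minimal sketch with
hypothetical names, not pgbench's actual code: run the O(n) harmonic-number
loop once during benchmark setup, before the clock starts, and reuse the
result afterwards.)

#include <math.h>

/* same loop as quoted above: roughly n pow() calls, i.e. O(n) time */
static double
generalized_harmonic(double n, double s)
{
	double		ans = 0.0;

	for (double i = n; i > 1; i--)
		ans += pow(i, -s);

	return ans;
}

static double precomputed_harmonic;

/* hypothetical hook: called once at setup time, outside the timed run */
static void
precompute_zipfian(double n, double s)
{
	precomputed_harmonic = generalized_harmonic(n, s);
}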

But I guess the very least we can/should do is improve the docs to
make it more obvious which cases are expected to be slow.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Actual Cost

2019-02-16 Thread Donald Dong
Hi,

When explaining a query, I think knowing the actual rows and pages, in addition
to the operation type (e.g., seqscan), would be enough to calculate the actual
cost. The actual cost metric could be useful when we want to look into how far
off the planner's estimate is, and the correlation between time and cost. Would
it be a feature worth considering?

Thank you,
Donald Dong


Re: 2019-03 CF Summary / Review - Tranche #2

2019-02-16 Thread Alexander Korotkov
On Sun, Feb 17, 2019 at 1:09 AM Alexander Korotkov
 wrote:
>
> On Sat, Feb 16, 2019 at 11:31 PM Andres Freund  wrote:
> > On February 16, 2019 11:22:32 AM PST, Alexander Korotkov 
> >  wrote:
> > >On Sat, Feb 16, 2019 at 8:45 AM Andres Freund 
> > >wrote:
> > >> - SQL/JSON: jsonpath
> > >>
> > >>   WOA: This seems to need substantial further work before being
> > >>   committable
> > >>
> > >>   Andres: +0.5 for punting to v13
> > >
> > >I'm going to post updated patchset next week.  All the issues
> > >highlighted will be resolved there.  So, let's don't decide this too
> > >early.
> >
> > Well, given that the patches still have a lot of the same issues complained 
> > about a year ago, where people said we should try to get them into v11, I'm 
> > not sure that that's a realistic goal.
>
> I'm sorry, a year ago I didn't understand this issue correctly.
> Otherwise, I would push people to do something more productive during
> this year.
>
> If solution I'm going to post next week wouldn't be good enough, there
> is a backup plan.  We can wipe out error suppression completely.  Then
> we implement less part of standard, but still can get something very
> useful into core.

To be more clear: in both options, I'm NOT proposing to commit any
PG_TRY/PG_CATCH anymore :)

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: 2019-03 CF Summary / Review - Tranche #2

2019-02-16 Thread Alexander Korotkov
On Sat, Feb 16, 2019 at 11:31 PM Andres Freund  wrote:
> On February 16, 2019 11:22:32 AM PST, Alexander Korotkov 
>  wrote:
> >On Sat, Feb 16, 2019 at 8:45 AM Andres Freund 
> >wrote:
> >> - SQL/JSON: jsonpath
> >>
> >>   WOA: This seems to need substantial further work before being
> >>   committable
> >>
> >>   Andres: +0.5 for punting to v13
> >
> >I'm going to post updated patchset next week.  All the issues
> >highlighted will be resolved there.  So, let's don't decide this too
> >early.
>
> Well, given that the patches still have a lot of the same issues complained 
> about a year ago, where people said we should try to get them into v11, I'm 
> not sure that that's a realistic goal.

I'm sorry; a year ago I didn't understand this issue correctly.
Otherwise, I would have pushed people to do something more productive during
this year.

If the solution I'm going to post next week isn't good enough, there
is a backup plan.  We can wipe out error suppression completely.  Then
we implement a smaller part of the standard, but can still get something very
useful into core.

> Jsonb was a success, but also held up the release by several months.

I am not asking to commit the patchset in its current shape and hold up the
release because of it.  I am just saying there is still time before FF.  So,
let's not doom the patchset too early.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: TupleTableSlot abstraction

2019-02-16 Thread Jeff Janes
On Fri, Nov 16, 2018 at 7:46 PM Andres Freund  wrote:

> Hi,
>
> On 2018-11-13 15:30:21 -0800, Andres Freund wrote:
> > What I'm now planning to do is to go through the big comment in
> > tuptable.h and update that to the new world.  While I'm not convinced
> > that that that's the best place for it, it needs to be accurate.
> >
> > Furthermore:
> > - More comment polishing
> > - I'll probably split the commits up a bit further (particulary JIT
> >   ignoring virtual tuple slots, inlining the hot path of
> >   slot_getsomeattrs())
> > - serious commit message polishing
>
> I've done all of that now, and pushed it.  Thanks Ashutosh, Amit
> Khandekar and everyone else.
>

Hi Andres and all,

This commit 763f2edd92095b1ca2 "Rejigger materializing and fetching a
HeapTuple from a slot" introduces a memory leak into the ExecutorState
context which is invoked by this statement:

CREATE TABLE tbloom AS
  SELECT
    (random() * 100)::int as i1,
    (random() * 100)::int as i2,
    (random() * 100)::int as i3,
    (random() * 100)::int as i4,
    (random() * 100)::int as i5,
    (random() * 100)::int as i6
  FROM
    generate_series(1,5000);

By blind analogy to the changes made to matview.c, I think that createas.c
is missing a heap_freetuple, as attached.

It fixes the leak, and still passes "make check".
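
(For readers without the attachment, a sketch of the kind of one-line change
being described -- assuming createas.c's intorel_receive copies the tuple out
of the slot the same way matview.c's receive function does; the actual
attached patch may differ.)

	/* sketch of intorel_receive(): copy the tuple, insert it, free the copy */
	tuple = ExecCopySlotHeapTuple(slot);

	heap_insert(myState->rel, tuple, myState->output_cid,
				myState->hi_options, myState->bistate);

	/* free the palloc'd copy so it does not accumulate in ExecutorState */
	heap_freetuple(tuple);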
Cheers,

Jeff


createas_leak_fix.patch
Description: Binary data


Re: ToDo: show size of partitioned table

2019-02-16 Thread Pavel Stehule
On Fri, 15 Feb 2019 at 07:50, Amit Langote  wrote:

> Hi Pavel,
>
> Thanks for updating the patch.
>
> On 2019/02/08 17:26, Pavel Stehule wrote:
> > I renamed originally calculated column "size" to "direct partitions size"
> > .. see Alvaro's comment. Then I introduced new column "total partitions
> > size" that is calculated like you propose.
> >
> > Now the result of dPn+ looks like
> >
> >  List of partitioned relations
> >
> > ┌────────┬────────┬───────┬─────────────┬────────────────────────┬───────────────────────┬─────────────┐
> > │ Schema │  Name  │ Owner │ Parent name │ Direct partitions size │ Total partitions size │ Description │
> > ╞════════╪════════╪═══════╪═════════════╪════════════════════════╪═══════════════════════╪═════════════╡
> > │ public │ p      │ pavel │             │ 8192 bytes             │ 24 kB                 │             │
> > │ public │ p_1    │ pavel │ p           │ 8192 bytes             │ 16 kB                 │             │
> > │ public │ p_1_bc │ pavel │ p_1         │ 8192 bytes             │ 8192 bytes            │             │
> > └────────┴────────┴───────┴─────────────┴────────────────────────┴───────────────────────┴─────────────┘
> > (3 rows)
>
> OK, so for each listed partitioned table (root and nested), this shows the
> total size of the directly attached leaf partitions *and* the total size
> of all partitions in its (sub-) tree.
>
> By the way, what I think Alvaro meant by "local size" is not what the
> "direct partition size" above shows.  I think "local size" means the size
> of the storage assigned to the table itself, not to partitions attached to
> it, which are distinct relations.  We don't implement that concept in
> Postgres today, but may in the future.  I'm not sure if we'll add a
> another column to show "local size" in the future when we do implement
> that concept or if Alvaro meant that there should only be "local size"
> (not "direct partition size") which will always show 0 for now and "total
> partition size" columns.
>

We can do that in the future. For now, I don't think it is good to always show
0. The psql reports (like this one) can be enhanced or changed in the future
without problems, so we don't need to design everything now.



>
> Anyway, I have a few more suggestions to improve the patch, but instead of
> sending the minute-level changes in the email, I've gone ahead and made
> those changes myself.  I've attached a delta patch that applies on top of
> your v9 patch.  Summary of the changes I made is as follows:
>
> * Documentation rewording here and there (also mentioned the "direct
> partitions size" and "total partitions size" division in the \dPn output
> per the latest patch)
>
> * Wrapped some lines in code so that they don't look too wide
>
> * Renamed show_nested_partitions to show_nested
>
> * Changed "Partitioned relations" in the output headers to say
> "Partitioned tables" where appropriate
>
> * Fixed quiet mode output to use correct word between object_name vs
> objects_name
>
> Please merge these changes if you think they are reasonable.
>

I like your changes. I merged them all - the updated patch is attached.

Thank you very much

Regards

Pavel


> Thanks,
> Amit
>
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index d7539ae743..9fb632b0bd 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1659,6 +1659,71 @@ testdb=
 
   
 
+
+  
+\dP[n+] [ pattern ]
+
+
+Lists partitioned relations.  If pattern is specified, only entries
+whose name matches the pattern are listed.  By default, only
+partitioned tables are listed; supply a pattern to also include
+partitioned indexes.  If the form \dP+
+is used, the sum of sizes of table's partitions (including their
+indexes) and associated description are also displayed.
+
+
+
+ If modifier n (which stands for
+ nested) is used, then non-root partitioned tables are
+ displayed too.  The displayed size is divided into two columns in
+ this case: one that shows the total size of only the directly
+ attached leaf partitions and another that shows total size of all
+ partitions, also considering other sub-partitioned partitions, for
+ each partitioned tables that's displayed.
+
+
+  
+
+
+  
+\dPi[n+] [ pattern ]
+
+
+Lists partitioned indexes. If pattern is specified, only entries
+whose name matches the pattern are listed. If the form
+\dPi+ is used, the sum of sizes of index's
+partitions and associated description are also displayed.
+
+
+
+ If the modifier n is used, non-root partitioned
+ indexes are displayed too.
+
+
+  
+
+
+  
+\dPt[n+] [ pattern ]
+
+
+Lists 

Re: Synchronize with imath upstream

2019-02-16 Thread Noah Misch
On Sat, Feb 16, 2019 at 02:12:50PM -0500, Tom Lane wrote:
> Noah Misch  writes:
> > On Thu, Feb 07, 2019 at 11:56:18PM -0500, Tom Lane wrote:
> >> Maybe easier to keep the instructions in a separate README file?
> 
> > Most imath.c patches have cause to update those lines, and patches to other
> > files almost never do.  Hence, I think hackers are more likely to find and
> > update those lines in imath.c.  I would choose a README if concerns weren't
> > concentrated in one file.
> 
> Fair enough, objection withdrawn.

Pushed, but it broke dory.  Will fix.



Re: Early WIP/PoC for inlining CTEs

2019-02-16 Thread Jonathan S. Katz
On 2/16/19 4:14 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 2/14/19 8:22 PM, Merlin Moncure wrote:
>>> On Thu, Feb 14, 2019 at 10:02 AM Alvaro Herrera
>>>  wrote:
 On 2019-Feb-14, Peter Eisentraut wrote:
> If we're not really planning to add any more options, I'd register a
> light vote for MATERIALIZED.  It reads easier, seems more grammatically
> correct, and uses an existing word.
> 
 +1 for MATERIALIZED, as I proposed in
 https://postgr.es/m/20170503173305.fetj4tz7kd56tjlr@alvherre.pgsql
> 
>>> Seconded!
> 
>> +1 to MATERIALIZED too
> 
> I was expecting more controversy ... pushed that way.

This is awesome. Thank you everyone for working on this!





Re: Early WIP/PoC for inlining CTEs

2019-02-16 Thread Tom Lane
Tomas Vondra  writes:
> On 2/14/19 8:22 PM, Merlin Moncure wrote:
>> On Thu, Feb 14, 2019 at 10:02 AM Alvaro Herrera
>>  wrote:
>>> On 2019-Feb-14, Peter Eisentraut wrote:
 If we're not really planning to add any more options, I'd register a
 light vote for MATERIALIZED.  It reads easier, seems more grammatically
 correct, and uses an existing word.

>>> +1 for MATERIALIZED, as I proposed in
>>> https://postgr.es/m/20170503173305.fetj4tz7kd56tjlr@alvherre.pgsql

>> Seconded!

> +1 to MATERIALIZED too

I was expecting more controversy ... pushed that way.

regards, tom lane



Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

2019-02-16 Thread Dave Cramer
Andres,

Thanks for looking at this. FYI, I did not originally write this; the
original author has not replied to requests.
JDBC could use this, I assume others could as well.

That said I'm certainly open to suggestions on how to do this.

Craig, do you have any other ideas?

Dave Cramer


On Fri, 15 Feb 2019 at 22:01, Andres Freund  wrote:

> Hi,
>
> On 2018-12-03 06:38:43 -0500, Dave Cramer wrote:
> > From 4d023cfc1fed0b5852b4da1aad6a32549b03ce26 Mon Sep 17 00:00:00 2001
> > From: Dave Cramer 
> > Date: Fri, 30 Nov 2018 18:23:49 -0500
> > Subject: [PATCH 1/5] Respect client initiated CopyDone in walsender
> >
> > ---
> >  src/backend/replication/walsender.c | 36
> ++--
> >  1 file changed, 30 insertions(+), 6 deletions(-)
> >
> > diff --git a/src/backend/replication/walsender.c
> b/src/backend/replication/walsender.c
> > index 46edb52..93f2648 100644
> > --- a/src/backend/replication/walsender.c
> > +++ b/src/backend/replication/walsender.c
> > @@ -770,6 +770,14 @@ logical_read_xlog_page(XLogReaderState *state,
> XLogRecPtr targetPagePtr, int req
> >   sendTimeLineValidUpto = state->currTLIValidUntil;
> >   sendTimeLineNextTLI = state->nextTLI;
> >
> > + /*
> > + * If the client sent CopyDone while we were waiting,
> > + * bail out so we can wind up the decoding session.
> > + */
> > + if (streamingDoneSending)
> > + return -1;
> > +
> > +  /* more than one block available */
> >   /* make sure we have enough WAL available */
> >   flushptr = WalSndWaitForWal(targetPagePtr + reqLen);
> >
> > @@ -1341,8 +1349,12 @@ WalSndWaitForWal(XLogRecPtr loc)
> >* It's important to do this check after the recomputation
> of
> >* RecentFlushPtr, so we can send all remaining data
> before shutting
> >* down.
> > -  */
> > - if (got_STOPPING)
> > +  *
> > +  * We'll also exit here if the client sent CopyDone
> because it wants
> > +  * to return to command mode.
> > + */
> > +
> > + if (got_STOPPING || streamingDoneReceiving)
> >   break;
> >
> >   /*
> > @@ -2095,7 +2107,14 @@ WalSndCheckTimeOut(void)
> >   }
> >  }
> >
> > -/* Main loop of walsender process that streams the WAL over Copy
> messages. */
> > +/*
> > + * Main loop of walsender process that streams the WAL over Copy
> messages.
> > + *
> > + * The send_data callback must enqueue complete CopyData messages to
> libpq
> > + * using pq_putmessage_noblock or similar, since the walsender loop may
> send
> > + * CopyDone then exit and return to command mode in response to a client
> > + * CopyDone between calls to send_data.
> > + */
>
> Wait, how is it ok to end CopyDone before all the pending data has been
> sent out?
>
>
>
> > diff --git a/src/backend/replication/logical/reorderbuffer.c
> b/src/backend/replication/logical/reorderbuffer.c
> > index 23466ba..66b6e90 100644
> > --- a/src/backend/replication/logical/reorderbuffer.c
> > +++ b/src/backend/replication/logical/reorderbuffer.c
> > @@ -1497,7 +1497,9 @@ ReorderBufferCommit(ReorderBuffer *rb,
> TransactionId xid,
> >   rb->begin(rb, txn);
> >
> >   iterstate = ReorderBufferIterTXNInit(rb, txn);
> > - while ((change = ReorderBufferIterTXNNext(rb, iterstate))
> != NULL)
> > + while ((change = ReorderBufferIterTXNNext(rb, iterstate))
> != NULL &&
> > +(rb->continue_decoding_cb == NULL ||
> > + rb->continue_decoding_cb()))
> >   {
> >   Relationrelation = NULL;
> >   Oid reloid;
>
> > @@ -1774,8 +1776,11 @@ ReorderBufferCommit(ReorderBuffer *rb,
> TransactionId xid,
> >   ReorderBufferIterTXNFinish(rb, iterstate);
> >   iterstate = NULL;
> >
> > - /* call commit callback */
> > - rb->commit(rb, txn, commit_lsn);
> > + if (rb->continue_decoding_cb == NULL ||
> rb->continue_decoding_cb())
> > + {
> > + /* call commit callback */
> > + rb->commit(rb, txn, commit_lsn);
> > + }
>
>
> I'm doubtful it's ok to simply stop in the middle of a transaction.
>
>
>
> > @@ -1194,17 +1212,6 @@ WalSndWriteData(LogicalDecodingContext *ctx,
> XLogRecPtr lsn, TransactionId xid,
> >
> >   CHECK_FOR_INTERRUPTS();
> >
> > - /* Try to flush pending output to the client */
> > - if (pq_flush_if_writable() != 0)
> > - WalSndShutdown();
> > -
> > - /* Try taking fast path unless we get too close to walsender
> timeout. */
> > - if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
> > -
>wal_sender_timeout / 2) &&
> > - !pq_is_send_pending())
> > - {
> > - return;
> > - }
>
> As somebody else commented 

Re: 2019-03 CF Summary / Review - Tranche #2

2019-02-16 Thread Andres Freund
Hi,

On February 16, 2019 11:22:32 AM PST, Alexander Korotkov 
 wrote:
>On Sat, Feb 16, 2019 at 8:45 AM Andres Freund 
>wrote:
>> - SQL/JSON: jsonpath
>>
>>   WOA: This seems to need substantial further work before being
>>   committable
>>
>>   Andres: +0.5 for punting to v13
>
>I'm going to post updated patchset next week.  All the issues
>highlighted will be resolved there.  So, let's don't decide this too
>early.

Well, given that the patches still have a lot of the same issues complained 
about a year ago, where people said we should try to get them into v11, I'm not 
sure that that's a realistic goal. Jsonb was a success, but also held up the 
release by several months.

Andres 
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: 2019-03 CF Summary / Review - Tranche #2

2019-02-16 Thread Alexander Korotkov
On Sat, Feb 16, 2019 at 8:45 AM Andres Freund  wrote:
> - SQL/JSON: jsonpath
>
>   WOA: This seems to need substantial further work before being
>   committable
>
>   Andres: +0.5 for punting to v13

I'm going to post an updated patchset next week.  All the issues
highlighted will be resolved there.  So, let's not decide this too
early.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: allow online change primary_conninfo

2019-02-16 Thread Sergei Kornilov
Hi

> If you do that, the startup process would try to jump to a different
> source to fetch WAL if archiving is enabled. Is that really what we
> want?

Yes, the behavior is similar to the walreceiver exiting for any other reason,
the same as in all previous patch versions.
(I am not sure this can be changed in a small and elegant way.)

> It would be nice to have one message for primary_conninfo being
> updated, and one for primary_slot_name so as if both are changed at
> the same time the user gets the correct information.

Hm, maybe even better would be a separate message if both settings are
changed? I am doing this in the attached patch.

> "In a moment starts streaming WAL with new configuration." sounds
> strange. It would be more natural to have something like "The WAL
> receiver is going to be restarted with the new configuration", just a
> suggestion.

This phrase was proposed here: 
https://www.postgresql.org/message-id/20181213.184233.215171075.horiguchi.kyotaro%40lab.ntt.co.jp
My English is poor, so I added the message just as proposed. I have replaced
it with your variant.

regards, Sergei

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8bd57f376b..d7b75e462a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3838,9 +3838,14 @@ ANY num_sync ( 
@@ -3855,9 +3860,14 @@ ANY num_sync ( ).
-  This parameter can only be set at server start.
+  This parameter can only be set in the postgresql.conf
+  file or on the server command line.
   This setting has no effect if primary_conninfo is not
-  set.
+  set or the server is not in standby mode.
+ 
+ 
+  WAL receiver will be restarted after primary_slot_name
+  was changed.
  
 

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ecd12fc53a..8a77ebb0d3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12137,6 +12137,44 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	return false;/* not reached */
 }
 
+void
+ProcessStartupSigHup(void)
+{
+	char	   *conninfo = pstrdup(PrimaryConnInfo);
+	char	   *slotname = pstrdup(PrimarySlotName);
+	bool		conninfoChanged;
+	bool		slotnameChanged;
+
+	ProcessConfigFile(PGC_SIGHUP);
+
+	/*
+	 * we need shutdown running walreceiver if replication settings was
+	 * changed. walreceiver will be started with new settings in next
+	 * WaitForWALToBecomeAvailable iteration
+	 */
+	conninfoChanged = (strcmp(conninfo, PrimaryConnInfo) != 0);
+	slotnameChanged = (strcmp(slotname, PrimarySlotName) != 0);
+
+	if ((conninfoChanged || slotnameChanged) && WalRcvRunning())
+	{
+		if (conninfoChanged && slotnameChanged)
+			ereport(LOG,
+	(errmsg("terminating walreceiver process due to change of primary_conninfo and primary_slot_name"),
+	 errdetail("The WAL receiver is going to be restarted with the new configuration")));
+		else
+			ereport(LOG,
+	(errmsg("terminating walreceiver process due to change of %s",
+			conninfoChanged ? "primary_conninfo" : "primary_slot_name"),
+	 errdetail("The WAL receiver is going to be restarted with the new configuration")));
+
+		ShutdownWalRcv();
+		lastSourceFailed = true;
+	}
+
+	pfree(conninfo);
+	pfree(slotname);
+}
+
 /*
  * Determine what log level should be used to report a corrupt WAL record
  * in the current WAL page, previously read by XLogPageRead().
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 5048a2c2aa..9bf5c792fe 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -147,7 +147,7 @@ HandleStartupProcInterrupts(void)
 	if (got_SIGHUP)
 	{
 		got_SIGHUP = false;
-		ProcessConfigFile(PGC_SIGHUP);
+		ProcessStartupSigHup();
 	}
 
 	/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 156d147c85..87bd7443ef 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3485,7 +3485,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"primary_conninfo", PGC_POSTMASTER, REPLICATION_STANDBY,
+		{"primary_conninfo", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets the connection string to be used to connect to the sending server."),
 			NULL,
 			GUC_SUPERUSER_ONLY
@@ -3496,7 +3496,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"primary_slot_name", PGC_POSTMASTER, REPLICATION_STANDBY,
+		{"primary_slot_name", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets the name of the replication slot to use on the sending server."),
 			NULL
 		},
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index f90a6a9139..75ec605a3f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -318,6 +318,7 @@ extern void SetWalWriterSleeping(bool sleeping);
 
 extern void XLogRequestWalReceiverReply(void);
 
+extern void ProcessStartupSigHup(void);
 extern void 

Re: Synchronize with imath upstream

2019-02-16 Thread Tom Lane
Noah Misch  writes:
> On Thu, Feb 07, 2019 at 11:56:18PM -0500, Tom Lane wrote:
>> Maybe easier to keep the instructions in a separate README file?

> Most imath.c patches have cause to update those lines, and patches to other
> files almost never do.  Hence, I think hackers are more likely to find and
> update those lines in imath.c.  I would choose a README if concerns weren't
> concentrated in one file.

Fair enough, objection withdrawn.

regards, tom lane



Re: Synchronize with imath upstream

2019-02-16 Thread Noah Misch
On Thu, Feb 07, 2019 at 11:56:18PM -0500, Tom Lane wrote:
> Noah Misch  writes:
> > On Thu, Feb 07, 2019 at 10:12:05AM -0500, Tom Lane wrote:
> >> I have no particular opinion on whether pgindent should be part of the
> >> mix for imath, but I do strongly recommend setting up and documenting a
> >> reproducible import process, as I did for src/timezone.
> 
> > Good idea.  I've modified the imath.c header comment to take the form of an
> > import process; see attached imath1.29-pgedits-v2.patch.
> 
> Maybe easier to keep the instructions in a separate README file?

Most imath.c patches have cause to update those lines, and patches to other
files almost never do.  Hence, I think hackers are more likely to find and
update those lines in imath.c.  I would choose a README if concerns weren't
concentrated in one file.

> (I don't see a need to put a PG copyright in this file, when the
> mods from upstream are this small.)

The tree has been inconsistent about that, which is okay.  I was mimicking
src/port/strlcpy.c.  Others, e.g. src/port/crypt.c, haven't added a copyright.



Re: automatically assigning catalog toast oids

2019-02-16 Thread John Naylor
On 12/15/18, Andres Freund  wrote:
>> > [ separation of FirstBootstrapObjectId and FirstGenbkiObjectId ]

I just noticed a small typo in transam.h. Patch attached.

-John Naylor
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 4a7eab00d8..444be4aeb5 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -74,7 +74,7 @@
  *		OIDs 1- are reserved for manual assignment (see .dat files in
  *		src/include/catalog/), with 9000- tentatively reserved for forks.
  *
- *		OIDs 1-12000 are reserved for assignment by genbki.pl, when the
+ *		OIDs 1-11999 are reserved for assignment by genbki.pl, when the
  *		.dat files in src/include/catalog/ do not specify oids.
  *
  *		OIDS 12000-16383 are reserved for assignment during initdb


Re: Compressed TOAST Slicing

2019-02-16 Thread Simon Riggs
On Thu, 6 Dec 2018 at 20:54, Paul Ramsey  wrote:

> On Sun, Dec 2, 2018 at 7:03 AM Rafia Sabih 
> wrote:
> >
> > The idea looks good and believing your performance evaluation it seems
> > like a practical one too.
>
> Thank you kindly for the review!
>

Sounds good.

Could we get a similarly optimized implementation of the -> operator for JSONB
as well?

Are there any other potential uses? Best to fix em all up at once and then
move on to other things. Thanks.
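
(Not from the thread, just for context: the callers that benefit are those
that detoast only a slice of a possibly-compressed datum. A minimal,
self-contained example of such a caller -- the function name and the fixed
100-byte length are made up for illustration.)

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(first_100_bytes);

/*
 * Return only the first 100 bytes of a text value.  The slice variant of the
 * detoast macros is the path that compressed-slice decompression speeds up,
 * since only the requested prefix has to be decompressed.
 */
Datum
first_100_bytes(PG_FUNCTION_ARGS)
{
	text	   *prefix = PG_GETARG_TEXT_P_SLICE(0, 0, 100);

	PG_RETURN_TEXT_P(prefix);
}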

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: allow online change primary_conninfo

2019-02-16 Thread Michael Paquier
On Sat, Feb 16, 2019 at 02:50:35PM +0300, Sergei Kornilov wrote:
> + if ((strcmp(conninfo, PrimaryConnInfo) != 0 ||
> +  strcmp(slotname, PrimarySlotName) != 0) &&
> + WalRcvRunning())
> + {
> + ereport(LOG,
> + (errmsg("terminating walreceiver process due to 
> change of %s",
> + strcmp(conninfo, 
> PrimaryConnInfo) != 0 ? "primary_conninfo" : "primary_slot_name"),
> +  errdetail("In a moment starts streaming WAL 
> with new configuration.")));
> +
> + ShutdownWalRcv();
> + lastSourceFailed = true;
> + }

If you do that, the startup process would try to jump to a different
source to fetch WAL if archiving is enabled.  Is that really what we
want?  It would be nice to have one message for primary_conninfo being
updated, and one for primary_slot_name so as if both are changed at
the same time the user gets the correct information.

"In a moment starts streaming WAL with new configuration." sounds
strange.  It would be more natural to have something like "The WAL
receiver is going to be restarted with the new configuration", just a
suggestion.
--
Michael




Re: ALTER TABLE on system catalogs

2019-02-16 Thread John Naylor
On 2/8/19, Kyotaro HORIGUCHI  wrote:
>  [v2 patch]

I poked this around a bit and found that this mechanism only works for
bootstrapped tables, as those are the only ones where we can scribble
on pg_attribute entries directly during bootstrap. As such, with this
patch we cannot perform ALTER TABLE for pg_index or pg_largeobject*
[1]. IMHO, it's not worth it to introduce new notation unless it
offers complete coverage. If we're willing to only solve the problem
for pg_class and pg_attribute, I'd rather mark the table rather than
the columns, because we already have visibility into CATALOG_VARLEN.
(rough example attached)

On 2/14/19, Peter Eisentraut  wrote:
> That already exists: 'm': Value can be stored compressed inline
>
> I agree that it seems we should be using that for those tables that
> don't have a toast table.  Maybe the genbki stuff could do it
> automatically for the appropriate catalogs.

The docs say:
(Actually, out-of-line storage will still be performed for such
columns, but only as a last resort when there is no other way to make
the row small enough to fit on a page.)

If we allow 'm' as an exception, would that interfere with this? My
demo patch has this just in case:

-if (att->attstorage != 'p')
+if (att->attstorage != 'p' &&
+!(att->attstorage == 'm' && IsCatalogRelation(rel)))
 has_toastable_attrs = true;

Here's another idea:  During initdb, do "ALTER TABLE ALTER COLUMN xyz
SET STORAGE MAIN;"
In initdb, we already pass "-O" to allow system table mods, so I think
we would have to just make sure this one statement doesn't try to add
a toast table. We could have genbki.pl emit a file with SQL statements
to cover all necessary tables/columns.


[1] 
https://www.postgresql.org/message-id/20180928190630.crt43sk5zd5p555h%40alvherre.pgsql

-- 
John Naylor    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 3bf308fe3b..885df79f92 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -124,6 +124,7 @@ sub ParseHeader
 $catalog{rowtype_oid_macro}  = '';
 			}
 			$catalog{schema_macro} = /BKI_SCHEMA_MACRO/ ? 1 : 0;
+			$catalog{no_toast} = /BKI_NO_TOAST/ ? 1 : 0;
 			$declaring_attributes = 1;
 		}
 		elsif ($is_client_code)
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 4935e00fb2..b4777e5e5a 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -658,7 +658,7 @@ sub gen_pg_attribute
 			$row{attnum}   = $attnum;
 			$row{attrelid} = $table->{relation_oid};
 
-			morph_row_for_pgattr(\%row, $schema, $attr, $priornotnull);
+			morph_row_for_pgattr($table, \%row, $schema, $attr, $priornotnull);
 			$priornotnull &= ($row{attnotnull} eq 't');
 
 			# If it's bootstrapped, put an entry in postgres.bki.
@@ -691,7 +691,7 @@ sub gen_pg_attribute
 $row{attrelid}  = $table->{relation_oid};
 $row{attstattarget} = '0';
 
-morph_row_for_pgattr(\%row, $schema, $attr, 1);
+morph_row_for_pgattr($table, \%row, $schema, $attr, 1);
 print_bki_insert(\%row, $schema);
 			}
 		}
@@ -707,7 +707,7 @@ sub gen_pg_attribute
 # Any value not handled here must be supplied by caller.
 sub morph_row_for_pgattr
 {
-	my ($row, $pgattr_schema, $attr, $priornotnull) = @_;
+	my ($table, $row, $pgattr_schema, $attr, $priornotnull) = @_;
 	my $attname = $attr->{name};
 	my $atttype = $attr->{type};
 
@@ -719,9 +719,17 @@ sub morph_row_for_pgattr
 	$row->{atttypid}   = $type->{oid};
 	$row->{attlen} = $type->{typlen};
 	$row->{attbyval}   = $type->{typbyval};
-	$row->{attstorage} = $type->{typstorage};
 	$row->{attalign}   = $type->{typalign};
 
+	if ($table->{no_toast} && $attr->{is_varlen})
+	{
+		$row->{attstorage} = 'm';
+	}
+	else
+	{
+		$row->{attstorage} = $type->{typstorage};
+	}
+
 	# set attndims if it's an array type
 	$row->{attndims} = $type->{typcategory} eq 'A' ? '1' : '0';
 
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 77be19175a..37e494b205 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -18,6 +18,7 @@
 #include "access/tuptoaster.h"
 #include "access/xact.h"
 #include "catalog/binary_upgrade.h"
+#include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
 #include "catalog/index.h"
@@ -435,7 +436,8 @@ needs_toast_table(Relation rel)
 maxlength_unknown = true;
 			else
 data_length += maxlen;
-			if (att->attstorage != 'p')
+			if (att->attstorage != 'p' &&
+!(att->attstorage == 'm' && IsCatalogRelation(rel)))
 has_toastable_attrs = true;
 		}
 	}
diff --git a/src/include/catalog/genbki.h b/src/include/catalog/genbki.h
index 1b8e4e9e19..1984bbc426 100644
--- a/src/include/catalog/genbki.h
+++ b/src/include/catalog/genbki.h
@@ -27,6 +27,7 @@
 #define BKI_SHARED_RELATION
 #define 

Re: 2019-03 CF Summary / Review - Tranche #2

2019-02-16 Thread Michael Paquier
On Fri, Feb 15, 2019 at 09:45:26PM -0800, Andres Freund wrote:
> - Making WAL receiver startup rely on GUC context for primary_conninfo
>   and primary_slot_name
> 
>   NR: I think this should be rejected. As noted in the thread, this
>   doesn't actually buy us much, and has some potential costs once we
>   make primary_conninfo PGC_SIGHUP.
> 
>   Andres: Reject

I am not surprised by your opinion here, you stated it clearly on the
thread :)  I have just marked it as rejected, as I don't have the
energy to fight for it.

> - online change primary_conninfo
> 
>   WOA: Needs a bit more work, but probably can be finished for v12.

Yep, agreed.

> - Add timeline to partial WAL segments
> 
>   WOA: Seems to need a good bit more work, and touches sensitive bits.
> 
>   Andres: +0.5 for punting to v13

The problem is not that easy, particularly to make things consistent
between the backend and pg_receivewal.

> - Implement NULL-related checks in object address functions to prevent cache 
> lookup errors, take two
> 
>   NR: Seems like this should go into 12, although it'd be good if Alvaro
>   could take another peek before Michael pushes.

I would prefer if Alvaro has an extra look at what I am proposing as
most of the stuff in objectaddress.c is originally his.

> - Temporary materialized views
> 
>   NR: Some infrastructure work is needed before this can go in. Not sure
>   if that can be finished for v12?

I would suggest to move that to v13.  Relation creation done by CTAS
needs to be refactored first, and my take is that we could have this
refactoring part for v12, moving the core portion of the proposal to
the next dev cycle.
--
Michael




Re: Channel binding

2019-02-16 Thread Michael Paquier
On Fri, Feb 15, 2019 at 10:21:12PM -0500, Bruce Momjian wrote:
> Well, my point was that this feature was considered to be very
> important for PG 11, but for some reason there has been no advancement
> of this feature for PG 12.

Yeah, it is unfortunate that we have not seen patches about that or
concrete proposals for a proper libpq patch.
--
Michael




Re: [Patch][WiP] Tweaked LRU for shared buffers

2019-02-16 Thread Tomas Vondra
On 2/16/19 10:36 AM, Vladimir Sitnikov wrote:
> Benjamin> A related and helpful patch would be to capture the access log and
> Benjamin> provide anonymized traces.
> 
> The traces can be captured via DTrace scripts, so no patch is required here.
> 

Right. Or a BPF on reasonably new linux kernels.

> For instance:
> https://www.postgresql.org/message-id/CAB%3DJe-F_BhGfBu1sO1H7u_XMtvak%3DBQtuJFyv8cfjGBRp7Q_yA%40mail.gmail.com
> or
> https://www.postgresql.org/message-id/CAH2-WzmbUWKvCqjDycpCOSF%3D%3DPEswVf6WtVutgm9efohH0NfHA%40mail.gmail.com
> 
> The missing bit is a database with more-or-less relevant workload.
> 

I think it'd be sufficient (or at least a reasonable first step) to get
traces from workloads regularly used for benchmarking (different flavors
of pgbench workload, YCSB, TPC-H/TPC-DS and perhaps something else).

A good algorithm has to perform well in those anyway, and applications
generally can be modeled as a mix of those simple workloads.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online verification of checksums

2019-02-16 Thread Michael Banck
Hi,

Am Mittwoch, den 06.02.2019, 11:39 -0500 schrieb Stephen Frost:
> * Michael Banck (michael.ba...@credativ.de) wrote:
> > Am Dienstag, den 05.02.2019, 11:30 +0100 schrieb Tomas Vondra:
> > > On 2/5/19 8:01 AM, Andres Freund wrote:
> > > > On 2019-02-05 06:57:06 +0100, Fabien COELHO wrote:
> > > > > > > > I'm wondering (possibly again) about the existing early exit if 
> > > > > > > > one block
> > > > > > > > cannot be read on retry: the command should count this as a 
> > > > > > > > kind of bad
> > > > > > > > block, proceed on checking other files, and obviously fail in 
> > > > > > > > the end, but
> > > > > > > > having checked everything else and generated a report. I do not 
> > > > > > > > think that
> > > > > > > > this condition warrants a full stop. ISTM that under rare race 
> > > > > > > > conditions
> > > > > > > > (eg, an unlucky concurrent "drop database" or "drop table") 
> > > > > > > > this could
> > > > > > > > happen when online, although I could not trigger one despite 
> > > > > > > > heavy testing,
> > > > > > > > so I'm possibly mistaken.
> > > > > > > 
> > > > > > > This seems like a defensible judgement call either way.
> > > > > > 
> > > > > > Right now we have a few tests that explicitly check that
> > > > > > pg_verify_checksums fail on broken data ("foo" in the file).  Those
> > > > > > would then just get skipped AFAICT, which I think is the worse 
> > > > > > behaviour
> > > > > > , but if everybody thinks that should be the way to go, we can
> > > > > > drop/adjust those tests and make pg_verify_checksums skip them.
> > > > > > 
> > > > > > Thoughts?
> > > > > 
> > > > > My point is that it should fail as it does, only not immediately 
> > > > > (early
> > > > > exit), but after having checked everything else. This mean avoiding 
> > > > > calling
> > > > > "exit(1)" here and there (lseek, fopen...), but taking note that 
> > > > > something
> > > > > bad happened, and call exit only in the end.
> > > > 
> > > > I can see both as being valuable (one gives you a more complete picture,
> > > > the other a quicker answer in scripts). For me that's the point where
> > > > it's the prerogative of the author to make that choice.
> 
> ... unless people here object or prefer other options, and then it's up
> to discussion and hopefully some consensus comes out of it.
> 
> Also, I have to say that I really don't think the 'quicker answer'
> argument holds any weight, making me question if that's a valid
> use-case.  If there *isn't* an issue, which we would likely all agree is
> the case the vast majority of the time that this is going to be run,
> then it's going to take quite a while and anyone calling it should
> expect and be prepared for that.  In the extremely rare cases, what does
> exiting early actually do for us?
> 
> > Personally, I would prefer to keep it as simple as possible for now and
> > get this patch committed; in my opinion the behaviour is already like
> > this (early exit on corrupt files) so I don't think the online
> > verification patch should change this.
> 
> I'm also in the camp of "would rather it not exit immediately, so the
> extent of the issue is clear".
> 
> > If we see complaints about this, then I'd be happy to change it
> > afterwards.
> 
> I really don't think this is something we should change later on in a
> future release..  If the consensus is that there's really two different
> but valid use-cases then we should make it configurable, but I'm not
> convinced there is.

OK, fair enough.
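
To make the suggestion concrete, a minimal sketch of that deferred-exit
pattern could look like the following (illustrative C only, not the actual
pg_verify_checksums source; the function and variable names are made up):

#include <stdio.h>
#include <stdlib.h>

/* Sketch only: count problems instead of exiting on the first one. */
static int	failed_blocks = 0;

static void
report_bad_block(const char *path, int blockno, const char *why)
{
	fprintf(stderr, "%s, block %d: %s\n", path, blockno, why);
	failed_blocks++;			/* take note, but keep scanning */
}

int
main(void)
{
	/*
	 * ... scan every relation file, calling report_bad_block() on short
	 * reads, unreadable blocks and checksum mismatches ...
	 */

	if (failed_blocks > 0)
		exit(1);				/* fail only after the full report */
	return 0;
}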

> > > Why not make this configurable, using a command-line option?
> > 
> > I like this even less - this tool is about verifying checksums, so
> > adding options on what to do when it encounters broken pages looks out-
> > of-scope to me.  Unless we want to say it should generally abort on the
> > first issue (i.e. on wrong checksums as well).
> 
> I definitely disagree that it's somehow 'out of scope' for this tool to
> skip broken pages, when we can tell that they're broken.  

I didn't mean that it's out-of-scope for pg_verify_checksums, I meant it
is out-of-scope for this patch, which adds online checking.

> There is a question here about how to handle a short read since that
> can happen under normal conditions if we're unlucky.  The same is also
> true for files disappearing entirely.
> 
> So, let's talk/think through a few cases:
> 
> A file with just 'foo\n' in it- could that be a page starting with
> an LSN around 666F6F0A that we somehow only read the first few bytes of?
> If not, why not?  I could possibly see an argument that we expect to
> always get at least 512 bytes in a read, or 4K, but it seems like we
> could possibly run into edge cases on odd filesystems or such.  In the
> end, I'm leaning towards categorizing different things, well,
> differently- a short read would be reported as a NOTICE or equivalent,
> perhaps, meaning that the test case needs to do something more than just
> have a file with 'foo' in it, but that is likely a 

Re: allow online change primary_conninfo

2019-02-16 Thread Sergei Kornilov
Hi

> I don't quite think this is the right design. IMO the startup process
> should signal the walreceiver to shut down, and the wal receiver should
> never look at the config.

Ok, I can rewrite it that way. Please check the attached version.

> Otherwise there's chances for knowledge of
> pg.conf to differ between the processes.

I still do not understand why this:
- is not an issue for the startup process, which holds all the primary_conninfo logic
- but is an issue for the walreceiver with Michael Paquier's patch, which has no
primary_conninfo change logic in the startup process at all.
In both cases the primary_conninfo change logic lives in only one process.

regards, Sergei

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8bd57f376b..d7b75e462a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3838,9 +3838,14 @@ ANY num_sync ( 
@@ -3855,9 +3860,14 @@ ANY num_sync ( ).
-  This parameter can only be set at server start.
+  This parameter can only be set in the postgresql.conf
+  file or on the server command line.
   This setting has no effect if primary_conninfo is not
-  set.
+  set or the server is not in standby mode.
+ 
+ 
+  WAL receiver will be restarted after primary_slot_name
+  was changed.
  
 

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ecd12fc53a..6a64a27a51 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12137,6 +12137,36 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	return false;/* not reached */
 }
 
+void
+ProcessStartupSigHup(void)
+{
+	char	   *conninfo = pstrdup(PrimaryConnInfo);
+	char	   *slotname = pstrdup(PrimarySlotName);
+
+	ProcessConfigFile(PGC_SIGHUP);
+
+	/*
+	 * We need to shut down the running walreceiver if the replication
+	 * settings were changed. The walreceiver will be started with the new
+	 * settings in the next WaitForWALToBecomeAvailable iteration.
+	 */
+	if ((strcmp(conninfo, PrimaryConnInfo) != 0 ||
+		 strcmp(slotname, PrimarySlotName) != 0) &&
+		WalRcvRunning())
+	{
+		ereport(LOG,
+(errmsg("terminating walreceiver process due to change of %s",
+		strcmp(conninfo, PrimaryConnInfo) != 0 ? "primary_conninfo" : "primary_slot_name"),
+ errdetail("In a moment starts streaming WAL with new configuration.")));
+
+		ShutdownWalRcv();
+		lastSourceFailed = true;
+	}
+
+	pfree(conninfo);
+	pfree(slotname);
+}
+
 /*
  * Determine what log level should be used to report a corrupt WAL record
  * in the current WAL page, previously read by XLogPageRead().
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 5048a2c2aa..9bf5c792fe 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -147,7 +147,7 @@ HandleStartupProcInterrupts(void)
 	if (got_SIGHUP)
 	{
 		got_SIGHUP = false;
-		ProcessConfigFile(PGC_SIGHUP);
+		ProcessStartupSigHup();
 	}
 
 	/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 156d147c85..87bd7443ef 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3485,7 +3485,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"primary_conninfo", PGC_POSTMASTER, REPLICATION_STANDBY,
+		{"primary_conninfo", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets the connection string to be used to connect to the sending server."),
 			NULL,
 			GUC_SUPERUSER_ONLY
@@ -3496,7 +3496,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"primary_slot_name", PGC_POSTMASTER, REPLICATION_STANDBY,
+		{"primary_slot_name", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets the name of the replication slot to use on the sending server."),
 			NULL
 		},
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index f90a6a9139..75ec605a3f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -318,6 +318,7 @@ extern void SetWalWriterSleeping(bool sleeping);
 
 extern void XLogRequestWalReceiverReply(void);
 
+extern void ProcessStartupSigHup(void);
 extern void assign_max_wal_size(int newval, void *extra);
 extern void assign_checkpoint_completion_target(double newval, void *extra);
 
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index beb45551a2..07ac9642ba 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 26;
+use Test::More tests => 27;
 
 # Initialize master node
 my $node_master = get_new_node('master');
@@ -146,7 +146,9 @@ $node_standby_2->append_conf('postgresql.conf',
 	"primary_slot_name = $slotname_2");
 $node_standby_2->append_conf('postgresql.conf',
 	"wal_receiver_status_interval = 1");
-$node_standby_2->restart;
+# should be able to change primary_slot_name without restart
+# will wait 

Re: libpq host/hostaddr/conninfo inconsistencies

2019-02-16 Thread Ramanarayana
Hi,

I am getting an error while applying the patch. I think the patch needs to be
rebased on the latest code in master, as there are some commits in master
after the patch was created.

On Sat, 16 Feb 2019 at 13:44, Fabien COELHO  wrote:

>
> Hello Kyotaro-san,
>
> > On such standpoint, the first hunk in the patch attracted my
> > eyes.
> >
> >   host
> >   
> >
> > -Name of host to connect to.host
> name
> > -If a host name begins with a slash, it specifies Unix-domain
> > -communication rather than TCP/IP communication; the value is the
> > -name of the directory in which the socket file is stored.
> > +   
> >
> > I don't think this is user-friendly since almost all of them don't write
> > multiple hosts there. So I prefer the previous organization.
>
> ISTM that specifying the expected syntax is the first information needed?
>
> The previous organization says "this is a host name (bla bla bla) btw I
> lied at the beginning this is a list".
>
> > The description about the IP address looks too verbose; especially, we don't
> > need to explain what an IP address is here.
>
> Ok.
>
> I agree that the order is not the best possible one. Here is a simplified
> and reordered version:
>
> """ Comma-separated list of hosts to connect to. Each item may be a host
> name that will be resolved with a look-up, a numeric IP address that will
> be used directly, or the name of a directory which contains the socket
> file for Unix-domain communication, if the specification begins with a
> slash. Each specified target will be tried in turn in the order given. See
>  for details. """
>
> What do you think about that version?
>
> --
> Fabien.
>
>

-- 
Cheers
Ram 4.0


Re: DNS SRV support for LDAP authentication

2019-02-16 Thread Thomas Munro
On Sat, Feb 2, 2019 at 12:48 AM Daniel Gustafsson  wrote:
> +   new_uris = psprintf("%s%s%s://%s:%d",
>
> While this construction isn't introduced in this patch, would it not make 
> sense
> to convert uris to StringInfo instead to improve readability?

Yeah.  This coding is ugly and StringInfo would be much nicer.
Thinking about that made me realise that the proposed SRV case should
also handle multiple SRV records by building a multi-URL string too
(instead of just taking the first one).  I will make it so.
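
For what it's worth, a rough sketch of what that could look like with a
StringInfo (the function name, argument list and the space separator are
assumptions for illustration, not the actual patch):

#include "postgres.h"
#include "lib/stringinfo.h"

/*
 * Sketch: build a space-separated list of LDAP URLs from hosts/ports that
 * were (hypothetically) resolved from SRV records, instead of chaining
 * psprintf() calls.
 */
static char *
build_ldap_uris(const char *scheme, const char **hosts, const int *ports, int n)
{
	StringInfoData uris;
	int			i;

	initStringInfo(&uris);
	for (i = 0; i < n; i++)
	{
		if (i > 0)
			appendStringInfoChar(&uris, ' ');
		appendStringInfo(&uris, "%s://%s:%d", scheme, hosts[i], ports[i]);
	}
	return uris.data;			/* palloc'd in the current memory context */
}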

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [Patch][WiP] Tweaked LRU for shared buffers

2019-02-16 Thread Vladimir Sitnikov
Benjamin> A related and helpful patch would be to capture the access log and
Benjamin> provide anonymized traces.

The traces can be captured via DTrace scripts, so no patch is required here.

For instance:
https://www.postgresql.org/message-id/CAB%3DJe-F_BhGfBu1sO1H7u_XMtvak%3DBQtuJFyv8cfjGBRp7Q_yA%40mail.gmail.com
or
https://www.postgresql.org/message-id/CAH2-WzmbUWKvCqjDycpCOSF%3D%3DPEswVf6WtVutgm9efohH0NfHA%40mail.gmail.com

The missing bit is a database with more-or-less relevant workload.

Vladimir


Re: [Patch][WiP] Tweaked LRU for shared buffers

2019-02-16 Thread Benjamin Manes
Hi,

I was not involved with Andrey and his team's work, which looks like a very
promising first pass. I can try to clarify a few minor details.

What is CAR? Did you mean ARC, perhaps?


CAR is the Clock variant of ARC: CAR: Clock with Adaptive Replacement


I believe the main interest in ARC is its claim of adaptability with high
hit rates. Unfortunately the reality is less impressive as it fails to
handle many frequency workloads, e.g. poor scan resistance, and the
adaptivity is poor due to the accesses being quite noisy. For W-TinyLFU, we
have recent improvements which show near perfect adaptivity in
our stress case, resulting in double the hit rate of ARC and less than
1% from optimal.

Can you point us to the thread/email discussing those ideas? I have tried
> searching through archives, but I haven't found anything :-(


This thread doesn't explain Andrey's work, but includes my minor contribution. The
longer thread discusses the interest in CAR, et al.

Are you suggesting to get rid of the buffer rings we use for sequential
> scans, for example? IMHO that's going to be tricky, e.g. because of
> synchronized sequential scans.


If you mean "synchronized" in terms of multi-threading and locks, then this
is a solved problem in terms of caching. My limited understanding is that the buffer rings are
used to protect the cache from being polluted by scans which flush the
LRU-like algorithms. This allows those policies to capture more frequent
items. It also avoids lock contention on the cache due to writes caused by
misses, where Clock allows lock-free reads but uses a global lock on
writes. A smarter cache eviction policy and concurrency model can handle
this without needing buffer rings to compensate.

Somebody should write a patch to make buffer eviction completely random,
> without aiming to get it committed. That isn't as bad of a strategy as it
> sounds, and it would help with assessing improvements in this area.
>

A related and helpful patch would be to capture the access log and provide
anonymized traces. We have a simulator with dozens of
policies to quickly provide a breakdown. That would let you know the hit
rates before deciding on the policy to adopt.

Cheers.
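
As an aside, the "completely random eviction" prototype quoted above could be
sketched roughly like this against PostgreSQL's buffer manager (names taken
from buf_internals.h; it ignores usage counts, strategy rings and dirty-buffer
handling, so it is a thought experiment rather than a patch):

#include "postgres.h"
#include "storage/buf_internals.h"
#include "storage/bufmgr.h"

/*
 * Sketch: pick a victim buffer uniformly at random instead of running the
 * clock sweep.  The buffer is returned with its header spinlock held, as
 * with the clock sweep; the caller would still write it out if dirty.
 */
static BufferDesc *
GetVictimBufferRandom(void)
{
	for (;;)
	{
		int			victim = random() % NBuffers;
		BufferDesc *buf = GetBufferDescriptor(victim);
		uint32		state = LockBufHdr(buf);

		if (BUF_STATE_GET_REFCOUNT(state) == 0)
			return buf;			/* unpinned buffer found */
		UnlockBufHdr(buf, state);
	}
}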

On Fri, Feb 15, 2019 at 4:22 PM Tomas Vondra 
wrote:

>
> On 2/16/19 12:51 AM, Peter Geoghegan wrote:
> > On Fri, Feb 15, 2019 at 3:30 PM Tomas Vondra
> >  wrote:
> >> That TPS chart looks a bit ... wild. How come the master jumps so much
> >> up and down? That's a bit suspicious, IMHO.
> >
> > Somebody should write a patch to make buffer eviction completely
> > random, without aiming to get it committed. That isn't as bad of a
> > strategy as it sounds, and it would help with assessing improvements
> > in this area.
> >
> > We know that the cache replacement algorithm behaves randomly when
> > there is extreme contention, while also slowing everything down due
> > to maintaining the clock.
>
> Possibly, although I still find it strange that the throughput first
> grows, then at shared_buffers 1GB it drops, and then at 3GB it starts
> growing again. Considering this is on 200GB data set, I doubt the
> pressure/contention is much different with 1GB and 3GB, but maybe it is.
>
> > A unambiguously better caching algorithm would at a minimum be able
> > to beat our "cheap random replacement" prototype as well as the
> > existing clocksweep algorithm in most or all cases. That seems like a
> > reasonably good starting point, at least.
> >
>
> Yes, comparison to cheap random replacement would be an interesting
> experiment.
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [Patch][WiP] Tweaked LRU for shared buffers

2019-02-16 Thread Benjamin Manes
>
> No, I "synchronized scans" are an optimization to reduce I/O when multiple
> processes do sequential scan on the same table.


Oh, very neat. Thanks!

Interesting. I assume the trace is essentially a log of which blocks were
> requested? Is there some trace format specification somewhere?
>

Yes, whatever constitutes a cache key (block address, item hash, etc). We
write parsers for each trace so there isn't a format requirement. The
parsers produce a 64-bit int stream of keys, which are broadcast to each
policy via an actor framework. The trace files can be text or binary,
optionally compressed (xz preferred). The ARC traces are block I/O and this
is their format description, so perhaps something similar? That parser is
only 5 lines of custom code.
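
For readers unfamiliar with such traces, a tiny standalone parser in that
spirit might look like this (the field layout, a start block followed by a
block count, is an assumption based on the ARC trace description; the program
just emits one 64-bit key per accessed block):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int
main(void)
{
	unsigned long start;
	unsigned long count;
	char		rest[256];

	/* each input line: <start block> <number of blocks> [other fields] */
	while (scanf("%lu %lu%255[^\n]", &start, &count, rest) >= 2)
	{
		for (unsigned long i = 0; i < count; i++)
			printf("%" PRIu64 "\n", (uint64_t) (start + i));
	}
	return 0;
}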

On Fri, Feb 15, 2019 at 5:51 PM Tomas Vondra 
wrote:

> On 2/16/19 1:48 AM, Benjamin Manes wrote:
> > Hi,
> >
> > I was not involved with Andrey and his team's work, which looks like a
> > very promising first pass. I can try to clarify a few minor details.
> >
> > What is CAR? Did you mean ARC, perhaps?
> >
> >
> > CAR is the Clock variants of ARC: CAR: Clock with Adaptive Replacement
> > <
> https://www.usenix.org/legacy/publications/library/proceedings/fast04/tech/full_papers/bansal/bansal.pdf
> >
> >
>
> Thanks, will check.
>
> > I believe the main interest in ARC is its claim of adaptability with
> > high hit rates. Unfortunately the reality is less impressive as it fails
> > to handle many frequency workloads, e.g. poor scan resistance, and the
> > adaptivity is poor due to the accesses being quite noisy. For W-TinyLFU,
> > we have recent improvements which show near perfect adaptivity
> > <
> https://user-images.githubusercontent.com/378614/52891655-2979e380-3141-11e9-91b3-2a3cc80b.png
> > in
> > our stress case that results in double the hit rate of ARC and is less
> > than 1% from optimal.
> >
>
> Interesting.
>
> > Can you point us to the thread/email discussing those ideas? I have
> > tried searching through archives, but I haven't found anything :-(
> >
> >
> > This thread
> > <
> https://www.postgresql.org/message-id/1526057854777-0.post%40n3.nabble.com
> >
> > doesn't explain Andrey's work, but includes my minor contribution. The
> > longer thread discusses the interest in CAR, et al.
> >
>
> Thanks.
>
> > Are you suggesting to get rid of the buffer rings we use for
> > sequential scans, for example? IMHO that's going to be tricky, e.g.
> > because of synchronized sequential scans.
> >
> >
> > If you mean "synchronized" in terms of multi-threading and locks, then
> > this is a solved problem
> > 
> in
> > terms of caching.
>
> No, I "synchronized scans" are an optimization to reduce I/O when
> multiple processes do sequential scan on the same table. Say one process
> is half-way through the table, when another process starts another scan.
> Instead of scanning it from block #0 we start at the position of the
> first process (at which point they "synchronize") and then wrap around
> to read the beginning.
>
> I was under the impression that this somehow depends on the small
> circular buffers, but I've now checked the code and I see that's bogus.
>
>
> > My limited understanding is that the buffer rings are
> > used to protect the cache from being polluted by scans which flush the
> > LRU-like algorithms. This allows those policies to capture more frequent
> > items. It also avoids lock contention on the cache due to writes caused
> > by misses, where Clock allows lock-free reads but uses a global lock on
> > writes. A smarter cache eviction policy and concurrency model can handle
> > this without needing buffer rings to compensate.
> >
>
> Right - that's the purpose of circular buffers.
>
> > Somebody should write a patch to make buffer eviction completely
> > random, without aiming to get it committed. That isn't as bad of a
> > strategy as it sounds, and it would help with assessing improvements
> > in this area.
> >
> >
> > A related and helpful patch would be to capture the access log and
> > provide anonymized traces. We have a simulator
> >  with dozens of
> > policies to quickly provide a breakdown. That would let you know the hit
> > rates before deciding on the policy to adopt.
> >
>
> Interesting. I assume the trace is essentially a log of which blocks
> were requested? Is there some trace format specification somewhere?
>
> cheers
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: libpq host/hostaddr/conninfo inconsistencies

2019-02-16 Thread Fabien COELHO



Hello Kyotaro-san,


On such standpoint, the first hunk in the patch attracted my
eyes.

  host
  
   
-Name of host to connect to.host 
name
-If a host name begins with a slash, it specifies Unix-domain
-communication rather than TCP/IP communication; the value is the
-name of the directory in which the socket file is stored.
+   

I don't think this is user-friendly since almost all of them don't write 
multiple hosts there. So I prefer the previous organization.


ISTM that specifying the expected syntax is the first information needed?

The previous organization says "this is a host name (bla bla bla) btw I 
lied at the beginning this is a list".


The description about the IP address looks too verbose; especially, we don't
need to explain what an IP address is here.


Ok.

I agree that the order is not the best possible one. Here is a simplified 
and reordered version:


""" Comma-separated list of hosts to connect to. Each item may be a host 
name that will be resolved with a look-up, a numeric IP address that will 
be used directly, or the name of a directory which contains the socket 
file for Unix-domain communication, if the specification begins with a 
slash. Each specified target will be tried in turn in the order given. See 
 for details. """


What do you think about that version?
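
To illustrate what the proposed wording describes, here is a small libpq
example mixing the three kinds of items in one host list (the host names,
address and socket directory are made up for illustration):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/*
	 * A name to resolve, a numeric address, and a Unix-socket directory;
	 * each target is tried in turn, in the order given.
	 */
	PGconn	   *conn = PQconnectdb(
		"host=db1.example.com,192.0.2.7,/var/run/postgresql "
		"port=5432,5432,5432 dbname=postgres connect_timeout=5");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
	PQfinish(conn);
	return 0;
}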

--
Fabien.