Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-17 Thread jian he
On Mon, Dec 18, 2023 at 1:09 PM torikoshia  wrote:
>
> Hi,
>
> > save the error metadata to  system catalogs would be more expensive,
> > please see below explanation.
> > I have no knowledge of publications.
> > but i feel there is a feature request: publication FOR ALL TABLES
> > exclude regex_pattern.
> > Anyway, that would be another topic.
>
> I think saving error metadata to a system catalog is not a good idea, either.
> And I believe Sawada-san just pointed out missing features and did not
> suggest that we use a system catalog.
>
> > I don't think "specify the maximum number  of errors to tolerate
> > before raising an ERROR." is very useful
>
> That may be so.
> I imagine it's useful in some use cases since some loading tools have
> such options.
> Anyway, I agree it's not necessary for the initial patch, as mentioned in [1].
>
> > I suppose we can specify an ERRORFILE directory. similar
> > implementation [2], demo in [3]
> > it will generate 2 files, one file shows the malform line content as
> > is, another file shows the error info.
>
> That may be a good option when considering "(2) logging errors to
> somewhere".
>
> What do you think about the proposal to develop these features
> incrementally?
>

I am more in favor of Tom's idea [1]: when errors happen (data type
conversion only), do not fail, AND save the error to a table. I guess
we can implement both parts together, with only a new COPY option.

Imagine a case (it's not that contrived, imho) where, while converting
text to the table's int column, postgres's isspace logic differs from the
isspace logic of whatever produced the source text file. Then all the
lines are malformed. If we just say "on error continue" and do not save
the error metadata, the user is still left wondering which field has the
wrong data, and will probably end up testing field by field to find the
malformed one.
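
A minimal sketch of that kind of failure (table and data are made up for
illustration):

create table t (a int);
copy t from stdin;
1
2abc
\.
-- fails with something like:
-- ERROR:  invalid input syntax for type integer: "2abc"
-- CONTEXT:  COPY t, line 2, column a: "2abc"
-- and says nothing about any other malformed lines in the file.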

So we need to save the error somewhere.
Everyone who has the INSERT privilege can do COPY, so I think we also
need to handle access privileges.
Like I mentioned above, with one central copy_error table acting as a hub,
everyone could view/select their own copy failure records.
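
A purely hypothetical sketch of what that could look like (the copy_error
table and its columns do not exist in any posted patch; the names are
illustrative only):

-- hypothetical central error table, for illustration only
select relname, lineno, colname, raw_field, errmsg
from copy_error
where usename = current_user;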

But if we save to a server-side text file/directory, it is not easy for a
user with only INSERT privilege to see these files, I think.
Similarly, it is not easy for a user with limited privileges to see these
failed records in the server log.

If someone wants to fail at maxerror rows, they can still do it, since we
will count how many rows failed, even though I don't really see the use
for that option myself.

[1] https://www.postgresql.org/message-id/900123.1699488001%40sss.pgh.pa.us




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-17 Thread Hayato Kuroda (Fujitsu)
Dear Thomas, Alexander,

> Thanks for reporting. Yes, it has already been reported by me [1], and the
> server log was provided by Andrew [2]. The issue was that a file creation
> failed because the same file had been unlink()'d just before but was still
> in STATUS_DELETE_PENDING status. Alexander kindly proposed a fix [3] and it
> looks good to me, but confirmation by senior and Windows-friendly developers
> is needed to move forward.
> (At first we thought the issue was solved by updating, but that was not
> correct.)
> 
> I know that you have worked on that area of the code, so I'm very happy if
> you can check the forked thread.

I forgot to mention an important point. The issue was not introduced by this feature;
it merely exposed a pre-existing possible failure, and only on Windows.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: "pgoutput" options missing on documentation

2023-12-17 Thread Emre Hasegeli
> This change (Required in between two sentences) looks slightly odd to
> me. Can we instead extend the second line to something like: "This
> parameter is required, and the individual publication names are ...".
> Similarly, we can adjust the proto_version explanation.

I don't think it's an improvement to join 2 independent sentences with
a comma.  I expanded these by mentioning what is required.

> This sounds like we are supporting more than one logical decoding
> plugin. Can we slightly rephrase it to something like:
> "PostgreSQL supports extensible logical decoding plugin
> pgoutput which is used for the built-in logical
> replication as well."

I understand the confusion.  I reworded it and dropped "extensible".

The new versions are attached.


v03-0002-doc-Clarify-pgoutput-options.patch
Description: Binary data


v03-0001-pgoutput-Test-if-proto_version-is-given.patch
Description: Binary data


Re: "pgoutput" options missing on documentation

2023-12-17 Thread Emre Hasegeli
> I found the existing error code appropriate because for syntax
> specification, either we need to mandate this at the grammar level or
> at the API level. Also, I think we should give a message similar to an
> existing message: "publication_names parameter missing". For example,
> we can say, "proto_version parameter missing". BTW, I also don't like
> the other changes parse_output_parameters() done in 0001, if we want
> to improve all the similar messages there are other places in the code
> as well, so we can separately make the case for the same.

Okay, I am changing these back.  I think we should keep the word
"option".  It is used on other error messages.




Re: Report planning memory in EXPLAIN ANALYZE

2023-12-17 Thread Ashutosh Bapat
Forgot to attach patch. Here it is


Re: Report planning memory in EXPLAIN ANALYZE

2023-12-17 Thread Ashutosh Bapat
On Sun, Dec 17, 2023 at 10:31 PM Alvaro Herrera  wrote:
>
> OK, I propose the following further minor tweaks.  (I modified the docs
> following the wording we have for COSTS and BUFFERS).

LGTM. Included in the attached patch.

>
> There are two things that still trouble me a bit.  First, we assume that
> the planner is using an AllocSet context, which I guess is true, but if
> somebody runs the planner in a context of a different memcxt type, it's
> going to be a problem.  So far we don't have infrastructure for creating
> a context of the same type as another context.  Maybe it's too fine a
> point to worry about, for sure.

I had considered this point. Different contexts take different
arguments for creation, so some jugglery is required to create a
context based on type. It looked more than necessary for the limited
scope of this patch. That's why I settled on the assertion. If we see
the need in future we can always add that support.

>
> The other question is about trying to support the EXPLAIN EXECUTE case.
> Do you find that case really useful?  In a majority of cases planning is
> not going to happen because it was already done by PREPARE (where we
> _don't_ report memory, because we don't have EXPLAIN there), so it seems
> a bit weird.  I suppose you could make it useful if you instructed the
> user to set plan_cache_mode to custom, assuming that does actually work
> (I didn't try).

If we set plan_cache_mode to force_custom_plan, we always plan the
statement and thus report memory.
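
For instance, a minimal sketch (EXPLAIN's MEMORY option is the one added by
this patch; everything else is stock):

set plan_cache_mode = force_custom_plan;
prepare stmt2 as select * from pg_class where oid = $1;
explain (memory) execute stmt2(1);  -- replanned every time, so memory is always reported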

You are right that we don't always plan the statement when EXECUTE is
issued. But it seems we create a plan underneath EXECUTE more often than
I expected. And the report looks mildly useful and interesting.

postgres@21258=#prepare stmt as select * from pg_class where oid = $1;
PREPARE
postgres@21258=#explain (memory) execute stmt(1); -- first time
 QUERY PLAN
-
 Index Scan using pg_class_oid_index on pg_class  (cost=0.27..8.29
rows=1 width=273)
   Index Cond: (oid = '1'::oid)
 Planner Memory: used=40448 bytes allocated=81920 bytes
(3 rows)


postgres@21258=#explain (memory) execute stmt(1);
 QUERY PLAN
-
 Index Scan using pg_class_oid_index on pg_class  (cost=0.27..8.29
rows=1 width=273)
   Index Cond: (oid = '1'::oid)
 Planner Memory: used=40368 bytes allocated=81920 bytes
(3 rows)

Observe that the memory used is slightly different from the first
time. So when the plan is created again, something happens that eats a
few bytes less. I didn't investigate what.

The same output repeats if the statement is executed 3 more times.
That's as many times a custom plan is created for a statement by
default.

postgres@21258=#explain (memory) execute stmt(1);
 QUERY PLAN
-
 Index Scan using pg_class_oid_index on pg_class  (cost=0.27..8.29
rows=1 width=273)
   Index Cond: (oid = $1)
 Planner Memory: used=40272 bytes allocated=81920 bytes
(3 rows)

Observe that the memory used is less here again. So when creating the
generic plan, something happens that changes the memory consumption.
I didn't investigate.


postgres@21258=#explain (memory) execute stmt(1);
 QUERY PLAN
-
 Index Scan using pg_class_oid_index on pg_class  (cost=0.27..8.29
rows=1 width=273)
   Index Cond: (oid = $1)
 Planner Memory: used=3520 bytes allocated=24576 bytes
(3 rows)

And now the planner has settled on a very low value, but still non-zero
(3520 rather than 240 bytes). I think the parameter evaluation takes that
much memory. I haven't verified that.

If we use a non-parameterized statement:
postgres@21258=#prepare stmt as select * from pg_class where oid = 2345;
PREPARE
postgres@21258=#explain (memory) execute stmt;
 QUERY PLAN
-
 Index Scan using pg_class_oid_index on pg_class  (cost=0.27..8.29
rows=1 width=273)
   Index Cond: (oid = '2345'::oid)
 Planner Memory: used=37200 bytes allocated=65536 bytes
(3 rows)

The first time, memory is consumed by the planner.

postgres@21258=#explain (memory) execute stmt;
 QUERY PLAN
-
 Index Scan using pg_class_oid_index on pg_class  (cost=0.27..8.29
rows=1 width=273)
   Index Cond: (oid = '2345'::oid)
 Planner Memory: used=240 bytes allocated=8192 bytes
(3 rows)

Next time onwards it has settled on the custom plan.

I think there's something to learn and investigate from the memory numbers
here, so they are not completely meaningless or useless.


Re: [PATCH]: Not to invalidate CatalogSnapshot for local invalidation messages

2023-12-17 Thread Xiaoran Wang
Hi,
Thanks for your reply.

jian he wrote on Mon, Dec 18, 2023 at 08:20:

> Hi
> ---setup.
> drop table s2;
> create table s2(a int);
>
> After applying the patch
> alter table s2 add primary key (a);
>
> watch CatalogSnapshot
> 
> #0  GetNonHistoricCatalogSnapshot (relid=1259)
> at
> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:412
> #1  0x55ba78f0d6ba in GetCatalogSnapshot (relid=1259)
> at
> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:371
> #2  0x55ba785ffbe1 in systable_beginscan
> (heapRelation=0x7f256f30b5a8, indexId=2662, indexOK=false,
> snapshot=0x0, nkeys=1, key=0x7ffe230f0180)
> at
> ../../Desktop/pg_src/src7/postgresql/src/backend/access/index/genam.c:413
> (More stack frames follow...)
>
> -
> Hardware watchpoint 13: CatalogSnapshot
>
> Old value = (Snapshot) 0x55ba7980b6a0 
> New value = (Snapshot) 0x0
> InvalidateCatalogSnapshot () at
> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:435
> 435 SnapshotResetXmin();
> (gdb) bt 4
> #0  InvalidateCatalogSnapshot ()
> at
> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:435
> #1  0x55ba78f0ee85 in AtEOXact_Snapshot (isCommit=true,
> resetXmin=false)
> at
> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:1057
> #2  0x55ba7868201b in CommitTransaction ()
> at
> ../../Desktop/pg_src/src7/postgresql/src/backend/access/transam/xact.c:2373
> #3  0x55ba78683495 in CommitTransactionCommand ()
> at
> ../../Desktop/pg_src/src7/postgresql/src/backend/access/transam/xact.c:3061
> (More stack frames follow...)
>
> --
> But the whole process changes pg_class, pg_index, pg_attribute,
> pg_constraint, etc.
> Having only one GetCatalogSnapshot and one InvalidateCatalogSnapshot seems
> not correct?
> What if there are concurrent changes in the related pg_catalog tables?
>
> your patch did pass the isolation test!
>

Yes, I have run installcheck-world locally, and all the tests passed.

There are two kinds of Invalidation Messages.

One kind comes from the local backend, such as what you did in your example
"alter table s2 add primary key (a);". That command modifies pg_class,
pg_attribute, etc., so it generates Invalidation Messages to invalidate the
"s2"-related tuples in pg_class, pg_attribute, etc., and an Invalidation
Message to invalidate the s2 relation cache. When the command is finished,
those Invalidation Messages are processed in CommandCounterIncrement so that
the system caches work correctly for the following commands.

The other kind of Invalidation Messages comes from other backends.
Suppose there are two sessions:
session 1
---
1: create table foo(a int);
---
session 2
---
1: create table test(a int); (before session1:1)
2: insert into foo values(1); (executed after session1:1)
---
Session 1 generates Invalidation Messages and sends them when its transaction
commits, and session 2 accepts those Invalidation Messages from session 1
before executing its second command.

Before the patch, Postgres invalidated the CatalogSnapshot for both kinds of
Invalidation Messages. So I did a small optimization in this patch: for local
Invalidation Messages we don't call InvalidateCatalogSnapshot, and we can keep
using one CatalogSnapshot within a transaction even if we modify the catalog
and generate Invalidation Messages. The visibility of a tuple is determined by
the curcid, so as long as we update the curcid of the CatalogSnapshot in
SnapshotSetCommandId, it works correctly.
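
A minimal illustration of the local-backend case (using the s2 table from
the example above; this is standard behavior, not something added by the
patch):

begin;
alter table s2 add primary key (a);  -- generates local Invalidation Messages
insert into s2 values (1), (1);      -- the very next command already sees the
                                     -- new constraint and fails with a unique violation
rollback;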



> I think what your patch does goes against the following code comments in
> src/backend/utils/time/snapmgr.c
>
> /*
>  * CurrentSnapshot points to the only snapshot taken in
> transaction-snapshot
>  * mode, and to the latest one taken in a read-committed transaction.
>  * SecondarySnapshot is a snapshot that's always up-to-date as of the
> current
>  * instant, even in transaction-snapshot mode.  It should only be used for
>  * special-purpose code (say, RI checking.)  CatalogSnapshot points to an
>  * MVCC snapshot intended to be used for catalog scans; we must invalidate
> it
>  * whenever a system catalog change occurs.
>  *
>  * These SnapshotData structs are static to simplify memory allocation
>  * (see the hack in GetSnapshotData to avoid repeated malloc/free).
>  */
> static SnapshotData CurrentSnapshotData = {SNAPSHOT_MVCC};
> static SnapshotData SecondarySnapshotData = {SNAPSHOT_MVCC};
> SnapshotData CatalogSnapshotData = {SNAPSHOT_MVCC};
> SnapshotData SnapshotSelfData = {SNAPSHOT_SELF};
> SnapshotData SnapshotAnyData = {SNAPSHOT_ANY};
>

Thank you for pointing it out, I think I need to update the comment in the
patch too.


Re: Fix bug with indexes on whole-row expressions

2023-12-17 Thread ywgrit
Thanks, Tom. Considering the scenario where the indexed column is a
function on a whole-row Var, it's really not a good idea to disallow
creating indexes on whole-row expressions. I tried
find_composite_type_dependencies; it seems that this function can only
detect dependencies created by statements such as 'CREATE INDEX
test_tbl1_idx ON test_tbl1((row(x,y)::test_type1))', and cannot detect
dependencies created by statements such as 'CREATE INDEX test_tbl1_idx ON
test_tbl1((test_tbl1))'. After the former SQL statement executes,
4 rows are added to the pg_depend table, one of which is the index ->
pg_type dependency. After the latter SQL statement executes, only one
row is added to the pg_depend table, and there is no index -> pg_type
dependency, so I guess this function doesn't detect all cases of an index on
a whole-row expression. I would suggest doing the detection when the
index is created, because at that point we can get the details of the index
and give a warning in the way you mentioned.
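
A minimal sketch of that comparison (names follow the examples above; the
second index is renamed so both can coexist):

create type test_type1 as (x int, y int);
create table test_tbl1 (x int, y int);
create index test_tbl1_idx  on test_tbl1 ((row(x, y)::test_type1));  -- adds an index -> pg_type dependency
create index test_tbl1_idx2 on test_tbl1 ((test_tbl1));              -- records only a whole-relation dependency

select objid::regclass as idx, refclassid::regclass as ref_catalog, refobjid
from pg_depend
where classid = 'pg_class'::regclass
  and objid in ('test_tbl1_idx'::regclass, 'test_tbl1_idx2'::regclass);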

Tom Lane wrote on Wed, Dec 13, 2023 at 23:01:

> ywgrit  writes:
> > I forbid to create indexes on whole-row expression in the following
> patch.
> > I'd like to hear your opinions.
>
> As I said in the previous thread, I don't think this can possibly
> be acceptable.  Surely there are people depending on the capability.
> I'm not worried so much about the exact case of an index column
> being a whole-row Var --- I agree that that's pretty useless ---
> but an index column that is a function on a whole-row Var seems
> quite useful.  (Your patch fails to detect that, BTW, which means
> it does not block the case presented in bug #18244.)
>
> I thought about extending the ALTER TABLE logic to disallow changes
> in composite types that appear in index expressions.  We already have
> find_composite_type_dependencies(), and it turns out that this already
> blocks ALTER for the case you want to forbid, but we concluded that we
> didn't need to prevent it for the bug #18244 case:
>
>  * If objsubid identifies a specific column, refer to that in error
>  * messages.  Otherwise, search to see if there's a user column of the
>  * type.  (We assume system columns are never of interesting types.)
>  * The search is needed because an index containing an expression
>  * column of the target type will just be recorded as a whole-relation
>  * dependency.  If we do not find a column of the type, the dependency
>  * must indicate that the type is transiently referenced in an index
>  * expression but not stored on disk, which we assume is OK, just as
>  * we do for references in views.  (It could also be that the target
>  * type is embedded in some container type that is stored in an index
>  * column, but the previous recursion should catch such cases.)
>
> Perhaps a reasonable answer would be to issue a WARNING (not error)
> in the case where an index has this kind of dependency.  The index
> might need to be reindexed --- but it might not, too, and in any case
> I doubt that flat-out forbidding the ALTER is a helpful idea.
>
> regards, tom lane
>


Re: meson: Stop using deprecated way getting path of files

2023-12-17 Thread John Naylor
On Tue, Dec 5, 2023 at 3:27 AM Tristan Partin  wrote:
>
> On Mon Dec 4, 2023 at 2:10 PM CST, Tom Lane wrote:
> > Not sure what you were using, but are you aware that SQL access to the
> > buildfarm database is available to project members?  My own stock
> > approach to checking on this sort of thing is like
> >
> > select * from
> > (select sysname, snapshot, unnest(string_to_array(log_text, E'\n')) as l
> > from build_status_log join snapshots using (sysname, snapshot)
> > where log_stage = 'configure.log') ss
> > where l like 'checking for builtin %'
>
> This looks useful. I had no idea about this. Can you send me any
> resources for setting this up? My idea was just to do some web scraping.

+1 -- I was vaguely aware of this, but can't find any mention of
specifics in the buildfarm how-to, or other places I thought to look.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-12-17 Thread John Naylor
On Fri, Dec 15, 2023 at 3:15 PM Masahiko Sawada  wrote:
>
> On Fri, Dec 15, 2023 at 10:30 AM John Naylor  wrote:

> >  parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
> > - int nrequested_workers, int max_items,
> > - int elevel, BufferAccessStrategy bstrategy)
> > + int nrequested_workers, int vac_work_mem,
> > + int max_offset, int elevel,
> > + BufferAccessStrategy bstrategy)
> >
> > It seems very strange to me that this function has to pass the
> > max_offset. In general, it's been simpler to assume we have a constant
> > max_offset, but in this case that fact is not helping. Something to
> > think about for later.
>
> max_offset was previously used in old TID encoding in tidstore. Since
> tidstore has entries for each block, I think we no longer need it.

It's needed now to properly size the allocation of TidStoreIter which
contains...

+/* Result struct for TidStoreIterateNext */
+typedef struct TidStoreIterResult
+{
+ BlockNumber blkno;
+ int num_offsets;
+ OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
+} TidStoreIterResult;

Maybe we can palloc the offset array to be "almost always" big enough,
with logic to resize if needed? If that's not too hard, it seems worth it
to avoid churn in the parameter list.

> > v45-0010:
> >
> > Thinking about this some more, I'm not sure we need to do anything
> > different for the *starting* segment size. (Controlling *max* size
> > does seem important, however.) For the corner case of m_w_m = 1MB,
> > it's fine if vacuum quits pruning immediately after (in effect) it
> > finds the DSA has gone to 2MB. It's not worth bothering with, IMO. If
> > the memory accounting starts >1MB because we're adding the trivial
> > size of some struct, let's just stop doing that. The segment
> > allocations are what we care about.
>
> IIUC it's for work_mem, whose minimum value is 64kB.
>
> >
> > v45-0011:
> >
> > + /*
> > + * max_bytes is forced to be at least 64kB, the current minimum valid
> > + * value for the work_mem GUC.
> > + */
> > + max_bytes = Max(64 * 1024L, max_bytes);
> >
> > Why?
>
> This is to avoid creating a radix tree within very small memory. The
> minimum work_mem value is a reasonable lower bound that PostgreSQL
> uses internally. It's actually copied from tuplesort.c.

There is no explanation for why it should be done like tuplesort.c. Also...

- tree->leaf_ctx = SlabContextCreate(ctx,
-"radix tree leaves",
-RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
-sizeof(RT_VALUE_TYPE));
+ tree->leaf_ctx = SlabContextCreate(ctx,
+"radix tree leaves",
+Min(RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
+work_mem),
+sizeof(RT_VALUE_TYPE));

At first, my eyes skipped over this apparent re-indent, but hidden
inside here is another (undocumented) attempt to clamp the size of
something. There are too many of these sprinkled in various places,
and they're already a maintenance hazard -- a different one was left
behind in v45-0011:

@@ -201,6 +183,7 @@ TidStoreCreate(size_t max_bytes, int max_off,
dsa_area *area)
ts->control->max_bytes = max_bytes - (70 * 1024);
  }

Let's do it in just one place. In TidStoreCreate(), do

/* clamp max_bytes to at least the size of the empty tree with
allocated blocks, so it doesn't immediately appear full */
ts->control->max_bytes = Max(max_bytes, {rt, shared_rt}_memory_usage);

Then we can get rid of all the worry about 1MB/2MB, 64kB, 70kB -- all that.

I may not recall everything while writing this, but it seems the only
other thing we should be clamping is the max aset block size (solved)
/ max DSM segment size (in progress).




Re: introduce dynamic shared memory registry

2023-12-17 Thread Andrei Lepikhov

On 5/12/2023 10:46, Nathan Bossart wrote:

I don't presently have any concrete plans to use this for anything, but I
thought it might be useful for extensions for caching, etc. and wanted to
see whether there was any interest in the feature.


I am delighted that you commenced this thread.
When designing extensions, I feel pain every time I introduce one shared 
value or some global stat, because the extension is then required to be 
loadable only at startup. That reduces the flexibility of even very 
lightweight extensions, which makes them unattractive to use in a cloud 
environment.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Change GUC hashtable to use simplehash?

2023-12-17 Thread John Naylor
I wrote:
> Updated next steps:

> * Add some desperately needed explanatory comments.

There is a draft of this in v10-0001. I also removed the validation
scaffolding and ran pgindent. This could use some review and/or
bikeshedding, in particular on the name hashfn_unstable.h. I also
considered *_volatile.h or *_inmemory.h, but nothing stands out as
more clear.

> * Use this in some existing cases where it makes sense.

For now just two:
v10-0002 is Jeff's change to the search path cache, but with the
chunked interface that I found to be faster.
v10-0003 is a copy of something buried in an earlier version: use in
pgstat. Looks nicer, but not yet tested.
From aecbc51780ada4634855727c50e3b85a8f7f Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 9 Dec 2023 16:24:56 +0700
Subject: [PATCH v10 3/3] Use fasthash32 for pgstat_hash_hash_key

Currently this calls the 32-bit Murmur finalizer on the three elements,
which are then joined with hash_combine().  Using fasthash32 is simpler
and has better collision guarantees.

WIP: Make sure performance is at least comparable.

WIP: We may not need the full 32-bit finalizer reducing step.
It would be slightly cheaper to just use fasthash64 and then take
the lower 32 bits.

Discussion: (none yet, buried in a related patchset)
---
 src/include/utils/pgstat_internal.h | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 60fbf9394b..ecc46bef04 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -14,7 +14,7 @@
 #define PGSTAT_INTERNAL_H
 
 
-#include "common/hashfn.h"
+#include "common/hashfn_unstable.h"
 #include "lib/dshash.h"
 #include "lib/ilist.h"
 #include "pgstat.h"
@@ -777,15 +777,10 @@ static inline uint32
 pgstat_hash_hash_key(const void *d, size_t size, void *arg)
 {
 	const PgStat_HashKey *key = (PgStat_HashKey *) d;
-	uint32		hash;
 
 	Assert(size == sizeof(PgStat_HashKey) && arg == NULL);
 
-	hash = murmurhash32(key->kind);
-	hash = hash_combine(hash, murmurhash32(key->dboid));
-	hash = hash_combine(hash, murmurhash32(key->objoid));
-
-	return hash;
+	return fasthash32((const char *) key, size, 0);
 }
 
 /*
-- 
2.43.0

From c7bd727b24a8935343df6fb24d10948fa6d4d57c Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Mon, 18 Dec 2023 11:10:28 +0700
Subject: [PATCH v10 2/3] Use fasthash for the search path cache

This serves to demonstrate the incremental API, allowing inlined
hash calculation without a strlen call. This brings the general case
performance closer to the optimization done in commit a86c61c9ee.

WIP: roleid should be mixed in normally, unless we have
reason to just use it as a seed.

Jeff Davis, with switch to chunked interface by me

Discussion: https://www.postgresql.org/message-id/b40292c99e623defe5eadedab1d438cf51a4107c.camel%40j-davis.com
---
 src/backend/catalog/namespace.c | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 5027efc91d..7fe2fd1fd4 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -41,7 +41,7 @@
 #include "catalog/pg_ts_template.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
-#include "common/hashfn.h"
+#include "common/hashfn_unstable.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -247,11 +247,25 @@ static bool MatchNamedCall(HeapTuple proctup, int nargs, List *argnames,
 static inline uint32
 spcachekey_hash(SearchPathCacheKey key)
 {
-	const unsigned char *bytes = (const unsigned char *) key.searchPath;
-	int			blen = strlen(key.searchPath);
+	const char *const start = key.searchPath;
+	const char *buf = key.searchPath;
+	fasthash_state hs;
 
-	return hash_combine(hash_bytes(bytes, blen),
-		hash_uint32(key.roleid));
+	/* WIP: maybe roleid should be mixed in normally */
+	fasthash_init(&hs, FH_UNKNOWN_LENGTH, key.roleid);
+	while (*buf)
+	{
+		int			chunk_len = 0;
+
+		while (chunk_len < FH_SIZEOF_ACCUM && buf[chunk_len] != '\0')
+			chunk_len++;
+
+		fasthash_accum(&hs, buf, chunk_len);
+		buf += chunk_len;
+	}
+
+	/* pass the length to tweak the final mix */
+	return fasthash_final32(&hs, buf - start);
 }
 
 static inline bool
-- 
2.43.0

From a990c20cab3c293a514b0c5120dfb83a3258e666 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Mon, 27 Nov 2023 17:03:38 +0700
Subject: [PATCH v10 1/3] Add inlineable, incremental hash functions for
 in-memory use

A number of places hash NUL-terminated strings. Currently, we need
to call strlen first because hash_bytes needs the length. For short
strings the C library call has a large overhead, and strlen calls
show up prominently in profiles.

Per suggestion from Andres Freund, add hash functions with an
incremental interface. Instead of trying to whack around hash_bytes
while maintaining its current behavior on all platforms, we base
this work on fasthash (MIT 

Re: Fix a comment in basic_archive about NO_INSTALLCHECK

2023-12-17 Thread Bharath Rupireddy
On Thu, Apr 6, 2023 at 9:26 AM Michael Paquier  wrote:
>
> On Mon, Apr 03, 2023 at 08:56:10AM +0530, Bharath Rupireddy wrote:
> > It looks like comments in make file and meson file about not running
> > basic_archive tests in NO_INSTALLCHECK mode are wrong. The comments say the
> > module needs to be loaded via shared_preload_libraries=basic_archive, but
> > it actually doesn't. The custom file needs archive related parameters and
> > wal_level=replica. Here's a patch correcting that comment.
>
> Saying that, updating the comments about the dependency with
> archive_library and the module's GUC is right.

Reworded the comment a bit and attached the v2 patch.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v2-0001-Fix-a-comment-in-basic_archive-about-NO_INSTALLCH.patch
Description: Binary data


Re: Transaction timeout

2023-12-17 Thread Andrey M. Borodin


> On 16 Dec 2023, at 05:58, Japin Li  wrote:
> 
> 
> On Fri, 15 Dec 2023 at 17:51, Andrey M. Borodin  wrote:
>>> On 8 Dec 2023, at 15:29, Japin Li  wrote:
>>> 
>>> Thanks for updating the patch. LGTM.
>> 
>> PFA v9. Changes:
>> 1. Added tests for idle_in_transaction_timeout
>> 2. Suppress statement_timeout if it’s shorter than transaction_timeout
>> 
> +   if (StatementTimeout > 0
> +   && IdleInTransactionSessionTimeout < TransactionTimeout)
>   ^
> 
> Should be StatementTimeout?
Yes, that’s an oversight. I’ve adjusted tests so they catch this problem.

> Maybe we should add documentation to describe this behavior.

I've added a paragraph about it to config.sgml, but I'm not sure about the 
comprehensiveness of the wording.


Best regards, Andrey Borodin.


v10-0001-Introduce-transaction_timeout.patch
Description: Binary data


Re: Add a perl function in Cluster.pm to generate WAL

2023-12-17 Thread Bharath Rupireddy
On Wed, Jul 19, 2023 at 4:11 PM Bharath Rupireddy
 wrote:
>
> Attached the v2 patch. Thoughts?

Rebase needed, attached v3 patch.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v3-0001-Add-a-TAP-test-function-to-generate-WAL.patch
Description: Binary data


Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-17 Thread torikoshia

Hi,


> save the error metadata to  system catalogs would be more expensive,
> please see below explanation.
> I have no knowledge of publications.
> but i feel there is a feature request: publication FOR ALL TABLES
> exclude regex_pattern.
> Anyway, that would be another topic.


I think saving error metadata to a system catalog is not a good idea, either.
And I believe Sawada-san just pointed out missing features and did not
suggest that we use a system catalog.



I don't think "specify the maximum number  of errors to tolerate
before raising an ERROR." is very useful


That may be so.
I imagine it's useful in some use cases since some loading tools have
such options.

Anyway, I agree it's not necessary for the initial patch, as mentioned in [1].


> I suppose we can specify an ERRORFILE directory. similar
> implementation [2], demo in [3]
> it will generate 2 files, one file shows the malform line content as
> is, another file shows the error info.


That may be a good option when considering "(2) logging errors to 
somewhere".


What do you think about the proposal to develop these features
incrementally?


On 2023-12-15 05:48, Masahiko Sawada wrote:

> So I'm thinking we may be able to implement this
> feature incrementally. The first step would be something like an
> option to ignore all errors or an option to specify the maximum number
> of errors to tolerate before raising an ERROR. The second step would
> be to support logging destinations such as server logs and tables.


[1] 
https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: "pgoutput" options missing on documentation

2023-12-17 Thread Amit Kapila
On Fri, Dec 15, 2023 at 7:06 PM Emre Hasegeli  wrote:
>
>
> > SUGGESTION
> > -proto_version
> > -publication_names
> > -binary
> > -messages
> > -origin
> >
> > Requires minimum protocol version 2:
> > -streaming (boolean)
> >
> > Requires minimum protocol version 3:
> > -two_phase
> >
> > Requires minimum protocol version 4:
> > -streaming (parallel)
>
> I am still not sure if this is any better.  I don't like that
> "streaming" appears twice, and I wouldn't know how to format this
> nicely.
>

The currently proposed way seems reasonable to me.

> The new versions are attached.
>
> I also added "Required." for "proto_version" and "publication_names".
>

Comma separated list of publication names for which to subscribe
-   (receive changes). The individual publication names are treated
-   as standard objects names and can be quoted the same as needed.
+   (receive changes).  Required.  The individual publication names are

This change (Required in between two sentences) looks slightly odd to
me. Can we instead extend the second line to something like: "This
parameter is required, and the individual publication names are ...".
Similarly, we can adjust the proto_version explanation.

One minor comment:

+ 
+  PostgreSQL supports extensible logical decoding
+  plugins.  pgoutput is the standard one used for
+  the built-in logical replication.
+ 

This sounds like we are supporting more than one logical decoding
plugin. Can we slightly rephrase it to something like:
"PostgreSQL supports extensible logical decoding plugin
pgoutput which is used for the built-in logical
replication as well."

-- 
With Regards,
Amit Kapila.




Re: "pgoutput" options missing on documentation

2023-12-17 Thread Amit Kapila
On Fri, Dec 15, 2023 at 7:06 PM Emre Hasegeli  wrote:
>
> > I saw that the original "publication_names" error was using
> > errcode(ERRCODE_INVALID_PARAMETER_VALUE), but TBH since there is no
> > option given at all I felt ERRCODE_SYNTAX_ERROR might be more
> > appropriate errcode for those 2 mandatory option errors.
>
> It makes sense to me.  Changed.
>

I found the existing error code appropriate because for syntax
specification, either we need to mandate this at the grammar level or
at the API level. Also, I think we should give a message similar to an
existing message: "publication_names parameter missing". For example,
we can say, "proto_version parameter missing". BTW, I also don't like
the other changes parse_output_parameters() done in 0001, if we want
to improve all the similar messages there are other places in the code
as well, so we can separately make the case for the same.

-- 
With Regards,
Amit Kapila.




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-17 Thread torikoshia

On 2023-12-15 05:48, Masahiko Sawada wrote:

Thanks for joining this discussion!


> I've read this thread and the latest patch. IIUC with SAVE_ERROR
> option, COPY FROM creates an error table for the target table and
> writes error information there.
>
> While I agree that the final shape of this feature would be something
> like that design, I'm concerned some features are missing in order to
> make this feature useful in practice. For instance, error logs are
> inserted to error tables without bounds, meaning that users who want
> to tolerate errors during COPY FROM  will have to truncate or drop the
> error tables periodically, or the database will grow with error logs
> without limit. Ideally such maintenance work should be done by the
> database. There might be some users who want to log such conversion
> errors in server logs to avoid such maintenance work. I think we
> should provide an option for where to write, at least. Also, since the
> error tables are normal user tables internally, error logs are also
> replicated to subscribers if there is a publication FOR ALL TABLES,
> unlike system catalogs. I think some users would not like such
> behavior.
>
> Looking at SAVE_ERROR feature closely, I think it consists of two
> separate features. That is, it enables COPY FROM to load data while
> (1) tolerating errors and (2) logging errors to somewhere (i.e., an
> error table). If we implement only (1), it would be like COPY FROM
> tolerate errors infinitely and log errors to /dev/null. The user
> cannot see the error details but I guess it could still help some
> cases as Andres mentioned[1] (it might be a good idea to send the
> number of rows successfully loaded in a NOTICE message if some rows
> could not be loaded). Then with (2), COPY FROM can log error
> information to somewhere such as tables and server logs and the user
> can select it.


+1.
I may be biased since I wrote some ~v6 patches which just output the
soft errors and the number of skipped rows to the log, but I think just (1)
would be worth implementing, as you pointed out, and I would like it if
users could choose where the output is logged.


I think there are situations where it is preferable to save errors to the
server log, even considering the problems pointed out in [1], i.e. when
manually loading data.


[1] 
https://www.postgresql.org/message-id/739953.1699467519%40sss.pgh.pa.us



> feature incrementally. The first step would be something like an
> option to ignore all errors or an option to specify the maximum number
> of errors to tolerate before raising an ERROR. The second step would
> be to support logging destinations such as server logs and tables.
>
> Regards,
>
> [1]
> https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgmbjm%40awork3.anarazel.de



--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: "pgoutput" options missing on documentation

2023-12-17 Thread Peter Smith
On Mon, Dec 18, 2023 at 11:35 AM Peter Smith  wrote:
>
> Hi, I had a look at the latest v02 patches.
>
> On Sat, Dec 16, 2023 at 12:36 AM Emre Hasegeli  wrote:
> >
> > > OK, to deal with that can't you just include "origin" in the first
> > > group which has no special protocol requirements?
> >
> > I think it'd be confusing because the option is not available before
> > version 16.  I think it should really check for the version number and
> > complain if it's less than 4.
>
> Hm. I don't think a proto_version check is required for "origin".
>
> IIUC, the protocol version number indicates the message byte format.
> It's needed so that those messages bytes can be read/written in the
> same/compatible way. OTOH I thought the "origin" option has nothing
> really to do with actual message formats on the wire; I think it works
> just by filtering up-front to decide either to send the changes or not
> send the changes. For example, so long as PostgreSQL >= v16, I expect
> you could probably use "origin" with any proto_version you wanted.
>

But I don't know how the user would be able to arrange such a
mixture of PG/proto_version versions, because they do seem tightly
coupled for pgoutput.

e.g.
server_version = walrcv_server_version(LogRepWorkerWalRcvConn);
options.proto.logical.proto_version =
server_version >= 16 ?
LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM :
server_version >= 15 ? LOGICALREP_PROTO_TWOPHASE_VERSION_NUM :
server_version >= 14 ? LOGICALREP_PROTO_STREAM_VERSION_NUM :
LOGICALREP_PROTO_VERSION_NUM;

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: "pgoutput" options missing on documentation

2023-12-17 Thread Peter Smith
Hi, I had a look at the latest v02 patches.

On Sat, Dec 16, 2023 at 12:36 AM Emre Hasegeli  wrote:
>
> > OK, to deal with that can't you just include "origin" in the first
> > group which has no special protocol requirements?
>
> I think it'd be confusing because the option is not available before
> version 16.  I think it should really check for the version number and
> complain if it's less than 4.

Hm. I don't think a proto_version check is required for "origin".

IIUC, the protocol version number indicates the message byte format.
It's needed so that those messages bytes can be read/written in the
same/compatible way. OTOH I thought the "origin" option has nothing
really to do with actual message formats on the wire; I think it works
just by filtering up-front to decide either to send the changes or not
send the changes. For example, so long as PostgreSQL >= v16, I expect
you could probably use "origin" with any proto_version you wanted.

>
> > SUGGESTION
> > -proto_version
> > -publication_names
> > -binary
> > -messages
> > -origin
> >
> > Requires minimum protocol version 2:
> > -streaming (boolean)
> >
> > Requires minimum protocol version 3:
> > -two_phase
> >
> > Requires minimum protocol version 4:
> > -streaming (parallel)
>
> I am still not sure if this is any better.  I don't like that
> "streaming" appears twice, and I wouldn't know how to format this
> nicely.
>

I won't keep pushing to rearrange the docs. I think all the content is
OK anyway, so let's see if other people have any opinions on how the
new information is best presented.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH]: Not to invalidate CatalogSnapshot for local invalidation messages

2023-12-17 Thread jian he
Hi
---setup.
drop table s2;
create table s2(a int);

After applying the patch
alter table s2 add primary key (a);

watch CatalogSnapshot

#0  GetNonHistoricCatalogSnapshot (relid=1259)
at ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:412
#1  0x55ba78f0d6ba in GetCatalogSnapshot (relid=1259)
at ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:371
#2  0x55ba785ffbe1 in systable_beginscan
(heapRelation=0x7f256f30b5a8, indexId=2662, indexOK=false,
snapshot=0x0, nkeys=1, key=0x7ffe230f0180)
at ../../Desktop/pg_src/src7/postgresql/src/backend/access/index/genam.c:413
(More stack frames follow...)

-
Hardware watchpoint 13: CatalogSnapshot

Old value = (Snapshot) 0x55ba7980b6a0 
New value = (Snapshot) 0x0
InvalidateCatalogSnapshot () at
../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:435
435 SnapshotResetXmin();
(gdb) bt 4
#0  InvalidateCatalogSnapshot ()
at ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:435
#1  0x55ba78f0ee85 in AtEOXact_Snapshot (isCommit=true, resetXmin=false)
at 
../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:1057
#2  0x55ba7868201b in CommitTransaction ()
at 
../../Desktop/pg_src/src7/postgresql/src/backend/access/transam/xact.c:2373
#3  0x55ba78683495 in CommitTransactionCommand ()
at 
../../Desktop/pg_src/src7/postgresql/src/backend/access/transam/xact.c:3061
(More stack frames follow...)

--
But the whole process changes pg_class, pg_index, pg_attribute,
pg_constraint, etc.
Having only one GetCatalogSnapshot and one InvalidateCatalogSnapshot seems not correct?
What if there are concurrent changes in the related pg_catalog tables?

your patch did pass the isolation test!

I think what your patch does goes against the following code comments in
src/backend/utils/time/snapmgr.c

/*
 * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
 * mode, and to the latest one taken in a read-committed transaction.
 * SecondarySnapshot is a snapshot that's always up-to-date as of the current
 * instant, even in transaction-snapshot mode.  It should only be used for
 * special-purpose code (say, RI checking.)  CatalogSnapshot points to an
 * MVCC snapshot intended to be used for catalog scans; we must invalidate it
 * whenever a system catalog change occurs.
 *
 * These SnapshotData structs are static to simplify memory allocation
 * (see the hack in GetSnapshotData to avoid repeated malloc/free).
 */
static SnapshotData CurrentSnapshotData = {SNAPSHOT_MVCC};
static SnapshotData SecondarySnapshotData = {SNAPSHOT_MVCC};
SnapshotData CatalogSnapshotData = {SNAPSHOT_MVCC};
SnapshotData SnapshotSelfData = {SNAPSHOT_SELF};
SnapshotData SnapshotAnyData = {SNAPSHOT_ANY};




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-17 Thread jian he
On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada  wrote:
>
> Hi,
>
> I've read this thread and the latest patch. IIUC with SAVE_ERROR
> option, COPY FROM creates an error table for the target table and
> writes error information there.
>
> While I agree that the final shape of this feature would be something
> like that design, I'm concerned some features are missing in order to
> make this feature useful in practice. For instance, error logs are
> inserted to error tables without bounds, meaning that users who want
> to tolerate errors during COPY FROM  will have to truncate or drop the
> error tables periodically, or the database will grow with error logs
> without limit. Ideally such maintenance work should be done by the
> database. There might be some users who want to log such conversion
> errors in server logs to avoid such maintenance work. I think we
> should provide an option for where to write, at least. Also, since the
> error tables are normal user tables internally, error logs are also
> replicated to subscribers if there is a publication FOR ALL TABLES,
> unlike system catalogs. I think some users would not like such
> behavior.

save the error metadata to  system catalogs would be more expensive,
please see below explanation.
I have no knowledge of publications.
but i feel there is a feature request: publication FOR ALL TABLES
exclude regex_pattern.
Anyway, that would be another topic.

> Looking at SAVE_ERROR feature closely, I think it consists of two
> separate features. That is, it enables COPY FROM to load data while
> (1) tolerating errors and (2) logging errors to somewhere (i.e., an
> error table). If we implement only (1), it would be like COPY FROM
> tolerate errors infinitely and log errors to /dev/null. The user
> cannot see the error details but I guess it could still help some
> cases as Andres mentioned[1] (it might be a good idea to send the
> number of rows successfully loaded in a NOTICE message if some rows
> could not be loaded). Then with (2), COPY FROM can log error
> information to somewhere such as tables and server logs and the user
> can select it. So I'm thinking we may be able to implement this
> feature incrementally. The first step would be something like an
> option to ignore all errors or an option to specify the maximum number
> of errors to tolerate before raising an ERROR. The second step would
> be to support logging destinations such as server logs and tables.
>
> Regards,
>
> [1] 
> https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgmbjm%40awork3.anarazel.de
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com

> feature incrementally. The first step would be something like an
> option to ignore all errors or an option to specify the maximum number
> of errors to tolerate before raising an ERROR. The second step would

I don't think "specify the maximum number  of errors to tolerate
before raising an ERROR." is very useful

QUOTE from [1]
MAXERROR [AS] error_count
If the load returns the error_count number of errors or greater, the
load fails. If the load returns fewer errors, it continues and returns
an INFO message that states the number of rows that could not be
loaded. Use this parameter to allow loads to continue when certain
rows fail to load into the table because of formatting errors or other
inconsistencies in the data.
Set this value to 0 or 1 if you want the load to fail as soon as the
first error occurs. The AS keyword is optional. The MAXERROR default
value is 0 and the limit is 10.
The actual number of errors reported might be greater than the
specified MAXERROR because of the parallel nature of Amazon Redshift.
If any node in the Amazon Redshift cluster detects that MAXERROR has
been exceeded, each node reports all of the errors it has encountered.
END OF QUOTE

Regarding the option MAXERROR error_count: IIUC, it fails while validating
line error_count + 1; otherwise it raises a notice that tells you how many
rows have errors.

* Case when error_count is small and the copy fails: it only tells you
that at least error_count lines have malformed data. But what if the actual
number of malformed rows is very large? In this case, the failure error
message is not that helpful.
* Case when error_count is very large and the copy does not fail: the actual
number of malformed rows may still be large (though less than error_count),
but there is no error report, so you don't know which lines have errors.

Either way, if the file has a large portion of malformed rows, then
the MAXERROR option does not make sense,
so maybe we don't need a threshold for tolerating errors.

However, we could have an option that does not actually copy into the table
but only validates, similar to NOLOAD in [1].

Why we save the error:
* If only a small portion of rows are malformed, then saving the error
metadata would be cheap.
* If a large portion of rows are malformed, then the copy will be slow, but
we will have saved the error metadata, and you can fix the data based on it.

I think saving 

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-17 Thread Matthias van de Meent
On Sun, 17 Dec 2023, 21:14 Michail Nikolaev,  wrote:
>
> Hello!
>
> > I've thought about alternative solutions, too: how about getting a new 
> > snapshot every so often?
> > We don't really care about the liveness of the already-scanned data; the 
> > snapshots used for RIC
> > are used only during the scan. C/RIC's relation's lock level means vacuum 
> > can't run to clean up
> > dead line items, so as long as we only swap the backend's reported snapshot 
> > (thus xmin) while
> > the scan is between pages we should be able to reduce the time C/RIC is the 
> > one backend
> > holding back cleanup of old tuples.
>
> Hm, it looks like an interesting idea! It may be more dangerous, but
> at least it feels much more elegant than an LP_DEAD-related way.
> Also, feels like we may apply this to both phases (first and the second 
> scans).
> The original patch (1) was helping only to the second one (after call
> to set_indexsafe_procflags).
>
> But for the first scan we allowed to do so only for non-unique indexes
> because of:
>
> > * The reason for doing that is to avoid
> > * bogus unique-index failures due to concurrent UPDATEs (we might see
> > * different versions of the same row as being valid when we pass over them,
> > * if we used HeapTupleSatisfiesVacuum).  This leaves us with an index that
> > * does not contain any tuples added to the table while we built the index.

Yes, for that we'd need an extra scan of the index that validates
uniqueness. I think there was a proposal (though it may only have been
an idea) some time ago, about turning existing non-unique indexes into
unique ones by validating the data. Such a system would likely be very
useful to enable this optimization.

> Also, (1) was limited to indexes without expressions and predicates
> (2) because such may execute queries to other tables (sic!).

Note that the use of such expressions would be a violation of the
function's definition; it would depend on data from other tables which
makes the function behave like a STABLE function, as opposed to the
IMMUTABLE that is required for index expressions. So, I don't think we
should specially care about being correct for incorrectly marked
function definitions.

> One possible solution is to add some checks to make sure no
> user-defined functions are used.
> But as far as I understand, it affects only CIC for now and does not
> affect the ability to use the proposed technique (updating snapshot
> time to time).
>
> However, I think we need some more-less formal proof it is safe - it
> is really challenging to keep all the possible cases in the head. I’ll
> try to do something here.

I just realised there is one issue with this design: We can't cheaply
reset the snapshot during the second table scan:
It is critically important that the second scan of R/CIC uses an index
contents summary (made with index_bulk_delete) that was created while
the current snapshot was already registered. If we didn't do that, the
following would occur:

1. The index is marked ready for inserts from concurrent backends, but
not yet ready for queries.
2. We get the bulkdelete data
3. A concurrent backend inserts a new tuple T on heap page P, inserts
it into the index, and commits. This tuple is not in the summary, but
has been inserted into the index.
4. R/CIC resets the snapshot, making T visible.
5. R/CIC scans page P, finds that tuple T has to be indexed but is not
present in the summary, thus inserts that tuple into the index (which
already had it inserted at 3)

This thus would be a logic bug, as indexes assume at-most-once
semantics for index tuple insertion; duplicate insertions are an error.

So, the "reset the snapshot every so often" trick cannot be applied in
phase 3 (the rescan), or we'd have to do an index_bulk_delete call
every time we reset the snapshot. Rescanning might be worth the cost
(e.g. when using BRIN), but that is very unlikely.

Alternatively, we'd need to find another way to prevent us from
inserting these duplicate entries - maybe by storing the scan's data
in a buffer to later load into the index after another
index_bulk_delete()? Counterpoint: for BRIN indexes that'd likely
require a buffer much larger than the result index would be.

Either way, for the first scan (i.e. phase 2 "build new indexes") this
is not an issue: we don't care about what transaction adds/deletes
tuples at that point.
For all we know, all tuples of the table may be deleted concurrently
before we even allow concurrent backends to start inserting tuples,
and the algorithm would still work as it does right now.

> Another possible issue may be caused by the new locking pattern - we
> will be required to wait for all transactions started before the end
> of the phase to exit.

What do you mean by "new locking pattern"? We already keep a
ShareUpdateExclusiveLock on every heap table we're accessing during
R/CIC, and that should already prevent any concurrent VACUUM
operations, right?

Kind regards,

Matthias van de 

Re: Is a clearer memory lifespan for outerTuple and innerTuple useful?

2023-12-17 Thread Andy Fan


Nikita Malakhov  writes:

> Hi!
>
> Maybe, the alternative way is using a separate kind of context, say name it
> 'ToastContext' for all custom data related to Toasted values? What do
> you think?

That should be a candidate. My latest research makes me think the
'detoast_values' should have the same life cycle as tts_values, so the
memory should be managed by the TupleTableSlot (rather than the
ExprContext) and be handled in the ExecCopySlot / ExecClearSlot code.

In TupleTableSlot we already have a tts_mctx MemoryContext; reusing it
would mean freeing the detoast values one by one with 'pfree', whereas a
dedicated memory context costs a bit more to set up but allows a more
efficient MemoryContextReset.

>
> On Sun, Dec 17, 2023 at 4:52 PM Andy Fan  wrote:
>
>  Andy Fan  writes:
>
>  > Andy Fan  writes:
>  >
>  >> ...,  I attached the 2 MemoryContext in
>  >> JoinState rather than MergeJoinState, which is for the "shared detoast
>  >> value"[0] more or less.  
>  >>
>
>  In order to delimit the scope of this discussion, I attached the 2
>  MemoryContext to MergeJoinState. Since the code was written by Tom in
>  2005, I added Tom to the cc-list. 
>

However, this patch can be discussed separately. 

-- 
Best Regards
Andy Fan





Re: planner chooses incremental but not the best one

2023-12-17 Thread Tomas Vondra



On 12/14/23 11:02, Richard Guo wrote:
> 
> On Tue, Dec 12, 2023 at 4:40 PM Nicolas Lutic wrote:
> 
> I've come across a behaviour of the planner I can't explain.
> After a migration from 11 to 15 (on RDS) we noticed a degradation in
> response time on a query, it went from a few seconds to ten minutes.
> A VACUUM (ANALYZE) has been performed to be sure that all is clean.
> The 'explain analyze' shows us a change of plan. Postgresql 15 chooses
> `incremental sort` with an index corresponding to the ORDER BY clause
> (on the created_at column). The previous v11 plan used a more efficient
> index.
> 
> By deactivating incremental sort, response times in v15 are equal to
> v11
> one.
> 
> 
> I think this issue is caused by under-estimating the startup cost of
> incremental sort, which in turn is caused by over-estimating the number
> of groups with equal presorted keys by estimate_num_groups().
> 
> We can simulate the same issue with the query below.
> 
> create table t (a int, b int, c int, d int) partition by range (a);
> create table tp1 partition of t for values from (0) to (1000);
> create table tp2 partition of t for values from (1000) to (2000);
> 
> insert into t select i%2000, i%1000, i%300, i from
> generate_series(1,100)i;
> 
> create index on t(b);
> create index on t(c);
> 
> analyze t;
> 
> -- by default incremental sort is chosen
> explain analyze select * from t where b = 3 order by c, d limit 10;
>                                                                    QUERY
> PLAN
> 
>  Limit  (cost=143.03..570.89 rows=10 width=16) (actual
> time=375.399..375.402 rows=10 loops=1)
>    ->  Incremental Sort  (cost=143.03..42671.85 rows=994 width=16)
> (actual time=375.397..375.399 rows=10 loops=1)
>          Sort Key: t.c, t.d
>          Presorted Key: t.c
>          Full-sort Groups: 1  Sort Method: top-N heapsort  Average
> Memory: 25kB  Peak Memory: 25kB
>          Pre-sorted Groups: 1  Sort Method: top-N heapsort  Average
> Memory: 25kB  Peak Memory: 25kB
>          ->  Merge Append  (cost=0.85..42644.84 rows=994 width=16)
> (actual time=11.415..375.289 rows=335 loops=1)
>                Sort Key: t.c
>                ->  Index Scan using tp1_c_idx on tp1 t_1
>  (cost=0.42..21318.39 rows=498 width=16) (actual time=6.666..189.356
> rows=168 loops=1)
>                      Filter: (b = 3)
>                      Rows Removed by Filter: 171537
>                ->  Index Scan using tp2_c_idx on tp2 t_2
>  (cost=0.42..21316.50 rows=496 width=16) (actual time=4.745..185.870
> rows=168 loops=1)
>                      Filter: (b = 3)
>                      Rows Removed by Filter: 171534
>  Planning Time: 0.501 ms
>  Execution Time: 375.477 ms
> (16 rows)
> 
> -- disable incremental sort
> set enable_incremental_sort to off;
> SET
> 
> explain analyze select * from t where b = 3 order by c, d limit 10;
>                                                                QUERY PLAN
> 
>  Limit  (cost=2577.51..2577.53 rows=10 width=16) (actual
> time=2.928..2.930 rows=10 loops=1)
>    ->  Sort  (cost=2577.51..2579.99 rows=994 width=16) (actual
> time=2.925..2.927 rows=10 loops=1)
>          Sort Key: t.c, t.d
>          Sort Method: top-N heapsort  Memory: 25kB
>          ->  Append  (cost=8.28..2556.03 rows=994 width=16) (actual
> time=0.627..2.670 rows=1000 loops=1)
>                ->  Bitmap Heap Scan on tp1 t_1  (cost=8.28..1276.62
> rows=498 width=16) (actual time=0.625..1.688 rows=500 loops=1)
>                      Recheck Cond: (b = 3)
>                      Heap Blocks: exact=500
>                      ->  Bitmap Index Scan on tp1_b_idx
>  (cost=0.00..8.16 rows=498 width=0) (actual time=0.330..0.330 rows=500
> loops=1)
>                            Index Cond: (b = 3)
>                ->  Bitmap Heap Scan on tp2 t_2  (cost=8.27..1274.43
> rows=496 width=16) (actual time=0.178..0.876 rows=500 loops=1)
>                      Recheck Cond: (b = 3)
>                      Heap Blocks: exact=500
>                      ->  Bitmap Index Scan on tp2_b_idx
>  (cost=0.00..8.14 rows=496 width=0) (actual time=0.093..0.093 rows=500
> loops=1)
>                            Index Cond: (b = 3)
>  Planning Time: 0.481 ms
>  Execution Time: 3.031 ms
> (17 rows)
> 
> As we can see the execution time is 375.477 ms by default and 3.031 ms
> if we disable incremental sort.
> 
> When we calculate the cost of incremental sort, we need to estimate the
> number of groups into which the relation is divided by the presorted
> keys, and then calculate the cost of sorting a single group.  If we
> over-estimate the number of groups, the startup cost of incremental sort
> would 
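
A back-of-envelope way to see the effect being described (a rough sketch,
not the planner's exact formula): with N input tuples assumed to split into
G presorted groups, the startup cost of an incremental sort is roughly the
cost of fetching and sorting just the first group,

  startup_cost ~ input_cost / G + comparison_cost * (N/G) * log2(N/G)

so an over-estimated G makes N/G look tiny and the plan appears to start
almost for free - while in the plans above the presorted key actually forms
a single group ("Pre-sorted Groups: 1") spanning most of the input.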

Re: JSON Path and GIN Questions

2023-12-17 Thread David E. Wheeler
On Dec 17, 2023, at 16:08, Tom Lane  wrote:

> I'd waited because the discussion was still active, and then it
> kind of slipped off the radar.  I'll take another look and push
> some form of what I suggested.

Right on.

> That doesn't really address the
> jsonpath oddities you were on about, though.

No, I attempted to address those in [a patch][1].

  [1]: https://commitfest.postgresql.org/45/4624/

Best,

David





Re: How to get started with contribution

2023-12-17 Thread Sukhbir Singh
Thanks a lot for sharing. I'm excited about the opportunity to contribute
to your organization and apply my skills to your project, making a valuable
impact.

On Sun, 17 Dec, 2023, 10:04 pm Julien Rouhaud,  wrote:

> Hi,
>
> On Sun, Dec 17, 2023 at 03:09:10PM +, Sukhbir Singh wrote:
> >
> > I am Sukhbir Singh, a B.Tech undergraduate, and I am in my pre-final
> year at
> > Punjab Engineering College (PEC). I am new to open source but know
> Python,
> > PostgreSQL, C, Javascript and node.js. I would love to contribute to your
> > organization, but could you please tell me how to get started?
>
> You can look at
> https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F, it's
> a good
> starting point with many links to additional resources.
>


Re: planner chooses incremental but not the best one

2023-12-17 Thread Tom Lane
Tomas Vondra  writes:
> Yeah, seems like that's the right thing to do. FWIW I've been often
> confused by these fields, because we use tuples and rows as synonyms,
> but in this particular case that's not the same. I wonder if this is
> just a manifestation of this confusion.

For tables, one is the raw number of rows on-disk and the other is the
number of rows predicted to pass the relation's quals.  For something
like an appendrel that doesn't enforce any quals (today anyway), they
should probably be the same; but you need to be sure you're adding
up the right numbers from the inputs.
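
As a quick illustration of the two numbers (a hedged sketch reusing the
partitioned table from the plans earlier in this thread):

  -- raw on-disk tuple counts, as tracked in the statistics
  SELECT relname, reltuples FROM pg_class WHERE relname IN ('tp1', 'tp2');

  -- the per-relation "rows" estimate reflects the quals
  EXPLAIN SELECT * FROM tp1 WHERE b = 3;   -- expect rows= far below reltuples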

regards, tom lane




Re: planner chooses incremental but not the best one

2023-12-17 Thread Tomas Vondra



On 12/15/23 09:58, Richard Guo wrote:
> 
> On Thu, Dec 14, 2023 at 6:02 PM Richard Guo wrote:
> 
> It seems that we need to improve estimate of distinct values in
> estimate_num_groups() when taking the selectivity of restrictions into
> account.
> 
> In 84f9a35e3 we changed to a new formula to perform such estimation.
> But that does not apply to the case here, because for an appendrel,
> set_append_rel_size() always sets "raw tuples" count equal to "rows",
> and that would make estimate_num_groups() skip the adjustment of the
> estimate using the new formula.
> 
> 
> I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
> Why don't we set it to the accumulated estimate of tuples from each live
> child, like attached?  I believe this aligns more closely with reality.
> 

Yeah, seems like that's the right thing to do. FWIW I've been often
confused by these fields, because we use tuples and rows as synonyms,
but in this particular case that's not the same. I wonder if this is
just a manifestation of this confusion.

> And this would also allow us to adjust the estimate for the number of
> distinct values in estimate_num_groups() for appendrels using the new
> formula introduced in 84f9a35e3.

I don't follow. Why wouldn't it be using the new formula even without
your patch? (using the "wrong" value, ofc, but the same formula).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: brininsert optimization opportunity

2023-12-17 Thread Tomas Vondra


On 12/13/23 09:12, Soumyadeep Chakraborty wrote:
> On Tue, Dec 12, 2023 at 3:25 AM Tomas Vondra
>  wrote:
> 
> 
>> The attached 0001 patch fixes this by adding the call to validate_index,
> which seems like the proper place as it's where the indexInfo is
> allocated and destroyed.
> 
> Yes, and by reading validate_index's header comment, there is a clear
> expectation that we will be adding tuples to the index in the table AM call
> table_index_validate_scan (albeit "validating" doesn't necessarily convey this
> side effect). So, it makes perfect sense to call it here.
> 

OK

> 
>> But this makes me think - are there any other places that might call
>> index_insert without then also doing the cleanup? I did grep for the
>> index_insert() calls, and it seems OK except for this one.
>>
>> There's a call in toast_internals.c, but that seems OK because that only
>> deals with btree indexes (and those don't need any cleanup). The same
>> logic applies to unique_key_recheck(). The rest goes through
>> execIndexing.c, which should do the cleanup in ExecCloseIndices().
>>
>> Note: We should probably call the cleanup even in the btree cases, even
>> if only to make it clear it needs to be called after index_insert().
> 
> Agreed. Doesn't feel great, but yeah, all of this btree-specific code does
> call through index_* functions instead of calling btree functions directly.
> So, ideally we should follow through with that pattern and call the cleanup
> explicitly. But we are introducing per-tuple overhead that is totally wasted.

I haven't tried but I very much doubt this will be measurable. It's just
a trivial check if a pointer is NULL. We do far more expensive stuff in
this code path.

> Maybe we can add a comment instead like:
> 
> void
> toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
> {
> int i;
> 
> /*
> * Save a few cycles by choosing not to call index_insert_cleanup(). Toast
> * indexes are btree, which don't have an aminsertcleanup() anyway.
> */
> 
> /* Close relations and clean up things */
> ...
> }
> 
> And add something similar for unique_key_recheck()? That said, I am also not
> opposed to just accepting these wasted cycles, if the commenting seems wonky.
> 

I really don't want to do this sort of stuff unless we know it actually
saves something.

>> I propose we do a much simpler thing instead - allow the cache may be
>> initialized / cleaned up repeatedly, and make sure it gets reset at
>> convenient place (typically after index_insert calls that don't go
>> through execIndexing). That'd mean the cleanup does not need to happen
>> very far from the index_insert(), which makes the reasoning much easier.
>> 0002 does this.
> 
> That kind of goes against the ethos of the ii_AmCache, which is meant
> to capture state to be used across multiple index inserts.

Why would it be against the ethos? The point is that we reuse stuff over
multiple index_insert() calls. If we can do that for all inserts, cool.
But if that's difficult, it's maybe better to cache for smaller batches
of inserts (instead of making it more complex for all index AMs, even
those not doing any caching).

> Also, quoting the current docs:
> 
> "If the index AM wishes to cache data across successive index insertions
> within an SQL statement, it can allocate space
> in indexInfo-ii_Context and store a pointer to the
> data in indexInfo-ii_AmCache (which will be NULL
> initially). After the index insertions complete, the memory will be freed
> automatically. If additional cleanup is required (e.g. if the cache contains
> buffers and tuple descriptors), the AM may define an optional function
> indexinsertcleanup, called before the memory is released."
> 
> The memory will be freed automatically (as soon as ii_Context goes away). So,
> why would we explicitly free it, like in the attached 0002 patch? And the 
> whole
> point of the cleanup function is to do something other than freeing memory, as
> the docs note highlights so well.
> 

Not sure I follow. The whole reason for having the index_insert_cleanup
callback is we can't rely on ii_Context going away, exactly because that
just throws away the memory and we need to release buffers etc.

The only reason why the 0002 patch does pfree() is that it clearly
indicates whether the cache is initialized. We could have a third state
"allocated but not initialized", but that doesn't seem worth it.

If you're saying the docs are misleading in some way, then maybe we need
to clarify that.

> Also, the 0002 patch does introduce per-tuple function call overhead in
> heapam_index_validate_scan().
> 

How come? The cleanup is done only once after the scan completes. Isn't
that exactly what we want to do?

> Also, now we have split brain...in certain situations we want to call
> index_insert_cleanup() per index insert and in certain cases we would like it
> called once for all inserts. Not very easy to understand IMO.
> 
> Finally, I don't think that any index AM would 

Re: Is a clearer memory lifespan for outerTuple and innerTuple useful?

2023-12-17 Thread Nikita Malakhov
Hi!

Maybe, the alternative way is using a separate kind of context, say name it
'ToastContext' for all custom data related to Toasted values? What do you
think?

On Sun, Dec 17, 2023 at 4:52 PM Andy Fan  wrote:

>
> Andy Fan  writes:
>
> > Andy Fan  writes:
> >
> >> ...,  I attached the 2 MemoryContext in
> >> JoinState rather than MergeJoinState, which is for the "shared detoast
> >> value"[0] more or less.
> >>
>
> In order to delimit the scope of this discussion, I attached the 2
> MemoryContext to MergeJoinState. Since the code was written by Tom in
> 2005, I added Tom to the cc-list.
>
>
> --
> Best Regards
> Andy Fan
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: XID formatting and SLRU refactorings

2023-12-17 Thread Thomas Munro
On Tue, Nov 28, 2023 at 11:14 PM Heikki Linnakangas  wrote:
> I think it's pretty sloppy that the "short" filenames can be 4, 5 or 6
> chars long. For pg_multixact/members, which introduced the 5-char case,
> I think we should always pad the filenames 5 characters, and for
> commit_ts which introduced the 6 char case, always pad to 6 characters.
>
> Instead of a "long_segment_names" boolean, how about an integer field,
> to specify the length.
>
> That means that we'll need pg_upgrade to copy pg_multixact/members files
> under the new names. That should be pretty straightforward.

Do you think it could be useful if the file names were not sequential
numbers ..., ...0001, ...0002 but instead used the 64 bit 'index'
number for the contained data?  In the cases where the index is an
fxid, such as pg_xact, pg_serial etc that seems easy to grok, and for
the multixacts or notify it's a bit more arbitrary but that's not
worse (and it is perhaps even meaningful, number of multixacts etc).
For example, pg_serial holds a uint64_t for every xid, so that's 32768
= 0x8000 xids per 256kB file, and you might see the following files on
disk:


8000
0001

... so that it's very clear what fxid ranges are being held.  It might
also make the file unlinking logic more straightforward in the
non-modulo cases (not sure).  Of course you can work it out with
simple arithmetic but I wonder if human administrators who don't have
a C-level understanding of PostgreSQL would find this scheme more
cromulent when trying to understand, quickly, whether the system is
retaining expected data.

(Assuming we actually get the indexes to be 64 bit in the first place.
I started thinking/hacking around how to do that for the specific case of
pg_serial because it's [by its own admission] a complete mess right now,
and I was investigating its disk usage, see nearby thread, but then
I found my way here and realised I'm probably duplicating work that's
already been/being done so I'm trying to catch up here... forgive me if
the above was already covered, so many messages...)




Re: JSON Path and GIN Questions

2023-12-17 Thread Tom Lane
"David E. Wheeler"  writes:
> Hey Tom, are you still thinking about adding this bit to the docs? I took a 
> quick look at master and didn’t see it there.

I'd waited because the discussion was still active, and then it
kind of slipped off the radar.  I'll take another look and push
some form of what I suggested.  That doesn't really address the
jsonpath oddities you were on about, though.

regards, tom lane




Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-17 Thread Michail Nikolaev
Hello!

> I've thought about alternative solutions, too: how about getting a new 
> snapshot every so often?
> We don't really care about the liveness of the already-scanned data; the 
> snapshots used for RIC
> are used only during the scan. C/RIC's relation's lock level means vacuum 
> can't run to clean up
> dead line items, so as long as we only swap the backend's reported snapshot 
> (thus xmin) while
> the scan is between pages we should be able to reduce the time C/RIC is the 
> one backend
> holding back cleanup of old tuples.

Hm, it looks like an interesting idea! It may be more dangerous, but
at least it feels much more elegant than an LP_DEAD-related way.
Also, feels like we may apply this to both phases (first and the second scans).
The original patch (1) was helping only to the second one (after call
to set_indexsafe_procflags).

But for the first scan we allowed doing so only for non-unique indexes
because of:

> * The reason for doing that is to avoid
> * bogus unique-index failures due to concurrent UPDATEs (we might see
> * different versions of the same row as being valid when we pass over them,
> * if we used HeapTupleSatisfiesVacuum).  This leaves us with an index that
> * does not contain any tuples added to the table while we built the index.

Also, (1) was limited to indexes without expressions and predicates
(2) because such expressions may execute queries against other tables (sic!).
One possible solution is to add some checks to make sure no
user-defined functions are used.
But as far as I understand, it affects only CIC for now and does not
affect the ability to use the proposed technique (updating the snapshot
from time to time).

However, I think we need some more or less formal proof that it is safe - it
is really challenging to keep all the possible cases in one's head. I’ll
try to do something here.
Another possible issue may be caused by the new locking pattern - we
will be required to wait for all transactions started before the end
of the phase to exit.

[1]: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql
[2]: 
https://www.postgresql.org/message-id/flat/CAAaqYe_tq_Mtd9tdeGDsgQh%2BwMvouithAmcOXvCbLaH2PPGHvA%40mail.gmail.com#cbe3997b75c189c3713f243e25121c20




GIN-Indexable JSON Patterns

2023-12-17 Thread David E. Wheeler
Hey Hackers,

Quick follow-up to my slew of questions back in [September][1]. I wanted to 
update [my patch][2] to note that only JSON Path equality operators are 
supported by indexes, as [previously discussed][3]. I thought perhaps adding a 
note to this bit of the docs would be useful:

> For these operators, a GIN index extracts clauses of the form accessors_chain 
> = constant out of the jsonpath pattern, and does the index search based on 
> the keys and values mentioned in these clauses. The accessors chain may 
> include .key, [*], and [index] accessors. The jsonb_ops operator class also 
> supports .* and .** accessors, but the jsonb_path_ops operator class does not.

But perhaps that’s what `accessors_chain = constant` is supposed to mean? I’m 
not super clear on it, though, since the operator is `==` and not `=` (and I 
would presume that `!=` would use the index as well). Is that correct?
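
One way to check is to compare the plans directly (a hedged sketch, assuming
the api table and its GIN index from the JSON indexing docs):

  EXPLAIN SELECT jdoc FROM api WHERE jdoc @@ '$.tags[*] == "qui"';
  EXPLAIN SELECT jdoc FROM api WHERE jdoc @@ '$.tags[*] != "qui"';
  -- whichever form shows a Bitmap Index Scan on the GIN index is one the
  -- clause-extraction machinery can actually use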

If so, how would you feel about something like this?

--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -513,7 +513,7 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ 
'$.tags[*] == "qui"';
 
 For these operators, a GIN index extracts clauses of the form
 accessors_chain
-= constant out of
+== constant out of
 the jsonpath pattern, and does the index search based on
 the keys and values mentioned in these clauses.  The accessors chain
 may include .key,
@@ -522,6 +522,9 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ 
'$.tags[*] == "qui"';
 The jsonb_ops operator class also
 supports .* and .** accessors,
 but the jsonb_path_ops operator class does not.
+Only the == and != SQL/JSON Path Operators
+can use the index.
   
 
   

Best,

David

  [1]: 
https://www.postgresql.org/message-id/15dd78a5-b5c4-4332-acfe-55723259c...@justatheory.com
  [2]: https://commitfest.postgresql.org/45/4624/
  [3]: 
https://www.postgresql.org/message-id/973d6495-cf28-4d06-7d46-758bd2615...@xs4all.nl
  [4]: https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING





Re: JSON Path and GIN Questions

2023-12-17 Thread David E. Wheeler
On Sep 17, 2023, at 18:09, David E. Wheeler  wrote:

> I think this is useful, but also that it’s worth calling out explicitly that 
> functions do not count as indexable operators. True by definition, of course, 
> but I at least had assumed that since an operator is, in a sense, syntax 
> sugar for a function call, they are in some sense the same thing.
> 
> A header might be useful, something like “What Counts as an indexable 
> expression”.
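
To illustrate the quoted distinction (a hedged example, assuming the api
table from the JSON indexing docs): the operator form can be matched to a
GIN index, while a semantically similar function call cannot.

  EXPLAIN SELECT jdoc FROM api WHERE jdoc @> '{"company": "Magnafone"}';
  -- operator: candidate for a Bitmap Index Scan on the GIN index
  EXPLAIN SELECT jdoc FROM api
   WHERE jsonb_path_exists(jdoc, '$.company ? (@ == "Magnafone")');
  -- function call: nothing for the index machinery to match, expect a Seq Scan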

Hey Tom, are you still thinking about adding this bit to the docs? I took a 
quick look at master and didn’t see it there.

Thanks,

David





Re: How to get started with contribution

2023-12-17 Thread Tomas Vondra
On 12/17/23 17:34, Julien Rouhaud wrote:
> Hi,
> 
> On Sun, Dec 17, 2023 at 03:09:10PM +, Sukhbir Singh wrote:
>>
>> I am Sukhbir Singh, a B.Tech undergraduate, and I am in my pre-final year at
>> Punjab Engineering College (PEC). I am new to open source but know Python,
>> PostgreSQL, C, Javascript and node.js. I would love to contribute to your
>> organization, but could you please tell me how to get started?
> 
> You can look at
> https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F, it's a 
> good
> starting point with many links to additional resources.
> 

Yeah, there's also a Developer FAQ (linked from that wiki page) [1],
discussing various steps in the development process in more detail.

But I think all of this ignores a step that's fairly hard for new
contributors - finding what to work on. Without that, knowing the
technical stuff is a bit pointless :-(

Sukhbir, do you have any idea what you'd like to work on? General area
of interest, tool you'd be interested in, etc.? Maybe take a look at
what other people work on in the commitfest app [2], and review a couple
patches that you find interesting.

That'll allow you to set up the environment and get familiar with the core
and with various development tasks (running tests, ...).

If you get stuck, ask in this thread and we'll try to help you.



[1] https://wiki.postgresql.org/wiki/Developer_FAQ

[2] https://commitfest.postgresql.org/46/


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Report planning memory in EXPLAIN ANALYZE

2023-12-17 Thread Alvaro Herrera
OK, I propose the following further minor tweaks.  (I modified the docs
following the wording we have for COSTS and BUFFERS).

There are two things that still trouble me a bit.  First, we assume that
the planner is using an AllocSet context, which I guess is true, but if
somebody runs the planner in a context of a different memcxt type, it's
going to be a problem.  So far we don't have infrastructure for creating
a context of the same type as another context.  Maybe it's too fine a
point to worry about, for sure.

The other question is about trying to support the EXPLAIN EXECUTE case.
Do you find that case really useful?  In a majority of cases planning is
not going to happen because it was already done by PREPARE (where we
_don't_ report memory, because we don't have EXPLAIN there), so it seems
a bit weird.  I suppose you could make it useful if you instructed the
user to set plan_cache_mode to custom, assuming that does actually work
(I didn't try).
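
For what it's worth, that case would look something like this (a hedged
sketch: MEMORY is the option name as spelled in the patch, and
plan_cache_mode is used to force a fresh plan on each execution):

  PREPARE q AS SELECT * FROM pg_class WHERE relname = $1;
  SET plan_cache_mode = force_custom_plan;   -- replan at every EXECUTE
  EXPLAIN (MEMORY) EXECUTE q('pg_type');
  -- expected to emit a "Planner Memory: used=... allocated=..." line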

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El hombre nunca sabe de lo que es capaz hasta que lo intenta" (C. Dickens)
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 49c61cce69..26809d68d5 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -256,8 +256,9 @@ ROLLBACK;
 
  
   Include information on memory consumption by the query planning phase.
-  Report the precise amount of storage used by planner in-memory
-  structures, and total memory considering allocation overhead.
+  Specifically, include the precise amount of storage used by planner
+  in-memory structures, as well as total memory considering allocation
+  overhead.
   The parameter defaults to FALSE.
  
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 0a18a7abd8..2fa93998a3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -660,7 +660,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, 
ExplainState *es,
ExplainPropertyFloat("Planning Time", "ms", 1000.0 * plantime, 
3, es);
}
 
-   if (mem_counts)
+   if (es->memory && mem_counts != NULL)
{
ExplainOpenGroup("Planner Memory", "Planner Memory", true, es);
show_planner_memory(es, mem_counts);
@@ -3813,9 +3813,9 @@ show_planner_memory(ExplainState *es,
if (es->format == EXPLAIN_FORMAT_TEXT)
{
appendStringInfo(es->str,
-"Planner Memory: used=%zu 
bytes allocated=%zu bytes",
-mem_counts->totalspace - 
mem_counts->freespace,
-mem_counts->totalspace);
+"Planner Memory: used=%lld 
bytes allocated=%lld bytes",
+(long long) 
(mem_counts->totalspace - mem_counts->freespace),
+(long long) 
mem_counts->totalspace);
appendStringInfoChar(es->str, '\n');
}
else
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 7e947e023b..e1577c791a 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -590,9 +590,12 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause 
*into, ExplainState *es,
if (es->memory)
{
/*
-* Create a new memory context to measure planner's memory 
consumption
-* accurately. We should use the same type of memory context as 
the
-* planner would use. That's usually AllocSet but ensure that.
+* In order to measure planner's memory consumption accurately, 
create
+* a separate AllocSet memory context.
+*
+* XXX if planner is called under some other memory context 
type, this
+* code overrides that.  Do we need support to create context of
+* programmatically determined type?
 */
Assert(IsA(CurrentMemoryContext, AllocSetContext));
planner_ctx = AllocSetContextCreate(CurrentMemoryContext,


Re: How to get started with contribution

2023-12-17 Thread Julien Rouhaud
Hi,

On Sun, Dec 17, 2023 at 03:09:10PM +, Sukhbir Singh wrote:
>
> I am Sukhbir Singh, a B.Tech undergraduate, and I am in my pre-final year at
> Punjab Engineering College (PEC). I am new to open source but know Python,
> PostgreSQL, C, Javascript and node.js. I would love to contribute to your
> organization, but could you please tell me how to get started?

You can look at
https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F, it's a good
starting point with many links to additional resources.




How to get started with contribution

2023-12-17 Thread Sukhbir Singh
Respected sir/madam,
I am Sukhbir Singh, a B.Tech undergraduate, and I am in my pre-final year at 
Punjab Engineering College (PEC). I am new to open source but know Python, 
PostgreSQL, C, Javascript and node.js. I would love to contribute to your 
organization, but could you please tell me how to get started?
Hoping to hear from you soon.
Regards
Sukhbir


Re: XID formatting and SLRU refactorings

2023-12-17 Thread Alexander Korotkov
On Sun, Dec 17, 2023 at 1:48 AM Thomas Munro  wrote:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=loach&dt=2023-12-16%2005%3A25%3A18
>
> TRAP: failed Assert("epoch > 0"), File: "twophase.c", Line: 969, PID: 71030
> 0xa8edcd  at
> /usr/home/pgbf/buildroot/HEAD/pgsql.build/tmp_install/home/pgbf/buildroot/HEAD/inst/bin/postgres
> 0x613863  at
> /usr/home/pgbf/buildroot/HEAD/pgsql.build/tmp_install/home/pgbf/buildroot/HEAD/inst/bin/postgres
>
> That's the new assertion from 5a1dfde8:
>
> + * The wrap logic is safe here because the span of active xids cannot
> exceed one
> + * epoch at any given time.
> ...
> +   if (unlikely(xid > nextXid))
> +   {
> +   /* Wraparound occured, must be from a prev epoch. */
> +   Assert(epoch > 0);

Thank you for noticing this.  I did some investigation.
AdjustToFullTransactionId() uses TransamVariables->nextXid to convert a
TransactionId into a FullTransactionId.  However,
ProcArrayApplyRecoveryInfo() first checks two-phase transactions and then
updates TransamVariables->nextXid.  Please see the draft patch fixing
this.  I'll check further whether it has any side effects.

--
Regards,
Alexander Korotkov


fix_ProcArrayApplyRecoveryInfo_update_nextXid.patch
Description: Binary data


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-17 Thread Hayato Kuroda (Fujitsu)
Dear Thomas, Alexander,

> 17.12.2023 07:02, Thomas Munro wrote:
> > FYI fairywren failed in this test:
> >
> >
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2023-1
> 2-16%2022%3A03%3A06
> >
> > ===8<===
> > Restoring database schemas in the new cluster
> > *failure*
> >
> > Consult the last few lines of
> >
> "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgr
> ade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgra
> de_output.d/20231216T221418.035/log/pg_upgrade_dump_1.log"
> > for
> > the probable cause of the failure.
> > Failure, exiting
> > [22:14:34.598](22.801s) not ok 10 - run of pg_upgrade of old cluster
> > [22:14:34.600](0.001s) #   Failed test 'run of pg_upgrade of old cluster'
> > #   at
> C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/
> 003_logical_slots.pl
> > line 177.
> > ===8<===
> >
> > Without that log it might be hard to figure out what went wrong though :-/
> >
> 
> Yes, but most probably it's the same failure as
>  

Thanks for reporting. Yes, it has been already reported by me [1], and the 
server
log was provided by Andrew [2]. The issue was that a file creation was failed
because the same one was unlink()'d just before but it was in 
STATUS_DELETE_PENDING
status. Kindly Alexander proposed a fix [3] and it looks good to me, but
confirmations by senior and windows-friendly developers are needed to move 
forward.
(at first we thought the issue was solved by updating, but it was not correct)

I know that you have developed there region, so I'm very happy if you check the
forked thread.

[1]: 
https://www.postgresql.org/message-id/flat/TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: 
https://www.postgresql.org/message-id/TYAPR01MB5866A4E7342088E91362BEF0F5BBA%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[3]: 
https://www.postgresql.org/message-id/976479cf-dd66-ca19-f40c-5640e30700cb%40gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Is a clearer memory lifespan for outerTuple and innerTuple useful?

2023-12-17 Thread Andy Fan

Andy Fan  writes:

> Andy Fan  writes:
>
>> ...,  I attached the 2 MemoryContext in
>> JoinState rather than MergeJoinState, which is for the "shared detoast
>> value"[0] more or less.  
>>

In order to delimit the scope of this discussion, I attached the 2
MemoryContext to MergeJoinState. Since the code was written by Tom in
2005, I added Tom to the cc-list. 

From 20068bdda00716da712fb3f1e554401e81924e19 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" 
Date: Sun, 17 Dec 2023 21:41:34 +0800
Subject: [PATCH v2 1/1] Use MemoryContext instead of ExprContext for
 nodeMergejoin.c

MergeJoin needs to store the values in MergeJoinClause for a longer
lifespan. In the past it used a dedicated ExprContext, but a plain
MemoryContext should be OK. This patch does that.
---
 src/backend/executor/nodeMergejoin.c | 29 +---
 src/include/nodes/execnodes.h|  5 +++--
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 3cdab77dfc..4d45305482 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -294,7 +294,7 @@ MJExamineQuals(List *mergeclauses,
 static MJEvalResult
 MJEvalOuterValues(MergeJoinState *mergestate)
 {
-	ExprContext *econtext = mergestate->mj_OuterEContext;
+	ExprContext *econtext = mergestate->js.ps.ps_ExprContext;
 	MJEvalResult result = MJEVAL_MATCHABLE;
 	int			i;
 	MemoryContext oldContext;
@@ -303,9 +303,9 @@ MJEvalOuterValues(MergeJoinState *mergestate)
 	if (TupIsNull(mergestate->mj_OuterTupleSlot))
 		return MJEVAL_ENDOFJOIN;
 
-	ResetExprContext(econtext);
+	MemoryContextReset(mergestate->mj_outerTuple_memory);
 
-	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+	oldContext = MemoryContextSwitchTo(mergestate->mj_outerTuple_memory);
 
 	econtext->ecxt_outertuple = mergestate->mj_OuterTupleSlot;
 
@@ -341,7 +341,7 @@ MJEvalOuterValues(MergeJoinState *mergestate)
 static MJEvalResult
 MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
 {
-	ExprContext *econtext = mergestate->mj_InnerEContext;
+	ExprContext *econtext = mergestate->js.ps.ps_ExprContext;
 	MJEvalResult result = MJEVAL_MATCHABLE;
 	int			i;
 	MemoryContext oldContext;
@@ -350,9 +350,9 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
 	if (TupIsNull(innerslot))
 		return MJEVAL_ENDOFJOIN;
 
-	ResetExprContext(econtext);
+	MemoryContextReset(mergestate->mj_innerTuple_memory);
 
-	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+	oldContext = MemoryContextSwitchTo(mergestate->mj_innerTuple_memory);
 
 	econtext->ecxt_innertuple = innerslot;
 
@@ -1447,6 +1447,7 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
 	TupleDesc	outerDesc,
 innerDesc;
 	const TupleTableSlotOps *innerOps;
+	ExprContext *econtext;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -1471,13 +1472,19 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
 	 */
 	ExecAssignExprContext(estate, &mergestate->js.ps);
 
+	econtext = mergestate->js.ps.ps_ExprContext;
+
 	/*
-	 * we need two additional econtexts in which we can compute the join
-	 * expressions from the left and right input tuples.  The node's regular
-	 * econtext won't do because it gets reset too often.
+	 * we need two additional contexts in which we can compute the join
+	 * expressions from the left and right input tuples.  The econtext's
+	 * regular ecxt_per_tuple_memory won't do because it gets reset too often.
 	 */
-	mergestate->mj_OuterEContext = CreateExprContext(estate);
-	mergestate->mj_InnerEContext = CreateExprContext(estate);
+	mergestate->mj_outerTuple_memory = AllocSetContextCreate(econtext->ecxt_per_query_memory,
+			 "OuterTupleCtx",
+			 ALLOCSET_SMALL_SIZES);
+	mergestate->mj_innerTuple_memory = AllocSetContextCreate(econtext->ecxt_per_query_memory,
+			 "InnerTupleCtx",
+			 ALLOCSET_SMALL_SIZES);
 
 	/*
 	 * initialize child nodes
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5d7f17dee0..e8af959c01 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2064,8 +2064,9 @@ typedef struct MergeJoinState
 	TupleTableSlot *mj_MarkedTupleSlot;
 	TupleTableSlot *mj_NullOuterTupleSlot;
 	TupleTableSlot *mj_NullInnerTupleSlot;
-	ExprContext *mj_OuterEContext;
-	ExprContext *mj_InnerEContext;
+
+	MemoryContext mj_outerTuple_memory;
+	MemoryContext mj_innerTuple_memory;
 } MergeJoinState;
 
 /* 
-- 
2.34.1


-- 
Best Regards
Andy Fan