RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

2021-01-05 Thread movead...@highgo.ca
Thanks for the review, and sorry for the late reply.

>I reviewed the patch and found some problems. 
>>+ if(startSegNo != endSegNo)
>>+ else if(record->ReadRecPtr / XLOG_BLCKSZ !=
>>+ if(rmid == RM_XLOG_ID && info == XLOG_SWITCH)
>>+ if(ri == RM_XLOG_ID)
>>+ if(info == XLOG_SWITCH)
>You need to put a space after the "if".
All fixed, and thanks for pointing out the issue.

>>@@ -24,6 +24,7 @@
>>#include "common/logging.h"
>>#include "getopt_long.h"
>>#include "rmgrdesc.h"
>>+#include "catalog/pg_control.h"
>I think the include statements should be arranged in alphabetical order.
Fixed.

>>+ info = (rj << 4) & ~XLR_INFO_MASK;
>>+ if(info == XLOG_SWITCH)
>>+ XLogDumpStatsRow(psprintf("XLOG/SWITCH_JUNK"),
>>+ 0, total_count, stats->junk_size, total_rec_len,
>>+ 0, total_fpi_len, stats->junk_size, total_len);
 
>Can't be described in the same way as "XLogDumpStatsRow(psprintf("%s/%s", 
>desc->rm_name, id)..."?
>Only this part looks strange.
>Why are the "count" and "fpi_len" fields 0?
'SWITCH_JUNK' is not a real record; it depends on the 'XLOG_SWITCH' record, so I
don't think we can count 'SWITCH_JUNK' occurrences, which is why "count" is 0.
And it never contains FPI, so "fpi_len" is 0.

But a 0 value may look strange, so in the current version I show it like below:
Type                    N      (%)  Record size      (%)  FPI size     (%)  Combined size      (%)
----                    -      ---  -----------      ---  --------     ---  -------------      ---
...
XLOG/SWITCH_JUNK        -   (   -)     11006248 ( 72.26)         -  (   -)       11006248 ( 65.78)
Transaction/COMMIT     10   (0.03)          340 (  0.00)         0  (0.00)            340 (  0.00)

>I think you need to improve the duplicate output in column "XLOG/SWITCH_JUNK".
Yes, it's a bug, and it is fixed now.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca



fix_waldump_size_for_wal_switch_v6.patch
Description: Binary data


Re: Asynchronous Append on postgres_fdw nodes.

2020-11-25 Thread movead...@highgo.ca

I tested the patch and ran into several issues, as below:

Issue one:
I get an assertion failure at 'Assert(bms_is_member(i, node->as_needrequest));' in
the ExecAppendAsyncRequest() function when I use more than two foreign tables
on different foreign servers.

I looked into the code and made the following change, after which the assertion failure disappears.

@@ -1004,6 +1004,7 @@ ExecAppendAsyncResponse(AsyncRequest *areq)
 	bms_del_member(node->as_needrequest, areq->request_index);
 	node->as_asyncpending = bms_add_member(node->as_asyncpending,
 										   areq->request_index);
+	node->as_lastasyncplan = INVALID_SUBPLAN_INDEX;
 	return false;
 }

Issue two:
Then I tested and found that if I have both sync and async subplans, it runs through
the sync subplans first and only then turns to the async ones; I do not know if that is intended.

Issue three:
After the code change mentioned in issue one, I can not see any performance improvement.
When I query the partitioned table and the individual sub-partitions, the time spent on
the partitioned table is always about the same as the sum over all sub-partitions.

Sorry if I got something wrong when testing the patch.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca


Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

2020-10-16 Thread movead...@highgo.ca

>It looks to me "We can know that length by subtracting the LSN of
>XLOG_SWITCH from the next record's LSN so it doesn't add any
>information."
Sorry, maybe I missed this before.
But I think it would be better to show it explicitly.

>So the length of  in this case is:
> 
>LOC(SEG A+1) - ReadRecPtr - LEN(short header) - LEN(XLOG_SWITCH)
Thanks, I should not have missed this; it is fixed now.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


fix_waldump_size_for_wal_switch_v5.patch
Description: Binary data


Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

2020-10-16 Thread movead...@highgo.ca
Thanks for all the suggestions; a new patch is attached.

>Andres suggested that we don't need that description with per-record
>basis. Do you have a reason to do that?  (For clarity, I'm not
>suggesting that you should reving it.)
I think Andres is saying that if we only log it in xlog_desc() then we can not get
the result for the '--stats=record' case. The patch solves that problem.

>+ XLByteToSeg(record->EndRecPtr - 1, endSegNo, record->segcxt.ws_segsize);
>We use XLByteToPrevSeg instead for this purpose.
Thanks, I have followed the suggestion.

>You forgot to add a correction for short headers.
In fact, we only need to consider this when the remaining space in a WAL segment
can not hold a whole XLogRecord struct for XLOG_SWITCH. That means that if
record->ReadRecPtr is in segment A, then record->EndRecPtr is in segment (A+2),
so we need to additionally subtract the long page header size at the start of
segment (A+1). In my understanding we do not need to care about the short header;
is my understanding wrong?

>+ if(RM_XLOG_ID == rmid && XLOG_SWITCH == info)
> 
>We need a comment for the special code path.
>We don't follow this kind of convension. Rather use "variable =
>constant".
>+ {
>+ junk_len = xlog_switch_junk_len(record);
>+ stats->count_xlog_switch++;
>+ stats->junk_size += junk_len;
> 
>junk_len is used only the last line above.  We don't need that
>temporary variable.
> 
>+ * If the wal switch record spread on two segments, we should extra minus the
>This comment is sticking out of 80-column border.  However, I'm not
>sure if we have reached a conclustion about the target console-width.
>+ info = (rj << 4) & ~XLR_INFO_MASK;
>+ switch_junk_id = "XLOG/SWITCH_JUNK";
>+ if( XLOG_SWITCH == info && stats->count_xlog_switch > 0)
> 
>This line order is strange. At least switch_junk_id is used only in
>the if-then block.
Thanks, I have followed the suggestions.

 
>I'm not confindent on which is better, but I feel that this is not a
>work for display side, but for the recorder side like attached.
The patch indeed looks clearer, but the new 'OTHERS' may confuse users, and it is
hard to handle with the '--rmgr=RMGR' option. So I have not used that design in
this patch; let's see what others think.

>Sorry for the confusion, but it would be a separate topic even if we
>are going to fix that..
Sorry, I have removed the code; it makes sense to discuss it in a
separate thread.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


fix_waldump_size_for_wal_switch_v4.patch
Description: Binary data


Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

2020-10-15 Thread movead...@highgo.ca
Thanks for all the suggestions.

>Yeah.  In its current shape, it means that only pg_waldump would be
>able to know this information.  If you make this information part of
>xlogdesc.c, any consumer of the WAL record descriptions would be able
>to show this information, so it would provide a consistent output for
>any kind of tools.
I have changed the implementation and moved some code into xlog_desc().

>On top of that, it seems to me that the calculation used in the patch
>is wrong in two aspects at quick glance:
>1) startSegNo and endSegNo point always to two different segments with
>a XLOG_SWITCH record, so you should check that ReadRecPtr is not at a
>segment border instead before extracting SizeOfXLogLongPHD, no?
Yes you are right, and I use 'record->EndRecPtr - 1' instead.

>2) This stuff should also check after the case of a WAL *page* border
>where you'd need to adjust based on SizeOfXLogShortPHD instead.
The 'junk_len -= SizeOfXLogLongPHD' code covers the case where the remaining space
in a WAL segment can not hold a whole XLogRecord struct for XLOG_SWITCH; it does
not need to care about *page* borders.

>I'm not sure the exact motive of this proposal, but if we show the
>wasted length in the stats result, I think it should be other than
>existing record types.
Yes, agreed; with the new patch it now looks like below:

movead@movead-PC:/h2/pg/bin$ ./pg_waldump -p ../walbk/ -s 0/300 -e 0/600 -z
Type                    N      (%)  Record size      (%)  FPI size     (%)  Combined size      (%)
----                    -      ---  -----------      ---  --------     ---  -------------      ---
XLOG                    5 ( 31.25)          300 (  0.00)         0 (  0.00)            300 (  0.00)
XLOGSwitchJunk          3 ( 18.75)     50330512 (100.00)         0 (  0.00)       50330512 (100.00)


movead@movead-PC:/h2/pg/bin$ ./pg_waldump -p ../walbk/ -s 0/300 -e 0/600 --stat=record
XLOG/SWITCH             3 ( 18.75)           72 (  0.00)         0 (  0.00)             72 (  0.00)
XLOG/SWITCH_JUNK        3 ( 18.75)     50330512 (100.00)         0 (  0.00)       50330512 (100.00)

The remaining shortcoming is that I do not know how to handle the 'count' of SWITCH_JUNK
in the pg_waldump results. Currently I count each SWITCH_JUNK as one record.

>By the way, I noticed that --stats=record shows two lines for
>Transaction/COMMIT. The cause is that XLogDumpCountRecord assumes the
>all four bits of xl_info is used to identify record id.
Thanks, I didn't notice that before; your fix has been merged into the attached v3 patch.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


fix_waldump_size_for_wal_switch_v3.patch
Description: Binary data


Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

2020-10-12 Thread movead...@highgo.ca

Thanks for reply.

>If you wish to add more information about a XLOG_SWITCH record, I
>don't think that changing the signature of XLogDumpRecordLen() is
>adapted because the record length of this record is defined as
>Horiguchi-san mentioned upthread, and the meaning of junk_len is
>confusing here.  It seems to me that any extra information should be
>added to xlog_desc() where there should be an extra code path for
>(info == XLOG_SWITCH).  XLogReaderState should have all the
>information you are lookng for.
We use the 'junk_len' in two places: one is when we show the detailed record
information, the other is when we compute statistics on the percentage of each
kind of WAL record (with --stat=record). The second place never runs xlog_desc(),
so that is not a suitable place to do it.

I still can not understand why changing the signature of XLogDumpRecordLen() is not
suitable; maybe we could add a new function to calculate the 'junk_len' and rename
'junk_len' to 'skipped_size' or 'switched_size'?




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca



Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

2020-10-10 Thread movead...@highgo.ca

>I think that the length of the XLOG_SWITCH record is no other than 24
>bytes. Just adding the padding? garbage bytes to that length doesn't
>seem the right thing to me.
>
>If we want pg_waldump to show that length somewhere, it could be shown
>at the end of that record explicitly:
> 
>rmgr: XLOG        len (rec/tot):     24/16776848, tx:          0, lsn: 0/02000148, prev 0/02000110, desc: SWITCH, trailing-bytes: 16776944

Thanks, I think it's a good idea; a new patch is attached.

Here's how it looks:
rmgr: XLOG        len (rec/tot):     24/24, tx:          0, lsn: 0/03D8, prev 0/0360, desc: SWITCH, trailing-bytes: 16776936




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


fix_waldump_size_for_wal_switch_v2.patch
Description: Binary data


Wrong statistics for size of XLOG_SWITCH during pg_waldump.

2020-10-09 Thread movead...@highgo.ca
Hello hackers,

We know that pg_waldump can report size statistics for every kind of record. When I
use the feature I find it misses some of the size for XLOG_SWITCH records. When a user
calls pg_switch_wal(), postgres discards the remaining space in the current WAL
segment, and the pg_waldump tool misses that discarded size.

I think it would be better if pg_waldump could show this, so I have made a patch which
counts the discarded size as part of the XLOG_SWITCH record. It works both when
displaying the details of WAL records and when showing the statistics; the patch is attached.
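For anyone who wants to reproduce the observation, here is a minimal sketch (the
pg_waldump path and LSN range are placeholders, not values from my test):

-- Force a WAL segment switch; the rest of the current segment is discarded.
SELECT pg_switch_wal();
-- Then, from the shell, look at the per-record statistics, for example:
--   pg_waldump --stats=record -p $PGDATA/pg_wal -s <start_lsn> -e <end_lsn>
-- Without the patch the XLOG/SWITCH record is reported as only 24 bytes,
-- and the discarded tail of the segment is not counted anywhere.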

What's your opinion?



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


fix_waldump_size_for_wal_switch.patch
Description: Binary data


[POC]Enable tuple change partition caused by BEFORE TRIGGER

2020-08-21 Thread movead...@highgo.ca
Hello hackers,

Currently, if a BEFORE TRIGGER causes a partition change, it reports the error
'moving row to another partition during a BEFORE FOR EACH ROW trigger is not supported'
and fails to execute. I want to try to address this limitation and have made an initial
patch to get feedback from other hackers.

The implemented approach is similar to a partition change caused by an UPDATE
statement. If it's a BEFORE INSERT TRIGGER then we just need to insert the row
produced by the trigger into the new partition, and if it's a BEFORE UPDATE TRIGGER
we need to delete the old tuple and insert the row produced by the trigger into the
new partition.

In the current BEFORE TRIGGER implementation, an error is reported as soon as one
trigger's result falls outside the current partition, but I think the check should
happen after all triggers have been called; you can see the discussion in [1][2].
In the attached patch I have changed this rule: the partition constraint is checked
only once, after all BEFORE TRIGGERS have been called. If you do not agree with this
approach, I can change the implementation.

Another point is that when a BEFORE TRIGGER causes an insert into a new partition,
the BEFORE TRIGGERS on the new partition are not fired. I think that is the right
behavior; moreover, I think a partition change caused by UPDATE should not fire the
BEFORE TRIGGERS either, as discussed in [1].

[1]https://www.postgresql.org/message-id/2020082017164661079648%40highgo.ca
[2]https://www.postgresql.org/message-id/20200318210213.GA9781@alvherre.pgsql



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


enable_partition_change_by_BEFORE_TRIGGER.patch
Description: Binary data


Small doubt on update a partition when some rows need to move among partition

2020-08-20 Thread movead...@highgo.ca
Hello,

I am trying to remove the limitation that a tuple move can not be caused by a BEFORE
TRIGGER, and while doing so I ran into two points of doubt:

The first issue:
In the ExecBRUpdateTriggers() and ExecBRInsertTriggers() functions, why do we need to
check the result slot after every trigger call? Shouldn't we check the result slot
only after all triggers have been called?

For example, suppose we have a table t1(i int, j int), a BEFORE INSERT TRIGGER on t1
that does i - 1, and another BEFORE INSERT TRIGGER on t1 that does i + 1. If the first
trigger alone would cause a partition move, the insert query is interrupted, even
though the row does not end up changing partition once all triggers have been called
(a sketch of this scenario is shown below).
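Here is a minimal SQL sketch of that scenario (all table, partition, and trigger names
are made up for illustration); BEFORE row triggers fire in alphabetical order by name,
so a_sub_one runs before b_add_one:

CREATE TABLE t1 (i int, j int) PARTITION BY RANGE (i);
CREATE TABLE t1_p0 PARTITION OF t1 FOR VALUES FROM (0) TO (10);
CREATE TABLE t1_p1 PARTITION OF t1 FOR VALUES FROM (10) TO (20);

CREATE FUNCTION t1_sub_one() RETURNS trigger LANGUAGE plpgsql AS
$$BEGIN NEW.i := NEW.i - 1; RETURN NEW; END$$;
CREATE FUNCTION t1_add_one() RETURNS trigger LANGUAGE plpgsql AS
$$BEGIN NEW.i := NEW.i + 1; RETURN NEW; END$$;

CREATE TRIGGER a_sub_one BEFORE INSERT ON t1
    FOR EACH ROW EXECUTE FUNCTION t1_sub_one();
CREATE TRIGGER b_add_one BEFORE INSERT ON t1
    FOR EACH ROW EXECUTE FUNCTION t1_add_one();

-- The row is routed to t1_p1 (i = 10).  After a_sub_one the tuple has i = 9,
-- which no longer fits t1_p1, so the insert fails with "moving row to another
-- partition during a BEFORE FOR EACH ROW trigger is not supported", even
-- though b_add_one would bring it back to i = 10.
INSERT INTO t1 VALUES (10, 0);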

The second issue:
I read the code for a partition move caused by an UPDATE: it deletes the tuple from
the old partition and inserts a new tuple into the new partition. But during that
insert, the triggers on the new partition are fired, so the resulting value may be
changed again; I want to know whether that is the intended behavior. In my mind, an
insert produced by a partition move should not fire the BEFORE triggers again.


I have made an initial patch along these lines; sorry if I am missing some of the
historical discussion.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


partition_move_issues.patch
Description: Binary data


Re: [Proposal] Global temporary tables

2020-08-07 Thread movead...@highgo.ca

>I find this is the most latest mail with an attachment, so I test and reply on
>this thread, several points as below:

>1. I notice it produces new relfilenode when new session login and some
>data insert. But the relfilenode column in pg_class still the one when create
>the global temp table. I think you can try to show 0 in this area as what nail
>relation does. 
>I think getting the GTT to have a default relfilenode looks closer to the 
>existing implementation, and setting it to 0 requires extra work and has no 
>clear benefit.
>What do you think?
>I'd like to know the reasons for your suggestion.
The 'relfilenode' means the file number on disk, which is different from the OID of a
relation; the default value is wrong for a GTT, so I think it is not good to show it
in pg_class.

>2. The nail relations handle their relfilenodes by RelMapFile struct, and this
>patch use hash entry and relfilenode_list, maybe RelMapFile approach more
>understandable in my opinion. Sorry if I miss the real design for that.
>We can see the STORAGE and statistics info for the GTT, including relfilenode, 
>through view pg_gtt_relstats

postgres=# \d gtt
                 Table "public.gtt"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
 b      | integer |           |          |

postgres=# insert into gtt values(1,1);
INSERT 0 1
postgres=# select * from pg_gtt_relstats ;
 schemaname | tablename | relfilenode | relpages | reltuples | relallvisible | relfrozenxid | relminmxid
------------+-----------+-------------+----------+-----------+---------------+--------------+------------
 public     | gtt       |       16384 |        0 |         0 |             0 |          532 |          1
(1 row)

postgres=# truncate gtt;
TRUNCATE TABLE
postgres=# select * from pg_gtt_relstats ;
 schemaname | tablename | relfilenode | relpages | reltuples | relallvisible | relfrozenxid | relminmxid
------------+-----------+-------------+----------+-----------+---------------+--------------+------------
 public     | gtt       |       16387 |        0 |         0 |             0 |          533 |          1
(1 row)


I just suggested the way that seems most natural given the existing code structure;
it's up to you.


>3. I get a wrong result of pg_relation_filepath() function for global temp 
>table,
>I think it's necessaryto keep this an correct output.

postgres=# select pg_relation_filepath(oid) from pg_class where relname = 'gtt';
 pg_relation_filepath 
--
 base/13835/t3_16384
(1 row)

I didn't find anything wrong. Could you please give me a demo.

In my opinion it should show 'base/13835/t3_16387' rather than 'base/13835/t3_16384',
because the relfilenode changed to 16387 when you truncated it in step 2.

>4. In gtt_search_by_relid() function, it has not handle the missing_ok argument
>if gtt_storage_local_hash is null. There should be some comments if it's the 
>right
>code.
>This is a problem that has been fixed in global_temporary_table_v34-pg13.patch.
Sorry about that, I can not find it in the mail thread; maybe I missed something.
The thread is so long that I think it would be better to start a new one.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: [Proposal] Global temporary tables

2020-08-03 Thread movead...@highgo.ca

>Fixed in global_temporary_table_v29-pg13.patch
>Please check.

I found this to be the latest mail with an attachment, so I tested it and am replying
on this thread; several points below:

1. I notice it produces a new relfilenode when a new session logs in and inserts some
data, but the relfilenode column in pg_class is still the one from when the global
temp table was created. I think you could try to show 0 there, as the nailed
relations do.

2. The nailed relations handle their relfilenodes through the RelMapFile struct, while
this patch uses a hash entry and relfilenode_list; in my opinion the RelMapFile
approach may be more understandable. Sorry if I am missing the real design intent there.

3. I get a wrong result from the pg_relation_filepath() function for a global temp
table; I think it is necessary to keep that output correct.

4. The gtt_search_by_relid() function does not handle the missing_ok argument when
gtt_storage_local_hash is NULL. If that is intentional, there should be a comment
explaining it.

5. It's a long patch and hard to review; I think it would be pretty good if it could
be divided into several subpatches with relatively independent pieces of functionality.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: pg_resetwal --next-transaction-id may cause database failed to restart.

2020-07-19 Thread movead...@highgo.ca

>It may be OK actually; if you're doing multiple dangerous changes, you'd
>use --dry-run beforehand ... No?  (It's what *I* would do, for sure.)
>Which in turns suggests that it would good to ensure that --dry-run
>*also* emits a warning (not an error, so that any other warnings can
>also be thrown and the user gets the full picture).
Yes, that's true. I have changed the patch so that we get a warning rather than an
error when the --dry-run option is given. I have also reworked the code so it reads
more clearly.

>I think adding multiple different --force switches makes the UI more
>complex for little added value.
Yes, I feel the same, but I can't convince myself to use --force for this purpose,
because --force is meant for when something is wrong with the pg_control file. Let's
listen to other hackers' proposals.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


pg_resetwal_transaction_limit_v3.patch
Description: Binary data


Re: POC and rebased patch for CSN based snapshots

2020-07-13 Thread movead...@highgo.ca

>When checking each tuple visibility, we always have to get the CSN
>corresponding to XMIN or XMAX from CSN SLRU. In the past discussion,
>there was the suggestion that CSN should be stored in the tuple header
>or somewhere (like hint bit) to avoid the overhead by very frequehntly
>lookup for CSN SLRU. I'm not sure the conclusion of this discussion.
>But this patch doesn't seem to adopt that idea. So did you confirm that
>such performance overhead by lookup for CSN SLRU is negligible?
This patch came from postgrespro's patch, which showed good performance. I have run a
simple test on the current patch and saw no performance decline. Also, not every tuple
visibility check needs a lookup in the CSN SLRU; only XIDs larger than 'TransactionXmin'
need it. Maybe we just have not hit the cases that cause bad performance yet, so it
shows good performance for now.

>Of course I know that idea has big issue, i.e., there is no enough space
>to store CSN in a tuple header if CSN is 64 bits. If CSN is 32 bits, we may
>be able to replace XMIN or XMAX with CSN corresponding to them. But
>it means that we have to struggle with one more wraparound issue
>(CSN wraparound issue). So it's not easy to adopt that idea...

>Sorry if this was already discussed and concluded...
I think your point about storing the CSN in the tuple header is an exciting approach,
but I have not seen that discussion; could you point me to it?




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: POC and rebased patch for CSN based snapshots

2020-07-13 Thread movead...@highgo.ca

>I have doubts that I fully understood your question, but still.
>What real problems do you see here? Transaction t1 doesn't get state of
>shard2 until time at node with shard2 won't reach start time of t1.
>If transaction, that inserted B wants to know about it position in time
>relatively to t1 it will generate CSN, attach to node1 and will see,
>that t1 is not started yet.
 
>Maybe you are saying about the case that someone who has a faster data
>channel can use the knowledge from node1 to change the state at node2?
>If so, i think it is not a problem, or you can explain your idea.
Sorry, I think this was my misunderstanding of Clock-SI. At first I expected that we
could get an absolute snapshot, for example that B should not be included in the
snapshot because it happened after time t1. However, Clock-SI can not guarantee that,
and no design guarantees that at all.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
 


Re: A patch for get origin from commit_ts.

2020-07-09 Thread movead...@highgo.ca

>> I am not sure why, I can not apply your patch, and I open it
>> with vscode and shows a CRLF format, if I change the patch to
>> LF then nothing wrong.
 
>This looks like an issue in your environment, like with git's autocrlf
>or such?  rep-origin-superuser-v7.patch has no CRLF.
Yes thanks, that's my environment problem.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: A patch for get origin from commit_ts.

2020-07-09 Thread movead...@highgo.ca

>> but be careful the new patch is in Windows format now. 
>That would be surprising.  Why do you think that?
I am not sure why, but I can not apply your patch; when I open it with VS Code it
shows CRLF line endings, and if I convert the patch to LF then nothing is wrong.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
 


Re: A patch for get origin from commit_ts.

2020-07-08 Thread movead...@highgo.ca

>There is a conflict in catversion.h.  If you wish to test the patch,
>please feel free to use the attached where I have updated the
>attribute name to roident.
I think everything is OK, but be careful: the new patch is in Windows (CRLF) format now.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca



Re: pg_resetwal --next-transaction-id may cause database failed to restart.

2020-07-08 Thread movead...@highgo.ca

>> ISTM that a reasonable compromise is that if you use -x (or -c, -m, -O)
>> and the input value is outside the range supported by existing files,
>> then it's a fatal error; unless you use --force, which turns it into
>> just a warning.
 
>One potential problem is that you might be using --force for some
>other reason and end up forcing this, too. But maybe that's OK.
Yes, that's true, so I tried to add a new option to control this behavior; you can see
it in my last mail with the attachment.
 
>Perhaps we should consider the idea of having pg_resetwal create the
>relevant clog file and zero-fill it, if it doesn't exist already,
>rather than leaving that to to the DBA or the postmaster binary to do
>it. It seems like that is what people would want to happen in this
>situation.
I have considered this idea, but I think it produces files outside the postmaster's
control, so I thought it might be unacceptable and gave up on it.

In the case where we force writing an unsafe value, we could create or extend the
related files, I think. If you have any further ideas, I can work out a new patch.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca



Re: A patch for get origin from commit_ts.

2020-07-07 Thread movead...@highgo.ca

>Regarding the attribute name, I was actually considering to just use
>"roident" instead.  This is more consistent with pglogical, and that's
>also the field name we use in ReplicationState[OnDisk].  What do you
>think?
Yes, that's the right way; I can see it's 'roident' in the pg_replication_origin
catalog too.
What is your v6 patch based on? I can not apply it.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca



Re: A patch for get origin from commit_ts.

2020-07-07 Thread movead...@highgo.ca

>Cool, thanks.  I have gone through your patch in details, and updated
>it as the attached.  Here are some comments.
 
>'8000' as OID for the new function was not really random, so to be
>fair with the other patches, I picked up the first random value
>unused_oids has given me instead.
> 
>There were some indentation issues, and pgindent got that fixed.
 
>I think that it is better to use "replication origin" in the docs
>instead of just origin.  I have kept "origin" in the functions for
>now as that sounded cleaner to me, but we may consider using something
>like "reporigin" as well as attribute name.
> 
>The tests could just use tstzrange() to make sure that the timestamps
>have valid values, so I have switched to that, and did not resist to
>do the same in the existing tests.
> 
>+-- Test when it can not find the transaction
>+SELECT * FROM pg_xact_commit_timestamp_origin((:'txid_set_origin'::text::int +
>10)::text::xid) x;
>This test could become unstable, particularly if it gets used in a
>parallel environment, so I have removed it.  Perhaps I am just
>over-pessimistic here though..
> 
>As a side note, I think that we could just remove the alternate output
>of commit_ts/, as it does not really get used because of the
>NO_INSTALLCHECK present in the module's Makefile.  That would be the
>job of a different patch, so I have updated it accordingly.  Glad to
>see that you did not forget to adapt it in your own patch.
> 
>(The change in catversion.h is a self-reminder...)
Thanks for all of that; there are so many details I still need to pay attention to
when submitting a patch.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: pg_resetwal --next-transaction-id may cause database failed to restart.

2020-07-07 Thread movead...@highgo.ca

>The rationale for this interface is unclear to me.  Please explain what
>happens in each case?
>In my proposal, we'd have:
>* Bad value, no --force:
>  - program raises error, no work done.
>* Bad value with --force:
>  - program raises warning but changes anyway.
>* Good value, no --force:
>  - program changes value without saying anything
>* Good value with --force:
>  - same
You have listed all the cases; maybe you are right that it needs to raise a warning
when forcing a bad value to be written, which is missing from the patch. And I use
'--test' in the patch rather than '--force' for now; maybe that needs deeper research
and discussion.

>The rationale for this interface is convenient knowledgeable access: the
>DBA runs the program with value X, and if the value is good, then
>they're done.  If the program raises an error, DBA has a choice: either
>run with --force because they know what they're doing, or don't do
>anything because they know that they would make a mess.
Yes, that's it; in addition, the raised error can tell the DBA to input a good value.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: pg_resetwal --next-transaction-id may cause database failed to restart.

2020-07-06 Thread movead...@highgo.ca

 >ISTM that a reasonable compromise is that if you use -x (or -c, -m, -O)
>and the input value is outside the range supported by existing files,
>then it's a fatal error; unless you use --force, which turns it into
>just a warning.
I do not think '--force' is a good choice, so I added a '--test, -t' option to force
writing an unsafe value to pg_control.
Do you think that is an acceptable approach?




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


pg_resetwal_transaction_limit_v2.patch
Description: Binary data


Re: A patch for get origin from commit_ts.

2020-07-06 Thread movead...@highgo.ca

>That was fast, thanks.  I have not tested the patch, but there are
>two things I missed a couple of hours back.  Why do you need
>pg_last_committed_xact_with_origin() to begin with?  Wouldn't it be
>more simple to just add a new column to pg_last_committed_xact() for
>the replication origin?  Contrary to pg_xact_commit_timestamp() that
>should not be broken for compatibility reasons because it returns only
>one value, we don't have this problem with pg_last_committed_xact() as
>it already returns one tuple with two values.
Yes, that makes sense; changed in the new patch.
 
>+{ oid => '4179', descr => 'get commit origin of a transaction',
>A second thing is that the OID of the new function should be in the
>range 8000.., as per the policy introduced in commit a6417078.
>src/include/catalog/unused_oids can be used to pick up a value.
Thanks, that's very helpful information, and I have followed it.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca



get_origin_from_commit_ts_v5.patch
Description: Binary data


Re: A patch for get origin from commit_ts.

2020-07-05 Thread movead...@highgo.ca

>+SELECT pg_replication_origin_create('test_commit_ts: get_origin_1');
>+SELECT pg_replication_origin_create('test_commit_ts: get_origin_2');
>+SELECT pg_replication_origin_create('test_commit_ts: get_origin_3');
>
>Why do you need three replication origins to test three times the same
>pattern?  Wouldn't one be enough and why don't you check after the
>timestamp?  I would also two extra tests: one with a NULL input and an
>extra one where the data could not be found.
> 
>+   found = TransactionIdGetCommitTsData(xid, , );
>+
>+   if (!found)
>+   PG_RETURN_NULL();
> 
>This part also looks incorrect to me, I think that you should still
>return two tuples, both marked as NULL.  You can do that just by
>switching the nulls flags to true for the two values if nothing can be
>found.
Thanks for the points; I have followed them, and a new patch is attached.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


get_origin_from_commit_ts_v4.patch
Description: Binary data


Re: POC and rebased patch for CSN based snapshots

2020-07-04 Thread movead...@highgo.ca
Hello Andrey

>> I have researched your patch which is so great, in the patch only data
>> out of 'global_snapshot_defer_time' can be vacuum, and it keep dead
>> tuple even if no snapshot import at all,right?
>> 
>> I am thanking about a way if we can start remain dead tuple just before
>> we import a csn snapshot.
>> 
>> Base on Clock-SI paper, we should get local CSN then send to shard nodes,
>> because we do not known if the shard nodes' csn bigger or smaller then
>> master node, so we should keep some dead tuple all the time to support
>> snapshot import anytime.
>> 
>> Then if we can do a small change to CLock-SI model, we do not use the
>> local csn when transaction start, instead we touch every shard node for
>> require their csn, and shard nodes start keep dead tuple, and master node
>> choose the biggest csn to send to shard nodes.
>> 
>> By the new way, we do not need to keep dead tuple all the time and do
>> not need to manage a ring buf, we can give to ball to 'snapshot too old'
>> feature. But for trade off, almost all shard node need wait.
>> I will send more detail explain in few days.
>I think, in the case of distributed system and many servers it can be 
>bottleneck.
>Main idea of "deferred time" is to reduce interference between DML 
>queries in the case of intensive OLTP workload. This time can be reduced 
>if the bloationg of a database prevails over the frequency of 
>transaction aborts.
OK, there may be a performance issue there, and I have another question about Clock-SI.

For example, suppose we have three nodes, shard1 (as master), shard2 and shard3, where
(time of node1) > (time of node2) > (time of node3); see this picture:
http://movead.gitee.io/picture/blog_img_bad/csn/clock_si_question.png
As far as I understand Clock-SI, the part to the left of the blue line will be set up
as a snapshot if the master requests a snapshot at time t1. But in fact data A should
be in the snapshot and is not, and data B should be outside the snapshot but is not.
Can this scenario appear with your original patch? Or is my understanding of Clock-SI
wrong?



Re: A patch for get origin from commit_ts.

2020-07-04 Thread movead...@highgo.ca
>Thanks.  Movead, please note that the patch is waiting on author?
>Could you send an update if you think that those changes make sense?

I have made a patch as Michael Paquier described: it uses a new function to return the
transaction ID and origin, and I have added an origin-aware version of
pg_last_committed_xact() too. It now looks like below:


postgres=# SELECT txid_current() as txid \gset
postgres=# SELECT * FROM pg_xact_commit_timestamp_origin(:'txid');
           timestamp           | origin 
-------------------------------+--------
 2020-07-04 17:52:10.199623+08 |      1
(1 row)

postgres=# SELECT * FROM pg_last_committed_xact_with_origin();
 xid |           timestamp           | origin 
-----+-------------------------------+--------
 506 | 2020-07-04 17:52:10.199623+08 |      1
(1 row)

postgres=#


---
Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


get_origin_from_commit_ts_v3.patch
Description: Binary data


Re: A patch for get origin from commit_ts.

2020-06-29 Thread movead...@highgo.ca

>> A second thing is that TransactionIdGetCommitTsData() was introdued in
>> core(73c986add). It has only one caller pg_xact_commit_timestamp() which
>> passes RepOriginId as NULL, making last argument to the
>> TransactionIdGetCommitTsData() a dead code in core.
>>
>> Quick code search shows that it is getting used by pglogical (caller:
>> https://sources.debian.org/src/pglogical/2.3.2-1/pglogical_conflict.c/?hl=509#L509).
>> CCing Craig Ringer and Petr Jelinek for the inputs.
 
>Another question that has popped up when doing this review is what
>would be the use-case of adding this information at SQL level knowing
>that logical replication exists since 10?  Having dead code in the
>backend tree is not a good idea of course, so we can also have as
>argument to simplify TransactionIdGetCommitTsData().  Now, pglogical
>has pglogical_xact_commit_timestamp_origin() to get the replication
>origin with its own function so providing an extending equivalent
>returning one row with two fields would be nice for pglogical so as
>this function is not necessary.  As mentioned by Madan, the portion of
>the code using TransactionIdGetCommitTsData() relies on it for
>conflicts of updates (the first win, last win logic at quick glance).

Thanks for the explanation. The origin in commit_ts seems unused; I just wanted to know
why it appears there. It's OK to close this issue if we do not want to touch it now.

I am actually more interested in the origin in the WAL: if data comes from logical
replication or a manually set origin, then many WAL records carry a 'RepOriginId', and
the 'RepOriginId' in the 'xact' WAL record could help with filtering, yet it is the same
kind of dead code. So can you help me understand why, or the historical reason for that?




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: pg_resetwal --next-transaction-id may cause database failed to restart.

2020-06-24 Thread movead...@highgo.ca

>Yeah, the normal workaround is to create the necessary file manually in
>order to let the system start after such an operation; they are
>sometimes necessary to enable testing weird cases with wraparound and
>such.  So a total rejection to work for these cases would be unhelpful
>precisely for the scenario that those switches were intended to serve.
I think these words should appear in the pg_resetwal documentation if we decide to do
nothing about this issue.

>Maybe a better answer is to have a new switch in postmaster that creates
>any needed files (incl. producing associated WAL etc); so you'd run
>pg_resetwal -x some-value
>postmaster --create-special-stuff
>then start your server and off you go.
As shown in the documentation, the intent seems to be to require a safe input, so I
think it is better to enforce that and add an option to force writing an unsafe value
if necessary.
 
>Now maybe this is too much complication for a mechanism that really
>isn't for general consumption anyway.  I mean, if you're using
>pg_resetwal, you're already playing with fire.
Yes, that's true; I have always heard "you'd better not use pg_resetwal". But since the
tool is part of the PG code, it's better to improve it than to do nothing.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


pg_resetwal --next-transaction-id may cause database failed to restart.

2020-06-22 Thread movead...@highgo.ca
hello hackers,

When I tried to use the pg_resetwal tool to skip some transaction IDs, I found a
problem: the tool accepts any transaction ID offered with the '-x' option, but the
database may then fail to restart because it can not read a file under $PGDATA/pg_xact.
For example, if the 'NextXID' in a database is 1000 and you offer '-x 32769', the
database fails to restart.

I read the pg_resetwal documentation, and it tells me to use a 'safe value', but I
think pg_resetwal should report the problem and refuse to do its reset work when given
an unsafe value, rather than leaving it to show up when the user restarts the database.

I have made an initial patch to limit the input; it now accepts a transaction ID in two cases:
1. The transaction ID is on the same CLOG page as the 'NextXID' in pg_control.
2. The transaction ID is right at the end of a CLOG page.
With the input limited as above, the database is guaranteed to restart successfully.
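As a rough illustration of the page arithmetic behind rule 1 (assuming the default
BLCKSZ of 8192 and the two status bits CLOG keeps per transaction):

-- One pg_xact (CLOG) page covers 8192 * 4 = 32768 transaction IDs, so with
-- NextXID = 1000 (page 0) a value such as 32769 already lives on page 1,
-- whose file does not exist yet, and the server fails to start.
SELECT 8192 * 4 AS xids_per_clog_page;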

The same situation applies to the multixact and multixact-offset options, and I have
made the same change for them in the patch.

Do you think it is an issue?



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


pg_resetwal_transaction_limit.patch
Description: Binary data


Re: Global snapshots

2020-06-19 Thread movead...@highgo.ca

>> would like to know if the patch related to CSN based snapshot [2] is a
>> precursor for this, if not, then is it any way related to this patch
>> because I see the latest reply on that thread [2] which says it is an
>> infrastructure of sharding feature but I don't understand completely
>> whether these patches are related?
>I need some time to study this patch. At first sight it is different.

This patch [2] is mostly based on [3]. Because I think [1] is about 2PC and FDW, this
patch focuses on CSN only, and I detached the global snapshot part and the FDW part
from the [1] patch.

I noticed that in the [1] patch the CSN does not survive a restart, which I think may
not be the right approach; maybe that is what the last mail meant by "Needs guarantees
of monotonically increasing of the CSN in the case of an instance restart/crash etc",
so I tried to add WAL support for the CSN in this patch.

That's why this thread exist.

> [1] - 
> https://www.postgresql.org/message-id/CA%2Bfd4k4v%2BKdofMyN%2BjnOia8-7rto8tsh9Zs3dd7kncvHp12WYw%40mail.gmail.com
> [2] - https://www.postgresql.org/message-id/2020061911294657960322%40highgo.ca
[3]https://www.postgresql.org/message-id/21BC916B-80A1-43BF-8650-3363CCDAE09C%40postgrespro.ru
 


Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: POC and rebased patch for CSN based snapshots

2020-06-18 Thread movead...@highgo.ca

>You mean that the last generated CSN needs to be WAL-logged because any smaller
>CSN than the last one should not be reused after crash recovery. Right?
Yes that's it. 

>If right, that WAL-logging seems not necessary because CSN mechanism assumes
>CSN is increased monotonically. IOW, even without that WAL-logging, CSN afer
>crash recovery must be larger than that before. No?

In this patch the CSN is derived from the system time, but the system time is not
always reliable. It is also designed for a global CSN (for sharding), where a node may
depend on a CSN from another node, generated on a different machine.

So monotonicity is not guaranteed, and each node needs to keep its largest CSN in the
WAL in case of a crash.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: POC and rebased patch for CSN based snapshots

2020-06-18 Thread movead...@highgo.ca

Thanks for reply.

>AFAIU, this patch is to improve scalability and also will be helpful
>for Global Snapshots stuff, is that right?  If so, how much
>performance/scalability benefit this patch will have after Andres's
>recent work on scalability [1]?
The patch is mainly intended as infrastructure for the sharding feature; according to
my tests, performance is almost the same with and without the patch.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: POC and rebased patch for CSN based snapshots

2020-06-18 Thread movead...@highgo.ca

Thanks for reply.

>Probably it's not time to do the code review yet, but when I glanced the patch,
>I came up with one question.
>0002 patch changes GenerateCSN() so that it generates CSN-related WAL records
>(and inserts it into WAL buffers). Which means that new WAL record is generated
>whenever CSN is assigned, e.g., in GetSnapshotData(). Is this WAL generation
>really necessary for CSN?
This is designed for crash recovery: we record the most recently generated CSN in the
WAL so that a stale value will not be reused after a restart. It does not write to the
WAL every time, but only at intervals, as you can see in the CSNAddByNanosec() function.

>BTW, GenerateCSN() is called while holding ProcArrayLock. Also it inserts new
>WAL record in WriteXidCsnXlogRec() while holding spinlock. Firstly this is not
>acceptable because spinlocks are intended for *very* short-term locks.
>Secondly, I don't think that WAL generation during ProcArrayLock is good
>design because ProcArrayLock is likely to be bottleneck and its term should
>be short for performance gain.
Thanks for pointing that out, it helps me a lot; I will reconsider it.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: POC and rebased patch for CSN based snapshots

2020-06-12 Thread movead...@highgo.ca
Hello hackers,

Currently, I have made some changes based on the last version:
1. Caught up to the current commit (c2bd1fec32ab54).
2. Added regression tests and documentation.
3. Added support for switching from xid-based snapshots to CSN-based snapshots, and
the same on the standby side.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca





0001-CSN-base-snapshot.patch
Description: Binary data


0002-Wal-for-csn.patch
Description: Binary data


0003-snapshot-switch.patch
Description: Binary data


Re: proposal - function string_to_table

2020-06-04 Thread movead...@highgo.ca
+{ oid => '2228', descr => 'split delimited text',
+  proname => 'string_to_table', prorows => '1000', proretset => 't',
+  prorettype => 'text', proargtypes => 'text text',
+  prosrc => 'text_to_table' },
+{ oid => '2282', descr => 'split delimited text with null string',
+  proname => 'string_to_table', prorows => '1000', proretset => 't',
+  prorettype => 'text', proargtypes => 'text text text',
+  prosrc => 'text_to_table_null' },

I went through the patch, and everything looks good to me. But I do not know why it
needs 'text_to_table_null()'; I think it would be fine to put 'text_to_table' there
as well.
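For context, here is a small sketch of how I understand the two proposed overloads
would behave (assuming the semantics follow string_to_array()):

-- Two-argument form, backed by text_to_table:
SELECT string_to_table('a,b,,c', ',');       -- rows: 'a', 'b', '', 'c'
-- Three-argument form, backed by text_to_table_null; the extra argument
-- maps a chosen string to NULL:
SELECT string_to_table('a,b,,c', ',', '');   -- rows: 'a', 'b', NULL, 'c'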




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: A patch for get origin from commit_ts.

2020-05-13 Thread movead...@highgo.ca

>I have not thought about this matter, but it seems to me that you
>should add this patch to the upcoming commit fest for evaluation:
>https://commitfest.postgresql.org/28/ 
Thanks.

I thought about it in more detail, and I find it better to show the 'roident' rather
than the 'roname', because an old 'roident' value will be reused immediately after the
origin is dropped. A new patch is attached, with a test case and documentation.


SELECT pg_xact_commit_origin('490');
 pg_xact_commit_origin 
---
 1
(1 row)

SELECT pg_xact_commit_origin('491');
 pg_xact_commit_origin 
---
 2
(1 row)





Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


get_origin_from_commit_ts_v2.patch
Description: Binary data


A patch for get origin from commit_ts.

2020-05-11 Thread movead...@highgo.ca
Hello hackers,

I have been researching the 'origin' in PostgreSQL; it is mainly used in logical
decoding to filter out transactions from non-local sources. I notice that the 'origin'
is stored in commit_ts, so I think it should be possible to get the 'origin' of a
transaction from commit_ts.

But I can not find any code that gets the 'origin' from commit_ts; it is as if we are
producing data that nobody cares about. May I know what the purpose of the 'origin' in
commit_ts is? Do you think we should add some support for this neglected data?

For example, I add a function to get 'origin' from commit_ts:
===
postgres=# select pg_xact_commit_origin('490');
 pg_xact_commit_origin 
---
 test_origin
(1 row)

postgres=# select pg_xact_commit_origin('491');
 pg_xact_commit_origin 
---
 test_origin1
(1 row)

postgres=#
===




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


get_origin_from_commit_ts.patch
Description: Binary data


Re: recovery_target_action=pause with confusing hint

2020-04-08 Thread movead...@highgo.ca

>To cover your case, what about adding the following description?
 
>-
>There can be delay between a promotion request by users and the trigger of
>a promotion in the server. Note that pg_wal_replay_pause() succeeds
>during that delay, i.e., until a promotion is actually triggered.
>-
Yes that's it.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: recovery_target_action=pause with confusing hint

2020-04-08 Thread movead...@highgo.ca

>> we should notice it in document I think. 
>There is the following explation about the relationship the recovery
>pause and the promotion, in the document. You may want to add more
>descriptions into the docs?
>--
>If a promotion is triggered while recovery is paused, the paused
>state ends and a promotion continues.
>--

For example, we could add words like these:
If pg_wal_replay_pause() is first called during a recovery in which a promotion has
already been triggered, it reports success but does not really pause the recovery.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: A bug when use get_bit() function for a long bytea string

2020-04-06 Thread movead...@highgo.ca
Hello hackers,

After several patch revisions following the hackers' proposals, I think it is ready to
commit; can we commit it before the code freeze for PG 13?




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: A bug when use get_bit() function for a long bytea string

2020-04-06 Thread movead...@highgo.ca

>In existing releases, the SQL definitions are set_bit(bytea,int4,int4)
>and get_bit(bytea,int4) and cannot be changed to not break the API.
>So the patch meant for existing releases has to deal with a too-narrow
>int32 bit number.
 
>Internally in the C functions, you may convert that number to int64
>if you think it's necessary, but you may not use PG_GETARG_INT64
>to pick a 32-bit argument.
The input parameters of the 'set_bit()' function backed by 'byteaSetBit' have changed
to 'bytea int8 int4', but maybe the other 'set_bit()', the one for the bit type
('bitsetbit'), does not need to change. The same applies to 'get_bit()'.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: A bug when use get_bit() function for a long bytea string

2020-04-02 Thread movead...@highgo.ca

>Some comments about the set_bit/get_bit parts.
>I'm reading long_bytea_string_bug_fix_ver6_pg10.patch, but that
>applies probably to the other files meant for the existing releases
>(I think you could get away with only one patch for backpatching
>and one patch for v13, and committers will sort out how
>to deal with release branches).
Thanks for teaching me.

>byteaSetBit(PG_FUNCTION_ARGS)
>{
>bytea*res = PG_GETARG_BYTEA_P_COPY(0);
>- int32 n = PG_GETARG_INT32(1);
>+ int64 n = PG_GETARG_INT64(1);
>int32 newBit = PG_GETARG_INT32(2);
>The 2nd argument is 32-bit, not 64. PG_GETARG_INT32(1) must be used. 
>+ errmsg("index "INT64_FORMAT" out of valid range, 0.."INT64_FORMAT,
>+ n, (int64)len * 8 - 1)));
>The cast to int64 avoids the overflow, but it may also produce a
>result that does not reflect the actual range, which is limited to
>2^31-1, again because the bit number is a signed 32-bit value. 
>I believe the formula for the upper limit in the 32-bit case is:
>  (len <= PG_INT32_MAX / 8) ? (len*8 - 1) : PG_INT32_MAX;

>These functions could benefit from a comment mentioning that
>they cannot reach the full extent of a bytea, because of the size limit
>on the bit number.

Because the 2nd argument describes a bit position within the whole bytea string, an
int32 is not big enough for it. I think the change is not wrong, or have I
misunderstood your point?
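To make the point concrete, here is a hypothetical example (it builds a few hundred MB
of data, so it is only an illustration):

-- A bytea of 300,000,000 zero bytes has valid bit numbers up to
-- 300000000 * 8 - 1 = 2399999999, which is larger than 2^31 - 1, so an
-- int4 bit-number argument can never address the upper part of the value.
SELECT get_bit(decode(repeat('00', 300000000), 'hex'), 2399999999);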


>These 2 tests need to allocate big chunks of contiguous memory, so they
>might fail for lack of memory on tiny machines, and even when not failing,
>they're pretty slow to run. Are they worth the trouble?


>These 2 tests are supposed to fail in existing releases because set_bit()
>and get_bit() don't take a bigint as the 2nd argument.
>Also, the same comment as above on how much they allocate.
I have deleted the four test cases because they are not worth the memory and time, and
I added no new test cases because generating that much data takes a long time.

New patch attached.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: A bug when use get_bit() function for a long bytea string

2020-04-02 Thread movead...@highgo.ca

>Sorry about that, attached is the changed patch for PG13, and the one
>for older branches will send sooner.
A small update to the patch, and patches for all stable branches are now available.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: A bug when use get_bit() function for a long bytea string

2020-04-02 Thread movead...@highgo.ca

>I don't think this has really solved the overflow hazards.  For example,
>in binary_encode we've got
 
>resultlen = enc->encode_len(VARDATA_ANY(data), datalen);
>result = palloc(VARHDRSZ + resultlen);
 
>and all you've done about that is changed resultlen from int to int64.
>On a 64-bit machine, sure, palloc will be able to detect if the
>result exceeds what can be allocated --- but on a 32-bit machine
>it'd be possible for the size_t argument to overflow altogether.
>(Or if you want to argue that it can't overflow because no encoder
>expands the data more than 4X, then we don't need to be making this
>change at all.)
 
>I don't think there's really any way to do that safely without an
>explicit check before we call palloc.
I am sorry, I do not fully understand these words, especially what is meant by
'size_t'. I changed resultlen from int to int64 so that the error message can report
the correct value instead of '-1' or some other strange number.


 
>Why did you change the outputs from unsigned to signed?  Why didn't
>you change the dlen inputs?  I grant that we know that the input
>can't exceed 1GB in Postgres' usage, but these signatures are just
>randomly inconsistent, and you didn't even add a comment explaining
>why.  Personally I think I'd make them like
>uint64 (*encode_len) (const char *data, size_t dlen);
>which makes it explicit that the dlen argument describes the length
>of a chunk of allocated memory, while the result might exceed that.
I think that makes sense, and I have followed it.
 
>Lastly, there is a component of this that can be back-patched and
>a component that can't --- we do not change system catalog contents
>in released branches.  Cramming both parts into the same patch
>is forcing the committer to pull them apart, which is kind of
>unfriendly.
Sorry about that; attached is the changed patch for PG 13, and the one for the older
branches will be sent soon.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


long_bytea_string_bug_fix_ver6.patch
Description: Binary data


Re: recovery_target_action=pause with confusing hint

2020-04-01 Thread movead...@highgo.ca

>So I'd like to propose the attached patch. The patch changes the message
>logged when a promotion is requested, based on whether the recovery is
>in paused state or not.
It is a compromise; we should mention it in the documentation, I think.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: recovery_target_action=pause with confusing hint

2020-04-01 Thread movead...@highgo.ca

>This happens because the startup process detects the trigger file
>after pg_wal_replay_pause() succeeds, and then make the recovery
>get out of the paused state. 
Yes that is.

>It might be problematic to end the paused
>state silently? So, to make the situation less confusing, what about
>emitting a log message when ending the paused state because of
>the promotion?
But where should it be emitted? I think emitting it to the log file is not so good,
because the user will not check the log every time. BTW, why can't
CheckForStandbyTrigger() be called in a backend?



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: recovery_target_action=pause with confusing hint

2020-04-01 Thread movead...@highgo.ca
>Thanks for the explanation again! Maybe I understand your point.
Great.

>As far as I read the code, in the standby mode, the startup process
>periodically checks the trigger file in WaitForWALToBecomeAvailable().
>No?
Yes it is. 

>There can be small delay between the creation of the trigger file
>and the periodic call to CheckForStandbyTrigger() by the startup process.
>If you execute pg_wal_replay_pause() during that delay, it would suceed.
If there are a huge number of WAL segments that the standby needs to replay,
it will not be a 'small delay'.

>But you'd like to get rid of that delay completely? In other words,
>both the startup process and the backend calling pg_wal_replay_pause()
>should detect the existence of the trigger file immdiately after
>it's created?
The point I want to make is that pg_wal_replay_pause() reports success while
the startup process keeps replaying, which can be confusing. So it is better
to solve the issue.





Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: Re: recovery_target_action=pause with confusing hint

2020-04-01 Thread movead...@highgo.ca

>But, sorry,,, I failed to understand the issue that you reported, yet...
>You mean that the first call of pg_wal_replay_pause() in the step #2
>should check whether the trigger file exists or not? If so, could you
>tell me why we should do that?
Sorry about my poor English. The pg_wal_replay_pause() is executed in step 4;
I mentioned it in step 2 to explain why a lot of inserted data is needed.

>BTW, right now only the startup process is allowed to call
>CheckForStandbyTrigger(). So the backend process calling
>pg_wal_replay_pause() and PromoteIsTriggered() is not allowed to call
>CheckForStandbyTrigger(). The current logic is that the startup process
>is responsible for checking the trigger file and set the flag in the shmem
That is the point: the startup process does not call CheckForStandbyTrigger()
to check the trigger file until a pg_wal_replay_pause() or PromoteIsTriggered()
call comes, so the first call to pg_wal_replay_pause() uses a wrong
'SharedPromoteIsTriggered' value.


>if promotion is triggered. Then other processes like backend know
>whether promotion is ongoing or not from the shmem. So basically
>the backend doesn't need to check the trigger file itself.
If a backend is not allowed to call CheckForStandbyTrigger(), then we should
find another way to handle this.
In other words, if I create the trigger file during recovery, the startup
process does not know about it at all until a pg_wal_replay_pause() call comes.

In addition, although the first time I execute pg_wal_replay_pause() it reports
success, the startup process keeps replaying (it does not stop).




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: A bug when use get_bit() function for a long bytea string

2020-03-31 Thread movead...@highgo.ca

>+ int64 res,resultlen;
>It's better to have them on separate lines.
Sorry for that, done.

>-unsigned
>+int64
> hex_decode(const char *src, unsigned len, char *dst)
>Do we want to explicitly cast the return value to int64? Will build on some 
>platform crib if not done so?
>I don't know of such a platform but my knowledge in this area is not great.
I think the current change ensures nothing goes wrong.

>+ byteNo = (int)(n / 8);
>+ bitNo = (int)(n % 8);
>some comment explaining why this downcasting is safe here?
Done
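The reasoning behind that comment is roughly the following (a standalone sketch
of the idea, not the patch text):

    #include <stdint.h>

    /*
     * A bytea value is limited to 1 GB, so a bit position n is below 2^33
     * and n / 8 always fits in an int32; the downcast cannot lose data.
     */
    static void
    bit_position_to_offsets(int64_t n, int32_t *byteNo, int32_t *bitNo)
    {
        *byteNo = (int32_t) (n / 8);
        *bitNo = (int32_t) (n % 8);
    }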

>-  proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int4',
>+  proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int8',
>   prosrc => 'byteaGetBit' },
> { oid => '724', descr => 'set bit',
>-  proname => 'set_bit', prorettype => 'bytea', proargtypes => 'bytea int4 
>int4',
>+  proname => 'set_bit', prorettype => 'bytea', proargtypes => 'bytea int8 
>int4',
>   prosrc => 'byteaSetBit' },
>Shouldn't we have similar changes for following entries as well?
>{ oid => '3032', descr => 'get bit',
> proname => 'get_bit', prorettype => 'int4', proargtypes => 'bit int4',
>  prosrc => 'bitgetbit' },
>{ oid => '3033', descr => 'set bit',
>  proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4',
 > prosrc => 'bitsetbit' },
Because 'bitsetbit' and 'bitgetbit' do not have to calculate the bit size by
multiplying by 8, I think there is no need to change them.

>The tests you have added are for bytea variant which ultimately calles 
>byteaGet/SetBit(). 
>But I think we also need tests for bit variants which will ultimately call 
>bitgetbit and bitsetbit functions.
As above, there is no need to touch 'bitgetbit' and 'bitsetbit'.

>Once you address these comments, I think the patch is good for a committer. 
>So please mark the commitfest entry as such when you post the next version of 
>patch.
Thanks a lot for the detailed review again; the changed patch is attached.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


long_bytea_string_bug_fix_ver5.patch
Description: Binary data


Re: recovery_target_action=pause with confusing hint

2020-03-31 Thread movead...@highgo.ca

>> When I test the patch, I find an issue: I start a stream with 
>> 'promote_trigger_file'
>> GUC valid, and exec pg_wal_replay_pause() during recovery and as below it
>> shows success to pause at the first time. I think it use a initialize
>> 'SharedPromoteIsTriggered' value first time I exec the pg_wal_replay_pause().
 
>hm. Are you sure this is related to this patch? Could you explain the exact 
>timing? I mean log_statement = all and relevant logs.
>Most likely this is expected change by 
>https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=496ee647ecd2917369ffcf1eaa0b2cdca07c8730
> 
>My proposal does not change the behavior after this commit, only changing the 
>lines in the logs.
 
I tested it again with (92d31085e926253aa650b9d1e1f2f09934d0ddfc), and the
issue appeared again. Here is my test method, which is quite simple:
1. Set up a base backup with pg_basebackup.
2. Insert lots of data on the master so that I have enough time to execute
   pg_wal_replay_pause() when the replication starts up.
3. Configure the 'promote_trigger_file' GUC and create the trigger file.
4. Start the backup (standby), connect to it immediately, and execute
   pg_wal_replay_pause().
Then the issue appears; a test log is attached.

I mean that when I execute pg_wal_replay_pause() for the first time, nobody has
checked the trigger state via CheckForStandbyTrigger(), so it uses the initial
'SharedPromoteIsTriggered' value.
The attached patch can solve the issue.






Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca



pg_wal_replay_pause_issue.patch
Description: Binary data


postgresql-2020-04-01_100849.log
Description: Binary data


wal_insert_waring_issue

2020-03-30 Thread movead...@highgo.ca
Hello hackers,

I found a small issue in the XLogInsertRecord() function: it checks whether it
is allowed to insert a WAL record via XLogInsertAllowed(), and when refused it
reports the error "cannot make new WAL entries during recovery".

I noticed that XLogInsertAllowed() refuses to insert a WAL record not only
while in recovery, but also after a 'shutdown' checkpoint. So it would be
better to add an error message "cannot make new WAL entries during shutdown",
for which I have attached a patch.
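The idea, sketched standalone (the attached patch does this inside
XLogInsertRecord() and may test the state differently):

    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Sketch only: the two flags stand in for "recovery is in progress"
     * and "a shutdown checkpoint has already been written".
     */
    static const char *
    wal_insert_refusal_message(bool in_recovery, bool after_shutdown_checkpoint)
    {
        if (in_recovery)
            return "cannot make new WAL entries during recovery";
        if (after_shutdown_checkpoint)
            return "cannot make new WAL entries during shutdown";
        return NULL;            /* insertion is allowed */
    }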

  Best regards,


Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


wal_insert_waring_issue.patch
Description: Binary data


Re: A bug when use get_bit() function for a long bytea string

2020-03-28 Thread movead...@highgo.ca
I want to resend the mail, because the last one was in a wrong format and hard
to read.

In addition, I think 'Size' or 'size_t' depends on the platform and may not
work on a 32-bit system. So I chose 'int64' after Ashutosh's review.

>>I think the second argument indicates the bit position, which would be max 
>>bytea length * 8. If max bytea length covers whole int32, the second argument 
>>>needs to be wider i.e. int64.
Yes, it makes sense, and I have followed it.

>>I think we need a similar change in byteaGetBit() and byteaSetBit() as well.
Sorry, I think it was my mistake; it is the two functions above that should be changed.


>>Some more comments on the patch
>> struct pg_encoding
>> {
>>- unsigned (*encode_len) (const char *data, unsigned dlen);
>>+ int64 (*encode_len) (const char *data, unsigned dlen);
>>  unsigned (*decode_len) (const char *data, unsigned dlen);
>>  unsigned (*encode) (const char *data, unsigned dlen, char *res);
>>  unsigned (*decode) (const char *data, unsigned dlen, char *res);
>> Why not use return type of int64 for rest of the functions here as well?
>>  res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
>>  /* Make this FATAL 'cause we've trodden on memory ... */
>>- if (res > resultlen)
>>+ if ((int64)res > resultlen)
>>
>>if we change return type of all those functions to int64, we won't need this 
>>cast.
>I change the 'encode' function, it needs an int64 return type, but keep other 
>functions in 'pg_encoding', because I think it of no necessary reason.

>>Ok, let's leave it for a committer to decide. 
Well, I changed all of them this time, because Tom Lane supported that in the next mail.

 
>Some more review comments.
>+   int64   res,resultlen;
>We need those on separate lines, possibly.
Done

>+   byteNo = (int32)(n / BITS_PER_BYTE);
>Does it hurt to have byteNo as int64 so as to avoid a cast. Otherwise, please
>add a comment explaining the reason for the cast. The comment applies at other
>places where this change appears.
>-   int len;
>+   int64   len;
>Why do we need this change?
>int i;
It was my mistake, as described above; 'bitgetbit()/bitsetbit()' should not be
the ones changed.



>>It might help to add a test where we could pass the second argument something
>>greater than 1G. But it may be difficult to write such a test case.
>Add two test cases.
 
>+
>+select get_bit(
>+set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 1024 * 
>1024 * 1024 + 1, 0)
>+   ,1024 * 1024 * 1024 + 1);

>This bit position is still within int4.
>postgres=# select pg_column_size(1024 * 1024 * 1024 + 1); 
> pg_column_size
>
>  4   
>(1 row)

>You want something like
>postgres=# select pg_column_size(512::bigint * 1024 * 1024 * 8); 
> pg_column_size
>
>  8   
>(1 row)
I intended to test a size larger than 1 GB; I think your suggestion is better,
and I have followed it.





Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca



Re: A bug when use get_bit() function for a long bytea string

2020-03-28 Thread movead...@highgo.ca
>I think the second argument indicates the bit position, which would be max 
>bytea length * 8. If max bytea length covers whole int32, the second argument 
>>needs to be wider i.e. int64.
Yes, it makes sense, and I have followed it.

>I think we need a similar change in byteaGetBit() and byteaSetBit() as well.
Sorry, I think it was my mistake; it is the two functions above that should be changed.


> Some more comments on the patch
> struct pg_encoding
> {
>- unsigned (*encode_len) (const char *data, unsigned dlen);
>+ int64 (*encode_len) (const char *data, unsigned dlen);
>  unsigned (*decode_len) (const char *data, unsigned dlen);
>  unsigned (*encode) (const char *data, unsigned dlen, char *res);
>  unsigned (*decode) (const char *data, unsigned dlen, char *res);
> Why not use return type of int64 for rest of the functions here as well?
>  res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
>  /* Make this FATAL 'cause we've trodden on memory ... */
>- if (res > resultlen)
>+ if ((int64)res > resultlen)
>
>if we change return type of all those functions to int64, we won't need this 
>cast.
I changed the 'encode' function, since it needs an int64 return type, but kept
the other functions in 'pg_encoding' unchanged, because I saw no necessary
reason to change them.

>Ok, let's leave it for a committer to decide. 
Well, I changed all of them this time, because Tom Lane supported that in the next mail.

 
>Some more review comments.
>+   int64   res,resultlen;
>We need those on separate lines, possibly.
Done

>+   byteNo = (int32)(n / BITS_PER_BYTE);
>Does it hurt to have byteNo as int64 so as to avoid a cast. Otherwise, please
>add a comment explaining the reason for the cast. The comment applies at other
>places where this change appears.
>-   int len;
>+   int64   len;
>Why do we need this change?
>int i;
It was my mistake, as described above; 'bitgetbit()/bitsetbit()' should not be
the ones changed.



>It might help to add a test where we could pass the second argument something
>greater than 1G. But it may be difficult to write such a test case.
Add two test cases.
 
>+
>+select get_bit(
>+set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 1024 * 
>1024 * 1024 + 1, 0)
>+   ,1024 * 1024 * 1024 + 1);

>This bit position is still within int4.
>postgres=# select pg_column_size(1024 * 1024 * 1024 + 1); 
> pg_column_size
>
>  4   
>(1 row)

>You want something like
>postgres=# select pg_column_size(512::bigint * 1024 * 1024 * 8); 
> pg_column_size
>
>  8   
>(1 row)
I intended to test a size larger than 1 GB; I think your suggestion is better,
and I have followed it.




Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca



long_bytea_string_bug_fix_ver4.patch
Description: Binary data


Re: A bug when use get_bit() function for a long bytea string

2020-03-17 Thread movead...@highgo.ca

Hello, thanks for the detailed review.

>I think the second argument indicates the bit position, which would be max 
>bytea length * 8. If max bytea length covers whole int32, the second argument 
>>needs to be wider i.e. int64.
Yes, it makes sense, and I have followed it.

> Some more comments on the patch
> struct pg_encoding
> {
>- unsigned (*encode_len) (const char *data, unsigned dlen);
>+ int64 (*encode_len) (const char *data, unsigned dlen);
>  unsigned (*decode_len) (const char *data, unsigned dlen);
>  unsigned (*encode) (const char *data, unsigned dlen, char *res);
>  unsigned (*decode) (const char *data, unsigned dlen, char *res);
> Why not use return type of int64 for rest of the functions here as well?
>  res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
>  /* Make this FATAL 'cause we've trodden on memory ... */
>- if (res > resultlen)
>+ if ((int64)res > resultlen)
>
>if we change return type of all those functions to int64, we won't need this 
>cast.
I changed the 'encode' function, since it needs an int64 return type, but kept
the other functions in 'pg_encoding' unchanged, because I saw no necessary
reason to change them.

>Right now we are using int64 because bytea can be 1GB, but what if we increase
>that limit tomorrow, will int64 be sufficient? That may be unlikely in the near
>future, but nevertheless a possibility. Should we then define a new datatype
>which resolves to int64 right now but really depends upon the bytea length. I
>am not suggesting that we have to do it right now, but may be something to
>think about.
I decided to use int64 because, if we really want to increase the limit, the
same change would be needed for 'text' and 'varchar', which have the same
limit, so it may need a more general type. Hence I gave up on the idea.

> hex_enc_len(const char *src, unsigned srclen)
> {
>- return srclen << 1;
>+ return (int64)(srclen << 1);
>
>why not to convert srclen also to int64. That will also change the pg_encoding
>member definitions. But if encoded length requires int64 to fit the possibly
>values, same would be true for the source lengths. Why can't the source length
>also be int64?
>If still we want the cast, I think it should be ((int64)srclen << 1) rather
>than casting the result.
I prefer the '((int64)srclen << 1)' way.

>  /* 3 bytes will be converted to 4, linefeed after 76 chars */
>- return (srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4);
>+ return (int64)((srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4));
>similar comments as above.
 Followed.
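Put together, the two length estimators end up with roughly this shape (a
sketch of the idea only; the exact types and names follow the attached patch):

    #include <stddef.h>
    #include <stdint.h>

    static int64_t
    hex_enc_len_sketch(const char *src, size_t srclen)
    {
        (void) src;
        /* two hex digits per input byte; widen before shifting */
        return (int64_t) srclen << 1;
    }

    static int64_t
    pg_base64_enc_len_sketch(const char *src, size_t srclen)
    {
        (void) src;
        /* 3 bytes become 4, plus a linefeed after each 76 output chars */
        return (int64_t) (srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4);
    }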


>It might help to add a test where we could pass the second argument something
>greater than 1G. But it may be difficult to write such a test case.
Add two test cases.




Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


long_bytea_string_bug_fix_ver3.patch
Description: Binary data


Re: Re: A bug when use get_bit() function for a long bytea string

2020-03-12 Thread movead...@highgo.ca
Thanks for the reply.

>Why have you used size? Shouldn't we use int64?
Yes, thanks for pointing that out; I have changed the patch.
 
>If get_bit()/set_bit() accept the second argument as int32, it can not
>be used to set bits whose number does not fit 32 bits. I think we need
>to change the type of the second argument as well.
Because int32 can cover the length of bytea that PostgreSQL supports, and I
have decided to follow your next point (not using a 64-bit int for len), I
think the second argument can stay int32.

>Also, I think declaring len to be int is fine since 1G would fit an
>int, but what does not fit is len * 8, when performing that
>calculation, we have to widen the result. So, instead of changing the
>datatype of len, it might be better to perform the calculation as
>(int64)len * 8. If we use int64, we could also use INT64_FORMAT
>instead of using %ld.
I have followed this and changed the patch.
 
>Since this is a bug it shouldn't wait another commitfest, but still
>add this patch to the commitfest so that it's not forgotten.
Will do.



Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


long_bytea_string_bug_fix_ver2.patch
Description: Binary data


A bug when use get_bit() function for a long bytea string

2020-03-11 Thread movead...@highgo.ca
Hello hackers,

I found an issue with the get_bit() and set_bit() functions; here it is:

postgres=# select 
get_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0);
2020-03-12 10:05:23.296 CST [10549] ERROR:  index 0 out of valid range, 0..-1
2020-03-12 10:05:23.296 CST [10549] STATEMENT:  select 
get_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0);
ERROR:  index 0 out of valid range, 0..-1
postgres=# select 
set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1);
2020-03-12 10:05:27.959 CST [10549] ERROR:  index 0 out of valid range, 0..-1
2020-03-12 10:05:27.959 CST [10549] STATEMENT:  select 
set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1);
ERROR:  index 0 out of valid range, 0..-1
postgres=#

PostgreSQL can handle bytea values of nearly 1 GB, but here it reports an
error at 512 MB. I researched it and found that byteaSetBit() and
byteaGetBit() use an 'int32 len' to hold the bit count of the long bytea
data, and obviously 512 MB * 8 bits overflows an int32.
So I fixed it and tested successfully, as below.

postgres=# select get_bit(set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1),0);
 get_bit 
---------
       1
(1 row)

postgres=# select get_bit(set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,0),0);
 get_bit 
---------
       0
(1 row)

postgres=#
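The arithmetic behind the failure can be shown with a tiny standalone program
(illustration only, not PostgreSQL code):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        int     len = 512 * 1024 * 1024;    /* bytea length in bytes */
        int64_t bits = (int64_t) len * 8;   /* 4294967296 bits */

        printf("bit count of a 512MB bytea: %lld\n", (long long) bits);

        /*
         * Computed as a plain 32-bit "len * 8", the multiplication wraps
         * (2^32 does not fit in int32), which is why get_bit()/set_bit()
         * report "index 0 out of valid range, 0..-1" above.
         */
        return 0;
    }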



And I checked whether anything else related to bytea has this issue; several
places in the code have the same problem:

1. byteaout()
   When formatting bytea as escape, the 'len' variable should be int64, or it
   may hold an overflowed number.
2. esc_enc_len()
   Same as above: the 'len' variable should be int64, and the return type
   should be changed to int64. Because esc_enc_len() has the same call
   structure as pg_base64_enc_len() and hex_enc_len(), I want to change the
   return values of those two functions as well. The opposite function,
   esc_dec_len(), seems to have nothing wrong.
3. binary_encode() and binary_decode()
   These use an 'int32 resultlen' to accept an 'unsigned int' function return,
   which seems questionable.

I fixed everything mentioned above; the patch is attached.
What do you think about that?




Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


long_bytea_string_bug_fix.patch
Description: Binary data


Re: Re: Asynchronous Append on postgres_fdw nodes.

2020-03-10 Thread movead...@highgo.ca

>It seems to me that you are setting a path containing ".." to PGDATA.
Thanks for pointing it out.



Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: Append with naive multiplexing of FDWs

2020-01-29 Thread movead...@highgo.ca
Hello,

>It is "asynchronous append on async-capable'd postgres-fdw scans". It
>could be called as such in the sense that it is intended to be used 
>with sharding.
 Yes that's it.

 
>Did you looked at my benchmarking result upthread?  Even it gives
>significant gain even when gathering large number of tuples from
>multiple servers or even from a single server.  It is because of its
>asynchronous nature.
I mean it gains performance at first, but it hits a bottleneck as the number
of nodes increases.
For example:
   With 2 nodes, it gains 200% performance.
   With 3 nodes, it gains 300% performance.
   However,
   with 4 nodes, it still gains only 300% performance,
   with 5 nodes, it still gains only 300% performance,
   ...

 

Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: Re: Append with naive multiplexing of FDWs

2020-01-14 Thread movead...@highgo.ca
Hello

I have tested the patch with a partitioned table with several foreign
partitions living on separate data nodes. The initial testing was done
with a partitioned table having 3 foreign partitions; the test was done with
a variety of scale factors. The second test used a fixed amount of data per
data node, but the number of data nodes was increased incrementally to see
the performance impact as more nodes are added to the cluster. The third
test is similar to the initial test but with much larger data and 4 nodes.

The results summary is given below and the test script is attached:

Test ENV
Parent node:2Core 8G
Child Nodes:2Core 4G


Test one:

1.1 The partition structure is as below:

         [ ptf: (a int, b int, c varchar) ]
                   (Parent node)
              |          |          |
           [ptf1]     [ptf2]     [ptf3]
           (Node1)    (Node2)    (Node3)

The table data is partitioned across the nodes; the test is done using a
simple select query and a count aggregate, as shown below. The result
is an average over multiple executions of each query to ensure reliable
and consistent results.

①select * from ptf where b = 100;
②select count(*) from ptf;

1.2. Test Results

 For ① result:
   scale per node    master    patched    performance
   2G                7s        2s         350%
   5G                173s      63s        275%
   10G               462s      156s       296%
   20G               968s      327s       296%
   30G               1472s     494s       297%

 For ② result:
   scale per node    master    patched    performance
   2G                1079s     291s       370%
   5G                2688s     741s       362%
   10G               4473s     1493s      299%

It takes too long to test the aggregate, so that test was done with a smaller
data size.


1.3. Summary

With the table partitioned over 3 nodes, the average performance gain
across a variety of scale factors is almost 300%.


Test Two
2.1 The partition structure is as below:

         [ ptf: (a int, b int, c varchar) ]
                   (Parent node)
              |          |          |
           [ptf1]      ...       [ptfN]
           (Node1)     (...)     (NodeN)

①select * from ptf
②select * from ptf where b = 100;

This test is done with the same amount of data per node, but the table is
partitioned across N nodes. Each variation (master or patched) is tested
at least 3 times to get reliable and consistent results. The purpose of the
test is to see the impact on performance as the number of data nodes increases.

2.2 The results

For ① result (scale per node = 2G):
  node number    master    patched    performance
  2              432s      180s       240%
  3              636s      223s       285%
  4              830s      283s       293%
  5              1065s     361s       295%

For ② result (scale per node = 10G):
  node number    master    patched    performance
  2              281s      140s       201%
  3              421s      140s       300%
  4              562s      141s       398%
  5              702s      141s       497%
  6              833s      139s       599%
  7              986s      141s       699%
  8              1125s     140s       803%


Test Three

This test is similar to test one, but with much larger data and 4 nodes.

For ① result:
  scale per node    master    patched    performance
  100G              6592s     1649s      399%

For ② result:
  scale per node    master    patched    performance
  100G              35383     12363      286%

The results show it works well with much larger data.


Summary
The patch is pretty good; it works well when little data comes back to
the parent node. The patch doesn't provide a parallel FDW scan: it ensures
that child nodes can send data to the parent in parallel, but the parent can
only sequentially process the data from the data nodes.

Provided there is no performance degradation for non-FDW Append queries,
I would recommend considering this patch as an interim solution while we are
waiting for parallel FDW scan.




Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


script.tar
Description: Binary data


Re: Re: row filtering for logical replication

2019-09-25 Thread movead...@highgo.ca

>Which regression?
Apply 0001.patch through 0005.patch and then run 'make check'; there is a
failed test. When you then apply 0006.patch, the failed test disappears.

>There should be an error because you don't have a PK or REPLICA IDENTITY.
No. I have done the 'ALTER TABLE cities  REPLICA IDENTITY FULL'.

>Even if you create a PK or REPLICA IDENTITY, it won't turn this UPDATE
>into a INSERT and send it to the other node (indeed UPDATE will be
>sent however there isn't a tuple to update). Also, filter columns must
>be in PK or REPLICA IDENTITY. I explain this in documentation.
You should consider the Result2:
 On publication:
  insert into cities values('t1',1,135);
  update cities set altitude=300 where altitude=135;
  postgres=# table cities ;
   name | population | altitude 
  ------+------------+----------
   t1   |        123 |      134
   t1   |          1 |      300
  (2 rows)
  
  On subscription:
  postgres=# table cities ;
   name | population | altitude 
  ------+------------+----------
   t1   |          1 |      135

The tuple ('t1',1,135) appeared on both the publication and the subscription,
but after an update on the publication, the tuple disappeared on the
publication while nothing changed on the subscription.

The same goes for Result1; these puzzled me today and I think they will
puzzle users in the future. It should have a better design, or at least,
for example, a log message to notify users that there was a problem during
replication.

---
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
 
 
 
 


Re: Re: Email to hackers for test coverage

2019-08-28 Thread movead...@highgo.ca

On 2019-08-29 00:43, peter.eisentr...@2ndquadrant.com wrote:
 
 
> Make spaces and capitalization match surrounding code.
>That's fine, but I suggest that if you really want to make an impact in
>test coverage, look to increase function coverage rather than line
>coverage.  Or look for files that are not covered at all.
 
Thanks for pointing all of this out; I will reconsider my approach to the
code coverage work.

-- 
Movead


Re: Re: Email to hackers for test coverage

2019-08-28 Thread movead...@highgo.ca

On Wed, 28 Aug 2019 11:30:23 +0800 mich...@paquier.xyz wrote
>- numeric is not a test suite designed for sorting, and hijacking it
>to do so it not a good approach.
Absolutely agreed. We can wait for replies from
Robert and Peter G.

>- it would be good to get coverage for the two extra code paths while
>we are on it with NULL datums.
I think the remaining untested line in ApplySortComparator() can never be
reached, and I explained that in the last mail.
Or do I not fully understand what you mean?
 
>- There is no need for two INSERT queries, I am getting the same
>coverage with only one.
Yes, it is my mistake; the key point is the 'ORDER BY n1 DESC', not the
'INSERT'. In fact it can reach the code line without either of the two INSERTs.

--
Movead


Re: Re: Email to hackers for test coverage

2019-08-27 Thread movead...@highgo.ca

On Tue, 27 Aug 2019 14:07:48 +0800 mich...@paquier.xyz wrote:
> There is a section in float4.sql which deals with overflow and
> underflow, so wouldn't it be better to move the tests there?  You 
> could just trigger the failures with that: 
> =# insert into float4_tbl values ('-10e-70'::float8); 
> ERROR:  22003: value out of range: underflow 
> LOCATION:  check_float4_val, float.h:145 
> =# insert into float4_tbl values ('-10e70'::float8); 
> ERROR:  22003: value out of range: overflow 
> LOCATION:  check_float4_val, float.h:140 
> I would also test all four patterns: 10e70, 10e-70, -10e70, -10e-70.
 
I think your way is much better, so I have changed the patch; it is now
'regression_20190827.patch'.

> For the numeric part, this improves the case of
> ApplySortAbbrevFullComparator() where both values are not NULL.  Could
> things be done so as the other code paths are fully covered?  One
> INSERT is fine by the way to add the extra coverage.
There are code lines related to NULL values in
ApplySortAbbrevFullComparator(), but I think the code lines for
comparing a NULL and a NOT NULL value can never be reached, because that
case is handled in ApplySortComparator(), which is called before
ApplySortAbbrevFullComparator(). So I think there is no use in INSERTing
a NULL value.
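The call order I mean can be sketched like this (a simplified standalone
illustration, not the real sortsupport.h code):

    #include <stdbool.h>

    typedef int (*full_cmp_fn) (int a, int b);

    /*
     * NULLs are resolved by the first comparator, so the abbreviated
     * "full" comparator below is only reached with two non-NULL values.
     */
    static int
    apply_sort_comparator_sketch(int a, bool a_isnull,
                                 int b, bool b_isnull,
                                 bool nulls_first, full_cmp_fn full_cmp)
    {
        if (a_isnull)
        {
            if (b_isnull)
                return 0;
            return nulls_first ? -1 : 1;
        }
        if (b_isnull)
            return nulls_first ? 1 : -1;

        return full_cmp(a, b);
    }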

--
Movead


regression_20190827.patch
Description: Binary data


Re: Re: Email to hackers for test coverage

2019-08-26 Thread movead...@highgo.ca
  On Mon, 26 Aug 2019 12:48:40 +0800 mich...@paquier.xyz wrote
>Your patch has forgotten to update the alternate output in
>float4-misrounded-input.out.

Thanks for your reminder. I have modified the patch, now attached as
'regression_20190826.patch', and I have run a successful test on Cygwin.

--
Movead


regression_20190826.patch
Description: Binary data


Re: Re: Email to hackers for test coverage

2019-08-24 Thread movead...@highgo.ca
>Hi Movead,
>Please add that to commitfest.

Thanks, I have just done that; it is
https://commitfest.postgresql.org/24/2258/

--
Movead Li


Email to hackers for test coverage

2019-08-22 Thread movead...@highgo.ca
Hello hackers,

One of the areas that didn't get much attention in the community
recently is analysing and increasing the code coverage of the
PostgreSQL regression test suite. I have started working on the
code coverage by running the GCOV code coverage analysis tool in
order to analyse the current code coverage and come up with test
cases to increase it. This is going to be a long exercise, so my
plan is to do it incrementally: I will analyse some area of untested
code, come up with test cases to cover those lines in the regression
tests, and then move on to the next area of untested code, and so on.

So far I have come up with 3 test cases to increase the code
coverage of PostgreSQL regression test suite.

I have performed the regression run for this exercise on this commit:
(Commit version 75c1921cd6c868c5995b88113b4463a4830b9a27):

The regression tests are executed with the 'make check-world' command and the
results are gathered using the 'make coverage-html' command.

Below are the lines of untested code that I have analysed and the
test cases added to the regression suite to cover them.

1. src/include/utils/float.h:140

Analyze: 
This is an error report line triggered when converting to float4 a float8
value that is too big for float4 to store.

Test case:
Add a test case as below in file float4.sql:
select float4(1234567890123456789012345678901234567890::float8);

2. src/include/utils/float.h:145

Analyze:
This is an error report line triggered when converting to float4 a float8
value that is too small for float4 to store.

Test case:
Add a test case as below in file float4.sql:
select float4(0.01::float8);
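The kind of range check these two lines perform can be sketched as follows (a
simplified standalone illustration, not the actual check_float4_val() logic,
which inspects the already-converted float4 value):

    #include <float.h>
    #include <math.h>

    /*
     * Returns 0 when a float8 value is representable as float4,
     * -1 on overflow (line 140) and -2 on underflow (line 145).
     */
    static int
    float4_range_check_sketch(double val)
    {
        if (!isinf(val) && fabs(val) > FLT_MAX)
            return -1;
        if (val != 0.0 && fabs(val) < FLT_MIN)
            return -2;
        return 0;
    }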

3. src/include/utils/sortsupport.h:264

Analyze:
This is reverse sorting for data types that have abbreviated keys for
sorting, for example macaddr, uuid, numeric, and the network types; I
chose numeric to exercise it.

Test case:
Add a test case as below in file numeric.sql:
INSERT INTO num_input_test(n1) values('99.998');
INSERT INTO num_input_test(n1) values('99.997');
SELECT * FROM num_input_test ORDER BY n1 DESC;

Result and patch

By adding the test cases, the test coverage of float.h increased from
97.7% to 100%, and that of sortsupport.h increased from 76.7% to 80.0%.
 
The increase in code coverage can be seen in the before and after
pictures of GCOV test coverage analysis summary.

The attached patch contains the test cases added to the regression suite
for increasing the coverage.


--
Movead Li


before.png
Description: Binary data


after.png
Description: Binary data


regression_20190820.patch
Description: Binary data