On Mon, Dec 28, 2020 at 3:14 PM vignesh C wrote:
>
> Attached is a patch that was used for the same. The patch is written
> on top of the parallel copy patch.
> The design Amit, Andres & I voted for is that the leader
> identifies the line boundaries and shares it
> > > If the leader doesn't find the line-endings the workers need to wait
> > > till the leader fill the entire 64K chunk, OTOH, with current approach
> > > the worker can start as soon as leader is able to populate some
> > > minimum number of line-endings
On Wed, Dec 23, 2020 at 3:05 PM Hou, Zhijie wrote:
>
> Hi
>
> > Yes, this optimization can be done; I will handle this in the next patch
> > set.
> >
>
> I have a suggestion for the parallel safety check.
>
> As designed, the leader does not participate in the insertion of data.
> If the user uses (PARALLEL 1), there is only one worker process, which will do the
> insertion
On Mon, Dec 7, 2020 at 3:00 PM Hou, Zhijie wrote:
>
> > Attached v11 patch has the fix for this, it also includes the changes to
> > rebase on top of head.
>
> Thanks for the explanation.
>
> I think there is still a chance we can know the size.
>
> +* line_size will be set. Read th
> > 4.
> > A suggestion for CacheLineInfo.
> >
> > It uses appendBinaryStringXXX to store the line in memory.
> > appendBinaryStringXXX will double the string's memory when there is not enough
> space.
> >
> > How about calling enlargeStringInfo in advance, if we already know the whole
> line size?
> > It c
Hi Vignesh,
I took a look at the v10 patch set. Here are some comments:
1.
+/*
+ * CheckExprParallelSafety
+ *
+ * Determine if where cluase and default expressions are parallel safe & do not
+ * have volatile expressions, return true if condition satisfies else return
+ * false.
+ */
'cluase'
't have the patience to
wait
> > > it finish). Both worker processes are consuming 100% of CPU.
> >
> > I had a look over this problem.
> >
> > the ParallelCopyDataBlock has size limit:
> > uint8 skip_bytes;
> > char
om data, I've
> pushed it to github [1]. The random_string() generates a random string
> with ASCII characters, symbols and a couple special characters (\r\n\t).
> The intent was to try loading data where a field may span multiple 64kB
> blocks and may contain newlines etc.
>
> T
t; > > > On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila
> > > > wrote:
> > > > >
> > > >
> > > > I have worked to provide a patch for the parallel safety checks. It
> > > > checks if parallel copy can be performed. Parallel copy cannot be
On Thu, Oct 29, 2020 at 2:26 PM Daniel Westermann (DWE)
wrote:
>
> On 27/10/2020 15:36, vignesh C wrote:
> >> Attached v9 patches have the fixes for the above comments.
>
> >I did some testing:
>
> I did some testing as well and have a cosmetic remark:
>
> postgres=# copy t1 from '/var/tmp/aa.txt'
On Wed, Oct 28, 2020 at 5:36 PM Hou, Zhijie
wrote:
>
> Hi
>
> I found some issue in v9-0002
>
> 1.
> +
> + elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d,
unprocessed lines:%d, offset:%d, line size:%d",
> +write_pos, lineInfo->first_block,
> +
pg_atomic_read_
On Thu, Oct 29, 2020 at 2:20 PM Heikki Linnakangas wrote:
>
> On 27/10/2020 15:36, vignesh C wrote:
> > Attached v9 patches have the fixes for the above comments.
>
> I did some testing:
>
> /tmp/longdata.pl:
>
> #!/usr/bin/perl
> #
> # Generate three rows:
> # foo
> # longdatalongdatalon
creating
any contention point inside the parallel copy code. However this is causing
another choking point i.e. index insertion if indexes are available on the
table, which is out of scope of parallel copy code. We think that it would
be good to use spinlock-protected worker write position or an at
>
> > >
> > > I have worked to provide a patch for the parallel safety checks. It
> > > checks if parallel copy can be performed. Parallel copy cannot be
> > > performed for the following: a) If relation is a temporary table b) If
> > > relation is a foreign table c) If relation has non parallel safe index
> > > expressions d) If relation has triggers present whose type is of
On Tue, Nov 10, 2020 at 7:12 PM vignesh C wrote:
>
> On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila wrote:
> >
>
> I have worked to provide a patch for the parallel safety checks. It
> checks if parallel copy can be performed. Parallel copy cannot be
> performed for the fol
er this problem.
>
> the ParallelCopyDataBlock has size limit:
> uint8 skip_bytes;
> chardata[DATA_BLOCK_SIZE]; /* data read from file */
>
> It seems the input line is so long that the leader process runs out of the
> shared memory among parallel
/* data read from file */
It seems the input line is so long that the leader process runs out of the
shared memory among parallel copy workers,
and the leader process keeps waiting for a free block.
The worker process waits until line_state becomes LINE_LEADER_POPULATED,
but the leader process won
On 03/11/2020 10:59, Amit Kapila wrote:
On Mon, Nov 2, 2020 at 12:40 PM Heikki Linnakangas wrote:
However, the point of parallel copy is to maximize bandwidth.
Okay, but this first-phase (finding the line boundaries) cannot anyway
be done in parallel, and we have seen in some of the initial
as soon as leader is able to populate some
> > minimum number of line-endings
>
> You can use a smaller block size.
>
Sure, but the same problem can happen if the last line in that block
is too long and we need to peek into the next block. And then there
could be cases where a single
On 02/11/2020 09:10, Heikki Linnakangas wrote:
On 02/11/2020 08:14, Amit Kapila wrote:
We have discussed both these approaches (a) single producer multiple
consumer, and (b) all workers doing the processing as you are saying
in the beginning and concluded that (a) is better, see some of the
rele
is able to populate some
minimum number of line-endings
You can use a smaller block size. However, the point of parallel copy is
to maximize bandwidth. If the workers ever have to sit idle, it means
that the bottleneck is in receiving data from the client, i.e. the
backend is fast enough, and you c
On Fri, Oct 30, 2020 at 10:11 PM Heikki Linnakangas wrote:
>
> Leader process:
>
> The leader process is simple. It picks the next FREE buffer, fills it
> with raw data from the file, and marks it as FILLED. If no buffers are
> FREE, wait.
>
> Worker process:
>
> 1. Claim next READY block from que
On Sat, Oct 31, 2020 at 12:09:32AM +0200, Heikki Linnakangas wrote:
On 30/10/2020 22:56, Tomas Vondra wrote:
I agree this design looks simpler. I'm a bit worried about serializing
the parsing like this, though. It's true the current approach (where the
first phase of parsing happens in the leader) has a similar issue, but I
think it would be easier to improve tha
On Fri, Oct 30, 2020 at 06:41:41PM +0200, Heikki Linnakangas wrote:
On 30/10/2020 18:36, Heikki Linnakangas wrote:
I find this design to be very complicated. Why does the line-boundary
information need to be in shared memory? I think this would be much
simpler if each worker grabbed a fixed-size
data where a field may span multiple 64kB
blocks and may contain newlines etc.
The non-parallel copy works fine, the parallel one fails. I haven't
investigated the details, but I guess it gets confused about where a
string starts/end, or something like that.
[1] https://github.com/tvondra/random
be in order. It probably would be
faster, or at least not slower, to find all the EOLs in a block in one
tight loop, even when parallel copy is not used.
Something like the attached. It passes the regression tests, but it's
quite incomplete. It's missing handling of "\." as e
On 30/10/2020 18:36, Heikki Linnakangas wrote:
I find this design to be very complicated. Why does the line-boundary
information need to be in shared memory? I think this would be much
simpler if each worker grabbed a fixed-size block of raw data, and
processed that.
In your patch, the leader pr
ar that it needs to be done ASAP, for a chunk at a time,
because that cannot be done in parallel. I think some refactoring in
CopyReadLine() and friends would be in order. It probably would be
faster, or at least not slower, to find all the EOLs in a block in one
tight loop, even when parallel c
On Thu, Oct 29, 2020 at 11:45 AM Amit Kapila wrote:
>
> On Tue, Oct 27, 2020 at 7:06 PM vignesh C wrote:
> >
> [latest version]
>
> I think the parallel-safety checks in this patch
> (v9-0002-Allow-copy-from-command-to-process-data-from-file) are
> incomplete and wrong.
>
One more point, I have
On 27/10/2020 15:36, vignesh C wrote:
>> Attached v9 patches have the fixes for the above comments.
>I did some testing:
I did some testing as well and have a cosmetic remark:
postgres=# copy t1 from '/var/tmp/aa.txt' with (parallel 10);
ERROR: value 10 out of bounds for option
On 27/10/2020 15:36, vignesh C wrote:
Attached v9 patches have the fixes for the above comments.
I did some testing:
/tmp/longdata.pl:
#!/usr/bin/perl
#
# Generate three rows:
# foo
# longdatalongdatalongdata...
# bar
#
# The length of the middle row is given as command line arg.
#
tains volatile functions? It
should be checked otherwise as well, no? The similar comment applies
to other checks in this function. Also, I don't think there is a need
to make this function inline.
2.
+/*
+ * IsParallelCopyAllowed
+ *
+ * Check if parallel copy can be allowed.
+ */
+bool
+Is
Hi
I found some issue in v9-0002
1.
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d,
unprocessed lines:%d, offset:%d, line size:%d",
+write_pos, lineInfo->first_block,
+pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+
> IsParallelCopyAllowed(). This will ensure that in case of Parallel
> > Copy when the leader has performed all these checks, the worker won't
> > do it again. I also feel that it will make the code look a bit
> > cleaner.
> >
>
> Just rewriting above comment to m
be passed as you have suggested, but relid needs to be
passed as we will be setting it to pcdata; modified nworkers as
suggested.
> --
>
> +/* DSM keys for parallel copy. */
> +#define PARALLEL_COPY_KEY_SHARED_INFO 1
> +#define PARALLEL_COPY_KEY_CSTATE 2
&g
On Wed, Oct 21, 2020 at 3:50 PM Amit Kapila wrote:
>
> On Wed, Oct 21, 2020 at 3:19 PM Bharath Rupireddy
> wrote:
> >
> >
> > 9. Instead of calling CopyStringToSharedMemory() for each string
> > variable, can't we just create a linked list of all the strings that
> > need to be copied into shm an
line boundary information, such as which line
starts from which data block, what the starting offset in the data
block is, and what the line size is; this information will be present in
ParallelCopyLineBoundary. Like you said, each worker processes
WORKER_CHUNK_COUNT (64) lines at a time. Perfo
On Wed, Oct 21, 2020 at 4:20 PM Bharath Rupireddy
wrote:
>
> On Wed, Oct 21, 2020 at 3:18 PM Bharath Rupireddy
> wrote:
> >
> > 17. Remove extra lines after #define IsHeaderLine()
> > (cstate->header_line && cstate->cur_lineno == 1) in copy.h
> >
>
> I missed one comment:
>
> 18. I think we nee
llelCopy(cstate->nworkers, cstate, stmt->attlist,
> +relid);
>
> Do we need to pass cstate->nworkers and relid to BeginParallelCopy()
> function when we are already passing cstate structure, using which
> both of these i
both of these pieces of information can be retrieved?
--
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_WAL_USAGE 3
+#define PARALLEL_COPY_BUFFER_USAGE 4
I had a brief look at this patch. Important work! A couple of first
impressions:
1. The split between patches
0002-Framework-for-leader-worker-in-parallel-copy.patch and
0003-Allow-copy-from-command-to-process-data-from-file.patch is quite
artificial. All the stuff introduced in the first
On Wed, Oct 21, 2020 at 3:18 PM Bharath Rupireddy
wrote:
>
> 17. Remove extra lines after #define IsHeaderLine()
> (cstate->header_line && cstate->cur_lineno == 1) in copy.h
>
I missed one comment:
18. I think we need to treat the number of parallel workers as an
integer similar to the paralle
On Wed, Oct 21, 2020 at 3:19 PM Bharath Rupireddy
wrote:
>
>
> 9. Instead of calling CopyStringToSharedMemory() for each string
> variable, can't we just create a linked list of all the strings that
> need to be copied into shm and call CopyStringToSharedMemory() only
> once? We could avoid 5 func
Hi Vignesh,
I took a look at the v8 patch set. Here are some comments:
1. PopulateCommonCstateInfo() -- can we use PopulateCommonCStateInfo()
or PopulateCopyStateInfo()? And also EstimateCstateSize() --
EstimateCStateSize(), PopulateCstateCatalogInfo() --
PopulateCStateCatalogInfo()?
2. Instead
suggested and details shared by bharath at [1]
> 3) Support of parallel copy for COPY_OLD_FE.
It is handled as part of v8 patch shared at [2]
> 4) Worker has to hop through all the processed chunks before getting
> the chunk which it can process.
Open
> 5) Handling of Tomas's
On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:
>
> On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila
wrote:
> >
> > 2. Do we have tests for toast tables? I think if you implement the
> > previous point some existing tests might cover it but I feel we sh
On Sun, Oct 18, 2020 at 7:47 AM Hou, Zhijie wrote:
>
> Hi Vignesh,
>
> After having a look over the patch,
> I have some suggestions for
> 0003-Allow-copy-from-command-to-process-data-from-file.patch.
>
> 1.
>
> +static uint32
> +EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List
>
Hi Vignesh,
After having a look over the patch,
I have some suggestions for
0003-Allow-copy-from-command-to-process-data-from-file.patch.
1.
+static uint32
+EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List *attnamelist,
+ char **whereClauseStr, c
|                         | wal_records | wal_fpi | wal_bytes
> Sequential Copy         | 1116        | 0       | 3587669
> Parallel Copy(1 worker) | 1116        | 0       | 3587669
> Parallel Copy(4 worker) | 1121        | 0       | 3587668
> I no
On Thu, Oct 8, 2020 at 8:43 AM Greg Nancarrow wrote:
>
> On Thu, Oct 8, 2020 at 5:44 AM vignesh C wrote:
>
> > Attached v6 patch with the fixes.
> >
>
> Hi Vignesh,
>
> I noticed a couple of issues when scanning the code in the following
patch:
>
> v6-0003-Allow-copy-from-command-to-process-d
able from TPC-H - for 75GB
> data set, this largest table is about 64GB once loaded, with another
> 54GB in 5 indexes. This is on a server with 32 cores, 64GB of RAM and
> NVME storage.
>
> The COPY duration with varying number of workers (specified using the
> parallel COPY option) lo
8570215596364326047 | copy hw from
> > '/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format
> > csv, delimiter ',', parallel '2') | 0 | 0 |
> > 0 | 0 | 0 | 0 |
>
gt; > > thread [1] and performance data shown by Peter that this can't be an
> > > independent improvement and rather in some cases it can do harm. Now,
> > > if you need it for a parallel-copy path then we can change it
> > > specifically to the parallel-copy co
I did performance testing on v7 patch set[1] with custom
postgresql.conf[2]. The results are of the triplet form (exec time in
sec, number of workers, gain)
Use case 1: 10million rows, 5.2GB data, 2 indexes on integer columns,
1 index on text column, binary file
(1104.898, 0, 1X), (1112.221, 1, 1X
code. You can have this as a test-only patch for now and
> > > > make sure all existing tests passed with this.
> > > >
> > >
> > > I don't think all the existing copy test cases(except the new test cases
> > > added in the parallel copy patch set) would run
> I don't think all the existing copy test cases(except the new test cases
> > added in the parallel copy patch set) would run inside the parallel worker
> > if force_parallel_mode is on. This is because the parallelism will be
> > picked up for parallel copy only if
egression will be executed via
> > new worker code. You can have this as a test-only patch for now and
> > make sure all existing tests passed with this.
> >
>
> I don't think all the existing copy test cases(except the new test cases
> added in the pa
On Fri, Oct 9, 2020 at 5:40 PM Amit Kapila wrote:
>
> > Looking a bit deeper into this, I'm wondering if in fact your
> > EstimateStringSize() and EstimateNodeSize() functions should be using
> > BUFFERALIGN() for EACH stored string/node (rather than just calling
> > shm_toc_estimate_chunk() once
On Thu, Oct 8, 2020 at 12:14 AM vignesh C wrote:
>
> On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote:
> > > > + */
> > > > +typedef struct ParallelCopyLineBoundary
> > > >
> > > > Are we doing all this state management to avoid using locks while
> > > > processing lines? If so, I think we can
; > independent improvement and rather in some cases it can do harm. Now,
> > if you need it for a parallel-copy path then we can change it
> > specifically to the parallel-copy code path but I don't understand
> > your reason completely.
> >
>
> Whenev
On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila wrote:
>
> On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote:
> >
> > Few additional comments:
> > ==
>
> Some more comments:
>
> v5-0002-Framewor
On Mon, Sep 28, 2020 at 6:37 PM Ashutosh Sharma wrote:
>
> On Mon, Sep 28, 2020 at 3:01 PM Amit Kapila wrote:
> >
> > On Tue, Sep 22, 2020 at 2:44 PM vignesh C wrote:
> > >
> > > Thanks Ashutosh for your comments.
> > >
> > > On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma
> > > wrote:
> > > >
On Thu, Oct 8, 2020 at 5:44 AM vignesh C wrote:
> Attached v6 patch with the fixes.
>
Hi Vignesh,
I noticed a couple of issues when scanning the code in the following patch:
v6-0003-Allow-copy-from-command-to-process-data-from-file.patch
In the following code, it will put a junk uint16 va
On Mon, Sep 28, 2020 at 3:01 PM Amit Kapila wrote:
>
> On Tue, Sep 22, 2020 at 2:44 PM vignesh C wrote:
> >
> > Thanks Ashutosh for your comments.
> >
> > On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma
> > wrote:
> > >
> > > Hi Vignesh,
> > >
> > > I've spent some time today looking at your ne
On Tue, Sep 29, 2020 at 3:16 PM Greg Nancarrow wrote:
>
> Hi Vignesh and Bharath,
>
> Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as
> parallel-unsafe.
> Can you explain why this is?
Yes we don't need to restrict parallelism for RI_TRIGGER_PK cases as
with another
54GB in 5 indexes. This is on a server with 32 cores, 64GB of RAM and
NVME storage.
The COPY duration with varying number of workers (specified using the
parallel COPY option) looks like this:
workers    duration
-------------------
0          1366
1
On Tue, Sep 29, 2020 at 3:16 PM Greg Nancarrow wrote:
>
> Hi Vignesh and Bharath,
>
> Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as
> parallel-unsafe.
> Can you explain why this is?
>
I don't think we need to restrict this case and even if there is
On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila wrote:
>
> On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote:
> >
> > Few additional comments:
> > ==
>
> Some more comments:
>
Thanks Amit for the comments, I will work on the comments and provide
a patch in the next few days.
Re
On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote:
>
> Few additional comments:
> ==
Some more comments:
v5-0002-Framework-for-leader-worker-in-parallel-copy
===
1.
These values
+ * help in handover of multiple rec
Hi Vignesh and Bharath,
Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as
parallel-unsafe.
Can you explain why this is?
Regards,
Greg Nancarrow
Fujitsu Australia
On Tue, Sep 22, 2020 at 2:44 PM vignesh C wrote:
>
> Thanks Ashutosh for your comments.
>
> On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma wrote:
> >
> > Hi Vignesh,
> >
> > I've spent some time today looking at your new set of patches and I've
> > some thoughts and queries which I would like to
On Wed, Jul 22, 2020 at 7:48 PM vignesh C wrote:
>
> On Tue, Jul 21, 2020 at 3:54 PM Amit Kapila wrote:
> >
>
> > Review comments:
> > ===
> >
> > 0001-Copy-code-readjustment-to-support-parallel-copy
> > 1.
> > @@ -807,8 +83
On Thu, Sep 24, 2020 at 3:00 PM Bharath Rupireddy
wrote:
>
> >
> > > Have you tested your patch when encoding conversion is needed? If so,
> > > could you please point out the email that has the test results.
> > >
> >
> > We have not yet done encoding testing, we will do and post the results
> >
>
> > Have you tested your patch when encoding conversion is needed? If so,
> > could you please point out the email that has the test results.
> >
>
> We have not yet done encoding testing, we will do and post the results
> separately in the coming days.
>
Hi Ashutosh,
I ran the tests ensuring p
20-09-24 10:57:08.927 JST [83335] LOG: totaltableinsertiontime =
17133.251 ms
2020-09-24 10:58:17.420 JST [90905] LOG: totaltableinsertiontime =
15352.753 ms
>
> Test results show that Parallel COPY with 1 worker is performing
> better than normal COPY in the test scenarios run.
>
r user
> processes(except system processes) are running. Is it possible for you to do
> the same?
>
> Please capture and share the timing logs with us.
>
Yes, I have ensured the system is as idle as possible prior to testing.
I have attached the test results obtained a
ult configuration, 1 worker: 156.299, 153.293, 170.307
>
> With Patch, custom configuration, 0 worker: 197.234, 195.866, 196.049
> With Patch, custom configuration, 1 worker: 157.173, 158.287, 157.090
>
Hi Greg,
If you still observe the issue in your testing environment, I'm
On Wed, Sep 16, 2020 at 1:20 PM Greg Nancarrow wrote:
>
> Fortunately I have been given permission to share the exact table
> definition and data I used, so you can check the behaviour and timings
> on your own test machine.
>
Thanks Greg for the script. I ran your test case and I didn't observe
es if the leader is very fast
> compared to the workers then the leader quickly populates one line and
> sets the state to LINE_LEADER_POPULATED. State is changed to
> LINE_LEADER_POPULATED when we are checking the curr_line_state.
> I feel this will not be a problem because, Leader will
ormal copy (you have tested, right?).
> 3. Was the run performed on release build?
For generating the perf data I sent (normal copy vs parallel copy with
1 worker), I used a debug build (-g -O0), as that is needed for
generating all the relevant perf data for Postgres code. Previously I
ran with a
he following results from loading a 2GB CSV file (100
> rows, 4 indexes):
>
> Copy Type      Duration (s)    Load factor
> ==========================================
> Normal Copy    190.891         -
>
> Parallel Copy
> (#workers)
> 1
On Tue, Sep 1, 2020 at 3:39 PM Greg Nancarrow wrote:
>
> Hi Vignesh,
>
> >Can you share with me the script you used to generate the data & the ddl of
> >the table, so that it will help me check that scenario you faced the
> >problem.
>
> Unfortunately I can't directly share it (considered comp
le definition, multiplying the number of
records to produce a 5GB and 9.5GB CSV file.
I got the following results:
(1) Postgres default settings, 5GB CSV (53 rows):
Copy Type      Duration (s)    Load factor
==========================================
Normal Copy    132.1
Hi Vignesh,
>Can you share with me the script you used to generate the data & the ddl of
>the table, so that it will help me check that scenario you faced the problem.
Unfortunately I can't directly share it (considered company IP),
though having said that it's only doing something that is rel
On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow wrote:
> - Parallel Copy with 1 worker ran slower than normal Copy in a couple
> of cases (I did question if allowing 1 worker was useful in my patch
> review).
Thanks Greg for your review & testing.
I had executed various tests with 1
oughts?
> > >
> > > Hi Vignesh,
> > >
> > > I don't really have any further comments on the code, but would like
> > > to share some results of some Parallel Copy performance tests I ran
> > > (attached).
> > >
> > > The te
y further comments on the code, but would like
> > to share some results of some Parallel Copy performance tests I ran
> > (attached).
> >
> > The tests loaded a 5GB CSV data file into a 100 column table (of
> > different data types). The following were varied as part
On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow wrote:
>
> > I have attached new set of patches with the fixes.
> > Thoughts?
>
> Hi Vignesh,
>
> I don't really have any further comments on the code, but would like
> to share some results of some Parallel Copy
> I have attached new set of patches with the fixes.
> Thoughts?
Hi Vignesh,
I don't really have any further comments on the code, but would like
to share some results of some Parallel Copy performance tests I ran
(attached).
The tests loaded a 5GB CSV data file into a 100 colum
ate
and/or the read line_size ("dataSize") doesn't actually correspond to
the read line state?
(sorry, still not 100% convinced that the synchronization and checks
are safe in all cases)
(3) v3-0006-Parallel-Copy-For-Binary-Format-Files.patch
>raw_buf is not used in parallel c
not yet struck any execution problems other than
some option validation and associated error messages on boundary cases.
One general question that I have: is there a user benefit (over the normal
non-parallel COPY) to allowing "COPY ... FROM ... WITH (PARALLEL 1)"?
My following comments
rebased the patch over head & attached.
> >>
> >I rebased v2-0006-Parallel-Copy-For-Binary-Format-Files.patch.
> >
> >Putting together all the patches rebased on to the latest commit
> >b8fdee7d0ca8bd2165d46fb1468f75571b706a01. Patches from 0001 to 0005
> &g