Re: [PATCHES] COPY FROM performance improvements

Andrew Dunstan Sat, 25 Jun 2005 04:33:00 -0700

Luke Lonergan said:
> I've attached Alon's patch ported to the CVS trunk.  It applies cleanly
> and passes the regressions.  With fsync=false it is 40% faster loading
> a sample dataset with 15 columns of varied type.  It's 19% faster with
> fsync=true.
>
> This patch separates the CopyFrom code into two pieces, the new logic
> for delimited data and the existing logic for CSV and Binary.
>



A few quick comments; I will probably have many more later, when I have
time to review this in depth.

1. Postgres uses context diffs (diff -c) for patches, not unified diffs.

2. This comment raises a flag in my mind:

+ * each attribute begins. If a specific attribute is not used for this
+ * COPY command (omitted from the column list), a value of 0 will be
+ * assigned.
+ * For example: for table foo(a,b,c,d,e) and COPY foo(a,b,e)
+ * attr_offsets may look something like this after this routine
+ * returns: [0,20,0,0,55]. That means that column "a" value starts
+ * at byte offset 0, "b" in 20 and "e" in 55, in attr_bytebuf.

Since 0 is also a legitimate offset (column "a" starts at byte 0 in this
very example), would it not be better to mark missing attributes with
something that can't be a valid offset, like -1?


3. This comment needs improving:

+/*
+ * Copy FROM file to relation with faster processing.
+ */

4. We should indeed do this for CSV, especially since a lot of the relevant
logic for detecting attribute starts is already there for CSV in
CopyReadLine. I'm prepared to help you do that if necessary, since I'm
guilty of perpetrating that code.

cheers

andrew



