Re: [PATCHES] COPY FROM performance improvements

Alon Goldshuv Sat, 25 Jun 2005 10:18:28 -0700

Andrew,

Thanks for your initial feedback.


> 2. This comment raises a flag in my mind:
> 
> + * each attribute begins. If a specific attribute is not used for this
> + * COPY command (ommitted from the column list), a value of 0 will be
> assigned.+ * For example: for table foo(a,b,c,d,e) and COPY foo(a,b,e)
> + * attr_offsets may look something like this after this routine
> + * returns: [0,20,0,0,55]. That means that column "a" value starts
> + * at byte offset 0, "b" in 20 and "e" in 55, in attr_bytebuf.
> 
> Would it not be better to mark missing attributes with something that can't
> be a valid offset, like -1?

That's probably a good idea. Generally, these offsets of attributes that are
not in the column list will not be looked at in the conversion loop
(functioncall3) but it wouldn't hurt to make them non-valid just in case.
 
> 3. This comment needs improving:
> 
> +/*
> + * Copy FROM file to relation with faster processing.
> + */
True. Probably along with several others that I missed. There was a lot of
new code and not a lot of time... I'll make a second pass at comments.

> 4. We should indeed do this for CSV, especially since a lot of the relevant
> logic for detecting attribute starts is already there for CSV in
> CopyReadLine. I'm prepared to help you do that if necessary, since I'm
> guilty of perpetrating that code.

That will be great. My idea is to eventually keep processing for BINARY in
one routine while text can be in another (CSV+delimited). The reason being
is that there is a lot in common for the processing of the 2 text types. The
CopyReadLineBuffered will have to be changed just a little to accommodate
embedded newlines inside CSV quotes, and the CopyReadAttributeCSV would need
to change as well.

Question for you -- you'll probably notice that I made the escape of the
delimited text format (previously '\\') a constant variable "escapec".
Reason being - maybe some people would like to choose an escape for
delimited COPY too, or even disable it. So it's a good base to start with.
However, I just noticed that CSV uses that exact variable name... Sorry
about that. Any suggestion on how to rename it to something different?

Thx,
Alon.

> 
> cheers
> 
> andrew
> 
> 
> 



---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Re: [PATCHES] COPY FROM performance improvements

Reply via email to