Neil Conway said:
> On Wed, 2005-06-01 at 16:34 -0700, Alon Goldshuv wrote:
>> 1) The patch includes 2 parallel parsing code paths. One is the
>> regular COPY path that we all know, and the other is the improved one
>> that I wrote. This is only temporary, as there is a lot of code
>> duplication
>
> Right; I really dislike the idea of having two separate code paths for
> COPY. When you say this approach is "temporary", are you suggesting
> that you intend to reimplement your changes as
> improvements/replacements of the existing COPY code path rather than as
> a parallel code path?
>

It's not an all-or-nothing deal. When we put in CSV handling, we introduced two new routines for attribute input/output and otherwise used the rest of the COPY code. When I did a fix for the multiline problem, it was originally done with a separate read-line function for CSV mode. Bruce didn't like that, so I merged it back into the existing code. In retrospect, given this discussion, that might not have been an optimal choice. But the point is that you can break out at several levels.

Incidentally, there might be a good case for allowing the user to set the line end explicitly, but you can't just hardwire it: we will get massive Windows breakage. What is more, in CSV mode line end sequences can occur within logical lines. You need to take that into account. It's tricky and easy to get badly wrong.

I will be the first to admit that there are probably some very good possibilities for optimisation of this code. My impression, though, has been that in almost all cases it's fast enough anyway. I know that on some very modest hardware I have managed to load a 6m row TPC line-items table in just a few minutes. Before we start getting too hung up, I'd be interested to know just how much data people want to load and how fast they want it to be. If people have massive data loads that take hours, days or weeks, then it's obviously worth improving if we can. I'm curious to know what size datasets people are really handling this way.

cheers

andrew
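
To illustrate the point above about line end sequences occurring within logical lines in CSV mode, here is a minimal Python sketch. It is not the COPY code and is not taken from any patch; the function name split_logical_lines is hypothetical, and it assumes the quote character is a double quote that also serves as its own escape (a doubled quote inside a quoted field). It accepts \n, \r\n and \r as line ends rather than hardwiring one.

```python
def split_logical_lines(data: str, quote: str = '"') -> list[str]:
    """Split CSV text into logical records (illustrative sketch only).

    A line end (\\n, \\r\\n or \\r) terminates a record only when it
    occurs outside a quoted field.
    """
    records = []
    current = []
    in_quotes = False
    i = 0
    n = len(data)
    while i < n:
        ch = data[i]
        if ch == quote:
            # A doubled quote inside a quoted field toggles the state
            # twice, so it leaves in_quotes unchanged overall.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch in "\r\n" and not in_quotes:
            # Treat \r\n as a single line end.
            if ch == "\r" and i + 1 < n and data[i + 1] == "\n":
                i += 1
            records.append("".join(current))
            current = []
        else:
            # Ordinary character, or a line end embedded in a quoted field.
            current.append(ch)
        i += 1
    if current:
        # Final record without a trailing line end.
        records.append("".join(current))
    return records


sample = 'a,"x\ny",b\r\nc,d\n'
logical = split_logical_lines(sample)
naive = sample.splitlines()
```

With this sample, the naive split on physical lines yields three pieces and cuts the quoted field in half, while the quote-aware scan yields two logical records.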