On Sun, 20 May 2007, Andrew Dunstan wrote:

I've had a preference for INSERT from the beginning here that this reinforces.
COPY is our standard bulk insert mechanism. I think arguing against it would be a very hard sell.

Let me say my final peace on this subject...if I considered this data to be strictly bulk insert, then I'd completely agree here. Most of the really interesting applications I was planning to build on top of this mechanism are more interactive than that though. Here's a sample:

-Write a daemon that lives on the server, connects to a logging database, and pops into an idle loop based on LISTEN. -A client app wants to see the recent logs files. It uses NOTIFY to ask the daemon for them and LISTENs for a response. -The daemon wakes up, reads all the log files since it last did something, and appends those records to the log file table. It sends out a NOTIFY to say the log file table is current.

That enables remote clients to grab the log files from the server whenever they please, so they can actually monitor themselves. Benchmarking is the initial app I expect to call this, and with some types of tests I expect the daemon to be importing every 10 minutes or so.

Assuming a unique index on the data to prevent duplication is a required feature, I can build this using the COPY format logs as well, but that requires I either a) am 100% perfect in making sure I never pass over the same data twice, which is particularly annoying when the daemon gets restarted, or b) break the COPY into single lines and insert them one at a time, at which point I'm not bulk loading at all. If these were INSERT statements instead, I'd have a lot more tolerance for error, because the worst problem I'd ever run into is spewing some unique key violation errors into the logs if I accidentally imported too much. With COPY, any mistake or synchronization issue and I lose the whole import.

I don't mean to try and stir this back up again as an argument (particularly not on this list). There are plenty of other apps where COPY is clearly the best approach, you can easily make a case that my app is a fringe application rather than a mainstream one, and on the balance this job is still far easier than my current approach of parsing the logs. I just wanted to give a sample of how using COPY impacts the dynamics of how downstream applications will have to work with this data, so you can see that my contrary preference isn't completely random here.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

              http://www.postgresql.org/docs/faq

Reply via email to