Re: [HACKERS] VLDB Features

2007-12-20 Thread Josh Berkus
Tom, > Sure ... but you'll find that it's not large enough to be useful. > Once you remove all the interesting consistency checks such as > unique indexes and foreign keys, the COPY will tend to go through > just fine, and then you're still stuck trying to weed out bad data > without very good too

Re: [HACKERS] VLDB Features

2007-12-18 Thread Decibel!
On Dec 12, 2007, at 1:26 PM, Markus Schiltknecht wrote: Josh Berkus wrote: Sure. Imagine you have a 5TB database on a machine with 8 cores and only one concurrent user. You'd like to have 1 core doing I/O, and say 4-5 cores dividing the scan and join processing into 4-5 chunks. Ah, righ

Re: [HACKERS] VLDB Features

2007-12-18 Thread Michał Zaborowski
2007/12/16, Tom Lane <[EMAIL PROTECTED]>: > Hannu Krosing <[EMAIL PROTECTED]> writes: > > But can't we _define_ such a subset, where we can do a transactionless > > load ? > > Sure ... but you'll find that it's not large enough to be useful. > Once you remove all the interesting consistency checks

Re: [HACKERS] VLDB Features

2007-12-16 Thread NikhilS
Hi, On Dec 15, 2007 1:14 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > NikhilS <[EMAIL PROTECTED]> writes: > > Any errors which occur before doing the heap_insert should not require > > any recovery according to me. > > A sufficient (though far from all-encompassing) rejoinder to that is > "triggers

Re: [HACKERS] VLDB Features

2007-12-16 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes: > But can't we _define_ such a subset, where we can do a transactionless > load ? Sure ... but you'll find that it's not large enough to be useful. Once you remove all the interesting consistency checks such as unique indexes and foreign keys, the COPY wil

Re: [HACKERS] VLDB Features

2007-12-16 Thread Hannu Krosing
On Sat, 2007-12-15 at 01:12, Tom Lane wrote: > Josh Berkus <[EMAIL PROTECTED]> writes: > > There's no way we can do a transactionless load, then? I'm thinking of the > > load-into-new-partition which is a single pass/fail operation. Would > > ignoring individual row errors i

Re: [HACKERS] VLDB Features

2007-12-16 Thread Trent Shipley
On Saturday 2007-12-15 02:14, Simon Riggs wrote: > On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote: > > Neil Conway <[EMAIL PROTECTED]> writes: > > > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY > > > to drop (and log) rows that contain malformed data. That is, rows with

Re: [HACKERS] VLDB Features

2007-12-15 Thread Pavel Stehule
On 16/12/2007, Neil Conway <[EMAIL PROTECTED]> wrote: > On Tue, 2007-12-11 at 19:11 -0500, Greg Smith wrote: > > I'm curious what you feel is missing that pgloader doesn't fill that > > requirement: http://pgfoundry.org/projects/pgloader/ > > For complicated ETL, I agree that using an external too

Re: [HACKERS] VLDB Features

2007-12-15 Thread Neil Conway
On Tue, 2007-12-11 at 19:11 -0500, Greg Smith wrote: > I'm curious what you feel is missing that pgloader doesn't fill that > requirement: http://pgfoundry.org/projects/pgloader/ For complicated ETL, I agree that using an external tool makes the most sense. But I think there is still merit in ad

Re: [HACKERS] VLDB Features

2007-12-15 Thread Simon Riggs
On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote: > Neil Conway <[EMAIL PROTECTED]> writes: > > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY > > to drop (and log) rows that contain malformed data. That is, rows with > > too many or too few columns, rows that result in con

Re: [HACKERS] VLDB Features

2007-12-14 Thread Tom Lane
NikhilS <[EMAIL PROTECTED]> writes: > Any errors which occur before doing the heap_insert should not require > any recovery according to me. A sufficient (though far from all-encompassing) rejoinder to that is "triggers and CHECK constraints can do anything". > The overhead of having a subtransac

Re: [HACKERS] VLDB Features

2007-12-14 Thread NikhilS
Hi, > > Another approach would be to distinguish between errors that require a > subtransaction to recover to a consistent state, and less serious errors > that don't have this requirement (e.g. invalid input to a data type > input function). If all the errors that we want to tolerate during a > b

Re: [HACKERS] VLDB Features

2007-12-14 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes: > There's no way we can do a transactionless load, then? I'm thinking of the > load-into-new-partition which is a single pass/fail operation. Would > ignoring individual row errors in for this case still cause these kinds of > problems? Given that COPY

Re: [HACKERS] VLDB Features

2007-12-14 Thread Trent Shipley
On Friday 2007-12-14 16:22, Tom Lane wrote: > Neil Conway <[EMAIL PROTECTED]> writes: > > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY > > to drop (and log) rows that contain malformed data. That is, rows with > > too many or too few columns, rows that result in constraint

Re: [HACKERS] VLDB Features

2007-12-14 Thread Josh Berkus
Tom, > I think such an approach is doomed to hopeless unreliability. There is > no concept of an error that doesn't require a transaction abort in the > system now, and that doesn't seem to me like something that can be > successfully bolted on after the fact. Also, there's a lot of > bookkeepin

Re: [HACKERS] VLDB Features

2007-12-14 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes: > One approach would be to essentially implement the pg_bulkloader > approach inside the backend. That is, begin by doing a subtransaction > for every k rows (with k = 1000, say). If you get any errors, then > either repeat the process with k/2 until you loca

Re: [HACKERS] VLDB Features

2007-12-14 Thread Neil Conway
On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote: > If we could somehow only do a subtransaction per failure, things would > be much better, but I don't see how. One approach would be to essentially implement the pg_bulkloader approach inside the backend. That is, begin by doing a subtransaction
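The batching idea in this message can be sketched outside the backend as follows. This is only an illustration of the halving branch (the alternative the message goes on to describe is cut off in this snippet), not the actual COPY implementation. The names `load_with_bisection`, `insert_batch`, and `BatchError` are hypothetical; `insert_batch` stands in for "run one subtransaction over these rows and raise if any row fails".

```python
from typing import Callable, List, Sequence, Tuple


class BatchError(Exception):
    """Raised by the hypothetical insert_batch callback when a batch fails."""


def load_with_bisection(
    rows: Sequence,
    insert_batch: Callable[[Sequence], None],
    k: int = 1000,
) -> Tuple[int, List]:
    """Load rows in batches of k; on failure, halve the batch to isolate bad rows.

    Returns (number of rows loaded, list of rejected rows).
    """
    loaded = 0
    rejected: List = []

    def load_chunk(chunk: Sequence) -> None:
        nonlocal loaded
        if not chunk:
            return
        try:
            # Stands in for one subtransaction covering the whole chunk.
            insert_batch(chunk)
            loaded += len(chunk)
        except BatchError:
            if len(chunk) == 1:
                # A single failing row has been isolated: drop and record it.
                rejected.append(chunk[0])
            else:
                # Retry each half separately (the "k/2" step).
                mid = len(chunk) // 2
                load_chunk(chunk[:mid])
                load_chunk(chunk[mid:])

    for start in range(0, len(rows), k):
        load_chunk(rows[start:start + k])
    return loaded, rejected
```

With clean data this costs one subtransaction per k rows; each bad row adds roughly log2(k) extra attempts, which is the trade-off the thread is weighing.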

Re: [HACKERS] VLDB Features

2007-12-14 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes: > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY > to drop (and log) rows that contain malformed data. That is, rows with > too many or too few columns, rows that result in constraint violations, > and rows containing columns where the

Re: [HACKERS] VLDB Features

2007-12-14 Thread Andrew Dunstan
Neil Conway wrote: On Fri, 2007-12-14 at 14:48 +0200, Hannu Krosing wrote: How did you do it? Did you enhance COPY command or was it something completely new? By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY to drop (and log) rows that contain malformed data

Re: [HACKERS] VLDB Features

2007-12-14 Thread Neil Conway
On Fri, 2007-12-14 at 14:48 +0200, Hannu Krosing wrote: > How did you do it? > > Did you enhance COPY command or was it something completely new? By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY to drop (and log) rows that contain malformed data. That is, rows with too ma
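As a rough client-side illustration of the "drop (and log) rows with too many or too few columns" behaviour, the sketch below separates well-formed rows from malformed ones before any load is attempted. It is an assumption-laden stand-in, not the proposed COPY option: `split_malformed` and `expected_columns` are hypothetical names, the input is assumed to be comma-separated text, and constraint violations or bad type input are not checked at all.

```python
import csv
import io
from typing import List, Tuple


def split_malformed(
    text: str, expected_columns: int
) -> Tuple[List[List[str]], List[Tuple[int, List[str]]]]:
    """Split CSV text into rows with the expected column count and the rest.

    Returns (good_rows, bad_rows); each bad row is paired with its
    1-based row number so it can be logged.
    """
    good: List[List[str]] = []
    bad: List[Tuple[int, List[str]]] = []
    for row_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) == expected_columns:
            good.append(row)
        else:
            # Too many or too few columns: keep the row aside for the log.
            bad.append((row_no, row))
    return good, bad
```

Only the rows in `good_rows` would then be handed to the loader, while `bad_rows` gives the reject log.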

Re: [HACKERS] VLDB Features

2007-12-14 Thread Hannu Krosing
On Tue, 2007-12-11 at 15:41, Neil Conway wrote: > On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote: > > Just so you don't lose sight of it, one of the biggest VLDB features we're > > missing is fault-tolerant bulk load. > > I actually had to cook up a version of this for T

Re: [HACKERS] VLDB Features

2007-12-13 Thread Markus Schiltknecht
Hello Gregory, Gregory Stark wrote: Oracle is using Direct I/O so they need the reader and writer threads to avoid blocking on i/o all the time. We count on the OS doing readahead and buffering our writes so we don't have to. Direct I/O and needing some way to do asynchronous writes and reads ar

Re: [HACKERS] VLDB Features

2007-12-12 Thread Gregory Stark
"Josh Berkus" <[EMAIL PROTECTED]> writes: > Markus, > >> > Parallel Query >> >> Uh.. this only makes sense in a distributed database, no? I've thought >> about parallel querying on top of Postgres-R. Does it make sense >> implementing some form of parallel querying apart from the distribution >> o

Re: [HACKERS] VLDB Features

2007-12-12 Thread Joshua D. Drake
> Greenplum as well as other Real Life stuff. For those of us here who have no idea what you are talking about can you define what "Real Life" is like? Joshua D. Drake -- The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ Sales

Re: [HACKERS] VLDB Features

2007-12-12 Thread Gavin Sherry
On Wed, Dec 12, 2007 at 08:26:16PM +0100, Markus Schiltknecht wrote: > >>Isn't Gavin Sherry working on this? Haven't read anything from him > >>lately... > > > >Me neither. Swallowed by Greenplum and France. > > Hm.. good for him, I guess! Yes, I'm around -- just extremely busy with a big releas

Re: [HACKERS] VLDB Features

2007-12-12 Thread Markus Schiltknecht
Hi Josh, Josh Berkus wrote: Sure. Imagine you have a 5TB database on a machine with 8 cores and only one concurrent user. You'd like to have 1 core doing I/O, and say 4-5 cores dividing the scan and join processing into 4-5 chunks. Ah, right, thank for enlightenment. Heck, I'm definitely to

Re: [HACKERS] VLDB Features

2007-12-12 Thread Josh Berkus
Markus, > > Parallel Query > > Uh.. this only makes sense in a distributed database, no? I've thought > about parallel querying on top of Postgres-R. Does it make sense > implementing some form of parallel querying apart from the distribution > or replication engine? Sure. Imagine you have a 5TB

Re: [HACKERS] VLDB Features

2007-12-12 Thread Markus Schiltknecht
Hi, Josh Berkus wrote: Here's the other VLDB features we're missing: Parallel Query Uh.. this only makes sense in a distributed database, no? I've thought about parallel querying on top of Postgres-R. Does it make sense implementing some form of parallel querying apart from the distribution

Re: [HACKERS] VLDB Features

2007-12-12 Thread Simon Riggs
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote: > Simon, we should start a VLDB-Postgres developer wiki page. http://developer.postgresql.org/index.php/DataWarehousing -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com

Re: [HACKERS] VLDB Features

2007-12-12 Thread Dimitri Fontaine
Hi, On Wednesday, 12 December 2007, Josh Berkus wrote: > > I'm curious what you feel is missing that pgloader doesn't fill that > > requirement: http://pgfoundry.org/projects/pgloader/ > > Because pgloader is implemented in middleware, it carries a very high > overhead if you have bad rows. As

Re: [HACKERS] VLDB Features

2007-12-11 Thread Josh Berkus
Greg, > I'm curious what you feel is missing that pgloader doesn't fill that > requirement: http://pgfoundry.org/projects/pgloader/ Because pgloader is implemented in middleware, it carries a very high overhead if you have bad rows. As little as 1% bad rows will slow down loading by 20% due t

Re: [HACKERS] VLDB Features

2007-12-11 Thread Greg Smith
On Tue, 11 Dec 2007, Josh Berkus wrote: Just so you don't lose sight of it, one of the biggest VLDB features we're missing is fault-tolerant bulk load. Unfortunately, I don't know anyone who's working on it. I'm curious what you feel is missing that pgloader doesn't fill that requirement: h

Re: [HACKERS] VLDB Features

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote: > Here's the other VLDB features we're missing: > > Parallel Query > Windowing Functions > Parallel Index Build (not sure how this works exactly, but it speeds Oracle > up considerably) > On-disk Bitmap Index (anyone game to finish GP patch?)

Re: [HACKERS] VLDB Features

2007-12-11 Thread Neil Conway
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote: > Just so you don't lose sight of it, one of the biggest VLDB features we're > missing is fault-tolerant bulk load. I actually had to cook up a version of this for Truviso recently. I'll take a look at submitting a cleaned-up implementation fo

Re: [HACKERS] VLDB Features

2007-12-11 Thread Josh Berkus
Hannu, > COPY ... WITH ERRORS TO ... Yeah, that's a start. > or something more advanced, like bulkload which can be continued after > crash ? Well, we could also use a loader which automatically parallelized, but that functionality can be done at the middleware level. WITH ERRORS is the most

Re: [HACKERS] VLDB Features

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote: > Simon. > > > VLDB Features I'm expecting to work on are > > - Read Only Tables/WORM tables > > - Advanced Partitioning > > - Compression > > plus related performance features > > Just so you don't lose sight of it, one of the biggest VLDB fe

Re: [HACKERS] VLDB Features

2007-12-11 Thread Hannu Krosing
On Tue, 2007-12-11 at 10:53, Josh Berkus wrote: > Simon. > > > VLDB Features I'm expecting to work on are > > - Read Only Tables/WORM tables > > - Advanced Partitioning > > - Compression > > plus related performance features > > Just so you don't lose sight of it, one of the b

Re: [HACKERS] VLDB Features

2007-12-11 Thread Josh Berkus
Simon. > VLDB Features I'm expecting to work on are > - Read Only Tables/WORM tables > - Advanced Partitioning > - Compression > plus related performance features Just so you don't lose sight of it, one of the biggest VLDB features we're missing is fault-tolerant bulk load. Unfortunately, I don

[HACKERS] VLDB Features

2007-12-11 Thread Simon Riggs
I'm starting work on next projects for 8.4. Many applications have the need to store very large data volumes for both archival and analysis. The analytic databases are commonly known as Data Warehouses, though there isn't a common term for large archival data stores. The use cases for those can of