Tom,
> Sure ... but you'll find that it's not large enough to be useful.
> Once you remove all the interesting consistency checks such as
> unique indexes and foreign keys, the COPY will tend to go through
> just fine, and then you're still stuck trying to weed out bad data
> without very good tools.
On Dec 12, 2007, at 1:26 PM, Markus Schiltknecht wrote:
Josh Berkus wrote:
Sure. Imagine you have a 5TB database on a machine with 8 cores
and only one concurrent user. You'd like to have 1 core doing I/O,
and say 4-5 cores dividing the scan and join processing into
4-5 chunks.
Ah, right, thanks for the enlightenment.
2007/12/16, Tom Lane <[EMAIL PROTECTED]>:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > But can't we _define_ such a subset, where we can do a transactionless
> > load ?
>
> Sure ... but you'll find that it's not large enough to be useful.
> Once you remove all the interesting consistency checks such as
> unique indexes and foreign keys, the COPY will tend to go through
> just fine.
Hi,
On Dec 15, 2007 1:14 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> NikhilS <[EMAIL PROTECTED]> writes:
> > Any errors which occur before doing the heap_insert should not require
> > any recovery according to me.
>
> A sufficient (though far from all-encompassing) rejoinder to that is
> "triggers and CHECK constraints can do anything".
Hannu Krosing <[EMAIL PROTECTED]> writes:
> But can't we _define_ such a subset, where we can do a transactionless
> load ?
Sure ... but you'll find that it's not large enough to be useful.
Once you remove all the interesting consistency checks such as
unique indexes and foreign keys, the COPY will tend to go through
just fine, and then you're still stuck trying to weed out bad data
without very good tools.
On Saturday, 2007-12-15 at 01:12, Tom Lane wrote:
> Josh Berkus <[EMAIL PROTECTED]> writes:
> > There's no way we can do a transactionless load, then? I'm thinking of the
> > load-into-new-partition which is a single pass/fail operation. Would
> > ignoring individual row errors for this case still cause these kinds of
> > problems?
On Saturday 2007-12-15 02:14, Simon Riggs wrote:
> On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote:
> > Neil Conway <[EMAIL PROTECTED]> writes:
> > > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
> > > to drop (and log) rows that contain malformed data. That is, rows with
> > > too many or too few columns.
On 16/12/2007, Neil Conway <[EMAIL PROTECTED]> wrote:
> On Tue, 2007-12-11 at 19:11 -0500, Greg Smith wrote:
> > I'm curious what you feel is missing that pgloader doesn't fill that
> > requirement: http://pgfoundry.org/projects/pgloader/
>
> For complicated ETL, I agree that using an external tool makes the most
> sense.
On Tue, 2007-12-11 at 19:11 -0500, Greg Smith wrote:
> I'm curious what you feel is missing that pgloader doesn't fill that
> requirement: http://pgfoundry.org/projects/pgloader/
For complicated ETL, I agree that using an external tool makes the most
sense. But I think there is still merit in adding
On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
> > to drop (and log) rows that contain malformed data. That is, rows with
> > too many or too few columns, rows that result in constraint violations,
> > and rows containing columns where the data type's input function
> > raises an error.
NikhilS <[EMAIL PROTECTED]> writes:
> Any errors which occur before doing the heap_insert should not require
> any recovery according to me.
A sufficient (though far from all-encompassing) rejoinder to that is
"triggers and CHECK constraints can do anything".
> The overhead of having a subtransaction
Hi,
>
> Another approach would be to distinguish between errors that require a
> subtransaction to recover to a consistent state, and less serious errors
> that don't have this requirement (e.g. invalid input to a data type
> input function). If all the errors that we want to tolerate during a
> bulk load
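The distinction drawn above can be sketched in user space as follows. The names `parse`, `insert`, and the use of `ValueError` as the "recoverable" error class are assumptions made for the illustration, not PostgreSQL internals:

```python
# Sketch of the two error classes: datatype input-function failures
# happen before any state has changed, so they can be logged and skipped
# with no rollback, while anything past the insert may have arbitrary
# side effects and would need a subtransaction to recover from.
RECOVERABLE = (ValueError,)  # assumed stand-in for input-function errors

def load_row(parse, insert, raw, log_bad):
    try:
        row = parse(raw)          # datatype input functions run here
    except RECOVERABLE:
        log_bad(raw)              # nothing to roll back yet: log and skip
        return False
    insert(row)                   # beyond this point a real backend would
    return True                   # need subtransaction-style recovery
```

Only the first category can be tolerated without per-row subtransaction overhead, which is the crux of the proposal.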
Josh Berkus <[EMAIL PROTECTED]> writes:
> There's no way we can do a transactionless load, then? I'm thinking of the
> load-into-new-partition which is a single pass/fail operation. Would
> ignoring individual row errors for this case still cause these kinds of
> problems?
Given that COPY
On Friday 2007-12-14 16:22, Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
> > to drop (and log) rows that contain malformed data. That is, rows with
> > too many or too few columns, rows that result in constraint violations.
Tom,
> I think such an approach is doomed to hopeless unreliability. There is
> no concept of an error that doesn't require a transaction abort in the
> system now, and that doesn't seem to me like something that can be
> successfully bolted on after the fact. Also, there's a lot of
> bookkeeping
Neil Conway <[EMAIL PROTECTED]> writes:
> One approach would be to essentially implement the pg_bulkloader
> approach inside the backend. That is, begin by doing a subtransaction
> for every k rows (with k = 1000, say). If you get any errors, then
> either repeat the process with k/2 until you locate
On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote:
> If we could somehow only do a subtransaction per failure, things would
> be much better, but I don't see how.
One approach would be to essentially implement the pg_bulkloader
approach inside the backend. That is, begin by doing a subtransaction
for every k rows (with k = 1000, say).
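The batch-then-bisect idea described above can be mocked up outside the backend like this. The function names, and the use of `ValueError` to signal a failed batch, are assumptions for the sketch, not pg_bulkloader's actual interface:

```python
def load_with_bisection(rows, insert_batch, log_bad, k=1000):
    """Load rows in subtransaction-sized batches; when a batch fails,
    split it in half and retry until the offending row is isolated,
    then log that row and continue with the rest."""
    i = 0
    while i < len(rows):
        batch = rows[i:i + k]
        try:
            insert_batch(batch)          # one "subtransaction" per batch
        except ValueError:
            if len(batch) == 1:
                log_bad(batch[0])        # single bad row: log and skip it
            else:
                half = len(batch) // 2   # halve k and retry, as proposed
                load_with_bisection(batch[:half], insert_batch, log_bad, half)
                load_with_bisection(batch[half:], insert_batch, log_bad, half)
        i += k
```

With few bad rows, almost all work happens at the full batch size, so the subtransaction cost is amortized over k rows rather than paid per row.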
Neil Conway <[EMAIL PROTECTED]> writes:
> By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
> to drop (and log) rows that contain malformed data. That is, rows with
> too many or too few columns, rows that result in constraint violations,
> and rows containing columns where the data type's input function
> raises an error.
Neil Conway wrote:
On Fri, 2007-12-14 at 14:48 +0200, Hannu Krosing wrote:
How did you do it?
Did you enhance the COPY command or was it something completely new?
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data
On Fri, 2007-12-14 at 14:48 +0200, Hannu Krosing wrote:
> How did you do it?
>
> Did you enhance the COPY command or was it something completely new?
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns.
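A user-space sketch of the behaviour being proposed (the function name and return shape are invented for illustration; the real feature would live inside COPY itself):

```python
def tolerant_copy(lines, ncols, delimiter="\t"):
    """Sketch of the proposed COPY IGNORE ERRORS behaviour: split each
    input line on the delimiter, keep well-formed rows, and log rows
    with too many or too few columns instead of aborting the load."""
    good, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != ncols:
            errors.append((lineno, line))   # drop and log, don't abort
        else:
            good.append(fields)
    return good, errors
```

The interesting part of the real proposal is not the parsing but doing this inside the backend without a subtransaction per row.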
On Tuesday, 2007-12-11 at 15:41, Neil Conway wrote:
> On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
> > Just so you don't lose sight of it, one of the biggest VLDB features we're
> > missing is fault-tolerant bulk load.
>
> I actually had to cook up a version of this for Truviso recently.
Hello Gregory,
Gregory Stark wrote:
Oracle is using Direct I/O so they need the reader and writer threads to avoid
blocking on i/o all the time. We count on the OS doing readahead and buffering
our writes so we don't have to. Direct I/O and needing some way to do
asynchronous writes and reads are
"Josh Berkus" <[EMAIL PROTECTED]> writes:
> Markus,
>
>> > Parallel Query
>>
>> Uh.. this only makes sense in a distributed database, no? I've thought
>> about parallel querying on top of Postgres-R. Does it make sense
>> implementing some form of parallel querying apart from the distribution
>> or replication engine?
> Greenplum as well as other Real Life stuff.
For those of us here who have no idea what you are talking about can
you define what "Real Life" is like?
Joshua D. Drake
On Wed, Dec 12, 2007 at 08:26:16PM +0100, Markus Schiltknecht wrote:
> >>Isn't Gavin Sherry working on this? Haven't read anything from him
> >>lately...
> >
> >Me neither. Swallowed by Greenplum and France.
>
> Hm.. good for him, I guess!
Yes, I'm around -- just extremely busy with a big release.
Hi Josh,
Josh Berkus wrote:
Sure. Imagine you have a 5TB database on a machine with 8 cores and only one
concurrent user. You'd like to have 1 core doing I/O, and say 4-5 cores
dividing the scan and join processing into 4-5 chunks.
Ah, right, thanks for the enlightenment. Heck, I'm definitely to
Markus,
> > Parallel Query
>
> Uh.. this only makes sense in a distributed database, no? I've thought
> about parallel querying on top of Postgres-R. Does it make sense
> implementing some form of parallel querying apart from the distribution
> or replication engine?
Sure. Imagine you have a 5TB database on a machine with 8 cores and only one
concurrent user. You'd like to have 1 core doing I/O, and say 4-5 cores
dividing the scan and join processing into 4-5 chunks.
Hi,
Josh Berkus wrote:
Here's the other VLDB features we're missing:
Parallel Query
Uh.. this only makes sense in a distributed database, no? I've thought
about parallel querying on top of Postgres-R. Does it make sense
implementing some form of parallel querying apart from the distribution
or replication engine?
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote:
> Simon, we should start a VLDB-Postgres developer wiki page.
http://developer.postgresql.org/index.php/DataWarehousing
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Hi,
On Wednesday, December 12, 2007, Josh Berkus wrote:
> > I'm curious what you feel is missing that pgloader doesn't fill that
> > requirement: http://pgfoundry.org/projects/pgloader/
>
> Because pgloader is implemented in middleware, it carries a very high
> overhead if you have bad rows. As little as 1% bad rows will slow down
> loading by 20%.
Greg,
> I'm curious what you feel is missing that pgloader doesn't fill that
> requirement: http://pgfoundry.org/projects/pgloader/
Because pgloader is implemented in middleware, it carries a very high overhead
if you have bad rows. As little as 1% bad rows will slow down loading by 20%
due to retries.
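A back-of-the-envelope model of why middleware-side error handling gets expensive. The all-or-nothing batch retry below is an assumption of the model, not a description of pgloader's actual strategy:

```python
def resend_overhead(n_rows, batch_size, bad_fraction):
    # Toy model: assume at most one bad row per batch; a failed batch is
    # rolled back and its rows are re-sent one at a time, so each bad
    # row costs roughly one extra batch worth of traffic.
    bad_batches = n_rows * bad_fraction        # batches that fail
    extra_rows = bad_batches * batch_size      # rows sent a second time
    return extra_rows / n_rows                 # overhead vs. a clean load
```

Under these assumptions, 1% bad rows with 20-row batches already means about 20% extra work, which is why handling errors inside the backend is attractive.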
On Tue, 11 Dec 2007, Josh Berkus wrote:
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load. Unfortunately, I don't know anyone
who's working on it.
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote:
> Here's the other VLDB features we're missing:
>
> Parallel Query
> Windowing Functions
> Parallel Index Build (not sure how this works exactly, but it speeds Oracle
> up considerably)
> On-disk Bitmap Index (anyone game to finish GP patch?)
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
> Just so you don't lose sight of it, one of the biggest VLDB features we're
> missing is fault-tolerant bulk load.
I actually had to cook up a version of this for Truviso recently. I'll
take a look at submitting a cleaned-up implementation for 8.4.
Hannu,
> COPY ... WITH ERRORS TO ...
Yeah, that's a start.
> or something more advanced, like bulkload which can be continued after
> crash ?
Well, we could also use a loader which automatically parallelized, but that
functionality can be done at the middleware level. WITH ERRORS is the
most
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
> Simon.
>
> > VLDB Features I'm expecting to work on are
> > - Read Only Tables/WORM tables
> > - Advanced Partitioning
> > - Compression
> > plus related performance features
>
> Just so you don't lose sight of it, one of the biggest VLDB features we're
> missing is fault-tolerant bulk load.
On Tuesday, 2007-12-11 at 10:53, Josh Berkus wrote:
> Simon.
>
> > VLDB Features I'm expecting to work on are
> > - Read Only Tables/WORM tables
> > - Advanced Partitioning
> > - Compression
> > plus related performance features
>
> Just so you don't lose sight of it, one of the biggest VLDB features we're
> missing is fault-tolerant bulk load.
Simon.
> VLDB Features I'm expecting to work on are
> - Read Only Tables/WORM tables
> - Advanced Partitioning
> - Compression
> plus related performance features
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load. Unfortunately, I don't know anyone
who's working on it.
I'm starting work on next projects for 8.4.
Many applications have the need to store very large data volumes for
both archival and analysis. The analytic databases are commonly known as
Data Warehouses, though there isn't a common term for large archival
data stores. The use cases for those can of