Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Gavin Sherry wrote:
> On Thu, 14 Aug 2003, Stephan Szabo wrote:
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row. I keep meaning to get
> > around to it and never actually do. :( I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> Perhaps it would be easier to allow SKIP VALIDATION (or something) with
> ALTER TABLE ADD, which can set FkConstraint->skip_validation. If we're
> just handling pg_dump output, then presumably the data is already
> validated. On the other hand, it might encourage users to bypass FKs when
> they feel like it...

We've talked about stuff like that in the past, but we seem to generally get stuck on how to specify it. If we add it to ALTER TABLE ADD as an option, then we're generating statements that are almost like a standard SQL statement, but not quite, and some people didn't like that. A SET option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad, but I'd be worried about making it apply to the normal checks as well (which I believe was suggested the last time this came up). In any case, making the full checks faster for when you really do care isn't a bad plan. :)
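[Archive note: a switch along exactly these lines did appear in much later PostgreSQL releases as NOT VALID / VALIDATE CONSTRAINT. A sketch of that later syntax, with hypothetical table and constraint names; none of this exists in 7.4:]

    ALTER TABLE orders
        ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id)
        NOT VALID;    -- skip the initial whole-table validation scan

    -- later, when convenient, perform the full check:
    ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;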
Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > > I can also attest to the horrendously long time it takes to restore
> > > the ADD FOREIGN KEY section...
> >
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row. I keep meaning to get
> > around to it and never actually do. :( I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> Surely in the default case it would reduce to using the new hashed IN()
> feature, so it'd be a lot faster?

If we wrote the query using IN, that'd be the hope (I've not played with it enough to guarantee that). However, a simple test comparing

    select * from fk
    where not exists (select * from pk where pk.key = fk.key)
      and key is not null;

(doing a seq scan, with the subplan doing an index scan - which is probably close to the current system) against

    select * from fk
    where key in (select key from pk)
      and key is not null;

on a pk table with 100k rows and an fk table with 1m rows gives me a difference of about 2x on my machine. But that's with a single-column int4 key; I haven't tried multi-column keys or larger key types.
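[Archive note: a minimal way to reproduce this comparison on a current PostgreSQL; generate_series did not exist in 7.4, and the table and column names simply follow the queries above:]

    create table pk (key int4 primary key);
    create table fk (key int4);

    -- 100k referenced rows, 1m referencing rows
    insert into pk select g from generate_series(1, 100000) g;
    insert into fk select (g % 100000) + 1 from generate_series(1, 1000000) g;
    analyze pk;
    analyze fk;

    explain analyze
    select * from fk
    where not exists (select * from pk where pk.key = fk.key)
      and key is not null;

    explain analyze
    select * from fk
    where key in (select key from pk)
      and key is not null;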
Re: [HACKERS] [GENERAL] 7.4Beta
On Thu, 14 Aug 2003, Stephan Szabo wrote:
> That really needs to be rewritten to do a single check over the table
> rather than running the constraint for every row. I keep meaning to get
> around to it and never actually do. :( I'm not sure that in practice
> you'll get a better plan at restore time depending on what the default
> statistics give you.

Perhaps it would be easier to allow SKIP VALIDATION (or something) with ALTER TABLE ADD, which can set FkConstraint->skip_validation. If we're just handling pg_dump output, then presumably the data is already validated. On the other hand, it might encourage users to bypass FKs when they feel like it...

Thanks,

Gavin
Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > > 1. Add the FK to the table BEFORE COPYing data
> > > 2. Use the old "update blah set reltriggers = 0" trick
> > > 3. Restore the data
> > > 4. Undo step 2
> >
> > The problem with that is that I think the reltriggers=0 trick only
> > works if you're superuser; I thought that's why the trigger disabling
> > became optional. A SET that affected only ALTER TABLE ADD CONSTRAINT
> > would probably be reasonable in the dump (and presumably easy to do).
> > It'd also carry over to future cases where we separate out some check
> > constraints (for example, ones that refer directly or indirectly to the
> > same table in a subselect).
>
> Well yes, this would be a super-user only ability. Are you worried about
> people restoring dumps as non-superuser?

Basically, yes (you might dump only some tables that you own, for example). I think that's why the data-only dumps no longer do the reltriggers thing by default and you need an option to get at it. We could make a similar option for this case, but it'd only work when restored by a superuser.
Re: [HACKERS] [GENERAL] 7.4Beta
> > I can also attest to the horrendously long time it takes to restore the
> > ADD FOREIGN KEY section...
>
> That really needs to be rewritten to do a single check over the table
> rather than running the constraint for every row. I keep meaning to get
> around to it and never actually do. :( I'm not sure that in practice
> you'll get a better plan at restore time depending on what the default
> statistics give you.

Surely in the default case it would reduce to using the new hashed IN() feature, so it'd be a lot faster?

Chris
Re: [HACKERS] [GENERAL] 7.4Beta
> > 1. Add the FK to the table BEFORE COPYing data
> > 2. Use the old "update blah set reltriggers = 0" trick
> > 3. Restore the data
> > 4. Undo step 2
>
> The problem with that is that I think the reltriggers=0 trick only works
> if you're superuser; I thought that's why the trigger disabling became
> optional. A SET that affected only ALTER TABLE ADD CONSTRAINT would
> probably be reasonable in the dump (and presumably easy to do). It'd also
> carry over to future cases where we separate out some check constraints
> (for example, ones that refer directly or indirectly to the same table in
> a subselect).

Well yes, this would be a super-user only ability. Are you worried about people restoring dumps as non-superuser?

Chris
Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > We've talked about stuff like that in the past, but we seem to
> > generally get stuck on how to specify it. If we add it to ALTER TABLE
> > ADD as an option, then we're generating statements that are almost like
> > a standard SQL statement, but not quite, and some people didn't like
> > that. A SET option that only affected ALTER TABLE ADD CONSTRAINT
> > wouldn't be too bad, but I'd be worried about making it apply to the
> > normal checks as well (which I believe was suggested the last time this
> > came up). In any case, making the full checks faster for when you
> > really do care isn't a bad plan. :)
>
> Or, this:
>
> 1. Add the FK to the table BEFORE COPYing data
> 2. Use the old "update blah set reltriggers = 0" trick
> 3. Restore the data
> 4. Undo step 2

The problem with that is that I think the reltriggers=0 trick only works if you're superuser; I thought that's why the trigger disabling became optional. A SET that affected only ALTER TABLE ADD CONSTRAINT would probably be reasonable in the dump (and presumably easy to do). It'd also carry over to future cases where we separate out some check constraints (for example, ones that refer directly or indirectly to the same table in a subselect).
Re: [HACKERS] [GENERAL] 7.4Beta
> We've talked about stuff like that in the past, but we seem to generally
> get stuck on how to specify it. If we add it to ALTER TABLE ADD as an
> option, then we're generating statements that are almost like a standard
> SQL statement, but not quite, and some people didn't like that. A SET
> option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad,
> but I'd be worried about making it apply to the normal checks as well
> (which I believe was suggested the last time this came up). In any case,
> making the full checks faster for when you really do care isn't a bad
> plan. :)

Or, this:

1. Add the FK to the table BEFORE COPYing data
2. Use the old "update blah set reltriggers = 0" trick
3. Restore the data
4. Undo step 2

Chris
Re: [HACKERS] [GENERAL] 7.4Beta
> We've talked about stuff like that in the past, but we seem to generally
> get stuck on how to specify it. If we add it to ALTER TABLE ADD as an
> option, then we're generating statements that are almost like a standard
> SQL statement, but not quite, and some people didn't like that. A SET
> option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad,
> but I'd be worried about making it apply to the normal checks as well
> (which I believe was suggested the last time this came up). In any case,
> making the full checks faster for when you really do care isn't a bad
> plan. :)

How about having an "I'm restoring" SET var:

    SET data_restore = true;

which means (among other things) that FKs aren't checked?

Chris
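[Archive note: later PostgreSQL releases grew a closely related restore-time knob: session_replication_role (8.3 and up), which restore tooling can set so that the row-level FK triggers simply don't fire for the session. A sketch with a hypothetical table name; it is superuser-only and skips the checks outright rather than deferring them:]

    BEGIN;
    SET LOCAL session_replication_role = replica;  -- FK triggers don't fire
    COPY orders FROM '/path/to/orders.dat';
    COMMIT;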
Re: [HACKERS] [GENERAL] 7.4Beta
Stephan Szabo wrote:
> On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > > I threw last night's backup at it. Data went in in about half an
> > > hour; then the constraints went in, and they took an age: about 2
> > > hours. Is there any way to speed up the database constraint code?
> > > Because quite frankly, at the current speed you're probably better
> > > off without the constraints. (Same problem with 7.3, come to think
> > > of it.)
> >
> > I can also attest to the horrendously long time it takes to restore the
> > ADD FOREIGN KEY section...
>
> That really needs to be rewritten to do a single check over the table
> rather than running the constraint for every row. I keep meaning to get
> around to it and never actually do. :( I'm not sure that in practice
> you'll get a better plan at restore time depending on what the default
> statistics give you.

This is clearly a case for a statement-level trigger, as soon as affected rows can be identified.

One remark on that enable/disable triggers stuff: from a user's perspective, I wouldn't consider a constraint trigger to be a trigger, so if I disabled all triggers on a table, I would still expect all constraints to be checked.

Regards,
Andreas
Re: [HACKERS] 7.4 beta 1 getting out of swap
Joe Conway <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
> > 3. Set up a long-lived cache internal to the array functions that can
> > translate element type OID to the needed lookup data, and won't leak
> > memory across repeated calls. This is not the fastest or most general
> > solution, but it seems the most localized and safest fix.
>
> It seems to me that #3 is the least risky, and even if it isn't the best
> possible performance, this is the initial implementation of indexes on
> arrays, so it isn't like we're taking away something. Maybe solution #2
> is better held as a performance enhancement for 7.5.

I'm leaning that way too. It occurs to me also that the same cache could be used to eliminate repeated lookups in sorting setup --- which would not be much of a win percentagewise, compared to the sort itself, but still it seems worth doing.

> Do you want me to take a shot at this since I created the mess?

Actually I led you down the garden path on that, IIRC --- I was the one who insisted these lookups needed to be cached. I'll work on fixing it.

regards, tom lane
ADD FOREIGN KEY (was Re: [HACKERS] [GENERAL] 7.4Beta)
Stephan Szabo <[EMAIL PROTECTED]> writes:
> On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > I can also attest to the horrendously long time it takes to restore the
> > ADD FOREIGN KEY section...
>
> That really needs to be rewritten to do a single check over the table
> rather than running the constraint for every row. I keep meaning to get
> around to it and never actually do. :( I'm not sure that in practice
> you'll get a better plan at restore time depending on what the default
> statistics give you.

In simple cases I think that the creation of indexes would be enough to get you a passable plan --- CREATE INDEX does update pg_class.reltuples, so the planner will know how big the tables are, and for single-column primary keys the existence of a unique index is enough to cue the planner that the column is unique, even without any ANALYZE stats. Those are the biggest levers on the plan choice. This assumes that pg_dump always dumps CREATE INDEX before ADD FOREIGN KEY; I'm not certain if there's anything to enforce that at the moment...

I assume what you have in mind is to replace validateForeignKeyConstraint() with something that does a join of the two tables via an SPI command. But supposing that we want to keep the present ability to report (one of) the failing key values, it seems like the query has to look like

    SELECT keycolumns FROM referencing_table
    WHERE keycolumns NOT IN (SELECT refcols FROM referenced_table);

which is only gonna do the right thing for one of the MATCH styles (not sure which, offhand ... actually it may not do the right thing for any match style if there are nulls in referenced_table ...). How would you make it work for all the MATCH styles? And will it really be all that efficient? (NOT IN is a lot more circumscribed than IN.)

regards, tom lane
Re: [HACKERS] [GENERAL] 7.4Beta
Stephan Szabo <[EMAIL PROTECTED]> writes:
> select * from fk
> where not exists (select * from pk where pk.key = fk.key)
>   and key is not null;
>
> (doing a seq scan, with the subplan doing an index scan - which is
> probably close to the current system)

Actually, even that would probably be noticeably better than the current system. I haven't profiled it (someone should), but I suspect that executor startup/shutdown time is a huge hit. Even though the trigger is caching a plan, it has to instantiate that plan for each referencing tuple --- and the executor is not designed for quick startup/shutdown. (Of course, this would become less relevant if the triggers got rewritten to not go through SPI ...)

regards, tom lane
Re: [HACKERS] New function: epoch_to_timestamp...
David Fetter <[EMAIL PROTECTED]> writes:
> Is there a reasonable way to add an optional param or two to mark time
> zones, or should that just be the application programmer's hassle?

The return type should be timestamptz, which makes the transformation timezone-independent.

regards, tom lane
Re: ADD FOREIGN KEY (was Re: [HACKERS] [GENERAL] 7.4Beta)
On Fri, 15 Aug 2003, Tom Lane wrote:
> Stephan Szabo <[EMAIL PROTECTED]> writes:
> > On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > > I can also attest to the horrendously long time it takes to restore
> > > the ADD FOREIGN KEY section...
> >
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row. I keep meaning to get
> > around to it and never actually do. :( I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> In simple cases I think that the creation of indexes would be enough to
> get you a passable plan --- CREATE INDEX does update pg_class.reltuples,
> so the planner will know how big the tables are, and for single-column
> primary keys the existence of a unique index is enough to cue the planner
> that the column is unique, even without any ANALYZE stats. Those are the
> biggest levers on the plan choice. This assumes that pg_dump always dumps
> CREATE INDEX before ADD FOREIGN KEY; I'm not certain if there's anything
> to enforce that at the moment...
>
> I assume what you have in mind is to replace
> validateForeignKeyConstraint() with something that does a join of the two
> tables via an SPI command. But supposing that we want to keep the present
> ability to report (one of) the failing key values, it seems like the
> query has to look like
>
>     SELECT keycolumns FROM referencing_table
>     WHERE keycolumns NOT IN (SELECT refcols FROM referenced_table);
>
> which is only gonna do the right thing for one of the MATCH styles (not
> sure which, offhand ... actually it may not do the right thing for any
> match style if there are nulls in referenced_table ...).

Yes; in practice you'd have to put IS NOT NULL checks in the subselect, which is fine for the two match types we support, since a referenced row with a NULL isn't a candidate referenced row for those. I think MATCH PARTIAL might have to fall back to the repeated check unless we can make the query work, which would be harder because you only want to compare the columns for a particular row where the key column value is not null, and I can't think of a query for that that'd be particularly clean and likely to be fast; then again, I don't think the constraint check would be either. :(

It'd probably be, for MATCH <unspecified>:

    SELECT keycolumns FROM referencing_table
    WHERE (keycolumns) NOT IN
          (SELECT refcols FROM referenced_table
           WHERE refcol1 IS NOT NULL AND ...)
      AND keycolumn1 IS NOT NULL AND ...;

and for MATCH FULL (something like this; I haven't tried it):

    SELECT keycolumns FROM referencing_table
    WHERE ((keycolumns) NOT IN
           (SELECT refcols FROM referenced_table
            WHERE refcol1 IS NOT NULL AND ...)
       AND (keycolumn1 IS NOT NULL AND ...))
       OR ((keycolumn1 IS NOT NULL) != (keycolumn2 IS NOT NULL) OR ...);

> How would you make it work for all the MATCH styles? And will it really
> be all that efficient? (NOT IN is a lot more circumscribed than IN.)

I'm not really sure yet. Limited tests seem to show that it'll probably be as fast if not faster for all reasonable cases, but I'd want to generate a much larger random data set and actually put it in to make a fair comparison (maybe temporarily with a SET to allow people to try both cases on real-world data). One other advantage here is that we don't need to get row locks while checking this if we've already gotten the exclusive table locks on both tables involved. I'm not sure if we do that currently, though.
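[Archive note: the single-query validation that eventually shipped uses a left anti-join rather than NOT IN, which sidesteps the NULL pitfalls raised above: NULLs in the referenced table can never satisfy the join condition, and the IS NOT NULL filter implements the MATCH <unspecified> rule that rows with NULL keys need no match. A sketch for a two-column key, following the placeholder names above:]

    SELECT fk.keycolumn1, fk.keycolumn2
    FROM ONLY referencing_table fk
    LEFT OUTER JOIN ONLY referenced_table pk
      ON pk.refcol1 = fk.keycolumn1 AND pk.refcol2 = fk.keycolumn2
    WHERE pk.refcol1 IS NULL          -- no referenced row was found
      AND fk.keycolumn1 IS NOT NULL
      AND fk.keycolumn2 IS NOT NULL;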
Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Andreas Pflug wrote:
> Stephan Szabo wrote:
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row. I keep meaning to get
> > around to it and never actually do. :( I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> This is clearly a case for a statement-level trigger, as soon as affected
> rows can be identified.

Well, I think single inserts might be more expensive (because the query is more involved for the table-joining case) using a statement-level trigger, so we'd probably want to profile the cases.
Re: [HACKERS] [GENERAL] 7.4Beta
Stephan Szabo wrote:
> On Fri, 15 Aug 2003, Andreas Pflug wrote:
> > Stephan Szabo wrote:
> > > That really needs to be rewritten to do a single check over the table
> > > rather than running the constraint for every row. I keep meaning to
> > > get around to it and never actually do. :( I'm not sure that in
> > > practice you'll get a better plan at restore time depending on what
> > > the default statistics give you.
> >
> > This is clearly a case for a statement-level trigger, as soon as
> > affected rows can be identified.
>
> Well, I think single inserts might be more expensive (because the query
> is more involved for the table-joining case) using a statement-level
> trigger, so we'd probably want to profile the cases.

This really depends. If a constraint is just a check on the inserted/updated column, so that no other row needs to be checked, there's no faster way than the current row trigger. But FK constraints need to execute a query to retrieve the referenced row, and every RDBMS prefers to execute a single statement over many rows rather than many statements over a single row, because the former profits from optimization. And even if only a single row is inserted or updated, there's still the need to look up the reference.

Regards,
Andreas
Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Andreas Pflug wrote:
> Stephan Szabo wrote:
> > On Fri, 15 Aug 2003, Andreas Pflug wrote:
> > > Stephan Szabo wrote:
> > > > That really needs to be rewritten to do a single check over the
> > > > table rather than running the constraint for every row. I keep
> > > > meaning to get around to it and never actually do. :( I'm not sure
> > > > that in practice you'll get a better plan at restore time depending
> > > > on what the default statistics give you.
> > >
> > > This is clearly a case for a statement-level trigger, as soon as
> > > affected rows can be identified.
> >
> > Well, I think single inserts might be more expensive (because the query
> > is more involved for the table-joining case) using a statement-level
> > trigger, so we'd probably want to profile the cases.
>
> This really depends. If a constraint is just a check on the
> inserted/updated column, so that no other row needs to be checked,
> there's no faster way than the current row trigger. But FK constraints
> need to execute a query to retrieve the referenced row, and every RDBMS
> prefers to execute a single statement over many rows rather than many
> statements over a single row, because the former profits from
> optimization. And even if only a single row is inserted or updated,
> there's still the need to look up the reference.

I don't think that addresses the issue I brought up. If you're doing a bunch of single inserts:

    begin;
    insert into foo values (1);
    insert into foo values (1);
    insert into foo values (1);
    insert into foo values (1);
    insert into foo values (1);
    end;

each of those statement triggers is still only going to be dealing with a single row. If you're in immediate mode, there's not much you can do about that, since the constraint is checked between inserts. If you're in deferred mode, right now it won't help either, because it's not going to batch them: it's going to be five statement triggers, AFAICT each with its own one-row affected table. I believe that the more complicated "join the old/new table with the pk table and do the constraint check" is going to be slightly slower than the current row behavior for such cases, because the trigger query is more complicated. What would be nice would be some way to choose between a single query per statement and a simpler query per row, based on what's happening.
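[Archive note: the missing piece discussed here, a statement-level trigger that can see the set of affected rows, eventually arrived as transition tables in PostgreSQL 10. A hedged sketch of a hand-rolled batch check in that much later syntax, with hypothetical tables foo(ref_id) and ref(id); rows with a NULL key pass, as in MATCH SIMPLE:]

    CREATE FUNCTION check_foo_fk() RETURNS trigger AS $$
    BEGIN
        -- one set-oriented check per statement, not one per row
        IF EXISTS (SELECT 1 FROM new_rows n
                   WHERE n.ref_id IS NOT NULL
                     AND NOT EXISTS (SELECT 1 FROM ref r WHERE r.id = n.ref_id)) THEN
            RAISE EXCEPTION 'foreign key violation on foo.ref_id';
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER foo_fk_check
        AFTER INSERT ON foo
        REFERENCING NEW TABLE AS new_rows
        FOR EACH STATEMENT EXECUTE PROCEDURE check_foo_fk();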
[HACKERS] Copyright (C) 1996-2002
Today I've downloaded PostgreSQL 7.3.4. I've seen that in $PGSQLD/doc/html/index.html it still says "Copyright (C) 1996-2002". Shouldn't it be 2003?

Regards, Christoph

PS: I've sent this to [EMAIL PROTECTED] before, but in return I got: "Your message to pgsql-docs has been delayed, and requires the approval of the moderators, for the following reason(s): The author (Christoph Haller <[EMAIL PROTECTED]>) is not a member of any of the restrict_post groups."
Re: [HACKERS] [GENERAL] 7.4Beta
Stephan Szabo wrote:
> On Fri, 15 Aug 2003, Andreas Pflug wrote:
> > Stephan Szabo wrote:
> > > On Fri, 15 Aug 2003, Andreas Pflug wrote:
> > > > Stephan Szabo wrote:
> > > > > That really needs to be rewritten to do a single check over the
> > > > > table rather than running the constraint for every row. I keep
> > > > > meaning to get around to it and never actually do. :( I'm not
> > > > > sure that in practice you'll get a better plan at restore time
> > > > > depending on what the default statistics give you.
> > > >
> > > > This is clearly a case for a statement-level trigger, as soon as
> > > > affected rows can be identified.
> > >
> > > Well, I think single inserts might be more expensive (because the
> > > query is more involved for the table-joining case) using a
> > > statement-level trigger, so we'd probably want to profile the cases.
> >
> > This really depends. If a constraint is just a check on the
> > inserted/updated column, so that no other row needs to be checked,
> > there's no faster way than the current row trigger. But FK constraints
> > need to execute a query to retrieve the referenced row, and every RDBMS
> > prefers to execute a single statement over many rows rather than many
> > statements over a single row, because the former profits from
> > optimization. And even if only a single row is inserted or updated,
> > there's still the need to look up the reference.
>
> I don't think that addresses the issue I brought up. If you're doing a
> bunch of single inserts:
>
>     begin;
>     insert into foo values (1);
>     insert into foo values (1);
>     insert into foo values (1);
>     insert into foo values (1);
>     insert into foo values (1);
>     end;
>
> each of those statement triggers is still only going to be dealing with
> a single row. If you're in immediate mode, there's not much you can do
> about that, since the constraint is checked between inserts. If you're
> in deferred mode, right now it won't help either, because it's not going
> to batch them: it's going to be five statement triggers, AFAICT each
> with its own one-row affected table. I believe that the more complicated
> "join the old/new table with the pk table and do the constraint check"
> is going to be slightly slower than the current row behavior for such
> cases, because the trigger query is more complicated. What would be nice
> would be some way to choose between a single query per statement and a
> simpler query per row, based on what's happening.

Deferring the constraint check would mean checking 5 single rows, right. But I still can't see why you think a row-level trigger would be cheaper in this case. I had a look at ri_triggers.c, and what's coded there looks just as I expected: it does a query on the referenced table. The queries might look a bit different when checking multiple rows at once, but carefully designed, I doubt there would be a performance hit from this. In case it *is* significantly slower, single-row updates could be handled separately using the current triggers, with statement triggers for multiple rows. This would cover both scenarios best. At the moment, update/insert doesn't scale too well.

The best thing in the situation above would certainly be if all 5 rows were checked in a single query, but that looks quite impossible, because a mixture of inserts/updates/deletes on different tables might be deferred.

Regards,
Andreas
Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Andreas Pflug wrote:
> Stephan Szabo wrote:
> > On Fri, 15 Aug 2003, Andreas Pflug wrote:
> > > Stephan Szabo wrote:
> > > > Well, I think single inserts might be more expensive (because the
> > > > query is more involved for the table-joining case) using a
> > > > statement-level trigger, so we'd probably want to profile the
> > > > cases.
> > >
> > > This really depends. If a constraint is just a check on the
> > > inserted/updated column, so that no other row needs to be checked,
> > > there's no faster way than the current row trigger. But FK
> > > constraints need to execute a query to retrieve the referenced row,
> > > and every RDBMS prefers to execute a single statement over many rows
> > > rather than many statements over a single row, because the former
> > > profits from optimization. And even if only a single row is inserted
> > > or updated, there's still the need to look up the reference.
> >
> > I don't think that addresses the issue I brought up. If you're doing a
> > bunch of single inserts:
> >
> >     begin;
> >     insert into foo values (1);
> >     insert into foo values (1);
> >     insert into foo values (1);
> >     insert into foo values (1);
> >     insert into foo values (1);
> >     end;
> >
> > each of those statement triggers is still only going to be dealing
> > with a single row. If you're in immediate mode, there's not much you
> > can do about that, since the constraint is checked between inserts. If
> > you're in deferred mode, right now it won't help either, because it's
> > not going to batch them: it's going to be five statement triggers,
> > AFAICT each with its own one-row affected table. I believe that the
> > more complicated "join the old/new table with the pk table and do the
> > constraint check" is going to be slightly slower than the current row
> > behavior for such cases, because the trigger query is more
> > complicated. What would be nice would be some way to choose between a
> > single query per statement and a simpler query per row, based on
> > what's happening.
>
> Deferring the constraint check would mean checking 5 single rows, right.
> But I still can't see why you think a row-level trigger would be cheaper
> in this case. I had a look at ri_triggers.c, and what's coded there
> looks just as I expected: it does a query on the referenced table. The
> queries might look a bit different when checking multiple rows at once,
> but carefully designed, I doubt there would be a performance hit from
> this. In case it *is* significantly slower, single-row updates

I don't know if there will be or not, but in one case it's a single-table select with constant values, while in the other it's probably some kind of scan and subselect. I'm just not going to rule out the possibility, so we should profile it in large transactions with, say, 100k single inserts, and see.

> could be handled separately using the current triggers, with statement
> triggers for multiple rows. This would cover both scenarios best.

Yep. I'd wish that it could do that without actually needing to queue up both triggers, but I don't know if that'd be possible without tying some knowledge of the FK functions deeper down.

> The best thing in the situation above would certainly be if all 5 rows
> were checked in a single query, but that looks quite impossible, because
> a mixture of inserts/updates/deletes on different tables might be
> deferred.

Yeah, the 5 above are pretty easy to show safe, but other cases and referential-action cases won't necessarily be so easy.
[HACKERS] best way to retrieve the next record in a multi-column index
Can anybody help me with this? (Sorry for posting on hackers.)

I need to be able to determine the next row based on a non-unique key (index). I have solved this problem, but I would like to know if there is a simpler solution. For those of you who have ever dealt with COBOL, this is an on-the-fly SQL construction of a 'READ NEXT' statement following a START. It is very similar to cursors, but because of the transactional limitations of cursors, they cannot be used in this context.

Example: I have a table t with columns a, b, c. I have values a1, b1, c1 for those columns and would like to know the next value in the table when ordered by a, b. That is, I have values a1, b1, and oid1 and would like to find the very next record in the table (essentially looking for the next record in the index).

I have two solutions: one with 'or' logic and one with 'and' logic. Note: if the index we are scanning has the unique constraint, the oid part of the logic (and the index) can be left out.

    create index t_idx on t (a, b, oid);

'or' logic:

    select * from t
    where a > a1
       or (a = a1 and b > b1)
       or (a = a1 and b = b1 and oid > oid1)
    order by a, b, oid;

'and' logic:

    select * from t
    where a >= a1
      and (a > a1 or b >= b1)
      and (a > a1 or b > b1 or oid > oid1)
    order by a, b, oid;

I think, of the two, the 'or' logic is much better. The problem with both approaches is that when we have a 4-column key (common in COBOL), our index is based on a, b, c, d, oid, and the number of comparisons (and our select statement) becomes large, and performance is very important! If some logical genius knows how to reduce the above logic into a more direct approach, feel free to comment. Postgres properly optimizes both cases and uses the key: even for a table with 1 million+ records in it, the answer comes back right away.

My question is: is there a simpler way to do this? AFAIK there is no way in SQL to directly find the 'next' or 'previous' record in an ordered index (in other words: I have oid n, what is the next oid in the index?) without using the above logic. In other words, I am missing the ability to deal with a multi-column index value in a comparison as a single entity.

p.s. The above queries are 'sliding window' queries similar to cursors. If your table traversal can be defined by a (unique) index, you can use the above templates to slide over the tables without the use of a cursor.

Merlin
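[Archive note: the "multi-column index value as a single entity" asked for here is the SQL-spec row-value comparison, which much later PostgreSQL releases (8.2 and up) implement with index support; it did not exist in 7.4. A sketch against the same table:]

    select * from t
    where (a, b, oid) > (a1, b1, oid1)
    order by a, b, oid
    limit 1;

[This is exactly the 'or' logic above spelled as one lexicographic comparison, and the planner can satisfy it with a single scan of t_idx.]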
Re: [HACKERS] [GENERAL] 7.4Beta
> -----Original Message-----
> From: Andreas Pflug [mailto:[EMAIL PROTECTED]]
> Sent: Friday, August 15, 2003 10:49 AM
> To: Stephan Szabo
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] [GENERAL] 7.4Beta
>
> Stephan Szabo wrote:
> > I don't know if there will be or not, but in one case it's a
> > single-table select with constant values, while in the other it's
> > probably some kind of scan and subselect. I'm just not going to rule
> > out the possibility, so we should profile it in large transactions
> > with, say, 100k single inserts, and see.
>
> You're talking about bulk operations, which should be handled carefully
> too. Usually, loading all the data into a temporary table and doing
> INSERT INTO xxx SELECT ... FROM tmptable will give better performance
> where indexes and constraints are concerned. PostgreSQL shouldn't be
> expected to accept the most abusive ways of operation, but it should
> offer a reasonable set of tools enabling the job in a convenient way. The
> best situation is when many small random transactions perform well (for
> TPC-like loads) as well as bulk operations. Nobody should expect a
> database to smoothly convert a bunch of single transactions into an
> optimized bulk one; that's the job of a programmer.
>
> > Yeah, the 5 above are pretty easy to show safe, but other cases and
> > referential-action cases won't necessarily be so easy.
>
> So it's the programmer's responsibility to offer mass data to the
> backend, not separate inserts that by chance might be handled in a
> similar way. An RDBMS is not clairvoyant.

Simplification of bulk operations can be very important for customers, on the other hand. For the CONNX tool set, we offer an escape on INSERT/SELECT that performs the operation in bulk mode. There are serious downsides to bulk operations, too (such as not being logged and therefore existing outside of a transaction). Therefore, they are really only useful in cases where a complete destruction and repopulation is called for: if anything goes haywire, you can't simply roll it back. Yet the speed savings can be enormous (orders of magnitude). Compared to iterating over a set of prepared inserts, a bulk insert (such as one using Microsoft's BCP API or Oracle's direct-path loading) can be 100 times faster. If you are moving gigabytes of data and performing a complete refresh, the method to use becomes obvious.

When we go outside of the language bounds, a curly-braced escape notation is used. For instance, an insert/select might look like this:

    INSERT INTO destination_table
    SELECT column_list FROM source_table {fn commitcount 1000} {bulkmode}

The commit count says to use batches of 1000 rows, and bulkmode says to use the fastest possible insert method. Imagine (for instance) that PostgreSQL could use an ODBC {or similar} data source as a table. Then, with a syntax such as the above (or SELECT INTO, etc.), you could very rapidly move data from one system into another.
Re: [HACKERS] [GENERAL] 7.4Beta
Dann Corbit wrote:
> Simplification of bulk operations can be very important for customers, on
> the other hand. For the CONNX tool set, we offer an escape on
> INSERT/SELECT that performs the operation in bulk mode. There are serious
> downsides to bulk operations, too (such as not being logged and therefore
> existing outside of a transaction). Therefore, they are really only
> useful in cases where a complete destruction and repopulation is called
> for: if anything goes haywire, you can't simply roll it back. Yet the
> speed savings can be enormous (orders of magnitude). Compared to
> iterating over a set of prepared inserts, a bulk insert (such as one
> using Microsoft's BCP API or Oracle's direct-path loading) can be 100
> times faster. If you are moving gigabytes of data and performing a
> complete refresh, the method to use becomes obvious.
>
> When we go outside of the language bounds, a curly-braced escape notation
> is used. For instance, an insert/select might look like this:
>
>     INSERT INTO destination_table
>     SELECT column_list FROM source_table {fn commitcount 1000} {bulkmode}
>
> The commit count says to use batches of 1000 rows, and bulkmode says to
> use the fastest possible insert method. Imagine (for instance) that
> PostgreSQL could use an ODBC {or similar} data source as a table. Then,
> with a syntax such as the above (or SELECT INTO, etc.), you could very
> rapidly move data from one system into another.

When I say "bulk operation" I don't necessarily mean bulk load or stuff like that. What I mean is handling large amounts of similar data at the same time; that doesn't say anything about transactions or logging problems. Imagine you have 100k rows to load, each having FKs to (hopefully) existing rows:

- Step 1: load the rows into a temp table.
- Step 2: identify duplicates, logging and deleting them.
- Step 3: insert all data satisfying the FK constraints.
- Step 4: log everything that didn't insert.

These are relatively few statements (not the simplest ones), which can be handled in a safe manner, as sketched below. Just an example of how an RDBMS can do the job in a mass-oriented (and thus optimizable) way.

Regards,
Andreas
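[Archive note: a hedged sketch of those four steps in plain SQL, with hypothetical tables target(id, ref_id), ref(id), and load_errors(id, ref_id, reason); it assumes ref_id is never NULL in the incoming data:]

    -- Step 1: load the rows into a temp table.
    CREATE TEMP TABLE staging (id int, ref_id int);
    COPY staging FROM '/path/to/incoming.dat';

    -- Step 2: log and delete rows whose keys are already present.
    INSERT INTO load_errors (id, ref_id, reason)
        SELECT s.id, s.ref_id, 'duplicate'
        FROM staging s WHERE s.id IN (SELECT id FROM target);
    DELETE FROM staging WHERE id IN (SELECT id FROM target);

    -- Step 3: insert everything whose FK target exists.
    INSERT INTO target (id, ref_id)
        SELECT s.id, s.ref_id
        FROM staging s WHERE s.ref_id IN (SELECT id FROM ref);

    -- Step 4: log the rest.
    INSERT INTO load_errors (id, ref_id, reason)
        SELECT s.id, s.ref_id, 'missing reference'
        FROM staging s WHERE s.ref_id NOT IN (SELECT id FROM ref);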
Re: [HACKERS] 7.4 LOG: invalid message length
Gmane,

I just checked in a fix to the JDBC driver for this. The problem was that the connection termination message was being passed the wrong length, which really had no adverse side effect other than this message in the log, since the connection was no longer being used.

thanks,
--Barry

Gmane Smith wrote:
> Using Mac OS X 10.2.6, pgsql 7.4 (postgresql-7.4beta1) from
> postgresql.org, devpgjdbc2.jar, and WebObjects 5.2, I get "LOG: invalid
> message length" when EOModeler creates my database. When I built
> PostgreSQL I specified --with-pam --without-readline. Since I don't have
> ant, the --with-java build failed, so I left it off. Should I go back to
> a more stable JDBC driver? Or should I press on? Thanks.
Re: [HACKERS] Copyright (C) 1996-2002
Christoph Haller <[EMAIL PROTECTED]> writes:
> Today I've downloaded PostgreSQL 7.3.4. I've seen that in
> $PGSQLD/doc/html/index.html it still says "Copyright (C) 1996-2002".
> Shouldn't it be 2003?

We only update the copyright notices when we are preparing a major release. (Bruce just did it a week or two back for 7.4, for example.) Updating for minor releases would create a lot of churn in the stable CVS branches, for little purpose.

regards, tom lane
[HACKERS] Bounds error in LockMethodInit().
In lmgr.c you have a static LOCKMASK LockConflicts[] with 9 elements in it. You call LockMethodTableInit() with that pointer and MAX_LOCKMODES - 1 (10 - 1 = 9). That calls LockMethodInit() with the same arguments, but LockMethodInit() does numModes++. So you basically end up with a for loop that looks like:

    for (i = 0; i < 10; i++, conflictsP++)

On the last iteration, conflictsP no longer points within the LockConflicts array. I have no idea what that numModes++ line is doing there.

Kurt
Re: [HACKERS] [GENERAL] 7.4Beta
Jan Wieck <[EMAIL PROTECTED]> writes:
> I'm thinking instead of a way to cache entire executors for this. Each
> SPI plan used during a transaction would need its own executor, and I
> don't know offhand what type and how much resources an executor requires
> (I think it's only some memory that gets initialized and the VFDs
> opened).

Hmm. This is probably more feasible now than it would have been a year ago, because I did some cleanup work to ensure that executor state is localized into a specific memory context. I'm not certain about the amount of overhead either, but it's surely worth a try.

regards, tom lane
Re: [HACKERS] [GENERAL] 7.4Beta
> -----Original Message-----
> From: Andreas Pflug [mailto:[EMAIL PROTECTED]]
> Sent: Friday, August 15, 2003 11:36 AM
> To: Dann Corbit
> Cc: Stephan Szabo; PostgreSQL-development
> Subject: Re: [HACKERS] [GENERAL] 7.4Beta
>
> Dann Corbit wrote:
> > Simplification of bulk operations can be very important for customers,
> > on the other hand. For the CONNX tool set, we offer an escape on
> > INSERT/SELECT that performs the operation in bulk mode. There are
> > serious downsides to bulk operations, too (such as not being logged and
> > therefore existing outside of a transaction). Therefore, they are
> > really only useful in cases where a complete destruction and
> > repopulation is called for: if anything goes haywire, you can't simply
> > roll it back. Yet the speed savings can be enormous (orders of
> > magnitude). Compared to iterating over a set of prepared inserts, a
> > bulk insert (such as one using Microsoft's BCP API or Oracle's
> > direct-path loading) can be 100 times faster. If you are moving
> > gigabytes of data and performing a complete refresh, the method to use
> > becomes obvious.
> >
> > When we go outside of the language bounds, a curly-braced escape
> > notation is used. For instance, an insert/select might look like this:
> >
> >     INSERT INTO destination_table
> >     SELECT column_list FROM source_table {fn commitcount 1000}
> >     {bulkmode}
> >
> > The commit count says to use batches of 1000 rows, and bulkmode says to
> > use the fastest possible insert method. Imagine (for instance) that
> > PostgreSQL could use an ODBC {or similar} data source as a table. Then,
> > with a syntax such as the above (or SELECT INTO, etc.), you could very
> > rapidly move data from one system into another.
>
> When I say "bulk operation" I don't necessarily mean bulk load or stuff
> like that. What I mean is handling large amounts of similar data at the
> same time; that doesn't say anything about transactions or logging
> problems. Imagine you have 100k rows to load, each having FKs to
> (hopefully) existing rows:
>
> - Step 1: load the rows into a temp table.
> - Step 2: identify duplicates, logging and deleting them.
> - Step 3: insert all data satisfying the FK constraints.
> - Step 4: log everything that didn't insert.
>
> These are relatively few statements (not the simplest ones), which can be
> handled in a safe manner. Just an example of how an RDBMS can do the job
> in a mass-oriented (and thus optimizable) way.

I really, really like SQL*Server's solution to the problem you have stated above. A typical example: you have a pile of incoming words and you want to create a dictionary. An example might be a library database where you have an abstract for each item; you parse the words and look for new ones to add. What SQL*Server allows is an index with an option called IGNORE DUPLICATES that simply tosses out rows that are already in the table. For applications like what I have described and what you have described, it is an incredibly useful extension. Once I got used to it, I found myself using it all the time. Of course, you must be very careful to ensure that the duplicates really are completely unimportant.
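[Archive note: PostgreSQL eventually grew a comparable "toss the duplicates" facility in 9.5 as INSERT ... ON CONFLICT DO NOTHING. A sketch with hypothetical tables dictionary(word), which needs a unique index on word, and incoming_words(word); nothing like this exists in 7.4:]

    INSERT INTO dictionary (word)
    SELECT DISTINCT word FROM incoming_words
    ON CONFLICT (word) DO NOTHING;  -- silently drop already-present words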
[HACKERS] Behavior of equality_oper and ordering_oper
There are a couple of routines in parse_oper.c that are intended to look up appropriate '=' and '<' operators given a datatype OID. These are used in quite a variety of places, for example to determine the semantics of GROUP BY.

It's bothered me for some time that these routines depend on actually looking for an operator named '='. (ordering_oper used to depend on looking for the name '<', too, but at the moment it looks for a mergesort operator associated with '='.) The system in general does not assume that operator names determine operator semantics, so depending on names here is a big wart. equality_oper tries to apply some additional checks to verify that '=' really does behave like equality, but that's just a kluge. What's worse, as of 7.3 the lookup results could vary depending on your schema search path. This is bad news, especially for user-defined types.

Up to now I've seen no way around this; I've wished we could have a type's pg_type entry link directly to the proper '=' operator, but there are some circularity issues there. (CREATE TYPE couldn't set the link because the operator wouldn't exist yet when you're creating the type.)

Today it occurred to me that we could look in pg_opclass for a default btree opclass for the datatype. If we find one, then the Equal and Less members of the opclass are the operators we want. (If we don't find one, we could try for a default hash opclass, which would give us Equal, but not Less, for a few additional datatypes.) This seems like a much cleaner approach for two reasons: the opclass structure declares directly that the operators have the semantics we are looking for, and the search is not dependent on schema visibility. (We only allow one default opclass per datatype/AM, so the result would be unique.)

This lookup would perhaps be a little slower than the name-based operator lookup, since AFAICS there's no way to use the catcaches for it. However, we already realized we need a datatype-to-comparison-operator lookup cache to avoid memory leakage in array_cmp. If we cache the results in a specialized hashtable then it certainly won't be any slower than what the code does now.

There are a few built-in types for which this would change the behavior. Presently, these operators are believed by equality_oper to be '=', but do not have any corresponding btree or hash opclass:

    =(tid,tid)             | tideq
    =(box,box)             | box_eq
    =(path,path)           | path_n_eq
    =(tinterval,tinterval) | tintervaleq
    =(money,money)         | cash_eq
    =(aclitem,aclitem)     | aclitemeq
    =(circle,circle)       | circle_eq
    =(lseg,lseg)           | lseg_eq
    =(line,line)           | line_eq

In several of these cases, equality_oper is actually wrong --- box_eq for example compares areas, which is not what one would consider the normal equality behavior for boxes. The only ones that really ought to be found are the ones for TID, MONEY, and ACLITEM. I'm not particularly concerned about losing the ability to group by any of those datatypes, but if anyone is, we could talk about forcing an initdb to add the necessary comparison operators.

There are also a small number of operators that are found by ordering_oper but would not be found by a btree opclass search:

    =(box,box)         | box_eq
    =(reltime,reltime) | reltimeeq
    =(money,money)     | cash_eq
    =(circle,circle)   | circle_eq

Again, I'm not too concerned about this; only MONEY actually has comparison semantics that would support a btree index, but if it's not btree-indexable, does it need to be groupable?

Comments, objections?

regards, tom lane
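[Archive note: this default-opclass lookup is the design that stuck. On today's catalogs, which were reorganized around operator families well after this thread (so none of this runs on 7.4), the lookup Tom describes can be sketched as a query: find the default btree opclass for a type, then pull strategy 1 (less) and strategy 3 (equal) from its operator family:]

    SELECT opc.opcname, amop.amopstrategy, amop.amopopr::regoperator
    FROM pg_opclass opc
    JOIN pg_am am ON am.oid = opc.opcmethod AND am.amname = 'btree'
    JOIN pg_amop amop ON amop.amopfamily = opc.opcfamily
                     AND amop.amoplefttype = opc.opcintype
                     AND amop.amoprighttype = opc.opcintype
    WHERE opc.opcdefault
      AND opc.opcintype = 'integer'::regtype   -- the datatype being looked up
      AND amop.amopstrategy IN (1, 3);         -- 1 = less-than, 3 = equal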
Re: [HACKERS] best way to retrieve the next record in a multi-column index
On Fri, Aug 15, 2003 at 13:42:23 -0400, Merlin Moncure <[EMAIL PROTECTED]> wrote:
> Example: I have a table t with columns a, b, c. I have values a1, b1, c1
> for those columns and would like to know the next value in the table
> when ordered by a, b. That is, I have values a1, b1, and oid1 and would
> like to find the very next record in the table (essentially looking for
> the next record in the index). I have two solutions: one with 'or' logic
> and one with 'and' logic. Note: if the index we are scanning has the
> unique constraint, the oid part of the logic (and the index) can be left
> out.

How about something like the following:

    select * from t where a >= a1 and b >= b1 order by a, b limit 1 offset 1;
Re: [HACKERS] UnixWare on Current CVS: Success!
Yep, just tested it with a full checkout... GREAT JOB! Thanks to you all!

On Thu, 14 Aug 2003, Larry Rosenman wrote:
> Date: Thu, 14 Aug 2003 22:23:17 -0500
> From: Larry Rosenman <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: UnixWare on Current CVS: Success!
>
> Thanks to Tom and Bruce, we can now compile CVS 7.4 on UnixWare withOUT
> any tweaks, including the --enable-thread-safety switch. Here is what I
> used, and it is now running on lerami:
>
>     CC=cc CXX=CC ./configure --prefix=/usr/local/pgsql --enable-syslog \
>         --with-CXX --enable-multibyte --enable-cassert \
>         --with-includes=/usr/local/include --with-libs=/usr/local/lib \
>         --with-tcl --with-tclconfig=/usr/local/lib \
>         --with-tkconfig=/usr/local/lib --enable-locale \
>         --enable-thread-safety
>
> Thanks, Guys!
> LER

--
Olivier PRENANT                 Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou           +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
FRANCE                          Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)
Re: [HACKERS] best way to retrieve the next record in a multi-column index
Bruno Wolff III wrote:
> How about something like the following:
>
>     select * from t where a >= a1 and b >= b1 order by a, b limit 1
>     offset 1;

Well, this may have recently changed, but the OFFSET clause is not suitable for arbitrary jumps over large tables. Essentially, pg does an index lookup to the first element and then scans sequentially until the offset criterion is met. Even if that were not the case, there is another problem: suppose that while you are iterating over your table, another backend deletes a row after your initial start position; this will cause a record to get skipped! (Unless you're inside a transaction, of course, but that can't be assumed.)

I also spent a lot of time thinking about using some type of concatenation and functional indexes to get around the multi-column issue (then things would be really simple!). This turned out to be very complicated and I ended up giving up on it: I was stymied in the creation of a 'universal concatenation' function, plus losing the elegant syntax to do partials was a loss.

Merlin
Re: [HACKERS] [GENERAL] 7.4Beta
Is there a TODO here?

---------------------------------------------------------------------------

Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > I'm thinking instead of a way to cache entire executors for this. Each
> > SPI plan used during a transaction would need its own executor, and I
> > don't know offhand what type and how much resources an executor
> > requires (I think it's only some memory that gets initialized and the
> > VFDs opened).
>
> Hmm. This is probably more feasible now than it would have been a year
> ago, because I did some cleanup work to ensure that executor state is
> localized into a specific memory context. I'm not certain about the
> amount of overhead either, but it's surely worth a try.

--
Bruce Momjian                   | http://candle.pha.pa.us
[EMAIL PROTECTED]               | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073
Re: [HACKERS] New function: epoch_to_timestamp...
On Fri, Aug 15, 2003 at 06:57:24AM -0400, Tom Lane wrote:
> David Fetter <[EMAIL PROTECTED]> writes:
> > Is there a reasonable way to add an optional param or two to mark time
> > zones, or should that just be the application programmer's hassle?
>
> The return type should be timestamptz, which makes the transformation
> timezone-independent.

Do you mean the default one should read as follows, or that the default one should take params (integer, output_timezone, input_timezone), or...?

    CREATE OR REPLACE FUNCTION epoch_to_timestampz (INTEGER)
    RETURNS TIMESTAMPTZ AS '
    DECLARE
        the_sql    TEXT;
        the_record RECORD;
        the_time   TIMESTAMPTZ;
    BEGIN
        the_sql := ''SELECT ''''January 1 1970''''::timestamp + ''''''
                   || $1 || '' seconds''''::interval AS foo'';
        RAISE NOTICE ''%'', the_sql;
        FOR the_record IN EXECUTE the_sql LOOP
            the_time := the_record.foo;
        END LOOP;
        RETURN the_time;
    END;
    ' LANGUAGE 'plpgsql';

Cheers,
D
--
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100    cell: +1 415 235 3778
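[Archive note: the dynamic SQL isn't actually needed, since the expression can be computed directly. A minimal sketch of an equivalent function, with a hypothetical name; later releases also provide the built-in to_timestamp(double precision) for exactly this job:]

    CREATE OR REPLACE FUNCTION epoch_to_timestamptz (INTEGER)
    RETURNS TIMESTAMPTZ AS '
        -- ''epoch'' is the special date/time input for 1970-01-01 00:00:00+00
        SELECT ''epoch''::timestamptz + $1 * ''1 second''::interval;
    ' LANGUAGE 'sql';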
Re: [HACKERS] [GENERAL] 7.4Beta
Bruce Momjian wrote:
> Is there a TODO here?

Maybe!? It's one of those premature things no one can tell about yet, so for now the TODO would be to investigate.

Jan

> ---------------------------------------------------------------------------
>
> Tom Lane wrote:
> > Jan Wieck <[EMAIL PROTECTED]> writes:
> > > I'm thinking instead of a way to cache entire executors for this.
> > > Each SPI plan used during a transaction would need its own executor,
> > > and I don't know offhand what type and how much resources an executor
> > > requires (I think it's only some memory that gets initialized and the
> > > VFDs opened).
> >
> > Hmm. This is probably more feasible now than it would have been a year
> > ago, because I did some cleanup work to ensure that executor state is
> > localized into a specific memory context. I'm not certain about the
> > amount of overhead either, but it's surely worth a try.

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#== [EMAIL PROTECTED] #