[HACKERS] [RFC] Comments on PostPic project - Repost
I had no answer from pgsql-general, so I'm reposting here. Hello, I would like to receive comments/suggestions about this project: http://github.com/drotiro/postpic. In short, it's an extension that enables image processing within the database, adding a new type (image) and several functions. The SQL and Java interfaces are documented on the project's wiki, so I'm not talking about these here, but instead present some detail on the datatype's implementation. The image is represented by a struct containing some attributes (dimensions, some exif tag: shoot date, exposure time...) and a large object holding the actual image data. The idea is to have attributes stored directly to allow for efficient searching, while the large object seemed a reasonable choice to store the possibly large image data (what are the LOBs for?). With the current large objects implementation, when a new lo is created it lives in the pg_largeobjects table, until someone calls lo_unlink on it. In my case: I create the lo on behalf of the user, then store its oid in the image's internal representation. At this point, the image can be inserted in a table, processed and so on, but when it gets deleted the corresponding lo remains dangling, unless someone or something (eg. a trigger) takes care on destroying it. Is there a way of placing some kind of hook on an object's deletion? A clean way to do a reference counting on large objects? To avoid polluting pg_largeobjects, almost all of the image processing functions in PostPic return a 'temporary_image' object, which is just an alias on bytea. (Btw: I defined it using a DOMAIN. A better way?). Temporary images can be converted back to images when needed via a cast (often there is a variant of the function doing this automatically). Thanks in advance for your suggestions and contribution, Domenico.
Re: [HACKERS] Getting to beta1
Josh Berkus j...@agliodbs.com writes: Yes, and on pgsql-docs rather than on this mailing list. Or ... J.F.D.I (Just F Do It). That is, if someone contributed a whole buncha new text to the tutorial on pgsql-docs, I can't imagine it being rejected out of hand. That was the bulk of the question, thanks for such a clear answer. I'll see about giving some time to that, out of the critical path to beta. For my part, I plan to just write the tutorial in whatever tool makes it easiest to write (likely Lyx, but maybe OOo). Then people can discuss what portions belong in the docs, or not. Should I follow you on that choice or just send a doc patch with poor SGML markup? -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication, and walsender during recovery
Fujii Masao wrote: On Mon, Jan 18, 2010 at 2:19 PM, Fujii Masao masao.fu...@gmail.com wrote: When I configured a cascaded standby (i.e, made the additional standby server connect to the standby), I got the following errors, and a cascaded standby didn't start replication. ERROR: timeline 0 of the primary does not match recovery target timeline 1 I didn't care about that case so far. To avoid a confusing error message, we should forbid a startup of walsender during recovery, and emit a suitable message? Or support such cascade-configuration? Though I don't think that the latter is difficult to be implemented, ISTM it's not the time to do that now. We got the consensus that the cascading standby feature should be postponed to v9.1 or later. But when we wrongly make the standby connect to another standby, the following confusing message is still output. FATAL: timeline 0 of the primary does not match recovery target timeline 1 How about emitting the following message instead? Here is the patch. FATAL: recovery is in progress HINT: cannot accept the standby server during recovery. Commmitted. I edited the message and error code a bit: ereport(FATAL, (errcode(ERRCODE_CANNOT_CONNECT_NOW), errmsg(recovery is still in progress, can't accept WAL streaming connections))); ERRCODE_CANNOT_CONNECT_NOW is what we use when the system is shutting down etc, so that that seems appropriate. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bug in 9.0Alpha4
Hi, I noticed a problem with the source code of 9.0Alpha 4. In parse_agg.c, there is a call made to transformSortClause. 00098 torder = transformSortClause http://doxygen.postgresql.org/parse__clause_8c.html#53199c36a198b5acf15a26fbd7311f79(pstate, 00099 agg-aggorder http://doxygen.postgresql.org/structAggref.html#f477b6dc44bd60585cabf8608dcf2047, 00100 tlist, 00101 true /* fix unknowns */ , 00102 true /* force SQL99 rules */ ); 00103 Here agg-aggorder should be a List of SortGroupClause pointers, whereas transformSortClause expects the second argument as a list of SortBy pointers. I verified the doxygen code by downloading the 9.0alpha4 version. I am trying to understand this piece of code, while i thought i should report this bug. Thanks, Gokul.
Re: [HACKERS] [RFC] Comments on PostPic project - Repost
Domenico Rotiroti wrote: I would like to receive comments/suggestions about this project: http://github.com/drotiro/postpic. In short, it's an extension that enables image processing within the database, adding a new type (image) and several functions. The image is represented by a struct containing some attributes (dimensions, some exif tag: shoot date, exposure time...) and a large object holding the actual image data. The idea is to have attributes stored directly to allow for efficient searching, while the large object seemed a reasonable choice to store the possibly large image data (what are the LOBs for?). With the current large objects implementation, when a new lo is created it lives in the pg_largeobjects table, until someone calls lo_unlink on it. In my case: I create the lo on behalf of the user, then store its oid in the image's internal representation. At this point, the image can be inserted in a table, processed and so on, but when it gets deleted the corresponding lo remains dangling, unless someone or something (eg. a trigger) takes care on destroying it. Is there a way of placing some kind of hook on an object's deletion? A clean way to do a reference counting on large objects? If you want a system with reference counts, you'd probably have to write it yourself using triggers. There's the vacuumlo contrib module that removes orphaned large objects. To avoid polluting pg_largeobjects, almost all of the image processing functions in PostPic return a 'temporary_image' object, which is just an alias on bytea. (Btw: I defined it using a DOMAIN. A better way?). Temporary images can be converted back to images when needed via a cast (often there is a variant of the function doing this automatically). Why don't you use bytea instead of large objects in the database? That way you won't have to worry about orphaned large objects, and you don't have to convert to bytea upon retrieval. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in 9.0Alpha4
Hi, I think, this should be the probable fix. There is agg_order in ParseFuncOrColumn, which should get passed on to transformAggregateCall and that should be placed in this call, instead of agg-aggorder. Thanks, Gokul. On Tue, Mar 16, 2010 at 5:19 PM, Gokulakannan Somasundaram gokul...@gmail.com wrote: Hi, I noticed a problem with the source code of 9.0Alpha 4. In parse_agg.c, there is a call made to transformSortClause. 00098 torder = transformSortClause http://doxygen.postgresql.org/parse__clause_8c.html#53199c36a198b5acf15a26fbd7311f79(pstate, 00099 agg-aggorder http://doxygen.postgresql.org/structAggref.html#f477b6dc44bd60585cabf8608dcf2047, 00100 tlist, 00101 true /* fix unknowns */ , 00102 true /* force SQL99 rules */ ); 00103 Here agg-aggorder should be a List of SortGroupClause pointers, whereas transformSortClause expects the second argument as a list of SortBy pointers. I verified the doxygen code by downloading the 9.0alpha4 version. I am trying to understand this piece of code, while i thought i should report this bug. Thanks, Gokul.
Re: [HACKERS] [RFC] Comments on PostPic project - Repost
On Tue, Mar 16, 2010 at 1:04 PM, Albe Laurenz laurenz.a...@wien.gv.atwrote: Domenico Rotiroti wrote: I would like to receive comments/suggestions about this project: http://github.com/drotiro/postpic. In short, it's an extension that enables image processing within the database, adding a new type (image) and several functions. The image is represented by a struct containing some attributes (dimensions, some exif tag: shoot date, exposure time...) and a large object holding the actual image data. The idea is to have attributes stored directly to allow for efficient searching, while the large object seemed a reasonable choice to store the possibly large image data (what are the LOBs for?). With the current large objects implementation, when a new lo is created it lives in the pg_largeobjects table, until someone calls lo_unlink on it. In my case: I create the lo on behalf of the user, then store its oid in the image's internal representation. At this point, the image can be inserted in a table, processed and so on, but when it gets deleted the corresponding lo remains dangling, unless someone or something (eg. a trigger) takes care on destroying it. Is there a way of placing some kind of hook on an object's deletion? A clean way to do a reference counting on large objects? If you want a system with reference counts, you'd probably have to write it yourself using triggers. There's the vacuumlo contrib module that removes orphaned large objects. To avoid polluting pg_largeobjects, almost all of the image processing functions in PostPic return a 'temporary_image' object, which is just an alias on bytea. (Btw: I defined it using a DOMAIN. A better way?). Temporary images can be converted back to images when needed via a cast (often there is a variant of the function doing this automatically). Why don't you use bytea instead of large objects in the database? That way you won't have to worry about orphaned large objects, and you don't have to convert to bytea upon retrieval. Yours, Laurenz Albe
Re: [HACKERS] [RFC] Comments on PostPic project - Repost
Albe Laurenz laurenz.a...@wien.gv.at wrote: There's the vacuumlo contrib module that removes orphaned large objects. Thanks for pointing out. Vacuumlo looks at oid and lo columns to find orphaned lobs, but the idea could be easily adapted/extended to looks for image columns. Why don't you use bytea instead of large objects in the database? That way you won't have to worry about orphaned large objects, and you don't have to convert to bytea upon retrieval. When I started coding PostPic I knew little about PostgreSQL's large objects implementation, and was used to more 'traditional' BLOB types found in other databases. Regards, Domenico
Re: [HACKERS] Bug in 9.0Alpha4
Gokulakannan Somasundaram escribió: Hi, I noticed a problem with the source code of 9.0Alpha 4. In parse_agg.c, there is a call made to transformSortClause. 00098 torder = transformSortClause http://doxygen.postgresql.org/parse__clause_8c.html#53199c36a198b5acf15a26fbd7311f79(pstate, 00099 agg-aggorder http://doxygen.postgresql.org/structAggref.html#f477b6dc44bd60585cabf8608dcf2047, 00100 tlist, 00101 true /* fix unknowns */ , 00102 true /* force SQL99 rules */ ); 00103 Here agg-aggorder should be a List of SortGroupClause pointers, whereas transformSortClause expects the second argument as a list of SortBy pointers. I verified the doxygen code by downloading the 9.0alpha4 version. I am trying to understand this piece of code, while i thought i should report this bug. Wow, it seems you're correct. This is quite obscure -- the result of the compiler not being able to check the type of pointers we store in Lists :-( -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] An idle thought
A few days ago there was a thread on one of our lists where someone was suprised it took as much i/o to delete data as it took to insert it in the first place. At first this does seem surprising but the fact that Postgres stores its transaction information inline with the data and does all i/o in blocks makes this inevitable. http://archives.postgresql.org/pgsql-performance/2010-03/msg00141.php However then I started thinking about this case and wondered if it wouldn't be possible to optimize. One of the suggested optimizations was to look at using TRUNCATE. But I wonder why it's necessary to use a dedicated command. Shouldn't it be possible for the system to notice this situation and do effectively the same thing itself? I'm picturing storing a bit in the visibility map indicating that *no* records are visible in a given page. If that bit is set then the page is virtually empty even if it contains data. If anyone tries to load the page they should just initialize a new page instead which will overwrite it. That way a big batch delete just has to set a bunch of flags in the visibility map to do the bulk delete. There are a couple problems with the way I've described this idea here. Firstly, if the deleting transaction hasn't committed yet or later aborts then of course the records need to still be visible. So either the visibility map would need an xmax for the page as a whole or it would need to only be set after the page has actually been vacuumed. Secondly there's the whole retail vacuum problem -- any index entries referring to this page would be left dangling unless there's some kind of retail vacuum or perhaps a page version number. I'm not 100% sure this idea is workable but if it is it might make batch deletes a lot less painful. Several orders of magnitude less i/o whenever a single transaction deletes many rows. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An idle thought
Greg Stark st...@mit.edu writes: However then I started thinking about this case and wondered if it wouldn't be possible to optimize. One of the suggested optimizations was to look at using TRUNCATE. But I wonder why it's necessary to use a dedicated command. Shouldn't it be possible for the system to notice this situation and do effectively the same thing itself? Not unless you'd like DELETE to always acquire exclusive lock... There are a couple problems with the way I've described this idea here. Precisely because of the lack of lock. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Comments on PostPic project - Repost
Albe Laurenz laurenz.a...@wien.gv.at wrote: There's the vacuumlo contrib module that removes orphaned large objects. Thanks for pointing out. Vacuumlo looks at oid and lo columns to find orphaned lobs, but the idea could be easily adapted/extended to looks for image columns. Why don't you use bytea instead of large objects in the database? That way you won't have to worry about orphaned large objects, and you don't have to convert to bytea upon retrieval. When I started coding PostPic I knew little about PostgreSQL's large objects implementation, and was used to more 'traditional' BLOB types found in other databases. Regards, Domenico
Re: [HACKERS] Bug in 9.0Alpha4
Gokulakannan Somasundaram gokul...@gmail.com writes: I noticed a problem with the source code of 9.0Alpha 4. In parse_agg.c, there is a call made to transformSortClause. ... Here agg-aggorder should be a List of SortGroupClause pointers, whereas transformSortClause expects the second argument as a list of SortBy pointers. Uh, no, read the comment at the head of transformAggregateCall: * parse_func.c has recognized the function as an aggregate, and has set * up all the fields of the Aggref except aggdistinct and agglevelsup. * However, the args list is just bare expressions, and the aggorder list * hasn't been transformed at all. * * Here we convert the args list into a targetlist by inserting TargetEntry * nodes, and then transform the aggorder and agg_distinct specifications to * produce lists of SortGroupClause nodes. (That might also result in adding * resjunk expressions to the targetlist.) transformSortClause is passed the untransformed aggorder list, which is in fact a list of SortBy nodes, and it returns the transformed list (SortGroupClause nodes), which is stored back into the aggorder field a bit further down. There are a number of regression tests that would fail in obvious ways if this code didn't work. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in 9.0Alpha4
transformSortClause is passed the untransformed aggorder list, which is in fact a list of SortBy nodes, and it returns the transformed list (SortGroupClause nodes), which is stored back into the aggorder field a bit further down. There are a number of regression tests that would fail in obvious ways if this code didn't work. Right Tom. I got confused, because the comment at Aggref struct definition told that it is a list of SortGroupClause. May be you can update your comments there. Thanks, Gokul.
[HACKERS] parametrized NOTIFY - issue in plpgsql, maybe ToDo
Hello I am testing NOTIFY statement. It have to be used from plpgsql via EXECUTE, because doesn't support paramaters. Can be it documented somewhere? create or replace function foo(a varchar) returns void as $$ begin execute 'notify xxx, ' || quote_literal(a); return; end; $$ language plpgsql; CREATE FUNCTION Time: 2,513 ms pa...@postgres:5432=# select foo('pavel'); foo - (1 row) Time: 31,732 ms pa...@postgres:5432=# listen xxx; LISTEN Time: 0,271 ms pa...@postgres:5432=# select foo('pavel'); foo - (1 row) Time: 436,058 ms Asynchronous notification xxx with payload pavel received from server process with PID 4730. pa...@postgres:5432=# -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parametrized NOTIFY - issue in plpgsql, maybe ToDo
Pavel Stehule pavel.steh...@gmail.com writes: I am testing NOTIFY statement. It have to be used from plpgsql via EXECUTE, because doesn't support paramaters. Can be it documented somewhere? It is. Use the function instead. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parametrized NOTIFY - issue in plpgsql, maybe ToDo
2010/3/16 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: I am testing NOTIFY statement. It have to be used from plpgsql via EXECUTE, because doesn't support paramaters. Can be it documented somewhere? It is. Use the function instead. ok thank you for info Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in 9.0Alpha4
Gokulakannan Somasundaram gokul...@gmail.com writes: transformSortClause is passed the untransformed aggorder list, which is in fact a list of SortBy nodes, and it returns the transformed list (SortGroupClause nodes), which is stored back into the aggorder field a bit further down. Right Tom. I got confused, because the comment at Aggref struct definition told that it is a list of SortGroupClause. May be you can update your comments there. I think that comment is fine. The reason this is confusing is that ParseFuncOrColumn uses the Aggref node to carry a couple of things that logically are input parameters to transformAggregateCall(). Although this affects nothing else and is commented at both ends, apparently it's confusing anyway. When we were doing the ordered-aggregates patch, I considered passing all those values as explicit parameters to transformAggregateCall, and having it build the Aggref node from scratch and return it. However having seven or eight parameters to transformAggregateCall (and more in future if we ever add more features here) didn't really seem to be better style than abusing Aggref a bit. But maybe it is the best way after all. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dyamic updates of NEW with pl/pgsql
On 13.03.10 18:38 , Tom Lane wrote: I wrote: ... Maybe it would work to devise a notation that allows fetching or storing a field that has a runtime-determined name, but prespecifies the field type. Actually only the fetch end of it is an issue, since when storing the field datatype can be inferred from the expression you're trying to assign to the field. [ after more thought ] I wonder if it could work to treat the result of a record-fieldname operator as being of UNKNOWN type initially, and resolve its actual type in the parser in the same way we do for undecorated literals and parameters, to wit * you can explicitly cast it, viz (record-fieldname)::bigint * you can let it be inferred from context, such as the type of whatever it's compared to * throw error if type is not inferrable Then at runtime, if the actual type of the field turns out to not be what the parser inferred, either throw error or attempt a run-time type coercion. Throwing error seems safer, because it would avoid surprises of both semantic (unexpected behavior) and performance (expensive conversion you weren't expecting to happen) varieties. But possibly an automatic coercion would be useful enough to justify those risks. This is more or less what I've done in my pg_record_inspect module, only without parser or executor changes (it works with 8.4). The code can be found on http://github.com/fgp/pg_record_inspect. The module contains the function fieldvalue(RECORD, field NAME, defval ANYELEMENT, coerce BOOLEAN) RETURNS ANYELEMENT which returns the field named field from the record. The expected field type is specified by providing a default value in defval of the expected type. Since that argument's type is ANYELEMENT, just like the return type, the type system copes perfectly with the varying return type. You can choose whether to auto-coerce the field's value if it has a type other than defval's type or whether to raise an error. So in essence I'm using the ANYELEMENT trick to get a poor man's version of your idea that doesn't require core changes. My post about this module got zero responses though... best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication, and walsender during recovery
On Tue, Mar 16, 2010 at 6:11 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Commmitted. I edited the message and error code a bit: ereport(FATAL, (errcode(ERRCODE_CANNOT_CONNECT_NOW), errmsg(recovery is still in progress, can't accept WAL streaming connections))); ERRCODE_CANNOT_CONNECT_NOW is what we use when the system is shutting down etc, so that that seems appropriate. Thanks! I agree that ERRCODE_CANNOT_CONNECT_NOW is more suitable. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in 9.0Alpha4
2010/3/17 Tom Lane t...@sss.pgh.pa.us: Gokulakannan Somasundaram gokul...@gmail.com writes: transformSortClause is passed the untransformed aggorder list, which is in fact a list of SortBy nodes, and it returns the transformed list (SortGroupClause nodes), which is stored back into the aggorder field a bit further down. Right Tom. I got confused, because the comment at Aggref struct definition told that it is a list of SortGroupClause. May be you can update your comments there. I think that comment is fine. The reason this is confusing is that ParseFuncOrColumn uses the Aggref node to carry a couple of things that logically are input parameters to transformAggregateCall(). Although this affects nothing else and is commented at both ends, apparently it's confusing anyway. When we were doing the ordered-aggregates patch, I considered passing all those values as explicit parameters to transformAggregateCall, and having it build the Aggref node from scratch and return it. However having seven or eight parameters to transformAggregateCall (and more in future if we ever add more features here) didn't really seem to be better style than abusing Aggref a bit. But maybe it is the best way after all. Thoughts? Well, I think the point is args and aggorder are hidden in the Aggref passed to transformAggregateCall, although they will be transformed in the function. Isn't it enough to add more parameters for them (agg_distinct is passed separately) and to leave the Aggref pointer passing as present? Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dyamic updates of NEW with pl/pgsql
On Tue, Mar 16, 2010 at 5:53 PM, Florian Pflug fgp.phlo@gmail.com wrote: which returns the field named field from the record. The expected field type is specified by providing a default value in defval of the expected type. Since that argument's type is ANYELEMENT, just like the return type, the type system copes perfectly with the varying return type. You can choose whether to auto-coerce the field's value if it has a type other than defval's type or whether to raise an error. So in essence I'm using the ANYELEMENT trick to get a poor man's version of your idea that doesn't require core changes. My post about this module got zero responses though... Why should we use what you've already written when we can just write it ourselves? Next you are going to say you're already using it and it works really well :-). I think it's pretty cool. Is it safe to have the main functions immutable and not stable though? Is there any benefit missed by not going through pl/pgsql directly (I'm guessing maybe more elegant caching)? It's a little weird that you can return anyelement from your function in cases that don't guarantee a type from the query. Are there any downsides to doing that? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers