Re: [PATCHES] [HACKERS] Text - C string
Brendan Jurd [EMAIL PROTECTED] writes: Well ... if somebody does want to change the representation of xml down the road, he's going to have to touch all the sites where the code converts to and from cstring anyway, right? True. With that in mind, please find attached my followup patch. It cleans up another 21 sites manually copying between cstring and varlena, for a net reduction of 115 lines of code. I applied most of this, but not the parts that were dependent on the assumption that bytea and text are the same. That is unlikely to remain true if we ever get around to putting locale/encoding information into text values. Furthermore it's a horrid type pun, because bytea can contain embedded nulls whereas cstrings can't; you were making unwarranted assumptions about whether, say, cstring_to_text_with_len would allow embedded nulls to go by. And lastly, quite a few of those changes were just plain broken, eg several places in selfuncs.c where you allowed strlen() to be applied to a bytea converted to cstring. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Text - C string
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 - -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Sat, Mar 29, 2008 at 5:40 AM, Brendan Jurd wrote: On 29/03/2008, Tom Lane wrote: I intentionally didn't touch xml.c, nor anyplace that is not dealing in text, even if it happens to be binary-compatible with text. Hmm, okay. My original submission did include a few such changes; for example, in xml_in and xml_out_internal I saw that the conversion between xmltype and cstring was identical to the text conversion, so I went ahead and used the text functions. Feedback upthread suggested that it was okay to go ahead with casting in identical cases. [1] I saw that these changes made it into the commit, so I assumed that it was the right call. If we don't want to meddle with xmltype/bytea/VarChar at all, we'll have to revert those changes, and I'll have to seriously scale back the cleanup patch I was about to post. Still not sure where we stand on the above. To cast, or not to cast? Cheers, BJ - -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.7 (GNU/Linux) Comment: http://getfiregpg.org iD8DBQFIBaFy5YBsbHkuyV0RAsMmAKDHaEb7aMudKJgVxcf5RRcOtAJ+bwCgivLI 5B3xJze46NGWjEyOpq/TSaY= =BObd - -END PGP SIGNATURE- -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.7 (GNU/Linux) Comment: http://getfiregpg.org iD8DBQFIBaIl5YBsbHkuyV0RArDDAJ0QThLXAhzy+NX+2YsF+q4z/sIy1QCeJPiW s/rVns3FFQVP5g9DTOshDfQ= =4Tdh -END PGP SIGNATURE- -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Text - C string
Brendan Jurd [EMAIL PROTECTED] writes: If we don't want to meddle with xmltype/bytea/VarChar at all, we'll have to revert those changes, and I'll have to seriously scale back the cleanup patch I was about to post. Still not sure where we stand on the above. To cast, or not to cast? I dunno. I know there was previously some handwaving about representing XML values in some more intelligent fashion than a plain text string, but I have no idea if anyone is likely to do anything about it in the foreseeable future. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Text - C string
Tom Lane wrote: Brendan Jurd [EMAIL PROTECTED] writes: If we don't want to meddle with xmltype/bytea/VarChar at all, we'll have to revert those changes, and I'll have to seriously scale back the cleanup patch I was about to post. Still not sure where we stand on the above. To cast, or not to cast? I dunno. I know there was previously some handwaving about representing XML values in some more intelligent fashion than a plain text string, but I have no idea if anyone is likely to do anything about it in the foreseeable future. It seems very unlikely to me. cheers andrew -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Text - C string
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Thu, Apr 17, 2008 at 1:16 AM, Tom Lane wrote: Brendan Jurd writes: If we don't want to meddle with xmltype/bytea/VarChar at all, we'll have to revert those changes, and I'll have to seriously scale back the cleanup patch I was about to post. Still not sure where we stand on the above. To cast, or not to cast? I dunno. I know there was previously some handwaving about representing XML values in some more intelligent fashion than a plain text string, but I have no idea if anyone is likely to do anything about it in the foreseeable future. Well ... if somebody does want to change the representation of xml down the road, he's going to have to touch all the sites where the code converts to and from cstring anyway, right? So the only real difference this will make is that, instead of having to replace four lines of VARDATA/memcpy per site, he'll have to replace a single function call. That doesn't seem like a negative. With that in mind, please find attached my followup patch. It cleans up another 21 sites manually copying between cstring and varlena, for a net reduction of 115 lines of code. I didn't attempt to work through every reference to VARDATA, but I did look at every hit from a `grep -Rn VARDATA . | grep memcpy`. All regression tests passed (contrib tests included) on gentoo. Patch added to commitfest queue. Cheers, BJ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: http://getfiregpg.org iD8DBQFIBm105YBsbHkuyV0RAmqfAKCfyNyFdciqX4QV81sG9MhPt+KXuACfe694 3d/ICZF6yqV6K20X3TVX+So= =CvRM -END PGP SIGNATURE- text-cstring-followup.diff.bz2 Description: BZip2 compressed data -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Text - C string
On 26/03/2008, Tom Lane [EMAIL PROTECTED] wrote: There are no textout/textin calls left, but I may have missed some places that were doing it the hard way with direct palloc/memcpy manipulations. It might be worth trolling all the VARDATA() references to see if any more are easily replaceable. I think there are perhaps a dozen such sites, and I hope to submit a cleanup patch that gets rid of these soon. I did come across one site I'm not sure about in utils/adt/xml.c:291 /* * We need a null-terminated string to pass to parse_xml_decl(). Rather * than make a separate copy, make the temporary result one byte bigger * than it needs to be. */ result = palloc(nbytes + 1 + VARHDRSZ); SET_VARSIZE(result, nbytes + VARHDRSZ); memcpy(VARDATA(result), str, nbytes); str = VARDATA(result); str[nbytes] = '\0'; The author seemed pretty sure he didn't want to make a separate copy of the string, so I'm thinking there's not a whole lot I can do to improve this site. I notice in particular that xfunc.sgml contains sample C functions to copy and concatenate text. While these aren't directly replaceable with the new functions, I wonder whether we ought to change the examples to make them less certain to break if we ever change text's representation. Yes, these sample C functions are shown in tutorial/funcs.c and funcs_new.c as well. I agree that the examples could do with changing, but don't have any keen insight on what to change them to. -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Text - C string
Brendan Jurd [EMAIL PROTECTED] writes: On 26/03/2008, Tom Lane [EMAIL PROTECTED] wrote: There are no textout/textin calls left, but I may have missed some places that were doing it the hard way with direct palloc/memcpy manipulations. It might be worth trolling all the VARDATA() references to see if any more are easily replaceable. I think there are perhaps a dozen such sites, and I hope to submit a cleanup patch that gets rid of these soon. I did come across one site I'm not sure about in utils/adt/xml.c:291 I intentionally didn't touch xml.c, nor anyplace that is not dealing in text, even if it happens to be binary-compatible with text. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Text - C string
On 29/03/2008, Tom Lane [EMAIL PROTECTED] wrote: I intentionally didn't touch xml.c, nor anyplace that is not dealing in text, even if it happens to be binary-compatible with text. Hmm, okay. My original submission did include a few such changes; for example, in xml_in and xml_out_internal I saw that the conversion between xmltype and cstring was identical to the text conversion, so I went ahead and used the text functions. Feedback upthread suggested that it was okay to go ahead with casting in identical cases. [1] I saw that these changes made it into the commit, so I assumed that it was the right call. If we don't want to meddle with xmltype/bytea/VarChar at all, we'll have to revert those changes, and I'll have to seriously scale back the cleanup patch I was about to post. Regards, BJ [1] http://archives.postgresql.org/pgsql-hackers/2007-09/msg01094.php -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Text - C string
Brendan Jurd [EMAIL PROTECTED] writes: As discussed on -hackers, I'm trying to get rid of some redundant code by creating a widely useful set of functions to convert between text and C string in the backend. Applied with revisions --- the functions were modified as per recent discussion, and I fixed a lot more potential call sites. There are no textout/textin calls left, but I may have missed some places that were doing it the hard way with direct palloc/memcpy manipulations. It might be worth trolling all the VARDATA() references to see if any more are easily replaceable. I notice in particular that xfunc.sgml contains sample C functions to copy and concatenate text. While these aren't directly replaceable with the new functions, I wonder whether we ought to change the examples to make them less certain to break if we ever change text's representation. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Text - C string
Brendan Jurd [EMAIL PROTECTED] writes: One of the questions in the original patch submission was whether it would be worth changing all those DirectFunctionCall(textin) and (textout) calls to use the new functions. Is it worthwhile avoiding the fmgr overhead? I think that's worth doing just on notational clarity grounds. The small cycle savings doesn't excite me, but understanding DirectFunctionCall1(textin, CStringGetDatum(foo)) just involves more different bits of trivia than cstring_to_text(foo). Last time I looked, the codebase had shifted quite a bit since I originally wrote the patch. So it probably needs some work to apply cleanly on the latest sources anyway. Yeah, with wide-impact patches like this you are always going to have that problem. One point though is that we don't have to improve every call site at the same time. I'd be inclined to put in the new functions and hit some representative sample of utils/adt/ files in the first commit, and then incrementally fix other stuff. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Text - C string
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --- Brendan Jurd wrote: As discussed on -hackers, I'm trying to get rid of some redundant code by creating a widely useful set of functions to convert between text and C string in the backend. The new extern functions, declared in include/utils/builtins.h and defined in backend/utils/adt/varlena.c, are: char * text_cstring(const text *t) char * text_cstring_limit(const text *t, int len) text * cstring_text(const char *s) text * cstring_text_limit(const char *s, int len) Within varlena.c, the actual conversions are performed by: char * do_text_cstring(const text *t, const int len) text * do_cstring_text(const char *s, const int len) These functions now do the work for the fmgr functions textin and textout, as well as being directly accessible by backend code. I've searched through the backend for any code which converted between text and C string manually (with memcpy and VARDATA), replacing with calls to one of the four new functions as appropriate. I came across some areas which were using the same, or similar, conversion technique on other varlena data types, such as bytea or xmltype. In cases where the conversion was completely identical I used the new functions. In cases with any differences (even if they seemed minor) I played it safe and left them alone. I'd now like to submit my work so far for review. This patch compiled cleanly on Linux and passed all parallel regression tests. It appears to be performance-neutral based on a few rough tests; I haven't tried to profile the changes in detail. There is still a lot of code out there using DirectFunctionCall1 to call text(in|out)). I've decided to wait for some community feedback on the patch as it stands before replacing those calls. There are a great many, and it would be a shame to have to go through them more than once. I would naively expect that replacing fmgr calls with direct calls would lead to a performance gain (no fmgr overhead), but honestly I'm not sure whether that would actually make a difference. Thanks for your time, BJ [ Attachment, skipping... ] ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] [HACKERS] Text - C string
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --- Brendan Jurd wrote: As discussed on -hackers, I'm trying to get rid of some redundant code by creating a widely useful set of functions to convert between text and C string in the backend. The new extern functions, declared in include/utils/builtins.h and defined in backend/utils/adt/varlena.c, are: char * text_cstring(const text *t) char * text_cstring_limit(const text *t, int len) text * cstring_text(const char *s) text * cstring_text_limit(const char *s, int len) Within varlena.c, the actual conversions are performed by: char * do_text_cstring(const text *t, const int len) text * do_cstring_text(const char *s, const int len) These functions now do the work for the fmgr functions textin and textout, as well as being directly accessible by backend code. I've searched through the backend for any code which converted between text and C string manually (with memcpy and VARDATA), replacing with calls to one of the four new functions as appropriate. I came across some areas which were using the same, or similar, conversion technique on other varlena data types, such as bytea or xmltype. In cases where the conversion was completely identical I used the new functions. In cases with any differences (even if they seemed minor) I played it safe and left them alone. I'd now like to submit my work so far for review. This patch compiled cleanly on Linux and passed all parallel regression tests. It appears to be performance-neutral based on a few rough tests; I haven't tried to profile the changes in detail. There is still a lot of code out there using DirectFunctionCall1 to call text(in|out)). I've decided to wait for some community feedback on the patch as it stands before replacing those calls. There are a great many, and it would be a shame to have to go through them more than once. I would naively expect that replacing fmgr calls with direct calls would lead to a performance gain (no fmgr overhead), but honestly I'm not sure whether that would actually make a difference. Thanks for your time, BJ [ Attachment, skipping... ] ---(end of broadcast)--- TIP 6: explain analyze is your friend -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] [HACKERS] Text - C string
As discussed on -hackers, I'm trying to get rid of some redundant code by creating a widely useful set of functions to convert between text and C string in the backend. The new extern functions, declared in include/utils/builtins.h and defined in backend/utils/adt/varlena.c, are: char * text_cstring(const text *t) char * text_cstring_limit(const text *t, int len) text * cstring_text(const char *s) text * cstring_text_limit(const char *s, int len) Within varlena.c, the actual conversions are performed by: char * do_text_cstring(const text *t, const int len) text * do_cstring_text(const char *s, const int len) These functions now do the work for the fmgr functions textin and textout, as well as being directly accessible by backend code. I've searched through the backend for any code which converted between text and C string manually (with memcpy and VARDATA), replacing with calls to one of the four new functions as appropriate. I came across some areas which were using the same, or similar, conversion technique on other varlena data types, such as bytea or xmltype. In cases where the conversion was completely identical I used the new functions. In cases with any differences (even if they seemed minor) I played it safe and left them alone. I'd now like to submit my work so far for review. This patch compiled cleanly on Linux and passed all parallel regression tests. It appears to be performance-neutral based on a few rough tests; I haven't tried to profile the changes in detail. There is still a lot of code out there using DirectFunctionCall1 to call text(in|out)). I've decided to wait for some community feedback on the patch as it stands before replacing those calls. There are a great many, and it would be a shame to have to go through them more than once. I would naively expect that replacing fmgr calls with direct calls would lead to a performance gain (no fmgr overhead), but honestly I'm not sure whether that would actually make a difference. Thanks for your time, BJ text-cstring_1.diff.gz Description: GNU Zip compressed data ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate