Re: [PATCHES] [HACKERS] Text - C string

2008-05-04 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
 Well ... if somebody does want to change the representation of xml
 down the road, he's going to have to touch all the sites where the
 code converts to and from cstring anyway, right?

True.

 With that in mind, please find attached my followup patch.  It cleans
 up another 21 sites manually copying between cstring and varlena, for
 a net reduction of 115 lines of code.

I applied most of this, but not the parts that were dependent on the
assumption that bytea and text are the same.  That is unlikely to remain
true if we ever get around to putting locale/encoding information into
text values.  Furthermore it's a horrid type pun, because bytea can
contain embedded nulls whereas cstrings can't; you were making
unwarranted assumptions about whether, say, cstring_to_text_with_len
would allow embedded nulls to go by.  And lastly, quite a few of those
changes were just plain broken, eg several places in selfuncs.c where
you allowed strlen() to be applied to a bytea converted to cstring.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Text - C string

2008-04-16 Thread Brendan Jurd
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

- -BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Sat, Mar 29, 2008 at 5:40 AM, Brendan Jurd  wrote:
 On 29/03/2008, Tom Lane  wrote:
   I intentionally didn't touch xml.c, nor anyplace that is not dealing
in text, even if it happens to be binary-compatible with text.
  

  Hmm, okay.  My original submission did include a few such changes; for
  example, in xml_in and xml_out_internal I saw that the conversion
  between xmltype and cstring was identical to the text conversion, so I
  went ahead and used the text functions.  Feedback upthread suggested
  that it was okay to go ahead with casting in identical cases. [1]

  I saw that these changes made it into the commit, so I assumed that it
  was the right call.

  If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
  have to revert those changes, and I'll have to seriously scale back
  the cleanup patch I was about to post.

Still not sure where we stand on the above.  To cast, or not to cast?

Cheers,
BJ
- -BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.7 (GNU/Linux)
Comment: http://getfiregpg.org

iD8DBQFIBaFy5YBsbHkuyV0RAsMmAKDHaEb7aMudKJgVxcf5RRcOtAJ+bwCgivLI
5B3xJze46NGWjEyOpq/TSaY=
=BObd
- -END PGP SIGNATURE-

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.7 (GNU/Linux)
Comment: http://getfiregpg.org

iD8DBQFIBaIl5YBsbHkuyV0RArDDAJ0QThLXAhzy+NX+2YsF+q4z/sIy1QCeJPiW
s/rVns3FFQVP5g9DTOshDfQ=
=4Tdh
-END PGP SIGNATURE-

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Text - C string

2008-04-16 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
 If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
 have to revert those changes, and I'll have to seriously scale back
 the cleanup patch I was about to post.

 Still not sure where we stand on the above.  To cast, or not to cast?

I dunno.  I know there was previously some handwaving about representing
XML values in some more intelligent fashion than a plain text string,
but I have no idea if anyone is likely to do anything about it in the
foreseeable future.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Text - C string

2008-04-16 Thread Andrew Dunstan



Tom Lane wrote:

Brendan Jurd [EMAIL PROTECTED] writes:
  

If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
have to revert those changes, and I'll have to seriously scale back
the cleanup patch I was about to post.
  


  

Still not sure where we stand on the above.  To cast, or not to cast?



I dunno.  I know there was previously some handwaving about representing
XML values in some more intelligent fashion than a plain text string,
but I have no idea if anyone is likely to do anything about it in the
foreseeable future.


  


It seems very unlikely to me.

cheers

andrew

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Text - C string

2008-04-16 Thread Brendan Jurd
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Thu, Apr 17, 2008 at 1:16 AM, Tom Lane  wrote:
 Brendan Jurd  writes:

  If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
   have to revert those changes, and I'll have to seriously scale back
   the cleanup patch I was about to post.

   Still not sure where we stand on the above.  To cast, or not to cast?

  I dunno.  I know there was previously some handwaving about representing
  XML values in some more intelligent fashion than a plain text string,
  but I have no idea if anyone is likely to do anything about it in the
  foreseeable future.


Well ... if somebody does want to change the representation of xml
down the road, he's going to have to touch all the sites where the
code converts to and from cstring anyway, right?

So the only real difference this will make is that, instead of having
to replace four lines of VARDATA/memcpy per site, he'll have to
replace a single function call.  That doesn't seem like a negative.

With that in mind, please find attached my followup patch.  It cleans
up another 21 sites manually copying between cstring and varlena, for
a net reduction of 115 lines of code.

I didn't attempt to work through every reference to VARDATA, but I did
look at every hit from a  `grep -Rn VARDATA . | grep memcpy`.

All regression tests passed (contrib tests included) on gentoo.

Patch added to commitfest queue.

Cheers,
BJ
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: http://getfiregpg.org

iD8DBQFIBm105YBsbHkuyV0RAmqfAKCfyNyFdciqX4QV81sG9MhPt+KXuACfe694
3d/ICZF6yqV6K20X3TVX+So=
=CvRM
-END PGP SIGNATURE-


text-cstring-followup.diff.bz2
Description: BZip2 compressed data

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Text - C string

2008-03-28 Thread Brendan Jurd
On 26/03/2008, Tom Lane [EMAIL PROTECTED] wrote:
  There are no textout/textin calls left, but I may have missed some
  places that were doing it the hard way with direct palloc/memcpy
  manipulations.  It might be worth trolling all the VARDATA() references
  to see if any more are easily replaceable.


I think there are perhaps a dozen such sites, and I hope to submit a
cleanup patch that gets rid of these soon.

I did come across one site I'm not sure about in utils/adt/xml.c:291

/*
 * We need a null-terminated string to pass to parse_xml_decl().  Rather
 * than make a separate copy, make the temporary result one byte bigger
 * than it needs to be.
 */
result = palloc(nbytes + 1 + VARHDRSZ);
SET_VARSIZE(result, nbytes + VARHDRSZ);
memcpy(VARDATA(result), str, nbytes);
str = VARDATA(result);
str[nbytes] = '\0';

The author seemed pretty sure he didn't want to make a separate copy
of the string, so I'm thinking there's not a whole lot I can do to
improve this site.

  I notice in particular that xfunc.sgml contains sample C functions to
  copy and concatenate text.  While these aren't directly replaceable
  with the new functions, I wonder whether we ought to change the examples
  to make them less certain to break if we ever change text's
  representation.


Yes, these sample C functions are shown in tutorial/funcs.c and
funcs_new.c as well.  I agree that the examples could do with
changing, but don't have any keen insight on what to change them to.

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Text - C string

2008-03-28 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
 On 26/03/2008, Tom Lane [EMAIL PROTECTED] wrote:
 There are no textout/textin calls left, but I may have missed some
 places that were doing it the hard way with direct palloc/memcpy
 manipulations.  It might be worth trolling all the VARDATA() references
 to see if any more are easily replaceable.

 I think there are perhaps a dozen such sites, and I hope to submit a
 cleanup patch that gets rid of these soon.
 I did come across one site I'm not sure about in utils/adt/xml.c:291

I intentionally didn't touch xml.c, nor anyplace that is not dealing
in text, even if it happens to be binary-compatible with text.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Text - C string

2008-03-28 Thread Brendan Jurd
On 29/03/2008, Tom Lane [EMAIL PROTECTED] wrote:
 I intentionally didn't touch xml.c, nor anyplace that is not dealing
  in text, even if it happens to be binary-compatible with text.


Hmm, okay.  My original submission did include a few such changes; for
example, in xml_in and xml_out_internal I saw that the conversion
between xmltype and cstring was identical to the text conversion, so I
went ahead and used the text functions.  Feedback upthread suggested
that it was okay to go ahead with casting in identical cases. [1]

I saw that these changes made it into the commit, so I assumed that it
was the right call.

If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
have to revert those changes, and I'll have to seriously scale back
the cleanup patch I was about to post.

Regards,
BJ

[1] http://archives.postgresql.org/pgsql-hackers/2007-09/msg01094.php

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Text - C string

2008-03-25 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
 As discussed on -hackers, I'm trying to get rid of some redundant code
 by creating a widely useful set of functions to convert between text
 and C string in the backend.

Applied with revisions --- the functions were modified as per recent
discussion, and I fixed a lot more potential call sites.

There are no textout/textin calls left, but I may have missed some
places that were doing it the hard way with direct palloc/memcpy
manipulations.  It might be worth trolling all the VARDATA() references
to see if any more are easily replaceable.

I notice in particular that xfunc.sgml contains sample C functions to
copy and concatenate text.  While these aren't directly replaceable
with the new functions, I wonder whether we ought to change the examples
to make them less certain to break if we ever change text's
representation.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Text - C string

2008-03-19 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
 One of the questions in the original patch submission was whether it
 would be worth changing all those DirectFunctionCall(textin) and
 (textout) calls to use the new functions.  Is it worthwhile avoiding
 the fmgr overhead?

I think that's worth doing just on notational clarity grounds.
The small cycle savings doesn't excite me, but understanding
DirectFunctionCall1(textin, CStringGetDatum(foo)) just involves
more different bits of trivia than cstring_to_text(foo).

 Last time I looked, the codebase had shifted quite a bit since I
 originally wrote the patch.  So it probably needs some work to apply
 cleanly on the latest sources anyway.

Yeah, with wide-impact patches like this you are always going to have
that problem.  One point though is that we don't have to improve every
call site at the same time.  I'd be inclined to put in the new functions
and hit some representative sample of utils/adt/ files in the first
commit, and then incrementally fix other stuff.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Text - C string

2007-11-04 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Brendan Jurd wrote:
 As discussed on -hackers, I'm trying to get rid of some redundant code
 by creating a widely useful set of functions to convert between text
 and C string in the backend.
 
 The new extern functions, declared in include/utils/builtins.h and
 defined in backend/utils/adt/varlena.c, are:
 
 char * text_cstring(const text *t)
 char * text_cstring_limit(const text *t, int len)
 text * cstring_text(const char *s)
 text * cstring_text_limit(const char *s, int len)
 
 Within varlena.c, the actual conversions are performed by:
 
 char * do_text_cstring(const text *t, const int len)
 text * do_cstring_text(const char *s, const int len)
 
 These functions now do the work for the fmgr functions textin and
 textout, as well as being directly accessible by backend code.
 
 I've searched through the backend for any code which converted between
 text and C string manually (with memcpy and VARDATA), replacing with
 calls to one of the four new functions as appropriate.
 
 I came across some areas which were using the same, or similar,
 conversion technique on other varlena data types, such as bytea or
 xmltype.  In cases where the conversion was completely identical I
 used the new functions.  In cases with any differences (even if they
 seemed minor) I played it safe and left them alone.
 
 I'd now like to submit my work so far for review.  This patch compiled
 cleanly on Linux and passed all parallel regression tests.  It appears
 to be performance-neutral based on a few rough tests; I haven't tried
 to profile the changes in detail.
 
 There is still a lot of code out there using DirectFunctionCall1 to
 call text(in|out)).  I've decided to wait for some community feedback
 on the patch as it stands before replacing those calls.  There are a
 great many, and it would be a shame to have to go through them more
 than once.
 
 I would naively expect that replacing fmgr calls with direct calls
 would lead to a performance gain (no fmgr overhead), but honestly I'm
 not sure whether that would actually make a difference.
 
 Thanks for your time,
 BJ

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 7: You can help support the PostgreSQL project by donating at
 
 http://www.postgresql.org/about/donate

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] [HACKERS] Text - C string

2007-11-03 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Brendan Jurd wrote:
 As discussed on -hackers, I'm trying to get rid of some redundant code
 by creating a widely useful set of functions to convert between text
 and C string in the backend.
 
 The new extern functions, declared in include/utils/builtins.h and
 defined in backend/utils/adt/varlena.c, are:
 
 char * text_cstring(const text *t)
 char * text_cstring_limit(const text *t, int len)
 text * cstring_text(const char *s)
 text * cstring_text_limit(const char *s, int len)
 
 Within varlena.c, the actual conversions are performed by:
 
 char * do_text_cstring(const text *t, const int len)
 text * do_cstring_text(const char *s, const int len)
 
 These functions now do the work for the fmgr functions textin and
 textout, as well as being directly accessible by backend code.
 
 I've searched through the backend for any code which converted between
 text and C string manually (with memcpy and VARDATA), replacing with
 calls to one of the four new functions as appropriate.
 
 I came across some areas which were using the same, or similar,
 conversion technique on other varlena data types, such as bytea or
 xmltype.  In cases where the conversion was completely identical I
 used the new functions.  In cases with any differences (even if they
 seemed minor) I played it safe and left them alone.
 
 I'd now like to submit my work so far for review.  This patch compiled
 cleanly on Linux and passed all parallel regression tests.  It appears
 to be performance-neutral based on a few rough tests; I haven't tried
 to profile the changes in detail.
 
 There is still a lot of code out there using DirectFunctionCall1 to
 call text(in|out)).  I've decided to wait for some community feedback
 on the patch as it stands before replacing those calls.  There are a
 great many, and it would be a shame to have to go through them more
 than once.
 
 I would naively expect that replacing fmgr calls with direct calls
 would lead to a performance gain (no fmgr overhead), but honestly I'm
 not sure whether that would actually make a difference.
 
 Thanks for your time,
 BJ

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] [HACKERS] Text - C string

2007-10-01 Thread Brendan Jurd
As discussed on -hackers, I'm trying to get rid of some redundant code
by creating a widely useful set of functions to convert between text
and C string in the backend.

The new extern functions, declared in include/utils/builtins.h and
defined in backend/utils/adt/varlena.c, are:

char * text_cstring(const text *t)
char * text_cstring_limit(const text *t, int len)
text * cstring_text(const char *s)
text * cstring_text_limit(const char *s, int len)

Within varlena.c, the actual conversions are performed by:

char * do_text_cstring(const text *t, const int len)
text * do_cstring_text(const char *s, const int len)

These functions now do the work for the fmgr functions textin and
textout, as well as being directly accessible by backend code.

I've searched through the backend for any code which converted between
text and C string manually (with memcpy and VARDATA), replacing with
calls to one of the four new functions as appropriate.

I came across some areas which were using the same, or similar,
conversion technique on other varlena data types, such as bytea or
xmltype.  In cases where the conversion was completely identical I
used the new functions.  In cases with any differences (even if they
seemed minor) I played it safe and left them alone.

I'd now like to submit my work so far for review.  This patch compiled
cleanly on Linux and passed all parallel regression tests.  It appears
to be performance-neutral based on a few rough tests; I haven't tried
to profile the changes in detail.

There is still a lot of code out there using DirectFunctionCall1 to
call text(in|out)).  I've decided to wait for some community feedback
on the patch as it stands before replacing those calls.  There are a
great many, and it would be a shame to have to go through them more
than once.

I would naively expect that replacing fmgr calls with direct calls
would lead to a performance gain (no fmgr overhead), but honestly I'm
not sure whether that would actually make a difference.

Thanks for your time,
BJ


text-cstring_1.diff.gz
Description: GNU Zip compressed data

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate