Re: [HACKERS] Text <-> C string

2008-03-25 Thread Tom Lane
I've been working some more on Brendan Jurd's patch to simplify text <->
C string conversions.  It seems we have consensus on the names for the
base operations:

extern text *cstring_to_text(const char *s);
extern char *text_to_cstring(const text *t);

Brendan's patch also included cstring_text_limit(const char *s, int len)
which was defined as copying Min(len, strlen(s)) bytes.  I didn't find
this to be particularly useful.  In the first place, all potential
callers are passing the exact desired length, so the strlen() call is
just a waste of cycles.  In the second place, at least some callers pass
text that is not embedded in a known-to-be-null-terminated string (it
could be a section of a text datum instead); which means there is a
nonzero chance of the strlen running off the end of memory and dumping
core.  So I propose instead

extern text *cstring_to_text_with_len(const char *s, int len);

which just takes the given length as gospel.  Brendan had also proposed
text_to_cstring_limit(const text *t, int len) with similar Min()
semantics, but what this was doing was replacing copies into
limited-size local buffers with a palloc.  If we did that we might
as well just use text_to_cstring.  What I think is more useful is
a strlcpy()-like function that copies into a caller-supplied buffer
of limited size.  For lack of a better idea I propose defining it
*exactly* like strlcpy:

extern size_t textlcpy(char *dst, const text *src, size_t siz);

I've also found that there are lots and lots of places where the
text end of the conversion needs to be a Datum not a text *,
so it seems worthwhile to introduce a couple of macros to minimize
notation in that case:

#define CStringGetTextDatum(s) PointerGetDatum(cstring_to_text(s))
#define TextDatumGetCString(d) text_to_cstring((text *) DatumGetPointer(d))

Lastly, the originally submitted text-to-something functions would
work correctly on plain and 1-byte-header datums, but not on
compressed or toasted-out-of-line datums.  There are a whole lot of
places where that's not good enough.  Rather than expecting the caller
to use the right detoasting macro everywhere, it seems best to make
these functions cope with any variant.  That also avoids memory
leakage by allowing the intermediate copy to be pfree'd.  (I had
suggested that the pfree might be pointless, but I reconsidered ---
if the text object is large enough to be compressed or toasted,
we're talking about at least several K, so it's worth not leaking.)

In short, the infrastructure I'm currently testing is the above
definitions with the attached implementation.  Last call for
objections ...

regards, tom lane


/*
 * cstring_to_text
 *
 * Create a text value from a null-terminated C string.
 *
 * The new text value is freshly palloc'd with a full-size VARHDR.
 */
text *
cstring_to_text(const char *s)
{
    return cstring_to_text_with_len(s, strlen(s));
}

/*
 * cstring_to_text_with_len
 *
 * Same as cstring_to_text except the caller specifies the string length;
 * the string need not be null_terminated.
 */
text *
cstring_to_text_with_len(const char *s, int len)
{
    text   *result = (text *) palloc(len + VARHDRSZ);

    SET_VARSIZE(result, len + VARHDRSZ);
    memcpy(VARDATA(result), s, len);

    return result;
}
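
To make the point about trusting the caller's length concrete, here is a minimal standalone sketch. It is not backend code: `mock_text`, `mock_text_with_len` and `mock_text_slice` are hypothetical stand-ins that use malloc instead of palloc and a plain int length field instead of a varlena header. It shows a length-counted copy taken from the middle of another value, where the source bytes have no terminator of their own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a text datum: length field plus payload. */
typedef struct mock_text
{
    int  len;       /* payload length in bytes */
    char data[];    /* payload, NOT null-terminated */
} mock_text;

/* Copy exactly len bytes; never searches s for a terminator. */
mock_text *
mock_text_with_len(const char *s, int len)
{
    mock_text *result;

    assert(s != NULL && len >= 0);
    result = malloc(sizeof(mock_text) + (size_t) len);
    if (result == NULL)
        return NULL;
    result->len = len;
    memcpy(result->data, s, (size_t) len);
    return result;
}

/* Build a new value from bytes [off, off + n) of an existing one. */
mock_text *
mock_text_slice(const mock_text *t, int off, int n)
{
    assert(t != NULL && off >= 0 && n >= 0 && off + n <= t->len);
    return mock_text_with_len(t->data + off, n);
}
```

A Min(len, strlen(s)) variant called on `t->data + off` would have to scan for a terminator that the payload is not guaranteed to contain, which is the hazard described in the message.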

/*
 * text_to_cstring
 *
 * Create a palloc'd, null-terminated C string from a text value.
 *
 * We support being passed a compressed or toasted text value.
 * This is a bit bogus since such values shouldn't really be referred to as
 * text *, but it seems useful for robustness.  If we didn't handle that
 * case here, we'd need another routine that did, anyway.
 */
char *
text_to_cstring(const text *t)
{
    char   *result;
    text   *tunpacked = pg_detoast_datum_packed((struct varlena *) t);
    int     len = VARSIZE_ANY_EXHDR(tunpacked);

    result = (char *) palloc(len + 1);
    memcpy(result, VARDATA_ANY(tunpacked), len);
    result[len] = '\0';

    if (tunpacked != t)
        pfree(tunpacked);

    return result;
}

/*
 * textlcpy --- exactly like strlcpy(), except source is a text value.
 *
 * Copy src to string dst of size siz.  At most siz-1 characters
 * will be copied.  Always NUL terminates (unless siz == 0).
 * Returns strlen(src); if retval >= siz, truncation occurred.
 *
 * We support being passed a compressed or toasted text value.
 * This is a bit bogus since such values shouldn't really be referred to as
 * text *, but it seems useful for robustness.  If we didn't handle that
 * case here, we'd need another routine that did, anyway.
 */
size_t
textlcpy(char *dst, const text *src, size_t siz)
{
    text   *srcunpacked = pg_detoast_datum_packed((struct varlena *) src);
    size_t  srclen = VARSIZE_ANY_EXHDR(srcunpacked);

    if (siz > 0)
    {
        siz--;
        if (siz >= srclen)
            siz = 
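
As an illustration of the strlcpy-like contract described in the comment above (at most siz-1 bytes copied, always terminated when siz > 0, full source length returned), here is a minimal standalone sketch. It is not the code from the patch: `counted_lcpy` and its explicit `srclen` parameter are hypothetical stand-ins for a text value, and truncation here is purely byte-wise, which is an assumption; a multibyte-aware version would need to avoid splitting a character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical strlcpy-style copy from a length-counted source.
 * Copies at most siz - 1 bytes into dst, terminates dst when siz > 0,
 * and returns the full source length so the caller can detect truncation.
 */
size_t
counted_lcpy(char *dst, const char *src, size_t srclen, size_t siz)
{
    assert(src != NULL || srclen == 0);

    if (siz > 0)
    {
        size_t ncopy = siz - 1;     /* leave room for the terminator */

        assert(dst != NULL);
        if (ncopy > srclen)
            ncopy = srclen;
        if (ncopy > 0)
            memcpy(dst, src, ncopy);
        dst[ncopy] = '\0';
    }

    return srclen;
}
```

A caller would treat a return value greater than or equal to the buffer size as a sign that the output was truncated.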

Re: [HACKERS] Text <-> C string

2008-03-25 Thread Brendan Jurd
On 26/03/2008, Tom Lane [EMAIL PROTECTED] wrote:
> Brendan's patch also included cstring_text_limit(const char *s, int len)
> which was defined as copying Min(len, strlen(s)) bytes.  I didn't find
> this to be particularly useful.  In the first place, all potential
> callers are passing the exact desired length, so the strlen() call is
> just a waste of cycles.  In the second place, at least some callers pass
> text that is not embedded in a known-to-be-null-terminated string (it
> could be a section of a text datum instead); which means there is a
> nonzero chance of the strlen running off the end of memory and dumping
> core.  So I propose instead
>
> extern text *cstring_to_text_with_len(const char *s, int len);


That all makes sense to me.  I think the new name is good.  It's
pretty long, but I'm not seeing a shorter name that accurately
describes the function.

> which just takes the given length as gospel.  Brendan had also proposed
> text_to_cstring_limit(const text *t, int len) with similar Min()
> semantics, but what this was doing was replacing copies into
> limited-size local buffers with a palloc.  If we did that we might
> as well just use text_to_cstring.  What I think is more useful is
> a strlcpy()-like function that copies into a caller-supplied buffer
> of limited size.  For lack of a better idea I propose defining it
> *exactly* like strlcpy:
>
> extern size_t textlcpy(char *dst, const text *src, size_t siz);


I'm all for providing a function with this behaviour, but is
textlcpy() a bit ambiguous?  It's not clear from the name whether the
function copies text -> text, text -> cstring or cstring -> text.  In
fact, if I didn't already know better I'd probably assume that the
function copied text -> text with length, in the same way strlcpy
copies string -> string.

A text_to_cstring_with_len() or text_to_cstring_limit() might be more
to the point, and more consistent with the other functions in the
family.

On the other hand, maybe some difference in naming would help make it
obvious to callers that, unlike its siblings, textlcpy() takes the
destination string as an argument rather than returning it.
text_to_cstring_lcpy()?

> I've also found that there are lots and lots of places where the
> text end of the conversion needs to be a Datum not a text *,
> so it seems worthwhile to introduce a couple of macros to minimize
> notation in that case:
>
> #define CStringGetTextDatum(s) PointerGetDatum(cstring_to_text(s))
> #define TextDatumGetCString(d) text_to_cstring((text *) DatumGetPointer(d))


Yes, I recall coming across a number of sites where these macros would
come in handy.

> Lastly, the originally submitted text-to-something functions would
> work correctly on plain and 1-byte-header datums, but not on
> compressed or toasted-out-of-line datums.  There are a whole lot of
> places where that's not good enough.  Rather than expecting the caller
> to use the right detoasting macro everywhere, it seems best to make
> these functions cope with any variant.  That also avoids memory
> leakage by allowing the intermediate copy to be pfree'd.  (I had
> suggested that the pfree might be pointless, but I reconsidered ---
> if the text object is large enough to be compressed or toasted,
> we're talking about at least several K, so it's worth not leaking.)


Excellent.  My patch didn't contemplate dealing with
compressed/toasted datums because, quite frankly, I didn't know *how*
to deal with them correctly.  Much to learn about varlenas, I still
have.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Text <-> C string

2008-03-25 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
> On 26/03/2008, Tom Lane [EMAIL PROTECTED] wrote:
>> ... What I think is more useful is
>> a strlcpy()-like function that copies into a caller-supplied buffer
>> of limited size.  For lack of a better idea I propose defining it
>> *exactly* like strlcpy:
>>
>> extern size_t textlcpy(char *dst, const text *src, size_t siz);

> I'm all for providing a function with this behaviour, but is
> textlcpy() a bit ambiguous?

Fair enough, I'm not wedded to that name.  Search-and-replace is
still easy enough at this point ...

> A text_to_cstring_with_len() or text_to_cstring_limit() might be more
> to the point, and more consistent with the other functions in the
> family.

Hmm.  The thing that's bothering me is that the length is the size
of the *destination*, which is not like cstring_to_text_with_len,
so using a closely similar name might be confusing.  Of those two
I'd go with text_to_cstring_limit.  Another thought that comes to
mind is

void text_to_cstring_buffer(const text *src, char *dst, size_t dst_len)

Anyone have other ideas?

regards, tom lane



Re: [HACKERS] Text <-> C string

2008-03-25 Thread Bruce Momjian
Tom Lane wrote:
>> A text_to_cstring_with_len() or text_to_cstring_limit() might be more
>> to the point, and more consistent with the other functions in the
>> family.
>
> Hmm.  The thing that's bothering me is that the length is the size
> of the *destination*, which is not like cstring_to_text_with_len,
> so using a closely similar name might be confusing.  Of those two
> I'd go with text_to_cstring_limit.  Another thought that comes to
> mind is
>
> void text_to_cstring_buffer(const text *src, char *dst, size_t dst_len)

I think I like buffer.

-- 
  Bruce Momjian  [EMAIL PROTECTED]        http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Text <-> C string

2008-03-25 Thread Alvaro Herrera
Tom Lane wrote:
> Brendan Jurd [EMAIL PROTECTED] writes:
>
>> A text_to_cstring_with_len() or text_to_cstring_limit() might be more
>> to the point, and more consistent with the other functions in the
>> family.
>
> Hmm.  The thing that's bothering me is that the length is the size
> of the *destination*, which is not like cstring_to_text_with_len,
> so using a closely similar name might be confusing.  Of those two
> I'd go with text_to_cstring_limit.  Another thought that comes to
> mind is
>
> void text_to_cstring_buffer(const text *src, char *dst, size_t dst_len)

text_to_cstring_buffer seems okay.  I did wonder for a bit whether it
should be 

void text_to_cstring_buffer(const text *src, char *buf, size_t buf_len)

but then the src/dst pair seems better than src/buf.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Text <-> C string

2008-03-25 Thread Pavel Stehule

> extern text *cstring_to_text_with_len(const char *s, int len);

buffer_to_text ???

Regards
Pavel Stehule



Re: [HACKERS] Text <-> C string

2008-03-20 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

> Volkan YAZICI [EMAIL PROTECTED] writes:
>> But I'd vote for TextPGetCString style Tom suggested for the eye-habit
>> compatibility with the rest of the code.
>
> If there are not additional votes, I'll go with TextPGetCString
> and CStringGetTextP.

I would have voted for text_to_cstring etc. I can see the logic for the above
but it's just such a pain to type...

Fwiw I didn't actually find text_cstring confusing because all our sql cast
functions are defined that way.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] Text <-> C string

2008-03-20 Thread Alvaro Herrera
Gregory Stark wrote:
> Tom Lane [EMAIL PROTECTED] writes:
>
>> Volkan YAZICI [EMAIL PROTECTED] writes:
>>> But I'd vote for TextPGetCString style Tom suggested for the eye-habit
>>> compatibility with the rest of the code.
>>
>> If there are not additional votes, I'll go with TextPGetCString
>> and CStringGetTextP.
>
> I would have voted for text_to_cstring etc. I can see the logic for the above
> but it's just such a pain to type...

+1 for text_to_cstring et al.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Text <-> C string

2008-03-20 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
> Gregory Stark wrote:
>> Tom Lane [EMAIL PROTECTED] writes:
>>> If there are not additional votes, I'll go with TextPGetCString
>>> and CStringGetTextP.
>>
>> I would have voted for text_to_cstring etc. I can see the logic for the above
>> but it's just such a pain to type...

> +1 for text_to_cstring et al.

Well, that's all right with me too.

It occurs to me that this proposal is still leaving something on the
table.  Consider a SQL-callable function that takes a text argument and
wants to turn it into a C string for processing.  With the proposal as
it stands you'd have to do something like

text   *t = PG_GETARG_TEXT_P(0);
char   *c = text_to_cstring(t);

Now you might be smart enough to optimize that to

text   *t = PG_GETARG_TEXT_PP(0);
char   *c = text_to_cstring(t);

which would avoid a useless copy for short-header text datums, but it's
still leaking an extra copy of the text if the input is compressed or
toasted out-of-line.  I'm imagining instead

char   *c = PG_GETARG_TEXT_AS_CSTRING(0);

This would expand to a call on say text_datum_to_cstring(Datum d)
which would detoast, convert, and then free the detoasted copy if
needed.
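
The "detoast, convert, then free the detoasted copy if needed" sequence can be sketched outside the backend as follows. Everything here is hypothetical and only mimics the shape of the idea: `mock_value`, `mock_unpack` and `mock_value_to_cstring` are invented names, the `packed` flag stands in for a compressed or out-of-line datum, and malloc/free stand in for palloc/pfree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stored value; "packed" ones need a fresh expanded copy. */
typedef struct mock_value
{
    int    packed;      /* nonzero: must be expanded before use */
    size_t len;         /* payload length in bytes */
    char   data[16];    /* payload, not necessarily null-terminated */
} mock_value;

/* Return v itself when usable as-is, else a malloc'd expanded copy. */
mock_value *
mock_unpack(mock_value *v)
{
    mock_value *copy;

    if (!v->packed)
        return v;
    copy = malloc(sizeof(mock_value));
    if (copy == NULL)
        return NULL;
    *copy = *v;
    copy->packed = 0;
    return copy;
}

/* Convert to a malloc'd C string, releasing any intermediate copy. */
char *
mock_value_to_cstring(mock_value *v)
{
    mock_value *u = mock_unpack(v);
    char       *result;

    if (u == NULL)
        return NULL;
    assert(u->len <= sizeof(u->data));
    result = malloc(u->len + 1);
    if (result != NULL)
    {
        memcpy(result, u->data, u->len);
        result[u->len] = '\0';
    }
    if (u != v)
        free(u);        /* free only the intermediate copy, never the input */
    return result;
}
```

The pointer comparison is what keeps the caller's own value untouched while still avoiding a leak of the expanded copy.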

On the other hand, it could be argued that this is usually a waste of
effort.  (I frequently point out to people that retail pfree's in
SQL-callable functions are often less efficient than letting the next
context reset clean up.)  And one thing I don't like about this notation
is that the declared type of the local variable no longer matches up
with the SQL-level declaration of the function.

Comments anyone?

regards, tom lane



Re: [HACKERS] Text <-> C string

2008-03-19 Thread Volkan YAZICI
On Wed, 19 Mar 2008, Sam Mason [EMAIL PROTECTED] writes:
> ...
>   char * str = cstring_of_text(src_text);
> ...
>
> I think I got my original inspiration for doing it this way around from
> the Caml language.

Also, used in Common Lisp as class accessors:

  char *s = cstring_of(text);
  text *t = text_of(cstring);

But I'd vote for TextPGetCString style Tom suggested for the eye-habit
compatibility with the rest of the code.


Regards.



Re: [HACKERS] Text <-> C string

2008-03-19 Thread Tom Lane
Volkan YAZICI [EMAIL PROTECTED] writes:
> But I'd vote for TextPGetCString style Tom suggested for the eye-habit
> compatibility with the rest of the code.

If there are not additional votes, I'll go with TextPGetCString
and CStringGetTextP.

regards, tom lane



Re: [HACKERS] Text <-> C string

2007-11-03 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold


-- 
  Bruce Momjian  [EMAIL PROTECTED]        http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Text <-> C string

2007-10-01 Thread Brendan Jurd
As discussed on -hackers, I'm trying to get rid of some redundant code
by creating a widely useful set of functions to convert between text
and C string in the backend.

The new extern functions, declared in include/utils/builtins.h and
defined in backend/utils/adt/varlena.c, are:

char * text_cstring(const text *t)
char * text_cstring_limit(const text *t, int len)
text * cstring_text(const char *s)
text * cstring_text_limit(const char *s, int len)

Within varlena.c, the actual conversions are performed by:

char * do_text_cstring(const text *t, const int len)
text * do_cstring_text(const char *s, const int len)

These functions now do the work for the fmgr functions textin and
textout, as well as being directly accessible by backend code.

I've searched through the backend for any code which converted between
text and C string manually (with memcpy and VARDATA), replacing with
calls to one of the four new functions as appropriate.

I came across some areas which were using the same, or similar,
conversion technique on other varlena data types, such as bytea or
xmltype.  In cases where the conversion was completely identical I
used the new functions.  In cases with any differences (even if they
seemed minor) I played it safe and left them alone.

I'd now like to submit my work so far for review.  This patch compiled
cleanly on Linux and passed all parallel regression tests.  It appears
to be performance-neutral based on a few rough tests; I haven't tried
to profile the changes in detail.

There is still a lot of code out there using DirectFunctionCall1 to
call text(in|out).  I've decided to wait for some community feedback
on the patch as it stands before replacing those calls.  There are a
great many, and it would be a shame to have to go through them more
than once.

I would naively expect that replacing fmgr calls with direct calls
would lead to a performance gain (no fmgr overhead), but honestly I'm
not sure whether that would actually make a difference.

Thanks for your time,
BJ


text-cstring_1.diff.gz
Description: GNU Zip compressed data



Re: [HACKERS] Text <-> C string

2007-09-27 Thread Brendan Jurd
On 9/22/07, Tom Lane [EMAIL PROTECTED] wrote:
> On grounds of code-space savings I think it might be worth making
> these things be simple functions declared in builtins.h; that would
> also make it much easier to change their implementations.

I've noticed that this pattern isn't exclusive to the text type; other
varlena types like bytea and xmltype seem to have a common requirement
to translate to and fro C strings for various jobs.

Does it make sense to go one level lower, and make these functions
work for any varlena?

So far, I've got the following functions doing the work:

char * text_cstring(text *t)
char * text_cstring_limit(text *t, int len)
text * cstring_text(char *s)

It wouldn't be difficult at this point to make those functions
'varlena' rather than 'text', and then bytea and xmltype (and any
other future types that want to inherit from varlena) can take
advantage of them.

Thanks for your time,
BJ



Re: [HACKERS] Text <-> C string

2007-09-27 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
> So far, I've got the following functions doing the work:
>
> char * text_cstring(text *t)
> char * text_cstring_limit(text *t, int len)
> text * cstring_text(char *s)
>
> It wouldn't be difficult at this point to make those functions
> 'varlena' rather than 'text', and then bytea and xmltype (and any
> other future types that want to inherit from varlena) can take
> advantage of them.

Mmm, but the conversions are generally not identical --- for instance,
bytea needs to do escaping/de-escaping, and I doubt that XML will stick
to dumb flat-string representation for long, and for that matter text
itself is likely to change someday for better locale support.  Where the
representations and conversions *are* identical, one can just cast.
I'd vote for keeping the names focused on text ...

regards, tom lane



Re: [HACKERS] Text <-> C string

2007-09-22 Thread Brendan Jurd
On 9/22/07, Tom Lane [EMAIL PROTECTED] wrote:
> Brendan Jurd [EMAIL PROTECTED] writes:
>> I just noticed a couple of macros defined in src/include/tsearch/ts_utils.h:
>>
>> #define TextPGetCString(t) DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(t)))
>> #define CStringGetTextP(c) DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(c)))
>
> I think if you look around you'll find several similar things in various
> contrib modules.  It would make some sense to try to unify all this.
> I'm not particularly for making it macros in postgres.h though ---
> that's no help if the macros require referencing stuff in builtins.h.
>
> On grounds of code-space savings I think it might be worth making
> these things be simple functions declared in builtins.h; that would
> also make it much easier to change their implementations.

You're right about finding similar things in various places.  Even
varlena.c has a set of these macros (PG_TEXT_GET_STR etc), but it
doesn't look like they've really been utilised.

I'm happy to take a swing at this.  Declaring in builtins.h makes sense.

The thing that's got me confused at the moment is what naming
convention to use for the functions.  Looking in builtins.h you might
get the impression that we use lower_underscore for functions that are
called via fmgr, UPPER_UNDERSCORE for macros and CamelCase for
ordinary internal C functions, but there are plenty of exceptions to
disprove that rule.  I see camel cased macros and lowercased internal
functions.  Camel cased identifiers sometimes start with uppercase,
sometimes lowercase.

So the name for the text -> cstring function could be any of:

text_cstr
text_to_cstr
textToCString
TextToCString

Is there any kind of authoritative naming convention I can refer to?

Thanks for your time,
BJ



Re: [HACKERS] Text <-> C string

2007-09-22 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
> The thing that's got me confused at the moment is what naming
> convention to use for the functions.

Well, almost any convention you like has some precedent somewhere in
the PG code, given all the contributors over the years.  Almost the
only thing we actively discourage is Hungarian notation, and I think
there's even some of that in some corners.

Personally I would vote against something like TextPGetCString because
it would look like one of the family of macros that are named FooGetBar.
Maybe use text_to_cstring and cstring_to_text?  It's not real important
though.

regards, tom lane



Re: [HACKERS] Text <-> C string

2007-09-21 Thread Gregory Stark
Brendan Jurd [EMAIL PROTECTED] writes:

> Surely having the exact same four lines of code written out in dozens
> of places is a Bad Thing, but perhaps there is some reasoning behind
> this that I am missing?

The canonical way to do it is with

DatumGetCString(DirectFunctionCall1(textout, t))


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Text <-> C string

2007-09-21 Thread Brendan Jurd
On 9/22/07, Gregory Stark [EMAIL PROTECTED] wrote:
> The canonical way to do it is with
>
> DatumGetCString(DirectFunctionCall1(textout, t))

Ah, I see.  Thanks.

In that case, would it be helpful if I submitted a patch for the
various code fragments that do this locally, updating them to use
DatumGetCString?



Re: [HACKERS] Text <-> C string

2007-09-21 Thread Gregory Stark

Brendan Jurd [EMAIL PROTECTED] writes:

> On 9/22/07, Gregory Stark [EMAIL PROTECTED] wrote:
>> The canonical way to do it is with
>>
>> DatumGetCString(DirectFunctionCall1(textout, t))
>
> Ah, I see.  Thanks.
>
> In that case, would it be helpful if I submitted a patch for the
> various code fragments that do this locally, updating them to use
> DatumGetCString?

I would be interested in seeing just a list of such places if you have it
handy. I don't think we consider it wrong to violate the text data type
abstraction barrier like you describe though. 

I'm interested because any such code is possibly either failing to take into
account toasted data or is unnecessarily detoasting packed varlenas.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Text <-> C string

2007-09-21 Thread Brendan Jurd
On 9/22/07, Gregory Stark [EMAIL PROTECTED] wrote:
> The canonical way to do it is with
>
> DatumGetCString(DirectFunctionCall1(textout, t))

I just noticed a couple of macros defined in src/include/tsearch/ts_utils.h:

#define TextPGetCString(t) DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(t)))
#define CStringGetTextP(c) DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(c)))

Seems these would actually be convenient in quite a lot of places in
the backend.  Is there any downside to moving these two into
src/include/postgres.h?



Re: [HACKERS] Text <-> C string

2007-09-21 Thread Brendan Jurd
Well, a couple of specific cases that I came across are
quote_identifier() in src/backend/utils/adt/quote.c, and
do_to_timestamp() in src/backend/utils/adt/formatting.c (line 3349).

I was getting a rough notion of how common the duplication was using

$ egrep -Rn -C 2 'memcpy.*VARDATA' src/backend

Not all of these are genuine duplications of textout and textin (you
have to eyeball them individually to work that out) but it's a
reasonable starting point.

The files matched under src/backend are as follows.

src/backend/libpq/be-fsstubs.c
src/backend/utils/mb/mbutils.c
src/backend/utils/adt/timestamp.c
src/backend/utils/adt/nabstime.c
src/backend/utils/adt/xml.c
src/backend/utils/adt/quote.c
src/backend/utils/adt/oracle_compat.c
src/backend/utils/adt/varchar.c
src/backend/utils/adt/ruleutils.c
src/backend/utils/adt/varlena.c
src/backend/utils/adt/tsginidx.c
src/backend/utils/adt/cash.c
src/backend/utils/adt/date.c
src/backend/utils/adt/genfile.c
src/backend/utils/adt/network.c
src/backend/utils/adt/selfuncs.c
src/backend/utils/adt/formatting.c
src/backend/utils/adt/version.c
src/backend/utils/adt/pgstatfuncs.c
src/backend/access/heap/tuptoaster.c
src/backend/access/common/heaptuple.c
src/backend/storage/large_object/inv_api.c
src/backend/executor/execQual.c
src/backend/catalog/pg_conversion.c



Re: [HACKERS] Text <-> C string

2007-09-21 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
> I just noticed a couple of macros defined in src/include/tsearch/ts_utils.h:
>
> #define TextPGetCString(t) DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(t)))
> #define CStringGetTextP(c) DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(c)))
>
> Seems these would actually be convenient in quite a lot of places in
> the backend.  Is there any downside to moving these two into
> src/include/postgres.h?

I think if you look around you'll find several similar things in various
contrib modules.  It would make some sense to try to unify all this.
I'm not particularly for making it macros in postgres.h though ---
that's no help if the macros require referencing stuff in builtins.h.

On grounds of code-space savings I think it might be worth making
these things be simple functions declared in builtins.h; that would
also make it much easier to change their implementations.

regards, tom lane
