Re: mcvstats serialization code is still shy of a load

2019-07-05 Thread Alvaro Herrera
On 2019-Jul-05, Tom Lane wrote:

> FWIW, I don't think there's a need for every catversion on the back branch
> to look older than any catversion on HEAD.  The requirement so far as the
> core code is concerned is only for non-equality.  Now, extension code does
> often do something like "if catversion >= xxx", but in practice they're
> only concerned about numbers used by released versions.

pg_upgrade also uses >= catversion comparison for a couple of things.  I
don't think it affects this case, but it's worth keeping in mind.

-- 
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: mcvstats serialization code is still shy of a load

2019-07-05 Thread Tom Lane
Tomas Vondra  writes:
> I've pushed the REL_12_STABLE backpatches too, now. I've ended up using
> 201907031 and 201907032 - those values precede the first catversion bump
> in master (201907041), so the back branch looks "older". And there's a
> bit of slack for additional bumps (in the unlikely case we need them).

FWIW, I don't think there's a need for every catversion on the back branch
to look older than any catversion on HEAD.  The requirement so far as the
core code is concerned is only for non-equality.  Now, extension code does
often do something like "if catversion >= xxx", but in practice they're
only concerned about numbers used by released versions.  HEAD's catversion
will be strictly greater than v12's soon enough, even if you had made it
not so today.  So I think sticking to today's-date-with-some-N is better
than artificially assigning other dates.

What's done is done, and there's no need to change it, but now you
know what to do next time.

> We might have "fixed" this by backpatching the commit with the extra
> catversion bump (7b925e12) but the commit seems a bit too large for
> that. It's fairly isolated though. But it seems like a bad practice.

Yeah, that approach flies in the face of the notion of feature freeze.

regards, tom lane




Re: mcvstats serialization code is still shy of a load

2019-07-05 Thread Tomas Vondra

On Fri, Jul 05, 2019 at 10:36:59AM +0200, Tomas Vondra wrote:

On Fri, Jul 5, 2019, 03:28 Tom Lane  wrote:


Tomas Vondra  writes:
> I was about to push into REL_12_STABLE, when I realized that maybe we
> need to do something about the catversion first. REL_12_STABLE is still
> on 201906161, while master got to 201907041 thanks to commit
> 7b925e12703.  Simply cherry-picking the commits would get us to
> 201907052 in both branches, but that'd be wrong as the catalogs do
> differ. I suppose this is what you meant by "catversion differences to
> cope with".

Yeah, exactly.

My recommendation is to use 201907051 on v12 and 201907052
on master (or whatever is $today for you).  They need to be
different now that the branches' catalog histories have diverged,
and it seems to me that the back branch should be "older".

regards, tom lane



Unfortunately, master is already using both 201907051 and 201907052 (two of
the patches I pushed touched the catalog), so we can't quite do exactly
that. We need to use 201907042 and 201907043 or something preceding 201907041
(which is the extra catversion on master).

At this point there's no perfect sequence, thanks to the extra commit on
master, so REL_12_STABLE can't be exactly "older" :-(

Barring objections, I'll go ahead with 201907042+201907043 later today,
before someone pushes another catversion-bumping patch.



I've pushed the REL_12_STABLE backpatches too, now. I've ended up using
201907031 and 201907032 - those values precede the first catversion bump
in master (201907041), so the back branch looks "older". And there's a
bit of slack for additional bumps (in the unlikely case we need them).

We might have "fixed" this by backpatching the commit with the extra
catversion bump (7b925e12) but the commit seems a bit too large for
that. It's fairly isolated though. But it seems like a bad practice.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: mcvstats serialization code is still shy of a load

2019-07-05 Thread Tomas Vondra
On Fri, Jul 5, 2019, 03:28 Tom Lane  wrote:

> Tomas Vondra  writes:
> > I was about to push into REL_12_STABLE, when I realized that maybe we
> > need to do something about the catversion first. REL_12_STABLE is still
> > on 201906161, while master got to 201907041 thanks to commit
> > 7b925e12703.  Simply cherry-picking the commits would get us to
> > 201907052 in both branches, but that'd be wrong as the catalogs do
> > differ. I suppose this is what you meant by "catversion differences to
> > cope with".
>
> Yeah, exactly.
>
> My recommendation is to use 201907051 on v12 and 201907052
> on master (or whatever is $today for you).  They need to be
> different now that the branches' catalog histories have diverged,
> and it seems to me that the back branch should be "older".
>
> regards, tom lane
>

Unfortunately, master is already using both 201907051 and 201907052 (two of
the patches I pushed touched the catalog), so we can't quite do exactly
that. We need to use 201907042 and 201907043 or something preceding 201907041
(which is the extra catversion on master).

At this point there's no perfect sequence, thanks to the extra commit on
master, so REL_12_STABLE can't be exactly "older" :-(

Barring objections, I'll go ahead with 201907042+201907043 later today,
before someone pushes another catversion-bumping patch.

regards

>


Re: mcvstats serialization code is still shy of a load

2019-07-04 Thread Tom Lane
Tomas Vondra  writes:
> I was about to push into REL_12_STABLE, when I realized that maybe we
> need to do something about the catversion first. REL_12_STABLE is still
> on 201906161, while master got to 201907041 thanks to commit
> 7b925e12703.  Simply cherry-picking the commits would get us to
> 201907052 in both branches, but that'd be wrong as the catalogs do
> differ. I suppose this is what you meant by "catversion differences to
> cope with".

Yeah, exactly.

My recommendation is to use 201907051 on v12 and 201907052
on master (or whatever is $today for you).  They need to be
different now that the branches' catalog histories have diverged,
and it seems to me that the back branch should be "older".

regards, tom lane




Re: mcvstats serialization code is still shy of a load

2019-07-04 Thread Tomas Vondra

On Tue, Jul 02, 2019 at 10:38:29AM +0200, Tomas Vondra wrote:

On Sun, Jun 30, 2019 at 08:30:33PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

Attached is a slightly improved version of the serialization patch.


I reviewed this patch, and tested it on hppa and ppc.  I found one
serious bug: in the deserialization varlena case, you need

-   dataptr += MAXALIGN(len);
+   dataptr += MAXALIGN(len + VARHDRSZ);

(approx. line 1100 in mcv.c).  Without this, the output data is corrupt,
plus the Assert a few lines further down about dataptr having been
advanced by the correct amount fires.  (On one machine I tested on,
that happened during the core regression tests.  The other machine
got through regression, but trying to do "select * from pg_stats_ext;"
afterwards exhibited the crash.  I didn't investigate closely, but
I suspect the difference has to do with different MAXALIGN values,
4 and 8 respectively.)

The attached patch (a delta atop your v2) corrects that plus some
cosmetic issues.



Thanks.


If we're going to push this, it would be considerably less complicated
to do so before v12 gets branched --- not long after that, there will be
catversion differences to cope with.  I'm planning to make the branch
tomorrow (Monday), probably ~1500 UTC.  Just sayin'.



Unfortunately, I was travelling on Sunday and was quite busy on Monday, so
I've been unable to push this before the branching :-(

I'll push by the end of this week, once I get home.



I've pushed the fix (along with the pg_mcv_list_item fix) into master,
hopefully the buildfarm won't be upset about it.

I was about to push into REL_12_STABLE, when I realized that maybe we
need to do something about the catversion first. REL_12_STABLE is still
on 201906161, while master got to 201907041 thanks to commit
7b925e12703.  Simply cherry-picking the commits would get us to
201907052 in both branches, but that'd be wrong as the catalogs do
differ. I suppose this is what you meant by "catversion differences to
cope with".

I suppose this is not the first time this happened - how did we deal
with it in the past? I guess we could use some "past" non-conflicting
catversion number in the REL_12_STABLE branch (say, 201907030x) but
maybe that'd be wrong?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: mcvstats serialization code is still shy of a load

2019-07-02 Thread Tomas Vondra

On Sun, Jun 30, 2019 at 08:30:33PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

Attached is a slightly improved version of the serialization patch.


I reviewed this patch, and tested it on hppa and ppc.  I found one
serious bug: in the deserialization varlena case, you need

-   dataptr += MAXALIGN(len);
+   dataptr += MAXALIGN(len + VARHDRSZ);

(approx. line 1100 in mcv.c).  Without this, the output data is corrupt,
plus the Assert a few lines further down about dataptr having been
advanced by the correct amount fires.  (On one machine I tested on,
that happened during the core regression tests.  The other machine
got through regression, but trying to do "select * from pg_stats_ext;"
afterwards exhibited the crash.  I didn't investigate closely, but
I suspect the difference has to do with different MAXALIGN values,
4 and 8 respectively.)

The attached patch (a delta atop your v2) corrects that plus some
cosmetic issues.



Thanks.


If we're going to push this, it would be considerably less complicated
to do so before v12 gets branched --- not long after that, there will be
catversion differences to cope with.  I'm planning to make the branch
tomorrow (Monday), probably ~1500 UTC.  Just sayin'.



Unfortunately, I was travelling on Sunday and was quite busy on Monday, so
I've been unable to push this before the branching :-(

I'll push by the end of this week, once I get home.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: mcvstats serialization code is still shy of a load

2019-06-30 Thread Tom Lane
Tomas Vondra  writes:
> Attached is a slightly improved version of the serialization patch.

I reviewed this patch, and tested it on hppa and ppc.  I found one
serious bug: in the deserialization varlena case, you need

-   dataptr += MAXALIGN(len);
+   dataptr += MAXALIGN(len + VARHDRSZ);

(approx. line 1100 in mcv.c).  Without this, the output data is corrupt,
plus the Assert a few lines further down about dataptr having been
advanced by the correct amount fires.  (On one machine I tested on,
that happened during the core regression tests.  The other machine
got through regression, but trying to do "select * from pg_stats_ext;"
afterwards exhibited the crash.  I didn't investigate closely, but
I suspect the difference has to do with different MAXALIGN values,
4 and 8 respectively.)

The attached patch (a delta atop your v2) corrects that plus some
cosmetic issues.

If we're going to push this, it would be considerably less complicated
to do so before v12 gets branched --- not long after that, there will be
catversion differences to cope with.  I'm planning to make the branch
tomorrow (Monday), probably ~1500 UTC.  Just sayin'.

regards, tom lane

diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 256728a..cfe7e54 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -40,13 +40,14 @@
  * stored in a separate array (deduplicated, to minimize the size), and
  * so the serialized items only store uint16 indexes into that array.
  *
- * Each serialized item store (in this order):
+ * Each serialized item stores (in this order):
  *
  * - indexes to values	  (ndim * sizeof(uint16))
  * - null flags			  (ndim * sizeof(bool))
  * - frequency			  (sizeof(double))
  * - base_frequency		  (sizeof(double))
  *
+ * There is no alignment padding within an MCV item.
  * So in total each MCV item requires this many bytes:
  *
  *	 ndim * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double)
@@ -61,7 +62,7 @@
 	(VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))
 
 /*
- * Size of the serialized MCV list, exluding the space needed for
+ * Size of the serialized MCV list, excluding the space needed for
  * deduplicated per-dimension values. The macro is meant to be used
  * when it's not yet safe to access the serialized info about amount
  * of data for each column.
@@ -619,6 +620,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
 		else if (info[dim].typlen == -1)	/* varlena */
 		{
 			info[dim].nbytes = 0;
+			info[dim].nbytes_aligned = 0;
 			for (i = 0; i < info[dim].nvalues; i++)
 			{
 Size		len;
@@ -646,6 +648,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
 		else if (info[dim].typlen == -2)	/* cstring */
 		{
 			info[dim].nbytes = 0;
+			info[dim].nbytes_aligned = 0;
 			for (i = 0; i < info[dim].nvalues; i++)
 			{
 Size		len;
@@ -743,7 +746,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
  * assumes little endian behavior.  store_att_byval does
  * almost what we need, but it requires properly aligned
  * buffer - the output buffer does not guarantee that. So we
- * simply use a static Datum variable (which guarantees proper
+ * simply use a local Datum variable (which guarantees proper
  * alignment), and then copy the value from it.
  */
 store_att_byval(, value, info[dim].typlen);
@@ -759,14 +762,14 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
 			}
 			else if (info[dim].typlen == -1)	/* varlena */
 			{
-uint32		len = VARSIZE_ANY_EXHDR(value);
+uint32		len = VARSIZE_ANY_EXHDR(DatumGetPointer(value));
 
 /* copy the length */
 memcpy(ptr, , sizeof(uint32));
 ptr += sizeof(uint32);
 
 /* data from the varlena value (without the header) */
-memcpy(ptr, VARDATA(DatumGetPointer(value)), len);
+memcpy(ptr, VARDATA_ANY(DatumGetPointer(value)), len);
 ptr += len;
 			}
 			else if (info[dim].typlen == -2)	/* cstring */
@@ -1100,7 +1103,7 @@ statext_mcv_deserialize(bytea *data)
 	map[dim][i] = PointerGetDatum(dataptr);
 
 	/* skip to place of the next deserialized value */
-	dataptr += MAXALIGN(len);
+	dataptr += MAXALIGN(len + VARHDRSZ);
 }
 			}
 			else if (info[dim].typlen == -2)
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 6778746..8fc5419 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -36,7 +36,7 @@ typedef struct DimensionInfo
 {
 	int			nvalues;		/* number of deduplicated values */
 	int			nbytes;			/* number of bytes (serialized) */
-	int			nbytes_aligned;	/* deserialized data with alignment */
+	int			nbytes_aligned;	/* size of deserialized data with alignment */
 	int			typlen;			/* pg_type.typlen */
 	bool		typbyval;		/* pg_type.typbyval */
 

Re: mcvstats serialization code is still shy of a load

2019-06-29 Thread Tomas Vondra

On Thu, Jun 27, 2019 at 01:26:32PM +0200, Tomas Vondra wrote:

On Thu, Jun 27, 2019 at 12:04:30AM -0400, Tom Lane wrote:

Tomas Vondra  writes:

OK. Attached is a patch ditching the alignment in serialized data. I've
ditched the macros to access parts of serialized data, and everything
gets copied.


I lack energy to actually read this patch right now, and I don't currently
have an opinion about whether it's worth another catversion bump to fix
this stuff in v12.  But I did test the patch, and I can confirm it gets
through the core regression tests on hppa (both gaur's host environment
with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1).



Thanks for running it through regression tests, that alone is a very
useful piece of information for me.

As for the catversion bump - I'd probably vote to do it. Not just because
of this serialization stuff, but to fix the pg_mcv_list_items function.
It's not something I'm very enthusiastic about (kinda embarrassed about it,
really), but it seems better than shipping something that we'll need to
rework in PG13.



Attached is a slightly improved version of the serialization patch. The
main difference is that when serializing varlena values, the previous
patch version stored

   length (uint32) + full varlena (incl. the header)

which is kinda redundant, because the varlena stores the length too. So
now it only stores the length + data, without the varlena header. I
don't think there's a better way to store varlena values without
enforcing alignment (which is what happens in current master).

There's one additional change I failed to mention before - I had to add
another field to DimensionInfo, tracking how much space will be needed
for deserialized data. This is needed because the deserialization
allocates the whole MCV as a single chunk of memory, to reduce palloc
overhead. It could parse the data twice (first to determine the space,
then to actually parse it), this allows doing just a single pass. Which
seems useful for large MCV lists, but maybe it's not worth it?

Barring objections I'll commit this together with the pg_mcv_list_items
fix, posted in a separate thread. Of course, this requires catversion
bump - an alternative would be to keep enforcing the alignment, but
tweak the macros to work on all platforms without SIGBUS.

Considering how troublesome this serialization part of the patch turned
out to be, I'm not really sure about anything at this point. So I'd welcome
thoughts about the proposed changes.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 5fe61ea0a4..256728a02a 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -52,15 +52,7 @@
  *  ndim * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double)
  */
 #define ITEM_SIZE(ndims)   \
-   MAXALIGN((ndims) * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double))
-
-/*
- * Macros for convenient access to parts of a serialized MCV item.
- */
-#define ITEM_INDEXES(item) ((uint16 *) (item))
-#define ITEM_NULLS(item,ndims) ((bool *) (ITEM_INDEXES(item) + 
(ndims)))
-#define ITEM_FREQUENCY(item,ndims) ((double *) (ITEM_NULLS(item, ndims) + 
(ndims)))
-#define ITEM_BASE_FREQUENCY(item,ndims)((double *) 
(ITEM_FREQUENCY(item, ndims) + 1))
+   ((ndims) * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double))
 
 /*
  * Used to compute size of serialized MCV list representation.
@@ -68,10 +60,16 @@
 #define MinSizeOfMCVList   \
(VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))
 
+/*
+ * Size of the serialized MCV list, exluding the space needed for
+ * deduplicated per-dimension values. The macro is meant to be used
+ * when it's not yet safe to access the serialized info about amount
+ * of data for each column.
+ */
 #define SizeOfMCVList(ndims,nitems)\
-   (MAXALIGN(MinSizeOfMCVList + sizeof(Oid) * (ndims)) + \
-MAXALIGN((ndims) * sizeof(DimensionInfo)) + \
-MAXALIGN((nitems) * ITEM_SIZE(ndims)))
+   ((MinSizeOfMCVList + sizeof(Oid) * (ndims)) + \
+((ndims) * sizeof(DimensionInfo)) + \
+((nitems) * ITEM_SIZE(ndims)))
 
 static MultiSortSupport build_mss(VacAttrStats **stats, int numattrs);
 
@@ -491,19 +489,16 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
int i;
int dim;
int ndims = mcvlist->ndimensions;
-   int itemsize = ITEM_SIZE(ndims);
 
SortSupport ssup;
DimensionInfo *info;
 
Sizetotal_length;
 
-   /* allocate the item just once */
-   char   *item = palloc0(itemsize);
-
/* serialized items (indexes into arrays, etc.) */
-   char   *raw;
+   bytea  *raw;
char   *ptr;
+   

Re: mcvstats serialization code is still shy of a load

2019-06-27 Thread Tomas Vondra

On Thu, Jun 27, 2019 at 12:04:30AM -0400, Tom Lane wrote:

Tomas Vondra  writes:

OK. Attached is a patch ditching the alignment in serialized data. I've
ditched the macros to access parts of serialized data, and everything
gets copied.


I lack energy to actually read this patch right now, and I don't currently
have an opinion about whether it's worth another catversion bump to fix
this stuff in v12.  But I did test the patch, and I can confirm it gets
through the core regression tests on hppa (both gaur's host environment
with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1).



Thanks for running it through regression tests, that alone is a very
useful piece of information for me.

As for the catversion bump - I'd probably vote to do it. Not just because
of this serialization stuff, but to fix the pg_mcv_list_items function.
It's not something I'm very enthusiastic about (kinda embarrassed about it,
really), but it seems better than shipping something that we'll need to
rework in PG13.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tom Lane
Tomas Vondra  writes:
> OK. Attached is a patch ditching the alignment in serialized data. I've
> ditched the macros to access parts of serialized data, and everything
> gets copied.

I lack energy to actually read this patch right now, and I don't currently
have an opinion about whether it's worth another catversion bump to fix
this stuff in v12.  But I did test the patch, and I can confirm it gets
through the core regression tests on hppa (both gaur's host environment
with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1).

regards, tom lane




Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tomas Vondra

On Wed, Jun 26, 2019 at 12:31:13PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:

#define SizeOfMCVList(ndims,nitems) \
is both woefully underdocumented and completely at variance with
reality.  It doesn't seem to be accounting for the actual data values.



I agree about the macro being underdocumented, but AFAICS it's used
correctly to check the expected length. It can't include the data values
directly, because that's variable amount of data - and it's encoded in not
yet verified part of the data.


Well, it should have some other name then.  Or *at least* a comment.
It's unbelievably misleading as it stands.



True.


That being said, maybe this is unnecessarily defensive and we should just
trust the values not being corrupted.


No, I'm on board with checking the lengths.  I just don't like how
hard it is to discern what's being checked.



Understood.


I think that part of the problem here is that the way this code is
written, "maxaligned" is no such thing.  What you're actually maxaligning
seems to be the offset from the start of the data area of a varlena value,



I don't think so. The pointers should be maxaligned with respect to the
whole varlena value, which is what 'raw' points to.


[ squint ... ]  OK, I think I misread this:

statext_mcv_deserialize(bytea *data)
{
...
/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);
raw = (char *) data;

I think this is confusing in itself --- I read it as "raw = (char *) ptr"
and I think most other people would assume that too based on the order
of operations.  It'd read better as

/* remember start of datum for maxalign reference */
raw = (char *) data;

/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);



Yeah, that'd have been better.


Another problem with this code is that it flat doesn't work for
non-4-byte-header varlenas: it'd do the alignment differently than the
serialization side did.  That's okay given that the two extant call sites
are guaranteed to pass detoasted datums.  But using VARDATA_ANY gives a
completely misleading appearance of being ready to deal with short-header
varlenas, and heaven forbid there should be any comment to discourage
future coders from trying.  So really what I'd like to see here is

/* remember start of datum for maxalign reference */
raw = (char *) data;

/* alignment logic assumes full-size datum header */
Assert(VARATT_IS_4B(data));

/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);

Or, of course, this could all go away if we got rid of the
bogus maxaligning...



OK. Attached is a patch ditching the alignment in serialized data. I've
ditched the macros to access parts of serialized data, and everything
gets copied.

The main complication is with varlena values, which may or may not have
4B headers (for now there's the PG_DETOAST_DATUM call, but as you
mentioned we may want to remove it in the future). So I've stored the
length as uint32 separately, followed by the full varlena value (thanks
to that the deserialization is simpler). Not sure if that's the best
solution, though, because this way we store the length twice.

I've kept the alignment in the deserialization code, because there it
allows us to allocate the whole value as a single chunk, which I think
is useful (I admit I don't have any measurements to demonstrate that).
But if we decide to rework this later, we can - it's just in-memory
representation, not on-disk.

Is this roughly what you had in mind?

FWIW I'm sure some of the comments are stale and/or need clarification,
but it's a bit too late over here, so I'll look into that tomorrow.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 5fe61ea0a4..93c774ea1c 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -52,15 +52,7 @@
  *  ndim * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double)
  */
 #define ITEM_SIZE(ndims)   \
-   MAXALIGN((ndims) * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double))
-
-/*
- * Macros for convenient access to parts of a serialized MCV item.
- */
-#define ITEM_INDEXES(item) ((uint16 *) (item))
-#define ITEM_NULLS(item,ndims) ((bool *) (ITEM_INDEXES(item) + 
(ndims)))
-#define ITEM_FREQUENCY(item,ndims) ((double *) (ITEM_NULLS(item, ndims) + 
(ndims)))
-#define ITEM_BASE_FREQUENCY(item,ndims)((double *) 
(ITEM_FREQUENCY(item, ndims) + 1))
+   ((ndims) * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double))
 
 /*
  * Used to compute size of serialized MCV list representation.
@@ -68,10 +60,15 @@
 #define MinSizeOfMCVList   \
(VARHDRSZ + 

Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tom Lane
Tomas Vondra  writes:
> On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>> #define SizeOfMCVList(ndims,nitems)  \
>> is both woefully underdocumented and completely at variance with
>> reality.  It doesn't seem to be accounting for the actual data values.

> I agree about the macro being underdocumented, but AFAICS it's used
> correctly to check the expected length. It can't include the data values
> directly, because that's variable amount of data - and it's encoded in not
> yet verified part of the data.

Well, it should have some other name then.  Or *at least* a comment.
It's unbelievably misleading as it stands.

> That being said, maybe this is unnecessarily defensive and we should just
> trust the values not being corrupted.

No, I'm on board with checking the lengths.  I just don't like how
hard it is to discern what's being checked.

>> I think that part of the problem here is that the way this code is
>> written, "maxaligned" is no such thing.  What you're actually maxaligning
>> seems to be the offset from the start of the data area of a varlena value,

> I don't think so. The pointers should be maxaligned with respect to the
> whole varlena value, which is what 'raw' points to.

[ squint ... ]  OK, I think I misread this:

statext_mcv_deserialize(bytea *data)
{
...
/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);
raw = (char *) data;

I think this is confusing in itself --- I read it as "raw = (char *) ptr"
and I think most other people would assume that too based on the order
of operations.  It'd read better as

/* remember start of datum for maxalign reference */
raw = (char *) data;

/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);

Another problem with this code is that it flat doesn't work for
non-4-byte-header varlenas: it'd do the alignment differently than the
serialization side did.  That's okay given that the two extant call sites
are guaranteed to pass detoasted datums.  But using VARDATA_ANY gives a
completely misleading appearance of being ready to deal with short-header
varlenas, and heaven forbid there should be any comment to discourage
future coders from trying.  So really what I'd like to see here is

/* remember start of datum for maxalign reference */
raw = (char *) data;

/* alignment logic assumes full-size datum header */
Assert(VARATT_IS_4B(data));

/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);

Or, of course, this could all go away if we got rid of the
bogus maxaligning...

regards, tom lane




Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tomas Vondra

On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:

Tomas Vondra  writes:

Attached is a patch that should (hopefully) fix this. It essentially
treats the item as (char *) and does all pointer arithmetics without any
additional casts. So there are no intermediate casts.


This passes the eyeball test, and it also allows my OpenBSD/hppa
installation to get through the core regression tests, so I think
it's good as far as it goes.  Please push.

However ... nosing around in mcv.c, I noticed that the next macro:

/*
* Used to compute size of serialized MCV list representation.
*/
#define MinSizeOfMCVList\
(VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))

#define SizeOfMCVList(ndims,nitems) \
(MAXALIGN(MinSizeOfMCVList + sizeof(Oid) * (ndims)) + \
 MAXALIGN((ndims) * sizeof(DimensionInfo)) + \
 MAXALIGN((nitems) * ITEM_SIZE(ndims)))

is both woefully underdocumented and completely at variance with
reality.  It doesn't seem to be accounting for the actual data values.
No doubt this is why it's not used in the places where it'd matter;
the tests that do use it are testing much weaker conditions than they
should.



I agree about the macro being underdocumented, but AFAICS it's used
correctly to check the expected length. It can't include the data values
directly, because that's variable amount of data - and it's encoded in not
yet verified part of the data.

So this only includes parts with known lengths, and then the code does
this:

   for (dim = 0; dim < ndims; dim++)
   {
   ...
   expected_size += MAXALIGN(info[dim].nbytes);
   }

and uses that to check the actual length.

   if (VARSIZE_ANY(data) != expected_size)
   elog(ERROR, ...);

That being said, maybe this is unnecessarily defensive and we should just
trust the values not being corrupted. So if we get pg_mcv_list value, we'd
simply assume it's OK.


The fix keeps the binary format as is, so the serialized MCV items are
max-aligned. That means we can access the uint16 indexes directly, but we
need to copy the rest of the fields (because those may not be aligned). In
hindsight that seems a bit silly, we might as well copy everything, not
care about the alignment and maybe save a few more bytes.


I think that part of the problem here is that the way this code is
written, "maxaligned" is no such thing.  What you're actually maxaligning
seems to be the offset from the start of the data area of a varlena value,
which is generally going to be a maxaligned palloc result plus 4 bytes.
So "aligned" double values are actually guaranteed to be on odd word
boundaries, not even ones.



I don't think so. The pointers should be maxaligned with respect to the
whole varlena value, which is what 'raw' points to. At least that was the
intent of code like this:

   raw = palloc0(total_length);

   ...

   /* the header may not be exactly aligned, so make sure it is */
   ptr = raw + MAXALIGN(ptr - raw);

If it's not like that in some place, it's a bug.
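The "align relative to the buffer start" idiom above can be isolated as follows (a minimal sketch; `align_within` is a hypothetical helper name). Note what it does and does not guarantee: the result is maxaligned *with respect to `raw`*, which matches machine alignment only if `raw` itself is maxaligned (true for a fresh palloc'd buffer, but not for a pointer 4 bytes into one, such as a varlena data area):

```c
#include <assert.h>
#include <stdint.h>

#define MAXIMUM_ALIGNOF 8
#define MAXALIGN(len) \
	(((uintptr_t) (len) + (MAXIMUM_ALIGNOF - 1)) & ~(uintptr_t) (MAXIMUM_ALIGNOF - 1))

/*
 * Round ptr up so its offset from raw is a multiple of MAXIMUM_ALIGNOF.
 * This aligns ptr relative to raw, not relative to absolute addresses.
 */
static char *
align_within(char *raw, char *ptr)
{
	return raw + MAXALIGN(ptr - raw);
}
```

Whether this suffices therefore hinges entirely on what `raw` points to, which is exactly the disagreement in this exchange.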


What's more, it's difficult to convince oneself that the maxaligns done
in different parts of the code are all enforcing the same choices about
which substructures get pseudo-maxaligned and which don't, because the
logic doesn't line up very well.



Not sure. If there's a way to make it clearer, I'm ready to do the work.
Unfortunately it's hard for me to judge that, because I've spent so much
time on that code that it seems fairly clear to me.


If we do need another catversion bump before v12, I'd vote for ripping
out Every Single One of the "maxalign" operations in this code, just
on the grounds of code simplicity and bug reduction.



Hmmm, OK. The original reason to keep the parts aligned was to be able to
reference the parts directly during processing. If we get rid of the
alignment, we'll have to memcpy everything during deserialization. But
if it makes the code simpler, it might be worth it - this part of the
code was clearly the weakest part of the patch.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tom Lane
Tomas Vondra  writes:
> Attached is a patch that should (hopefully) fix this. It essentially
> treats the item as (char *) and does all pointer arithmetics without any
> additional casts. So there are no intermediate casts.

This passes the eyeball test, and it also allows my OpenBSD/hppa
installation to get through the core regression tests, so I think
it's good as far as it goes.  Please push.

However ... nosing around in mcv.c, I noticed that the next macro:

/*
 * Used to compute size of serialized MCV list representation.
 */
#define MinSizeOfMCVList\
(VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))

#define SizeOfMCVList(ndims,nitems) \
(MAXALIGN(MinSizeOfMCVList + sizeof(Oid) * (ndims)) + \
 MAXALIGN((ndims) * sizeof(DimensionInfo)) + \
 MAXALIGN((nitems) * ITEM_SIZE(ndims)))

is both woefully underdocumented and completely at variance with
reality.  It doesn't seem to be accounting for the actual data values.
No doubt this is why it's not used in the places where it'd matter;
the tests that do use it are testing much weaker conditions than they
should.

> The fix keeps the binary format as is, so the serialized MCV items are
> max-aligned. That means we can access the uint16 indexes directly, but we
> need to copy the rest of the fields (because those may not be aligned). In
> hindsight that seems a bit silly, we might as well copy everything, not
> care about the alignment and maybe save a few more bytes.

I think that part of the problem here is that the way this code is
written, "maxaligned" is no such thing.  What you're actually maxaligning
seems to be the offset from the start of the data area of a varlena value,
which is generally going to be a maxaligned palloc result plus 4 bytes.
So "aligned" double values are actually guaranteed to be on odd word
boundaries not even ones.

What's more, it's difficult to convince oneself that the maxaligns done
in different parts of the code are all enforcing the same choices about
which substructures get pseudo-maxaligned and which don't, because the
logic doesn't line up very well.

If we do need another catversion bump before v12, I'd vote for ripping
out Every Single One of the "maxalign" operations in this code, just
on the grounds of code simplicity and bug reduction.

regards, tom lane




Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tomas Vondra

On Wed, Jun 26, 2019 at 09:49:46AM +0200, Tomas Vondra wrote:

On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:

I'm seeing a reproducible bus error here:

#0  0x00417420 in statext_mcv_serialize (mcvlist=0x62223450, stats=Variable "stats" is not available.)
  at mcv.c:785
785 memcpy(ITEM_BASE_FREQUENCY(item, ndims), &mcvitem->base_frequency, sizeof(double));

What appears to be happening is that since ITEM_BASE_FREQUENCY is defined as

#define ITEM_BASE_FREQUENCY(item,ndims) ((double *) (ITEM_FREQUENCY(item, ndims) + 1))

the compiler is assuming that the first argument to memcpy is
double-aligned, and it is generating code that depends on that being
true, and of course it isn't true and kaboom.

You can *not* cast something to an aligned pointer type if it's not
actually certain to be aligned suitably for that type.  In this example,
even if you wrote "(char *)" in front of this, it wouldn't save you;
the compiler would still be entitled to believe that the intermediate
cast value meant something.  The casts in the underlying macros
ITEM_FREQUENCY and so on are equally unsafe.
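The hazard described above, and the portable alternative, can be shown in a few lines (an illustration with hypothetical helper names, not PostgreSQL code). Casting `buf + off` to `double *` at an odd offset is undefined behavior everywhere and a bus error on strict-alignment hardware like the hppa box above; going through `memcpy` into an aligned local makes no alignment claim at all:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Safe: memcpy assumes nothing about the alignment of buf + off. */
static double
read_double_at(const char *buf, size_t off)
{
	double		d;

	memcpy(&d, buf + off, sizeof(double));
	return d;
}

static void
write_double_at(char *buf, size_t off, double d)
{
	/* Unsafe equivalent would be: *(double *) (buf + off) = d; */
	memcpy(buf + off, &d, sizeof(double));
}
```

Modern compilers are entitled to emit aligned load/store instructions the moment they see the cast, which is why adding more casts on top cannot fix it.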



OK. So the solution is to ditch the casts altogether, and then do plain
pointer arithmetics like this:

#define ITEM_INDEXES(item)  (item)
#define ITEM_NULLS(item,ndims)  (ITEM_INDEXES(item) + (ndims))
#define ITEM_FREQUENCY(item,ndims)  (ITEM_NULLS(item, ndims) + (ndims))
#define ITEM_BASE_FREQUENCY(item,ndims) (ITEM_FREQUENCY(item, ndims) + sizeof(double))

Or is that still relying on alignment, somehow?



Attached is a patch that should (hopefully) fix this. It essentially
treats the item as (char *) and does all pointer arithmetics without any
additional casts. So there are no intermediate casts.

I have no way to test this, so I may either wait for you to test this
first, or push and wait. It seems to fail only on a very small number of
buildfarm animals, so having a confirmation would be nice.

The fix keeps the binary format as is, so the serialized MCV items are
max-aligned. That means we can access the uint16 indexes directly, but we
need to copy the rest of the fields (because those may not be aligned). In
hindsight that seems a bit silly, we might as well copy everything, not
care about the alignment and maybe save a few more bytes. But that would
require a catversion bump. OTOH we may need to do that anyway, to fix the
pg_mcv_list_items() signature (as discussed in the other MCV thread).

regards 


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 5fe61ea0a4..01ea05b486 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -57,10 +57,10 @@
 /*
  * Macros for convenient access to parts of a serialized MCV item.
  */
-#define ITEM_INDEXES(item) ((uint16 *) (item))
-#define ITEM_NULLS(item,ndims) ((bool *) (ITEM_INDEXES(item) + (ndims)))
-#define ITEM_FREQUENCY(item,ndims) ((double *) (ITEM_NULLS(item, ndims) + (ndims)))
-#define ITEM_BASE_FREQUENCY(item,ndims) ((double *) (ITEM_FREQUENCY(item, ndims) + 1))
+#define ITEM_INDEXES(item) ((char *) item)
+#define ITEM_NULLS(item,ndims) (ITEM_INDEXES(item) + (ndims) * sizeof(uint16))
+#define ITEM_FREQUENCY(item,ndims) (ITEM_NULLS(item, ndims) + (ndims) * sizeof(bool))
+#define ITEM_BASE_FREQUENCY(item,ndims) (ITEM_FREQUENCY(item, ndims) + sizeof(double))
 
 /*
  * Used to compute size of serialized MCV list representation.
@@ -751,6 +751,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
for (i = 0; i < mcvlist->nitems; i++)
{
MCVItem*mcvitem = &mcvlist->items[i];
+   uint16 *indexes = (uint16 *) ITEM_INDEXES(item);
 
/* don't write beyond the allocated space */
Assert(ptr <= raw + total_length - itemsize);
@@ -773,10 +774,10 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
Assert(value != NULL);  /* serialization or deduplication error */
 
/* compute index within the array */
-   ITEM_INDEXES(item)[dim] = (uint16) (value - values[dim]);
+   indexes[dim] = (uint16) (value - values[dim]);
 
/* check the index is within expected bounds */
-   Assert(ITEM_INDEXES(item)[dim] < info[dim].nvalues);
+   Assert(indexes[dim] < info[dim].nvalues);
}
 
/* copy NULL and frequency flags into the item */
@@ -1087,10 +1088,13 @@ statext_mcv_deserialize(bytea *data)
item->isnull = (bool *) isnullptr;
isnullptr += MAXALIGN(sizeof(bool) * ndims);
 
+   /*
+* To access indexes, we can just point to the right place (we 

Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tom Lane
Tomas Vondra  writes:
> On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:
>> You can *not* cast something to an aligned pointer type if it's not
>> actually certain to be aligned suitably for that type.

> OK. So the solution is to ditch the casts altogether, and then do plain
> pointer arithmetics like this:

> #define ITEM_INDEXES(item)(item)
> #define ITEM_NULLS(item,ndims)(ITEM_INDEXES(item) + (ndims))
> #define ITEM_FREQUENCY(item,ndims)(ITEM_NULLS(item, ndims) + (ndims))
> #define ITEM_BASE_FREQUENCY(item,ndims)   (ITEM_FREQUENCY(item, ndims) + sizeof(double))

> Or is that still relying on alignment, somehow?

No, constructs like a char* pointer plus n times sizeof(something) should
be safe.

regards, tom lane




Re: mcvstats serialization code is still shy of a load

2019-06-26 Thread Tomas Vondra

On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:

I'm seeing a reproducible bus error here:

#0  0x00417420 in statext_mcv_serialize (mcvlist=0x62223450, stats=Variable "stats" is not available.)
   at mcv.c:785
785 memcpy(ITEM_BASE_FREQUENCY(item, ndims), &mcvitem->base_frequency, sizeof(double));

What appears to be happening is that since ITEM_BASE_FREQUENCY is defined as

#define ITEM_BASE_FREQUENCY(item,ndims) ((double *) (ITEM_FREQUENCY(item, ndims) + 1))

the compiler is assuming that the first argument to memcpy is
double-aligned, and it is generating code that depends on that being
true, and of course it isn't true and kaboom.

You can *not* cast something to an aligned pointer type if it's not
actually certain to be aligned suitably for that type.  In this example,
even if you wrote "(char *)" in front of this, it wouldn't save you;
the compiler would still be entitled to believe that the intermediate
cast value meant something.  The casts in the underlying macros
ITEM_FREQUENCY and so on are equally unsafe.



OK. So the solution is to ditch the casts altogether, and then do plain
pointer arithmetics like this:

#define ITEM_INDEXES(item)  (item)
#define ITEM_NULLS(item,ndims)  (ITEM_INDEXES(item) + (ndims))
#define ITEM_FREQUENCY(item,ndims)  (ITEM_NULLS(item, ndims) + (ndims))
#define ITEM_BASE_FREQUENCY(item,ndims) (ITEM_FREQUENCY(item, ndims) + sizeof(double))

Or is that still relying on alignment, somehow?
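The char*-offset scheme can be exercised end to end like this (a simplified sketch with hypothetical names and fixed field widths: per-dimension uint16 indexes, one-byte null flags, then two doubles, with no padding; the unaligned doubles are moved with memcpy):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All arithmetic stays in char *, so no pointer ever claims an alignment
 * it doesn't actually have. */
#define ITEM_INDEXES(item)          ((char *) (item))
#define ITEM_NULLS(item, ndims)     (ITEM_INDEXES(item) + (ndims) * sizeof(uint16_t))
#define ITEM_FREQUENCY(item, ndims) (ITEM_NULLS(item, ndims) + (ndims) * sizeof(char))
#define ITEM_BASE_FREQ(item, ndims) (ITEM_FREQUENCY(item, ndims) + sizeof(double))
#define ITEM_SIZE(ndims) \
	((ndims) * (sizeof(uint16_t) + sizeof(char)) + 2 * sizeof(double))

static double
item_get_base_freq(const char *item, int ndims)
{
	double		d;

	memcpy(&d, ITEM_BASE_FREQ(item, ndims), sizeof(double));
	return d;
}

static void
item_set_base_freq(char *item, int ndims, double d)
{
	memcpy(ITEM_BASE_FREQ(item, ndims), &d, sizeof(double));
}
```

With ndims = 3 the base frequency starts at byte offset 17, i.e. genuinely unaligned, which is exactly why the accessors must go through memcpy rather than a `double *` dereference.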


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services