Re: [HACKERS] Bit data type header reduction in some cases

2014-02-25 Thread Heikki Linnakangas

On 02/25/2014 08:23 AM, Haribabu Kommi wrote:

> This is regarding a TODO item, bit data type header reduction in some
> cases. The header contains two parts. 1) The varlena header, which is
> automatically converted from a 4-byte header to a 1-byte header for small
> data. 2) The bit length header, called bit_len, which stores the actual bit
> length and is 4 bytes in size. I want to reduce bit_len to 1 byte in some
> cases, similar to the varlena header. With this change the size of the
> column is reduced by 3 bytes, which gives a very good decrease in disk
> usage.
>
> I am planning to change it based on the total length of the bit data. If
> the number of bits is less than 0x7F, a 1-byte header can be chosen
> instead of the 4-byte header. When reading the data, the header size can
> be determined in a similar way.


Since we're designing a new format, how about using bit padding instead 
of an explicit length field? Add one 1-bit to the end of the bit data, 
followed by zeros. That's even more compact. See 
https://en.wikipedia.org/wiki/Padding_%28cryptography%29#Bit_padding
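
A minimal sketch of the bit padding idea, assuming bits are stored MSB-first within each byte. The function names (padded_size, pad_bits, unpad_bits) are made up for illustration and are not part of the varbit code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed for nbits of data plus the terminating 1-bit. */
size_t padded_size(size_t nbits)
{
    return nbits / 8 + 1;
}

/*
 * Copy nbits of data from src to dst and append the padding:
 * a single 1-bit followed by zeros up to the byte boundary.
 * dst must have room for padded_size(nbits) bytes.
 */
void pad_bits(const uint8_t *src, size_t nbits, uint8_t *dst)
{
    size_t   full = nbits / 8;              /* whole data bytes */
    unsigned rem = (unsigned) (nbits % 8);  /* data bits in the last byte */
    uint8_t  last = 0;

    assert(src != NULL && dst != NULL);
    memcpy(dst, src, full);
    if (rem != 0)                           /* keep only the top rem bits */
        last = (uint8_t) (src[full] & (0xFF << (8 - rem)));
    dst[full] = (uint8_t) (last | (0x80 >> rem));   /* marker bit */
}

/*
 * Recover the bit length from a padded buffer of len bytes.
 * Returns -1 if the last byte carries no marker bit.
 */
long unpad_bits(const uint8_t *buf, size_t len)
{
    uint8_t  last;
    unsigned trailing = 0;

    if (len == 0 || buf[len - 1] == 0)
        return -1;
    last = buf[len - 1];
    while ((last & 1) == 0)                 /* skip the zero padding */
    {
        last >>= 1;
        trailing++;
    }
    return (long) (len * 8 - trailing - 1); /* minus the marker bit */
}
```

With this layout the bit length costs no separate field, at the price of one extra byte whenever the number of bits is an exact multiple of 8.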



> The problem I see is that this can cause problems for old databases
> having bit columns when they upgrade to a newer version without using
> pg_dumpall. Is there any way to handle this problem? Or is there a better
> way to handle the problem itself? Please let me know your suggestions.


On a big-endian system, you could easily use the most-significant bit 
(sign bit) of the bit_len field to indicate that it's a new-format datum, 
because the byte holding that bit is the first byte of the field and a 
valid old-format bit_len is never negative:


/*
 * Big-endian
 */
typedef struct
{
    int32   vl_len_;
    int32   bit_len;            /* never negative, so sign bit is 0 */
    bits8   bit_dat[1];         /* var len */
} VarBit_Old;

typedef struct
{
    int32   vl_len_;
    uint8   bit_len;            /* 0x80 set marks the new format */
    bits8   bit_dat[1];         /* var len */
} VarBit_New;

#define IS_NEW_FORMAT(x) (((VarBit_New *) (x))->bit_len & 0x80)

On a little-endian system that's more difficult, because the MSB would not 
be in the first byte of the field. You could still make it work if the new 
format would have the length byte in the fourth byte after the varlena 
header, where the old format keeps the most significant byte of bit_len. 
Something like this:


/*
 * Little-endian
 */
typedef struct
{
    int32   vl_len_;
    int32   bit_len;            /* sign bit lives in the fourth byte */
    bits8   bit_dat[1];         /* var len */
} VarBit_Old;

typedef struct
{
    int32   vl_len_;
    bits8   bit_dat_first_three[3];
    uint8   bit_len;            /* 0x80 set marks the new format */
    bits8   bit_dat_rest[1];    /* var len */
} VarBit_New;

#define IS_NEW_FORMAT(x) (((VarBit_New *) (x))->bit_len & 0x80)

That's not a complete solution yet; you also need a special case for a 
new-format value with less than 4 bytes of bits. Such a value would 
presumably not have the bit_len field at the fourth byte, because it 
would be shorter than that, but you could distinguish it by looking at 
the length in the varsize header. So in little-endian, you'd have one more 
format for very small varbits:


/* Little-endian, less than 4 bytes of bits data */
typedef struct
{
    int32   vl_len_;
    uint8   bit_len;
    bits8   bit_dat[1];         /* var len */
} VarBit_New_Tiny;

The same basic idea also works with bit padding.
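
A rough sketch of the detection logic, under the assumption that a valid old-format bit_len is never negative, so the byte holding its sign bit never has 0x80 set. The helper names (sign_byte_offset, write_old_bit_len, is_new_format) are hypothetical, and the sketch finds the sign byte at run time rather than hard-coding an offset per byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset, within a 4-byte int32, of the byte holding the sign bit. */
size_t sign_byte_offset(void)
{
    int32_t probe = INT32_MIN;      /* only the sign bit is set */
    uint8_t bytes[4];
    size_t  i;

    memcpy(bytes, &probe, sizeof probe);
    for (i = 0; i < 4; i++)
        if (bytes[i] == 0x80)
            return i;
    return 0;                       /* not reached on ordinary hosts */
}

/* Store an old-format 4-byte bit_len at the start of the payload. */
void write_old_bit_len(uint8_t *payload, int32_t bit_len)
{
    assert(payload != NULL && bit_len >= 0);
    memcpy(payload, &bit_len, sizeof bit_len);
}

/*
 * payload points just past the varlena header, payload_len is the
 * number of bytes after that header (taken from the varsize header).
 */
int is_new_format(const uint8_t *payload, size_t payload_len)
{
    assert(payload != NULL);
    if (payload_len < 4)
        return 1;   /* too short for an old bit_len: the tiny case */
    return (payload[sign_byte_offset()] & 0x80) != 0;
}
```

A new-format writer would have to place its flagged length byte at sign_byte_offset() for this check to work, which is what the per-endianness struct layouts express.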

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bit data type header reduction in some cases

2014-02-25 Thread Tom Lane
Heikki Linnakangas <hlinnakan...@vmware.com> writes:
> On 02/25/2014 08:23 AM, Haribabu Kommi wrote:
>> This is regarding a TODO item, bit data type header reduction in some
>> cases. The header contains two parts. 1) The varlena header, which is
>> automatically converted from a 4-byte header to a 1-byte header for small
>> data. 2) The bit length header, called bit_len, which stores the actual bit
>> length and is 4 bytes in size. I want to reduce bit_len to 1 byte in some
>> cases, similar to the varlena header. With this change the size of the
>> column is reduced by 3 bytes, which gives a very good decrease in disk
>> usage.

> [ various contorted schemes for doing that backward-compatibly ]

TBH, this sounds like a huge amount of effort and a significant risk of
new bugs in return for not darn much.  Who uses bit values anyway?
They were removed from the SQL spec more than 10 years ago.  And of that
population, who cares about a byte or two per value?  The field demand
for this is not only zero, it's probably negative.  (The thread referenced
by the TODO entry shows no evidence of user demand.)

regards, tom lane




Re: [HACKERS] Bit data type header reduction in some cases

2014-02-25 Thread Bruce Momjian
On Tue, Feb 25, 2014 at 09:32:55AM -0500, Tom Lane wrote:
> Heikki Linnakangas <hlinnakan...@vmware.com> writes:
> > On 02/25/2014 08:23 AM, Haribabu Kommi wrote:
> >> This is regarding a TODO item, bit data type header reduction in some
> >> cases. The header contains two parts. 1) The varlena header, which is
> >> automatically converted from a 4-byte header to a 1-byte header for small
> >> data. 2) The bit length header, called bit_len, which stores the actual bit
> >> length and is 4 bytes in size. I want to reduce bit_len to 1 byte in some
> >> cases, similar to the varlena header. With this change the size of the
> >> column is reduced by 3 bytes, which gives a very good decrease in disk
> >> usage.
>
> > [ various contorted schemes for doing that backward-compatibly ]
>
> TBH, this sounds like a huge amount of effort and a significant risk of
> new bugs in return for not darn much.  Who uses bit values anyway?
> They were removed from the SQL spec more than 10 years ago.  And of that
> population, who cares about a byte or two per value?  The field demand
> for this is not only zero, it's probably negative.  (The thread referenced
> by the TODO entry shows no evidence of user demand.)

OK, TODO item removed.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




[HACKERS] Bit data type header reduction in some cases

2014-02-24 Thread Haribabu Kommi
Hi,

This is regarding a TODO item, bit data type header reduction in some
cases. The header contains two parts. 1) The varlena header, which is
automatically converted from a 4-byte header to a 1-byte header for small
data. 2) The bit length header, called bit_len, which stores the actual bit
length and is 4 bytes in size. I want to reduce bit_len to 1 byte in some
cases, similar to the varlena header. With this change the size of the
column is reduced by 3 bytes, which gives a very good decrease in disk
usage.

I am planning to change it based on the total length of the bit data. If
the number of bits is less than 0x7F, a 1-byte header can be chosen
instead of the 4-byte header. When reading the data, the header size can
be determined in a similar way.
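
As a rough illustration of the writer's side only, the choice could look like the sketch below. The names (SHORT_BIT_LEN_LIMIT, bit_len_header_size, varbit_payload_size) are hypothetical, and the sketch does not cover how a reader tells the two header sizes apart:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHORT_BIT_LEN_LIMIT 0x7F    /* threshold from the proposal */

/* Size in bytes of the bit_len header for a value of nbits bits. */
size_t bit_len_header_size(uint32_t nbits)
{
    assert(nbits <= INT32_MAX);     /* bit_len is a signed 32-bit field */
    return nbits < SHORT_BIT_LEN_LIMIT ? 1 : 4;
}

/* Bytes needed to hold nbits of bit data. */
size_t bit_data_size(uint32_t nbits)
{
    return ((size_t) nbits + 7) / 8;
}

/* bit_len header plus bit data, not counting the varlena header. */
size_t varbit_payload_size(uint32_t nbits)
{
    return bit_len_header_size(nbits) + bit_data_size(nbits);
}
```

For an 8-bit value this gives 2 bytes instead of the 5 bytes needed with the 4-byte bit_len, which is the 3-byte saving mentioned above.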

The problem I see is that this can cause problems for old databases
having bit columns when they upgrade to a newer version without using
pg_dumpall. Is there any way to handle this problem? Or is there a better
way to handle the problem itself? Please let me know your suggestions.

Regards,
Hari Babu
Fujitsu Australia