Bug#787570: debsources: DB: change sha256 column data type from string to bytea or bit string

2015-06-03 Thread Christoph Berg
Re: Stefano Zacchiroli 2015-06-02 
20150602214710.10925.72165.reportbug@timira.takhisis.invalid
 We should switch to a more economic (and efficient) data type for storing
 sha256 checksums. Good options seem to be either bytea [1] or fixed-size bit
 strings [2]. Suggestions welcome!

Definitely bytea.

I've never really seen bit used in practise. It's mostly present
because it's an SQL standard type. For input/output, everything gets
converted to a lengthy series of 0/1s, while bytea's default hex
transfer encoding is especially easy to parse back to the hex format
you'd want anyway. (Don't worry about the fact that there's no length
modifier for bytea, it won't change the behaviour in practise.)

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#787570: debsources: DB: change sha256 column data type from string to bytea or bit string

2015-06-02 Thread Stefano Zacchiroli
Package: qa.debian.org
Severity: normal
User: qa.debian@packages.debian.org
Usertags: debsources

Currently, the sha256 column of the checksum table in Debsources' Postgres DB
has type character varying(64):

  sha256 | character varying(64) | not null
 
Such a data type is wasteful in terms of disk space. And it shows:

  debsources= select count(*) from checksums;
count
  --
   41151812

  public | checksums| table| debsources | 4890 MB|

We should switch to a more economic (and efficient) data type for storing
sha256 checksums. Good options seem to be either bytea [1] or fixed-size bit
strings [2]. Suggestions welcome!

A good, concrete way to help with this bug would be providing sample SQL
snippets to create temporary tables with the new data types, and convert /
inject into them the content of the current checksum table. That would allow to
easily benchmark disk usage and query/index efficiency.

Cheers.

[1]: http://www.postgresql.org/docs/9.4/static/datatype-binary.html#AEN5497
[2]: http://www.postgresql.org/docs/9.4/static/datatype-bit.html
   
-- System Information:
Debian Release: stretch/sid
  APT prefers testing
  APT policy: (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=it_IT.utf8, LC_CTYPE=it_IT.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org