If there are 9k possible entries in the lookup table, then to achieve any space savings the keys will need to be 2 bytes (1 byte only covers 256 entries). For 30 billion cells you will save at best 2 bytes per cell (from 4 bytes down to 2), for a total savings of 60 GB; at worst it will take more space, because a 2-byte lookup key is longer than the shortest 1-byte codes being looked up.
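To make the trade-off concrete, here is a minimal sketch (in Python, purely illustrative; the code list and dictionary names are made up for the example) of a lookup table that maps each distinct code to a fixed 2-byte id, plus the back-of-envelope savings arithmetic from above:

```python
import struct

# Hypothetical lookup table: each distinct code gets a 2-byte id.
# With ~9k distinct codes, ids need an unsigned 16-bit integer;
# a single byte would only cover 256 codes.
codes = ["AA4", "AAA5", "A21", "A4", "Z435"]  # sample codes from the thread
code_to_id = {code: i for i, code in enumerate(sorted(codes))}
id_to_code = {i: code for code, i in code_to_id.items()}

def encode(code: str) -> bytes:
    """Store a fixed 2-byte id instead of the 1-4 byte code."""
    return struct.pack(">H", code_to_id[code])

def decode(blob: bytes) -> str:
    """Recover the original code from its 2-byte id."""
    return id_to_code[struct.unpack(">H", blob)[0]]

# Back-of-envelope: 30 billion cells, 4-byte worst case down to 2 bytes.
cells = 30_000_000_000
best_case_savings = cells * (4 - 2)  # 60 GB saved, at best
```

Note the downside visible even in the sketch: `encode("A4")` turns a 1-byte code into a 2-byte id, so short codes actually grow.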
The added complexity of a lookup table would not make that savings worth it to me, but you know your data best. Just my $0.02 --Tom

On Sunday, September 16, 2012, Rita <[email protected]> wrote:
> Yes, I am trying to save on disk space because of limited resources and the
> table will be around 30 billion rows.
>
> The lookup table itself will be around 9k rows, so it's not too bad. A
> code's length will range from 1 to 4 characters.
>
> I suppose I shouldn't really worry about it too much.
>
> On Sun, Sep 16, 2012 at 6:16 PM, Stack <[email protected]> wrote:
>
>> On Sat, Sep 15, 2012 at 8:09 AM, Rita <[email protected]> wrote:
>> > I am debating if a lookup table would help my situation.
>> >
>> > I have a bunch of codes which map to a timestamp (unsigned int). The
>> > codes look like this:
>> >
>> > AA4
>> > AAA5
>> > A21
>> > A4
>> > ...
>> > Z435
>> >
>> > The sizes range from 1 character to 4 characters (1 to 4 bytes,
>> > respectively).
>> >
>> > Would adding a lookup table for all my codes help in reducing space?
>> > If so, what would be the best way to hash something like this?
>>
>> You are trying to save on disk space? You could make your keys binary,
>> four bytes max, null-prefixed if < 4 characters. Why are you trying to
>> save disk space? You want a lookup table so you can have a code that
>> is smaller than the 1-4 character codes?
>>
>> St.Ack
>
> --
> --- Get your facts first, then you can distort them as you please.--
