Martin v. Löwis wrote:
> ... the ideal hash
:-) can't be argued with

> So: what are your input data, and what is the
> distribution among them?
>
> Regards,
> Martin

I'm trying to create UniqueIDs for dynamic PostScript fonts. According to my resources we don't actually need to use these, but if they are required by a particular PostScript program (perhaps to make a print run efficient), then the private range of these IDs is 4000000 <= UID <= 4999999, i.e. a range of one million, so I probably really need a 20-bit hash.

The data going into the font consists of

    fontBBox   '[-415 -431 2014 2033]'
    charmaps   ['dup (\000) 0 get /C0 put', ...]
    metrics    ['/C0 1251 def', ...]
    bboxes     ['/C29 [0 0 512 0] def', ...]
    chardefs   ['/C0 {newpath 224 418 m 234 336 ... def}', ...]

i.e. a bunch of lists of strings which are eventually joined together and written out with a template to make the PostScript definition.

The UniqueID is used by PS interpreters to avoid recreating particular glyphs, so ideally I would number these fonts sequentially using a global count. In practice, though, several processes separated by application and time can produce PostScript which eventually gets merged back together, and if the UIDs clash the printer produces very strange output.

I'm fairly sure there's no obvious Python way to ensure the separated processes can communicate except via the printer. So either I use a Python-based scheme which reduces the risk of clashes, i.e. random numbers or some data-based hash, or I attempt a PostScript solution such as looking for a private global sequence number. I'm not sure my PostScript is really good enough to do the latter, so I hoped to pursue a Python-based approach which has a low probability of busting. Originally I thought the range was a 16-bit number, which is why I started with 16-bit hashes.
--
Robin Becker
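A minimal sketch of the data-based option, not Robin's actual code: hash the joined font strings with a strong digest, then fold the result into the private UniqueID range. The names `font_bbox`, `charmaps`, etc. are made up for illustration here.

```python
import hashlib

UID_BASE = 4000000
UID_RANGE = 1000000  # private range is 4000000 <= UID <= 4999999


def postscript_uid(*string_lists):
    """Derive a UniqueID from the strings that make up the font.

    The same input strings always give the same UID, so separate
    processes producing identical fonts will agree on the number.
    """
    digest = hashlib.md5()
    for lst in string_lists:
        for s in lst:
            digest.update(s.encode('latin-1'))
    # Fold the 128-bit digest into the one-million-value private range.
    return UID_BASE + int(digest.hexdigest(), 16) % UID_RANGE


# Illustrative inputs only:
font_bbox = ['[-415 -431 2014 2033]']
charmaps = ['dup (\\000) 0 get /C0 put']
uid = postscript_uid(font_bbox, charmaps)
assert UID_BASE <= uid < UID_BASE + UID_RANGE
```

Note the birthday bound: with only a million possible values, you would expect roughly a 50% chance of some clash once about 1200 distinct fonts are in play, so this reduces the risk rather than eliminating it.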