Carl Friedrich Bolz-Tereick pushed to branch branch/py3.9 at PyPy / pypy

Commits:
7f8d83b5 by Carl Friedrich Bolz-Tereick at 2022-07-26T19:25:32+02:00
randomly fix some typos

- - - - -
41f576f4 by Carl Friedrich Bolz-Tereick at 2022-08-05T18:00:08+02:00
stop using a trie and switch to a DAWG for the bidirectional name<->unicode code point mapping

it's smaller and lookups (in both directions) are faster

--HG--
branch : unicodedata-dawg

- - - - -
7d4c154a by Carl Friedrich Bolz-Tereick at 2022-08-05T18:38:52+02:00
just check all the names of the first 65536 characters

--HG--
branch : unicodedata-dawg

- - - - -
a0c974ae by Carl Friedrich Bolz-Tereick at 2022-08-05T20:30:51+02:00
use startswith

--HG--
branch : unicodedata-dawg

- - - - -
ff84054e by Carl Friedrich Bolz-Tereick at 2022-08-05T20:58:46+02:00
save some more bytes

--HG--
branch : unicodedata-dawg

- - - - -
c24a2c24 by Carl Friedrich Bolz-Tereick at 2022-08-06T14:53:11+02:00
intermediate check-in: more compact one character edges (relies on alphabet being ascii)

--HG--
branch : unicodedata-dawg

- - - - -
4779cbdd by Carl Friedrich Bolz-Tereick at 2022-08-06T19:48:18+02:00
use leb128 to encode the count, saves another 22kb

another intermediate checkin with lots of mess around

--HG--
branch : unicodedata-dawg

- - - - -
dacaf7f0 by Carl Friedrich Bolz-Tereick at 2022-08-06T20:13:36+02:00
switch order of fields in edge encoding

--HG--
branch : unicodedata-dawg

- - - - -
da0db534 by Carl Friedrich Bolz-Tereick at 2022-08-06T20:38:29+02:00
make edge target encoding also be varsized

--HG--
branch : unicodedata-dawg

- - - - -
e3299d09 by Carl Friedrich Bolz-Tereick at 2022-08-06T21:20:46+02:00
compress further by storing offsets

--HG--
branch : unicodedata-dawg

- - - - -
e93701ca by Carl Friedrich Bolz-Tereick at 2022-08-06T21:49:40+02:00
remove cruft

--HG--
branch : unicodedata-dawg

- - - - -
f2f764af by Carl Friedrich Bolz-Tereick at 2022-08-07T14:20:10+02:00
refactor to not have the separate size computation

--HG--
branch : unicodedata-dawg

- - - - -
899613f9 by Carl Friedrich Bolz-Tereick at 2022-08-07T14:37:52+02:00
reshuffle a bit

--HG--
branch : unicodedata-dawg

- - - - -
b08242c8 by Carl Friedrich Bolz-Tereick at 2022-08-07T14:49:28+02:00
put the bit somewhere else, at a slight cost

--HG--
branch : unicodedata-dawg

- - - - -
c957417c by Carl Friedrich Bolz-Tereick at 2022-08-07T15:38:53+02:00
add "final" bits to the edges and remove the edge count

--HG--
branch : unicodedata-dawg

- - - - -
8fbad397 by Carl Friedrich Bolz-Tereick at 2022-08-07T19:08:35+02:00
add some hypothesis tests and fix the found problems

--HG--
branch : unicodedata-dawg

- - - - -
0a376a5c by Carl Friedrich Bolz-Tereick at 2022-08-07T20:54:01+02:00
fix rpython

--HG--
branch : unicodedata-dawg

- - - - -
defc1eaa by Carl Friedrich Bolz-Tereick at 2022-08-07T21:30:02+02:00
argh, actual fix

--HG--
branch : unicodedata-dawg

- - - - -
1064c1e2 by Carl Friedrich Bolz-Tereick at 2022-08-09T13:18:11+02:00
Use base compression again for names, make printed output less enormous

--HG--
branch : unicodedata-dawg

- - - - -
99f6a17c by Carl Friedrich Bolz-Tereick at 2022-08-09T17:04:42+02:00
use int32 for codepoints, not C longs

--HG--
branch : unicodedata-dawg

- - - - -
de0804c3 by Carl Friedrich Bolz-Tereick at 2022-08-09T17:23:47+02:00
small improvements

--HG--
branch : unicodedata-dawg

- - - - -
7121a2f6 by Carl Friedrich Bolz-Tereick at 2022-08-10T22:03:36+02:00
use a single big db to store almost all information. 10% space and is much faster.

use CPython's code for db page tables logic

--HG--
branch : unicodedata-dawg

- - - - -
f45ee617 by Carl Friedrich Bolz-Tereick at 2022-08-11T15:56:06+02:00
intermediate checkin: rewrite code generation infrastructure and estimate sizes

--HG--
branch : unicodedata-dawg

- - - - -
d30dc835 by Carl Friedrich Bolz-Tereick at 2022-08-12T12:48:04+02:00
more switching to the code writer

--HG--
branch : unicodedata-dawg

- - - - -
41f07249 by Carl Friedrich Bolz-Tereick at 2022-08-12T16:42:05+02:00
tweak guesses

--HG--
branch : unicodedata-dawg

- - - - -
71b05811 by Carl Friedrich Bolz-Tereick at 2022-08-12T19:08:52+02:00
do the composition data differently

--HG--
branch : unicodedata-dawg

- - - - -
9b83c891 by Carl Friedrich Bolz-Tereick at 2022-08-12T19:39:01+02:00
share composition_data

--HG--
branch : unicodedata-dawg

- - - - -
aecc840d by Carl Friedrich Bolz-Tereick at 2022-08-12T20:34:46+02:00
fix

--HG--
branch : unicodedata-dawg

- - - - -
68126b7e by Carl Friedrich Bolz-Tereick at 2022-08-12T21:24:37+02:00
integrate composition data into the decomposition tables

--HG--
branch : unicodedata-dawg

- - - - -
4a6d166e by Carl Friedrich Bolz-Tereick at 2022-08-12T21:46:58+02:00
compress pre- and postfix constants

--HG--
branch : unicodedata-dawg

- - - - -
73e77d19 by Carl Friedrich Bolz-Tereick at 2022-08-13T12:57:31+02:00
tests and fixes

--HG--
branch : unicodedata-dawg

- - - - -
3071f3e8 by Carl Friedrich Bolz-Tereick at 2022-08-14T11:08:44+02:00
unify all char lists into the same output list. also include casefolds.

--HG--
branch : unicodedata-dawg

- - - - -
4ad5b92d by Carl Friedrich Bolz-Tereick at 2022-08-14T11:11:16+02:00
remove some old unicode versions, only keep those for py 2.7, and 3.6 onwards

--HG--
branch : unicodedata-dawg

- - - - -
e80baf28 by Carl Friedrich Bolz-Tereick at 2022-08-14T11:17:17+02:00
fix tests

--HG--
branch : unicodedata-dawg

- - - - -
3c137c46 by Carl Friedrich Bolz-Tereick at 2022-08-14T13:28:07+02:00
refactor the db generation

--HG--
branch : unicodedata-dawg

- - - - -
7d5bc5a3 by Carl Friedrich Bolz-Tereick at 2022-08-14T16:30:46+02:00
use methods to generate less "unknown"

--HG--
branch : unicodedata-dawg

- - - - -
d7159bf7 by Carl Friedrich Bolz-Tereick at 2022-08-14T20:48:10+02:00
failing test

--HG--
branch : unicodedata-dawg

- - - - -
051d71fb by Carl Friedrich Bolz-Tereick at 2022-08-15T13:25:10+02:00
lookup should not return aliases by default

--HG--
branch : unicodedata-dawg

- - - - -
d1ce4fe4 by Carl Friedrich Bolz-Tereick at 2022-08-15T13:46:28+02:00
fix test

--HG--
branch : unicodedata-dawg

- - - - -
515c84d8 by Carl Friedrich Bolz-Tereick at 2022-08-15T13:46:44+02:00
print estimated size

--HG--
branch : unicodedata-dawg

- - - - -
a15b0dee by Carl Friedrich Bolz-Tereick at 2022-08-15T13:47:12+02:00
oops

--HG--
branch : unicodedata-dawg

- - - - -
627d9b0a by Carl Friedrich Bolz-Tereick at 2022-08-15T13:47:32+02:00
regenerate everything

--HG--
branch : unicodedata-dawg

- - - - -
34574429 by Carl Friedrich Bolz-Tereick at 2022-08-15T17:18:37+02:00
try to document the API of the rpython unicodedb

--HG--
branch : unicodedata-dawg

- - - - -
6f01c6dc by Carl Friedrich Bolz-Tereick at 2022-08-15T21:02:10+02:00
merge unicodedata-dawg: replace the trie of names in unicodedata with a directed acyclic word graph to make it more compact. also various other improvements to make unicodedata more compact.

shrinks pypy2 by 2.1mb, pypy3 by 2.6mb

- - - - -
83c79073 by Carl Friedrich Bolz-Tereick at 2022-08-16T12:43:50+02:00
merge default

--HG--
branch : py3.8

- - - - -
f829717a by Carl Friedrich Bolz-Tereick at 2022-08-16T12:44:46+02:00
merge py3.8

--HG--
branch : py3.9

- - - - -

6 changed files:

- pypy/module/unicodedata/interp_ucd.py
- pypy/module/unicodedata/test/test_hyp.py
- pypy/module/unicodedata/test/test_unicodedata.py
- rpython/rlib/rstring.py
- − rpython/rlib/unicodedata/CaseFolding-6.0.0.txt
- − rpython/rlib/unicodedata/CaseFolding-6.1.0.txt

View it on Heptapod: https://foss.heptapod.net/pypy/pypy/-/compare/d5fea74a2ab1ad379b1dea5ffb172c107a9df9a0...f829717a21130386acd8d452848e8fe4c8e2fbd0