Re: [Python-Dev] 3.0 C API to decode bytes into unicode?
On Aug 1, 2008, at 14:30, M.-A. Lemburg wrote:
> On 2008-08-01 15:06, Barry Scott wrote:
>> I cannot see how to implement decode() for bytes objects using the C API
>> for the PyCXX library. I'd assumed that I should find a PyBytes_Decode
>> function, but cannot find it in beta 2. What is the preferred way to do
>> this?
>
> PyUnicode_FromEncodedObject() should do the trick.

Thanks, that's what I've used.

Barry

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
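Editorial note: at the Python level, the three-argument str() constructor performs the same conversion as Lemburg's suggestion. In CPython 3 it is, to our understanding, implemented by calling PyUnicode_FromEncodedObject(obj, encoding, errors). The sketch below (decode_bytes is a hypothetical helper name) shows the behaviour: it accepts bytes and other buffer-protocol objects, and refuses an object that is already text.

```python
def decode_bytes(data, encoding="utf-8", errors="strict"):
    """Decode a bytes-like object (bytes, bytearray, memoryview) to str.

    In CPython 3, str(obj, encoding, errors) hands the work to the C
    function PyUnicode_FromEncodedObject(obj, encoding, errors).
    """
    return str(data, encoding, errors)


# Usage:
assert decode_bytes(b"caf\xc3\xa9") == "caf\u00e9"          # UTF-8 input
assert decode_bytes(bytearray(b"abc"), "ascii") == "abc"    # any buffer works
assert decode_bytes(b"\xff", "ascii", "replace") == "\ufffd"
```

Passing a str raises TypeError ("decoding str is not supported"), mirroring the C function's refusal to decode text twice.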
Re: [Python-Dev] Base-96
It's true, I didn't pay attention to that.

So the next possible encoding would be base-128 (7-bit encoding),
although I don't know if that would be possible, since it would then
use non-printable characters that could change the text (through
characters such as Backspace or Delete).

On 2 Aug, 03:21, Steve Holden <[EMAIL PROTECTED]> wrote:
> 96 is approximately 2^6.585
>
> For some reason, integral powers of two seem so much more, well,
> POWERFUL, if you know what I mean. Frankly I think you are being either
> optimistic or charitable in suggesting that such a use case might exist.
>
> There's a reason that DEC called their equivalent of base64 "6-bit
> encoding".
>
> But then I wanted to keep integer division as it was, so I am clearly a
> techno-luddite. If the world wants fractional bits I'm sure it's only a
> matter of time before some genius decides to design a 67.9-bit computer.
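Editorial note: Steve's 2^6.585 figure is simply log2(96), the number of bits one base-96 character carries. The short sketch below prints that figure for the bases discussed in this thread. It also flags the bases whose alphabet would be larger than printable ASCII, which has only 95 characters (' ' through '~'). That limit is the concern Kless raises about base-128.

```python
import math

# Printable ASCII runs from ' ' (0x20) through '~' (0x7e): 95 characters.
PRINTABLE_ASCII = "".join(chr(c) for c in range(0x20, 0x7F))


def bits_per_char(base: int) -> float:
    """Bits of information carried by one digit of a base-`base` encoding."""
    return math.log2(base)


for base in (64, 85, 95, 96, 128):
    fits = base <= len(PRINTABLE_ASCII)
    note = "" if fits else "  (more symbols than printable ASCII has)"
    print(f"base-{base}: {bits_per_char(base):.3f} bits per character{note}")
```

Any alphabet larger than 95 symbols must borrow control characters, whitespace such as tab or newline, or non-ASCII bytes.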
Re: [Python-Dev] Base-96
The standard high-bit-density encoding past base-64 is base-85
(http://en.wikipedia.org/wiki/Ascii85), which encodes 4 binary bytes
as 5 ascii bytes, versus 3 binary bytes as 4 ascii bytes. It works,
is an RFC somewhere, ... and maybe should find its way into the Python
standard library's codec package at some point.

- Josiah

On Sat, Aug 2, 2008 at 12:57 AM, Kless <[EMAIL PROTECTED]> wrote:
> It's true, I didn't pay attention to that.
>
> So the next possible encoding would be base-128 (7-bit encoding),
> although I don't know if that would be possible, since it would then
> use non-printable characters that could change the text (through
> characters such as Backspace or Delete).
>
> [snip]
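Editorial note: Josiah's description of 4 bytes becoming 5 characters can be made concrete. The sketch below illustrates the block arithmetic only. It is not the codec anyone in this thread proposed. It uses the RFC 1924 character set, which Python 3.4 and later also use for base64.b85encode. Adobe's Ascii85 instead uses the characters '!' through 'u'. The sketch handles only input whose length is a multiple of 4, and b85_encode_groups is a hypothetical name. Each 4-byte group is read as a 32-bit big-endian integer and written as five base-85 digits. Five digits are enough because 85^5 (about 4.44 x 10^9) exceeds 2^32 (about 4.29 x 10^9).

```python
import base64

# RFC 1924 character set; a digit's value is its position in this string.
ALPHABET = (b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            b"abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")
assert len(ALPHABET) == 85


def b85_encode_groups(data: bytes) -> bytes:
    """Encode whole 4-byte groups, 5 output characters per group."""
    if len(data) % 4:
        raise ValueError("this sketch only handles whole 4-byte groups")
    out = bytearray()
    for i in range(0, len(data), 4):
        value = int.from_bytes(data[i:i + 4], "big")  # 0 .. 2**32 - 1
        digits = []
        for _ in range(5):  # 85**5 > 2**32, so five digits always suffice
            value, rem = divmod(value, 85)
            digits.append(ALPHABET[rem])
        out.extend(reversed(digits))  # most significant digit first
    return bytes(out)


# Usage: for whole groups the output matches the stdlib codec.
sample = b"Python-Dev!!"  # 12 bytes -> 15 characters
assert b85_encode_groups(sample) == base64.b85encode(sample)
```

Five characters per four bytes is 1.25 characters per byte, compared with 4/3 (about 1.33) for base-64.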
Re: [Python-Dev] Base-96
Josiah Carlson wrote:
> The standard high-bit-density encoding past base-64 is base-85
> (http://en.wikipedia.org/wiki/Ascii85), which encodes 4 binary bytes
> as 5 ascii bytes, versus 3 binary bytes as 4 ascii bytes. It works,
> is an RFC somewhere,

RFC 1924, published on April 1, 1996, to shorten the representation
of IPv6 addresses, so that you can write

    ssh '4)+k&C#VzJ4br>0wv%Yp'

instead of having to write

    ssh 1080:0:0:0:8:800:200C:417A

Most notably, section 7 (implementation issues) points out

    Many current processors do not find 128 bit integer arithmetic, as
    required for this technique, a trivial operation. This is not
    considered a serious drawback in the representation, but a flaw of
    the processor designs.

For arbitrary-sized data, you'd have to give up 128-bit arithmetic,
of course, and represent the input data to encode as a long integer.

Regards,
Martin

P.S. Just in case it isn't clear: I would oppose any specific proposal
to add this Ascii85 algorithm to the standard library. It would sound
like we don't have any real problems to solve.
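Editorial note: Martin's IPv6 example can be checked with a short sketch. It treats the whole address as one 128-bit integer. That is painless in Python because ints have arbitrary precision. It then writes the integer as 20 base-85 digits, since 85^20 exceeds 2^128, using the RFC 1924 character set in digit order. The function names are hypothetical. Leading zero digits are kept, so every address encodes to exactly 20 characters.

```python
import ipaddress

# RFC 1924 character set, in digit order 0..84.
RFC1924_ALPHABET = ("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                    "abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")


def ipv6_to_rfc1924(address: str) -> str:
    # Python ints have arbitrary precision, so the 128-bit address is
    # handled as a single number without special 128-bit hardware support.
    value = int(ipaddress.IPv6Address(address))
    digits = []
    for _ in range(20):  # 85**20 > 2**128, so 20 digits always suffice
        value, rem = divmod(value, 85)
        digits.append(RFC1924_ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first


def rfc1924_to_ipv6(text: str) -> str:
    value = 0
    for ch in text:
        value = value * 85 + RFC1924_ALPHABET.index(ch)  # ValueError if invalid
    return str(ipaddress.IPv6Address(value))


print(ipv6_to_rfc1924("1080:0:0:0:8:800:200C:417A"))
```

The same "one big integer" approach is what Martin means for arbitrary-sized data. It works in Python, but it becomes quadratic for long inputs, which is why practical codecs work in fixed 4-byte groups.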
Re: [Python-Dev] Base-96
On Sat, Aug 2, 2008 at 10:09 AM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
>> The standard high-bit-density encoding past base-64 is base-85
>> (http://en.wikipedia.org/wiki/Ascii85), which encodes 4 binary bytes
>> as 5 ascii bytes, versus 3 binary bytes as 4 ascii bytes. It works,
>> is an RFC somewhere,
>
> RFC 1924, published on April 1, 1996, to shorten the representation
> of IPv6 addresses [snip]
>
> For arbitrary-sized data, you'd have to give up 128-bit arithmetic,
> of course, and represent the input data to encode as a long integer.
>
> Regards,
> Martin
>
> P.S. Just in case it isn't clear: I would oppose any specific proposal
> to add this Ascii85 algorithm to the standard library. It would sound
> like we don't have any real problems to solve.

Original intent (encoding IPv6 addresses) != current usefulness (a
more efficient ascii encoding of binary data).

Generally, I'm of the opinion that base64 (as an ascii encoding of
binary data) is sufficient for any needs I have, but there are cases
where having a more efficient representation would be useful. I would
also not suggest addition in the 2.6/3.0 timeframe; at best it would be
2.7/3.1, and only if someone submits a patch with test cases (note that
the wiki page provides C source for one-shot encoding and decoding that
doesn't require 128-bit arithmetic). Sounds to me like a project for
the OP.

- Josiah
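Editorial note: Josiah points out that one-shot encoding and decoding does not need 128-bit arithmetic. The minimal decoding sketch below illustrates this. It assumes the same RFC 1924 alphabet and whole 5-character groups, and b85_decode_groups is a hypothetical name. Each group is folded into an integer with value * 85 + digit. Only 5 digits are combined at a time, so the intermediate value stays below 85^5 and never needs more than 33 bits. A group whose value exceeds 2^32 - 1 is rejected as invalid.

```python
import base64

ALPHABET = (b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            b"abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")
DIGIT_VALUE = {ch: i for i, ch in enumerate(ALPHABET)}  # byte value -> 0..84


def b85_decode_groups(text: bytes) -> bytes:
    """Decode whole 5-character groups back into 4-byte groups."""
    if len(text) % 5:
        raise ValueError("this sketch only handles whole 5-character groups")
    out = bytearray()
    for i in range(0, len(text), 5):
        value = 0
        for ch in text[i:i + 5]:
            if ch not in DIGIT_VALUE:
                raise ValueError(f"invalid base-85 character {chr(ch)!r}")
            value = value * 85 + DIGIT_VALUE[ch]  # stays below 85**5 < 2**33
        if value > 0xFFFFFFFF:
            raise ValueError("group does not fit in 32 bits")
        out += value.to_bytes(4, "big")
    return bytes(out)


# Usage: round-trip through the stdlib encoder (Python 3.4+).
encoded = base64.b85encode(b"Python-Dev!!")
assert b85_decode_groups(encoded) == b"Python-Dev!!"
```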
Re: [Python-Dev] Base-96
On Sat, Aug 2, 2008 at 10:37 AM, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> On Sat, Aug 2, 2008 at 10:09 AM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>> Josiah Carlson wrote:
>>> The standard high-bit-density encoding past base-64 is base-85
>>> [snip]
>>
>> RFC 1924, published on April 1, 1996, to shorten the representation
>> of IPv6 addresses [snip]
>>
>> P.S. Just in case it isn't clear: I would oppose any specific proposal
>> to add this Ascii85 algorithm to the standard library. It would sound
>> like we don't have any real problems to solve.

Same here.

> Original intent (encoding IPv6 addresses) != current usefulness (a
> more efficient ascii encoding of binary data).

That was an April Fool's RFC.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Base-85
Martin v. Löwis writes:
> P.S. Just in case it isn't clear: I would oppose any specific proposal
> to add this Ascii85 algorithm to the standard library. It would sound
> like we don't have any real problems to solve.

According to Wikipedia, "its main modern use is in Adobe's PostScript
and Portable Document Format file formats".

It is also used by git for diffs of binary files, and those diffs are
supposedly understood by other VCSes like Mercurial... Indeed,
Mercurial has a Python extension for base85 encoding (but licensed
under the GPL):
http://selenic.com/hg/index.cgi/file/cbdfd08eabc9/mercurial/base85.c
(I suppose Bazaar has something similar.)

Finally, since this encoding packs more bytes into the same number of
ASCII characters than its traditional alternatives, it is likely to
gain traction in applications that need a pure-ASCII representation of
binary data.

Regards

Antoine.
Re: [Python-Dev] Base-85
On Sat, Aug 2, 2008 at 12:58 PM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> It is also used by git for diffs of binary files, and those diffs are
> supposedly understood by other VCSes like Mercurial...

I'm very interested in this (for Rietveld). Where can I learn more
about how git handles diffs of binary files? Does it actually show
adds and deletes of sections of the file?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Base-96
On Sat, Aug 2, 2008 at 11:57 AM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> That was an April Fool's RFC.

See also http://en.wikipedia.org/wiki/April_Fools%27_Day_RFC -- it has
a ton of these. Great fun reading through some of them on an idle
Saturday afternoon. :-)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Base-85
On Saturday 02 August 2008 at 14:07 -0700, Guido van Rossum wrote:
> On Sat, Aug 2, 2008 at 12:58 PM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> > It is also used by git for diffs of binary files, and those diffs are
> > supposedly understood by other VCSes like Mercurial...
>
> I'm very interested in this (for Rietveld). Where can I learn more
> about how git handles diffs of binary files? Does it actually show
> adds and deletes of sections of the file?
>
Well, I'm not sure. I just tried with Mercurial, first committing a
binary file with the following structure:
part1 part3
and then changing it to the following structure:
part1 part2 part3 part2
(part{1,2,3} being some binary chunks of 400 bytes each
from /dev/urandom)
The "git-style" diff given by Mercurial is then:
diff --git a/binfile b/binfile
index
acfa6ffc5287c6e9cd400af7b8ab09d072a28b02..5b9a69212ae8f39bf41fbf2194db2b730dcb0ae9
GIT binary patch
literal 1600
zc%1Fi`#%#1003~2SQat~^D2)~Q>OB2Y-`ijkz&}wZbC#-o=0t7jaVepQ|D0_8*3gJ
[EMAIL PROTECTED]|_rrbvhcAJotX5m|hB+RX(Aa5xSa4Y^GkS%y10Hva
z3^q{I&[EMAIL PROTECTED]&+M&87%65P%egC&>7+Bgzx0-lUziyCW?%ELc
z;eHsAnXOY+YY~y3f6~CD+?JujZGa*JV=V-x-twhC^~z}e+->VcW=&UqfNg97Mxf3d
zP2!VM#<4|n+(B|5rOUMBfQ=w}vEdoi_TK&[EMAIL PROTECTED]
zqYkn__U%akFI([EMAIL PROTECTED]<}W8T2DXa{`gkjO1RO?{-Yz3
z-yd-sVx%pSu0elCXI*-RPErV~&bEbl*yk6ff?mV<[EMAIL PROTECTED]
zvw~4tJo7xMUB<[EMAIL PROTECTED]pWHk^!$4x~AJ$M!$>LrjA^UDjlQD0igDHf?>kfTL8C
zT`$ToUViVhgSRSdJij81#v>3jDpj>m&hGZ(a4UQqYSc{FI)@&=mHL-D&8&McIqHsU
zfI-aCDfNmLo!AZjn=0JV{sMJcsTSiO;$}r>P(?>*s6&cpc-Lu5__+c38NPW>O=Ze3
zuFNT^b+XahK^P*sIU93zA4btzW?3CiviE&Xi9>Bo
zUeM*=71Le)&!sv725-*vgS1naIlau
zItQO-*hG^8yP=|O<[EMAIL PROTECTED]
zL(7(rp`ZFnuqIS(+L{IckNZ%BsAvS7nS6!Qvhd<7{m}yj9>[EMAIL PROTECTED];vRo8VN
z{7^{BvC^#ss2zI20|7B%d7&`+;}M|Lb4kcpnT&a>ztw$PNWnC?zuEIbUm<$TLPu{@
zFY}#Iu~hB^StZra*ohA%zVI_1hLduWZa6S~5X9x-A^(HQYtpfwIrX2BO0ucP;atl(
[EMAIL PROTECTED]&U#BnZ>E7FQ)[EMAIL PROTECTED]
zBp#{xc{d%j`Q}GwfJ*5YN5L1jO$egiqV_Dr=2I4()xVmuAhxjtq%Tv8Xpj0eNbDC`
zr)~pTMOKjpr4)Ts!=w3K?TKMAmsB;SKU4G([EMAIL PROTECTED]
zz=K?Re2pPf)sp-XNp`QKAIQ}l7Qjr5SbMa;QUT4(|#}kQ>Zx58&XEdwZV}rT}gi
zAJ(u*feBc}#SFv;[EMAIL PROTECTED];Ya([EMAIL PROTECTED])ms1`
zEe~IK=-G~B2d_gQ&0mTKxO#ub8Q)9wI!6xVMVM}&lvlFHsV+BT+H~fhnnW~!EEnc+
zD=DAs=Y01@@>AfcR=Yz7$Z5mZ8P~c0

From that I don't know what can be done with the diff. Looking at the
Mercurial source code suggests that you can encode deltas in the patch,
but that Mercurial doesn't support it (see "# TODO: deltas"):
http://www.selenic.com/hg/index.cgi/file/cbdfd08eabc9/mercurial/patch.py#l1117
A basic explanation of binary diffs is here:
http://www.selenic.com/pipermail/mercurial/2008-July/020184.html
The explanation mentions base-64, but this was corrected in a later
message:
http://www.selenic.com/pipermail/mercurial/2008-July/020192.html
Regards
Antoine.
PS: here are the commands I've typed:
$ hg init bindiff
$ cd bindiff/
$ dd if=/dev/urandom of=part1 bs=1 count=400
[snip output]
$ dd if=/dev/urandom of=part2 bs=1 count=400
[snip output]
$ dd if=/dev/urandom of=part3 bs=1 count=400
[snip output]
$ cat part1 part3 > binfile
$ hg add binfile
$ hg ci -m "added binfile"
$ cat part1 part2 part3 > binfile
$ hg di
diff -r 19cfb10c4a01 binfile
Binary file binfile has changed
$ hg di --git
[produces the patch above]
[Python-Dev] What to do with uuid?
I was running the test suite and I noticed test_uuid's warning message
is still up:

test_uuid
WARNING: uuid.getnode is unreliable on many platforms.
It is disabled until the code and/or test can be fixed properly.
WARNING: uuid._ifconfig_getnode is unreliable on many platforms.
It is disabled until the code and/or test can be fixed properly.
WARNING: uuid._unixdll_getnode is unreliable on many platforms.
It is disabled until the code and/or test can be fixed properly.

The state of uuid has been like this for a while now. Are we going to
actively try to fix this, or was uuid a mistake? I guess my real
question is how long are we willing to let the code sit in this partial
state before we decide the module just is not getting enough attention?

-Brett
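Editorial note: the warnings concern how uuid discovers the host's hardware (MAC) address. The behaviour described here assumes a current Python 3, not the 2008 code under discussion. When uuid.getnode() cannot find a hardware address, it falls back to a random 48-bit number with the multicast bit set, as RFC 4122 section 4.5 recommends. The helper below (describe_node is a hypothetical name) formats a node ID and reports whether that bit is set.

```python
import uuid


def describe_node(node: int) -> str:
    """Format a 48-bit uuid node ID and say whether it looks randomly generated."""
    if not 0 <= node < 1 << 48:
        raise ValueError("node IDs are 48-bit values")
    mac = ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    # 1 << 40 is the multicast bit (least significant bit of the first octet);
    # a network card's unicast hardware address does not have it set.
    if node & (1 << 40):
        return f"{mac} (multicast bit set: likely a random fallback)"
    return f"{mac} (looks like a hardware address)"


print(describe_node(uuid.getnode()))
```

A set multicast bit is only a heuristic. It shows that the value is not a unicast hardware address, but not which lookup path failed.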
[Python-Dev] Base-95 (Re: Base-96)
Kless wrote:
> So the next possible encoding would be base-128 (7-bit encoding)

A while ago I wanted to pack as much information as possible into a
string of printable characters, and I came up with a base-95 encoding
that packs 9 bytes into 11 characters.

The application involved representing data using Python string
literals, so it was important that only printable characters were
used. I settled on the 9/11 combination as a reasonable compromise
between packing efficiency and not having the block size too long.

If anyone's interested, I could dig out the encoding and decoding
routines I wrote.

--
Greg
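Editorial note: Greg's routines are not included in the thread, so the following is a hypothetical reconstruction of the block arithmetic, not his code. It assumes the alphabet is the 95 printable ASCII characters in code-point order. It handles whole 9-byte blocks only. A real codec would also need a rule for a short final block. Eleven base-95 digits cover 95^11 (about 2^72.27) values, just enough for 9 bytes = 72 bits. Ten digits (about 2^65.7) would not suffice. This alphabet includes the quote and backslash characters, which would still need escaping inside a Python string literal.

```python
BASE = 95
BLOCK_BYTES = 9    # 72 bits of input ...
BLOCK_CHARS = 11   # ... fit in 11 base-95 digits, since 95**11 > 2**72
FIRST = 0x20       # alphabet is ' ' (0x20) through '~' (0x7e)

assert BASE ** BLOCK_CHARS >= 1 << (8 * BLOCK_BYTES)


def encode_block(block: bytes) -> str:
    if len(block) != BLOCK_BYTES:
        raise ValueError("expected exactly 9 bytes")
    value = int.from_bytes(block, "big")
    chars = []
    for _ in range(BLOCK_CHARS):
        value, digit = divmod(value, BASE)
        chars.append(chr(FIRST + digit))
    return "".join(reversed(chars))  # most significant digit first


def decode_block(text: str) -> bytes:
    if len(text) != BLOCK_CHARS:
        raise ValueError("expected exactly 11 characters")
    value = 0
    for ch in text:
        digit = ord(ch) - FIRST
        if not 0 <= digit < BASE:
            raise ValueError(f"{ch!r} is not printable ASCII")
        value = value * BASE + digit
    if value >= 1 << (8 * BLOCK_BYTES):
        raise ValueError("block value exceeds 72 bits")
    return value.to_bytes(BLOCK_BYTES, "big")


block = bytes(range(9))
assert decode_block(encode_block(block)) == block
```

At 11/9 (about 1.22) characters per byte, this is slightly denser than base-85's 1.25 and base-64's 1.33.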
