date:20080802

Re: [Python-Dev] 3.0 C API to decode bytes into unicode?

2008-08-02 Thread Barry Scott



On Aug 1, 2008, at 14:30, M.-A. Lemburg wrote:
> On 2008-08-01 15:06, Barry Scott wrote:
>> I cannot see how to implement decode() for bytes objects using the
>> C API for the PyCXX library. I'd assumed that I would find a
>> PyBytes_Decode function, but I cannot find it in beta 2.
>> What is the preferred way to do this?
>
> PyUnicode_FromEncodedObject() should do the trick.

Thanks, that's what I've used.

Barry
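
For readers more familiar with the Python level than the C API, the same operation can be sketched in Python. This assumes that str(obj, encoding, errors) behaves like the C function mentioned above; the helper name decode_bytes is hypothetical and only there for illustration.

```python
def decode_bytes(raw: bytes, encoding: str = "utf-8", errors: str = "strict") -> str:
    """Decode a bytes-like object into a str (hypothetical helper)."""
    # Three-argument str() decodes bytes-like objects with the given codec.
    return str(raw, encoding, errors)


raw = "Löwis".encode("utf-8")
# bytes.decode() is the method form of the same operation.
print(decode_bytes(raw) == raw.decode("utf-8"))
```

With errors="replace", undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.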

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Base-96

2008-08-02 Thread Kless

It's true, I didn't pay attention to that.

So the next possible encoding would be base-128 (7-bit encoding),
although I don't know whether that would be feasible, since it would
then use non-printable characters, which could alter the text (for
example, characters such as Backspace or Delete).

On 2 Aug, 03:21, Steve Holden <[EMAIL PROTECTED]> wrote:
> 96 is approximately 2^6.585
>
> For some reason, integral powers of two seem so much more, well,
> POWERFUL, if you know what I mean. Frankly I think you are being either
> optimistic or charitable in suggesting that such a use case might exist.
>
> There's a reason that DEC called their equivalent of base64 "6-bit
> encoding".
>
> But then I wanted to keep integer division as it was, so I am clearly a
> techno-luddite. If the world wants fractional bits I'm sure it's only a
> matter of time before some genius decides to design a 67.9-bit computer.
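
The figure quoted above (96 is roughly 2^6.585) is the number of bits a single base-96 character can carry. A small sketch that computes the same quantity for the bases mentioned in this thread; the function name bits_per_char is an assumption made for illustration.

```python
import math


def bits_per_char(base: int) -> float:
    """Information carried by one character of a base-N alphabet, in bits."""
    return math.log2(base)


for base in (64, 85, 95, 96, 128):
    # Only powers of two give a whole number of bits per character.
    print(f"base-{base}: {bits_per_char(base):.3f} bits per character")
```

Only base-64 and base-128 come out as whole numbers, which is why the other bases have to encode several bytes at a time as one block.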

Re: [Python-Dev] Base-96

2008-08-02 Thread Josiah Carlson

The standard high-bit-density encoding past base-64 is base-85
(http://en.wikipedia.org/wiki/Ascii85), which encodes 4 binary bytes
as 5 ascii bytes, versus 3 binary bytes as 4 ascii bytes.  It works,
is an RFC somewhere, ... and maybe should find its way into the
Python standard library's codec package at some point.

 - Josiah
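
The 4-to-5 and 3-to-4 ratios can be checked with a short sketch. It relies on base64.a85encode, which did not exist when this thread was written (it was added to the standard library in Python 3.4); the sample bytes are arbitrary.

```python
import base64

group = b"\x01\x02\x03\x04"

a85 = base64.a85encode(group)      # 4 binary bytes -> 5 ASCII characters
b64 = base64.b64encode(group[:3])  # 3 binary bytes -> 4 ASCII characters

print(len(group), "->", len(a85))
print(len(group[:3]), "->", len(b64))

# Five base-85 digits are just enough to cover every 32-bit value,
# four digits are not.
print(85**4 < 2**32 <= 85**5)
```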

On Sat, Aug 2, 2008 at 12:57 AM, Kless <[EMAIL PROTECTED]> wrote:
> It's true, I didn't pay attention to that.
>
> So the next possible encoding would be base-128 (7-bit encoding),
> although I don't know whether that would be feasible, since it would
> then use non-printable characters, which could alter the text (for
> example, characters such as Backspace or Delete).
>
> On 2 Aug, 03:21, Steve Holden <[EMAIL PROTECTED]> wrote:
>> 96 is approximately 2^6.585
>>
>> For some reason, integral powers of two seem so much more, well,
>> POWERFUL, if you know what I mean. Frankly I think you are being either
>> optimistic or charitable in suggesting that such a use case might exist.
>>
>> There's a reason that DEC called their equivalent of base64 "6-bit
>> encoding".
>>
>> But then I wanted to keep integer division as it was, so I am clearly a
>> techno-luddite. If the world wants fractional bits I'm sure it's only a
>> matter of time before some genius decides to design a 67.9-bit computer.

Re: [Python-Dev] Base-96

2008-08-02 Thread Martin v. Löwis

Josiah Carlson wrote:
> The standard high-bit-density encoding past base-64 is base-85
> (http://en.wikipedia.org/wiki/Ascii85), which encodes 4 binary bytes
> as 5 ascii bytes, versus 3 binary bytes as 4 ascii bytes.  It works,
> is an RFC somewhere,

RFC 1924, published on April 1, 1996, to shorten the representation
of IPv6 addresses, so that you can write

  ssh '4)+k&C#VzJ4br>0wv%Yp'

instead of having to write

  ssh 1080:0:0:0:8:800:200C:417A

Most notably, section 7 (implementation issues) points out

   Many current processors do not find 128 bit integer arithmetic, as
   required for this technique, a trivial operation.  This is not
   considered a serious drawback in the representation, but a flaw of
   the processor designs.

For arbitrary-sized data, you'd have to give up 128-bit arithmetic,
of course, and represent the input data to encode as a long integer.

Regards,
Martin

P.S. Just in case it isn't clear: I would oppose any specific proposal
to add this Ascii85 algorithm to the standard library. It would sound
like we don't have any real problems to solve.
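
The remark about representing the input as a long integer can be sketched with Python's arbitrary-precision integers, which make the 128-bit arithmetic trivial. This is an illustration only: the function names are hypothetical, the alphabet is the one listed in RFC 1924, and the ipaddress module used here was added to the standard library in Python 3.3.

```python
import ipaddress

# RFC 1924 alphabet: digits, upper case, lower case, then 23 punctuation marks.
RFC1924_ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "!#$%&()*+-;<=>?@^_`{|}~"
)


def ipv6_to_base85(address: str) -> str:
    """Write a 128-bit IPv6 address as 20 base-85 digits."""
    n = int(ipaddress.IPv6Address(address))
    digits = []
    for _ in range(20):
        n, rem = divmod(n, 85)  # peel off the least significant digit
        digits.append(RFC1924_ALPHABET[rem])
    return "".join(reversed(digits))


def base85_to_ipv6(text: str) -> str:
    """Inverse of ipv6_to_base85; returns the compressed textual form."""
    n = 0
    for ch in text:
        n = n * 85 + RFC1924_ALPHABET.index(ch)
    return str(ipaddress.IPv6Address(n))


print(ipv6_to_base85("1080:0:0:0:8:800:200C:417A"))
```

The printed value should be the 20-character string shown in the ssh example above.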

Re: [Python-Dev] Base-96

2008-08-02 Thread Josiah Carlson

On Sat, Aug 2, 2008 at 10:09 AM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
>> The standard high-bit-density encoding past base-64 is base-85
>> (http://en.wikipedia.org/wiki/Ascii85), which encodes 4 binary bytes
>> as 5 ascii bytes, versus 3 binary bytes as 4 ascii bytes.  It works,
>> is an RFC somewhere,
>
> RFC 1924, published on April 1, 1996, to shorten the representation
> of IPv6 addresses, so that you can write
>
>  ssh '4)+k&C#VzJ4br>0wv%Yp'
>
> instead of having to write
>
>  ssh 1080:0:0:0:8:800:200C:417A
>
> Most notably, section 7 (implementation issues) points out
>
>   Many current processors do not find 128 bit integer arithmetic, as
>   required for this technique, a trivial operation.  This is not
>   considered a serious drawback in the representation, but a flaw of
>   the processor designs.
>
> For arbitrary-sized data, you'd have to give up 128-bit arithmetic,
> of course, and represent the input data to encode as a long integer.
>
> Regards,
> Martin
>
> P.S. Just in case it isn't clear: I would oppose any specific proposal
> to add this Ascii85 algorithm to the standard library. It would sound
> like we don't have any real problems to solve.

Original intent (encoding IPV6 addresses) != current usefulness (a
more efficient ascii encoding of binary data).  Generally, I'm of the
opinion that base64 (as an ascii encoding of binary data) is
sufficient for any needs I have, but there are cases where having a
more efficient representation would be useful. I would also not
suggest adding it in the 2.6/3.0 timeframe; at best it would be
2.7/3.1, and only if someone submits a patch with test cases (note that
the wiki page provides C source for one-shot encoding and decoding
that doesn't require 128-bit arithmetic).

Sounds to me like a project for the OP.

 - Josiah
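
To illustrate why 128-bit arithmetic is not needed for general binary data: Ascii85 works on 4-byte groups, so every intermediate value fits in 32 bits. Below is a minimal sketch of encoding one group, checked against base64.a85encode (available since Python 3.4). The function name ascii85_group is hypothetical, and the sketch skips the padding of a short final group.

```python
import base64


def ascii85_group(group: bytes) -> bytes:
    """Encode exactly four bytes as five Ascii85 characters."""
    if len(group) != 4:
        raise ValueError("expected a 4-byte group")
    value = int.from_bytes(group, "big")  # always below 2**32
    out = bytearray(5)
    for i in range(4, -1, -1):
        value, digit = divmod(value, 85)
        out[i] = 33 + digit  # digits map onto '!' (33) through 'u' (117)
    return bytes(out)


# Note: full Ascii85 abbreviates an all-zero group to b"z"; this sketch
# would emit b"!!!!!" for it instead.
print(ascii85_group(b"Man ") == base64.a85encode(b"Man "))
```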

Re: [Python-Dev] Base-96

2008-08-02 Thread Guido van Rossum

On Sat, Aug 2, 2008 at 10:37 AM, Josiah Carlson
<[EMAIL PROTECTED]> wrote:
> On Sat, Aug 2, 2008 at 10:09 AM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>> Josiah Carlson wrote:
>>> The standard high-bit-density encoding past base-64 is base-85
>>> (http://en.wikipedia.org/wiki/Ascii85), which encodes 4 binary bytes
>>> as 5 ascii bytes, versus 3 binary bytes as 4 ascii bytes.  It works,
>>> is an RFC somewhere,
>>
>> RFC 1924, published on April 1, 1996, to shorten the representation
>> of IPv6 addresses, so that you can write
>>
>>  ssh '4)+k&C#VzJ4br>0wv%Yp'
>>
>> instead of having to write
>>
>>  ssh 1080:0:0:0:8:800:200C:417A
>>
>> Most notably, section 7 (implementation issues) points out
>>
>>   Many current processors do not find 128 bit integer arithmetic, as
>>   required for this technique, a trivial operation.  This is not
>>   considered a serious drawback in the representation, but a flaw of
>>   the processor designs.
>>
>> For arbitrary-sized data, you'd have to give up 128-bit arithmetic,
>> of course, and represent the input data to encode as a long integer.
>>
>> Regards,
>> Martin
>>
>> P.S. Just in case it isn't clear: I would oppose any specific proposal
>> to add this Ascii85 algorithm to the standard library. It would sound
>> like we don't have any real problems to solve.

Same here.

> Original intent (encoding IPV6 addresses) != current usefulness (a
> more efficient ascii encoding of binary data).

That was an April Fool's RFC.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Base-85

2008-08-02 Thread Antoine Pitrou

Martin v. Löwis writes:
> 
> P.S. Just in case it isn't clear: I would oppose any specific proposal
> to add this Ascii85 algorithm to the standard library. It would sound
> like we don't have any real problems to solve.

According to Wikipedia, "its main modern use is in Adobe's PostScript and
Portable Document Format file formats".

It is also used by git for diffs of binary files, and those diffs are supposedly
understood by other VCSes like Mercurial... indeed, Mercurial has a Python
extension for base85 encoding (but licensed under the GPL):
http://selenic.com/hg/index.cgi/file/cbdfd08eabc9/mercurial/base85.c
(I suppose Bazaar has something similar)

Finally, since this encoding packs more bytes into the same number of
ASCII characters than its traditional alternatives, it is likely to gain
traction in applications that need to create a pure-ASCII representation
of binary data.

Regards

Antoine.
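
The size advantage can be made concrete with a short comparison. This sketch uses base64.b85encode, which was added in Python 3.4 and is documented as using the alphabet of git-style binary diffs; the 1200-byte payload is an arbitrary choice.

```python
import base64
import os

payload = os.urandom(1200)  # arbitrary binary data

as_base64 = base64.b64encode(payload)  # 4 characters per 3 bytes
as_base85 = base64.b85encode(payload)  # 5 characters per 4 bytes

print("base64:", len(as_base64), "characters")
print("base85:", len(as_base85), "characters")
```

For 1200 bytes that is 1600 characters against 1500, a saving of a little over 6%.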


Re: [Python-Dev] Base-85

2008-08-02 Thread Guido van Rossum

On Sat, Aug 2, 2008 at 12:58 PM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> It is also used by git for diffs of binary files, and those diffs are
> supposedly understood by other VCSes like Mercurial...

I'm very interested in this (for Rietveld). Where can I learn more
about how git handles diffs of binary files? Does it actually show
adds and deletes of sections of the file?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Base-96

2008-08-02 Thread Guido van Rossum

On Sat, Aug 2, 2008 at 11:57 AM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> That was an April Fool's RFC.

See also http://en.wikipedia.org/wiki/April_Fools%27_Day_RFC -- it has
a ton of these. Great fun reading through some of them on an idle
Saturday afternoon. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Base-85

2008-08-02 Thread Antoine Pitrou

On Saturday, 2 August 2008 at 14:07 -0700, Guido van Rossum wrote:
> On Sat, Aug 2, 2008 at 12:58 PM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> > It is also used by git for diffs of binary files, and those diffs are
> > supposedly understood by other VCSes like Mercurial...
> 
> I'm very interested in this (for Rietveld). Where can I learn more
> about how git handles diffs of binary files? Does it actually show
> adds and deletes of sections of the file?
> 

Well, I'm not sure. I just tried with Mercurial, first committing a
binary file with the following structure:
part1 part3
and then changing it to the following structure:
part1 part2 part3 part2

(part{1,2,3} being some binary chunks of 400 bytes each
from /dev/urandom)

The "git-style" diff given by Mercurial is then:

diff --git a/binfile b/binfile
index acfa6ffc5287c6e9cd400af7b8ab09d072a28b02..5b9a69212ae8f39bf41fbf2194db2b730dcb0ae9
GIT binary patch
literal 1600
zc%1Fi`#%#1003~2SQat~^D2)~Q>OB2Y-`ijkz&}wZbC#-o=0t7jaVepQ|D0_8*3gJ
[EMAIL PROTECTED]|_rrbvhcAJotX5m|hB+RX(Aa5xSa4Y^GkS%y10Hva
z3^q{I&[EMAIL PROTECTED]&+M&87%65P%egC&>7+Bgzx0-lUziyCW?%ELc
z;eHsAnXOY+YY~y3f6~CD+?JujZGa*JV=V-x-twhC^~z}e+->VcW=&UqfNg97Mxf3d
zP2!VM#<4|n+(B|5rOUMBfQ=w}vEdoi_TK&[EMAIL PROTECTED]
zqYkn__U%akFI([EMAIL PROTECTED]<}W8T2DXa{`gkjO1RO?{-Yz3
z-yd-sVx%pSu0elCXI*-RPErV~&bEbl*yk6ff?mV<[EMAIL PROTECTED]
zvw~4tJo7xMUB<[EMAIL PROTECTED]pWHk^!$4x~AJ$M!$>LrjA^UDjlQD0igDHf?>kfTL8C
zT`$ToUViVhgSRSdJij81#v>3jDpj>m&hGZ(a4UQqYSc{FI)@&=mHL-D&8&McIqHsU
zfI-aCDfNmLo!AZjn=0JV{sMJcsTSiO;$}r>P(?>*s6&cpc-Lu5__+c38NPW>O=Ze3
zuFNT^b+XahK^P*sIU93zA4btzW?3CiviE&Xi9>Bo
zUeM*=71Le)&!sv725-*vgS1naIlau
zItQO-*hG^8yP=|O<[EMAIL PROTECTED]
zL(7(rp`ZFnuqIS(+L{IckNZ%BsAvS7nS6!Qvhd<7{m}yj9>[EMAIL PROTECTED];vRo8VN
z{7^{BvC^#ss2zI20|7B%d7&`+;}M|Lb4kcpnT&a>ztw$PNWnC?zuEIbUm<$TLPu{@
zFY}#Iu~hB^StZra*ohA%zVI_1hLduWZa6S~5X9x-A^(HQYtpfwIrX2BO0ucP;atl(
[EMAIL PROTECTED]&U#BnZ>E7FQ)[EMAIL PROTECTED]
zBp#{xc{d%j`Q}GwfJ*5YN5L1jO$egiqV_Dr=2I4()xVmuAhxjtq%Tv8Xpj0eNbDC`
zr)~pTMOKjpr4)Ts!=w3K?TKMAmsB;SKU4G([EMAIL PROTECTED]
zz=K?Re2pPf)sp-XNp`QKAIQ}l7Qjr5SbMa;Q&#UT4(|#}kQ>Zx58&XEdwZV}rT}gi
zAJ(u*feBc}#SFv;[EMAIL PROTECTED];Ya([EMAIL PROTECTED])ms1`
zEe~IK=-G~B2d_gQ&0mTKxO#ub8Q)9wI!6xVMVM}&lvlFHsV+BT+H~fhnnW~!EEnc+
zD=DAs=Y01@@>AfcR=Yz7$Z5mZ8P~c0

From that I don't know what can be done with the diff. Looking at the
Mercurial source code suggests that you can encode deltas in the patch,
but that Mercurial doesn't support it (see "# TODO: deltas"):
http://www.selenic.com/hg/index.cgi/file/cbdfd08eabc9/mercurial/patch.py#l1117

A basic explanation of binary diffs is here:
http://www.selenic.com/pipermail/mercurial/2008-July/020184.html
The explanation mentions base-64, but that was corrected in a later
message here:
http://www.selenic.com/pipermail/mercurial/2008-July/020192.html

Regards

Antoine.


PS: here are the commands I've typed:

$ hg init bindiff
$ cd bindiff/
$ dd if=/dev/urandom of=part1 bs=1 count=400
[snip output]
$ dd if=/dev/urandom of=part2 bs=1 count=400
[snip output]
$ dd if=/dev/urandom of=part3 bs=1 count=400
[snip output]
$ cat part1 part3 > binfile
$ hg add binfile
$ hg ci -m "added binfile"
$ cat part1 part2 part3 > binfile
$ hg di
diff -r 19cfb10c4a01 binfile
Binary file binfile has changed
$ hg di --git
[produces the patch above]
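
A rough sketch of how a "literal" hunk like the one above appears to be put together. The layout is an assumption, not taken from the git or Mercurial sources: the payload is zlib-compressed and split into chunks of at most 52 bytes, and each line starts with one letter giving the chunk's byte count (A-Z for 1-26, a-z for 27-52) followed by the chunk in base-85. The function names are hypothetical, and base64.b85encode requires Python 3.4 or later.

```python
import base64
import zlib


def length_char(n: int) -> str:
    """Map a chunk length of 1..52 to 'A'..'Z', 'a'..'z'."""
    if not 1 <= n <= 52:
        raise ValueError("chunk length out of range")
    return chr(ord("A") + n - 1) if n <= 26 else chr(ord("a") + n - 27)


def encode_literal(data: bytes) -> str:
    """Build a 'literal' hunk body under the assumptions stated above."""
    compressed = zlib.compress(data)
    lines = [f"literal {len(data)}"]  # header carries the uncompressed size
    for i in range(0, len(compressed), 52):
        chunk = compressed[i:i + 52]
        # pad=True fills the last group with zero bytes up to a multiple of 4
        body = base64.b85encode(chunk, pad=True).decode("ascii")
        lines.append(length_char(len(chunk)) + body)
    return "\n".join(lines) + "\n"


def decode_literal(text: str) -> bytes:
    """Inverse of encode_literal."""
    header, *body = text.splitlines()
    size = int(header.split()[1])
    compressed = bytearray()
    for line in body:
        c = line[0]
        n = ord(c) - ord("A") + 1 if c <= "Z" else ord(c) - ord("a") + 27
        compressed += base64.b85decode(line[1:])[:n]  # drop the padding
    data = zlib.decompress(bytes(compressed))
    if len(data) != size:
        raise ValueError("size mismatch")
    return data


sample = bytes(range(256)) * 3
print(decode_literal(encode_literal(sample)) == sample)
```

Under these assumptions a full line is the letter z plus 65 base-85 characters, which would explain why most lines of the patch start with z.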



[Python-Dev] What to do with uuid?

2008-08-02 Thread Brett Cannon

I was running the test suite and I noticed test_uuid's warning message
is still up:

test_uuid
WARNING: uuid.getnode is unreliable on many platforms.
It is disabled until the code and/or test can be fixed properly.
WARNING: uuid._ifconfig_getnode is unreliable on many platforms.
It is disabled until the code and/or test can be fixed properly.
WARNING: uuid._unixdll_getnode is unreliable on many platforms.
It is disabled until the code and/or test can be fixed properly.

The state of uuid has been like this for a while now. Are we going to
actively try to fix this, or was uuid a mistake? I guess my real
question is how long are we willing to let the code sit in this
partial state before we decide the module just is not getting enough
attention?

-Brett

[Python-Dev] Base-95 (Re: Base-96)

2008-08-02 Thread Greg Ewing


Kless wrote:
> So the next possible encoding would be base-128 (7-bit encoding)

A while ago I wanted to pack as much information as
possible into a string of printable characters, and
I came up with a base-95 encoding that packs 9 bytes
into 11 characters.

The application involved representing data using
Python string literals, so it was important that only
printable characters were used. I settled on the
9/11 combination as a reasonable compromise between
packing efficiency and not having the block size
too long.

If anyone's interested, I could dig out the
encoding and decoding routines I wrote.

--
Greg
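
The 9-into-11 packing works because 95^11 is slightly larger than 2^72, so eleven base-95 digits can hold any 9-byte block. The sketch below is not the routine referred to in the message, which is not shown in this thread; it assumes the alphabet is the 95 printable ASCII characters from space through "~", and the function names are hypothetical.

```python
BASE = 95
FIRST = 0x20  # space, the first printable ASCII character


def encode_block(block: bytes) -> str:
    """Pack exactly 9 bytes into 11 printable ASCII characters."""
    if len(block) != 9:
        raise ValueError("expected a 9-byte block")
    n = int.from_bytes(block, "big")
    chars = []
    for _ in range(11):
        n, rem = divmod(n, BASE)
        chars.append(chr(FIRST + rem))
    return "".join(reversed(chars))


def decode_block(text: str) -> bytes:
    """Inverse of encode_block."""
    if len(text) != 11:
        raise ValueError("expected 11 characters")
    n = 0
    for ch in text:
        digit = ord(ch) - FIRST
        if not 0 <= digit < BASE:
            raise ValueError("character outside the printable ASCII range")
        n = n * BASE + digit
    return n.to_bytes(9, "big")


block = bytes(range(1, 10))
print(decode_block(encode_block(block)) == block)
```

An encoder intended for Python string literals would also need to deal with the backslash and quote characters, which this sketch does not attempt.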
