Re: A few questions about encoding

2013-06-13 Thread jmfauth

--

UTF-8, Unicode (consortium): 1 to 4 *Unicode Transformation Units*

UTF-8, ISO 10646: 1 to 6 *Unicode Transformation Units*

(still current, unless very recently modified)

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Changing filenames from Greeklish => Greek (subprocess complains)

2013-06-10 Thread jmfauth

-

A coding scheme works with three sets: a *unique* set
of CHARACTERS, a *unique* set of CODE POINTS and a *unique*
set of ENCODED CODE POINTS, Unicode or not.

The relation between the set of characters and the set of the
code points is a *human* table, created with a sheet of paper
and a pencil, a deliberate choice of characters with integers
as labels.

The relation between the set of the code points and the
set of encoded code points is a mathematical operation.

In the case of an 8bits coding scheme, like iso-XXX,
this operation is a no-op, the relation is an identity.
Shortly: set of code points == set of encoded code points.
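This identity can be sketched in a few lines of Python (my own illustration, not from the original post; the string 'abcé' is arbitrary):

```python
# For an 8-bit scheme such as ISO 8859-1 (latin-1), encoding is the
# identity on the code-point values: each encoded byte equals the
# character's code point.
text = "abcé"
encoded = text.encode("latin-1")
assert list(encoded) == [ord(c) for c in text]
print(list(encoded))  # [97, 98, 99, 233]
```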

In the case of Unicode, the Unicode consortium endorses
three such mathematical operations, called UTF-8, UTF-16 and
UTF-32, where UTF means Unicode Transformation Format, a
confusing wording meaning at the same time the process
and the result of the process. This Unicode Transformation does
not produce bytes; it produces words/chunks/tokens of *bits* with
lengths 8, 16, 32, called Unicode Transformation Units (hence
the names UTF-8, -16, -32). At this level, only a structure has
been defined (there is no computing). Very important: a healthy
coding scheme works conceptually only with this *unique* set
of encoded code points, not with bytes, characters or code points.

The last step is the machine implementation: it is up to the
processor, the compiler, the language to implement all these
Unicode Transformation Units with, of course, their related
specificities: char, w_char, int, long, endianness, rune (Go
language), ...
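A quick sketch of the three transformation formats side by side (my own illustration; the euro sign is an arbitrary BMP example):

```python
# The same code point U+20AC (€) expressed in the three Unicode
# transformation formats; the code units are 8, 16 and 32 bits wide.
ch = "\u20ac"                   # EURO SIGN, code point 0x20AC
utf8 = ch.encode("utf-8")       # three 8-bit code units
utf16 = ch.encode("utf-16-be")  # one 16-bit code unit (2 bytes)
utf32 = ch.encode("utf-32-be")  # one 32-bit code unit (4 bytes)
print(len(utf8), len(utf16), len(utf32))  # 3 2 4
```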

Not too over-simplified nor too over-complicated, and enough
to understand one, if not THE, design mistake of the flexible
string representation.

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Changing filenames from Greeklish => Greek (subprocess complains)

2013-06-05 Thread jmfauth
On 5 juin, 19:43, Νικόλαος Κούρας nikos.gr...@gmail.com wrote:
 On Wednesday, June 5, 2013 8:56:36 AM UTC+3, Steven D'Aprano wrote:

 Somehow, I don't know how because I didn't see it happen, you have one or
 more files in that directory where the file name as bytes is invalid when
 decoded as UTF-8, but your system is set to use UTF-8. So to fix this you
 need to rename the file using some tool that doesn't care quite so much
 about encodings. Use the bash command line to rename each file in turn
 until the problem goes away.

 But renaming via shell access like 'mv 'Euxi tou Ihsou.mp3' 'Ευχή του 
 Ιησού.mp3'' leads to that unknown encoding of this bytestream 
 '\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3'

 But please tell me Steven what linux tool you think can encode the weird 
 filename to proper 'Ευχή του Ιησού.mp3' utf-8?

 or we can write a script as I suggested to decode back the bytestream using 
 all sorts of available decode charsets, boiling down to the original greek 
 letters.

---

see
http://bugs.python.org/issue13643, msg msg149949 - (view)   Author:
Antoine Pitrou (pitrou)


Quote:

So, you're complaining about something which works, kind of:

$ touch héhé
$ LANG=C python3 -c "import os; print(os.listdir())"
['h\udcc3\udca9h\udcc3\udca9']

 This makes robustly working with non-ascii filenames on different
 platforms needlessly annoying, given no modern nix should have problems
 just using UTF-8 in these cases.

So why don't these supposedly modern systems at least set the
appropriate environment variables for Python to infer the proper
character encoding?
(since these modern systems don't have a well-defined encoding...)

Answer: because they are not modern at all, they are antiquated,
inadapted and obsolete pieces of software designed and written by
clueless Anglo-American people. Please report bugs against these
systems. The culprit is not Python, it's the Unix crap and the utterly
clueless attitude of its maintainers (filesystems are just bytes,
yeah, whatever...).
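The surrogate-escaped names in Antoine's listing can be reproduced without touching the filesystem; a sketch of how the 'surrogateescape' error handler (PEP 383) smuggles undecodable bytes through as lone surrogates:

```python
# Under LANG=C the filesystem encoding is ASCII; the UTF-8 bytes of
# 'héhé' cannot be decoded, so each stray byte is mapped to a lone
# surrogate in the range U+DC80..U+DCFF.
raw = "héhé".encode("utf-8")                  # b'h\xc3\xa9h\xc3\xa9'
name = raw.decode("ascii", "surrogateescape")
print(name == "h\udcc3\udca9h\udcc3\udca9")   # True
# The original bytes survive a round trip:
assert name.encode("ascii", "surrogateescape") == raw
```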

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PyWart: The problem with print

2013-06-04 Thread jmfauth
On 2 juin, 20:09, Rick Johnson rantingrickjohn...@gmail.com wrote:

 I never purposely inject ANY superfluous cycles in my code except in
 the case of testing or development. To me it's about professionalism.
 Let's consider a thought exercise shall we?


The flexible string representation is the perfect example
of this lack of professionalism.
Wrong by design, a non-understanding of mathematical logic,
of the coding of characters, of Unicode and of the usage of
characters (everything is tied together).

How is it possible to arrive at such a situation?
The answer is far beyond my understanding (although
I have my opinion on the subject).

jmf


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python b'...' notation

2013-05-31 Thread jmfauth
On 31 mai, 00:19, alcyon st...@terrafirma.us wrote:
 On Wednesday, May 29, 2013 3:19:42 PM UTC-7, Cameron Simpson wrote:
  On 29May2013 13:14, Ian Kelly ian.g.ke...@gmail.com wrote:

  | On Wed, May 29, 2013 at 12:33 PM, alcyon st...@terrafirma.us wrote:

  |  This notation displays hex values except when they are 'printable', in 
  which case it displays that printable character.  How do I get it to force 
  hex for all bytes?  Thanks, Steve

  |

  | Is this what you want?

  |

  |  ''.join('%02x' % x for x in b'hello world')

  | '68656c6c6f20776f726c64'

  Not to forget binascii.hexlify.

  --

  Cameron Simpson c...@zip.com.au

  Every particle continues in its state of rest or uniform motion in a 
  straight

  line except insofar as it doesn't.      - Sir Arthur Eddington

 Thanks for the binascii.hexlify tip. I was able to make it work but I did 
 have to write a function to get exactly the string I wanted.  I wanted, 
 for example, b'\n\x00' to display as 0x0A 0x00 or b'!\xff(\xc0' to 
 display as 0x21 0xFF 0x28 0xC0.



>>> a = b'!\xff(\xc0\n\x00'
>>> z = ['0x{:02X}'.format(c) for c in a]
>>> z
['0x21', '0xFF', '0x28', '0xC0', '0x0A', '0x00']
>>> s = ' '.join(z)
>>> s
'0x21 0xFF 0x28 0xC0 0x0A 0x00'
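The same formatting can be built on binascii.hexlify, as Cameron suggested; a sketch (hexlify gives the bare hex digits, the rest is cosmetic):

```python
import binascii

data = b'!\xff(\xc0\n\x00'
# hexlify returns bytes of ASCII hex digits, two per input byte.
hex_pairs = binascii.hexlify(data).decode("ascii").upper()
s = " ".join("0x" + hex_pairs[i:i + 2] for i in range(0, len(hex_pairs), 2))
print(s)  # 0x21 0xFF 0x28 0xC0 0x0A 0x00
```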


jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to get an integer from a sequence of bytes

2013-05-30 Thread jmfauth
On 30 mai, 20:42, Ian Kelly ian.g.ke...@gmail.com wrote:
 On Thu, May 30, 2013 at 12:26 PM, Mok-Kong Shen

 mok-kong.s...@t-online.de wrote:
  Am 27.05.2013 17:30, schrieb Ned Batchelder:

  On 5/27/2013 10:45 AM, Mok-Kong Shen wrote:

  From an int one can use to_bytes to get its individual bytes,
  but how can one reconstruct the int from the sequence of bytes?

  The next thing in the docs after int.to_bytes is int.from_bytes:
 http://docs.python.org/3.3/library/stdtypes.html#int.from_bytes

  I am sorry to have overlooked that. But one thing I still wonder is why
  there is no direct possibility of converting a byte to an int in [0,255],
  i.e. with a construct int(b), where b is a byte.

 The bytes object can be viewed as a sequence of ints.  So if b is a
 bytes object of non-zero length, then b[0] is an int in range(0, 256).
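Ian's two points can be sketched in a short round trip (my own illustration, not from the thread):

```python
# int.to_bytes / int.from_bytes round-trip, plus indexing a bytes
# object to get a single byte back as an int in range(0, 256).
n = 0x1234
b = n.to_bytes(2, "big")
assert int.from_bytes(b, "big") == n
assert b[0] == 0x12 and b[1] == 0x34
print(int.from_bytes(b"\xff", "big"))  # 255
```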



Well, Python now speaks only integers; the rest is
convenience, and there is a good coherency.

>>> bin(255)
'0b11111111'
>>> oct(255)
'0o377'
>>> 255
255
>>> hex(255)
'0xff'

>>> int('0b11111111', 2)
255
>>> int('0o377', 8)
255
>>> int('255')
255
>>> int('0xff', 16)
255

>>> 0b11111111
255
>>> 0o377
255
>>> 255
255
>>> 0xff
255

>>> type(0b11111111)
<class 'int'>
>>> type(0o377)
<class 'int'>
>>> type(255)
<class 'int'>
>>> type(0xff)
<class 'int'>

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Harmonic distortion of a input signal

2013-05-23 Thread jmfauth
On 20 mai, 19:56, Christian Gollwitzer aurio...@gmx.de wrote:
 Oops, I thought we were posting to comp.dsp. Nevertheless, I think
 numpy.fft does mixed-radix (can't check it now)

 Am 20.05.13 19:50, schrieb Christian Gollwitzer:

  Am 20.05.13 19:23, schrieb jmfauth:
  Non sense.

  Dito.

  The discrete fft algorithm is valid only if the number of data
  points you transform does correspond to a power of 2 (2**n).

   Where did you get this? The DFT is defined for any integer number of
   points the same way.

  Just if you want to get it fast, you need to worry about the length. For
  powers of two, there is the classic Cooley-Tukey. But there do exist FFT
  algorithms for any other length. For example, there is the Winograd
  transform for a set of small numbers, there is mixed-radix to reduce
  any length which can be factored, and there is finally Bluestein which
  works for any size, even for a prime. All of the aforementioned
   algorithms are O(n log n) and are implemented in typical FFT packages. All
  of them should result (up to rounding differences) in the same thing as
  the naive DFT sum. Therefore, today

  Keywords to the problem: apodization, zero filling, convolution
  product, ...

  Not for a periodic signal of integer length.

  eg.http://en.wikipedia.org/wiki/Convolution

   How long have you been reading this group?

       Christian
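Christian's point that the DFT is defined for any length, power of two or not, can be checked with a naive sum; a minimal sketch (pure Python, length 6, verified via the inverse transform):

```python
import cmath

def dft(x):
    # Naive O(N^2) forward DFT, valid for any length N.
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    # Naive inverse DFT with the 1/N normalisation.
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

signal = [1.0, 2.0, 0.0, -1.0, 1.5, 0.5]   # length 6, not a power of 2
roundtrip = idft(dft(signal))
assert all(abs(a - b) < 1e-9 for a, b in zip(signal, roundtrip))
```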

--

Forget what I wrote.
I understand what I wanted to say; it was badly
formulated.

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Harmonic distortion of a input signal

2013-05-20 Thread jmfauth
Non sense.

The discrete fft algorithm is valid only if the number of data
points you transform does correspond to a power of 2 (2**n).

Keywords to the problem: apodization, zero filling, convolution
product, ...

eg. http://en.wikipedia.org/wiki/Convolution

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Diacritical-insensitive search

2013-05-17 Thread jmfauth

The handling of diacriticals is an especially nice case
study. One can use it to toy with some specific features of
Unicode: normalisation, decomposition, ...

... and also to show how Unicode can be badly implemented.

A first, quick example that came to my mind (Py3.2.5 and Py3.3.2):

# Py 3.2.5
>>> timeit.repeat("ud.normalize('NFKC', ud.normalize('NFKD', 'ᶑḗḖḕḹ'))", "import unicodedata as ud")
[2.929404406789672, 2.923327801150208, 2.923659417064755]

# Py 3.3.2
>>> timeit.repeat("ud.normalize('NFKC', ud.normalize('NFKD', 'ᶑḗḖḕḹ'))", "import unicodedata as ud")
[3.8437222586746884, 3.829490737203514, 3.819266963414293]
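What the timings above exercise can be shown in isolation; a sketch of the two normalization forms (my own illustration, using U+00E9 rather than the exotic characters above):

```python
import unicodedata as ud

# NFD decomposes a precomposed character into base letter plus
# combining mark; NFC recomposes it.
s = "é"                            # U+00E9, one code point
d = ud.normalize("NFD", s)         # 'e' + U+0301 COMBINING ACUTE ACCENT
assert len(s) == 1 and len(d) == 2
assert ud.normalize("NFC", d) == s  # recomposition round-trips
print([hex(ord(c)) for c in d])    # ['0x65', '0x301']
```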

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PDF generator decision

2013-05-14 Thread jmfauth
On 14 mai, 17:05, Christian Jurk co...@commx.ws wrote:
 Hi folks,

 This question may have been asked several times already, but the development of 
 relevant software continues day by day. For some time now I've been using 
 xhtml2pdf [1] to generate PDF documents from HTML templates (which are 
 rendered through my Django-based web application). This has been working for 
 some time now but I'm constantly adding new templates and they are not 
 looking like I want (sometimes bold text is bold, sometimes not, layout 
 issues, etc.). I'd like to use something else than xhtml2pdf.

 So far I'd like to ask which is the (probably) best way to create PDFs in 
 Python (3)? It is important for me that I am able to specify not only 
 background graphics, paragraphs, tables and so on but also to specify page 
 headers/footers. The reason is that I have a bunch of documents to be 
 generated (including Invoice templates, Quotes - stuff like that).

 Any advice is welcome. Thanks.

 [1]https://github.com/chrisglass/xhtml2pdf

-

1) Use Python to collect your data (db, pictures, texts, ...)
and/or to create the material (text, graphics, ...) that will
be the contents (source) of your pdf's.
2) Put this source in a .tex file (a plain text file).
3) Let it compile with a TeX engine.

- I cannot imagine something more versatile and basically
simpler (writing a text file).
- Do not forget you are the only one who knows the content
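A minimal sketch of steps 1-3 (my own illustration; the file name invoice.tex, the row data and the pdflatex call are all assumptions, and the final step assumes a local TeX installation):

```python
import subprocess          # only needed for the optional compile step
from pathlib import Path

# 1) Collect the data in Python.
rows = [("Widget", 3, 12.50), ("Gadget", 1, 99.00)]

# 2) Write a plain-text .tex source built from that data.
body = "\n".join(r"%s & %d & %.2f \\" % r for r in rows)
source = (
    "\\documentclass{article}\n"
    "\\begin{document}\n"
    "\\begin{tabular}{lrr}\n"
    + body + "\n"
    "\\end{tabular}\n"
    "\\end{document}\n"
)
Path("invoice.tex").write_text(source, encoding="utf-8")

# 3) Hand it to a TeX engine (uncomment if TeX is installed):
# subprocess.run(["pdflatex", "invoice.tex"], check=True)
```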

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unicode humor

2013-05-10 Thread jmfauth
On 8 mai, 15:19, Roy Smith r...@panix.com wrote:
 Apropos to any of the myriad unicode threads that have been going on
 recently:

 http://xkcd.com/1209/

--


This reflects a lack of understanding of Unicode.

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why do Perl programmers make more money than Python programmers

2013-05-07 Thread jmfauth
On 6 mai, 09:49, Fábio Santos fabiosantos...@gmail.com wrote:
 On 6 May 2013 08:34, Chris Angelico ros...@gmail.com wrote:

  Well you see, it was 70 bytes back in the Python 2 days (I'll defer to
  Steven for data points earlier than that), but with Python 3, there
  were two versions: one was 140 bytes representing 70 characters, the
  other 280 bytes representing 70 characters. In Python 3.3, they were
  merged, and a trivial amount of overhead added, so now it's 80 bytes
  representing 70 characters. But you have an absolute guarantee that
  it's correct now.

  Of course, the entire code can be represented as a single int now. You
  used to have to use a long.

  ChrisA
  --

 Thanks. You have made my day.

 I may raise the average pay of a Python programmer in Portugal. I have asked
 for a raise back in December, and was told that it wouldn't happen before
 this year. I have done well. I think I deserve better pay than a
 supermarket employee now. I am sure that my efforts were appreciated and I
 will be rewarded. I am being sarcastic.

 The above paragraph wouldn't be true if I programmed in perl, c++ or lisp.


-


1) The memory gain for many of us (usually non-ascii users)
just becomes irrelevant.

>>> sys.getsizeof('maçã')
41
>>> sys.getsizeof('abcd')
29

2) More critical: Py 3.3 just becomes non-Unicode-compliant
(eg for European languages or ascii typographers!)

>>> import timeit
>>> timeit.timeit("'abcd'*1000 + 'a'")
2.186670111428325
>>> timeit.timeit("'abcd'*1000 + '€'")
2.9951699820528432
>>> timeit.timeit("'abcd'*1000 + 'œ'")
3.0036780444886233
>>> timeit.timeit("'abcd'*1000 + 'ẞ'")
3.004992278824048
>>> timeit.timeit("'maçã'*1000 + 'œ'")
3.231025618708202
>>> timeit.timeit("'maçã'*1000 + '€'")
3.215894398100758
>>> timeit.timeit("'maçã'*1000 + 'œ'")
3.224407974255655
>>> timeit.timeit("'maçã'*1000 + '’'")
3.2206342273566406
>>> timeit.timeit("'abcd'*1000 + '’'")
2.991440344906

3) Python is proud to cover the whole unicode range;
unfortunately it breaks the BMP range.
Small GvR example (ascii) from the bug list,
but with non ascii characters.

# Py 3.2, all chars

>>> timeit.repeat("a = 'hundred'; 'x' in a")
[0.09087790617297742, 0.07456871885972305, 0.07449940353376405]
>>> timeit.repeat("a = 'maçãé€ẞ'; 'x' in a")
[0.10088136800095526, 0.07488497003487282, 0.07497594640028638]


# Py 3.3 ascii and non ascii chars
>>> timeit.repeat("a = 'hundred'; 'x' in a")
[0.11426985953005442, 0.10040049292649655, 0.09920834808588097]
>>> timeit.repeat("a = 'maçãé€ẞ'; 'é' in a")
[0.2345595188256766, 0.21637172864154763, 0.2179096624382737]


There are plenty of good reasons to use Python. There are
also plenty of good reasons to not use (or now to drop)
Python and to realize that if you wish to process text
seriously, you are better served by using corporate
products or tools using Unicode properly.

jmf


-- 
http://mail.python.org/mailman/listinfo/python-list


Is Unicode support so hard...

2013-04-20 Thread jmfauth
In a previous post,

http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226#
,

Chris “Kwpolska” Warrick wrote:

“Is Unicode support so hard, especially in the 21st century?”

--

Unicode is not really complicated and it works very well (more
than two decades of development if you take into account
iso-14).

But - I can say, as usual - people prefer to spend their
time making a better Unicode than Unicode, and it usually
fails. Python does not escape this rule.

-

I'm busy with TeX (a unicode engine variant), fonts and typography.
This gives me plenty of ideas to test the flexible string
representation (FSR). I should recognize this FSR is failing
particularly well...

I can almost say, a delight.

jmf
Unicode lover
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: While loop help

2013-04-09 Thread jmfauth
On 9 avr, 15:32, thomasancill...@gmail.com wrote:
 I'm new to learning python and creating a basic program to convert units of 
 measurement which I will eventually expand upon, but I'm trying to figure out 
 how to loop the entire program. When I insert a while loop it only loops the 
 first 2 lines. Can someone provide a detailed beginner-friendly explanation? 
 Here is my program.

 #!/usr/bin/env python
 restart = "true"
 while restart == "true":
 #Program starts here
     print "To start the Unit Converter please type the number next to the conversion you would like to perform"
     choice = input("\n1:Inches to Meter\n2:Milliliters to Pint\n3:Acres to Square-Miles\n")

 #If user enters 1:Program converts inches to meters
     if choice == 1:
         number = int(raw_input("\n\nType the amount in Inches you would like to convert to Meters.\n"))
         operation = "Inches to Meters"
         calc = round(number * .0254, 2)
         print "\n", number, "Inches =", calc, "Meters"
         restart = raw_input("If you would like to perform another conversion type: true\n")

 #If user enters 2:Program converts milliliters to pints
     elif choice == 2:
         number = int(raw_input("\n\nType the amount in Milliliters you would like to convert to Pints.\n"))
         operation = "Milliliters to Pints"
         calc = round(number * 0.0021134, 2)
         print "\n", number, "Milliliters =", calc, "Pints"
         restart = raw_input("If you would like to perform another conversion type: true\n")

 #If user enters 3:Program converts kilometers to miles
     elif choice == 3:
         number = int(raw_input("\n\nType the amount in Kilometers you would like to convert to Miles.\n"))
         operation = "Kilometers to Miles"
         calc = round(number * 0.62137, 2)
         print "\n", number, "Kilometers =", calc, "Miles"
         restart = raw_input("If you would like to perform another conversion type: true\n")

-

More (very) important:

meter: lower case m
kilometre: lower case k
milli: lower case m

http://www.bipm.org/en/home/



Less important:

Start with something simple and increase the complexity eg:

# Py 3.2
>>> while True:
...     s = input('km: ')
...     if s == 'q':
...         break
...     a = float(s)
...     print('{} [kilometre] == {} [metre]'.format(a, a * 1000))
...
km: 1
1.0 [kilometre] == 1000.0 [metre]
km: 1.3456
1.3456 [kilometre] == 1345.6 [metre]
km: q

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: In defence of 80-char lines

2013-04-04 Thread jmfauth
On 4 avr, 03:36, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:
 Although PEP 8 is only compulsory for the Python standard library, many
 users like to stick to PEP 8 for external projects.

 http://www.python.org/dev/peps/pep-0008/

 With perhaps one glaring exception: many people hate, or ignore, PEP 8's
 recommendation to limit lines to 80 characters. (Strictly speaking, 79
 characters.)

 Here is a good defence of 80 char lines:

 http://wrongsideofmemphis.com/2013/03/25/80-chars-per-line-is-great/

 --
 Steven

-

With unicode fonts, where even monospaced fonts
present char widths that vary depending on
the unicode block (for obvious reasons), speaking of a text
width in chars does not even make sense.
jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of int/long in Python 3

2013-04-03 Thread jmfauth


This FSR is wrong by design. A naive way to embrace Unicode.

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of int/long in Python 3

2013-04-02 Thread jmfauth
On 2 avr, 01:43, Neil Hodgson nhodg...@iinet.net.au wrote:
 Mark Lawrence:

  You've given many examples of the same type of micro benchmark, not many
  examples of different types of benchmark.

     Trying to work out what jmfauth is on about, I found what appears to
 be a performance regression with '<' string comparisons on Windows
 64-bit. It's around 30% slower on a 25 character string that differs in
 the last character and 70-100% on a 100 character string that differs at
 the end.

     Can someone else please try this to see if it's reproducible? Linux
 doesn't show this problem.

  c:\python32\python -u charwidth.py
 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)]
 a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']176
 [0.7116295577956576, 0.7055591343157613, 0.7203483026429418]

 a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']176
 [0.7664397841378787, 0.7199902325464409, 0.713719289812504]

 a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']176
 [0.7341851791817691, 0.6994205901833599, 0.7106807593741005]

 a=['C:/Users/Neil/Documents/','C:/Users/Neil/Documents/']180
 [0.7346812372666784, 0.699543377914, 0.7064768417728411]

  c:\python33\python -u charwidth.py
 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit
 (AMD64)]
 a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']108
 [0.9913326076446045, 0.9455845241056282, 0.9459076605341776]

 a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']192
 [1.0472289217234318, 1.0362342484091207, 1.0197109728048384]

 a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']192
 [1.0439643704533834, 0.9878581050301687, 0.9949265834034335]

 a=['C:/Users/Neil/Documents/','C:/Users/Neil/Documents/']312
 [1.0987483965446412, 1.0130257167690004, 1.024832248526499]

     Here is the code:

 # encoding:utf-8
 import os, sys, timeit
 print(sys.version)
 examples = [
 "a=['$b','$z']",
 "a=['$λ','$η']",
 "a=['$b','$η']",
 "a=['$\U00020000','$\U00020001']"]
 baseDir = "C:/Users/Neil/Documents/"
 #~ baseDir = "C:/Users/Neil/Documents/Visual Studio 2012/Projects/Sigma/QtReimplementation/HLFKBase/Win32/x64/Debug"
 for t in examples:
      t = t.replace("$", baseDir)
      # Using os.write as a simple way to get UTF-8 to stdout
      os.write(sys.stdout.fileno(), t.encode("utf-8"))
      print(sys.getsizeof(t))
      print(timeit.repeat("a[0] < a[1]", t, number=500))
      print()

     For a more significant performance difference try replacing the
 baseDir setting with (may be wrapped):
 baseDir = "C:/Users/Neil/Documents/Visual Studio 2012/Projects/Sigma/QtReimplementation/HLFKBase/Win32/x64/Debug"

     Neil



Hi,

c:\python32\pythonw -u charwidth.py
3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]
a=['D:\jm\jmpy\py3app\stringbenchb','D:\jm\jmpy\py3app
\stringbenchz']168
[0.8343414906182101, 0.8336184057396241, 0.8330473419738562]

a=['D:\jm\jmpy\py3app\stringbenchλ','D:\jm\jmpy\py3app
\stringbenchη']168
[0.818378092261062, 0.8180854713107406, 0.8192279926793571]

a=['D:\jm\jmpy\py3app\stringbenchb','D:\jm\jmpy\py3app
\stringbenchη']168
[0.8131353330542339, 0.8126985677326912, 0.8122744051977042]

a=['D:\jm\jmpy\py3app\stringbench𠀀','D:\jm\jmpy\py3app
\stringbench𠀁']172
[0.8271094603211102, 0.82704053883214, 0.8265781741004083]

Exit code: 0
c:\Python33\pythonw -u charwidth.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit
(Intel)]
a=['D:\jm\jmpy\py3app\stringbenchb','D:\jm\jmpy\py3app
\stringbenchz']94
[1.3840254166697845, 1.3933888932429768, 1.391664674507438]

a=['D:\jm\jmpy\py3app\stringbenchλ','D:\jm\jmpy\py3app
\stringbenchη']176
[1.6217970707185678, 1.6279369907932706, 1.6207041728220117]

a=['D:\jm\jmpy\py3app\stringbenchb','D:\jm\jmpy\py3app
\stringbenchη']176
[1.5150522562729396, 1.5130369919353992, 1.5121890607025037]

a=['D:\jm\jmpy\py3app\stringbench𠀀','D:\jm\jmpy\py3app
\stringbench𠀁']316
[1.6135375194801664, 1.6117739170366434, 1.6134331526540109]

Exit code: 0

- win7 32-bits
- The file is in utf-8
- Do not be afraid by this output; it is just a copy/paste from your
excellent editor, whose output pane is configured to use the locale
coding.
- Of course, and as expected, similar behaviour from a console. (Which
btw shows how good your application is.)

==

Something different.

From a previous msg, on this thread.

---

 Sure. And over a different set of samples, it is less compact. If you
 write a lot of Latin-1, Python will use one byte per character, while
 UTF-8 will use two bytes per character.

I think you mean writing a lot of Latin-1 characters outside ASCII.
However, even people writing texts in, say, French will find that only
a small proportion of their text is outside ASCII and so the cost of
UTF-8 is correspondingly small.

The counter-problem is that a French document that needs to include
one mathematical symbol (or emoji) outside Latin-1 will double in size
as a Python string

Re: Performance of int/long in Python 3

2013-04-02 Thread jmfauth
On 2 avr, 10:03, Chris Angelico ros...@gmail.com wrote:
 On Tue, Apr 2, 2013 at 6:24 PM, jmfauth wxjmfa...@gmail.com wrote:
  An editor may reflect very well the example a gave. You enter
  thousand ascii chars, then - boum - as you enter a non ascii
  char, your editor (assuming is uses a mechanism like the FSR),
  has to internally reencode everything!

 That assumes that the editor stores the entire buffer as a single
 Python string. Frankly, I think this unlikely; the nature of
 insertions and deletions makes this impractical. (I've known editors
 that do function this way. They're utterly unusable on large files.)

 ChrisA



No, no, no, no, ... as we say in French (this is a kindly
form).

The length of a string may have its importance. This
bad behaviour may happen on every char. The most
complicated chars are the chars with diacritics and
ligatured chars [1, 2], eg chars used in Arabic script [2].

It is somehow funny to see that the FSR fails precisely
on problems Unicode is meant to solve/handle, eg normalization or
sorting [3].

Not really a problem for those who are endorsing the good
work Unicode does [5].


[1] A point which was not, in my mind, very well understood
when I read the PEP393 discussion.

[2] Take a unicode TeX compliant engine and toy with
the decomposed form of these chars. A very good way, to
understand what can be really a char, when you wish to
process text seriously.

[3] I only test and tested these chars blindly with the help
of the doc I have. Btw, when I test complicated Arabic chars,
I noticed Py33 crashes; it does not really crash, it gets stuck
in some kind of infinite loop (or is it due to timeit?).

[4] Am I the only one who tests this kind of stuff?

[5] Unicode is a fascinating construction.

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of int/long in Python 3

2013-04-02 Thread jmfauth
On 2 avr, 10:35, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:
 On Tue, 02 Apr 2013 19:03:17 +1100, Chris Angelico wrote:

 So what? Who cares if it takes 0.2 second to insert a character
 instead of 0.1 second? That's still a hundred times faster than you
 can type.

-

This is not the problem. The interesting point is that there
are good and less good Unicode implementations.

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of int/long in Python 3

2013-04-02 Thread jmfauth
On 2 avr, 16:03, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:
 On Tue, 02 Apr 2013 11:58:11 +0100, Steve Simmons wrote:

 I'm sure you didn't intend to be insulting, but some of us *have* taken
 JMF seriously, at least at first. His repeated overblown claims of how
 Python is destroying Unicode ...


Sorry, I never claimed this; I'm just seeing how Python is becoming
less Unicode friendly.


 This feature is a *memory optimization*, not a speed optimization,

I totally agree, and utf-8 is doing that with great art (see Neil
Hodgson's comment).
(Do not interpret this as if I'm saying Python should use utf-8, as
I have read.)

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of int/long in Python 3

2013-04-02 Thread jmfauth
On 2 avr, 18:57, rusi rustompm...@gmail.com wrote:
 On Apr 2, 8:17 pm, Ethan Furman et...@stoneleaf.us wrote:

  Simmons (too many Steves!), I know you're new so don't have all the history 
  with jmf that many
  of us do, but consider that the original post was about numbers, had 
  nothing to do with
  characters or unicode *in any way*, and yet jmf still felt the need to 
  bring unicode up.

 Just for reference, here is the starting para of Chris' original mail
 that started this thread.

  The Python 3 merge of int and long has effectively penalized
  small-number arithmetic by removing an optimization. As we've seen
  from PEP 393 strings (jmf aside), there can be huge benefits from
  having a single type with multiple representations internally. Is
  there value in making the int type have a machine-word optimization in
  the same way?

 ie it mentions numbers, strings, PEP 393 *AND jmf.*  So while it is
 true that jmf has been butting in with trollish behavior into
 completely unrelated threads with his unicode rants, that cannot be
 said for this thread.

-

That's because you did not understand the analogy, int/long - FSR.

One other illustration:

>>> def AddOne(i):
...     if 0 < i <= 100:
...         return i + 10 + 10 + 10 - 10 - 10 - 10 + 1
...     elif 100 < i <= 1000:
...         return i + 100 + 100 + 100 + 100 - 100 - 100 - 100 - 100 + 1
...     else:
...         return i + 1
...

Does it work? Yes.
Is it correct? That can be discussed.

Now replace i by a char, a representative of each subset
of the FSR, select a method where this FSR behaves badly,
and take a look at what happens.


>>> timeit.repeat("'a' * 1000 + 'z'")
[0.6532032148133153, 0.6407248807756699, 0.6407264561239894]
>>> timeit.repeat("'a' * 1000 + '9'")
[0.6429508479509245, 0.6242782443215589, 0.6240490311410927]

>>> timeit.repeat("'a' * 1000 + '€'")
[1.095694927496563, 1.0696347279235603, 1.0687741939041082]
>>> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.0796421281222877, 1.0348612767961853, 1.035325216876231]
>>> timeit.repeat("'a' * 1000 + '\u2345'")
[1.0855414137412112, 1.0694677410017164, 1.0688096392412945]

>>> timeit.repeat("'œ' * 1000 + '\U00010001'")
[1.237314015362017, 1.2226262553064657, 1.21994619397816]
>>> timeit.repeat("'œ' * 1000 + '\U00010002'")
[1.245773635836997, 1.2303978424029651, 1.2258257877430765]

Where does it come from? Simple: the FSR breaks the
simple rules used in all coding schemes (unicode or not):
1) a unique set of chars;
2) the same algorithm for all chars.

And again that's why utf-8 is working very smoothly.
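The three internal widths behind these timings can be made visible directly (my own sketch; sizes are CPython 3.3+ and platform dependent, only the ordering matters):

```python
import sys

# PEP 393 chooses 1, 2 or 4 bytes per character for the whole string,
# decided by the widest character present.
ascii_s  = "a" * 100                 # 1 byte/char
bmp_s    = "a" * 99 + "€"            # 2 bytes/char: one BMP char suffices
astral_s = "a" * 99 + "\U00010001"   # 4 bytes/char: one astral char suffices
print(sys.getsizeof(ascii_s), sys.getsizeof(bmp_s), sys.getsizeof(astral_s))
assert sys.getsizeof(ascii_s) < sys.getsizeof(bmp_s) < sys.getsizeof(astral_s)
```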

The corporates which understood this very well and
wanted to incorporate, let's say, the characters used
in the French language had only the choice to
create new coding schemes (eg mac-roman, cp1252).

In unicode, the latin-1 range is a real plague.

After years of experience, I'm still fascinated to see that
the corporates have solved this issue easily while free
software is still relying on latin-1.
I have never succeeded in finding an explanation.

Even the TeX folks, when they shifted to the Cork
encoding in 199?, were aware of this and consequently
provided special package(s).

No offense, this is in my mind why corporate software
will always be corporate software and hobbyist software
will always stay at the level of hobbyist software.

A French windows user, understanding nothing of the
coding of characters, assuming he is even aware of its
existence (!), has certainly no problem.


Fascinating how it is possible to use Python to teach,
to illustrate, to explain the coding of characters. No?


jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of int/long in Python 3

2013-04-01 Thread jmfauth
-


I'm not whining and I'm not complaining (and never did).
I always exposed facts.

I'm not especially interested in Python; I'm interested in
Unicode.

Usually when I post examples, they are confirmed.


What I see is this (std download-abled Python's on Windows 7 (and
other
Windows/platforms/machines):

Py32
 import timeit
 timeit.repeat('a' * 1000 + 'ẞ')
[0.7005365263669056, 0.6810694766790423, 0.6811978680727229]
 timeit.repeat('a' * 1000 + 'z')
[0.7105829560031083, 0.6904999426964764, 0.6938637184431968]

Py33
import timeit
timeit.repeat('a' * 1000 + 'ẞ')
[1.1484035160337613, 1.1233738895227505, 1.1215708962703874]
timeit.repeat('a' * 1000 + 'z')
[0.6640958193635527, 0.6469043692851528, 0.645896142397]

I systematically see such behaviour, in 99.9% of my tests.
When there is something better, it is usually because something
else (3.2/3.3) has been modified.

I have my idea where this is coming from.

Question: when it is claimed that this has been tested,
do you mean stringbench.py, as proposed many times by Terry?
(Thanks in advance for an answer.)

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of int/long in Python 3

2013-04-01 Thread jmfauth
On 1 avr, 21:28, Chris Angelico ros...@gmail.com wrote:
 On Tue, Apr 2, 2013 at 6:15 AM, jmfauth wxjmfa...@gmail.com wrote:
  Py32
  import timeit
  timeit.repeat('a' * 1000 + 'ẞ')
  [0.7005365263669056, 0.6810694766790423, 0.6811978680727229]
  timeit.repeat('a' * 1000 + 'z')
  [0.7105829560031083, 0.6904999426964764, 0.6938637184431968]

  Py33
  import timeit
  timeit.repeat('a' * 1000 + 'ẞ')
  [1.1484035160337613, 1.1233738895227505, 1.1215708962703874]
  timeit.repeat('a' * 1000 + 'z')
  [0.6640958193635527, 0.6469043692851528, 0.645896142397]

 This is what's called a microbenchmark. Can you show me any instance
 in production code where an operation like this is done repeatedly, in
 a time-critical place? It's a contrived example, and it's usually
 possible to find regressions in any system if you fiddle enough with
 the example. Do you have, for instance, a web server that can handle
 1000 tps on 3.2 and only 600 tps on 3.3, all other things being equal?

 ChrisA

-

Of course this is an example, like the many I have given.
Examples of this kind you may find in apps.

Can you point to and give at least a handful of examples
showing there is no regression, at least to contradict me? The only
one I have managed to see (in months) is the one given by Steven, a
status quo.

I will happily accept them. The only thing I read is: this is faster,
it has been tested, ...

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]

2013-03-31 Thread jmfauth
--

Neil Hodgson:

The counter-problem is that a French document that needs to include
one mathematical symbol (or emoji) outside Latin-1 will double in size
as a Python string.

Serious developers/typographers/users know that you cannot compose
a text in French with latin-1. This is now also the case with
German (in Germany).

---

Neil's comment is correct:

 sys.getsizeof('a' * 1000 + 'z')
1026
 sys.getsizeof('a' * 1000 + '€')
2040

This is not really the problem. Serious users may notice,
sooner or later, that Python and Unicode are walking in
opposite directions (technically and in spirit).

 timeit.repeat('a' * 1000 + 'ẞ')
[1.1088995672090292, 1.0842266613261913, 1.1010779011941594]
 timeit.repeat('a' * 1000 + 'z')
[0.6362570846925735, 0.6159128762502917, 0.6200501673623791]


(Just an opinion)

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]

2013-03-28 Thread jmfauth
On 28 mar, 07:12, Ethan Furman et...@stoneleaf.us wrote:
 On 03/27/2013 08:49 PM, rusi wrote:

  In particular You are a liar is as bad as You are an idiot
  The same statement can be made non-abusively thus: ... is not true
  because ...

 I don't agree.  With all the posts and micro benchmarks and other drivel that 
 jmf has inflicted on us, I find it /very/
 hard to believe that he forgot -- which means he was deliberately lying.

 At some point we have to stop being gentle / polite / politically correct and 
 call a shovel a shovel... er, spade.

 --
 ~Ethan~

---

The problem is elsewhere. Nobody understands the examples
I gave on this list, because nobody understands Unicode.
These are not random examples; they are well
thought out.

If you understood the coding of characters,
Unicode, and what this flexible representation does, it
would not be a problem for you to create analogous examples.

So, we are going around in circles.

This flexible representation manages to accumulate in one
shot all the design mistakes it is possible to make when
one wishes to implement Unicode.

An example of a good understanding of Unicode:
if you wish 1) to preserve memory, 2) to cover the whole range
of Unicode, 3) to keep maximum performance while preserving the
good work Unicode.org has done (normalization, sorting), there
is only one solution: UTF-8. For this you have to understand
what a Unicode transformation format really is.
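Editorial sketch of the point above (not from the original post): UTF-8 is a variable-width transformation format, spending 1 to 4 bytes per code point, which is what lets it cover the whole Unicode range while staying compact for common text.

```python
# Editorial sketch: UTF-8 encodes each code point in 1 to 4 bytes,
# covering the whole Unicode range while staying compact for common text.
for ch in ('a', '\u00e9', '\u20ac', '\U00010001'):
    encoded = ch.encode('utf-8')
    print('U+%06X -> %d byte(s): %r' % (ord(ch), len(encoded), encoded))
```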

Why all the actors active in the text field, like Microsoft,
Apple, Adobe, the Unicode-compliant TeX engines, the foundries,
and the organisation in charge of the OpenType font specifications,
are able to handle all this stuff correctly (understanding +
implementation), and Python is not, I should say goes
beyond my understanding.

Python has certainly and definitively not revolutionized
Unicode.

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]

2013-03-28 Thread jmfauth
On 28 mar, 11:30, Chris Angelico ros...@gmail.com wrote:
 On Thu, Mar 28, 2013 at 8:03 PM, jmfauth wxjmfa...@gmail.com wrote:

-

 You really REALLY need to sort out in your head the difference between
 correctness and performance. I still haven't seen one single piece of
 evidence from you that Python 3.3 fails on any point of Unicode
 correctness.

That is because you do not understand Unicode. Unicode takes
you from the character to the Unicode transformation format via
the code point, working with a unique set of characters with
a contiguous range of code points.
Then it is up to the implementors (languages, compilers, ...)
to implement this UTF.

 Covering the whole range of Unicode has never been a
 problem.

... for all those who are following the scheme explained above.
And it magically works smoothly. Of course, there are some variations
due to the Character Encoding Form, which is later influenced by the
Character Encoding Scheme (the serialization of the Character
Encoding Form).

A rough explanation in other words:
it does not matter whether you are using UTF-8, -16, -32, UCS-2 or UCS-4.
All individual characters are handled in the same way, with the same
algorithm.
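An editorial illustration of that point: whichever standard UTF is chosen as the serialization, decoding gives back exactly the same sequence of characters.

```python
# Editorial illustration: any standard UTF round-trips the same string
# back to the same sequence of code points; the choice of UTF only
# changes the serialized bytes, not the characters.
s = 'abc\u00e9\u20ac\U00010001'
for enc in ('utf-8', 'utf-16', 'utf-32'):
    assert s.encode(enc).decode(enc) == s   # round-trip is lossless
print([hex(ord(c)) for c in s])
```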

---

The flexible string representation approaches the problem from the
other side: it attempts to work with the characters by using
their representations, and it (can only) fail...

PS: I never proposed to use UTF-8. I only spoke about UTF-8
as an example. If you start to discuss indexing, you are off-topic.

jmf


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]

2013-03-28 Thread jmfauth
On 28 mar, 14:01, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:
 On Thu, 28 Mar 2013 23:11:55 +1100, Neil Hodgson wrote:
  Ian Foote:


  One benefit of
  UTF-8 over Python's flexible representation is that it is, on average,
  more compact over a wide set of samples.

 Sure. And over a different set of samples, it is less compact. If you
 write a lot of Latin-1, Python will use one byte per character, while
 UTF-8 will use two bytes per character.
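The byte counts being traded off here can be checked directly; this is an editorial illustration, not part of the original exchange.

```python
# Editorial illustration of the trade-off above: latin-1 text costs one
# byte per character in latin-1 (and in Python's one-byte representation),
# but two bytes per accented character in UTF-8.
text = 'h\u00e9llo caf\u00e9'        # 10 characters, all latin-1 encodable
print(len(text))                     # 10 characters
print(len(text.encode('latin-1')))   # 10 bytes
print(len(text.encode('utf-8')))     # 12 bytes: each 'é' takes 2 bytes
```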


This flexible string representation is so absurd that not only
does it not know that you cannot write Western European languages
with latin-1, it penalizes you merely by attempting to optimize
latin-1, as shown in my multiple examples.

(This is a case similar to the long/short int question and
discussion Chris Angelico opened.)


PS1: I have received plenty of private mails. I am surprised
how the devs do not understand Unicode.

PS2: A question I once received from a registered French Python
developer (in another context): what are those French characters
you can handle with cp1252 and not with latin-1?

jmf


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]

2013-03-28 Thread jmfauth
On 28 mar, 15:38, Chris Angelico ros...@gmail.com wrote:
 On Fri, Mar 29, 2013 at 1:12 AM, jmfauth wxjmfa...@gmail.com wrote:
  This flexible string representation is so absurd that not only
  it does not know you can not write Western European Languages
  with latin-1, it penalizes you by just attempting to optimize
  latin-1. Shown in my multiple examples.

 PEP393 strings have two optimizations, or kinda three:

 1a) ASCII-only strings
 1b) Latin1-only strings
 2) BMP-only strings
 3) Everything else

 Options 1a and 1b are almost identical - I'm not sure what the detail
 is, but there's something flagging those strings that fit inside seven
 bits. (Something to do with optimizing encodings later?) Both are
 optimized down to a single byte per character.

 Option 2 is optimized to two bytes per character.

 Option 3 is stored in UTF-32.
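The storage widths listed above can be observed from Python; this editorial sketch compares relative sizes only, since absolute figures include object headers and vary across CPython versions.

```python
import sys

# Editorial sketch: observing PEP 393's three storage widths through
# sys.getsizeof. Only the per-character growth matters; absolute sizes
# are implementation details.
ascii_s  = 'a' * 1000            # 1 byte per character
bmp_s    = '\u20ac' * 1000       # 2 bytes per character (BMP, non-latin-1)
astral_s = '\U00010001' * 1000   # 4 bytes per character (non-BMP)
print(sys.getsizeof(ascii_s) < sys.getsizeof(bmp_s) < sys.getsizeof(astral_s))
# prints True on CPython 3.3+
```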

 Once again, jmf, you are forgetting that option 2 is a safe and
 bug-free optimization.

 ChrisA

As long as you are attempting to divide a set of characters into
chunks and handle them separately, it will never work.

Read my previous post about the Unicode transformation format.
I know what PEP 393 does.

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]

2013-03-28 Thread jmfauth
On 28 mar, 16:14, jmfauth wxjmfa...@gmail.com wrote:
 On 28 mar, 15:38, Chris Angelico ros...@gmail.com wrote:


  On Fri, Mar 29, 2013 at 1:12 AM, jmfauth wxjmfa...@gmail.com wrote:
   This flexible string representation is so absurd that not only
   it does not know you can not write Western European Languages
   with latin-1, it penalizes you by just attempting to optimize
   latin-1. Shown in my multiple examples.

  PEP393 strings have two optimizations, or kinda three:

  1a) ASCII-only strings
  1b) Latin1-only strings
  2) BMP-only strings
  3) Everything else

  Options 1a and 1b are almost identical - I'm not sure what the detail
  is, but there's something flagging those strings that fit inside seven
  bits. (Something to do with optimizing encodings later?) Both are
  optimized down to a single byte per character.

  Option 2 is optimized to two bytes per character.

  Option 3 is stored in UTF-32.

  Once again, jmf, you are forgetting that option 2 is a safe and
  bug-free optimization.

  ChrisA

 As long as you are attempting to devide a set of characters in
 chunks and try to handle them seperately, it will never work.

 Read my previous post about the unicode transformation format.
 I know what pep393 does.

 jmf

Addendum.

This is what you correctly perceived in another thread.
You described it as a switch. Now you have to understand
where this switch comes from.
jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]

2013-03-28 Thread jmfauth
Chris,

Your problem with int/long, the start of this thread, is
very interesting.

This is not a demonstration or a proof, but rather an illustration.

Assume you have a set of integers {0...9} and an operator,
let us say, addition.

Idea:
just divide this set into two chunks, {0...4} and {5...9},
and work hard to optimize the addition of two operands in
the set {0...4}.

The problems:
- When optimizing {0...4}, your algorithm will most probably
weaken {5...9}.
- When using {5...9}, you do not benefit from your algorithm; you
are penalized simply by the fact that you have optimized {0...4}.
- And the first mistake: you are penalized and impacted by the
fact that you have to select which subset your operands are in when
working with {0...9}.

Very interestingly, working with the representation (bytes) of
these integers will not help. You have to conceptually consider
{0..9} as numbers.

Now, replace numbers by characters, and bytes by encoded code points,
and you have, qualitatively, the flexible string representation.

In Unicode, there is one more level of abstraction: conceptually,
one works neither with characters nor with encoded code points, but
with Unicode transformation format entities (see my previous post).

That means you can work as hard as you like at the bytes level;
you will never solve the problem, which is one level higher
in the Unicode hierarchy:
character -> code point -> UTF -> bytes (implementation),
with the important fact that this construct can only go
from left to right.
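That left-to-right construct can be sketched in Python for a single character; this is an editorial illustration ('é' is U+00E9), not part of the original post.

```python
# Editorial sketch of the left-to-right construct described above,
# for one character ('é', U+00E9):
ch = '\u00e9'
code_point = ord(ch)              # character -> code point
utf8_bytes = ch.encode('utf-8')   # code point -> UTF code units -> bytes
print(hex(code_point))            # 0xe9
print(utf8_bytes)                 # b'\xc3\xa9'
```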

---

In fact, by proposing a flexible representation of ints, you may
just fall into the same trap the flexible string representation
presents.



All this stuff is explained in good books about the coding of
characters and/or Unicode.
The unicode.org documentation explains it too. It is a little
bit harder to discover, because the doc always presents
this stuff from a technical perspective.
You get it when reading a large part of the Unicode doc.

jmf



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]

2013-03-28 Thread jmfauth
On 28 mar, 17:33, Ian Kelly ian.g.ke...@gmail.com wrote:
 On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wxjmfa...@gmail.com wrote:
  The flexible string representation takes the problem from the
  other side, it attempts to work with the characters by using
  their representations and it (can only) fails...

 This is false.  As I've pointed out to you before, the FSR does not
 divide characters up by representation.  It divides them up by
 codepoint -- more specifically, by the *bit-width* of the codepoint.
 We call the internal format of the string ASCII or Latin-1 or
 UCS-2 for conciseness and a point of reference, but fundamentally
 all of the FSR formats are simply byte arrays of *codepoints* -- you
 know, those things you keep harping on.  The major optimization
 performed by the FSR is to consistently truncate the leading zero
 bytes from each codepoint when it is possible to do so safely.  But
 regardless of to what extent this truncation is applied, the string is
 *always* internally just an array of codepoints, and the same
 algorithms apply for all representations.

-

You know, we can discuss this ad nauseam. What is important
is Unicode.

You have transformed Python back into an ASCII-oriented product.

If Python had implemented Unicode correctly, there would
be no difference between using an a, an é, a € or any character,
which is what the narrow builds did.

If I am practically the only one who speaks about / discusses
this, I can assure you, this has been noticed.

Now, it is time to prepare the asparagus, the jambon cru
and a good bottle of dry white wine.

jmf




-- 
http://mail.python.org/mailman/listinfo/python-list


Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]

2013-03-28 Thread jmfauth
On 28 mar, 18:55, Chris Angelico ros...@gmail.com wrote:
 On Fri, Mar 29, 2013 at 4:48 AM, jmfauth wxjmfa...@gmail.com wrote:
  If Python had imlemented Unicode correctly, there would
  be no difference in using an a, é, € or any character,
  what the narrow builds did.

 I'm not following your grammar perfectly here, but if Python were
 implementing Unicode correctly, there would be no difference between
 any of those characters, which is the way a *wide* build works. With a
 narrow build, there is a difference between BMP and non-BMP
 characters.

 ChrisA



The wide build (which I never used) is in my mind as correct as
the narrow build. It just covers a different range of Unicode
(the whole range).

Claiming that the narrow build is buggy because it does not
cover the whole of Unicode is not correct.

Unicode does not stipulate that one has to cover the whole range.
Unicode expects that every character in a range behaves the same
way. This is clearly not realized with the flexible string
representation. A user should not be somehow penalized
simply because he is not an ASCII user.

If you take fonts into consideration (btw, a problem nobody
is speaking about) and you ensure your application, toolkit, ...
is MES-X or WGL4 compliant, you are also deliberately (and
correctly) working with a restricted Unicode range.

jmf


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]

2013-03-28 Thread jmfauth
On 28 mar, 21:29, Benjamin Kaplan benjamin.kap...@case.edu wrote:
 On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wxjmfa...@gmail.com wrote:
  On 28 mar, 17:33, Ian Kelly ian.g.ke...@gmail.com wrote:
  On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wxjmfa...@gmail.com wrote:
   The flexible string representation takes the problem from the
   other side, it attempts to work with the characters by using
   their representations and it (can only) fails...

  This is false.  As I've pointed out to you before, the FSR does not
  divide characters up by representation.  It divides them up by
  codepoint -- more specifically, by the *bit-width* of the codepoint.
  We call the internal format of the string ASCII or Latin-1 or
  UCS-2 for conciseness and a point of reference, but fundamentally
  all of the FSR formats are simply byte arrays of *codepoints* -- you
  know, those things you keep harping on.  The major optimization
  performed by the FSR is to consistently truncate the leading zero
  bytes from each codepoint when it is possible to do so safely.  But
  regardless of to what extent this truncation is applied, the string is
  *always* internally just an array of codepoints, and the same
  algorithms apply for all representations.

  -

  You know, we can discuss this ad nauseam. What is important
  is Unicode.

  You have transformed Python back in an ascii oriented product.

  If Python had imlemented Unicode correctly, there would
  be no difference in using an a, é, € or any character,
  what the narrow builds did.

  If I am practically the only one, who speakes /discusses about
  this, I can ensure you, this has been noticed.

  Now, it's time to prepare the Asparagus, the jambon cru
  and a good bottle a dry white wine.

  jmf

 You still have yet to explain how Python's string representation is
 wrong. Just how it isn't optimal for one specific case. Here's how I
 understand it:

 1) Strings are sequences of stuff. Generally, we talk about strings as
 either sequences of bytes or sequences of characters.

 2) Unicode is a format used to represent characters. Therefore,
 Unicode strings are character strings, not byte strings.

 2) Encodings  are functions that map characters to bytes. They
 typically also define an inverse function that converts from bytes
 back to characters.

 3) UTF-8 IS NOT UNICODE. It is an encoding- one of those functions I
 mentioned in the previous point. It happens to be one of the five
 standard encodings that is defined for all characters in the Unicode
 standard (the others being the little and big endian variants of
 UTF-16 and UTF-32).

 4) The internal representation of a character string DOES NOT MATTER.
 All that matters is that the API represents it as a string of
 characters, regardless of the representation. We could implement
 character strings by putting the Unicode code-points in binary-coded
 decimal and it would be a Unicode character string.

 5) The String type that .NET and Java (and unicode type in Python
 narrow builds) use is not a character string. It is a string of
 shorts, each of which corresponds to a UTF-16 code point. I know this
 is the case because in all of these, the length of \u1f435 is 2 even
 though it only consists of one character.

 6) The new string representation in Python 3.3 can successfully
 represent all characters in the Unicode standard. The actual number of
 bytes that each character consumes is invisible to the user.
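Points 3-6 above can be checked on a modern CPython (3.3 or later); this is an editorial illustration added to the archive, using the same U+1F435 example as point 5.

```python
# Editorial check of points 3-6 above on CPython 3.3+, where a string
# is a sequence of code points regardless of its internal storage:
dog = '\U0001F435'                # one non-BMP character (MONKEY FACE)
print(len(dog))                   # 1 (it was 2 on old narrow builds)
print(dog.encode('utf-8'))        # an encoding of the string, not the string
assert dog.encode('utf-8').decode('utf-8') == dog   # round-trip is lossless
```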

--


I have shown enough examples. As soon as you use non-latin-1 chars,
your optimization just becomes irrelevant, and not only that, you
are penalized.

I'm sorry, but saying that Python now covers the whole Unicode
range is not a valid excuse. I prefer a correct version with
a narrower range of chars, especially if this range represents
the daily-used chars.

I can go a step further: if I wish to write an application for
Western European users, I am better served using a coding
scheme covering all these languages/scripts. What about cp1252 [*]?
Does this not remind you of something?

Python can do better; it only succeeds in doing worse!

[*] yes, I know, internally ...

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]

2013-03-28 Thread jmfauth
On 28 mar, 22:11, jmfauth wxjmfa...@gmail.com wrote:
 On 28 mar, 21:29, Benjamin Kaplan benjamin.kap...@case.edu wrote:


  On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wxjmfa...@gmail.com wrote:
   On 28 mar, 17:33, Ian Kelly ian.g.ke...@gmail.com wrote:
   On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wxjmfa...@gmail.com wrote:
The flexible string representation takes the problem from the
other side, it attempts to work with the characters by using
their representations and it (can only) fails...

   This is false.  As I've pointed out to you before, the FSR does not
   divide characters up by representation.  It divides them up by
   codepoint -- more specifically, by the *bit-width* of the codepoint.
   We call the internal format of the string ASCII or Latin-1 or
   UCS-2 for conciseness and a point of reference, but fundamentally
   all of the FSR formats are simply byte arrays of *codepoints* -- you
   know, those things you keep harping on.  The major optimization
   performed by the FSR is to consistently truncate the leading zero
   bytes from each codepoint when it is possible to do so safely.  But
   regardless of to what extent this truncation is applied, the string is
   *always* internally just an array of codepoints, and the same
   algorithms apply for all representations.

   -

   You know, we can discuss this ad nauseam. What is important
   is Unicode.

   You have transformed Python back in an ascii oriented product.

   If Python had imlemented Unicode correctly, there would
   be no difference in using an a, é, € or any character,
   what the narrow builds did.

   If I am practically the only one, who speakes /discusses about
   this, I can ensure you, this has been noticed.

   Now, it's time to prepare the Asparagus, the jambon cru
   and a good bottle a dry white wine.

   jmf

  You still have yet to explain how Python's string representation is
  wrong. Just how it isn't optimal for one specific case. Here's how I
  understand it:

  1) Strings are sequences of stuff. Generally, we talk about strings as
  either sequences of bytes or sequences of characters.

  2) Unicode is a format used to represent characters. Therefore,
  Unicode strings are character strings, not byte strings.

  2) Encodings  are functions that map characters to bytes. They
  typically also define an inverse function that converts from bytes
  back to characters.

  3) UTF-8 IS NOT UNICODE. It is an encoding- one of those functions I
  mentioned in the previous point. It happens to be one of the five
  standard encodings that is defined for all characters in the Unicode
  standard (the others being the little and big endian variants of
  UTF-16 and UTF-32).

  4) The internal representation of a character string DOES NOT MATTER.
  All that matters is that the API represents it as a string of
  characters, regardless of the representation. We could implement
  character strings by putting the Unicode code-points in binary-coded
  decimal and it would be a Unicode character string.

  5) The String type that .NET and Java (and unicode type in Python
  narrow builds) use is not a character string. It is a string of
  shorts, each of which corresponds to a UTF-16 code point. I know this
  is the case because in all of these, the length of \u1f435 is 2 even
  though it only consists of one character.

  6) The new string representation in Python 3.3 can successfully
  represent all characters in the Unicode standard. The actual number of
  bytes that each character consumes is invisible to the user.

 --

 I shew enough examples. As soon as you are using non latin-1 chars
 your optimization just became irrelevant and not only this, you
 are penalized.

 I'm sorry, saying Python now is just covering the whole unicode
 range is not a valuable excuse. I prefer a correct version with
 a narrower range of chars, especially if this range represents
 the daily used chars.

 I can go a step further, if I wish to write an application for
 Western European users, I'm better served if I'm using a coding
 scheme covering all thesee languages/scripts. What about cp1252 [*]?
 Does this not remind somthing?

 Python can do better, it only succeeds to do worth!

 [*] yes, I kwnow, internally 

 jmf

-

Addendum.

And you know what? Py34 will suffer from the same disease.
You are spending your time improving chunks of bytes
when the problem is elsewhere.
In fact, you are working for peanuts, e.g. the replace method.


If you are not satisfied with my examples, just pick up
the examples of GvR (ascii-strings) on the bug tracker, timeit
them, and you will see there is already a problem.

Better, timeit them after having replaced his ascii-strings
with non-ascii characters...

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of int/long in Python 3

2013-03-27 Thread jmfauth
On 26 mar, 22:08, Grant Edwards inva...@invalid.invalid wrote:


 I think we all agree that jmf is a character.

--

The characters are also intrinsic characteristics of a
group in group theory.

If you are not a mathematician, but e.g. a scientist in
need of these characters, they are available in
precalculated tables, shortly called ... tables of
characters!
(My booklet of the tables is titled Tables for Group Theory.)


Example in chemistry, mainly quantum chemistry:

Group Theory and its Application to Chemistry
http://chemwiki.ucdavis.edu/Physical_Chemistry/Symmetry/Group_Theory%3A_Application

(Copied link from Firefox).

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of int/long in Python 3

2013-03-26 Thread jmfauth
On 25 mar, 22:51, Chris Angelico ros...@gmail.com wrote:
 The Python 3 merge of int and long has effectively penalized
 small-number arithmetic by removing an optimization. As we've seen
 from PEP 393 strings (jmf aside), there can be huge benefits from
 having a single type with multiple representations internally ...

--

A character is not an integer (short form).

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of int/long in Python 3

2013-03-26 Thread jmfauth
On 26 mar, 20:03, Chris Angelico ros...@gmail.com wrote:
 On Wed, Mar 27, 2013 at 5:50 AM, jmfauth wxjmfa...@gmail.com wrote:
  On 25 mar, 22:51, Chris Angelico ros...@gmail.com wrote:
  The Python 3 merge of int and long has effectively penalized
  small-number arithmetic by removing an optimization. As we've seen
  from PEP 393 strings (jmf aside), there can be huge benefits from
  having a single type with multiple representations internally ...

  --

  A character is not an integer (short form).

 So?

 ChrisA

A character is not an integer.

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: monty python

2013-03-24 Thread jmfauth
On 23 mar, 17:17, Mark Lawrence breamore...@yahoo.co.uk wrote:
 On 23/03/2013 09:24, jmfauth wrote:

  On 20 mar, 22:02, Tim Delaney tim.dela...@aptare.com wrote:
  On 21 March 2013 06:40, jmfauth wxjmfa...@gmail.com wrote:

  
  [snip usual rant from jmf]

  It has been acknowledged as a real regression, but he keeps hijacking every
  thread where strings are mentioned to harp on about it. He has shown no
  inclination to attempt to *fix* the regression and is rapidly coming to be
  regarded as a troll by most participants in this list.

  -

  I can not help to fix it, because it is unfixable. It
  is unfixable, because this flexible string representation
  is wrong by design.

  jmf

 Of course it's fixable.  All you need do is write a PEP clearing stating
 what is wrong with the implementation detailed in PEP393 and your own
 proposed design.  I'm looking forward to reading this PEP.

 Note that going backwards to buggier unicode implementations that
 existed in Python prior to version 3.3 is simply not an option.

 --
 Cheers.

 Mark Lawrence

--

The problem here is that this PEP 393 should not have been
created.
The first time I read it, I quickly understood that it could
not work!

This is illustrated by all the examples I have given on this list.
In all cases, I can explain why.

I have never seen anybody able to argue that these examples are
wrong and/or explain why they are wrong, except by arguing that
the flexible string representation exists!
jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: monty python

2013-03-23 Thread jmfauth
On 20 mar, 22:02, Tim Delaney tim.dela...@aptare.com wrote:
 On 21 March 2013 06:40, jmfauth wxjmfa...@gmail.com wrote:

  
  [snip usual rant from jmf]




 It has been acknowledged as a real regression, but he keeps hijacking every
 thread where strings are mentioned to harp on about it. He has shown no
 inclination to attempt to *fix* the regression and is rapidly coming to be
 regarded as a troll by most participants in this list.


-

I cannot help fix it, because it is unfixable. It
is unfixable because this flexible string representation
is wrong by design.

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: monty python

2013-03-23 Thread jmfauth
On 21 mar, 04:12, rusi rustompm...@gmail.com wrote:
 On Mar 21, 12:40 am, jmfauth wxjmfa...@gmail.com wrote:

  

  Courageous people can try to do something with the unicode
  collation algorithm (see unicode.org). Some time ago, for the fun,
  I wrote something (not perfect) with a reduced keys table (see
  unicode.org), only a keys subset for some scripts hold in memory.

  It works with Py32 and Py33. In an attempt to just see the
  performance and how it can react, I did an horrible mistake,
  I forgot Py33 is now optimized for ascii user, it is no more
  unicode compliant and I stupidely tested/sorted lists of French
  words...

 Now lets take this piece by piece…
 I did an horrible mistake : I am sorry. Did you get bruised? Break
 some bones? And is 'h' a vowel in french?
 I forgot Py33 is now optimized for ascii user  Ok.
 it is no more unicode compliant I asked earlier and I ask again --
 What do you mean by (non)compliant?

--

One aspect of Unicode (note the capitalized U).

py32
 timeit.repeat('abc需'.find('a'))
[0.27941279564856814, 0.26568106110789813, 0.265546366757917]
 timeit.repeat('abcdef'.find('a'))
[0.2891812867801491, 0.26698153112010914, 0.26738994644529157]

py33
timeit.repeat('abc需'.find('a'))
[0.5941777382531654, 0.5829193385634426, 0.5519412133990045]
timeit.repeat('abcdef'.find('a'))
[0.44333188136533863, 0.4232506078969891, 0.4225164843046514]


---

In French, depending on the word, a leading h behaves
as a vowel or as a consonant.
(Hence this typical mistake.)

jmf



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Help. HOW TO guide for PyQt installation

2013-03-21 Thread jmfauth
On 20 mar, 11:38, Phil Thompson p...@riverbankcomputing.com wrote:
 On Wed, 20 Mar 2013 03:29:35 -0700 (PDT), jmfauth wxjmfa...@gmail.com
 wrote:

  On 20 mar, 10:30, Phil Thompson p...@riverbankcomputing.com wrote:
  On Wed, 20 Mar 2013 02:09:06 -0700 (PDT), jmfauth wxjmfa...@gmail.com
  wrote:

   On 20 mar, 01:12, D. Xenakis gouzouna...@hotmail.com wrote:
   Hi there,
   Im searching for an installation guide for PyQt toolkit.
   To be honest im very confused about what steps should i follow for a
   complete and clean installation. Should i better choose to install
 the
   32bit or the 64bit windows version? Or maybe both? Any chance one of
  them
   is more/less bug-crashy than the other? I know both are availiable
 on
  the
   website but just asking.. If i installed this package on windows 8,
   should i have any problems? From what i read PyQt supports only xp
 and
   win7.
   I was thinking about installing the newer version of PyQt along with
  the
   QT5. I have zero expirience on PyQt so either way, everything is
 going
  to
   be new to me, so i dont care that much about the learning curve
  diference
   between new and old PyQt - Qt version. I did not find any installer
 so
  i
   guess i should customly do everything. Any guide for this plz?

   Id also like to ask.. Commercial licence of PyQt can only be bought
 on
   riverbank's website? I think i noticed somewhere an other reseller
   cheaper one or maybe i didnt know what the hell i was reading :).
  Maybe
   something about Qt and not PyQt.

   Please help this noob,
   Regards

   

   Short answer without explanation. It does not work.

   jmf

  Well it works for me. Care to elaborate?

  Phil

  No problem.

  Yesterday, I downloaded PyQt4-4.10-gpl-Py3.3-Qt5.0.1-x32-2.exe
  and installed it on my Windows 7 Pro box after having removed
  a previous version.

  No problem with the installation.

  I quickly tested it with one of my interactive Python interpreters
  and got an error from 'from PyQt4 import QtGui, QtCore' saying that the
  DLL cannot be found.

  Something similar to what Detlev Offenbach reported on
  the PyQt mailing list, although I'm not using Qsci.

  Strangely, I had no problem (if I recall correctly) with a
  very basic application (QMainWindow + QLineEdit).

  I had no problem with the demo (I only launched it).

  I did not spend too much time investigating further.

  It's the first time I have seen such an error; usually, no problem.

 The only time that I've seen a problem like that is when running from a
 shell that was started before running the PyQt installer (ie. one with an
 out of date PATH).

 Phil

--

The PATH could be the cause. I stupidly forgot to check it
before removing PyQt...

I repeated the experiment (app == eta26.py). With and
without PyQt in the system PATH. (Btw, why is it
necessary?)

D:\jm\jmpy\eta\eta26c:\python32\python eta26.py
PyQt: 4.8.6, Qt: 4.7.4 Python 3.2.3

No problem.



D:\jm\jmpy\eta\eta26c:\python33\python eta26.py
Traceback (most recent call last):
  File eta26.py, line 32, in module
from PyQt4 import QtGui, QtCore
ImportError: DLL load failed: Le module spécifié est introuvable.
(Translation: The specified module cannot be found.)



D:\jm\jmpy\eta\eta26c:\python33\python eta26.py
PyQt: 4.10, Qt: 4.8.4 Python 3.3.0

No problem.



No idea. It is a mystery to me. eta26 only
imports QtGui and QtCore; it does, however, use a sophisticated
widget (QPlainTextEdit).

jmf



Re: Help. HOW TO guide for PyQt installation

2013-03-21 Thread jmfauth
On 20 mar, 11:29, jmfauth wxjmfa...@gmail.com wrote:
 On 20 mar, 10:30, Phil Thompson p...@riverbankcomputing.com wrote:

-


 Strangely, I had no problem (if I recall correctly) with a
 very basic application (QMainWindow + QLineEdit).


ADDENDUM, CORRECTION

It fails too. I forgot to rename PySide -- PyQt4!

I tried to collect other experiences via Google. No luck.

jmf


Re: Help. HOW TO guide for PyQt installation

2013-03-20 Thread jmfauth
On 20 mar, 01:12, D. Xenakis gouzouna...@hotmail.com wrote:
 Hi there,
 I'm searching for an installation guide for the PyQt toolkit.
 To be honest, I'm very confused about which steps I should follow for a complete
 and clean installation. Should I choose to install the 32-bit or the
 64-bit Windows version? Or maybe both? Any chance one of them is more/less
 crash-prone than the other? I know both are available on the website, but just
 asking. If I install this package on Windows 8, should I expect any
 problems? From what I read, PyQt supports only XP and Win7.
 I was thinking about installing the newer version of PyQt along with Qt5.
 I have zero experience with PyQt, so either way everything is going to be new
 to me; I don't care that much about the learning-curve difference between
 the new and old PyQt/Qt versions. I did not find any installer, so I guess I
 should do everything by hand. Any guide for this, please?

 I'd also like to ask: can a commercial licence of PyQt only be bought on
 Riverbank's website? I think I noticed somewhere another, cheaper reseller,
 or maybe I didn't know what the hell I was reading :). Maybe something
 about Qt and not PyQt.

 Please help this noob.
 Regards



Short answer without explanation. It does not work.

jmf


Re: Help. HOW TO guide for PyQt installation

2013-03-20 Thread jmfauth
On 20 mar, 10:30, Phil Thompson p...@riverbankcomputing.com wrote:
 On Wed, 20 Mar 2013 02:09:06 -0700 (PDT), jmfauth wxjmfa...@gmail.com
 wrote:

  On 20 mar, 01:12, D. Xenakis gouzouna...@hotmail.com wrote:
  Hi there,
  Im searching for an installation guide for PyQt toolkit.
  To be honest im very confused about what steps should i follow for a
  complete and clean installation. Should i better choose to install the
  32bit or the 64bit windows version? Or maybe both? Any chance one of
 them
  is more/less bug-crashy than the other? I know both are availiable on
 the
  website but just asking.. If i installed this package on windows 8,
  should i have any problems? From what i read PyQt supports only xp and
  win7.
  I was thinking about installing the newer version of PyQt along with
 the
  QT5. I have zero expirience on PyQt so either way, everything is going
 to
  be new to me, so i dont care that much about the learning curve
 diference
  between new and old PyQt - Qt version. I did not find any installer so
 i
  guess i should customly do everything. Any guide for this plz?

  Id also like to ask.. Commercial licence of PyQt can only be bought on
  riverbank's website? I think i noticed somewhere an other reseller
  cheaper one or maybe i didnt know what the hell i was reading :).
 Maybe
  something about Qt and not PyQt.

  Please help this noob,
  Regards

  

  Short answer without explanation. It does not work.

  jmf

 Well it works for me. Care to elaborate?

 Phil

No problem.

Yesterday, I downloaded PyQt4-4.10-gpl-Py3.3-Qt5.0.1-x32-2.exe
and installed it on my Windows 7 Pro box after having removed
a previous version.

No problem with the installation.

I quickly tested it with one of my interactive Python interpreters
and got an error from 'from PyQt4 import QtGui, QtCore' saying that the
DLL cannot be found.

Something similar to what Detlev Offenbach reported on
the PyQt mailing list, although I'm not using Qsci.

Strangely, I had no problem (if I recall correctly) with a
very basic application (QMainWindow + QLineEdit).

I had no problem with the demo (I only launched it).

I did not spend too much time investigating further.

It's the first time I have seen such an error; usually, no problem.

jmf



Re: monty python

2013-03-20 Thread jmfauth


Courageous people can try to do something with the unicode
collation algorithm (see unicode.org). Some time ago, for the fun,
I wrote something (not perfect) with a reduced keys table (see
unicode.org), only a keys subset for some scripts hold in memory.

It works with Py32 and Py33. In an attempt to just see the
performance and how it can react, I did an horrible mistake,
I forgot Py33 is now optimized for ascii user, it is no more
unicode compliant and I stupidely tested/sorted lists of French
words...

jmf


Re: String performance regression from python 3.2 to 3.3

2013-03-16 Thread jmfauth
--

utf-32 is already here. You are all most probably [*]
using it without noticing it. How? By using OpenType fonts,
without counting the text processing applications using them.
Why? Because there is no other way to do it.

[*] depending on the font, the internal table(s), e.g. the cmap table,
are in utf-16 or utf-32.

jmf


A reply for rusi (FSR)

2013-03-13 Thread jmfauth
As a reply to rusi's comment:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/a7689b158fdca29e#

From string creation to itertools usage. A medley, some timings.

Important:
The real/absolute values of these experiments are not important. I do
not care and I'm not complaining at all.

These values are expected; I expected such values, and they only
confirm (*FOR ME*) my understanding of the coding of characters
(and Unicode).

#~               py323               py330

#~ test   1: 0.015357737412819   0.019290216142579
#~ test   2: 0.015698801667198   0.020386269052436
#~ test   3: 0.015613338684288   0.018769561472500
#~ test   4: 0.023235297708529   0.032253414679390
#~ test   5: 0.023327062109534   0.029621391108935
#~ test   6: 1.119958127076760   1.095467665651482
#~ test   7: 0.420158472788311   0.565518010043673
#~ test   8: 0.649444234615974   1.061556978013171
#~ test   9: 0.712335144072079   1.211614222458175
#~ test  10: 0.704622996001357   1.160909074081441
#~ test  11: 0.614674584923621   1.053985430333688
#~ test  12: 0.660336235792764   1.059443246081010
#~ test  13: 4.821435927771570   5.795325214218677
#~ test  14: 0.494012668213403   0.729330462512273
#~ test  15: 0.504894429585788   0.879966255906103
#~ test  16: 0.693093370081103   1.132884304782264
#~ test  17: 0.749076743789461   3.013804437852462
#~ test  18: 7.467055989281286  13.387841650089342
#~ test  19: 7.581776062566778  13.593412812594643
#~ test  20: 9.477877493343140  15.235388291413805
#~ test  21: 0.022614608026196   0.020984116094176
#~ test  22: 6.685022041178975  12.687538276191944
#~ test  23: 6.946794763994170  12.986701250949636
#~ test  24: 0.097796827314760   0.156285014715777
#~ test  25: 0.024915807146677   0.034190706904894
#~ test  26: 0.024996544066013   0.032191582014335
#~ test  27: 0.000693943667684   0.001315421027272
#~ test  28: 0.000679765476967   0.001305968900141
#~ test  29: 0.001614344548152   0.025543979763000
#~ test  30: 0.000204008410812   0.000286714523313
#~ test  31: 0.000213460537964   0.000301286552656
#~ test  32: 0.000204008410819   0.000291440586878
#~ test  33: 0.249692904327539   0.497374474766957
#~ test  34: 0.248750448483740   0.513947598194790
#~ test  35: 0.099810130396032   0.249129715085319

jmf




Re: Regular expression problem

2013-03-11 Thread jmfauth
On 11 mar, 03:06, Terry Reedy tjre...@udel.edu wrote:


 ...
 By teaching 'speed before correctness', this site promotes bad
 programming habits and thinking (and the use of low-level but faster
 languages).
 ...


This is exactly what your flexible string representation
does!

And, technical aspects aside, you even managed to
somehow lose Unicode compliance.
jmf



Re: Controlling number of zeros of exponent in scientific notation

2013-03-06 Thread jmfauth
On 6 mar, 15:03, Roy Smith r...@panix.com wrote:
 In article c2184b42-41be-4930-9501-361296df7...@googlegroups.com,

  fa...@squashclub.org wrote:
  Instead of:

  1.8e-04

  I need:

  1.8e-004

  So two zeros before the 4, instead of the default 1.

 Just out of curiosity, what's the use case here?

--

>>> from vecmat6 import *
>>> from svdecomp6 import *
>>> from vmio6 import *
>>> mm = NewMat(3, 2)
>>> mm[0][0] = 1.0; mm[0][1] = 2.0e-178
>>> mm[1][0] = 3.0; mm[1][1] = 4.0e-1428
>>> mm[2][0] = 5.0; mm[2][1] = 6.0
>>> pr(mm, 'mm =')
mm =
(   1.0e+000  2.0e-178 )
(   3.0e+000  0.0e+000 )
(   5.0e+000  6.0e+000 )
>>> aa, vv, bbt = SVDecompFull(mm)
>>> pr(aa, 'aa =')
aa =
(   3.04128e-001 -8.66366e-002 )
(   9.12385e-001 -2.59910e-001 )
(  -2.73969e-001 -9.61739e-001 )
>>> pr(bbt, 'bbt =')
bbt =
(   7.12974e-001 -7.01190e-001 )
(  -7.01190e-001 -7.12974e-001 )
>>> rr = MatMulMatMulMat(aa, vv, bbt)
>>> pr(rr, 'rr =')
rr =
(   1.0e+000 -1.38778e-015 )
(   3.0e+000 -4.44089e-016 )
(   5.0e+000  6.0e+000 )


jmf



Re: Nuitka now supports Python 3.2

2013-02-27 Thread jmfauth


Fascinating software.
Some are building, some are destroying.

Py33
>>> timeit.repeat("{1:'abc需'}")
[0.2573893570572636, 0.24261832285651508, 0.24259548003601594]

Py323
>>> timeit.repeat("{1:'abc需'}")
[0.11000708521282831, 0.0994753634273593, 0.09901023634051853]

jmf



Re: Nuitka now supports Python 3.2

2013-02-27 Thread jmfauth
On 27 fév, 09:21, jmfauth wxjmfa...@gmail.com wrote:
 

 Fascinating software.
 Some are building, some are destroying.

 Py33 timeit.repeat({1:'abc需'})

 [0.2573893570572636, 0.24261832285651508, 0.24259548003601594]

 Py323
 timeit.repeat({1:'abc需'})
 [0.11000708521282831, 0.0994753634273593, 0.09901023634051853]

 jmf



Oops. My bad. (This Google Groups mangling.)

You should read abc需

jmf


Re: Python Speed

2013-02-27 Thread jmfauth
On 27 fév, 23:24, Terry Reedy tjre...@udel.edu wrote:
 On 2/27/2013 3:21 AM, jmfauth hijacked yet another thread:
   Some are building, some are destroying.

 We are still waiting for you to help build a better 3.3+, instead of
 trying to 'destroy' it with mostly irrelevant cherry-picked benchmarks.

   Py33
   timeit.repeat({1:'abc需'})
   [0.2573893570572636, 0.24261832285651508, 0.24259548003601594]

 On my win system, I get a lower time for this:
 [0.16579443757208878, 0.1475787649924598, 0.14970205670637426]

   Py323
   timeit.repeat({1:'abc需'})
   [0.11000708521282831, 0.0994753634273593, 0.09901023634051853]

 While I get the same time for 3.2.3.
 [0.11759353304428544, 0.0948244802968, 0.09532802044164157]

 It seems that something about Jim's machine does not like 3.3.
 *nix will probably see even less of a difference. Times are in
 microseconds, so few programs will ever notice the difference.

 In the meanwhile ... Effort was put into reducing startup time for 3.3
 by making sure that every module imported during startup actual needed
 to be imported, and into speeding up imports.

 The startup process is getting a deeper inspection for 
 3.4http://python.org/dev/peps/pep-0432/
 'Simplifying the CPython startup sequence'
 with some expectation for further speedup.

 Also, a real-world benchmark project has been 
 established.http://speed.python.org/
 Some work has already been done to port benchmarks to 3.x, but I suspect
 there is more to do and more volunteers needed.

 --
 Terry Jan Reedy

-

Terry,

As long as you are attempting to work with a composite scheme,
not with a unique set of characters, not only will it
not work (properly / efficiently), it cannot work.

This is not even a unicode problem. It is true for every coding
scheme; that is why we have, today, all these coding schemes. Coding
scheme: == set of characters; != set of encoded characters.
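A small sketch may make the three sets concrete: one character, its one code point, and its several possible encoded forms.

```python
ch = '€'  # one character, one code point: U+20AC
print(ord(ch))  # 8364, the code point as an integer

# The same code point encoded with three different transformation formats:
for enc in ('utf-8', 'utf-16-le', 'utf-32-le'):
    units = ch.encode(enc)
    print(enc, len(units), 'bytes')  # 3, 2 and 4 bytes respectively
```

The character and the code point never change; only the encoded form does.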

jmf



Re: Correct handling of case in unicode and regexps

2013-02-24 Thread jmfauth
On 23 fév, 15:26, Devin Jeanpierre jeanpierr...@gmail.com wrote:
 Hi folks,

 I'm pretty unsure of myself when it comes to unicode. As I understand
 it, you're generally supposed to compare things in a case insensitive
 manner by case folding, right? So instead of a.lower() == b.lower()
 (the ASCII way), you do a.casefold() == b.casefold()

 However, I'm struggling to figure out how regular expressions should
 treat case. Python's re module doesn't work properly to my
 understanding, because:

  >>> a = 'ss'
  >>> b = 'ß'
  >>> a.casefold() == b.casefold()
  True
  >>> re.match(re.escape(a), b, re.UNICODE | re.IGNORECASE)
  # oh dear!

 In addition, it seems improbable that this ever _could_ work. Because
 if it did work like that, then what would the value be of
 re.match('s', 'ß', re.UNICODE | re.IGNORECASE).end() ? 0.5?

 I'd really like to hear the thoughts of people more experienced with
 unicode. What is the ideal correct behavior here? Or do I
 misunderstand things?

-

I'm just wondering if there is a real issue here. After all,
this is only a question of conventions. Unicode has some
conventions; the re module may (has to) use some conventions too.

It seems to me the safest way is to preprocess the text
which has to be examined.

Proposed case study:
How should ss/ß/SS/ẞ be interpreted?

'Richard-Strauss-Straße'
'Richard-Strauss-Strasse'
'RICHARD-STRAUSS-STRASSE'
'RICHARD-STRAUSS-STRAẞE'
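For what it is worth, str.casefold() already maps all four spellings to a single key (ẞ, U+1E9E, case-folds to 'ss'), so it is a reasonable preprocessing step for this case study:

```python
variants = [
    'Richard-Strauss-Straße',
    'Richard-Strauss-Strasse',
    'RICHARD-STRAUSS-STRASSE',
    'RICHARD-STRAUSS-STRAẞE',
]
# casefold() applies full Unicode case folding: both ß and ẞ become 'ss'.
print({v.casefold() for v in variants})  # {'richard-strauss-strasse'}
```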


There is more or less the same situation with sorting.
Unicode cannot do everything, and it may be mandatory to
preprocess the input.

E.g. this fct I once wrote for fun. It sorts French
words (without unicodedata and locale).

>>> import libfrancais
>>> z = ['oeuf', 'œuf', 'od', 'of']
>>> zo = libfrancais.sortedfr(z)
>>> zo
['od', 'oeuf', 'œuf', 'of']
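libfrancais is not shown; as a rough sketch only (and using unicodedata, which the original deliberately avoids), a sort key along these lines reproduces that ordering:

```python
import unicodedata

def french_key(word):
    # Expand the French ligatures, then strip accents for a primary key.
    expanded = word.replace('œ', 'oe').replace('Œ', 'OE').replace('æ', 'ae')
    decomposed = unicodedata.normalize('NFD', expanded)
    base = ''.join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()

print(sorted(['oeuf', 'œuf', 'od', 'of'], key=french_key))
# ['od', 'oeuf', 'œuf', 'of']
```

A real French collation also needs secondary (accent) and tertiary (case) keys; this only shows the primary level.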

jmf


Re: Python Newbie

2013-02-23 Thread jmfauth
On 23 fév, 16:43, Steve Simmons square.st...@gmail.com wrote:
 On 22/02/2013 22:37, piterrr.dolin...@gmail.com wrote:
  So far I am getting the impression ...

 My main message to you would be :  don't approach Python with a negative
 attitude, give it a chance and I'm sure you'll come to enjoy it.




Until you realize this:

Py32:

>>> timeit.timeit("'abc需'")
0.032749386495456466
>>> sys.getsizeof('abc需')
42

Py33:

>>> timeit.timeit("'abc需'")
0.04104208536801017
>>> sys.getsizeof('abc需')
50

Very easy to explain: wrong, incorrect, naive unicode
handling.
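For context, the size difference reflects PEP 393 (Python 3.3+): a str is stored with 1, 2 or 4 bytes per character depending on its widest code point. A minimal sketch (exact sizes vary by platform and version):

```python
import sys

narrow = sys.getsizeof('abc')       # every code point fits in 1 byte
wide = sys.getsizeof('abc\u20ac')   # U+20AC forces 2 bytes per character
print(narrow, wide)
assert wide > narrow
```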

jmf




Re: Python Newbie

2013-02-23 Thread jmfauth
On 23 fév, 20:08, Ethan Furman et...@stoneleaf.us wrote:
 On 02/23/2013 10:44 AM, jmfauth wrote:

 [snip various stupidities]

  jmf

 Peter, jmfauth is one of our resident trolls.  Feel free to ignore him.

 --
 ~Ethan~

Sorry, what can I say?
More memory and a slowdown!
If you see progress, I'm seeing a regression.

Did you test Devanagari canonical decomposition? Probably
not. I did.
I wrote probably more tests than any core developer,
and tests doing precisely what this flexible representation
does (not like the tests I saw).

That's the good point of all this story.
It is not every day that one has two implementations
of the same product, if one wishes to explain, to teach,
to illustrate Unicode or the coding of characters in
general.
Unicode is not different from the other coding schemes, and
it behaves exactly the same way. The sole basic
difference lies in the set of *characters*, which is broader.
Unicode, the Consortium, uses the term Abstract Character
Repertoire.

jmf


Re: string.replace doesn't removes :

2013-02-14 Thread jmfauth
On 13 fév, 21:24, 8 Dihedral dihedral88...@googlemail.com wrote:
 Rick Johnson wrote on Thursday, 14 February 2013 at 00:34:11 UTC+8:

  On Wednesday, February 13, 2013 1:10:14 AM UTC-6, jmfauth wrote:

 >>> d = {ord('a'): 'A', ord('b'): '2', ord('c'): 'C'}
 >>> 'abcdefgabc'.translate(d)
 'A2CdefgA2C'
 >>> def jmTranslate(s, table):
 ...     table = {ord(k):table[k] for k in table}
 ...     return s.translate(table)
 ...
 >>> d = {'a': 'A', 'b': '2', 'c': 'C'}
 >>> jmTranslate('abcdefgabc', d)
 'A2CdefgA2C'
 >>> d = {'a': None, 'b': None, 'c': None}
 >>> jmTranslate('abcdefgabc', d)
 'defg'
 >>> d = {'a': '€', 'b': '', 'c': ''}
 >>> jmTranslate('abcdefgabc', d)
 '€defg€'

 In python the variables of value types, and the variables of lists and
 dictionaries are passed to functions somewhat different.

 This should be noticed by any serious programmer in python.

-

The purpose of my quick-and-dirty fct was to
show that it is possible to create a text-replacement
fct which uses exclusively text/strings
via a dict. (Even if, in my example, I'm using
- and can use - None as an empty string!)

You are right.

It is also arguable that being forced to use
a number in order to replace a character
may not be such a good idea.

This should be noticed by any serious language designer.

More seriously.
.translate() is a very nice and underestimated method.
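Agreed, and str.maketrans builds the ord()-keyed table itself, so no numbers need to appear in user code:

```python
# str.maketrans accepts a dict keyed by 1-character strings and returns
# the {code point: replacement} table that str.translate() expects.
table = str.maketrans({'a': 'A', 'b': '2', 'c': 'C'})
print('abcdefgabc'.translate(table))   # A2CdefgA2C

# Mapping a character to None deletes it.
table = str.maketrans({'a': None, 'b': None, 'c': None})
print('abcdefgabc'.translate(table))   # defg
```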

jmf




Re: string.replace doesn't removes :

2013-02-12 Thread jmfauth
On 13 fév, 06:26, Rick Johnson rantingrickjohn...@gmail.com wrote:
 On Tuesday, February 12, 2013 10:44:09 PM UTC-6, Rick Johnson wrote:
  
   REFERENCES:
  
  [1]: Should string.replace handle list, tuple and dict
  arguments in addition to strings?

  py string.replace(('a', 'b', 'c'), 'abcdefgabc')
  'defg'
  [...]

 And here is a fine example of how a global function architecture can 
 seriously warp your mind! Let me try that again!

 Hypothetical Examples:

 py 'abcdefgabc'.replace(('a', 'b', 'c'), )
 'defg'
 py 'abcdefgabc'.replace(['a', 'b', 'c'], )
 'defg'
 py 'abcdefgabc'.replace({'a':'A', 'b':'2', 'c':'C'})
 'A2CdefgA2C'

 Or, an alternative to passing dict where both old and new arguments accept 
 the sequence:

 py d = {'a':'A', 'b':'2', 'c':'C'}
 py 'abcdefgabc'.replace(d.keys(), d.values())
 'A2CdefgA2C'

 Nice thing about dict is you can control both sub-string and 
 replacement-string on a case-by-case basis. But there is going to be a need 
 to apply a single replacement string to a sequence of substrings; like the 
 null string example provided by the OP.

 (hopefully there's no mistakes this time)



>>> d = {ord('a'): 'A', ord('b'): '2', ord('c'): 'C'}
>>> 'abcdefgabc'.translate(d)
'A2CdefgA2C'


>>> def jmTranslate(s, table):
...     table = {ord(k):table[k] for k in table}
...     return s.translate(table)
...
>>> d = {'a': 'A', 'b': '2', 'c': 'C'}
>>> jmTranslate('abcdefgabc', d)
'A2CdefgA2C'
>>> d = {'a': None, 'b': None, 'c': None}
>>> jmTranslate('abcdefgabc', d)
'defg'
>>> d = {'a': '€', 'b': '', 'c': ''}
>>> jmTranslate('abcdefgabc', d)
'€defg€'



jmf



Re: Curious to see alternate approach on a search/replace via regex

2013-02-07 Thread jmfauth
On 7 fév, 04:04, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:
 On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:
  Well, an alternative /could/ be:

 ...
 py s = 'http://alongnameofasite1234567.com/q?sports=runa=1b=1'
 py assert u2f(s) == mangle(s)
 py
 py from timeit import Timer
 py setup = 'from __main__ import s, u2f, mangle'
 py t1 = Timer('mangle(s)', setup)
 py t2 = Timer('u2f(s)', setup)
 py
 py min(t1.repeat(repeat=7))
 7.2962000370025635
 py min(t2.repeat(repeat=7))
 10.981598854064941
 py
 py (10.98-7.29)/10.98
 0.33606557377049184

 (Timings done using Python 2.6 on my laptop -- your speeds may vary.)





[OT] Sorry, but I find all these timeit comparisons I see here and
there more and more ridiculous.

Maybe it's the language itself which has become ridiculous.

code:

r = repeat("('WHERE IN THE WORLD IS CARMEN?'*10).lower()")
print('1:', r)

r = repeat("('WHERE IN THE WORLD IS HÉLÈNE?'*10).lower()")
print('2:', r)

t = Timer("re.sub('CARMEN', 'CARMEN', 'WHERE IN THE WORLD IS CARMEN?'*10)", "import re")
r = t.repeat()
print('3:', r)

t = Timer("re.sub('HÉLÈNE', 'HÉLÈNE', 'WHERE IN THE WORLD IS HÉLÈNE?'*10)", "import re")
r = t.repeat()
print('4:', r)

result:

c:\python32\pythonw -u vitesse3.py
1: [2.578785478740226, 2.5738459157233833, 2.5739002658825543]
2: [2.57605654937141, 2.5784755252962572, 2.5775366066044896]
3: [11.856728254324088, 11.856321809655501, 11.857456073846905]
4: [12.111787643688231, 12.102743462128771, 12.098514783440208]
Exit code: 0
c:\Python33\pythonw -u vitesse3.py
1: [0.6063335264470632, 0.6104798922133946, 0.6078580877959869]
2: [4.080205081267272, 4.079303183698418, 4.0786836706522145]
3: [18.093742209318215, 18.07999618095, 18.07107661757692]
4: [18.852576768615222, 18.841418050790622, 18.840745369110437]
Exit code: 0

The future is bright for ... ascii users.

jmf




Re: Py3.3 unicode literal and input()

2012-06-25 Thread jmfauth
Mea culpa. I did not have my head on my shoulders.
input() is working fine; it returns text correctly.

However, and this is something different, I am a little
bit surprised that input() does not handle escaped characters
(\u, \U).
(\u, \U).
Workaround: encode() and decode() as raw-unicode-escape.
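A sketch of that workaround, applied to an escape sequence typed literally at an input() prompt (note that raw-unicode-escape interprets \uXXXX escapes but not \xXX; the unicode_escape codec covers both):

```python
# What the user literally typed at the input() prompt:
raw = r'\u00e9l\u00e9phant'
# Round-trip through raw-unicode-escape to interpret the \u escapes:
text = raw.encode('raw_unicode_escape').decode('raw_unicode_escape')
print(text)  # éléphant
```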

jmf


Re: Py3.3 unicode literal and input()

2012-06-20 Thread jmfauth
On Jun 20, 1:21 am, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:
 On Mon, 18 Jun 2012 07:00:01 -0700, jmfauth wrote:
  On 18 juin, 12:11, Steven D'Aprano steve
  +comp.lang.pyt...@pearwood.info wrote:
  On Mon, 18 Jun 2012 02:30:50 -0700, jmfauth wrote:
   On 18 juin, 10:28, Benjamin Kaplan benjamin.kap...@case.edu wrote:
   The u prefix is only there to
   make it easier to port a codebase from Python 2 to Python 3. It
   doesn't actually do anything.

   It does. I shew it!

  Incorrect. You are assuming that Python 3 input eval's the input like
  Python 2 does. That is wrong. All you show is that the one-character
  string a is not equal to the four-character string u'a', which is
  hardly a surprise. You wouldn't expect the string 3 to equal the
  string int('3') would you?

  --
  Steven

  A string is a string, a piece of text, period.

  I do not see why a unicode literal and an (well, I do not know how the
  call it) a normal class str should behave differently in code source
  or as an answer to an input().

 They do not. As you showed earlier, in Python 3.3 the literal strings
 u'a' and 'a' have the same meaning: both create a one-character string
 containing the Unicode letter LOWERCASE-A.

 Note carefully that the quotation marks are not part of the string. They
 are delimiters. Python 3.3 allows you to create a string by using
 delimiters:

 ' '
 " "
 u' '
 u" "

 plus triple-quoted versions of the same. The delimiter is not part of the
 string. They are only there to mark the start and end of the string in
 source code so that Python can tell the difference between the string a
 and the variable named a.

 Note carefully that quotation marks can exist inside strings:

 my_string = "This string has 'quotation marks'."

 The " at the start and end of the string literal are delimiters, not part
 of the string, but the internal ' characters *are* part of the string.

 When you read data from a file, or from the keyboard using input(),
 Python takes the data and returns a string. You don't need to enter
 delimiters, because there is no confusion between a string (all data you
 read) and other programming tokens.

 For example:

 py s = input("Enter a string: ")
 Enter a string: 42
 py print(s, type(s))
 42 <class 'str'>

 Because what I type is automatically a string, I don't need to enclose it
 in quotation marks to distinguish it from the integer 42.

 py s = input("Enter a string: ")
 Enter a string: This string has 'quotation marks'.
 py print(s, type(s))
 This string has 'quotation marks'. <class 'str'>

 What you type is exactly what you get, no more, no less.

 If you type 42, you get the two-character string "42" and not the int 42.

 If you type [1, 2, 3], then you get the nine-character string "[1, 2, 3]"
 and not a list containing the integers 1, 2 and 3.

 If you type 3**0.5 then you get the six-character string "3**0.5" and not
 the float 1.7320508075688772.

 If you type u'a' then you get the four-character string "u'a'" and not
 the single character 'a'.

 There is nothing new going on here. The behaviour of input() in Python 3,
 and raw_input() in Python 2, has not changed.

  Should a user write two derived functions?

  input_for_entering_text()
  and
  input_if_you_are_entering_a_text_as_litteral()

 If you, the programmer, want to force the user to write input in Python
 syntax, then yes, you have to write a function to do so. input() is very
 simple: it just reads strings exactly as typed. It is up to you to
 process those strings however you wish.

 --
 Steven


Python 3.3.0a4 (v3.3.0a4:7c51388a3aa7+, May 31 2012, 20:15:21)
[MSC v.1600 32 bit (Intel)] on win32
 ---
running smidzero.py...
...smidzero has been executed
 ---
input(':')
:éléphant
'éléphant'
 ---
input(':')
:u'éléphant'
'éléphant'
 ---
input(':')
:u'\u00e9l\xe9phant'
'éléphant'
 ---
input(':')
:u'\U00e9léphant'
'éléphant'
 ---
input(':')
:\U00e9léphant
'éléphant'
 ---
 ---
# this is expected
 ---
input(':')
:b'éléphant'
b'éléphant'
 ---
len(input(':'))
:b'éléphant'
11

---

Good news on the ru''/ur'' front:
http://bugs.python.org/issue15096

---

Finally I'm just wondering if this unicode_literal
reintroduction is not a bad idea.

b'these_are_bytes'
u'this_is_a_unicode_string'

I wrote all my Py2 code in a unicode mode since ... Py2.3 (?).

jmf


Re: Py3.3 unicode literal and input()

2012-06-20 Thread jmfauth
On Jun 20, 11:22 am, Christian Heimes li...@cheimes.de wrote:
 Am 18.06.2012 20:45, schrieb Terry Reedy:

  The simultaneous reintroduction of 'ur', but with a different meaning
  than in 2.7, *was* a problem and it should be removed in the next release.

 FYI:http://hg.python.org/cpython/rev/8e47e9af826e

 Christian

I saw this, not the latest version.
Anyway, thanks for the info.

jmf


Re: Python equivalent to the A or a output conversions in C

2012-06-19 Thread jmfauth
On Jun 19, 9:54 pm, Edward C. Jones edcjo...@comcast.net wrote:
 On 06/19/2012 12:41 PM, Hemanth H.M wrote:

   float.hex(x)
  '0x1.5p+3'

 Some days I don't ask the brightest questions.  Suppose x was a numpy
 floating scalar (types numpy.float16, numpy.float32, numpy.float64, or
 numpy.float128).  Is there an easy way to write x in
 binary or hex?

I'm not aware of a builtin fct. Maybe the struct module
("Interpret bytes as packed binary data") can help.
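For example, a hedged stdlib-only sketch (a numpy scalar can first be converted with float(x), for the sizes that a Python float can represent):

```python
import struct

x = 10.5
raw = struct.pack('>d', x)  # the 8 IEEE-754 bytes, big-endian
print(raw.hex())            # 4025000000000000
print(bin(int.from_bytes(raw, 'big')))  # the same 64 bits in binary
```

Use format codes '>e', '>f' or '>d' for 16-, 32- or 64-bit floats respectively.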

jmf


Py3.3 unicode literal and input()

2012-06-18 Thread jmfauth
What is input() supposed to return?

>>> u'a' == 'a'
True

>>> r1 = input(':')
:a
>>> r2 = input(':')
:u'a'
>>> r1 == r2
False
>>> type(r1), len(r1)
(<class 'str'>, 1)
>>> type(r2), len(r2)
(<class 'str'>, 4)


---

sys.argv?

jmf


Re: Py3.3 unicode literal and input()

2012-06-18 Thread jmfauth
On 18 juin, 10:28, Benjamin Kaplan benjamin.kap...@case.edu wrote:
 On Mon, Jun 18, 2012 at 1:19 AM, jmfauth wxjmfa...@gmail.com wrote:
  What is input() supposed to return?

  u'a' == 'a'
  True

  r1 = input(':')
  :a
  r2 = input(':')
  :u'a'
  r1 == r2
  False
  type(r1), len(r1)
  (<class 'str'>, 1)
  type(r2), len(r2)
  (<class 'str'>, 4)

  ---

  sys.argv?

  jmf

 Python 3 made several backwards-incompatible changes over Python 2.
 First of all, input() in Python 3 is equivalent to raw_input() in
 Python 2. It always returns a string. If you want the equivalent of
 Python 2's input(), eval the result. Second, Python 3 is now unicode
 by default. The str class is a unicode string. There is a separate
 bytes class, denoted by b, for byte strings. The u prefix is only
 there to make it easier to port a codebase from Python 2 to Python 3.
 It doesn't actually do anything.


It does. I shew it!

Related:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/3aefd602507d2fbe#

http://mail.python.org/pipermail/python-dev/2012-June/120341.html

jmf


Re: Py3.3 unicode literal and input()

2012-06-18 Thread jmfauth
On 18 juin, 12:11, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:
 On Mon, 18 Jun 2012 02:30:50 -0700, jmfauth wrote:
  On 18 juin, 10:28, Benjamin Kaplan benjamin.kap...@case.edu wrote:
  The u prefix is only there to
  make it easier to port a codebase from Python 2 to Python 3. It doesn't
  actually do anything.

  It does. I showed it!

 Incorrect. You are assuming that Python 3 input eval's the input like
 Python 2 does. That is wrong. All you show is that the one-character
 string a is not equal to the four-character string u'a', which is
 hardly a surprise. You wouldn't expect the string 3 to equal the string
 int('3') would you?

 --
 Steven


A string is a string, a piece of text, period.

I do not see why a unicode literal and a (well, I do not
know how to call it) normal class str should behave
differently in source code or as an answer to an input().

Should a user write two derived functions?

input_for_entering_text()
and
input_if_you_are_entering_a_text_as_literal()

---

Side effect of the unicode literal reintroduction.
I do not mind about this, but I expect it to
work logically and correctly. And it does not.

PS English is not my native language. I never know
how to reply to an (interro-)negative sentence.

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py3.3 unicode literal and input()

2012-06-18 Thread jmfauth
Things are very clear to me. I have written enough interactive
interpreters with all available toolkits for Windows
since I have known Python (v. 1.5.6).

I do not see why the semantics should vary between
source code and an interactive interpreter,
esp. if Python allows it!

If you have to know in advance what an end user
is supposed to type and/or check it ('str' or unicode
literal) in order to know whether the answer has to be
evaluated or not, then it is better to reintroduce
input() and raw_input().

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py3.3 unicode literal and input()

2012-06-18 Thread jmfauth
We are turning in circles. You are somehow
legitimizing the reintroduction of unicode
literals and I showed, not to say proved, that it may
be a source of problems.

Typical Python disease. Introduce a problem,
then discuss how to solve it, but surely and
definitively do not remove that problem.

As far as I know, Python 3.2 is working very
well.

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py3.3 unicode literal and input()

2012-06-18 Thread jmfauth
On Jun 18, 8:45 pm, Terry Reedy tjre...@udel.edu wrote:
 On 6/18/2012 12:39 PM, jmfauth wrote:

  We are turning in circles.

 You are, not we. Please stop.

  You are somehow legitimizing the reintroduction of unicode
  literals

 We are not 'reintroducing' unicode literals. In Python 3, string
 literals *are* unicode literals.

 Other developers reintroduced a now meaningless 'u' prefix for the
 purpose of helping people write 23 code that runs on both Python 2 and
 Python 3. Read about it herehttp://python.org/dev/peps/pep-0414/

 In Python 3.3, 'u' should *only* be used for that purpose and should be
 ignored by anyone not writing or editing 23 code. If you are not
 writing such code, ignore it.

    and I showed, not to say proved, it may

  be a source of problems.

 You are the one making it be a problem.

  Typical Python desease. Introduce a problem,
  then discuss how to solve it, but surely and
  definitivly do not remove that problem.

 The simultaneous reintroduction of 'ur', but with a different meaning
 than in 2.7, *was* a problem and it should be removed in the next release.

  As far as I know, Python 3.2 is working very
  well.

 Except that many public libraries that we would like to see ported to
 Python 3 have not been. The purpose of reintroducing 'u' is to encourage
 more porting of Python 2 code. Period.

 --
 Terry Jan Reedy

It's a matter of perspective. I expected to
finally have a clean Python; the goal was missed.

I have nothing to object. It is your (core devs)
project, not mine. At least, you understood my point
of view.

I'm a more-than-two-decades TeX user. At the release
of XeTeX (a pure unicode TeX engine), the devs had,
like Python 2/3, to make everything incompatible. A success.
Not a week went by without an updated package or a
refreshed documentation appearing.

Luckily for me, Xe(La)TeX is more important than
Python.

As a scientist, Python is perfect.
From an educational point of view, I'm becoming
more and more skeptical about this language, a
moving target.

Note that I'm not complaining, only disappointed.

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3.3.0a4, please add ru'...'

2012-06-17 Thread jmfauth
On 17 juin, 13:30, Christian Heimes li...@cheimes.de wrote:
 Am 16.06.2012 19:36, schrieb jmfauth:

  Please consistency.

 Python 3.3 supports the ur syntax just as Python 2.x:

 $ ./python
 Python 3.3.0a4+ (default:4c704dc97496, Jun 16 2012, 00:06:09)
 [GCC 4.6.3] on linux
 Type help, copyright, credits or license for more information.
  ur''
 ''
 [73917 refs]

 Neither Python 2 nor Python 3 supports ru. I'm a bit astonished that
 rb works in Python 3 as it doesn't work in Python 2.7. But br works
 everywhere.

 Christian

I noticed this at the 3.3.0a0 release.

The main motivation for this came from this:
http://bugs.python.org/issue13748

PS I saw the dev-list message.

PS2 Opinion: if not really useful, consistency never hurts.

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3.3.0a4, please add ru'...'

2012-06-17 Thread jmfauth
On 17 juin, 15:48, Christian Heimes li...@cheimes.de wrote:
 Am 17.06.2012 14:11, schrieb jmfauth:

  I noticed this at the 3.3.0a0 release.

  The main motivation for this came from this:
 http://bugs.python.org/issue13748

  PS I saw the dev-list message.

  PS2 Opinion: if not really useful, consistency never hurts.

 We will most likely drop the ur syntax as it's not compatible with
 Python 2.x's raw unicode notation. http://bugs.python.org/issue15096

 Christian


Yes, but on the other side, you (core developers) have
reintroduced the mess of the unicode literal; now *assume*
it (logically).

If the core developers introduced rb'' alongside br'' (Py2)
because they never know whether they have to type rb or br
(me too), what should a beginner think about ur and ru?

Finally, the ultimate argument: what it is Python 3 supposed to be?
A Python 2 derivative for lazy (ascii) programmers or an appealing
clean and coherent language?

jmf




-- 
http://mail.python.org/mailman/listinfo/python-list


Python 3.3.0a4, please add ru'...'

2012-06-16 Thread jmfauth
Please consistency.

 sys.version
'3.3.0a4 (v3.3.0a4:7c51388a3aa7+, May 31 2012, 20:15:21) [MSC v.1600
32 bit (Intel)]'
 'a'
'a'
 b'a'
b'a'
 br'a'
b'a'
 rb'a'
b'a'
 u'a'
'a'
 ur'a'
'a'
 ru'a'
SyntaxError: invalid syntax


jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python3 raw strings and \u escapes

2012-05-30 Thread jmfauth
On 30 mai, 13:54, Thomas Rachel nutznetz-0c1b6768-bfa9-48d5-
a470-7603bd3aa...@spamschutz.glglgl.de wrote:
 Am 30.05.2012 08:52 schrieb ru...@yahoo.com:



  This breaks a lot of my code because in python 2
         re.split (ur'[\u3000]', u'A\u3000A') ==  [u'A', u'A']
  but in python 3 (the result of running 2to3),
         re.split (r'[\u3000]', 'A\u3000A' ) ==  ['A\u3000A']

  I can remove the r prefix from the regex string but then
  if I have other regex backslash symbols in it, I have to
  double all the other backslashes -- the very thing that
  the r-prefix was invented to avoid.

  Or I can leave the r prefix and replace something like
  r'[ \u3000]' with r'[  ]'.  But that is confusing because
  one can't distinguish between the space character and
  the ideographic space character.  It also a problem if a
  reader of the code doesn't have a font that can display
  the character.

  Was there a reason for dropping the lexical processing of
  \u escapes in strings in python3 (other than to add another
  annoyance in a long list of python3 annoyances?)

 Probably it is more consistent. Alas, it makes the whole thing
 incompatible with Py2.

 But if you think about it: why allow for \u if \r, \n etc. are
 disallowed as well?

  And is there no choice for me but to choose between the two
  poor choices I mention above to deal with this problem?

 There is a 3rd one: use   r'[ ' + '\u3000' + ']'. Not very nice to read,
 but should do the trick...

 Thomas
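Thomas's third option does work: keep the raw prefix for the regex metacharacters and splice in the processed \u escape as an ordinary literal (a small check, not from the original post):

```python
import re

pattern = r'[ ' + '\u3000' + ']'   # class: SPACE and IDEOGRAPHIC SPACE
assert re.split(pattern, 'A\u3000B C') == ['A', 'B', 'C']
```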

I suggest taking the problem differently. Python 3
succeeded in putting order into the mismatch of character
encodings that Python 2 offered.

In your case, the

 import unicodedata as ud
 ud.name('\u3000')
'IDEOGRAPHIC SPACE'

character (in fact a unicode code point), is just
a character as a

 ud.name('a')
'LATIN SMALL LETTER A'

The code point / unicode logic, Python 3 proposes and follows,
becomes just straightforward.

 s = 'a\u3000é\u3000€'
 s.split('\u3000')
['a', 'é', '€']

 import re
 re.split('\u3000', s)
['a', 'é', '€']


The backslash, used as a real backslash, remains what it
was in Python 2. Note the absence of r'...'.

 s = 'a\\b\\c'
 print(s)
a\b\c
 s.split('\\')
['a', 'b', 'c']
 re.split('\\\\', s)
['a', 'b', 'c']

 hex(ord('\\'))
'0x5c'
 re.split('\u005c\u005c', s)
['a', 'b', 'c']

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python3 raw strings and \u escapes

2012-05-30 Thread jmfauth
On 30 mai, 08:52, ru...@yahoo.com ru...@yahoo.com wrote:
 In python2, \u escapes are processed in raw unicode
 strings.  That is, ur'\u3000' is a string of length 1
 consisting of the IDEOGRAPHIC SPACE unicode character.

 In python3, \u escapes are not processed in raw strings.
 r'\u3000' is a string of length 6 consisting of a backslash,
 'u', '3' and three '0' characters.

 This breaks a lot of my code because in python 2
       re.split (ur'[\u3000]', u'A\u3000A') == [u'A', u'A']
 but in python 3 (the result of running 2to3),
       re.split (r'[\u3000]', 'A\u3000A' ) == ['A\u3000A']

 I can remove the r prefix from the regex string but then
 if I have other regex backslash symbols in it, I have to
 double all the other backslashes -- the very thing that
 the r-prefix was invented to avoid.

 Or I can leave the r prefix and replace something like
 r'[ \u3000]' with r'[  ]'.  But that is confusing because
 one can't distinguish between the space character and
 the ideographic space character.  It also a problem if a
 reader of the code doesn't have a font that can display
 the character.

 Was there a reason for dropping the lexical processing of
 \u escapes in strings in python3 (other than to add another
 annoyance in a long list of python3 annoyances?)

 And is there no choice for me but to choose between the two
 poor choices I mention above to deal with this problem?


I suggest taking the problem differently. Python 3
succeeded in putting order into the mismatch of character
encodings that Python 2 offered.

The 'IDEOGRAPHIC SPACE' and 'REVERSE SOLIDUS' (backslash)
characters (in fact unicode code points) are just (normal)
characters. The backslash, used as an escaping command,
keeps its function.

Note the absence of r'...'

 s = 'a\u3000é\u3000€'
 s.split('\u3000')
['a', 'é', '€']

 import re
 re.split('\u3000', s)
['a', 'é', '€']


 s = 'a\\b\\c'
 print(s)
a\b\c
 s.split('\\')
['a', 'b', 'c']
 re.split('\\\\', s)
['a', 'b', 'c']

 hex(ord('\\'))
'0x5c'
 re.split('\u005c\u005c', s)
['a', 'b', 'c']

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: str.isnumeric and Cuneiforms

2012-05-18 Thread jmfauth
On 17 mai, 21:32, Marco marc...@nsgmail.com wrote:
 Is it normal the str.isnumeric() returns False for these Cuneiforms?

 '\U00012456'
 '\U00012457'
 '\U00012432'
 '\U00012433'


 They are all in the Nl category.

Indeed they are, but Unicode (ver. 5.0.0) does not assign numeric
values to these code points.

Do not ask me why.
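The behavior can be checked against the Unicode database the interpreter ships with (a sketch; the exact outcome depends on that database's version):

```python
import unicodedata

ch = '\U00012456'   # CUNEIFORM NUMERIC SIGN NIGIDAMIN
assert unicodedata.category(ch) == 'Nl'
# str.isnumeric() follows the Numeric_Type/Numeric_Value properties, not the
# general category, so an Nl code point with no assigned value is not numeric:
assert ch.isnumeric() == (unicodedata.numeric(ch, None) is not None)
```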

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: str.isnumeric and Cuneiforms

2012-05-18 Thread jmfauth
On 18 mai, 17:08, Marco Buttu name.surn...@gmail.com wrote:
 On 05/17/2012 09:32 PM, Marco wrote:

  Is it normal the str.isnumeric() returns False for these Cuneiforms?

  '\U00012456'
  '\U00012457'
  '\U00012432'
  '\U00012433'

  They are all in the Nl category.

  Marco

 It's ok, I found that they don't have a number assigned in the
 ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt database.
 --
 Marco

Good. I was about to send this information. I have all this (not
updated)
stuff locally on my hd.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: str.isnumeric and Cuneiforms

2012-05-18 Thread jmfauth
On 18 mai, 17:08, Marco Buttu name.surn...@gmail.com wrote:
 On 05/17/2012 09:32 PM, Marco wrote:

  Is it normal the str.isnumeric() returns False for these Cuneiforms?

  '\U00012456'
  '\U00012457'
  '\U00012432'
  '\U00012433'

  They are all in the Nl category.

  Marco

 It's ok, I found that they don't have a number assigned in the
 ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt database.
 --
 Marco

Unofficial but really practical:

http://www.fileformat.info/info/unicode/index.htm

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Difference between str.isdigit() and str.isdecimal() in Python 3

2012-05-16 Thread jmfauth
On 16 mai, 17:48, Marco marc...@nsgmail.com wrote:
 Hi all, because

 There should be one-- and preferably only one --obvious way to do it,

 there should be a difference between the two methods in the subject, but
 I can't find it:

   '123'.isdecimal(), '123'.isdigit()
 (True, True)
   print('\u0660123')
 ٠123
   '\u0660123'.isdigit(), '\u0660123'.isdecimal()
 (True, True)
   print('\u216B')
 Ⅻ
   '\u216B'.isdecimal(), '\u216B'.isdigit()
 (False, False)

 Can anyone give me some help?
 Regards, Marco

It seems to me that it is correct, and the reason lies in this:

 import unicodedata as ud
 ud.category('\u216b')
'Nl'
 ud.category('1')
'Nd'

 # Note
 ud.numeric('\u216b')
12.0
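The practical difference between the three methods shows up outside the Nd category (an illustrative check, not from the original post):

```python
sup_two = '\u00b2'   # SUPERSCRIPT TWO: a digit, but not a decimal digit
roman = '\u216b'     # ROMAN NUMERAL TWELVE, category Nl: numeric only
assert sup_two.isdigit() and not sup_two.isdecimal()
assert sup_two.isnumeric()
assert roman.isnumeric() and not roman.isdigit() and not roman.isdecimal()
```

So isdecimal is a subset of isdigit, which is a subset of isnumeric: decimal digits (Nd), then digits such as superscripts, then anything with a numeric value.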

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


On u'Unicode string literals' (Py3)

2012-02-29 Thread jmfauth
For those who do not know:
The u'' string literal trick has never worked in Python 2.

 sys.version
'2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
 print u'Un oeuf à zéro EURO uro'
Un  uf à zéro  uro


jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: On u'Unicode string literals' reintroduction (Py3)

2012-02-29 Thread jmfauth
On 29 fév, 14:45, jmfauth wxjmfa...@gmail.com wrote:
 For those who do not know:
 The u'' string literal trick has never worked in Python 2.

  sys.version

 '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' print 
 u'Un oeuf à zéro EURO uro'

 Un  uf à zéro  uro



 jmf


Sorry, I just wanted to show a small example.
It seems Google has changed again.

You should read (2nd attempt)
u'Un œuf à zéro €uro' with the *correctly* typed glyphs 'LATIN SMALL
LIGATURE OE' in 'œuf' and 'EURO SIGN' in '€uro'.

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python math is off by .000000000000045

2012-02-26 Thread jmfauth
On 25 fév, 23:51, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:
 On Sat, 25 Feb 2012 13:25:37 -0800, jmfauth wrote:
  (2.0).hex()
  '0x1.0p+1'
  (4.0).hex()
  '0x1.0p+2'
  (1.5).hex()
  '0x1.8p+0'
  (1.1).hex()
  '0x1.199999999999ap+0'

  jmf

 What's your point? I'm afraid my crystal ball is out of order and I have
 no idea whether you have a question or are just demonstrating your
 mastery of copy and paste from the Python interactive interpreter.



It should be enough to indicate the right direction
for casual interested readers.



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python math is off by .000000000000045

2012-02-25 Thread jmfauth
 (2.0).hex()
'0x1.0p+1'
 (4.0).hex()
'0x1.0p+2'
 (1.5).hex()
'0x1.8p+0'
 (1.1).hex()
'0x1.199999999999ap+0'


jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: distutils bdist_wininst failure on Linux

2012-02-23 Thread jmfauth
On 23 fév, 15:06, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:
 Following instructions here:

 http://docs.python.org/py3k/distutils/builtdist.html#creating-windows...

 I am trying to create a Windows installer for a pure-module distribution
 using Python 3.2. I get a LookupError: unknown encoding: mbcs

 Here is the full output of distutils and the traceback:

 [steve@ando pyprimes]$ python3.2 setup.py bdist_wininst
 running bdist_wininst
 running build
 running build_py
 creating build/lib
 copying src/pyprimes.py -> build/lib
 installing to build/bdist.linux-i686/wininst
 running install_lib
 creating build/bdist.linux-i686/wininst
 creating build/bdist.linux-i686/wininst/PURELIB
 copying build/lib/pyprimes.py -> build/bdist.linux-i686/wininst/PURELIB
 running install_egg_info
 Writing build/bdist.linux-i686/wininst/PURELIB/pyprimes-0.1.1a-py3.2.egg-info
 creating '/tmp/tmp3utw4_.zip' and adding '.' to it
 adding 'PURELIB/pyprimes.py'
 adding 'PURELIB/pyprimes-0.1.1a-py3.2.egg-info'
 creating dist
 Warning: Can't read registry to find the necessary compiler setting
 Make sure that Python modules winreg, win32api or win32con are installed.
 Traceback (most recent call last):
   File setup.py, line 60, in module
     License :: OSI Approved :: MIT License,
   File /usr/local/lib/python3.2/distutils/core.py, line 148, in setup
     dist.run_commands()
   File /usr/local/lib/python3.2/distutils/dist.py, line 917, in run_commands
     self.run_command(cmd)
   File /usr/local/lib/python3.2/distutils/dist.py, line 936, in run_command
     cmd_obj.run()
   File /usr/local/lib/python3.2/distutils/command/bdist_wininst.py, line 
 179, in run
     self.create_exe(arcname, fullname, self.bitmap)
   File /usr/local/lib/python3.2/distutils/command/bdist_wininst.py, line 
 262, in create_exe
     cfgdata = cfgdata.encode(mbcs)
 LookupError: unknown encoding: mbcs

 How do I fix this, and is it a bug in distutils?

 --
 Steven

Because the 'mbcs' codec is missing on your Linux. :-)

 'abc需'.encode('utf-8')
b'abc\xe9\x9c\x80'
 'abc需'.encode('missing')
Traceback (most recent call last):
  File eta last command, line 1, in <module>
LookupError: unknown encoding: missing

jmf
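A workaround sometimes used to run bdist_wininst on non-Windows systems is to register a codec search function that aliases 'mbcs' to an existing codec (a sketch; latin-1 merely stands in for the real Windows ANSI codepage, so non-Latin installer metadata would still be mis-encoded):

```python
import codecs

def mbcs_alias(name):
    # Answer only for the missing codec; let every other lookup proceed.
    if name == 'mbcs':
        return codecs.lookup('latin-1')
    return None

codecs.register(mbcs_alias)
assert 'abc'.encode('mbcs') == b'abc'
```

On Windows the builtin 'mbcs' codec is found first, so the alias is harmless there.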
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: format a measurement result and its error in scientific way

2012-02-17 Thread jmfauth
On 16 fév, 01:18, Daniel Fetchinson fetchin...@googlemail.com wrote:
 Hi folks, often times in science one expresses a value (say
 1.03789291) and its error (say 0.00089) in a short way by parentheses
 like so: 1.0379(9)


Before swallowing any Python solution, you should
realize that the values (value, error) you are using are
nonsense:

1.03789291 +/- 0.00089

You express more precision in the value than
in the error.

---

For example, in a 1.234(5) notation, the parentheses are
usually used to indicate the uncertainty of the digit in
parentheses.

E.g. 1.345(7)

Typographically, the parentheses are sometimes replaced by
a bold digit or a subscripted digit.
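A sketch of producing that notation from a (value, error) pair (fmt_paren is a hypothetical helper, not something from the thread; it assumes a single-digit uncertainty):

```python
import math

def fmt_paren(value, error):
    """Round value to the decimal place of the error's leading digit and
    show that digit in parentheses, e.g. 1.0379(9)."""
    exp = math.floor(math.log10(abs(error)))   # position of leading error digit
    err_digit = round(error / 10 ** exp)
    if err_digit == 10:                        # rounding bumped e.g. 0.096 up
        err_digit = 1
        exp += 1
    digits = max(0, -exp)
    return '{:.{}f}({})'.format(value, digits, err_digit)

assert fmt_paren(1.03789291, 0.00089) == '1.0379(9)'
assert fmt_paren(1.345, 0.007) == '1.345(7)'
```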

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: format a measurement result and its error in scientific way

2012-02-17 Thread jmfauth
On 17 fév, 11:03, Daniel Fetchinson fetchin...@googlemail.com wrote:
  Hi folks, often times in science one expresses a value (say
  1.03789291) and its error (say 0.00089) in a short way by parentheses
  like so: 1.0379(9)

  Before swallowing any Python solution, you should
  realize, the values (value, error) you are using are
  a non sense :

  1.03789291 +/- 0.00089

  You express more precision in the value than
  in the error.

 My impression is that you didn't understand the original problem:
 given an arbitrary value to arbitrary digits and an arbitrary error,
 find the relevant number of digits for the value that makes sense for
 the given error. So what you call non sense is part of the problem
 to be solved.


I do not know where these numbers (value, error) are
coming from. But when the value and the error
do not have the same precision, there is already
something wrong somewhere.
And this, *prior* to any representation of these
values/numbers.

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python usage numbers

2012-02-14 Thread jmfauth
On 13 fév, 04:09, Terry Reedy tjre...@udel.edu wrote:


 * The new internal unicode scheme for 3.3 is pretty much a mixture of
 the 3 storage formats (I am of course, skipping some details) by using
 the widest one needed for each string. The advantage is avoiding
 problems with each of the three. The disadvantage is greater internal
 complexity, but that should be hidden from users. They will not need to
 care about the internals. They will be able to forget about 'narrow'
 versus 'wide' builds and the possible requirement to code differently
 for each. There will only be one scheme that works the same on all
 platforms. Most apps should require less space and about the same time.

 --
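The description above can be observed directly with sys.getsizeof (a sketch; the exact byte counts vary by build, so only the ordering is asserted):

```python
import sys

# Under PEP 393 (Python 3.3+) each str uses 1, 2 or 4 bytes per character,
# chosen by the widest character it contains.
ascii_s = 'abcd'
bmp_s = 'abc\u20ac'          # EURO SIGN forces 2 bytes per character
astral_s = 'abc\U0001d11e'   # MUSICAL SYMBOL G CLEF forces 4 bytes per character
assert sys.getsizeof(bmp_s) > sys.getsizeof(ascii_s)
assert sys.getsizeof(astral_s) > sys.getsizeof(bmp_s)
```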


Python 2 was built for ascii users. Now, Python 3(.3) is
*optimized* for ascii users.

And the rest of the crowd? Not so sure. French users
(among others) who cannot write their texts with
iso-8859-1/latin-1 will be very happy.

No doubt it will work. Is this however the correct
approach?
jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python usage numbers

2012-02-12 Thread jmfauth


There is so much to say on the subject, I do not know
where to start. Some points.

Today, Sunday, 12 February 2012, 90%, if not more, of the
Python applications that are supposed to work with text and
that I'm toying with are simply not working. Two reasons:
1) Most of the devs understand nothing, or not enough, about
the field of character encoding.
2) In GUI applications, most of the devs understand
nothing, or not enough, about keyboard keys/chars handling.

---

I have known Python since version 1.5.2 or 1.5.6 (?). Among the
applications I wrote, my fun is in writing GUI interactive
interpreters with Python 2 or 3, tkinter, Tkinter, wxPython,
PySide, PyQt4 on Windows.

Believe it or not, my interactive interpreters are the only
ones where I can enter text and where text is displayed
correctly. IDLE, wxPython/PyShell, DrPython, ... all
are failing. (I do not count console applications.)

Python popularity? I have no popularity-meter. What I know:
I can not type French text in IDLE on Windows. It has been like
this for ~ten years and I never saw any complaint about
it. (The problem is bad programming.)

Ditto for PyShell in wxPython. I cannot count the number of
corrections I proposed. For one version, it took 18 months
until it was finally decided to apply a correction. During this
time, I never heard of the problem. (Now, it is broken
again.)

---

Is there a way to fix this actual status?
- Yes, and *very easily*.

Will it be fixed?
- No, because there is no willingness to solve it.

---

Roy Smith's quote: ... that we'll all just be
using UTF-32, ...

Considering PEP 393, Python is not taking this road.

---

How many devs know that one can not write text in French with
the iso-8859-1 coding? (see pep 393)

How can one explain that corporates like MS or Apple with their
cp1252 or mac-roman codings succeeded in knowing this?

Ditto for foundries (Adobe, LinoType, ...)

---

Python is 20 years old. It was developed with ascii in
mind. Before Python was born, all this stuff was already
a non-problem with Windows and VB.
Even a step further back, before Windows was born, this was a
non-problem at the DOS level (eg TurboPascal), 30 years ago!
Design mistake.

---

Python 2 introduced the unicode type. Very nice.
Problem: the introduction of the automatic coercion
ascii-unicode, which somehow breaks everything.

A very bad design mistake. (In my mind, the biggest one.)

---

One day, I came across on the web a very old discussion
about Python related to the introduction of unicode in
Python 2. Something like:

Python core dev (it was VS or AP): ... let's go with ucs-4
and we will have no problems in the future ...

Look at the situation today.

---

And so on.

---

Conclusion. A Windows programmer is better served by
downloading VB.NET Express. An end Windows user is
better served by an application developed with VB.NET
Express.

I find it somehow funny that Python is able to produce this:

 (1.1).hex()
'0x1.199999999999ap+0'


and on the other side, Python and Python applications
are not able to deal correctly with text entry
and text display. Probably the two most important
tasks a computer has to do!

jmf

PS I'm not a computer scientist, only a computer user.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: changing sys.path

2012-02-02 Thread jmfauth
On 2 fév, 11:03, Andrea Crotti andrea.crott...@gmail.com wrote:
 On 02/02/2012 12:51 AM, Steven D'Aprano wrote:



  On Wed, 01 Feb 2012 17:47:22 +, Andrea Crotti wrote:

  Yes they are exactly the same, because in that file I just write exactly
  the same list,
  but when modifying it at run-time it doesn't work, while if at the
  application start
  there is this file everything works correctly...

  That's what really puzzles me.. What could that be then?

  Are you using IDLE or WingIDE or some other IDE which may not be
  honouring sys.path? If so, that's a BAD bug in the IDE.
  Are you changing the working directory manually, by calling os.chdir? If
  so, that could be interfering with the import somehow. It shouldn't, but
  you never know...

  Are you adding absolute paths or relative paths?

 No, no and absolute paths..



  You say that you get an ImportError, but that covers a lot of things
  going wrong. Here's a story. Could it be correct? I can't tell because
  you haven't posted the traceback.

  When you set site-packages/my_paths.pth you get a sys path that looks
  like ['a', 'b', 'fe', 'fi', 'fo', 'fum']. You then call import spam
  which locates b/spam.py and everything works.

  But when you call sys.path.extend(['a', 'b']) you get a path that looks
  like ['fe', 'fi', 'fo', 'fum', 'a', 'b']. Calling import spam locates
  some left over junk file, fi/spam.py or fi/spam.pyc, which doesn't
  import, and you get an ImportError.

 And no the problem is not that I already checked inspecting at run-time..
 This is the traceback and it might be related to the fact that it runs
 from the
 .exe wrapper generated by setuptools:

 Traceback (most recent call last):
    File c:\python25\scripts\dev_main-script.py, line 8, in module
      load_entry_point('psi.devsonly==0.1', 'console_scripts', 'dev_main')()
    File h:\git_projs\psi\psi.devsonly\psi\devsonly\bin\dev_main.py,
 line 152, in main
      Develer(ns).full_run()
    File h:\git_projs\psi\psi.devsonly\psi\devsonly\bin\dev_main.py,
 line 86, in full_run
      run(project_name, test_only=self.ns.test_only)
    File h:\git_projs\psi\psi.devsonly\psi\devsonly\environment.py,
 line 277, in run
      from psi.devsonly.run import Runner
    File h:\git_projs\psi\psi.devsonly\psi\devsonly\run.py, line 7, in
 module
      from psi.workbench.api import Workbench, set_new_dev_main
 ImportError: No module named workbench.api

 Another thing which might matter is that I'm launching Envisage
 applications, which
 heavily rely on the use of entry points, so I guess that if something is
 not in the path
 the entry point is not loaded automatically (but it can be forced I
 guess somehow).

 I solved in another way now, since I also need to keep a dev_main.pth in
 site-packages
 to make Eclipse happy, just respawning the same process on ImportError works
 already perfectly..



There is something strange here. I can not figure
out how correct code would fail with sys.path.
It seems to me the lib you are using is somehow not
able to recognize its own structure (its own sys.path).

Idea: are you sure you are modifying sys.path at
the right place, that is, at the right time in
Python's processing?

I'm using this sys.path tweaking at run time very often,
e.g. to test or to run different versions of the same lib
residing in different dirs, and this, in *any* dir and
independently of *any* .pth file.
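A self-contained sketch of that tweaking (the module name and directory are invented for the demo). Two pitfalls worth noting: entries are searched in order, so inserting at the front shadows same-named modules elsewhere; and a module already present in sys.modules is never re-resolved, which is one way such tweaks appear not to work.

```python
import os
import sys
import tempfile

# Create a throwaway directory containing a module, then make it importable.
d = tempfile.mkdtemp()
with open(os.path.join(d, 'mymod_demo.py'), 'w') as f:
    f.write('VALUE = 42\n')

sys.path.insert(0, d)        # front of the list: wins over later entries
import mymod_demo
assert mymod_demo.VALUE == 42
```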

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: changing sys.path

2012-02-01 Thread jmfauth
On 1 fév, 17:15, Andrea Crotti andrea.crott...@gmail.com wrote:
 So suppose I want to modify the sys.path on the fly before running some code
 which imports from one of the modules added.

 at run time I do
 sys.path.extend(paths_to_add)

 but it still doesn't work and I get an import error.

 If I take these paths and add them to site-packages/my_paths.pth
 everything works, but at run-time the paths which I actually see before
 importing are exactly the same.

 So there is something I guess that depends on the order, but what can I
 reset/reload to make these paths available (I thought I didn't need
 anything in theory)?


 import mod
Traceback (most recent call last):
  File eta last command, line 1, in <module>
ImportError: No module named mod
 sys.path.append(r'd:\\jm\\junk')
 import mod
 mod
<module 'mod' from 'd:\\jm\\junk\\mod.py'>
 mod.hello()
fct hello in mod.py


sys.path? Probably one of the most ingenious Python ideas.

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: sys.argv as a list of bytes

2012-01-19 Thread jmfauth

 In short: if you need to write system scripts on Unix, and you need them
 to work reliably, you need to stick with Python 2.x.


I think understanding the coding of the characters helps a bit.

I can not figure out why the example below could not be
done on other systems.

D:\tmpchcp
Page de codes active : 1252

D:\tmpc:\python32\python.exe sysarg.py a b é € \u0430 \u03b1 z
arg: 1   unicode name: LATIN SMALL LETTER A
arg: 2   unicode name: LATIN SMALL LETTER B
arg: 3   unicode name: LATIN SMALL LETTER E WITH ACUTE
arg: 4   unicode name: EURO SIGN
arg: 5   unicode name: CYRILLIC SMALL LETTER A
arg: 6   unicode name: GREEK SMALL LETTER ALPHA
arg: 7   unicode name: LATIN SMALL LETTER Z

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: NaN, Null, and Sorting

2012-01-13 Thread jmfauth
On 13 jan, 20:04, Ethan Furman et...@stoneleaf.us wrote:
 With NaN, it is possible to get a list that will not properly sort:

 --> NaN = float('nan')
 --> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
 --> sorted(spam)
 [1, 2, nan, 3, nan, 4, 5, 7, nan]

 I'm constructing a Null object with the semantics that if the returned
 object is Null, it's actual value is unknown.


Short answer.

-  NaN != NA()

-  I find the current implementation (Py3.2) quite satisfying. (M.
Dickinson's work)
jmf


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeEncodeError in compile

2012-01-11 Thread jmfauth
On 11 jan, 01:56, Terry Reedy tjre...@udel.edu wrote:
 On 1/10/2012 8:43 AM, jmfauth wrote:



  D:\c:\python32\python.exe
  Python 3.2.2 (default, Sep  4 2011, 09:51:08) [MSC v.1500 32 bit
  (Intel)] on win
  32
  Type help, copyright, credits or license for more information.
  '\u5de5'.encode('utf-8')
  b'\xe5\xb7\xa5'
  '\u5de5'.encode('mbcs')
  Traceback (most recent call last):
     File <stdin>, line 1, in <module>
  UnicodeEncodeError: 'mbcs' codec can't encode characters in position
  0--1: inval
  id character
  D:\c:\python27\python.exe
  Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit
  (Intel)] on win
  32
  Type help, copyright, credits or license for more information.
  u'\u5de5'.encode('utf-8')
  '\xe5\xb7\xa5'
  u'\u5de5'.encode('mbcs')
  '?'

 mbcs encodes according to the current codepage. Only the chinese
 codepage(s) can encode the chinese char. So the unicode error is correct
 and 2.7 has a bug in that it is doing errors='replace' when it
 supposedly is doing errors='strict'. The Py3 fix was done 
 inhttp://bugs.python.org/issue850997
 2.7 was intentionally left alone because of back-compatibility
 considerations. (None of this addresses the OP's question.)

 --
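The strict-versus-replace difference described above can be sketched with latin-1 standing in for a Western codepage (mbcs itself exists only on Windows):

```python
ch = '\u5de5'   # CJK UNIFIED IDEOGRAPH-5DE5, not encodable in latin-1

# Old Python 2.7 mbcs behavior: silently substitute '?'
assert ch.encode('latin-1', errors='replace') == b'?'

# Python 3 mbcs behavior: strict error handling raises
try:
    ch.encode('latin-1')
    raised = False
except UnicodeEncodeError:
    raised = True
assert raised
```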

Ok. I was not aware of this.
PS The previous post got lost.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeEncodeError in compile

2012-01-11 Thread jmfauth
On 11 jan, 01:56, Terry Reedy tjre...@udel.edu wrote:
 On 1/10/2012 8:43 AM, jmfauth wrote:

 ...

 mbcs encodes according to the current codepage. Only the chinese
 codepage(s) can encode the chinese char. So the unicode error is correct
 and 2.7 has a bug in that it is doing errors='replace' when it
 supposedly is doing errors='strict'. The Py3 fix was done 
 inhttp://bugs.python.org/issue850997
 2.7 was intentionally left alone because of back-compatibility
 considerations. (None of this addresses the OP's question.)

 --

win7, cp1252

Ok. I was not aware of this.

>>> '\N{CYRILLIC SMALL LETTER A}'.encode('mbcs')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'mbcs' codec can't encode characters in position
0--1: invalid character
>>> '\N{GREEK SMALL LETTER ALPHA}'.encode('mbcs')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'mbcs' codec can't encode characters in position
0--1: invalid character
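The strict-vs-replace distinction Terry describes can be reproduced on any platform. The sketch below uses cp1252 instead of the Windows-only mbcs codec (an assumption made for portability; the failure mode is the same single-codepage one):

```python
ch = '\u5de5'  # CJK UNIFIED IDEOGRAPH-5DE5, not representable in cp1252

# errors='strict' (the default) raises, as Py3's mbcs codec does;
# errors='replace' silently yields b'?', as Py2's mbcs codec did.
try:
    ch.encode('cp1252')
except UnicodeEncodeError as e:
    print('strict:', e.reason)

print('replace:', ch.encode('cp1252', errors='replace'))  # b'?'
```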

jmf


Re: UnicodeEncodeError in compile

2012-01-10 Thread jmfauth
1) If I copy/paste these CJK chars from Google Groups in two of my
interactive interpreters (no dos/cmd console), I have no problem.

>>> import unicodedata as ud
>>> ud.name('工')
'CJK UNIFIED IDEOGRAPH-5DE5'
>>> ud.name('具')
'CJK UNIFIED IDEOGRAPH-5177'
>>> hex(ord('工'))
'0x5de5'
>>> hex(ord('具'))
'0x5177'


2) It seems the mbcs codec has some difficulties with
these chars.

>>> '\u5de5'.encode('mbcs')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'mbcs' codec can't encode characters in position
0--1: invalid character
>>> '\u5de5'.encode('utf-8')
b'\xe5\xb7\xa5'
>>> '\u5de5'.encode('utf-32-be')
b'\x00\x00]\xe5'

3) On the usage of mbcs in files IO interaction -- core devs.

My conclusion: the bottleneck is on the mbcs side.

jmf



Re: UnicodeEncodeError in compile

2012-01-10 Thread jmfauth
On 10 jan, 11:53, 8 Dihedral dihedral88...@googlemail.com wrote:
 On Tuesday, 10 January 2012, 16:08:40 UTC+8, Terry Reedy wrote:


  I get the same error running 3.2.2 under IDLE but not when pasting into
  Command Prompt. However, Command Prompt may be cheating by replacing the
  Chinese chars with '??' upon pasting, so that Python never gets them --
  whereas they appear just fine in IDLE.

  --


Tested with *my* Windows GUI interactive interpreters.

It seems to me there is a problem with the mbcs codec.

>>> hex(ord('工'))
'0x5de5'
>>> '\u5de5'
'工'
>>> '\u5de5'.encode('mbcs')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'mbcs' codec can't encode characters in position
0--1: invalid character
>>> '\u5de5'.encode('utf-8')
b'\xe5\xb7\xa5'
>>> '\u5de5'.encode('utf-32-be')
b'\x00\x00]\xe5'
>>> sys.version
'3.2.2 (default, Sep  4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)]'
>>> '\u5de5'.encode('mbcs', 'replace')
b'?'

--

>>> u'\u5de5'.encode('mbcs', 'replace')
'?'
>>> repr(u'\u5de5'.encode('utf-8'))
'\\xe5\\xb7\\xa5'
>>> repr(u'\u5de5'.encode('utf-32-be'))
'\\x00\\x00]\\xe5'
>>> sys.version
'2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'


jmf




Re: UnicodeEncodeError in compile

2012-01-10 Thread jmfauth
On 10 jan, 13:28, jmfauth wxjmfa...@gmail.com wrote:

Addendum, Python console (dos box)

D:\>c:\python32\python.exe
Python 3.2.2 (default, Sep  4 2011, 09:51:08) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> '\u5de5'.encode('utf-8')
b'\xe5\xb7\xa5'
>>> '\u5de5'.encode('mbcs')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'mbcs' codec can't encode characters in position
0--1: invalid character
>>> ^Z


D:\>c:\python27\python.exe
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'\u5de5'.encode('utf-8')
'\xe5\xb7\xa5'
>>> u'\u5de5'.encode('mbcs')
'?'
>>> ^Z


D:\>

jmf


Re: How to support a non-standard encoding?

2012-01-06 Thread jmfauth
On 6 jan, 11:03, Ivan i...@llaisdy.com wrote:
 Dear All

 I'm developing a python application for which I need to support a
 non-standard character encoding (specifically ISO 6937/2-1983, Addendum
 1-1989).  Here are some of the properties of the encoding and its use in
 the application:

    - I need to read and write data to/from files.  The file format
      includes two sections in different character encodings (so I
      shan't be able to use codecs.open()).

    - iso-6937 sections include non-printing control characters

    - iso-6937 is a variable width encoding, e.g. A = [41],
      Ä = [0xC8, 0x41]; all non-spacing diacritical marks are in the
      range 0xC0-0xCF.

 By any chance is there anyone out there working on iso-6937?

 Otherwise, I think I need to write a new codec to support reading and
 writing this data.  Does anyone know of any tutorials or blog posts on
 implementing a codec for a non-standard characeter encoding?  Would
 anyone be interested in reading one?



Take a look at the files (Python modules) in
...\Lib\encodings. This is the place where all codecs
are centralized. Python magically uses them as long
as they are present in that dir.

I remember, a long time ago, for fun, I created such
a codec quite easily. I picked one of the files as a
template and modified its table. It was a
byte-to-byte table.

For a multibyte coding scheme, it may be a little bit more
complicated; you may take a look, e.g., at the mbcs.py codec.

The distribution of such a codec may be a problem.



Another simple approach, OS independent.

You probably do not write your code in iso-6937; you
only need to encode/decode some byte sequences
on the fly. In that case, work with bytes and create
a couple of encoding/decoding functions with a
hand-built dict [*] as helper. It's not so complicated.
Use unicode (Py2) or str (Py3) (the recommended
way ;-) ) as the pivot encoding.

[*] I also once created such a dict from
# 
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt

I never checked whether it corresponds to the official cp1252
codec.
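The dict-based approach can also be wired into Python's codec machinery via codecs.register(), the same mechanism the stdlib encodings package relies on, so that str.encode()/bytes.decode() work directly. A minimal sketch with a hypothetical codec name 'toy6937' and a two-entry table (NOT the real ISO 6937 mapping):

```python
import codecs

# Hypothetical two-entry table, purely illustrative.
DECODE = {0x41: 'A', 0x42: 'B'}            # byte -> character
ENCODE = {v: k for k, v in DECODE.items()}  # character -> byte

def toy_decode(data, errors='strict'):
    # Stateless decode: (decoded str, number of bytes consumed).
    return ''.join(DECODE[b] for b in bytes(data)), len(data)

def toy_encode(text, errors='strict'):
    # Stateless encode: (encoded bytes, number of chars consumed).
    return bytes(ENCODE[c] for c in text), len(text)

def search(name):
    # Search function handed to codecs.register(); return None
    # for names we do not handle.
    if name == 'toy6937':
        return codecs.CodecInfo(toy_encode, toy_decode, name='toy6937')
    return None

codecs.register(search)

print(b'AB'.decode('toy6937'))  # -> AB
print('BA'.encode('toy6937'))   # -> b'BA'
```

A real iso-6937 codec would additionally have to handle the variable-width sequences (the 0xC0-0xCF non-spacing diacritics) inside toy_decode/toy_encode.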

jmf


Re: Python 2 or 3

2011-12-03 Thread jmfauth
On 3 Dec, 04:54, Antti J Ylikoski antti.yliko...@tkk.fi wrote:

 Helsinki, Finland, the EU   


>>> sys.version
'2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
>>> 'éléphant'
'\xe9l\xe9phant'


>>> sys.version
'3.2.2 (default, Sep  4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)]'
>>> 'éléphant'
'éléphant'



jmf





Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread jmfauth
On 6 oct, 06:39, Greg gregor.hochsch...@googlemail.com wrote:
 Brilliant! It worked. Thanks!

 Here is the final code for those who are struggling with similar
 problems:

 ## open and decode file
 # In this case, the encoding comes from the charset argument in a meta
 # tag, e.g. <meta charset="iso-8859-2">
 fileObj = open(filePath, "r").read()
 fileContent = fileObj.decode("iso-8859-2")
 fileSoup = BeautifulSoup(fileContent)

 ## Do some BeautifulSoup magic and preserve unicode, presume result is
 ## saved in 'text' ##

 ## write extracted text to file
 f = open(outFilePath, 'w')
 f.write(text.encode('utf-8'))
 f.close()




or  (Python2/Python3)

>>> import io
>>> with io.open('abc.txt', 'r', encoding='iso-8859-2') as f:
...     r = f.read()
...
>>> repr(r)
u'a\nb\nc\n'
>>> with io.open('def.txt', 'w', encoding='utf-8-sig') as f:
...     t = f.write(r)
...
>>> f.closed
True

jmf



Re: How do I automate the removal of all non-ascii characters from my code?

2011-09-13 Thread jmfauth
On 12 sep, 23:39, Rhodri James rho...@wildebst.demon.co.uk wrote:


 Now read what Steven wrote again.  The issue is that the program contains  
 characters that are syntactically illegal.  The engine can be perfectly  
 correctly translating a character as a smart quote or a non breaking space  
 or an e-umlaut or whatever, but that doesn't make the character legal!


Yes, you are right. I did not understand in that way.

However, a small correction/precision. Illegal characters
do not exist. One can only have an ill-formed encoded code
point, or an illegal encoded code point representing a
character/glyph.

Basically, in the present case, the issue is most probably
a mismatch between the coding directive and the real
coding, with no coding directive == 'ascii'.
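Such a mismatch can be checked programmatically: tokenize.detect_encoding applies the PEP 263 rules (BOM, then the magic comment in the first two lines) to raw source bytes. A short sketch (the sample source string is made up):

```python
import io
import tokenize

src = b"# -*- coding: cp1252 -*-\nprint('hello')\n"

# detect_encoding reads the PEP 263 coding directive; with no
# directive it falls back to the default (utf-8 on Py3 -- the
# Python 2 of this thread assumed ascii instead).
enc, first_lines = tokenize.detect_encoding(io.BytesIO(src).readline)
print(enc)  # cp1252
```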


jmf


Re: How do I automate the removal of all non-ascii characters from my code?

2011-09-13 Thread jmfauth
On 13 sep, 10:15, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:

The intrinsic coding of the characters is one thing;
the usage of a byte stream supposed to represent a text
is another thing.

jmf


  1   2   >