Re: [Python-Dev] Regular expressions, Unicode etc.
James Y Knight wrote:
> On Aug 8, 2007, at 3:47 PM, Nick Maclaren wrote:
> > Firstly, things like backreferences are an absolute no-no. They
> > are not regular, and REs with them in cannot be converted to DFAs.
>
> People keep saying things like this as if GNU grep and tcl's regular
> expression matchers didn't exist.

But do these work by conversion to a DFA?

--
Greg
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] [Python-3000] Universal newlines support in Python 3.0
Guido van Rossum writes:
> However, the old universal newlines feature also set an attribute named
> 'newlines' on the file object to a tuple of up to three elements
> giving the actual line endings that were observed on the file so far
> (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
> implemented. I'm tempted to kill it. Does anyone have a use case for
> this?

I have run into files that intentionally use more than one newline convention (mbox and Babyl mail folders, with messages received from various platforms). However, most of the time multiple newline conventions are a sign that the file is either corrupt or isn't text. If so, then saving the file may corrupt it. The newlines attribute could be used to check for this condition.

> Has anyone even ever used this?

Not I. When I care about such issues I prefer that the codec raise an exception at the time of detection.
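For reference, `io.TextIOWrapper` in Python 3 as released does expose a `newlines` attribute (None, one string, or a tuple of strings). A minimal sketch of the check described above, assuming the whole input is available as bytes; `observed_newlines` and `check_single_convention` are hypothetical helper names. Note that this inspects the result after decoding rather than having the codec itself raise at the moment of detection, as preferred above.

```python
import io


def observed_newlines(data: bytes, encoding: str = "utf-8"):
    """Return the line endings seen while decoding data with universal newlines.

    The result is None (no line ending seen), a single string, or a tuple of
    strings when more than one convention occurred.
    """
    reader = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline=None)
    reader.read()  # `newlines` only reflects what has actually been decoded
    return reader.newlines


def check_single_convention(data: bytes, encoding: str = "utf-8") -> None:
    """Hypothetical strict check: raise ValueError if newline conventions are mixed."""
    seen = observed_newlines(data, encoding)
    if isinstance(seen, tuple):
        raise ValueError(f"mixed newline conventions: {seen!r}")


print(repr(observed_newlines(b"a\nb\n")))  # '\n'

try:
    check_single_convention(b"a\r\nb\nc\rd")  # CRLF, LF and CR all present
except ValueError as exc:
    print(exc)
```

For an mbox or Babyl folder that legitimately mixes conventions, the tuple result is expected rather than an error, so the strict check would only suit files meant to follow one convention.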
Re: [Python-Dev] Regular expressions, Unicode etc.
James Y Knight <[EMAIL PROTECTED]> wrote:
>
> > Firstly, things like backreferences are an absolute no-no. They
> > are not regular, and REs with them in cannot be converted to DFAs.
> > That could be 'solved' by a parser that kicked out such constructions,
> > but it would get screams from many users.
>
> People keep saying things like this as if GNU grep and tcl's regular
> expression matchers didn't exist.
> See http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm for example.

PCRE also has a breadth-first engine, but it does not convert the NFA to a DFA (its author is a close colleague of mine). Those engines won't do the conversion, either, and I am prepared to bet that I could produce a pattern that would either run very slowly or expose the semantic differences in most of them.

I did NOT say that there were no alternative approaches. What I said was correct: you cannot convert such extended expressions to DFAs. You can convert them to things that are sort of NFA/DFA hybrids, which might or might not be a good way to proceed.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: [EMAIL PROTECTED]
Tel.: +44 1223 334761    Fax: +44 1223 334679
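A small Python illustration of why backreferences take a pattern outside the regular languages (the names `SAME_RUN` and `is_same_run` are hypothetical). The backreference forces the second run of a's to repeat the first exactly, which a finite automaton cannot enforce.

```python
import re

# (a+) captures a run of a's, and the backreference \1 demands exactly the
# same run again after the "b". The strings matched in full are
# { a^n b a^n : n >= 1 }, a classic non-regular language: no DFA can
# recognise it, because it would need unbounded memory to count the a's.
SAME_RUN = re.compile(r"(a+)b\1")


def is_same_run(s: str) -> bool:
    """True if s is a run of a's, a 'b', then an identical run of a's."""
    return SAME_RUN.fullmatch(s) is not None


print(is_same_run("aabaa"))  # True
print(is_same_run("aaba"))   # False
```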
Re: [Python-Dev] Unicode database
"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>
> Sure. But (again): you don't need to have the mappings at all for
> what you want to achieve. So there is no point in downloading them

Sigh. No, I don't. But, if I want to be able to merge anything back into the main Python source, it is a VERY good idea to use the existing mechanisms and not invent new ones.

The easiest thing would have been to hack re.py to create a Unicode table using unicodedata.py directly, and that would indeed be a rather cleaner solution in the long term. But it would have meant that there were now multiple different ways of generating the Unicode data for _sre.c, and that would have led to inconsistencies.

As I pointed out, there is already a problem where upgrading the data needs a complete rebuild to get all of the Unicode data back in step; 'make all' in itself does not work. That is precisely the sort of problem that is caused by having duplicate update mechanisms.

Now, IF I can work out how the _sre.c engine works enough to put atomic/possessive quantifiers in, this problem will return. My question would be how best to make a suitable proposal that, inter alia, includes changes that can't be made by the normal building mechanisms.

And I still don't have a clue about that one.

Regards,
Nick Maclaren
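To make "create a Unicode table using unicodedata directly" concrete, here is a minimal sketch. It is not how the tables for _sre.c are actually generated; `build_ranges` and `is_letter` are hypothetical names. It collapses the code points that satisfy a predicate into inclusive ranges, which could then be emitted as a compact table. The data reflects whatever Unicode version the running interpreter's `unicodedata` was built with (`unicodedata.unidata_version`), which is exactly the kind of second, independent update path the message warns about.

```python
import sys
import unicodedata


def build_ranges(predicate, max_cp=sys.maxunicode):
    """Collapse the code points 0..max_cp that satisfy predicate into
    inclusive (start, end) ranges."""
    ranges = []
    start = None
    for cp in range(max_cp + 1):
        if predicate(chr(cp)):
            if start is None:
                start = cp  # open a new range
        elif start is not None:
            ranges.append((start, cp - 1))  # close the current range
            start = None
    if start is not None:
        ranges.append((start, max_cp))
    return ranges


def is_letter(ch):
    """Example predicate: Unicode General Category L* (letters)."""
    return unicodedata.category(ch).startswith("L")


print(unicodedata.unidata_version)    # Unicode version the data comes from
print(build_ranges(is_letter, 0x7F))  # [(65, 90), (97, 122)]
```

Scanning the full code space with the default `max_cp` visits over a million code points, so it is noticeably slower than the ASCII-only call shown.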
Re: [Python-Dev] Regular expressions, Unicode etc.
"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>
> Your specification was "For Unicode, whatever people agree!"
>
> I would not call that "Unicode-based".

Can we drop this, please? I am happy to agree that I was being unclear (it is a common failing of mine), but I did provide the specification I coded. Specifically, and in full, I said:

    For Unicode, whatever people agree! I use the criterion that it
    has a defined category that doesn't start with 'C' - which is what
    I think that most people will accept.

That is equivalent to the definition you gave.

Regards,
Nick Maclaren
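The criterion quoted above translates directly into a one-line test on the Unicode General Category. In this sketch, `meets_criterion` is a hypothetical name. The 'C' categories are Cc (control), Cf (format), Cs (surrogate), Co (private use) and Cn (unassigned), so unassigned code points are excluded as well.

```python
import unicodedata


def meets_criterion(ch: str) -> bool:
    """True if the character's General Category does not start with 'C'."""
    return not unicodedata.category(ch).startswith("C")


# Letter, space, NUL, ZERO WIDTH SPACE, first private-use code point
for ch in ["a", " ", "\x00", "\u200b", "\ue000"]:
    print(repr(ch), unicodedata.category(ch), meets_criterion(ch))
```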
[Python-Dev] Universal newlines support in Python 3.0
Python 3.0 currently has limited universal newlines support: by default, \r\n is translated into \n for text files, but this can be controlled by the newline= keyword parameter. For details on how, see PEP 3116. The PEP prescribes that a lone \r must also be translated, though this hasn't been implemented yet (any volunteers?).

However, the old universal newlines feature also set an attribute named 'newlines' on the file object to a tuple of up to three elements giving the actual line endings that were observed on the file so far (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not implemented. I'm tempted to kill it. Does anyone have a use case for this? Has anyone even ever used this?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
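A sketch of how the newline= parameter behaves in the io module as Python 3 eventually shipped it, including the lone-\r translation the PEP prescribes (not yet implemented at the time of this message). `RAW` and `lines_with` are hypothetical names; the input mixes all three conventions.

```python
import io

RAW = b"one\r\ntwo\rthree\n"  # CRLF, lone CR and LF in one stream


def lines_with(newline):
    """Decode RAW with the given newline= setting and return its lines."""
    wrapper = io.TextIOWrapper(io.BytesIO(RAW), encoding="ascii", newline=newline)
    return wrapper.readlines()


# newline=None (the default): universal newlines, every ending becomes "\n"
print(lines_with(None))  # ['one\n', 'two\n', 'three\n']
# newline="": any ending splits lines, but endings are returned untranslated
print(lines_with(""))    # ['one\r\n', 'two\r', 'three\n']
# newline="\n": only "\n" ends a line, and nothing is translated
print(lines_with("\n"))  # ['one\r\n', 'two\rthree\n']
```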
Re: [Python-Dev] Unicode database
>> Sure. But (again): you don't need to have the mappings at all for
>> what you want to achieve. So there is no point in downloading them
>
> Sigh. No, I don't. But, if I want to be able to merge anything
> back into the main Python source, it is a VERY good idea to use the
> existing mechanisms and not invent new ones.

I think you still don't understand. What I keep calling "mappings" is *unrelated* to unicodedata. unicodedata is a different database, and not related at all to the makefile. It never was.

> As I pointed out, there is already a problem where upgrading the data
> needs a complete rebuild to get all of the Unicode data back in step;
> 'make all' in itself does not work. That is precisely the sort of
> problem that is caused by having duplicate update mechanisms.

Right. Downloading the necessary files is a completely manual process, not supported at all by "make all", which is designed to do something entirely different.

> Now, IF I can work out how the _sre.c engine works enough to put
> atomic/possessive quantifiers in, this problem will return. My
> question would be how best to make a suitable proposal that, inter
> alia, includes changes that can't be made by the normal building
> mechanisms.
>
> And I still don't have a clue about that one.

You lost me somewhere. What are "changes that can't be made by the normal building process", and what is "this problem" that will return?

Regards,
Martin
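The atomic/possessive quantifiers mentioned in the quoted text were not part of the re syntax at the time. A commonly used workaround emulates an atomic group (?>X) as (?=(X))\1, relying on the engine not backtracking into a lookahead once it has succeeded. The sketch below is that workaround, not a proposal for _sre.c, and the pattern names are hypothetical.

```python
import re

# A plain greedy a+ may give characters back, so "a+a" can match "aaa".
GREEDY = re.compile(r"a+a")

# Emulated atomic group (?>a+): the lookahead captures the longest run of a's,
# \1 then consumes exactly that run, and because the engine does not backtrack
# into a lookahead that has already succeeded, the run cannot be shortened.
ATOMIC_THEN_A = re.compile(r"(?=(a+))\1a")
ATOMIC_THEN_B = re.compile(r"(?=(a+))\1b")

print(GREEDY.fullmatch("aaa") is not None)          # True
print(ATOMIC_THEN_A.fullmatch("aaa") is not None)   # False
print(ATOMIC_THEN_B.fullmatch("aaab") is not None)  # True
```

Python 3.11 later added native atomic groups and possessive quantifiers to re, so on 3.11+ both `(?>a+)a` and `a++a` likewise fail to match "aaa".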
Re: [Python-Dev] Regular expressions, Unicode etc.
Nick Maclaren wrote:
> You can convert them to things that are sort of NFA/DFA
> hybrids,

If you could express it as an NFA, then you could (in principle) convert it to a DFA. So whatever it's using can't be an NFA either.

--
Greg
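The conversion Greg refers to is the classical subset (powerset) construction: each DFA state is a set of NFA states, so an n-state NFA yields at most 2^n DFA states. A minimal sketch for an NFA without epsilon moves; `nfa_to_dfa`, `dfa_accepts` and the example automaton are hypothetical.

```python
from collections import deque


def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction for an NFA without epsilon moves.

    nfa maps (state, symbol) -> set of successor states. Each DFA state is a
    frozenset of NFA states; the empty frozenset acts as the dead state.
    Returns (transitions, dfa_start, dfa_accepting).
    """
    dfa_start = frozenset([start])
    transitions = {}
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        if current & accepting:  # contains an accepting NFA state
            dfa_accepting.add(current)
        for symbol in alphabet:
            target = frozenset(
                nxt for q in current for nxt in nfa.get((q, symbol), ())
            )
            transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return transitions, dfa_start, dfa_accepting


def dfa_accepts(dfa, text):
    """Run the DFA deterministically: exactly one state at every step."""
    transitions, state, accepting = dfa
    for ch in text:
        state = transitions[(state, ch)]
    return state in accepting


# NFA for (a|b)*ab: state 0 loops on a/b and guesses where "ab" starts.
ENDS_IN_AB = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
DFA = nfa_to_dfa(ENDS_IN_AB, 0, {2}, "ab")

print(dfa_accepts(DFA, "abab"))  # True
print(dfa_accepts(DFA, "aba"))   # False
```

A backreference such as \1 has no counterpart in the (state, symbol) transition map, which is why patterns using it fall outside this construction.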
[Python-Dev] Weekly Python Patch/Bug Summary
Patch / Bug Summary
___
Patches : 404 open ( +0) / 3855 closed ( +8) / 4259 total ( +8)
Bugs: 1065 open ( +6) / 6790 closed ( +6) / 7855 total (+12)
RFE : 263 open ( +0) / 295 closed ( +0) / 558 total ( +0)
New / Reopened Patches
__
MSVC++8 x86 tkinter build patch for trunk (2007-08-05)
http://python.org/sf/1767787 opened by brotchie
test_asyncore fix (2007-08-05)
CLOSED http://python.org/sf/1767834 opened by Hasan Diwan
Fix for failing test_scriptpackages in py3k-struni (2007-08-07)
CLOSED http://python.org/sf/1768976 opened by Antti Rasinen
Fix for failing test_plistlib in py3k-struni (2007-08-07)
CLOSED http://python.org/sf/1769016 opened by brotchie
struni: test_xml_etree_c (2007-08-08)
CLOSED http://python.org/sf/1769767 opened by Joe Gregorio
Remove cStringIO usage (2007-08-08)
CLOSED http://python.org/sf/1770008 reopened by tiran
Remove cStringIO usage (2007-08-08)
CLOSED http://python.org/sf/1770008 opened by Christian Heimes
ctypes: c_char now uses bytes and not str (unicode) (2007-08-08)
CLOSED http://python.org/sf/1770355 opened by STINNER Victor
Misc improvements for the io module (2007-08-10)
http://python.org/sf/1771364 opened by Christian Heimes
Patches Closed
__
test_asyncore fix (2007-08-05)
http://python.org/sf/1767834 closed by gvanrossum
test_csv struni fixes + unicode support in _csv (2007-08-03)
http://python.org/sf/1767398 closed by gvanrossum
urllib2-howto - correction (2007-08-02)
http://python.org/sf/1765839 closed by gbrandl
Fix for failing test_scriptpackages in py3k-struni (2007-08-06)
http://python.org/sf/1768976 closed by nnorwitz
Fix for failing test_plistlib in py3k-struni (2007-08-07)
http://python.org/sf/1769016 closed by gvanrossum
struni: test_xml_etree_c (2007-08-07)
http://python.org/sf/1769767 closed by nnorwitz
Remove cStringIO usage (2007-08-08)
http://python.org/sf/1770008 closed by gvanrossum
Remove cStringIO usage (2007-08-08)
http://python.org/sf/1770008 closed by gvanrossum
ctypes: c_char now uses bytes and not str (unicode) (2007-08-08)
http://python.org/sf/1770355 closed by haypo
New / Reopened Bugs
___
SocketServer.DatagramRequestHandler (2007-08-04)
http://python.org/sf/1767511 opened by Alzheimer
Badly formed XML using etree and utf-16 (2007-08-05)
http://python.org/sf/1767933 opened by BugoK
Byte code WITH_CLEANUP missing, MAKE_CLOSURE wrong (2007-08-05)
http://python.org/sf/1768121 opened by L. Peter Deutsch
tutorial (2007-08-06)
CLOSED http://python.org/sf/1768767 opened by Michael R Bax
Python - Operation time out problem (2007-08-06)
http://python.org/sf/1768858 opened by MASK
A paragraph about packages should be updated. (2007-08-07)
CLOSED http://python.org/sf/1769002 opened by Noam Raphael
decimal.Decimal("trash") produces informationless exception (2007-08-08)
http://python.org/sf/1770009 opened by John Machin
platform.mac_ver() returning incorrect patch version (2007-08-08)
http://python.org/sf/1770190 opened by Gus Tabares
Decimal.__int__ overflows for large values (2007-08-08)
http://python.org/sf/1770416 opened by Jason G
words able to decode but unable to encode in GB18030 (2007-08-09)
http://python.org/sf/1770551 opened by Z-flagship
Errors in site.py not reported properly (2007-08-09)
http://python.org/sf/1771260 opened by Adam Olsen
bsddb can't use unicode keys (2007-08-10)
http://python.org/sf/1771381 opened by Erol Aktay
another 'nothing to repeat' (2007-08-10)
CLOSED http://python.org/sf/1771483 opened by viciousdog
minor bug in turtle (2007-08-10)
CLOSED http://python.org/sf/1771558 opened by Jeremy Sanders
Bugs Closed
___
String.capwords() does not capitalize first word (2007-08-03)
http://python.org/sf/1767363 closed by gbrandl
subprocess.Popen.wait fails sporadically with threads (2007-07-16)
http://python.org/sf/1754642 closed by gbrandl
subprocess raising "No Child Process" OSError (2007-07-14)
http://python.org/sf/1753891 closed by gbrandl
tutorial (2007-08-06)
http://python.org/sf/1768767 deleted by mrbax
A paragraph about packages should be updated. (2007-08-07)
http://python.org/sf/1769002 closed by gbrandl
cStringIO no longer accepts array.array objects (2007-06-03)
http://python.org/sf/1730114 closed by gbrandl
another 'nothing to repeat' (2007-08-10)
http://python.org/sf/1771483 deleted by viciousdog
minor bug in turtle (2007-08-10)
http://python.org/sf/1771558 closed by gbrandl
