Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-10 Thread Greg Ewing
James Y Knight wrote:
> On Aug 8, 2007, at 3:47 PM, Nick Maclaren wrote:
> > Firstly, things like backreferences are an absolute no-no.  They
> > are not regular, and REs with them in cannot be converted to DFAs.
>
> People keep saying things like this as if GNU grep and tcl's regular  
> expression matchers didn't exist.

But do these work by conversion to a DFA?

--
Greg
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] [Python-3000] Universal newlines support in Python 3.0

2007-08-10 Thread Stephen J. Turnbull
Guido van Rossum writes:

 > However, the old universal newlines feature also set an attribute named
 > 'newlines' on the file object to a tuple of up to three elements
 > giving the actual line endings that were observed on the file so far
 > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
 > implemented. I'm tempted to kill it. Does anyone have a use case for
 > this?

I have run into files that intentionally have more than one newline
convention used (mbox and Babyl mail folders, with messages received
from various platforms).  However, most of the time, multiple newline
conventions are a sign that the file is either corrupt or isn't text.
If so, then saving the file may corrupt it.  The newlines attribute
could be used to check for this condition.
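
A minimal sketch of such a check, assuming a present-day Python 3 in
which text streams opened in universal newlines mode expose a
`newlines` attribute.  The helper name `observed_newlines` is
hypothetical, and an in-memory stream stands in for a real file:

```python
import io

def observed_newlines(raw_bytes, encoding="ascii"):
    """Return the set of line endings seen while decoding raw_bytes."""
    # newline=None selects universal newlines mode, which records endings.
    stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding,
                              newline=None)
    stream.read()  # the attribute only reflects what has been read so far
    seen = stream.newlines
    if seen is None:          # no line ending encountered at all
        return set()
    if isinstance(seen, str): # exactly one convention encountered
        return {seen}
    return set(seen)          # a tuple when several conventions were seen

mixed = observed_newlines(b"one\r\ntwo\nthree\rfour\n")
clean = observed_newlines(b"one\ntwo\n")
# More than one convention may indicate a corrupt or non-text file.
suspicious = len(mixed) > 1
```

Whether a mixed result should block saving is a policy decision; the
sketch only reports what was observed.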

 > Has anyone even ever used this?

Not I.  When I care about such issues I prefer that the codec raise an
exception at the time of detection.



Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-10 Thread Nick Maclaren
James Y Knight wrote:
>
> > Firstly, things like backreferences are an absolute no-no.  They
> > are not regular, and REs with them in cannot be converted to DFAs.
> > That could be 'solved' by a parser that kicked out such constructions,
> > but it would get screams from many users.
> 
> People keep saying things like this as if GNU grep and tcl's regular  
> expression matchers didn't exist.
> See http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm for example.

PCRE also has a breadth-first engine, but it does not convert the
NFA to a DFA (its author is a close colleague of mine).  Those
engines won't do the conversion, either, and I am prepared to bet
that I could produce a pattern that would either run very slowly
or expose the semantic differences in most of them.

I did NOT say that there were no alternative approaches.  What
I said was correct - you cannot convert such extended expressions
to DFAs.  You can convert them to things that are sort of NFA/DFA
hybrids, which might or might not be a good way to proceed.
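
To make the point about backreferences concrete, here is a small
sketch using Python's re module (a backtracking matcher, not a DFA).
The pattern accepts exactly the strings consisting of n a's, a b, and
the same n a's again (n >= 1), a language no finite automaton can
recognise because it would have to count the a's.  The helper name
`matches` is illustrative only:

```python
import re

# (a+) captures a run of a's; \1 demands the identical run again after "b".
pattern = re.compile(r"(a+)b\1")

def matches(text):
    """True if the whole of text has the form a^n b a^n with n >= 1."""
    return pattern.fullmatch(text) is not None

balanced = matches("aaabaaa")    # same number of a's on both sides
unbalanced = matches("aaabaa")   # one a short on the right
```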


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Unicode database

2007-08-10 Thread Nick Maclaren
"Martin v. Löwis" wrote:
>
> Sure. But (again): you don't need to have the mappings at all for
> what you want to achieve. So there is no point in downloading them

Sigh.  No, I don't.  But, if I want to be able to merge anything
back into the main Python source, it is a VERY good idea to use the
existing mechanisms and not invent new ones.

The easiest thing would have been to hack re.py to create a Unicode
table using unicodedata.py directly, and that would indeed be a rather
cleaner solution in the long term.  But it would have meant that there
were now multiple different ways of generating the Unicode data for
_sre.c, and that would have led to inconsistencies.

As I pointed out, there is already a problem where upgrading the data
needs a complete rebuild to get all of the Unicode data back in step;
'make all' in itself does not work.  That is precisely the sort of
problem that is caused by having duplicate update mechanisms.


Now, IF I can work out how the _sre.c engine works enough to put
atomic/possessive quantifiers in, this problem will return.  My
question would be how best to make a suitable proposal that, inter
alia, includes changes that can't be made by the normal building
mechanisms.

And I still don't have a clue about that one.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-10 Thread Nick Maclaren
"Martin v. Löwis" wrote:
>
> Your specification was "For Unicode, whatever people agree!"
>
> I would not call that "Unicode-based".

Can we drop this, please?  I am happy to agree that I was being unclear
(it is a common failing of mine), but I did provide the specification
I coded.  Specifically, and in full, I said:

For Unicode, whatever people agree!  I use the criterion that it
has a defined category that doesn't start with 'C' - which is what
I think that most people will accept.

That is equivalent to the definition you gave.
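
A minimal sketch of that criterion, assuming Python's standard
unicodedata module.  The function name `has_non_c_category` is
hypothetical; what the test is used for is not stated in this message:

```python
import unicodedata

def has_non_c_category(ch):
    """True if ch has a general category that does not start with 'C'."""
    # Unassigned code points report "Cn", so they are excluded as well,
    # along with controls (Cc), format characters (Cf), surrogates (Cs)
    # and private-use characters (Co).
    return not unicodedata.category(ch).startswith("C")

letter_ok = has_non_c_category("A")          # category Lu
control_ok = has_non_c_category("\n")        # category Cc
```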


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


[Python-Dev] Universal newlines support in Python 3.0

2007-08-10 Thread Guido van Rossum
Python 3.0 currently has limited universal newlines support: by
default, \r\n is translated into \n for text files, but this can be
controlled by the newline= keyword parameter. For details on how, see
PEP 3116. The PEP prescribes that a lone \r must also be translated,
though this hasn't been implemented yet (any volunteers?).
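
A short sketch of how the newline= parameter behaves, assuming a
current Python 3 io module in which the lone \r translation described
above is in place.  The constant `RAW` and the helper `read_text` are
illustrative names, and an in-memory byte stream stands in for a file:

```python
import io

RAW = b"one\r\ntwo\rthree\n"

def read_text(newline):
    """Decode RAW as text using the given newline= setting."""
    stream = io.TextIOWrapper(io.BytesIO(RAW), encoding="ascii",
                              newline=newline)
    return stream.read()

translated = read_text(None)  # universal newlines: every ending becomes "\n"
untouched = read_text("")     # endings are recognised but not translated
```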

However, the old universal newlines feature also set an attribute named
'newlines' on the file object to a tuple of up to three elements
giving the actual line endings that were observed on the file so far
(\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
implemented. I'm tempted to kill it. Does anyone have a use case for
this? Has anyone even ever used this?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Unicode database

2007-08-10 Thread Martin v. Löwis
>> Sure. But (again): you don't need to have the mappings at all for
>> what you want to achieve. So there is no point in downloading them
> 
> Sigh.  No, I don't.  But, if I want to be able to merge anything
> back into the main Python source, it is a VERY good idea to use the
> existing mechanisms and not invent new ones.

I think you still don't understand. What I keep calling "mappings"
is *unrelated* to unicodedata. unicodedata is a different database, and
not related at all to the makefile. It never was.

> As I pointed out, there is already a problem where upgrading the data
> needs a complete rebuild to get all of the Unicode data back in step;
> 'make all' in itself does not work.  That is precisely the sort of
> problem that is caused by having duplicate update mechanisms.

Right. Downloading the necessary files is a completely manual process,
not supported at all by "make all", which is designed to do something
entirely different.

> Now, IF I can work out how the _sre.c engine works enough to put
> atomic/possessive quantifiers in, this problem will return.  My
> question would be how best to make a suitable proposal that, inter
> alia, includes changes that can't be made by the normal building
> mechanisms.
> 
> And I still don't have a clue about that one.

You lost me somewhere. What are "changes that can't be made by the
normal building mechanisms", and what is "this problem" that will
return?

Regards,
Martin


Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-10 Thread Greg Ewing
Nick Maclaren wrote:
> You can convert them to things that are sort of NFA/DFA
> hybrids,

If you could express it as an NFA, then you could
(in principle) convert it to a DFA. So whatever it's
using can't be an NFA either.
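
The conversion alluded to here is the classical subset construction.
A minimal sketch, assuming an NFA without epsilon transitions that is
given as a dict from (state, symbol) to a set of states.  All names
below are illustrative, and the example NFA accepts strings over
{a, b} that end in "ab":

```python
from collections import deque

def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset([start])
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            # Union of everything reachable from any member on this symbol.
            target = frozenset(t for s in current
                               for t in nfa.get((s, symbol), ()))
            dfa[current][symbol] = target
            if target not in dfa:
                queue.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in dfa if s & accepting}
    return dfa, start_set, dfa_accepting

def dfa_accepts(dfa, start, accepting, text):
    state = start
    for ch in text:
        state = dfa[state][ch]
    return state in accepting

# NFA: state 0 loops on a and b, guesses the final "ab" via 0 -a-> 1 -b-> 2.
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
DFA, DFA_START, DFA_ACCEPTING = nfa_to_dfa(NFA, 0, {2}, "ab")
```

The number of DFA states can grow exponentially in the worst case,
which is why the conversion is possible "in principle" rather than
always practical.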

--
Greg


[Python-Dev] Weekly Python Patch/Bug Summary

2007-08-10 Thread Kurt B. Kaiser
Patch / Bug Summary
___

Patches :  404 open ( +0) /  3855 closed ( +8) /  4259 total ( +8)
Bugs    : 1065 open ( +6) /  6790 closed ( +6) /  7855 total (+12)
RFE     :  263 open ( +0) /   295 closed ( +0) /   558 total ( +0)

New / Reopened Patches
__

MSVC++8 x86 tkinter build patch for trunk  (2007-08-05)
   http://python.org/sf/1767787  opened by  brotchie

test_asyncore fix  (2007-08-05)
CLOSED http://python.org/sf/1767834  opened by  Hasan Diwan

Fix for failing test_scriptpackages in py3k-struni  (2007-08-07)
CLOSED http://python.org/sf/1768976  opened by  Antti Rasinen

Fix for failing test_plistlib in py3k-struni  (2007-08-07)
CLOSED http://python.org/sf/1769016  opened by  brotchie

struni: test_xml_etree_c  (2007-08-08)
CLOSED http://python.org/sf/1769767  opened by  Joe Gregorio

Remove cStringIO usage  (2007-08-08)
CLOSED http://python.org/sf/1770008  reopened by  tiran

Remove cStringIO usage  (2007-08-08)
CLOSED http://python.org/sf/1770008  opened by  Christian Heimes

ctypes: c_char now uses bytes and not str (unicode)  (2007-08-08)
CLOSED http://python.org/sf/1770355  opened by  STINNER Victor

Misc improvements for the io module  (2007-08-10)
   http://python.org/sf/1771364  opened by  Christian Heimes

Patches Closed
__

test_asyncore fix  (2007-08-05)
   http://python.org/sf/1767834  closed by  gvanrossum

test_csv struni fixes + unicode support in _csv  (2007-08-03)
   http://python.org/sf/1767398  closed by  gvanrossum

urllib2-howto - correction  (2007-08-02)
   http://python.org/sf/1765839  closed by  gbrandl

Fix for failing test_scriptpackages in py3k-struni  (2007-08-06)
   http://python.org/sf/1768976  closed by  nnorwitz

Fix for failing test_plistlib in py3k-struni  (2007-08-07)
   http://python.org/sf/1769016  closed by  gvanrossum

struni: test_xml_etree_c  (2007-08-07)
   http://python.org/sf/1769767  closed by  nnorwitz

Remove cStringIO usage  (2007-08-08)
   http://python.org/sf/1770008  closed by  gvanrossum

Remove cStringIO usage  (2007-08-08)
   http://python.org/sf/1770008  closed by  gvanrossum

ctypes: c_char now uses bytes and not str (unicode)  (2007-08-08)
   http://python.org/sf/1770355  closed by  haypo

New / Reopened Bugs
___

SocketServer.DatagramRequestHandler  (2007-08-04)
   http://python.org/sf/1767511  opened by  Alzheimer

Badly formed XML using etree and utf-16  (2007-08-05)
   http://python.org/sf/1767933  opened by  BugoK

Byte code WITH_CLEANUP missing, MAKE_CLOSURE wrong  (2007-08-05)
   http://python.org/sf/1768121  opened by  L. Peter Deutsch

tutorial  (2007-08-06)
CLOSED http://python.org/sf/1768767  opened by  Michael R Bax

Python - Operation time out problem   (2007-08-06)
   http://python.org/sf/1768858  opened by  MASK

A paragraph about packages should be updated.  (2007-08-07)
CLOSED http://python.org/sf/1769002  opened by  Noam Raphael

decimal.Decimal("trash") produces informationless exception  (2007-08-08)
   http://python.org/sf/1770009  opened by  John Machin

platform.mac_ver() returning incorrect patch version  (2007-08-08)
   http://python.org/sf/1770190  opened by  Gus Tabares

Decimal.__int__ overflows for large values  (2007-08-08)
   http://python.org/sf/1770416  opened by  Jason G

words able to decode but unable to encode in GB18030  (2007-08-09)
   http://python.org/sf/1770551  opened by  Z-flagship

Errors in site.py not reported properly  (2007-08-09)
   http://python.org/sf/1771260  opened by  Adam Olsen

bsddb can't use unicode keys  (2007-08-10)
   http://python.org/sf/1771381  opened by  Erol Aktay

another 'nothing to repeat'  (2007-08-10)
CLOSED http://python.org/sf/1771483  opened by  viciousdog

minor bug in turtle  (2007-08-10)
CLOSED http://python.org/sf/1771558  opened by  Jeremy Sanders

Bugs Closed
___

String.capwords() does not capitalize first word  (2007-08-03)
   http://python.org/sf/1767363  closed by  gbrandl

subprocess.Popen.wait fails sporadically with threads  (2007-07-16)
   http://python.org/sf/1754642  closed by  gbrandl

subprocess raising "No Child Process" OSError  (2007-07-14)
   http://python.org/sf/1753891  closed by  gbrandl

tutorial  (2007-08-06)
   http://python.org/sf/1768767  deleted by  mrbax

A paragraph about packages should be updated.  (2007-08-07)
   http://python.org/sf/1769002  closed by  gbrandl

cStringIO no longer accepts array.array objects  (2007-06-03)
   http://python.org/sf/1730114  closed by  gbrandl

another 'nothing to repeat'  (2007-08-10)
   http://python.org/sf/1771483  deleted by  viciousdog

minor bug in turtle  (2007-08-10)
   http://python.org/sf/1771558  closed by  gbrandl
