Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-07 Thread Antoine Pitrou
M.-A. Lemburg <mal at egenix.com> writes:
 
 IMHO, it would be a lot better to add full Unicode support
 for line breaks to the io layer. Given that the code for the
 complicated handling of the CRLF combination is already there,
 it's not difficult to add support for the remaining line break
 characters.

I'm not against anything in principle here, but I'd just like to point out two
things:

1. Changing line break semantics would break compatibility with the current
behaviour, and it would also diverge from what the `newline` parameter
specifies; this may be annoying if, for example, the TextIOWrapper class is used
to parse some network protocols with a rigorous line ending definition

2. It would be useful to have some input from the original designers of the IO
library (the PEP lists Guido, Daniel Stutzbach and Mike Verdone, but I suppose
other people were involved)
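
To make the first point concrete, here is a minimal sketch, assuming Python 3's
io module, of a protocol-style reader that relies on the current `newline`
semantics. The sample bytes are made up for the illustration.

```python
import io

# Made-up protocol data: lines end in CRLF, and a bare LF is payload.
raw = b"HELO example\r\nbody with\nbare LF\r\n"

# With newline="\r\n", CRLF is the only line terminator on input,
# and line endings are returned untranslated.
reader = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline="\r\n")
lines = reader.readlines()
print(lines)
```

If the io layer began splitting on further characters regardless of `newline`,
a reader like this could see extra line breaks inside payload data.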

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-06 Thread Martin v. Löwis
 This is simply false AFAICS.  There was little participation on this
 particular issue during PEP 374 that I can recall.  Now that it is
 clearly an issue after all, it's still early in the PEP 385 process.
 Martin has already picked up the ball on EOL support, and has carried
 informal design pretty much to the goal line already ... all that's
 left is the detailed design and the implementation, and there are
 several people involved who will help develop the patch, all very
 capable. 

I'm not so optimistic. To me, it looks like either Dirkjan or Mark
will implement a hg hook, or else it won't happen (for me, I certainly
know that I will not write Mercurial hooks anytime soon).

Regards,
Martin


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-06 Thread Stephen J. Turnbull
Martin v. Löwis writes:
   This is simply false AFAICS.  There was little participation on this
   particular issue during PEP 374 that I can recall.  Now that it is
   clearly an issue after all, it's still early in the PEP 385 process.
   Martin has already picked up the ball on EOL support, and has carried
   informal design pretty much to the goal line already ... all that's
   left is the detailed design and the implementation, and there are
   several people involved who will help develop the patch, all very
   capable. 
  
  I'm not so optimistic. To me, it looks like either Dirkjan or Mark
  will implement a hg hook, or else it won't happen (for me, I certainly
  know that I will not write Mercurial hooks anytime soon).

Ouch.  Still, I think the informal discussion so far is pretty close
to a usable solution at that level.


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-06 Thread M.-A. Lemburg
Neil Hodgson wrote:
 Glenn Linderman:
 
 and perhaps other things (and
 are there new Unicode control characters that could be used for line
 endings?),
 
 Unicode includes Line Separator U+2028 and Paragraph Separator
 U+2029 but they are rarely supported and very rarely used. They are a
 pain to work with since they are 3 byte sequences in UTF-8. Visual
 Studio does support them.
 
 Python does not currently treat these as line separators, as in
 this example, which reads only 2 lines rather than 3:
 
 with open("x.txt", "wb") as f:
     f.write("a\nb\u2029c\n".encode('utf-8'))
 with open("x.txt", "r", encoding='utf-8') as f:
     n = 1
     for l in f.readlines():
         print(n, repr(l))
         n += 1

Please file a bug report for this. f.readlines() (or rather
the io layer) should be using Py_UNICODE_ISLINEBREAK(ch)
for detecting line break characters.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 06 2009)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-06 Thread Antoine Pitrou
M.-A. Lemburg <mal at egenix.com> writes:
 
 Please file a bug report for this. f.readlines() (or rather
 the io layer) should be using Py_UNICODE_ISLINEBREAK(ch)
 for detecting line break characters.

Actually, no. It has been designed from the start to only recognize the
standard line break representations found in common formats/protocols (CR, LF
and CR+LF).
People wanting to split on arbitrary unicode line breaks should use
str.splitlines().
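
A minimal sketch of the difference, assuming Python 3 and using an in-memory
stream in place of a real file:

```python
import io

data = "a\nb\u2029c\n"

# The io layer (universal newlines) only treats CR, LF and CR+LF as breaks.
stream = io.TextIOWrapper(io.BytesIO(data.encode("utf-8")), encoding="utf-8")
io_lines = stream.readlines()

# str.splitlines() also splits on U+2029 (and other Unicode line boundaries).
split_lines = data.splitlines(keepends=True)

print(len(io_lines), io_lines)
print(len(split_lines), split_lines)
```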

Regards

Antoine.




Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-06 Thread Nick Coghlan
Antoine Pitrou wrote:
 M.-A. Lemburg <mal at egenix.com> writes:
 Please file a bug report for this. f.readlines() (or rather
 the io layer) should be using Py_UNICODE_ISLINEBREAK(ch)
 for detecting line break characters.
 
 Actually, no. It has been designed from the start to only recognize the
 standard line break representations found in common formats/protocols (CR, 
 LF
 and CR+LF).
 People wanting to split on arbitrary unicode line breaks should use
 str.splitlines().

The fairly long-standing RFE relating to an arbitrarily selectable
newline separator seems relevant here:
http://bugs.python.org/issue1152248

As with the discussion there, the problem with using str.splitlines is
that it prevents pipelining approaches that avoid reading a whole file
into memory.
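
For what it's worth, the two can be combined by hand. The following is only a
sketch, not anything in the io module: the helper name `iter_unicode_lines` and
the chunk size are made up, and the stream is assumed to be opened with
newline="" so that the io layer does not translate line endings first.

```python
import io

def iter_unicode_lines(stream, chunk_size=4096):
    """Yield lines split on every boundary str.splitlines() recognises,
    reading the text stream in chunks instead of all at once."""
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        pieces = pending.splitlines(keepends=True)
        # Hold back the last piece: it may be an incomplete line, or end
        # in a lone "\r" whose "\n" only arrives with the next chunk.
        pending = pieces.pop() if pieces else ""
        yield from pieces
    if pending:
        yield from pending.splitlines(keepends=True)

# Tiny chunks on purpose, so that "\r\n" straddles a chunk boundary.
sample = io.StringIO("a\nb\u2029c\r\nd")
result = list(iter_unicode_lines(sample, chunk_size=2))
print(result)
```

Only one chunk plus the current partial line is held in memory at any time.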

While removing the validity check from readlines() completely is
questionable (the readrecords() approach mentioned in the tracker issue
would still be better there), loosening the validity check to be based
on Py_UNICODE_ISLINEBREAK seems a bit more feasible. (I'd still call it
a feature request rather than a bug, though.)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-06 Thread M.-A. Lemburg
Nick Coghlan wrote:
 Antoine Pitrou wrote:
 M.-A. Lemburg <mal at egenix.com> writes:
 Please file a bug report for this. f.readlines() (or rather
 the io layer) should be using Py_UNICODE_ISLINEBREAK(ch)
 for detecting line break characters.

 Actually, no. It has been designed from the start to only recognize the
 standard line break representations found in common formats/protocols (CR, 
 LF
 and CR+LF).
 People wanting to split on arbitrary unicode line breaks should use
 str.splitlines().
 
 The fairly long-standing RFE relating to an arbitrarily selectable
 newline separator seems relevant here:
 http://bugs.python.org/issue1152248
 
 As with the discussion there, the problem with using str.splitlines is
 that it prevents pipelining approaches that avoid reading a whole file
 into memory.
 
 While removing the validity check from readlines() completely is
 questionable (the readrecords() approach mentioned in the tracker issue
 would still be better there), loosening the validity check to be based
 on Py_UNICODE_ISLINEBREAK seems a bit more feasible. (I'd still call it
 a feature request rather than a bug, though.)

I've had a look at the io implementation: this appears to be
based on the universal newline support idea, which addresses
only a fixed set of newline character combinations and is
not as straightforward to extend to support all Unicode
line break characters as I thought.

What I don't understand is why the io layer tries to reinvent
the wheel here instead of just using the codec's .readline()
method - which *does* use .splitlines() and has full support
for all Unicode line break characters (including the CRLF
combination).

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-06 Thread M.-A. Lemburg
M.-A. Lemburg wrote:
 Nick Coghlan wrote:
 Antoine Pitrou wrote:
 M.-A. Lemburg <mal at egenix.com> writes:
 Please file a bug report for this. f.readlines() (or rather
 the io layer) should be using Py_UNICODE_ISLINEBREAK(ch)
 for detecting line break characters.

 Actually, no. It has been designed from the start to only recognize the
 standard line break representations found in common formats/protocols 
 (CR, LF
 and CR+LF).
 People wanting to split on arbitrary unicode line breaks should use
 str.splitlines().

 The fairly long-standing RFE relating to an arbitrarily selectable
 newline separator seems relevant here:
 http://bugs.python.org/issue1152248

 As with the discussion there, the problem with using str.splitlines is
 that it prevents pipelining approaches that avoid reading a whole file
 into memory.

 While removing the validity check from readlines() completely is
 questionable (the readrecords() approach mentioned in the tracker issue
 would still be better there), loosening the validity check to be based
 on Py_UNICODE_ISLINEBREAK seems a bit more feasible. (I'd still call it
 a feature request rather than a bug, though.)
 
 I've had a look at the io implementation: this appears to be
 based on the universal newline support idea, which addresses
 only a fixed set of newline character combinations and is
 not as straightforward to extend to support all Unicode
 line break characters as I thought.
 
 What I don't understand is why the io layer tries to reinvent
 the wheel here instead of just using the codec's .readline()
 method - which *does* use .splitlines() and has full support
 for all Unicode line break characters (including the CRLF
 combination).

... and because of this, the feature is already available if
you use codecs.open() instead of the built-in open():

import codecs

with codecs.open("x.txt", "w", encoding='utf-8') as f:
    f.write("a\nb\u2029c\n")

with codecs.open("x.txt", "r", encoding='utf-8') as f:
    n = 1
    for l in f.readlines():
        print(n, repr(l))
        n += 1

This prints:

1 'a\n'
2 'b\u2029'
3 'c\n'

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-06 Thread Antoine Pitrou
M.-A. Lemburg <mal at egenix.com> writes:
 
 What I don't understand is why the io layer tries to reinvent
 the wheel here instead of just using the codec's .readline()
 method - which *does* use .splitlines() and has full support
 for all Unicode line break characters (including the CRLF
 combination).

As for the original Python implementation, the goal was probably to start from a
clean sheet. Besides, the new API has seek() and tell() as well. But I'm not
really qualified to say more -- I didn't participate in its design.

As for the C implementation, it had to be written from scratch anyway --
codecs.open() is pure Python and too slow. Deferring to str.splitlines() would
still have been possible but a bit wasteful since in C you can use buffers
directly.

(and, besides, when writing the C implementation we were concerned with exact
compatibility with the Python version -- including line break semantics)

Regards

Antoine.




Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-06 Thread M.-A. Lemburg
Antoine Pitrou wrote:
 M.-A. Lemburg <mal at egenix.com> writes:

 What I don't understand is why the io layer tries to reinvent
 the wheel here instead of just using the codec's .readline()
 method - which *does* use .splitlines() and has full support
 for all Unicode line break characters (including the CRLF
 combination).
 
 As for the original Python implementation, the goal was probably to start 
 from a
 clean sheet. Besides, the new API has seek() and tell() as well. But I'm not
 really qualified to say more -- I didn't participate in its design.
 
 As for the C implementation, it had to be written from scratch anyway --
 codecs.open() is pure Python and too slow. Deferring to str.splitlines() would
 still have been possible but a bit wasteful since in C you can use buffers
 directly.

Sure, but the code for line splitting is not really all that
complicated (see PyUnicode_Splitlines()), so could easily
be adapted to work on buffers directly.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-06 Thread Antoine Pitrou
M.-A. Lemburg <mal at egenix.com> writes:
 
 Sure, but the code for line splitting is not really all that
 complicated (see PyUnicode_Splitlines()), so could easily
 be adapted to work on buffers directly.

Certainly indeed. It all comes down to compatibility with the original
implementation.
(PEP 3116 itself is vague on the subject, but it didn't occur to me to question
the validity of the Python implementation, I admit)

Regards

Antoine.




Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-06 Thread Neil Hodgson
M.-A. Lemburg:

 ... and because of this, the feature is already available if
 you use codecs.open() instead of the built-in open():

   So should I not add an issue for the basic open because codecs.open
should be used for this case?

   Neil


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Dj Gilcrease
On Tue, Aug 4, 2009 at 5:43 PM, Mark Hammond <mhamm...@skippinet.com.au> wrote:
 I'm more than willing to help on this; I haven't resurrected my stale patch
 because I find win32text only 1/2 a solution that doesn't work in practice.
  Therefore that patch is as stale for me as it is anyone. However, if a plan
 is put in place which offers a full solution and the hg developers are
 committed to it, I promise I'll put my hand up to help with implementation
 in a fairly timely manner...


Not sure what your patch was as I cannot find it, but I did up a quick
change to win32text that uses a versioned .win32text file to maintain
encoders, decoders and an ignore list

http://media.digitalxero.net/win32text.py
http://media.digitalxero.net/.win32text

and add to your hgrc file
[hooks]
precommit.eol_encode = python:hgext.win32text.versioned_encode


It needs to be precommit since it needs to run before the changeset
has been created so it can modify the data. Honestly, I think this
solution is kind of a hack; a much better solution would be to modify
the encode/decode hooks to accept a filename so you can at least do
ignore pattern matching, but that still ignores versioned encodes /
decodes
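
As a rough sketch of what a filename-aware encoder could look like: the
function name `encode_for_repo`, the ignore patterns and the binary check below
are all hypothetical, and none of this is Mercurial or win32text API.

```python
import fnmatch

# Hypothetical ignore list; in the idea above it would come from a
# versioned file kept in the repository.
IGNORE_PATTERNS = ["*.png", "*.bat", "PCbuild/*"]

def encode_for_repo(filename, data, ignore=IGNORE_PATTERNS):
    """Return data with CRLF converted to LF, unless the file is ignored."""
    if any(fnmatch.fnmatchcase(filename, pat) for pat in ignore):
        return data
    if b"\0" in data:
        # Crude "looks binary" test, an assumption of this sketch.
        return data
    return data.replace(b"\r\n", b"\n")

print(encode_for_repo("Lib/os.py", b"import sys\r\n"))
print(encode_for_repo("PCbuild/pcbuild.sln", b"Project\r\n"))
```

Passing the filename in is what makes the ignore list possible; an encoder that
only sees the data cannot tell a .bat file from a .py file.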


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Mark Hammond

On 5/08/2009 3:56 PM, Ben Finney wrote:
 Mark Hammond <mhamm...@skippinet.com.au> writes:

  Let's say I make a branch of the hg repo, myself and a few others work
  on it committing as we go, then attempt to merge back upstream. Let's
  say some of the early commits on that clone introduced bad line
  endings. I'm guessing I would be forced to make a number of
  whitespace-only checkins to normalize the line-endings before it could
  merge - and these checkins would then be in the history forever.

 What is wrong with that? I mean, if that is the actual sequence of
 events, why should the history not reflect that?

The problem is the sequence of events happened in the first place.  An
extra burden is placed on the developer that will quickly get tiresome.
I wouldn't personally be happy if that workflow became the norm.

  Either way, the situation doesn't seem good.

 I see this assertion made often, so I'm not saying you are necessarily
 wrong to make it. I just don't see a justification for making it (and,
 without justification, I would say it *is* wrong to make it).

*shrug* - in my opinion, the fact the developer is faced with that
hurdle in their workflow is justification enough to say that developer's
situation doesn't seem good and should have been prevented from
happening by the tool much earlier than proposed.


Mark


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Ben Finney
Mark Hammond <skippy.hamm...@gmail.com> writes:

 On 5/08/2009 3:56 PM, Ben Finney wrote:
  Mark Hammond <mhamm...@skippinet.com.au> writes:
 
  Let's say I make a branch of the hg repo, myself and a few others work
  on it committing as we go, then attempt to merge back upstream. Let's
  say some of the early commits on that clone introduced bad line
  endings.
[…]
 
 The problem is the sequence of events happened in the first place. An
 extra burden is placed on the developer that will quickly get
 tiresome. I wouldn't personally be happy if that workflow became the
 norm.

Ah, okay. In that case, the ultimate “problem” is that OS vendors
entrenched their incompatible line-ending conventions instead of
choosing a single standard. Any line-ending burden borne by developers
is a result of that.

If things were different, they'd be different. However, we live with the
legacy of that stupid set of decisions and have no real option to
resolve it permanently short of deprecating entire vistas of tools (or
even entire operating systems).

 *shrug* - in my opinion, the fact the developer is faced with that
 hurdle in their workflow is justification enough to say that
 developer's situation doesn't seem good and should have been
 prevented from happening by the tool much earlier than proposed.

AIUI, this is a combination of several things:

* different OSen have incompatible, entrenched conventions for
  line endings that are embodied in the default output of their text
  processing tools.

* these differences matter in many concrete ways to the tools that
  process text, so the differences need to be preserved, or explicitly
  transformed.

* distributed VCS has the job of preserving data as present on the
  filesystem, including whatever line-ending convention is present in a
  file.

* distributed VCS has the job of managing data exchange between users,
  presenting differences in a way that allows easy inspection and
  merging.

* humans want to pretend that these incompatibilities don't exist, and
  want “end of line” to be an automatically-handled abstraction.

It's not a simple thing to solve, and many clever people have tried over
the decades. The fact that a centralised VCS can put the problem aside
by requiring an explicit, single decision in the repository, is no help
when addressing the constraints of a distributed VCS.

At some point, the decision about how to handle line endings in
cross-platform data needs to be punted to a human for a
context-sensitive assessment, since (as can be seen) the above list of
requirements is internally inconsistent and can't be relegated to a
one-size-fits-all algorithm.

-- 
 \   “All progress has resulted from people who took unpopular |
  `\  positions.” —Adlai Stevenson |
_o__)  |
Ben Finney



Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Martin v. Löwis
 I haven't commented on this issue before because I can't really be
 helpful. I just don't understand why hg is being considered before
 its Windows support is roughly equivalent to svn and cvs.

Is it really that you don't *understand*? It's fairly easy: there was
a PEP which offered a number of options, and there was BDFL
pronouncement. This (BDFL pronouncement) is how Python has always
worked, and, as a principle, it is a good and useful process.

Now, the specific outcome of the process means that more work needs to
be done. So we have a *second* PEP, and we have a lack of volunteers
to help implement it. The second PEP hasn't been approved yet
(as it isn't complete yet), so migration to hg is stalled.
The primary volunteer (Dirkjan) has indicated that he can't help with
that specific issue, so other volunteers need to step forward, or we
cannot move to hg.

Regards,
Martin


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Mark Hammond

On 5/08/2009 4:50 PM, Ben Finney wrote:
 Mark Hammond <skippy.hamm...@gmail.com> writes:

  On 5/08/2009 3:56 PM, Ben Finney wrote:
   Mark Hammond <mhamm...@skippinet.com.au> writes:

    Let's say I make a branch of the hg repo, myself and a few others work
    on it committing as we go, then attempt to merge back upstream. Let's
    say some of the early commits on that clone introduced bad line
    endings.
 […]

  The problem is the sequence of events happened in the first place. An
  extra burden is placed on the developer that will quickly get
  tiresome. I wouldn't personally be happy if that workflow became the
  norm.

 Ah, okay. In that case, the ultimate “problem” is that OS vendors
 entrenched their incompatible line-ending conventions instead of
 choosing a single standard. Any line-ending burden borne by developers
 is a result of that.

Yeah - this happened around 1964 if wikipedia is any guide.

 If things were different, they'd be different. However, we live with the
 legacy of that stupid set of decisions and have no real option to
 resolve it permanently short of deprecating entire vistas of tools (or
 even entire operating systems).

Agreed - so let's not solve it permanently.

...

 It's not a simple thing to solve, and many clever people have tried over
 the decades.


As already mentioned in this thread, a capability similar to what svn or 
cvs offers would be sufficient.  While a DVCS does offer unique 
challenges, it seems to me that doing something at commit time without 
requiring magic hooks be configured would go a long way to addressing 
the problem.  Magic hooks on the official repo would then be considered 
the final fallback defense, but should rarely be invoked.
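
To sketch what such a fallback check might test for: the function name and the
policy names "LF" and "CRLF" below are assumptions made for the illustration,
and this is not an actual Mercurial hook.

```python
def violates_eol_policy(data, policy):
    """Return True if data contains line endings the policy forbids."""
    if policy == "LF":
        # Any carriage return at all means CRLF or bare CR crept in.
        return b"\r" in data
    if policy == "CRLF":
        # Drop the well-formed pairs; any CR or LF left over is stray.
        rest = data.replace(b"\r\n", b"")
        return b"\r" in rest or b"\n" in rest
    raise ValueError("unknown policy: %r" % (policy,))

print(violates_eol_policy(b"a\nb\n", "LF"))
print(violates_eol_policy(b"a\r\nb\n", "LF"))
```

A server-side hook would run a check like this over each changed file and
refuse the push on the first violation.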



 At some point, the decision about how to handle line endings in
 cross-platform data needs to be punted to a human for a
 context-sensitive assessment, since (as can be seen) the above list of
 requirements is internally inconsistent and can't be relegated to a
 one-size-fits-all algorithm.

I'm not sure what point you are trying to make, but I believe it *is*
possible for a solution to be found here which will keep Windows users
happy.  I'm guessing you haven't had much practical experience with this
problem, so probably don't see this as clearly as Windows users do.


Cheers,

Mark.




Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Martin v. Löwis
 If things were different, they'd be different. However, we live with the
 legacy of that stupid set of decisions and have no real option to
 resolve it permanently short of deprecating entire vistas of tools (or
 even entire operating systems).

I think you missed the solution to the problem that Mark proposed
(IIUC): a local commit to a hg repository should already get the line
endings right, by automatically converting the file-to-be-committed
into the repository line endings. This is what CVS has supported for
more than ten years, and what svn has supported for close to ten years.

 * distributed VCS has the job of preserving data as present on the
   filesystem, including whatever line-ending convention is present in a
   file.

No, that's not true. Distributed VCS has the job of helping the developer.
That may mean preserving the file as-is, or it may mean converting the
file on checkout and checkin. Which of these is needed depends
on the file, of course.

 It's not a simple thing to solve, and many clever people have tried over
 the decades. The fact that a centralised VCS can put the problem aside
 by requiring an explicit, single decision in the repository, is no help
 when addressing the constraints of a distributed VCS.

Why do you say that? It's not true. The approach that has worked for the
central repository can work just as well for a distributed repository.

 At some point, the decision about how to handle line endings in
 cross-platform data needs to be punted to a human for a
 context-sensitive assessment, since (as can be seen) the above list of
 requirements is internally inconsistent and can't be relegated to a
 one-size-fits-all algorithm.

Right - there needs to be a way for the user to specify what line
endings to use. That's why both CVS and subversion have supported such
configuration, on a per file basis, for many years. I can't see why
hg couldn't, in principle, support the same configuration. Being a DVCS,
such configuration would have to be part of the clone, of course, being
versioned, and all that. I think hg is well capable of keeping versioned
configuration information in the clone, as demonstrated by the .hgignore
files.
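
A sketch of how such versioned, per-file configuration could drive the
conversion. The rules format ("pattern = style") and the style names native,
LF, CRLF and BIN are invented for this illustration; nothing here is an
existing hg feature.

```python
import fnmatch
import os

def parse_rules(text):
    """Parse 'pattern = style' lines; blank lines and # comments are skipped."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, _, style = line.partition("=")
        rules.append((pattern.strip(), style.strip()))
    return rules

def style_for(filename, rules):
    """First matching rule wins; unmatched files are left alone (BIN)."""
    for pattern, style in rules:
        if fnmatch.fnmatchcase(filename, pattern):
            return style
    return "BIN"

def to_repository(data, style):
    """Convert working-copy bytes to the form stored in the repository."""
    if style == "BIN":
        return data
    lf = data.replace(b"\r\n", b"\n")
    return lf.replace(b"\n", b"\r\n") if style == "CRLF" else lf

def to_working_copy(data, style, native=os.linesep.encode("ascii")):
    """Convert stored bytes for checkout; only 'native' files change."""
    if style != "native":
        return data
    return data.replace(b"\r\n", b"\n").replace(b"\n", native)

rules = parse_rules("# hypothetical rules file\n*.py = native\n*.sln = CRLF\n")
print(style_for("Lib/os.py", rules), style_for("Doc/logo.png", rules))
```

Because the rules would travel with the clone, every committer on every
platform would apply the same policy at local commit time.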

Regards,
Martin


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Mark Hammond

On 5/08/2009 5:35 PM, Martin v. Löwis wrote:

 Now, the specific outcome of the process means that more work needs to
 be done. So we have a *second* PEP, and we have a lack of volunteers
 to help implement it. The second PEP hasn't been approved yet
 (as it isn't complete yet), so migration to hg is stalled.
 The primary volunteer (Dirkjan) has indicated that he can't help with
 that specific issue, so other volunteers need to step forward, or we
 cannot move to hg.


I don't recall Dirkjan saying he can't help with that issue - was it a 
lack of time, or a lack of understanding the problem/lack of a Windows 
environment?


The problem I see is a lack of agreement about exactly what the solution 
entails.  I believe there is general agreement win32text needs to be 
enhanced to support versioned 'rules'.  But even with that, the only 
option I see is a truly cross-platform extension to implement these 
rules which every Python committer, regardless of operating-system, is 
expected to use - but that doesn't seem the consensus.


As mentioned, I'm willing to lend manpower for this once there is 
agreement on something workable...


Cheers,

Mark


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Ben Finney
Mark Hammond <skippy.hamm...@gmail.com> writes:

 As already mentioned in this thread, a capability similar to what svn
 or cvs offers would be sufficient.

That capability presented by centralised VCSen is entirely dependent on
the fact that they *are* centralised. Using a distributed VCS means the
same capability doesn't apply.

 While a DVCS does offer unique challenges, it seems to me that doing
 something at commit time without requiring magic hooks be configured
 would go a long way to addressing the problem.

The hand-waving “doing something” is exactly what needs to be solved.

 Magic hooks on the official repo would then be considered the final
 fallback defense, but should rarely be invoked.

Right, so that's “capability similar to centralised VCS” out of
consideration; I'm glad we agree in the end.

 I'm not sure what point you are trying to make

That I disagree with your position. You seem to think that the problem
has an obvious solution, which is not true; and that choice of a
distributed VCS should be delayed until the problem is solved, which I
don't agree with.

 but I believe it *is* possible for a solution to be found here which
 will keep Windows users happy. I'm guessing you haven't had much
 practical experience with this problem, so probably don't see this as
 clearly as Windows users do.

Your guess is incorrect; I've been bitten time and again by this problem
in many different contexts, enough to know that it's not obvious what
the “right” solution is.

-- 
 \ “Not to perambulate the corridors in the hours of repose in the |
  `\  boots of ascension.” —ski hotel, Austria |
_o__)  |
Ben Finney



Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Mark Hammond

On 5/08/2009 6:00 PM, Ben Finney wrote:

Mark Hammond skippy.hamm...@gmail.com writes:


As already mentioned in this thread, a capability similar to what svn
or cvs offers would be sufficient.


That capability presented by centralised VCSen is entirely dependent on
the fact that they *are* centralised. Using a distributed VCS means the
same capability doesn't apply.


Why do you say that (without justification, I might add <wink>) about 
this issue?



While a DVCS does offer unique challenges, it seems to me that doing
something at commit time without requiring magic hooks be configured
would go a long way to addressing the problem.


The hand-waving “doing something” is exactly what needs to be solved.


I think you have been mis-reading this thread.  It is quite clear what 
'doing something' means in this context - it means implement the 
human-defined rules for the line-ending policy for the repository.



Magic hooks on the official repo would then be considered the final
fallback defense, but should rarely be invoked.


Right, so that's “capability similar to centralised VCS” out of
consideration; I'm glad we agree in the end.


I'm afraid you have lost me again, as clearly we don't agree on what 
useful things can be done at local commit time.



I'm not sure what point you are trying to make


That I disagree with your position. You seem to think that the problem
has an obvious solution, which is not true; and that choice of a
distributed VCS should be delayed until the problem is solved, which I
don't agree with.


Fair enough - but it seems clear to enough of us that we can make 
progress and meet the requirements of the people actually impacted.





but I believe it *is* possible for a solution to be found here which
will keep Windows users happy. I'm guessing you haven't had much
practical experience with this problem, so probably don't see this as
clearly as Windows users do.


Your guess is incorrect; I've been bitten time and again by this problem
in many different contexts, enough to know that it's not obvious what
the “right” solution is.


Sorry about that - but that was the only way I could explain you not 
seeing how such a solution can work.


Cheers,

Mark


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Martin v. Löwis
 Now, the specific outcome of the process means that more work needs to
 be done. So we have a *second* PEP, and we have a lack of volunteers
 that help implementing it. The second PEP hasn't been approved yet
 (as it isn't complete, yet), so migration to hg is stalled.
 The primary volunteer (Dirkjan) has indicated that he can't help with
 that specific issue, so other volunteers need to step forward, or we
 cannot move to hg.
 
 I don't recall Dirkjan saying he can't help with that issue - was it a
 lack of time, or a lack of understanding the problem/lack of a Windows
 environment?

I think he said (at some point) that he is not a Windows user, and thus
can't really help. Of course, he also indicated that, as a Mercurial
contributor, he is willing to help as much as he can.

 The problem I see is a lack of agreement about exactly what the solution
 entails.  I believe there is general agreement win32text needs to be
 enhanced to support versioned 'rules'.  But even with that, the only
 option I see is a truly cross-platform extension to implement these
 rules which every Python committer, regardless of operating-system, is
 expected to use - but that doesn't seem the consensus.
 
 As mentioned, I'm willing to lend manpower for this once there is
 agreement on something workable...

I think it needs to work the other way 'round. Somebody (perhaps you)
needs to propose a hook and configuration settings, and propose that
this hook is used on every system, and that refusal to use these hooks
could lead to changes not being integratable (is that a word?).

There can't be consensus to use a solution that doesn't exist.

My personal favorite outcome would be this:
- most files have svn's native eol style; they get stored in LF
  in the repository; the hook will convert them on Windows, and check
  on Unix.
- some files have windows eol style; they get stored in CRLF.
  The hook will not convert, but only check.
- not sure whether some files need to be declared as unix eol style.
- some files are binary; they get stored as-is - the hook will
  do nothing.

With such a setup, using the hook would be truly optional on Unix,
as it only ever checks and never converts. So if you manage to mess
up, and don't have the hook installed on Unix, you lose when trying
to push. That will teach you to be more careful in the future, or
to install the hook (which hopefully becomes built into Mercurial at
some point).

Whether it is actually possible to implement all that, I don't know.
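
A minimal sketch of the policy outlined above, in Python. The style names
('native', 'CRLF', 'binary'), the exception class and the function
to_repo_form are hypothetical, chosen only for illustration; this is not an
existing Mercurial hook or API, just one way the "convert on Windows, check
on Unix" rule might look:

```python
import re

# An LF that is not preceded by CR, i.e. a bare Unix line ending.
_BARE_LF = re.compile(rb"(?<!\r)\n")


class EolError(ValueError):
    """Raised when a file violates its declared eol style."""


def to_repo_form(data: bytes, style: str, on_windows: bool) -> bytes:
    """Return the bytes to store in the repository for one file.

    style is a hypothetical per-file setting:
      'native' - stored with LF; converted on Windows, only checked on Unix
      'CRLF'   - stored with CRLF; never converted, only checked
      'binary' - stored as-is; the hook does nothing
    """
    if style == "binary":
        return data
    if style == "native":
        if on_windows:
            # Convert the working-copy CRLF form to the LF repository form.
            return data.replace(b"\r\n", b"\n")
        if b"\r" in data:
            raise EolError("native file contains CR on a non-Windows client")
        return data
    if style == "CRLF":
        if _BARE_LF.search(data):
            raise EolError("CRLF file contains a bare LF")
        return data
    raise ValueError("unknown eol style: %r" % style)
```

Under these assumptions a Unix client never rewrites anything; a failed check
simply stops the commit, which is why the hook stays optional there.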

Regards,
Martin


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Martin v. Löwis
 As already mentioned in this thread, a capability similar to what svn
 or cvs offers would be sufficient.
 
 That capability presented by centralised VCSen is entirely dependent on
 the fact that they *are* centralised. Using a distributed VCS means the
 same capability doesn't apply.

Why do you say that? People have demonstrated the contrary already.

 I'm not sure what point you are trying to make
 
 That I disagree with your position. You seem to think that the problem
 has an obvious solution, which is not true; and that choice of a
 distributed VCS should be delayed until the problem is solved, which I
 don't agree with.

But it *has* an obvious solution. See the implementation from Dj
Gilcrease, or the spec that I just posted.

 Your guess is incorrect; I've been bitten time and again by this problem
 in many different contexts, enough to know that it's not obvious what
 the “right” solution is.

The configuration options of svn have served us well enough.

Regards,
Martin



Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Neil Hodgson
Martin v. Löwis:

 Is it really that you don't *understand*? It's fairly easy: there was
 a PEP ...

   The PEP process is straightforward. However, a PEP may produce an
outcome that proves after more experience to be wrong. ISTM a
prerequisite to choosing a DVCS is that it should support the full
range of development platforms and thus the PEP was accepted
prematurely. At some point the PEP should be reexamined and, if
necessary, rescinded. What I don't understand is why the plan is still
to move to hg despite, after several months, there not being a known
good way to include Windows eol support.

   Neil


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Dirkjan Ochtman
On Wed, Aug 5, 2009 at 01:43, Mark Hammond mhamm...@skippinet.com.au wrote:
 Thanks Nick; I didn't want to be the only one saying that.  There is a fine
 line between asserting reasonable requirements for Windows users and being
 obstructionist and unhelpful, and I'm trying to stay on the former side :)

I'm not trying to be obstructionist and unhelpful (I hope that should
be obvious). On the other hand, I'm working from the point of view of
hg, which has two assumptions:

- we're a distributed system, there's fairly little we can assume about clients
- we exchange checksummed byte streams (even if we have some tools
that assume those streams are code)
- because of the previous point, there's one native (and therefore
better, in a sense) serialization of what you consider structured
data

The first point means, for example, there will always be some clients
who don't have win32text enabled, no matter what, so you can't rely on
it, which is why I want to make the server hooks the primary line of
defense, and view the client-side tools as helper tools (to make it
easy not to trigger the server-side hooks). That doesn't mean I think
Windows users are second-rate, or anything like that!
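
The second assumption can be illustrated with a short Python sketch. SHA-1
from the standard library is used here only as a stand-in for whatever
checksum the store applies, and the sample content is made up; the point is
just that two files a reader would call "the same text" are different byte
streams, so they hash differently:

```python
import hashlib

# The same one-line file, once with LF and once with CRLF line endings.
text_lf = b"print('hello')\n"
text_crlf = text_lf.replace(b"\n", b"\r\n")

digest_lf = hashlib.sha1(text_lf).hexdigest()
digest_crlf = hashlib.sha1(text_crlf).hexdigest()

# Same lines for a reader, different bytes for the checksum.
same_lines = text_lf.splitlines() == text_crlf.splitlines()
same_digest = digest_lf == digest_crlf
```

Here same_lines is true while same_digest is false, which is why any
conversion has to happen before the bytes are committed rather than after.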

 I'm not that happy with the server being the primary line of defense. Let's
 say I make a branch of the hg repo, myself and a few others work on it
 committing as we go, then attempt to merge back upstream.  Let's say some of
 the early commits on that clone introduced bad line endings.  I'm guessing
 I would be forced to make a number of whitespace-only checkins to normalize
 the line-endings before it could merge - and these checkins would then be in
 the history forever.  Or I could attempt to recreate the clone by somehow
 replaying the commits with line endings corrected.  Either way, the
 situation doesn't seem good.

I don't think either is bad. In the first case, you have one or maybe
two extra changesets. As we like to advocate small changesets that fix
one thing, a changeset fixing up whitespace is par for the course. ;)
The other solution would be to employ mq, for example, to fix up the
commits, which mq excels at (although admittedly it has a learning
curve).

 I agree.  It isn't fair to make this windows users problem.  It would be
 like me proposing the repo get imported with \r\n line endings, enforce that
 with server side hooks, and let non-Windows users worry about the
 ramifications of that - somehow I doubt that would fly - so neither should
 it fly for Windows users...

 I'm more than willing to help on this; I haven't resurrected my stale patch
 because I find win32text only 1/2 a solution that doesn't work in practice.
  Therefore that patch is as stale for me as it is anyone. However, if a plan
 is put in place which offers a full solution and the hg developers are
 committed to it, I promise I'll put my hand up to help with implementation
 in a fairly timely manner...

Well, I'd be happy to help convince the hg crew to accept whatever we
come up with, but I'm not sure I'm the best person to come up with it.
It sounds like a versioned .hgeols would help a bunch of issues, but I
have the feeling you know that better than me, so I'm hoping you can
come up with a concrete proposal on what should change in win32text to
fix all the problems you see.

Cheers,

Dirkjan


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Martin v. Löwis
The PEP process is straightforward. However, a PEP may produce an
 outcome that proves after more experience to be wrong. ISTM a
 prerequisite to choosing a DVCS is that it should support the full
 range of development platforms and thus the PEP was accepted
 prematurely.

To be as blunt as possible: the PEP was accepted because Guido
really, Really, REALLY wanted to switch to Mercurial. So you would
have to convince Guido to revert his decision. You may not like
the decision (I did not like using a DVCS in the first place), but
following such decisions has served us well, and will serve us well
this time.

 At some point the PEP should be reexamined and, if
 necessary, rescinded. What I don't understand is why the plan is still
 to move to hg despite, after several months, there not being a known
 good way to include Windows eol support.

You don't understand why it takes many months? That's also easy: because
there is a single volunteer, and because there is a lot of work. I think
it took me a year to migrate to subversion back then, and I wouldn't be
surprised if the Mercurial migration takes even longer.

Or don't you understand why that single unresolved item didn't manage
to revert the decision? Well, there are many unresolved items in
the Mercurial conversion, some much more stressful than the eol issue
(e.g. the branching discussion). None of them is unsolvable (AFAICT);
you can either contribute to the solution, or sit back and wait for
solutions to emerge. Then you can vote on PEP 385 up or down still.

Regards,
Martin


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Martin v. Löwis
 - we're a distributed system, there's fairly little we can assume about 
 clients

Not as Mercurial, no. As Python, we can certainly expect that all of our
contributors have read the developer FAQ, and set up their systems
accordingly. If all else fails, we can revoke commit access (or is
it push access?) if some committer doesn't get the configuration
right. We would, of course, prefer if it was very easy to get the
configuration right, so that problems don't occur in the first place.

 The first point means, for example, there will always be some clients
 who don't have win32text enabled, no matter what, so you can't rely on
 it, which is why I want to make the server hooks the primary line of
 defense

I think it's a terminology issue only: don't say primary, say last.

Can we agree that the last line of defense will be the server hooks,
and the primary line of defense will be the client commits? "Primary"
would mean that this is where most errors are detected and fixed; Mark
would really object to a flow where most errors are detected only
at the server.

 That doesn't mean I think
 Windows users are second-rate, or anything like that!

If the server hooks were the primary line of defense, it would
effectively make Windows users second-rate: they will have to redo all
their changes over-and-over again, whereas the Unix users can push the
changes without any obstacles (just because they are less likely to make
mistakes).

If the client machines were the primary line of defense, Windows users
were treated equally: they would make as few mistakes as Unix users,
because the hooks do what they want correctly.

 I don't think either is bad. In the first case, you have one or maybe
 two extra changesets. As we like to advocate small changesets that fix
 one thing, a changeset fixing up whitespace is par for the course. ;)

Whitespace-only changes hurt the annotate feature, so we dislike them
very much in Python.

 Well, I'd be happy to help convince the hg crew to accept whatever we
 come up with, but I'm not sure I'm the best person to come up with it.

That is all very well. See my other message (asking for volunteers)
as well. If you have more work you would prefer to delegate, please let
us know.

Regards,
Martin


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Mark Hammond

On 5/08/2009 6:25 PM, Dirkjan Ochtman wrote:

On Wed, Aug 5, 2009 at 01:43, Mark Hammond mhamm...@skippinet.com.au wrote:

Thanks Nick; I didn't want to be the only one saying that.  There is a fine
line between asserting reasonable requirements for Windows users and being
obstructionist and unhelpful, and I'm trying to stay on the former side :)


I'm not trying to be obstructionist and unhelpful (I hope that should
be obvious).


It is, and I hope I didn't imply otherwise.


On the other hand, I'm working from the point of view of
hg, which has two assumptions:

- we're a distributed system, there's fairly little we can assume about clients
- we exchange checksummed byte streams (even if we have some tools
that assume those streams are code)
- because of the previous point, there's one native (and therefore
better, in a sense) serialization of what you consider structured
data

The first point means, for example, there will always be some clients
who don't have win32text enabled, no matter what, so you can't rely on
it, which is why I want to make the server hooks the primary line of
defense, and view the client-side tools as helper tools (to make it
easy not to trigger the server-side hooks). That doesn't mean I think
Windows users are second-rate, or anything like that!


In general I agree - although I think we can enforce a social contract 
which puts requirements on people who commit to the Python repository - 
and therefore we can consider the server-side hooks a secondary 
defense.  IOW, the system (including the social aspects of the system) 
is set up such that the server-side hooks are very rarely called upon.



I'm not that happy with the server being the primary line of defense. Let's
say I make a branch of the hg repo, myself and a few others work on it
committing as we go, then attempt to merge back upstream.  Let's say some of
the early commits on that clone introduced bad line endings.  I'm guessing
I would be forced to make a number of whitespace-only checkins to normalize
the line-endings before it could merge - and these checkins would then be in
the history forever.  Or I could attempt to recreate the clone by somehow
replaying the commits with line endings corrected.  Either way, the
situation doesn't seem good.


I don't think either is bad.


With all due respect, I suspect that is because you don't expect to see 
the issue regularly.  This proposal still leaves the problem squarely in 
the lap of Windows users and imposes a burden on them that would 
probably be considered unreasonable if the situation was reversed.


I'm yet to work on a hg repository without mixed line endings.  If I 
understand correctly, every such repository would have involved a 
developer checking in locally, then at some point in the future pushing 
these changes upstream.  I really really don't want hg to tell me at 
this final step that I need to perform whitespace only fixes purely 
because I am running Windows.


I understand we are discussing how win32text can offer that - but I must 
object to your assertion that the situation I described isn't bad when 
you hit it.



Well, I'd be happy to help convince the hg crew to accept whatever we
come up with, but I'm not sure I'm the best person to come up with it.
It sounds like a versioned .hgeols would help a bunch of issues, but I
have the feeling you know that better than me, so I'm hoping you can
come up with a concrete proposal on what should change in win32text to
fix all the problems you see.


Actually, I think it is easy to make this problem much easier to 
understand; mandate every platform should use win32text, then start 
collating the issues people, including yourself, will no doubt face. 
I'm happy to get this ball rolling, but again, don't want this left 
purely in the domain of "it is a Windows problem" - it isn't.


Cheers,

Mark


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Dirkjan Ochtman
On Wed, Aug 5, 2009 at 10:51, Martin v. Löwis mar...@v.loewis.de wrote:
 Not as Mercurial, no. As Python, we can certainly expect that all of our
 contributors have read the developer FAQ, and set up their systems
 accordingly. If all else fails, we can revoke commit access (or is
 it push access?) if some committer doesn't get the configuration
 right. We would, of course, prefer if it was very easy to get the
 configuration right, so that problems don't occur in the first place.

There will also be non-committers who forge changesets that you want
to be able to push directly to the Python repositories.

 If the client machines were the primary line of defense, Windows users
 were treated equally: they would make as few mistakes as Unix users,
 because the hooks do what they want correctly.

Similarly, if Python kept its .py files in \r\n line endings by
default instead of \n endings, Unix-like users would be more prone to
mistakes. So by keeping the .py files in \n format, Python is making
Windows users second-rate. To cope with that, hg needs to do extra
work on the client side.

Cheers,

Dirkjan


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Neil Hodgson
Martin v. Löwis:

 Or don't you understand why that single unresolved item didn't manage
 to revert the decision? Well, there are many unresolved items in
 the Mercurial conversion, some much more stressful than the eol issue
 (e.g. the branching discussion).

   Then these issues should have been included in the initial PEP for
choosing a DVCS since the issues could have driven the choice. PEP 374
implies that win32text effectively solves the Windows eol issue which
no longer appears to be correct.

   Neil


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Dirkjan Ochtman
On Wed, Aug 5, 2009 at 11:02, Mark Hammond skippy.hamm...@gmail.com wrote:
 In general I agree - although I think we can enforce a social contract
 which puts requirements on people who commit to the Python repository - and
 therefore we can consider the server-side hooks a secondary defense.  IOW,
 the system (including the social aspects of the system) is set up such that
 the server-side hooks are very rarely called upon.

Agreed.

 With all due respect, I suspect that is because you don't expect to see the
 issue regularly.

I suspect so, too!

 I'm yet to work on a hg repository without mixed line endings.  If I
 understand correctly, every such repository would have involved a developer
 checking in locally, then at some point in the future pushing these changes
 upstream.  I really really don't want hg to tell me at this final step that
 I need to perform whitespace only fixes purely because I am running Windows.

 I understand we are discussing how win32text can offer that - but I must
 object to your assertion that the situation I described isn't bad when you
 hit it.

I agree it is to be avoided, I'm just saying that I think it will be
exceptional and therefore not a large burden, given other kinds of
defenses we can put in place.

 Actually, I think it is easy to make this problem much easier to understand;
 mandate every platform should use win32text, then start collating the issues
 people, including yourself, will no doubt face. I'm happy to get this ball
 rolling, but again, don't want this left purely in the domain of "it is a
 Windows problem" - it isn't.

I'm not sure how win32text will provide anything other than
performance degradation for non-Windows developers, but if there's
functionality to be had, I'm happy to mandate its use on every
platform.

Cheers,

Dirkjan


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Martin v. Löwis
 Not as Mercurial, no. As Python, we can certainly expect that all of our
 contributors have read the developer FAQ, and set up their systems
 accordingly. If all else fails, we can revoke commit access (or is
 it push access?) if some committer doesn't get the configuration
 right. We would, of course, prefer if it was very easy to get the
 configuration right, so that problems don't occur in the first place.
 
 There will also be non-committers who forge changesets that you want
 to be able to push directly to the Python repositories.

They will also have to follow the policies we set up. If they refuse to
do that, we refuse to accept their changes. It's very simple, and
contributors have learned very quickly what the policies were (after
they were explained to them).

Whether that means that they have to fix their changesets, or that they
have to redo them, practice will show.

 If the client machines were the primary line of defense, Windows users
 were treated equally: they would make as few mistakes as Unix users,
 because the hooks do what they want correctly.
 
 Similarly, if Python kept its .py files in \r\n line endings by
 default instead of \n endings, Unix-like users would be more prone to
 mistakes. So by keeping the .py files in \n format, Python is making
 Windows users second-rate. To cope with that, hg needs to do extra
 work on the client side.

I think you still miss the point. *If* hg does the extra work, *then*
Windows users are *not* second-class citizens anymore. They *only*
consider themselves second-class if they have to do additional *manual*
work (*).

Regards,
Martin

(*) They may also consider themselves second-class if they have to
install additional software, so hopefully, the necessary extra code for
hg will become part of the regular Mercurial distribution at some point.


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Martin v. Löwis
 I'm not sure how win32text will provide anything other than
 performance degradation for non-Windows developers, but if there's
 functionality to be had, I'm happy to mandate its use on every
 platform.

This is all fairly hypothetical - if hg grew a .hgeols file, it would
be good if it supported that cross-platform. It then may make win32text
obsolete (in particular if it provided some useful defaults).

On Unix, the functionality might be as simple as checking conformance
with the eol-style at pre-commit time.
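
As a rough illustration of such a check-only pass, here is a Python sketch.
No format for a .hgeols file has been agreed on; the "pattern = style" syntax,
the style names and every function name below are assumptions made up for
this example, not anything Mercurial provides:

```python
import fnmatch

# Hypothetical rules, in the spirit of a versioned ".hgeols" file.
# The syntax here is invented for illustration only.
RULES_TEXT = """\
# pattern = style
*.py = native
*.bat = CRLF
*.png = binary
"""


def parse_rules(text):
    """Parse 'pattern = style' lines into an ordered list of pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, _, style = line.partition("=")
        rules.append((pattern.strip(), style.strip()))
    return rules


def style_for(path, rules, default="binary"):
    """Return the style of the first pattern matching path."""
    for pattern, style in rules:
        if fnmatch.fnmatchcase(path, pattern):
            return style
    return default


def conforms(data, style):
    """Check only, never convert: what a Unix client might run pre-commit."""
    if style == "native":
        return b"\r" not in data  # repository form is LF only
    if style == "CRLF":
        rest = data.replace(b"\r\n", b"")
        return b"\r" not in rest and b"\n" not in rest
    return True  # binary (or unlisted) files are left alone


def nonconforming(files, rules):
    """Return the sorted paths whose content breaks their declared style."""
    return sorted(
        path for path, data in files.items()
        if not conforms(data, style_for(path, rules))
    )
```

A pre-commit step built this way would refuse the commit whenever
nonconforming() returns a non-empty list, and would otherwise leave the
working copy untouched.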

Regards,
Martin


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Mark Hammond

On 5/08/2009 7:09 PM, Dirkjan Ochtman wrote:

I'm not sure how win32text will provide anything other than
performance degradation for non-Windows developers, but if there's
functionality to be had, I'm happy to mandate its use on every
platform.


I see two practical outcomes of such a mandate:

* line-ending rules are enforced for local checkins, even for linux 
users, even though such 'accidental' inappropriate line-ending checkins 
should be much rarer than for windows.


* practical problems faced by Windows users, including any performance 
considerations, are shared by the community and therefore addressed as a 
community, thereby ensuring all platforms are considered as important as 
any other.


Cheers,

Mark


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Ben Finney
Martin v. Löwis mar...@v.loewis.de writes:

  You seem to think that the problem has an obvious solution, which is
  not true;

 But it *has* an obvious solution. See the implementation from Dj
 Gilcrease, or the spec that I just posted.

Two different solutions are both obvious? There are other solutions
proposed elsewhere too; are they also obvious?

Mark Hammond skippy.hamm...@gmail.com writes:

 I think you have been mis-reading this thread.

Quite possibly; I'm not intending to impose my position on anyone. I'll
go back to lurking on the thread for a while and see if it becomes any
clearer.

-- 
 \   “First things first, but not necessarily in that order.” —The |
  `\  Doctor, _Doctor Who_ |
_o__)  |
Ben Finney



Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Paul Moore
2009/8/5 Martin v. Löwis mar...@v.loewis.de:
 My personal favorite outcome would be this:
 - most files have svn's native eol style; they get stored in LF
  in the repository; the hook will convert them on Windows, and check
  on Unix.
 - some files have windows eol style; they get stored in CRLF.
  The hook will not convert, but only check.
 - not sure whether some files need to be declared as unix eol style.
 - some files are binary; they get stored as-is - the hook will
  do nothing.

 With such a setup, using the hook would be truly optional on Unix,
 as it only ever checks and never converts. So if you manage to mess
 up, and don't have the hook installed on Unix, you lose when trying
 to push. That will teach you to be more careful in the future, or
 to install the hook (which hopefully becomes built into Mercurial at
 some point).

Given that my preference is to use Unix-style EOL for text files on
Windows, as every text editor I use (barring notepad!) understands LF
format, it seems to me that this proposal also means that the hook
would be optional for me. That suits me fine - I'd prefer to avoid
having hooks that are required for Python checkouts, as that means I
have to remember to configure them on each clone (IIUC).

Of course, this implies that your proposal only requires any action by
the user in the case of Windows users whose text editing tools insist
on CRLF format text files (sources, etc). Is that really a large group
of developers? (I honestly don't know).

I suspect that there is something missing from your proposal, as if
this were the case, then the problem appears to be limited to a very
small group of developers. Maybe it's Visual Studio that insists on
CRLF for source files? (I don't know, as I don't use the VS editor).
If that's the case, then maybe a VS hook would be an alternative
approach? (I can't imagine such a hook would be an *easier* approach,
I only mention it because it makes it clearer where the issue lies).

Paul.


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Dirkjan Ochtman
On Wed, Aug 5, 2009 at 12:04, Paul Moore p.f.mo...@gmail.com wrote:
 Given that my preference is to use Unix-style EOL for text files on
 Windows, as every text editor I use (barring notepad!) understands LF
 format, it seems to me that this proposal also means that the hook
 would be optional for me. That suits me fine - I'd prefer to avoid
 having hooks that are required for Python checkouts, as that means I
 have to remember to configure them on each clone (IIUC).

Yeah, this may also be what's making it harder for me to understand
the issues. I am actually a Windows user, although I do most of my
development on Linux servers through PuTTY. I just always make sure I
use editors that respect the file's line endings. For those
things where I've used hg to version code on Windows (for example,
when testing a Firefox extension), and for my colleague who does edit
his code inside Windows, I've just used editors that deal with line
endings. Typically, in my case, that was either Notepad2 (an awesomely
light-weight Notepad replacement) or Komodo (Edit). That solved all of
my issues, so I haven't had a need for win32text so far.

Cheers,

Dirkjan


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Mark Hammond

On 5/08/2009 8:04 PM, Paul Moore wrote:

2009/8/5 Martin v. Löwis mar...@v.loewis.de:



With such a setup, using the hook would be truly optional on Unix,
as it only ever checks and never converts. So if you manage to mess
up, and don't have the hook installed on Unix, you lose when trying
to push. That will teach you to be more careful in the future, or
to install the hook (which hopefully becomes built into Mercurial at
some point).


Given that my preference is to use Unix-style EOL for text files on
Windows, as every text editor I use (barring notepad!) understands LF
format,


Most tools that I use will tend to not mix EOL styles in a single file, 
but will tend to create \r\n line endings for new files I create.  Most 
hg repos I come across don't have mixed line endings within individual 
files, so I can only guess these files were accidentally introduced in 
the same way (and indeed I have personally done this.)  I'm hoping to be 
part of the solution instead of part of the problem :)


 it seems to me that this proposal also means that the hook

would be optional for me.


Technically it would be optional for everyone, of course.  However, the 
solution should be such that everyone, regardless of personal 
preference, is willing to take the hit.


For example, if the repo is converted using \r\n line endings natively, 
then Windows users would need to take no action either, which puts the onus 
back on you (given your stated preferences) to configure the tool 
appropriately.  I assume you would have no objection to that and would 
be happy to make that tool optional for me?


 That suits me fine - I'd prefer to avoid

having hooks that are required for Python checkouts, as that means I
have to remember to configure them on each clone (IIUC).


Configuring on each clone would certainly be sub-optimal, so the 
proposal is this configuration be stored in a versioned file in the repo.



Of course, this implies that your proposal only requires any action by
the user in the case of Windows users whose text editing tools insist
on CRLF format text files (sources, etc). Is that really a large group
of developers? (I honestly don't know).


It applies to all files that aren't native EOL style - there are just 
fewer of them regularly modified than those that are so marked.



I suspect that there is something missing from your proposal, as if
this were the case, then the problem appears to be limited to a very
small group of developers. Maybe it's Visual Studio that insists on
CRLF for source files? (I don't know, as I don't use the VS editor).
If that's the case, then maybe a VS hook would be an alternative
approach? (I can't imagine such a hook would be an *easier* approach,
I only mention it because it makes it clearer where the issue lies).


I must concede that Windows developers are the minority here - but 
assuming we want a level playing field, I don't see how that changes the 
underlying issue...


Cheers,

Mark


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Mark Hammond

On 5/08/2009 8:14 PM, Dirkjan Ochtman wrote:

endings. Typically, in my case, that was either Notepad2 (an awesomely
light-weight Notepad replacement) or Komodo (Edit). That solved all of
my issues, so I haven't had a need for win32text so far.


FWIW, I use komodo and scite as my primary editors, and as mentioned, am 
personally responsible for accidentally checking in \r\n files into what 
should be a \n repo.  I am slowly and painfully learning to be more 
careful - IMO, I shouldn't need to...


Cheers,

Mark


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Dirkjan Ochtman
On Wed, Aug 5, 2009 at 13:19, Mark Hammond mhamm...@skippinet.com.au wrote:
 Configuring on each clone would certainly be sub-optimal, so the proposal is
 this configuration be stored in a versioned file in the repo.

Even if we do that, enabling hg extensions will still need to be done
locally -- although it can be done per-user/box instead of per-clone.

Cheers,

Dirkjan


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Mark Hammond

On 5/08/2009 9:28 PM, Dirkjan Ochtman wrote:

On Wed, Aug 5, 2009 at 13:19, Mark Hammond mhamm...@skippinet.com.au wrote:

Configuring on each clone would certainly be sub-optimal, so the proposal is
this configuration be stored in a versioned file in the repo.


Even if we do that, enabling hg extensions will still need to be done
locally -- although it can be done per-user/box instead of per-clone.


That is completely fine, and not unlike SVN where a per-user/box setting 
generally needs to be set once - but after that everything just works. 
Windows developers don't mind taking a hit once ;)  The dev guide can 
make it clear what the expectations are...


Cheers,

Mark


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Nick Coghlan
Mark Hammond wrote:
 On 5/08/2009 7:09 PM, Dirkjan Ochtman wrote:
 I'm not sure how win32text will provide anything other than
 performance degradation for non-Windows developers, but if there's
 functionality to be had, I'm happy to mandate its use on every
 platform.
 
 I see two practical outcomes of such a mandate:
 
 * line-ending rules are enforced for local checkins, even for linux
 users, even though such 'accidental' inappropriate line-ending checkins
 should be much rarer than for windows.
 
 * practical problems faced by Windows users, including any performance
 considerations, are shared by the community and therefore addressed as a
 community, thereby ensuring all platforms are considered as important as
 any other.

The main error that enabling win32text everywhere can catch is the use
of a *nix client to accidentally corrupt one of the files that is
supposed to have \r\n line endings.

It also simplifies the configuration rules in the Python hg FAQ - we
would be able to just tell all developers wanting to contribute patches
to Python to enable the win32text extension when working with the Python
repositories (or clones thereof) without having to worry about what
platform they were on.

So it seems to me that the main client-side feature we want is a
versioned .hgeols file in the repository that allows files to be
explicitly nominated as one of:
- eol=CRLF (i.e. have \r\n line endings in the repository and should be
left that way on the local disk as well - equivalent to SVN eol-style:CRLF)
- eol=LF (i.e. have \n line endings in the repository and should be left
that way on the local disk as well - equivalent to SVN eol-style:LF)
- eol=CR (i.e. have \r line endings in the repository and should be left
that way on the local disk as well - equivalent to SVN eol-style:CR)
- native text (i.e. always stored in the repository with \n line
endings, but uses native line endings on the local disk - equivalent to
SVN eol-style:native)
- binary (i.e. always reproduced on disk exactly as they are in the
repository - equivalent to SVN files without eol-style set at all)

The .hgeols file should also allow the repository to define which of the
above should be used as the default handling mechanism for text files
that are not named in the file (native text, in the specific case of the
Python repositories).

Files which look like binary files (according to the existing win32text
heuristics) would be left alone regardless of what the default handling
was set to in .hgeols.
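
A minimal sketch of the per-file handling listed above. Everything here is an assumption for illustration: the mode strings, the function names `to_repo` and `to_disk`, and the NUL-byte test standing in for the win32text binary heuristics. None of it is an existing Mercurial or win32text API.

```python
import os

# Hypothetical mode names mirroring the list above; not an existing Mercurial API.
FIXED_EOL = {"CRLF": b"\r\n", "LF": b"\n", "CR": b"\r"}


def looks_binary(data: bytes) -> bool:
    # Assumed stand-in heuristic: treat any NUL byte as a sign of binary content.
    return b"\0" in data


def _split_lines(data: bytes) -> list:
    # Normalise every recognised ending (CRLF, CR, LF) to LF, then split.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n").split(b"\n")


def to_repo(data: bytes, mode: str) -> bytes:
    """Convert working-copy bytes to the form stored in the repository."""
    if mode == "binary" or looks_binary(data):
        return data
    eol = FIXED_EOL.get(mode, b"\n")  # "native" text is stored with LF
    return eol.join(_split_lines(data))


def to_disk(data: bytes, mode: str, native: bytes = os.linesep.encode()) -> bytes:
    """Convert repository bytes to the form written to the local disk."""
    if mode == "binary" or looks_binary(data):
        return data
    eol = FIXED_EOL.get(mode, native)  # "native" text uses the platform EOL
    return eol.join(_split_lines(data))
```

With these definitions only "native" files ever differ between repository and disk; the three fixed modes and binary round-trip unchanged once they are in the right form.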

win32text would then be enhanced to check for a .hgeols file before
falling back to its existing configuration mechanisms.

The above basically provides the SVN eol-style feature in a more
hg-friendly way. Allowing wildcards in the .hgeols files might be nice,
but I don't think it is actually required. We really don't have that
many files that are affected by this problem (it's just the fact that it
is a number greater than zero that is causing the problem).

The server side pre-push hooks for the main Python repositories would be
set to reject change sets which didn't meet the above rules. If a patch
fails those checks, either the committer can fix it themselves and
resubmit, or else send it back to the originator along with a pointer to
the section in the dev FAQ that describes the expected client-side
configuration.
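
The server-side check could be sketched roughly as follows. This deliberately avoids the Mercurial hook API: `files` and `rules` are hypothetical plain dictionaries standing in for whatever a real hook would receive, and the mode names are the same assumed ones as in the list above.

```python
def has_only(data: bytes, eol: bytes) -> bool:
    """Return True if every line ending in data is exactly eol."""
    # Remove the permitted ending; any CR or LF left over is a violation.
    rest = data.replace(eol, b"")
    return b"\r" not in rest and b"\n" not in rest


def rejected_files(files: dict, rules: dict, default: str = "native") -> list:
    """List the file names whose repository content breaks its EOL rule.

    files maps name -> repository bytes; rules maps name -> mode.
    Both are hypothetical stand-ins for what a real hook would receive.
    """
    # "native" text must be stored with LF in the repository.
    expected = {"CRLF": b"\r\n", "LF": b"\n", "CR": b"\r", "native": b"\n"}
    bad = []
    for name, data in sorted(files.items()):
        mode = rules.get(name, default)
        if mode == "binary" or b"\0" in data:
            continue  # binary content is never checked
        if not has_only(data, expected[mode]):
            bad.append(name)
    return bad
```

A push would then be refused whenever `rejected_files` returns a non-empty list, and the list itself tells the committer which files to fix.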

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread MRAB

Nick Coghlan wrote:

Mark Hammond wrote:

On 5/08/2009 7:09 PM, Dirkjan Ochtman wrote:

I'm not sure how win32text will provide anything other than
performance degradation for non-Windows developers, but if there's
functionality to be had, I'm happy to mandate its use on every
platform.

I see two practical outcomes of such a mandate:

* line-ending rules are enforced for local checkins, even for linux
users, even though such 'accidental' inappropriate line-ending checkins
should be much rarer than for windows.

* practical problems faced by Windows users, including any performance
considerations, are shared by the community and therefore addressed as a
community, thereby ensuring all platforms are considered as important as
any other.


The main error that enabling win32text everywhere can catch is the use
of a *nix client to accidentally corrupt one of the files that is
supposed to have \r\n line endings.

It also simplifies the configuration rules in the Python hg FAQ - we
would be able to just tell all developers wanting to contribute patches
to Python to enable the win32text extension when working with the Python
repositories (or clones thereof) without having to worry about what
platform they were on.

So it seems to me that the main client-side feature we want is a
versioned .hgeols file in the repository that allows files to be
explicitly nominated as one of:
- eol=CRLF (i.e. have \r\n line endings in the repository and should be
left that way on the local disk as well - equivalent to SVN eol-style:CRLF)
- eol=LF (i.e. have \n line endings in the repository and should be left
that way on the local disk as well - equivalent to SVN eol-style:LF)
- eol=CR (i.e. have \r line endings in the repository and should be left
that way on the local disk as well - equivalent to SVN eol-style:CR)
- native text (i.e. always stored in the repository with \n line
endings, but uses native line endings on the local disk - equivalent to
SVN eol-style:native)
- binary (i.e. always reproduced on disk exactly as they are in the
repository - equivalent to SVN files without eol-style set at all)

The .hgeols file should also allow the repository to define which of the
above should be used as the default handling mechanism for text files
that are not named in the file (native text, in the specific case of the
Python repositories).

Files which look like binary files (according to the existing win32text
heuristics) would be left alone regardless of what the default handling
was set to in .hgeols.

win32text would then be enhanced to check for a .hgeols file before
falling back to its existing configuration mechanisms.

The above basically provides the SVN eol-style feature in a more
hg-friendly way. Allowing wildcards in the .hgeols files might be nice,
but I don't think it is actually required. We really don't have that
many files that are affected by this problem (it's just the fact that it
is a number greater than zero that is causing the problem).

The server side pre-push hooks for the main Python repositories would be
set to reject change sets which didn't meet the above rules. If a patch
fails those checks, either the committer can fix it themselves and
resubmit, or else send it back to the originator along with a pointer to
the section in the dev FAQ that describes the expected client-side
configuration.


Instead of just talking about line endings, could each file have a
specific 'filetype'? This would define what kind of data it contains,
how it's stored in the repository, and what actions to perform for
fetching and committing, including any checks:

c_header: C header file; LF in repository; native outside

c_source: C source file; LF in repository; native outside

text: plain text; LF in repository; native outside

crlf_text: plain text; CRLF in repository; CRLF outside

cr_text: plain text; CR in repository; CR outside

lf_text: plain text; LF in repository; LF outside

binary: arbitrary binary data; as-is in repository

This could be expanded in the future to include filetypes for JPEG, etc.
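
As a rough sketch, the table above could be held in a small registry like the one below. The field names and the `may_differ_on_disk` helper are assumptions made for illustration; only the seven entries come from the list.

```python
from typing import NamedTuple


class FileType(NamedTuple):
    description: str
    repo_eol: str   # line ending stored in the repository ("as-is" = untouched)
    disk_eol: str   # line ending outside the repository ("native" = platform EOL)


# Transcribed from the list above; the field names are hypothetical.
FILETYPES = {
    "c_header": FileType("C header file", "LF", "native"),
    "c_source": FileType("C source file", "LF", "native"),
    "text": FileType("plain text", "LF", "native"),
    "crlf_text": FileType("plain text", "CRLF", "CRLF"),
    "cr_text": FileType("plain text", "CR", "CR"),
    "lf_text": FileType("plain text", "LF", "LF"),
    "binary": FileType("arbitrary binary data", "as-is", "as-is"),
}


def may_differ_on_disk(filetype: str) -> bool:
    """True when the working copy may differ from the repository copy."""
    ft = FILETYPES[filetype]
    return ft.repo_eol != ft.disk_eol
```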


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Dirkjan Ochtman
On Wed, Aug 5, 2009 at 15:35, MRAB pyt...@mrabarnett.plus.com wrote:
 Instead of just talking about line endings, could each file have a
 specific 'filetype'? This would define what kind of data it contains,
 how it's stored in the repository, and what actions to perform for
 fetching and committing, including any checks:

Sounds like YAGNI to me. The outline Nick provided seems to me to be
quite close to the current win32text settings in syntax and purpose
and staying close to that would help making adoption easier.

Cheers,

Dirkjan


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Oleg Broytmann
On Wed, Aug 05, 2009 at 02:35:02PM +0100, MRAB wrote:
 Instead of just talking about line endings, could each file have a
 specific 'filetype'?

   EOL-conversion, MIME type and encoding (charset) are three different
concepts. Yes, all of them must be supported, but not necessarily in one
configuration mechanism.
   Subversion handles these issues by providing svn:eol-style and
svn:mime-type (handles both MIME type and charset) properties on a
file-by-file basis.

Oleg.
-- 
 Oleg Broytmann    http://phd.pp.ru/    p...@phd.pp.ru
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Nick Coghlan
Dirkjan Ochtman wrote:
 On Wed, Aug 5, 2009 at 15:35, MRAB pyt...@mrabarnett.plus.com wrote:
 Instead of just talking about line endings, could each file have a
 specific 'filetype'? This would define what kind of data it contains,
 how it's stored in the repository, and what actions to perform for
 fetching and committing, including any checks:
 
 Sounds like YAGNI to me.

Yep - while SVN does support full mime_type specification for files, I
don't think we have ever used it. The SVN eol-style property is all
we're trying to replicate, since that has served us well in the few
cases where it has mattered.

 The outline Nick provided seems to me to be
 quite close to the current win32text settings in syntax and purpose
 and staying close to that would help making adoption easier.

Yeah, win32text is already tantalisingly close to what we would like so I
deliberately tried to stay close to its existing approach. We're just
being a bit fussier than most about the repository being able to tell
the clients which files should be given special treatment. That way
individual users can just set it up once on their development machine
and then no longer have to worry about it (if more files that need
special treatment are added to the repository, then the same checkin
that adds them should also update .hgeols).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Stephen J. Turnbull
Mark Hammond writes:

  I'm not sure what point you are trying to make, but I believe it *is* 
  possible for a solution to be found here which will keep Windows users 
  happy.  I'm guessing you haven't had much practical experience with this 
  problem, so probably don't see this as clearly as Windows users do.

Mercurial is not only open source, it's written in Python.  The
problem is known to be hard in a practical sense, the existing
solutions (written by non-Windows developers, of course) are judged to
be insufficient by Windows users, and the non-Windows developers
probably don't see this as clearly as Windows users do.

I think the implication is obvious.  There will be no good solution
until Windows users develop it.  I don't see a good reason to wait for
that.  I do see good reason for non-Windows users to put up with some
inconvenience during the beta phase of implementing that solution;
it's important enough to be fast-tracked, and doesn't need to be
perfect for everybody to be tried (though it should not be allowed to
endanger repo content, which seems unlikely but needs care since it's
a potential disaster).


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread John Arbash Meinel
Mark Hammond wrote:
 On 5/08/2009 8:14 PM, Dirkjan Ochtman wrote:
 endings. Typically, in my case, that was either Notepad2 (an awesomely
 light-weight Notepad replacement) or Komodo (Edit). That solved all of
 my issues, so I haven't had a need for win32text so far.
 
 FWIW, I use komodo and scite as my primary editors, and as mentioned, am
 personally responsible for accidentally checking in \r\n files into what
 should be a \n repo.  I am slowly and painfully learning to be more
 careful - IMO, I shouldn't need to...
 
 Cheers,
 
 Mark

IIRC one of the main problems is Copy & Paste. I believe both Scite and
Visual Studio have had issues where they preserve the line endings of
files, but if you paste from another source, they will also
preserve the line endings of the pasted content.

That said, you also have the "create a new file defaults to CRLF"
behaviour, which has similar problems.

John
=:-



Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Paul Moore
2009/8/5 Mark Hammond mhamm...@skippinet.com.au:
 Most tools that I use will tend to not mix EOL styles in a single file, but
 will tend to create \r\n line endings for new files I create.  Most hg repos
 I come across don't have mixed line endings within individual files, so I
 can only guess these files were accidentally introduced in the same way (and
 indeed I have personally done this.)  I'm hoping to be part of the solution
 instead of part of the problem :)

Interesting. I don't recall *ever* having generated CRLF line endings
in a LF-delimited file (I use Vim) although I may have created CRLF in
new files (and then not noticed, as Vim handles it transparently
enough that I missed it).

There are no significant projects where I'm a committer, though, so I
interact via patches, which means I don't get the opportunity to break
the repository :-)

 Technically it would be optional for everyone, of course.  However, the
 solution should be such that everyone, regardless of personal preference, is
 willing to take the hit.

 For example, if the repo is converted using \r\n line endings natively, then
 Windows users would need to take no action either, which puts the onus back on
 you (given your stated preferences) to configure the tool appropriately.  I
 assume you would have no objection to that and would be happy to make that
 tool optional for me?

Absolutely. My issue is with 2 points:

1) I'm an infrequent contributor, so I don't keep a checkout around. I
make a new clone on demand, so I would be likely to forget to enable
the hook on at least a proportion of my clones. The versioned .hgeols
proposal seems to cover this.

2) This behaviour is something needed for Python only. I've no issue
with enabling win32text globally, but I'd want to be clear that it is
a no-op unless specifically requested (ie, something like
**=cleverencode is *not* used in the absence of an explicit set of
rules). That may well be the case, but I had the impression that
win32text tried to be automatic, so I'd like to verify it.

 I must concede that Windows developers are the minority here - but assuming
 we want a level playing field, I don't see how that changes the underlying
 issue...

Again, agreed entirely.

As a Windows developer who doesn't (knowingly) encounter the issue,
I'm not in a good position to help, but I'm happy to contribute
comments and test things. I'll be offline for a couple of weeks,
though, so you may well have solved it before I can do anything :-)

Paul


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Glenn Linderman
On approximately 8/5/2009 4:28 AM, came the following characters from 
the keyboard of Dirkjan Ochtman:

On Wed, Aug 5, 2009 at 13:19, Mark Hammond mhamm...@skippinet.com.au wrote:

Configuring on each clone would certainly be sub-optimal, so the proposal is
this configuration be stored in a versioned file in the repo.


Even if we do that, enabling hg extensions will still need to be done
locally -- although it can be done per-user/box instead of per-clone.


On approximately 8/5/2009 9:24 AM, came the following characters from 
the keyboard of Paul Moore:

 2) This behaviour is something needed for Python only. I've no issue
 with enabling win32text globally, but I'd want to be clear that it is
 a no-op unless specifically requested (ie, something like
 **=cleverencode is *not* used in the absence of an explicit set of
 rules). That may well be the case, but I had the impression that
 win32text tried to be automatic, so I'd like to verify it.


Depending on [Windows] users to configure their installation of 
Mercurial to work with the Python repository is lame; it will lead to 
new Windows contributors getting beat-up at check-in time, and make them 
less likely to want to contribute even the work they have already done 
(with wrong EOL), and much less to want to start future contributions, 
because some Unix Python hacker will be nasty about "Didn't you RTFM?" 
(Maybe not at first, but eventually).


If the configuration settings have to be different per project for 
Windows developers using Mercurial for multiple projects, then that is 
also lame... Windows developers would have to keep changing their 
configurations, or (implied in above discussion) remember to recreate 
settings for each new clone or branch or whatever of the Python project. 
This is also error-prone, and leads to the above problem a different way.


I have read this whole discussion, but want to step back and look at it 
from a theoretical viewpoint.  A good solution would have the following 
characteristics:


INSTALLATION) The developer should install the [D]VCS (for this 
discussion, Mercurial, present or future version), and attempt to access 
a repository (for this discussion, the Python repository, converted and 
configured for the chosen [D]VCS).  The resultant environment should 
automatically be configured to work properly. If any [D]VCS extensions 
are required for the project, they should be automatically installed and 
configured, or the user given explicit instructions on how to do so, as 
a one-time installation step, that adversely affects no other projects 
for which the [D]VCS is used by that or other users of the present 
installation.  See below for what "properly" means.


EOL CONFIGURATION) Each file, when added to the repository, should have 
a repository setting that indicates what the appropriate EOL type is for 
that file.  The values I have heard are  \n only, \r\n, platform-native, 
and binary.  I haven't heard \r only in this discussion, but have heard 
it in other similar discussions, and it may be a useful setting for 
Mercurial to have, if the feature must be newly implemented there.  I 
believe there are also systems that use RS to separate lines, and 
perhaps other things (and are there new Unicode control characters that 
could be used for line endings?), so it might be good to leave a few 
unassigned values in such a setting.  I don't think any setting should 
be created to allow mixed line ending usage within a file, except 
binary.  Per repository default for this setting should be available to 
avoid burdening the user when creating the typical type of file.


ENCODING CONFIGURATION) Each file, when created, should have a 
repository setting that declares its character repertoire and encoding, 
and if it is a Unicode UTF encoding, whether or not it should have a 
leading BOM.  In my opinion, all source code files should use a Unicode 
encoding, the exception being for test files that help test encoding 
support in internationalized environments.  But the feature supports 
other people's opinions too.  Per repository default for this setting 
should be available to avoid burdening the user when creating the 
typical type of file.


CHECKOUT) Check-outs should be sensitive to the user's local environment 
(platform and locale settings), and non-binary files should be converted 
from the repository format to the local encoding and platform-specific 
line endings.  Settings to override the line endings should be 
optionally available for users whose tools understand other line 
endings, and prefer them over the native line endings.  If the 
characters used within a file cannot be converted losslessly to the 
encoding specified by the locale settings, then it should not be able to 
be checked out.  A special override might be useful for using a lossy 
transformation for a read-only view of the file, at user request.
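
The lossless-conversion rule in the CHECKOUT item can be sketched as below. The function name `check_out` and its parameters are hypothetical; the sketch only relies on Python's standard strict and "replace" codec error handlers.

```python
def check_out(repo_bytes: bytes, repo_encoding: str, local_encoding: str,
              lossy: bool = False) -> bytes:
    """Re-encode repository content for the local environment.

    Hypothetical helper: raises UnicodeEncodeError when the content cannot
    be represented losslessly, unless a lossy read-only view was requested.
    """
    text = repo_bytes.decode(repo_encoding)
    # "strict" refuses characters the local encoding cannot represent;
    # "replace" substitutes "?" for them, which loses information.
    errors = "replace" if lossy else "strict"
    return text.encode(local_encoding, errors=errors)
```

The default refuses the checkout outright, and the `lossy` flag plays the role of the special override for a read-only view.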


CHECKIN) Check-ins, even local check-ins to local clones or branches, 
should 

Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Martin v. Löwis
 Given that my preference is to use Unix-style EOL for text files on
 Windows, as every text editor I use (barring notepad!) understands LF
 format, it seems to me that this proposal also means that the hook
 would be optional for me. That suits me fine - I'd prefer to avoid
 having hooks that are required for Python checkouts, as that means I
 have to remember to configure them on each clone (IIUC).
 
 Yeah, this may also be what's making it harder for me to understand
 the issues.

Please trust that there are plenty of editors that get the line ending
implementation wrong. I'm fairly certain that some Visual Studio
versions are among them. They will recognize LF as a line ending, but
add CRLF line breaks when the user presses enter.
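
The resulting mixed files are easy to detect mechanically. A minimal sketch, with hypothetical helper names `eol_counts` and `is_mixed`:

```python
import re
from collections import Counter

# CRLF is listed first so that it is counted as one ending, not as CR plus LF.
_EOL = re.compile(rb"\r\n|\r|\n")


def eol_counts(data: bytes) -> Counter:
    """Count how often each line-ending style occurs in data."""
    return Counter(m.group().decode("ascii") for m in _EOL.finditer(data))


def is_mixed(data: bytes) -> bool:
    """True if data uses more than one line-ending style."""
    return len(eol_counts(data)) > 1
```

A file that started out LF-only and had a few lines added in such an editor would report both "\n" and "\r\n" in its counts.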

In addition, some editors (in particular notepad) choke when confronted
with LF-only files. It is very annoying if you have to look at source
code on somebody else's machine which doesn't have any programmer's
editor installed (except for Visual Studio).

Regards,
Martin


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Georg Brandl
Neil Hodgson schrieb:
 Martin v. Löwis:
 
 Or don't you understand why that single unresolved item didn't manage
 to revert the decision? Well, there are many unresolved items in
 the Mercurial conversion, some much more stressful than the eol issue
 (e.g. the branching discussion).
 
Then these issues should have been included in the initial PEP for
 choosing a DVCS since the issues could have driven the choice. PEP 374
 implies that win32text effectively solves the Windows eol issue which
 no longer appears to be correct.

Apparently, it was the author's understanding at that time that win32text
would be sufficient.  Also, PEP 374 has not been written in isolation; at
any time during the process people could have notified Dirkjan that this
is not the case.

The branching issue *has* been included in PEP 374; it is not a blocker
for migration, but rather a decision has to be made between two similar,
but in other ways quite different styles for converting SVN branches.

I'm not aware of any other unresolved items; they may exist, but the fact
that they're not discussed on this list in detail means that they are
largely unimportant.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Martin v. Löwis
 I'm not aware of any other unresolved items; they may exist, but the fact
 that they're not discussed on this list in detail means that they are
 largely unimportant.

There is a long list of things that still need to be done; each one
potentially creating new problems. In particular:
- the .hgeols plugin needs to be written
- the hooks need to be written, or at least deployed, for code
  style checks, for email notification, and for buildbot triggering
- the build identification patch needs to be written (I do expect
  many problems out of that one, some possibly small - I'm not a
  Mercurial user, so I can't estimate how difficult that will be)
- buildbot configuration needs to be adjusted
- the roundup regex needs to be configured to refer to hgweb links
- access control needs to be set up
- stackless needs to be converted
- a decision on the location of the PEPs must be made and implemented
- developer documentation needs to be written
- a decision must be made what to do with the migrated parts of
  subversion, in the subversion repository

I may have missed some things. I would like to see a test period (say,
two weeks) where we can find further issues.

Regards,
Martin


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Neil Hodgson
Glenn Linderman:

 and perhaps other things (and
 are there new Unicode control characters that could be used for line
 endings?),

   Unicode includes Line Separator U+2028 and Paragraph Separator
U+2029 but they are rarely supported and very rarely used. They are a
pain to work with since they are 3 byte sequences in UTF-8. Visual
Studio does support them.

   Python does not currently treat these characters as line separators,
as in this example, which reads only 2 lines rather than 3:

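    with open("x.txt", "wb") as f:
        f.write("a\nb\u2029c\n".encode('utf-8'))
    # Decode explicitly so the result does not depend on the locale encoding.
    with open("x.txt", "r", encoding="utf-8") as f:
        n = 1
        for l in f.readlines():
            print(n, repr(l))
            n += 1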
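
The same effect can be seen without touching the filesystem. This is only a minimal sketch, assuming Python 3; the variable names are made up for illustration. It contrasts reading lines from an io.StringIO (which, like a text-mode file, leaves U+2029 alone) with str.splitlines(), which does split on it, and checks the UTF-8 size of the two separators:

```python
import io

text = "a\nb\u2029c\n"

# io.StringIO does not treat U+2029 as a line ending,
# so the paragraph separator stays inside the second line.
io_lines = io.StringIO(text).readlines()

# str.splitlines() uses the wider Unicode set of line boundaries.
str_lines = text.splitlines()

# U+2028 and U+2029 each encode to three bytes in UTF-8.
sep_sizes = [len(ch.encode("utf-8")) for ch in "\u2028\u2029"]

print(io_lines)
print(str_lines)
print(sep_sizes)
```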

   Neil


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Mark Hammond

On 6/08/2009 12:28 AM, Stephen J. Turnbull wrote:

Mark Hammond writes:

I'm not sure what point you are trying to make, but I believe it *is*
possible for a solution to be found here which will keep Windows users
happy.  I'm guessing you haven't had much practical experience with this
problem, so probably don't see this as clearly as Windows users do.

Mercurial is not only open source, it's written in Python.  The
problem is known to be hard in a practical sense, the existing
solutions (written by non-Windows developers, of course) are judged to
be insufficient by Windows users, and the non-Windows developers
probably don't see this as clearly as Windows users do.

I think the implication is obvious.  There will be no good solution
until Windows users develop it.  I don't see a good reason to wait for
that.


My conclusion is different.  I'm not sure of the history of win32text, 
but it most certainly is now squarely in the hands of Windows users. 
Patches to win32text, or even general discussion, are usually met with
silence, and when prodded, the response is "sorry - we don't use that -
it is a Windows problem".


As a result, we end up in the position we are in now - win32text is 
great in theory but doesn't work in practice, attempts to make it work 
are met with indifference, and the problem stays squarely with Windows 
users.  Non-Windows users remain oblivious to the pain, Windows users
stop bothering with the extension, and the repository post-commit hooks 
then cause different pain.


Hence my conclusion that the answer is for any such support to be 
developed in conjunction with Windows users, but also in such a way that 
the solution works, almost identically, for non Windows users.  By 
insisting all platforms eat the same dog-food, there is much more chance 
the glaringly obvious (to Windows users) issues are addressed.


I do see good reason for non-Windows users to put up with some
inconvenience during the beta phase of implementing that solution;
it's important enough to be fast-tracked, and doesn't need to be
perfect for everybody to be tried (though it should not be allowed to
endanger repo content, which seems unlikely but needs care since it's
a potential disaster).


And on the flip-side, I accept we may migrate without the agreed 
solution fully implemented - I'm happy to accept commitments about what 
*will* be done even if it isn't a reality for a short while...


Cheers,

Mark


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Stephen J. Turnbull
Mark Hammond writes:
  On 6/08/2009 12:28 AM, Stephen J. Turnbull wrote:

   I think the implication is obvious.  There will be no good solution
   until Windows users develop it.  I don't see a good reason to wait for
   that.

  My conclusion is different.  I'm not sure of the history of win32text, 
  but it most certainly is now squarely in the hands of Windows users. 
  Patches to win32text, or even general discussion, are usually met with
  silence, and when prodded, the response is "sorry - we don't use that -
  it is a Windows problem".

Well, yes, it is a Windows problem.  And it will probably always be
that way, because for practical purposes, Windows users cannot
advocate their platform's infrastructure solutions for open source
projects: those solutions are proprietary.  On the flip side, in my
experience at least Windows users do not contribute much to this kind
of infrastructure initiative, undoubtedly due to the high cost of
acquiring familiarity with the usable options[1], and so have less
input into the process.

But that's a matter of certain costs that are built in to the nature
of a proprietary platform.  Somebody has to pay them, and I think it
should be the users of that platform.  Why should the rest of the
community subsidize that platform?

  As a result, we end up in the position we are in now - win32text is 
  great in theory but doesn't work in practice, attempts to make it work 
  are met with indifference, and the problem stays squarely with Windows 
  users.

This is simply false AFAICS.  There was little participation on this
particular issue during PEP 374 that I can recall.  Now that it is
clearly an issue after all, it's still early in the PEP 385 process.
Martin has already picked up the ball on EOL support, and has carried
informal design pretty much to the goal line already ... all that's
left is the detailed design and the implementation, and there are
several people involved who will help develop the patch, all very
capable.  (Of course it's going to be easier said than done and there
are probably bumps in the road to a smooth workflow, but I do claim
that the process is working as well as you could expect.)

  Hence my conclusion that the answer is for any such support to be 
  developed in conjunction with Windows users, [...]

Ahem.  Why not (primarily) by Windows users?

  And on the flip-side, I accept we may migrate without the agreed
  solution fully implemented - I'm happy to accept commitments about
  what *will* be done even if it isn't a reality for a short while...

Make no mistake about it, EOL support is a tempest in a teapot
compared to the benefits to a large number of core developers in their
*personal* workspaces -- even if the project workflow doesn't change
at all.  That's what is driving this change.

Unless Windows users do it themselves, they are dependent on the good
will of the PEP 385 proponent and other volunteer contributors.  I
don't think accepting commitments is part of the game plan.

Footnotes: 
[1]  Eg, I was willing to participate in PEP 374 because I already
have a great interest in version control and use git daily.  Lots of
Unix users don't, and they didn't participate any more than most
Windows users did.



Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-04 Thread Nick Coghlan
Dirkjan Ochtman wrote:
 * commit hooks be implemented to enforce this - but this should not be
 necessary if the above was implemented and socially enforced.
 
 You seem to advocate a two-step approach: enforce line endings through
 win32text, catch any errors that slipped through in a hook (commit
 hook is an optional first line of defense, changegroup hooks on the
 server to protect the rest of the world).
 
 I think inverting that approach would be better: have strict hooks on
 the server to prevent people from pushing inappropriate EOLs, and
 provide help on configuring win32text as an extra help for developers
 on Windows who use editors that work better with \r\n. That leaves
 people to pick their own weapon of choice against propagation of \r\n
 (e.g. better editor, commit hooks, whatever) while still making sure
 no inappropriate line endings land in the python.org repositories. It
 also seems to fit well with the whole "consenting adults" thing (but
 that might just be me).

It's about not treating Windows developers as second class citizens.
Their platform uses \r\n as its native line ending format, so they
should be able to work in that format without any hassles by following
some simple instructions (such as "ensure you have version X of the
Windows hg client, enable the win32text extension and configure it in
such-and-such a way"). Not "oh, yeah, that's an issue but if you search
the Intarwebs there are a few different things you can do that kinda
sorta work but are a bit fragile and klunky".

The precise order the two issues (server side enforcement and client
side assistance) are dealt with doesn't really matter because *both*
issues need to be addressed before we migrate.
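
The core of the server-side check could be quite small. What follows is only a sketch, assuming the hook is handed each file's contents as bytes; find_crlf_lines is a hypothetical helper, not part of Mercurial or win32text, and the wiring into an actual changegroup hook is left out:

```python
def find_crlf_lines(data):
    """Return 1-based numbers of the lines in `data` (bytes) ending in CRLF.

    Lines are counted the way bytes.splitlines() counts them.
    """
    return [
        number
        for number, line in enumerate(data.splitlines(keepends=True), start=1)
        if line.endswith(b"\r\n")
    ]


# A file with CRLF on its first and third lines.
print(find_crlf_lines(b"a\r\nb\nc\r\n"))
```

A hook built on this would presumably refuse the push whenever the list is non-empty for a file that is supposed to be stored with \n.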

win32text needs to be usable on non-Windows clients so that tarballs
generated on a *nix machine get the line endings right in the
Windows-only files.
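
The conversion itself is platform independent, which is why it ought to work on *nix too. A minimal sketch, assuming file contents are handled as bytes; to_crlf and to_lf are hypothetical names and this is not how win32text is actually implemented:

```python
def to_lf(data):
    """Convert CRLF pairs in `data` (bytes) to bare LF."""
    return data.replace(b"\r\n", b"\n")


def to_crlf(data):
    """Convert line endings in `data` (bytes) to CRLF.

    Normalising to LF first keeps existing CRLF pairs from being doubled.
    """
    return to_lf(data).replace(b"\n", b"\r\n")


# Mixed input comes out uniformly CRLF, and the round trip is lossless.
print(to_crlf(b"a\nb\r\n"))
print(to_lf(to_crlf(b"a\nb\n")))
```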

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-04 Thread Mark Hammond

On 4/08/2009 7:20 PM, Nick Coghlan wrote:

Dirkjan Ochtman wrote:

* commit hooks be implemented to enforce this - but this should not be
necessary if the above was implemented and socially enforced.


You seem to advocate a two-step approach: enforce line endings through
win32text, catch any errors that slipped through in a hook (commit
hook is an optional first line of defense, changegroup hooks on the
server to protect the rest of the world).

I think inverting that approach would be better: have strict hooks on
the server to prevent people from pushing inappropriate EOLs, and
provide help on configuring win32text as an extra help for developers
on Windows who use editors that work better with \r\n. That leaves
people to pick their own weapon of choice against propagation of \r\n
(e.g. better editor, commit hooks, whatever) while still making sure
no inappropriate line endings land in the python.org repositories. It
also seems to fit well with the whole "consenting adults" thing (but
that might just be me).


It's about not treating Windows developers as second class citizens.
Their platform uses \r\n as its native line ending format, so they


Thanks Nick; I didn't want to be the only one saying that.  There is a 
fine line between asserting reasonable requirements for Windows users 
and being obstructionist and unhelpful, and I'm trying to stay on the 
former side :)



should be able to work in that format without any hassles by following
some simple instructions (such as "ensure you have version X of the
Windows hg client, enable the win32text extension and configure it in
such-and-such a way"). Not "oh, yeah, that's an issue but if you search
the Intarwebs there are a few different things you can do that kinda
sorta work but are a bit fragile and klunky".

The precise order the two issues (server side enforcement and client
side assistance) are dealt with doesn't really matter because *both*
issues need to be addressed before we migrate.


I'm not that happy with the server being the primary line of defense. 
Let's say I make a branch of the hg repo, myself and a few others work 
on it committing as we go, then attempt to merge back upstream.  Let's 
say some of the early commits on that clone introduced bad line 
endings.  I'm guessing I would be forced to make a number of 
whitespace-only checkins to normalize the line-endings before it could 
merge - and these checkins would then be in the history forever.  Or I 
could attempt to recreate the clone by somehow replaying the commits 
with line endings corrected.  Either way, the situation doesn't seem good.



win32text needs to be usable on non-Windows clients so that tarballs
generated on a *nix machine get the line endings right in the
Windows-only files.


I agree.  It isn't fair to make this Windows users' problem.  It would be
like me proposing the repo get imported with \r\n line endings, enforce 
that with server side hooks, and let non-Windows users worry about the 
ramifications of that - somehow I doubt that would fly - so neither 
should it fly for Windows users...


I'm more than willing to help on this; I haven't resurrected my stale 
patch because I find win32text only 1/2 a solution that doesn't work in 
practice.  Therefore that patch is as stale for me as it is for anyone.
However, if a plan is put in place which offers a full solution and the 
hg developers are committed to it, I promise I'll put my hand up to help 
with implementation in a fairly timely manner...


Cheers,

Mark


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-04 Thread Neil Hodgson
Mark Hammond:

 Thanks Nick; I didn't want to be the only one saying that.  There is a fine
 line between asserting reasonable requirements for Windows users and being
 obstructionist and unhelpful, and I'm trying to stay on the former side :)

   I haven't commented on this issue before because I can't really be
helpful. I just don't understand why hg is being considered before
its Windows support is roughly equivalent to svn and cvs.

   There has been some similar experience with the main repository for
the Cocoa port of Scintilla which is in bzr on launchpad. Several
times in that repository, files were checked in with wrong line ends
making every line appear changed when looking through history. There
are several causes for this including user error but bzr (and hg)
should default to more helpful behaviour on text files.
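
   The "every line appear changed" effect is easy to reproduce. A small sketch using difflib from the standard library; the file contents and labels are invented for illustration:

```python
import difflib

lf_version = "spam\neggs\nham\n"
# Same text, but saved with Windows line ends.
crlf_version = lf_version.replace("\n", "\r\n")

old = lf_version.splitlines(keepends=True)
new = crlf_version.splitlines(keepends=True)
diff = list(difflib.unified_diff(old, new, "before", "after"))

# Count content lines only, skipping the "---"/"+++" file headers.
removed = [d for d in diff if d.startswith("-") and not d.startswith("---")]
added = [d for d in diff if d.startswith("+") and not d.startswith("+++")]

print(len(removed), len(added))
```

   None of the visible text differs, yet all three lines are reported as removed and re-added.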

   Neil


Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-04 Thread Ben Finney
Mark Hammond mhamm...@skippinet.com.au writes:

 Let's say I make a branch of the hg repo, myself and a few others work
 on it committing as we go, then attempt to merge back upstream. Let's
 say some of the early commits on that clone introduced bad line
 endings. I'm guessing I would be forced to make a number of
 whitespace-only checkins to normalize the line-endings before it could
 merge - and these checkins would then be in the history forever.

What is wrong with that? I mean, if that is the actual sequence of
events, why should the history not reflect that?

 Either way, the situation doesn't seem good.

I see this assertion made often, so I'm not saying you are necessarily
wrong to make it. I just don't see a justification for making it (and,
without justification, I would say it *is* wrong to make it).

-- 
 \  “Our products just aren't engineered for security.” —Brian |
  `\ Valentine, senior vice-president of Microsoft Windows |
_o__)  development |
Ben Finney
