[issue31136] raw strings cannot end with a backslash character r'\'

2019-02-20 Thread Graham Wideman


Graham Wideman  added the comment:

Demonstration:
print("x" + r' \' ' + "x")   produces
x \' x
Where is this behavior _ever_ useful? 
Or if there is some use case for this, how frequent is it compared to the 
frequency of users expecting either that backslash does nothing special, or 
that it would behave like an escape, and not appear in the output? 

I'm not here to suggest there's some easy fix for this. I just don't want this 
issue closed as "not a bug" without registering that this design is flawed.
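
For anyone reproducing the demonstration, a minimal sketch comparing the raw literal with the ordinary one; the variable names are only for illustration.

```python
# In a raw string literal, \' stops the quote from ending the literal,
# yet the backslash itself remains in the resulting string.
raw = r' \' '
cooked = ' \' '

print("x" + raw + "x")        # x \' x
print("x" + cooked + "x")     # x ' x
print(len(raw), len(cooked))  # 4 3
```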

--

___
Python tracker 
<https://bugs.python.org/issue31136>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31136] raw strings cannot end with a backslash character r'\'

2019-02-20 Thread Graham Wideman


Graham Wideman  added the comment:

Let us be clear here that this is NOT a case where the backslash escapes the 
subsequent quote. If it WAS such a case, then the sequence \' would leave only 
the quote in the output string. But it doesn't; it leaves the complete 
2-character \' in the output string.
So essentially this is a case of the character sequence \' being given a 
special status that causes that character pair to have a special meaning in 
preference to the meaning of the individual characters.
So this IS a bug -- it may be "as designed", but that design makes the name of 
this feature, "raw string", patently misleading and a violation of the 
principle of least surprise. This is a feature (as the FAQ explains) provided 
chiefly for writing input to processors such as regular expression engines. So 
at best, these r-strings should be called "regex-oriented" string literals, 
which can be used elsewhere only by those who know this gotcha.
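
A short sketch of the gotcha named in the issue title, assuming a current CPython. The literal is compiled from a string so the script itself still parses; the path values are made up.

```python
# A raw string literal cannot end in an odd number of backslashes:
# the final \' does not close the literal, so it is never terminated.
source = "r'\\'"   # the source text  r'\'
try:
    compile(source, "<demo>", "eval")
    compiled = True
except SyntaxError:
    compiled = False
print(compiled)  # False

# Two common workarounds for a string that must end in a backslash:
path_a = r'C:\temp' + '\\'   # append an ordinary escaped backslash
path_b = 'C:\\temp\\'        # avoid the raw literal altogether
print(path_a == path_b)      # True
```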

--
nosy: +gwideman




[issue20906] Issues in Unicode HOWTO

2014-03-22 Thread Graham Wideman

Graham Wideman added the comment:

@Andre: 

_I_ know more or less the explanations behind all this. I am just putting it 
forward as an example that touches several concepts needed to explain it, 
concepts that a programmer might reason with when changing a program (or the 
environment) to produce some output (instead of an exception), and possibly 
even the intended output.

For example, behind the brief explanation you provide, here are some of the 
related concepts:

1. print(s) sends output to stdout, which sends data to the Windows console 
(cmd.exe).

2. In the process, the output function that print --> stdout invokes attempts 
to encode s according to the encoding that the destination, cmd.exe, reports 
that it expects.

3. On Windows (in English, or perhaps it's the US locale), cmd.exe defaults to 
expecting encoding cp437.

4. cp437 is an encoding containing only 256 characters. Many Unicode code 
points obviously have no corresponding character in cp437.

5. The encoding process used by print() is set to raise an exception on 
characters that have no mapping to the encoding wanted by stdout.

6. Consequently, print() raises an exception on code points outside of those 
representable in cp437.
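
The chain in the numbered points can be imitated without a Windows console. This is only a sketch: the in-memory stream and the helper name print_to_fake_console are assumptions standing in for cmd.exe, not what print() does internally.

```python
import io

s = 'knight \u265E'

def print_to_fake_console(text, encoding):
    """Print to an in-memory stream that encodes strictly, like a console."""
    raw = io.BytesIO()
    console = io.TextIOWrapper(raw, encoding=encoding, errors='strict')
    print(text, file=console)
    console.flush()
    return raw.getvalue()

# A UTF-8 destination can represent U+265E ...
utf8_bytes = print_to_fake_console('Hello ' + s, 'utf-8')

# ... but cp437 has no mapping for it, so the write raises.
try:
    print_to_fake_console('Hello ' + s, 'cp437')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)
```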

Based on that, there are a number of moves the programmer might make, with 
varying results... possibly involving:

-- s.encode([various choices of options here]) --> s_as_bytes
-- print(s_as_bytes) (noting that 'Hello ' + s_as_bytes doesn't work)
-- Or maybe ascii(s)
-- Or possibly sys.stdout.buffer.write()

-- Pros and cons of the above, which require careful tracking of what the 
resulting strings or byte sequences really mean at each juncture.

-- cmd.exe chcp 65001 --> so print(unicode) won't raise an exception, but still 
many chars will show as [?]
-- various font choices in cmd.exe which might be able to show the needed 
graphemes.
-- Automatic font substitution that occurs in some contexts when the selected 
font doesn't contain a requested code point and its grapheme.

... and probably more concepts that I've missed.
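
A rough sketch of a few of those moves, using only the standard error handlers of str.encode() and the built-in ascii(). Which one is appropriate depends on what the output is for. sys.stdout.buffer.write() is left out here because it depends on the real console.

```python
s = 'knight \u265E'

# Encode explicitly, choosing what happens to unmappable characters.
replaced = s.encode('cp437', errors='replace')          # b'knight ?'
escaped = s.encode('cp437', errors='backslashreplace')  # b'knight \\u265e'
dropped = s.encode('cp437', errors='ignore')            # b'knight '

# ascii() returns a str (quotes included) with non-ASCII characters escaped.
as_ascii = ascii(s)                                     # "'knight \\u265e'"

print(replaced, escaped, dropped, as_ascii)
```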

-- Graham

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20906
___



[issue20906] Issues in Unicode HOWTO

2014-03-22 Thread Graham Wideman

Graham Wideman added the comment:

@R David:  I agree with you. Thanks for extending the line of thinking I 
outlined.




[issue20906] Issues in Unicode HOWTO

2014-03-20 Thread Graham Wideman

Graham Wideman added the comment:

Marc-Andre:

Thanks for commenting:

>> 2. Python string --> some other code system, such as 
>> ASCII, cp1250, etc. The destination code system doesn't 
>> necessarily have anything to do with unicode, and whole 
>> ranges of unicode's characters either result in an 
>> exception, or get translated as escape sequences. 
>> Ie: This is more usefully seen as a translation 
>> operation, than merely encoding.
>
> Those are encodings as well. The operation going from Unicode to one of
> these encodings is called encode in Python.

Yes, I am certainly aware that in Python parlance these are also called 
"encode" (and achieved with encode()), which, I am arguing, is one reason we 
have confusion. These are not encodings into a recognized Unicode-defined byte 
stream; they entail translation and filtering into the allowed character set of 
a different code system, and then encoding into that code system's byte 
representation.

>> In 1, the encoding process results in data that stays within concepts 
>> defined within Unicode. In 2, encoding produces data that would be 
>> described by some code system outside of Unicode.
>> At the moment I think Python muddles these two ideas together, 
>> and I'm not sure how to clarify this. 
>
> An encoding is a mapping of characters to ordinals, nothing more or less.

In Unicode, the mapping from characters to ordinals (code points) is not the 
encoding. It's the mapping from code points to bytes that's the encoding. While 
I wish this was a distinction reserved for pedants, unfortunately it's an 
aspect that's important for users of Unicode to understand in order to make 
sense of how it works, and what the literature and the web say (correct and 
otherwise).

> You are viewing all this from a Unicode point of view, but please
> realize that Unicode is rather new in the business and the many
> other encodings Python supports have been around for decades.

I'm advocating that the concepts be clear enough to understand that Unicode 
(UTF-whatever) works differently (two mappings) than non-Unicode systems 
(single mapping), so that users have some hope of understanding what happens in 
moving from one to the other.

>>> So it should say 16-bit code points instead, right?
>>
>> I don't think Unicode code points should ever be described as 
>> having a particular number of bits. I think this is a 
>> core concept: Unicode separates the character --> code point, 
>> and code point --> bits/bytes mappings. 
>
> You have UCS-2 and UCS-4. UCS-2 representable in 16 bits, UCS-4
> needs 21 bits, but is typically stored in 32-bit. Still,
> you're right: it's better to use the correct terms UCS-2 vs. UCS-4
> rather than refer to the number of bits.

I think mixing in UCS just adds confusion here. The Unicode Consortium has 
declared the term UCS-2 obsolete, and even wants people to stop using it:
http://www.unicode.org/faq/utf_bom.html
"UCS-2 is obsolete terminology... the term should now be avoided."
(That's a somewhat silly position -- we must still use the term to talk about 
legacy stuff. But probably not necessary here.)

So my point wasn't about UCS. It was about referring to code points as having a 
particular bit width. Fundamentally, code points are numbers, without regard to 
some particular computer number format. It is a separate matter that they can 
be encoded in 8, 16 or 32 bit encoding schemes (utf-8, 16, 32), and that is 
independent of the magnitude of the code point number. 

It _is_ the case that some code points are large enough integers that when 
encoded they _require_, say, 3 bytes in utf-8, or two 16-bit words in utf-16 
and so on. But the number of bits used in the encoding does not necessarily 
correspond to the number of bits that would be required to represent the 
integer code point number in plain binary. (Only in UTF-32 is the encoded value 
simply the binary version of the code point value.)
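
To make that concrete, a small sketch comparing the width of the plain binary integer with the number of bytes each encoding emits. The helper encoded_sizes and the sample characters are my own choices for illustration.

```python
def encoded_sizes(ch):
    """Bytes used by one character in each Unicode encoding form."""
    return {enc: len(ch.encode(enc))
            for enc in ('utf-8', 'utf-16-le', 'utf-32-le')}

for ch in ('A', '\u00e9', '\u265e', '\U0001F600'):
    cp = ord(ch)
    # bit_length() is the width of the plain binary integer, which is
    # unrelated to how many bytes a given encoding emits.
    print(f'U+{cp:04X}', cp.bit_length(), encoded_sizes(ch))

# Only in UTF-32 are the encoded bytes simply the code point in binary.
knight = '\u265e'
utf32_value = int.from_bytes(knight.encode('utf-32-be'), 'big')
print(utf32_value == ord(knight))  # True
```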




[issue20906] Issues in Unicode HOWTO

2014-03-20 Thread Graham Wideman

Graham Wideman added the comment:

Marc-Andre: Thanks for your latest comments. 

> We could also have called encodings: character set, code page,
> character encoding, transformation, etc.

I concur with you that things _could_ be called all sorts of names, and the 
choices may be arbitrary. However, creating a clear explanation requires 
figuring out the distinct things of interest in the domain, picking terms for 
those things that are distinct, and then using those terms rigorously. (Usage 
in the field may vary, which in itself may warrant comment.)

I read your slide deck/time-capsule-from-2002 with interest, on a number of 
points. (I realize that you were involved in the Python 2.x implementation of 
Unicode. Not sure about 3.x?)

Page 8 "What is a Character?" is lovely, showing very explicitly Unicode's two 
levels of mapping, and giving names to the separate parts. It strongly suggests 
this HOWTO page needs a similar figure.

That said, there are a few notes to make on that slide, useful in trying to 
arrive at consistent terms: 

1. The figure shows a more precise word for what users regard as a "character", 
namely "grapheme". I'd forgotten that.

2. It shows e-accent-acute to demonstrate a pair of code points representing a 
single grapheme. That's important, but it should avoid suggesting this is the 
only way to form e-accent-acute (canonical equivalence, U+00E9).

3. The illustration identifies the series of code points (the middle row) as 
the "Unicode encoding" of the string. Ie: The grapheme-to-code-points mapping 
is described as an encoding. Not a wrong use of general language. But 
inconsistent with the mapping that encode() pertains to. (And I don't think 
that the code-point-to-grapheme transform is ever called "decoding", but I 
could be wrong.)

4. The illustration of Code Units (in the third row) shows graphemes for the 
Code Units (byte values). This confusingly glosses over the fact that those 
graphemes correspond to what you would see if you _decoded_ these byte values 
using CP1252 or ISO 8859-1, suggesting that the result is reasonable or useful. 
It certainly happens that people do this, deliberately or accidentally, but it 
is a misuse of the data, and should be warned against, or at least explained as 
a confusion.
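
Notes 2 and 4 can both be shown in a few lines. This is a sketch with the standard unicodedata module; the variable names are illustrative.

```python
import unicodedata

composed = '\u00e9'      # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = 'e\u0301'   # two code points: 'e' + COMBINING ACUTE ACCENT

# Note 2: two spellings of the same grapheme, equal only after normalization.
print(composed == decomposed)                                # False
print(unicodedata.normalize('NFC', decomposed) == composed)  # True
print(unicodedata.normalize('NFD', composed) == decomposed)  # True

# Note 4: UTF-8 code units read back with the wrong codec give mojibake.
utf8_bytes = composed.encode('utf-8')    # b'\xc3\xa9'
misread = utf8_bytes.decode('cp1252')    # two characters, not one
print(len(misread))                      # 2
```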

Returning to your most recent message:

> In Python keep it simple: you have Unicode (code points) and 
> 8-bit strings or bytes (code units).

I wish it _were_ that simple. And I agree that, in principle, (assuming Python 
3+) there should be "inside your program", where you have the str type which 
always acts as a sequence of Unicode code points, and has string functions. And 
then there's "outside your program", where text is represented by sequences of 
bytes that specify or imply some encoding. And your program should use supplied 
library functions to mostly automatically convert on the way in and on the way 
out.

But there are enough situations where the Python programmer, having adopted 
Python 3's string = Unicode approach, sees unexpected results. That prompts 
reading this page, which is called upon to make the fine distinctions to allow 
figuring out what's going on.

I'm not sure what you mean by "8-bit strings" but I'm pretty sure that's not an 
available type in Python 3+. Ie: Some functions (eg: encode()) produce 
sequences of bytes, but those don't work entirely like strs. 
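
A few of the differences, as a minimal sketch under Python 3 with default warning settings; the sample text is arbitrary.

```python
text = 'knight \u265E'
data = text.encode('utf-8')

print(type(text).__name__, len(text))   # str 8
print(type(data).__name__, len(data))   # bytes 10

# Indexing a str gives a 1-character str; indexing bytes gives an int.
print(text[0], data[0])                 # k 107

# str() on bytes does not decode; it returns the repr-style text.
print(str(data))                        # b'knight \xe2\x99\x9e'
print(data.decode('utf-8') == text)     # True
```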

---
This attempt to revise the article piecemeal has become pretty 
diffuse, with perhaps competing notions of purpose, and of what level of detail 
and precision is needed etc. I will try to suggest something productive in a 
subsequent message.




[issue20906] Issues in Unicode HOWTO

2014-03-20 Thread Graham Wideman

Graham Wideman added the comment:

At the moment I've run out of time to exert much forward push on this.

By way of temporary summary/suggestion for regrouping: Focus on what this page 
is intending to deliver. What concepts should readers of this page be able to 
distinguish and understand when they are finished?

To scope out the needed concepts, I suggest identifying representative 
unicode-related stumbling blocks (possibly from stackoverflow questions).

Here's an example case: just trying to get trivial "beyond ASCII" functionality 
to work on Windows (Win7, Python 3.3):


s = 'knight \u265E'
print('Hello ' + s)


... which fails with:

UnicodeEncodeError: 'charmap' codec can't encode character '\u265e' in 
position 13: character maps to <undefined>

A naive attempt to fix this by using s.encode() results in the + operation 
failing.
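
A sketch of that naive attempt and two ways around it. It assumes the goal is a UTF-8 byte sequence; the names are illustrative.

```python
s = 'knight \u265E'

try:
    message = 'Hello ' + s.encode()   # str + bytes is not allowed
except TypeError as exc:
    message = None
    print('TypeError:', exc)

# Concatenate str with str first, then encode once at the boundary ...
encoded = ('Hello ' + s).encode('utf-8')
# ... or keep both operands as bytes.
also_encoded = b'Hello ' + s.encode('utf-8')
print(encoded == also_encoded)        # True
```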

What paths forward do programmers explore in an effort to have this code (a) 
not throw an exception, and produce at least some output, and (b) make it 
produce the correct output?

And why does it work as intended on Linux?

The set of concepts identified and explained in this article needs to be 
sufficient to underpin an understanding of the distinct data types, encodings, 
decodings, translations, settings etc relevant to this problem, and how to use 
them to get a desired result.

There are similar problems that occur at other Python-system boundaries, which 
would further illuminate the set of necessary concepts.

Thanks for all comments.

-- Graham




[issue20906] Issues in Unicode HOWTO

2014-03-19 Thread Graham Wideman

Graham Wideman added the comment:

Antoine:

Thanks for your comments -- this is slippery stuff.

> It's better, but how about simply "In this article"?

I was hoping to inform the reader that the hex representations are found in 
many articles, not just special to this one.

> [ showing the glyph ]

Agreed -- it would be good to show the glyphs mentioned. But in a way that 
isn't confusing if the user's web browser doesn't show it correctly.

> For all intents and purposes, iso-8859-1 and friends *are* encodings 
> (and this is how Python actually names them).

I am still mulling this over. iso-8859-1 is most literally an "encoding" in the 
old sense of the word (character --> byte representation), and is not, per se, 
a unicode-related concept. 

I think part of the ambiguity problem here is that there are two subtly but 
importantly different ideas here:

1. Python string (capable of representing any unicode text) --> some 
full-fidelity and industry recognized unicode byte stream, like utf-8, or 
utf-32. I think this is legitimately described as an "encoding" of the unicode 
string.

versus:

2. Python string --> some other code system, such as ASCII, cp1250, etc. The 
destination code system doesn't necessarily have anything to do with unicode, 
and whole ranges of unicode's characters either result in an exception, or get 
translated as escape sequences. Ie: This is more usefully seen as a translation 
operation, than merely encoding.

In 1, the encoding process results in data that stays within concepts defined 
within Unicode. In 2, encoding produces data that would be described by some 
code system outside of Unicode.
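
One way to see the difference between the two cases is whether a string survives a round trip. A sketch follows; round_trips is a hypothetical helper and the codec list is just a sample.

```python
s = 'knight \u265E'

def round_trips(text, encoding):
    """True if text survives encode/decode with strict error handling."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False

# Case 1 (Unicode encoding forms) versus case 2 (other code systems).
for enc in ('utf-8', 'utf-16', 'utf-32', 'ascii', 'cp1250'):
    print(enc, round_trips(s, enc))
```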

At the moment I think Python muddles these two ideas together, and I'm not sure 
how to clarify this. 

> So it should say 16-bit code points instead, right?

I don't think Unicode code points should ever be described as having a 
particular number of bits. I think this is a core concept: Unicode separates 
the character --> code point, and code point --> bits/bytes mappings. 

At most, one might want to distinguish different ranges of unicode code points. 
Even if there is a need to distinguish code points <= 65535, I don't think this 
should be described as "16-bit", as it muddies the distinction between 
Unicode's two mappings.




[issue20906] Issues in Unicode HOWTO

2014-03-16 Thread Graham Wideman

Graham Wideman added the comment:

> Do you want to provide a patch?

I would be happy to, but I'm not currently set up to create a patch. Also, I 
hoped that an author who has more history with this article would supervise, 
especially where I don't know what the original intent was.

> I find use of the word "narrative" intimidating in the context of a technical 
> documentation.

Agreed. How about "In documentation such as the current article..."

> In general, I find it disappointing that the Unicode HOWTO only gives 
> hexadecimal representations of non-ASCII characters and (almost) never 
> represents them in their true form. This makes things more abstract 
> than necessary.

I concur with reducing unnecessary abstraction. Not sure what you mean by "true 
form". Do you mean show the glyph which the code point represents? Or the 
sequence of bytes? Or display the code point value in decimal? 

>> This is a vague claim. Probably what was intended was: "Many 
>> Internet standards define protocols in which the data must 
>> contain no zero bytes, or zero bytes have special meaning."  
>> Is this actually true? Are there many such standards?
>
> I think it actually means that Internet protocols assume an ASCII-compatible 
> encoding (which UTF-8 is, but not UTF-16 or UTF-32 - nor EBCDIC :-)).

Ah -- yes that makes sense.

>> --> "Non-Unicode code systems usually don't handle all of 
>> the characters to be found in Unicode."
>
> The term *encoding* is used pervasively when dealing with the transformation 
> of unicode to/from bytes, so I find it confusing to introduce another term 
> here (code systems). I prefer the original sentence.

I see that my revision missed the target. There is a problem, but it is wider 
than this sentence.

One of the most essential points this article should make clear is the 
distinction between older schemes with a single mapping:

Characters --> numbers in particular binary format. (eg: ASCII)

... versus Unicode with two levels of mapping...

Characters --> code point numbers --> particular binary format of the number 
data and sequences thereof.

In the older schemes, "encoding" referred to the one mapping: chars --> 
numbers in particular binary format. In Unicode, "encoding" refers only to the 
mapping: code point numbers --> binary format. It does not refer to the chars 
--> code point mapping. (At least, I think that's the case. Regardless, the 
two mappings need to be rigorously distinguished.)
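
In Python terms the two mappings correspond to ord() and str.encode(). A small sketch, with the black chess knight as an arbitrary example character:

```python
ch = '\u265e'

# Mapping 1: character --> code point (an abstract integer).
code_point = ord(ch)                      # 9822, written U+265E

# Mapping 2: code point --> bytes, which depends on the encoding chosen.
as_utf8 = ch.encode('utf-8')              # b'\xe2\x99\x9e'
as_utf16 = ch.encode('utf-16-be')         # b'\x26\x5e'
as_utf32 = ch.encode('utf-32-be')         # b'\x00\x00\x26\x5e'

# An older single-mapping scheme goes straight from character to byte value.
latin1_byte = '\u00e9'.encode('latin-1')  # b'\xe9'

print(code_point, as_utf8, as_utf16, as_utf32, latin1_byte)
```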

On review, there are many points in the article that muddy this up.  For 
example, "Unicode started out using 16-bit characters instead of 8-bit 
characters". Saying "so-and-so-bit characters" about Unicode, in the current 
article, is either wrong, or very confusing.  Unicode characters are associated 
with code points, NOT with any _particular_ bit level representation.

If I'm right about the preceding, then it would be good for that to be spelled 
out more explicitly, and used consistently throughout the article. (I won't try 
to list all the examples of this problem here -- too messy.)




[issue20906] Issues in Unicode HOWTO

2014-03-16 Thread Graham Wideman

Graham Wideman added the comment:

A further issue regarding one-to-one mappings.

Article: "Encodings don’t have to be simple one-to-one mappings like Latin-1. 
Consider IBM’s EBCDIC, which was used on IBM mainframes."

I don't think this paragraph is about one-to-one mappings per se. (ie: one 
character to one code.) It seems to be about whether ranges of characters whose 
code values are contiguous in one coding system are also contiguous in another 
coding system. The EBCDIC encoding is still one-to-one, I believe.
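
A quick check of this, as a sketch that uses cp037 (one of the EBCDIC code pages shipped with Python's codecs) as a stand-in for "EBCDIC" in general:

```python
import string

letters = string.ascii_lowercase
ebcdic = letters.encode('cp037')
latin1 = letters.encode('latin-1')

# Still one byte per character, and the mapping is reversible ...
print(len(ebcdic) == len(letters), ebcdic.decode('cp037') == letters)

# ... but the byte values are not contiguous as they are in Latin-1.
gaps_ebcdic = [b - a for a, b in zip(ebcdic, ebcdic[1:])]
gaps_latin1 = [b - a for a, b in zip(latin1, latin1[1:])]
print(gaps_ebcdic)
print(gaps_latin1)
```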

The subject of one-character-to-one-code mapping is important (normalization 
etc), though perhaps beyond the current article. But I think the article should 
avoid suggesting that many-to-one or one-to-many scenarios are common.




[issue20906] Unicode HOWTO

2014-03-13 Thread Graham Wideman

New submission from Graham Wideman:

The Unicode HOWTO article is an attempt to help users wrap their minds around 
Unicode. There are some opportunities for improvement. Issues presented in 
order of the narrative:

http://docs.python.org/3.3/howto/unicode.html

History of Character Codes
---

References to the 1980s are a bit off.

"In the mid-1980s an Apple II BASIC program..." 

Assuming the comment is about the state of play in the mid-80's, then: The 
Apple II appeared in 1977. By 1985 we already had Macs, and PCs running DOS, 
which were capable of various character sets (not to mention lowercase letters!)

"In the 1980s, almost all personal computers were 8-bit"

Both the PC (1981) and Mac (1984) had 16-bit processors.

Definitions:

"Characters are abstractions": Not helpful unless one already knows what 
"abstraction" means in this specific context.

"the symbol for ohms (Ω) is usually drawn much like the capital letter omega 
(Ω) in the Greek alphabet [...] but these are two different characters that 
have different meanings."

Omega is a poor example for this concept. Omega is used as the identifier for a 
unit in the same way as "m" is used for meter, or "A" is used for ampere. Each 
is a specific use of a character, which, like any specific use, has a 
particular meaning. However, having a particular meaning doesn't necessarily 
require a separate character, and in the case of omega, the Unicode standard 
now says that the separate ohm character is deprecated. 

The ohm sign is canonically equivalent to the capital omega, and normalization 
would remove any distinction.

http://www.unicode.org/versions/Unicode4.0.0/ch07.pdf#search=%22character%20U%2B2126%20maps%20OR%20map%20OR%20mapping%22
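
The equivalence is easy to confirm with the standard unicodedata module. A minimal sketch:

```python
import unicodedata

ohm = '\u2126'     # OHM SIGN
omega = '\u03a9'   # GREEK CAPITAL LETTER OMEGA

print(unicodedata.name(ohm), '/', unicodedata.name(omega))
print(ohm == omega)                                # False
print(unicodedata.normalize('NFC', ohm) == omega)  # True
print(unicodedata.normalize('NFD', ohm) == omega)  # True
```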

A better example might be the roman numerals, code points U+2160 and subsequent.

Definitions


"A code point is an integer value, usually denoted in base 16."

When trying to convey clearly the distinction between character, code point, 
and byte representation, the topic of how it's denoted is a potential 
distraction for the reader, so I suggest this point be a bit more explicitly 
parenthetical, and less confusable with "16 bit".  Like:

"A code point value is an integer in the range 0 to 0x10FFFF (about 1.1 
million, with some 110 thousand assigned so far). In a narrative such as the 
current article, a code point value is usually written in hexadecimal. The 
Unicode standard displays code points with the notation U+265E to mean the 
character with value 0x265e (9822 decimal; "Black Chess Knight" character)."

(Also revise subsequent para to use same example character. I suggest not using 
Ethiopic Syllable WI, because it's unfamiliar to most readers, and it muddies 
the topic by suggesting that Unicode in general captures _syllables_ rather 
than _characters_.)
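
The article could back the suggested wording with something like the following sketch, which only restates the numbers above using built-ins (Python 3.3 or later assumed for sys.maxunicode):

```python
import sys
import unicodedata

ch = chr(0x265E)
print(f'U+{ord(ch):04X}')      # U+265E
print(ord(ch))                 # 9822
print(unicodedata.name(ch))    # BLACK CHESS KNIGHT

# The largest code point, as an integer and in hexadecimal.
print(sys.maxunicode, hex(sys.maxunicode))   # 1114111 0x10ffff
```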

Encodings:
---
"This sequence needs to be represented as a set of bytes"
--> "This code point sequence needs to be represented as a sequence of bytes"

"4. Many Internet standards are defined in terms of textual data"

This is a vague claim. Probably what was intended was: "Many Internet standards 
define protocols in which the data must contain no zero bytes, or zero bytes 
have special meaning."  Is this actually true? Are there many such standards?

"Generally people don’t use this encoding,"
Probably people per se don't use any encoding, computers do.  --> "Because of 
these problems, other more efficient and convenient encodings have been devised 
and are commonly used."

For continuity, directly after that para should come the later paras starting 
with "UTF-8 is one of the most common".

"2. A Unicode string is turned into a string of bytes..."
--> "2. A Unicode string is turned into a sequence of bytes..."  (Ie: don't 
overload "string" in an article about strings and encodings.)

Create a new subhead "Converting from Unicode to non-Unicode encodings", and 
move under it the paras:

"Encodings don't have to..."
"Latin-1, also known as..."
"Encodings don't have to..."

But also revise:

"Encodings don’t have to handle every possible Unicode character, and most 
encodings don’t."

--> "Non-Unicode code systems usually don't handle all of the characters to be 
found in Unicode."

--
assignee: docs@python
components: Documentation
messages: 213367
nosy: docs@python, gwideman
priority: normal
severity: normal
status: open
title: Unicode HOWTO
type: enhancement
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5




[issue19805] Revise FAQ Dictionary: consistent key order item

2013-11-26 Thread Graham Wideman

New submission from Graham Wideman:

FAQ entry:
http://docs.python.org/3/faq/programming.html#how-can-i-get-a-dictionary-to-display-its-keys-in-a-consistent-order
claims that there's no way for a dictionary to return keys in a consistent 
order. However, there's OrderedDict which should probably be mentioned here.
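
A sketch of what the FAQ entry might show: collections.OrderedDict for insertion order, plus sorted() for a sorted display. The keys are arbitrary.

```python
from collections import OrderedDict

d = OrderedDict()
d['pear'] = 1
d['apple'] = 2
d['fig'] = 3
print(list(d))          # ['pear', 'apple', 'fig']  (insertion order)

# For sorted rather than insertion order, sort the keys when displaying.
plain = {'pear': 1, 'apple': 2, 'fig': 3}
print(sorted(plain))    # ['apple', 'fig', 'pear']
```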

--
assignee: docs@python
components: Documentation
messages: 204550
nosy: docs@python, gwideman
priority: normal
severity: normal
status: open
title: Revise FAQ Dictionary: consistent key order item
type: enhancement
versions: Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19805
___



[issue19141] Windows Launcher fails to respect PATH

2013-10-02 Thread Graham Wideman

Graham Wideman added the comment:

Hi Vinay, thanks for commenting.  And of course for your efforts on py.exe (and 
no doubt the debate process.) 

I am trying to draw attention to the situation where the script has no shebang 
line, and there is no other explicit configuration info for py.exe. (No py.ini 
file, no py.exe environment variables, no py.exe-specific command-line args).

In that case, the next thing py.exe should check, in my view, is the user's 
PATH, where they may well have defined which python version they prefer (even 
if they are unaware of PEP 397 and Launcher).  This rationale is parallel to 
the one in #17903 that you pointed to.

Currently, py.exe ignores PATH in that case, and falls back to looking through 
all installed pythons and picking the latest 2.x if available.

> The choosing of 2.x vs. 3.x is also mentioned in the PEP 

The discussion of that issue would be illuminating, but I couldn't find it. 
Could you point to where this is mentioned in PEP 397?

Thanks again.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19141
___



[issue19136] CSV, builtin open(), newline arg. Docs broken again.

2013-10-01 Thread Graham Wideman

New submission from Graham Wideman:

The docs appear to be incorrect for CSV at: 
http://docs.python.org/3.3/library/csv.html.

Per issue http://bugs.python.org/issue7198 , there's a long history of 
contention between the built-in open() and csv.writer, in which, on Windows, 
the default result is an unwanted additional '\r'. That was 'fixed' by using 
the newline='' argument to open(). 

This is reflected in the docs at the above link:

with open('eggs.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=...)

However, in python 3.3.2 use of the newline argument returns: TypeError: 
'newline' is an invalid keyword argument for this function.

In brief testing, it appears that a correct result can be obtained by calling 
open as follows:

with open(somepath, 'wb') as writerfile: 
    writer = csv.writer(writerfile, delimiter=...)

Note: binary mode, not text as previously needed and currently documented.
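
For comparison, a minimal sketch of the documented text-mode pattern, assuming the script really is run by a Python 3 interpreter (where the built-in open() accepts newline). The temporary path and the rows are made up for illustration.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'eggs.csv')

# Text mode plus newline='' lets the csv module control line endings itself.
with open(path, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerow(['spam', 'eggs'])
    writer.writerow(['1', '2'])

# Read the raw bytes back to see exactly what was written.
with open(path, 'rb') as f:
    raw = f.read()
print(raw)   # b'spam,eggs\r\n1,2\r\n'
```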

--
assignee: docs@python
components: Documentation
messages: 198752
nosy: docs@python, gwideman
priority: normal
severity: normal
status: open
title: CSV, builtin open(), newline arg. Docs broken again.
type: behavior
versions: Python 3.3, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19136
___



[issue19136] CSV, builtin open(), newline arg. Docs broken again.

2013-10-01 Thread Graham Wideman

Graham Wideman added the comment:

David:
Yes, as it turns out you are absolutely right, in a manner of speaking.  I have 
retested this exhaustively today, and here's the root cause.

It turns out that in testing, I must have activated a particular simplified 
test script by invoking only 'scriptname.py' rather than invoking 'python 
scriptname.py'. (And then repeated that mistake by reinvoking via console 
history... doh!)

The latter reliably invokes python 3.3.2, because that's the only python in my 
PATH.  The former, it turns out, invokes the Windows Python Launcher, which 
finds a previously installed Python 2.7.1, despite that not being on the PATH. 

So, in my mind, the possibility of launching any version other than Python 
3.3.2 did not enter the picture.
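
In hindsight, a couple of lines at the top of the test script would have exposed this. The sketch below is written to run under either Python 2 or 3; the message text is just illustrative.

```python
import sys

# Report which interpreter is actually running this script.
version = sys.version_info[:3]
print(version)
print(sys.executable)

if version[0] < 3:
    print('Running under Python 2; open() has no newline argument here.')
```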

Prior to this, I was only vaguely aware that Windows Python Launcher existed. 
Ironically, it was probably installed by Python 3.3.2.

Sorry for the bogus bug alert.




[issue19141] Windows Launcher fails to respect PATH

2013-10-01 Thread Graham Wideman

New submission from Graham Wideman:

Python Launcher for Windows provides some important value for Windows users, 
but its ability to invoke python versions not on the PATH is a problem.

py.exe chooses a version of Python to invoke, in more or less this order of 
decreasing priority; it is the *last* one that occurs by default in a new 
install of python 3.3.x:

1. Shebang line in myscript.py

2. py.exe -n argument (n can be 2, 3, 3.2 etc). Launcher chooses the latest 
installed version so specified.

3. PY_PYTHON environment variable

4. py.ini in C:\WINDOWS or user's %LOCALAPPDATA% directory

5. Launcher hunts through registry for ALL previously installed pythons, and 
picks the latest version in the 2.x series. DEFAULT.

The first issue to note is that, to my knowledge, the exact precedence order is 
not documented... it would greatly help if this were done.

That said, the focus in this report is case 5, which as noted is the default 
behavior when python 3.3.2 is installed (and py.exe invoked with scripts having 
no launcher-aware shebang line).

In case 5, py.exe completely ignores the PATH environment variable. So, whereas 
PATH is used to find py.exe, or when the user invokes 'python' on the command 
line, py.exe ignores PATH and launches a version of python that is not 
necessarily in the PATH.

In case 2 where the user supplies a value for 'n', finding a non-PATH version 
of python is excusable on the basis that the user deliberately requests a 
version.

However, in case 5, the user is not invoking py explicitly, and is not 
necessarily aware of py's algorithm for finding all installed versions. The 
user might reasonably expect that invoking a script or double clicking it would 
just invoke 'python' the same as the 'python' command, using PATH.

In particular, if the user understands how PATH works (as reviewed in the docs 
here: 
http://docs.python.org/3/using/windows.html#finding-the-python-executable), 
then upon installing 3.3.x, he or she might explicitly *remove* python 2.x from 
PATH in the expectation that this will disable python 2.x. It is surprising and 
potentially harmful that py.exe does not abide by that choice.

A potential improvement is to interpose an item '4.5' in the above list, in 
which py.exe looks for a version of python on the PATH before falling back to 
searching for latest 2.x python ever installed.
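
The order above, with the proposed item 4.5 included, can be sketched as a simple first-match chain. This is a simplified model of the list in this report, not the py.exe implementation; the function name choose_python and all of its parameters are hypothetical.

```python
def choose_python(shebang=None, cli_version=None, env_default=None,
                  ini_default=None, path_python=None, registry_versions=()):
    """Return (source, choice) for the first rule that yields an interpreter.

    Simplified, hypothetical model of the order listed in this report; it is
    not the py.exe implementation.
    """
    candidates = [
        ("shebang", shebang),            # 1. shebang line in the script
        ("command line", cli_version),   # 2. py.exe -n argument
        ("PY_PYTHON", env_default),      # 3. environment variable
        ("py.ini", ini_default),         # 4. ini file
        ("PATH", path_python),           # 4.5 proposed: python found on PATH
    ]
    for source, choice in candidates:
        if choice:
            return source, choice
    # 5. Fallback described in the report: latest registered 2.x version.
    two_x = [version for version in registry_versions if version[0] == 2]
    if two_x:
        return "registry", max(two_x)
    return None, None


# With a python on PATH, the proposed rule wins over the registry fallback.
print(choose_python(path_python=r"C:\Python33\python.exe",
                    registry_versions=[(2, 7), (3, 3)]))
# Without one, the model falls through to the latest 2.x in the registry.
print(choose_python(registry_versions=[(2, 7), (3, 3)]))
```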

(It is not clear that py.exe should *ever* fallback to just picking the latest 
2.x in the registry (item 5). It is conceivable that a user may have configured 
one of those pythons to do something destructive or insecure on startup, and it 
will be a great surprise if py.exe randomly invokes it just because it has 
the highest version number.)

--
components: Windows
messages: 198812
nosy: gwideman
priority: normal
severity: normal
status: open
title: Windows Launcher fails to respect PATH
type: behavior
versions: Python 3.3, Python 3.4, Python 3.5

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19141
___



[issue18939] Venv docs regarding original python install

2013-09-06 Thread Graham Wideman

Graham Wideman added the comment:

@Vinay Sajip  Thanks for looking at this issue and adding the link to PEP 405, 
and your explanation "When working..." with helpful shebang comments.

That said, the combination of PEP 405 and this updated page doesn't clear 
things up completely.

Vinay remarks "The venv documentation does assume that the reader knows what 
virtual environments are and how they work."

If so, let's have a link to where the reader can get up to speed on how they 
work. PEP 405 is a help, but doesn't detail the topics I raised in earlier 
thread messages. Also, different legacy virtual environment schemes work 
differently, so prior knowledge doesn't necessarily help.

But the article already has a link to virtual environments... to a box in the 
same article, which is at the heart of not bringing clarity to the topics at 
hand.

One problem is lack of clarity about what "active" and "activate" mean. Here 
is what I currently believe:

In connection with venv, the term "active" is used in two relatively trivial 
but subtly different ways. 

(1) First on PATH: The phrase "active environment" may be used to simply 
indicate the python environment which will be found first via the user's shell 
PATH. Further, each venv-configured python virtual environment installation 
includes an "activate" script whose main effect is just to add that 
environment's bin or Scripts directory to the beginning of the user's PATH. 
This makes the selected python environment the default when the user types the 
'python' command. This use of "active" or "activate" might better be termed 
"default" or "make default".

(2) Actually running: A second meaning of "active" refers to an actually 
running instance of a python interpreter and its associated environment, 
whether or not it is first in the user's PATH. Any installed python (virtual or 
not) may be launched by explicitly invoking the complete path to its executable 
(eg: C:\python33\python.exe), whereupon that version of python will run, with 
its associated sys.path and so on.

These two meanings are obviously related. The particular python environment 
(virtual or not) that is "active" in the first sense, when invoked by a plain 
"python" command, will become "active" in the second sense.  But a running 
python ("active" in the second sense) will not necessarily be the "active" one 
in the first sense.

Implications for installers: A library installer invoked from the command line, 
unless told otherwise, will presumably install its payload into the python 
environment found via PATH. Consequently, in preparation, the intended target 
python should be made active in the first sense.
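
The two senses can be compared from inside a running interpreter. A minimal sketch, assuming the command name is plain 'python' (on some systems it is 'python3'); the function names are made up for illustration.

```python
import os
import shutil
import sys


def first_python_on_path():
    """Return the path a plain 'python' command would resolve to, or None."""
    return shutil.which("python")


def running_is_first_on_path():
    """True if the running interpreter is the one found first on PATH."""
    found = first_python_on_path()
    if not found or not sys.executable:
        return False
    try:
        return os.path.samefile(found, sys.executable)
    except OSError:
        return False


print("running:         ", sys.executable)           # second sense
print("first on PATH:   ", first_python_on_path())   # first sense
print("same interpreter:", running_is_first_on_path())
```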

I have not elaborated here on my other concern (since I don't understand the 
details) -- clarification of different degrees of isolation/autonomy which can 
be established for each virtual environment. I still believe that's important 
to understand, and the current article and PEP 405 don't cover it successfully, 
in my view.

--
resolution: fixed -> 
status: closed -> open

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18939
___



[issue18938] Prepend Is Not A Word

2013-09-05 Thread Graham Wideman

Graham Wideman added the comment:

Prepend appears in every online dictionary I consulted. For a dictionary to 
list it and give the usual meaning for it, pretty much demonstrates prepend 
functioning as a real word. That and its 1.3 million hits on google.

Prepend certainly has a commonly understood meaning, particularly in 
computing. 

To the extent that prepend has become popular as the appropriate-sounding 
opposite of append, that is exactly why it _should_ be used in this 
context... where one might well need to discuss adding strings before or after, 
and be clear about the distinction.

--
nosy: +gwideman

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18938
___



[issue18939] Venv docs regarding original python install

2013-09-05 Thread Graham Wideman

New submission from Graham Wideman:

http://docs.python.org/dev/library/venv.html

More detail needed regarding the original python environment

The article explains how to use venv to create a new python installation with 
independent libraries etc, and a means to activate one or another virtual 
python environment. However, there are some points regarding the original 
python environment which are cloudy.

(1) After pyvenv, what status does the original python installation have? Does 
pyvenv turn it into just one of now two or more virtual environments? Or is the 
original one special? Must it be specifically deactivated in order to activate 
a (different) virtual environment?

(2) The motivation behind venv seems to be to create virtual environments that 
can be substantially or completely separate from the system site directories or 
from the original python that pyvenv was run from.  Yet elsewhere the doc 
discusses how pyvenv creates a pyvenv.cfg file with a "home" key pointing back 
to the originating Python installation, and how sys.base_prefix and 
sys.base_exec_prefix point to the non-venv Python installation which was used 
to create the venv, which suggests that a venv is _not_ independent of its 
creating Python installation.

It would be helpful to provide some context for this seemingly contradictory 
information.  Perhaps there are scenarios with differing degrees of 
independence, in which these pointers back to the originating Python 
installation may or may not be relevant?
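
The pointers back to the originating installation can be inspected directly. A minimal sketch: the helper names are made up, and the sample pyvenv.cfg text (including its path) is invented for illustration rather than taken from a real environment.

```python
import sys


def in_virtual_env():
    """True when running inside a venv-style virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def parse_pyvenv_cfg(text):
    """Parse 'key = value' lines of a pyvenv.cfg file into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Sample contents; the path is a made-up example value.
sample = "home = /usr/local/bin\ninclude-system-site-packages = false\n"
print(in_virtual_env())
print(parse_pyvenv_cfg(sample)["home"])
```

When sys.prefix and sys.base_prefix differ, the interpreter is running from a virtual environment, and base_prefix names the installation that environment was created from.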

(3) How do you proceed to create virtual environments from scratch when you 
have no initial python installation, or no python installation of that python 
version?

-- Hope these suggestions help.

--
assignee: docs@python
components: Documentation
messages: 197030
nosy: docs@python, gwideman
priority: normal
severity: normal
status: open
title: Venv docs regarding original python install
type: behavior
versions: Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18939
___



[issue18939] Venv docs regarding original python install

2013-09-05 Thread Graham Wideman

Graham Wideman added the comment:

Additionally on the subject of venv docs:  I would encourage making it clearer 
how activate changes the user's PATH. Both 
http://www.python.org/dev/peps/pep-0405/  and 
http://docs.python.org/3.3/library/venv.html  talk about how activate adds the 
activated python binary to the path, but don't mention which path:  The one 
for the current console session? The system PATH environment variable (Windows) 
or one of the bash startup scripts (unix)? This is important, because it 
determines how far-reaching is activation of a particular virtual environment.
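
As noted elsewhere in this thread, activation is per shell session. The same scoping can be observed for any process: a PATH change is seen by that process and its children, not by the rest of the system. A minimal sketch, where demo-venv-bin is a made-up directory name standing in for an environment's bin or Scripts directory.

```python
import os
import subprocess
import sys

# Hypothetical directory standing in for a virtual environment's bin/Scripts.
fake_bin = os.path.join(os.getcwd(), "demo-venv-bin")

child_env = dict(os.environ)
child_env["PATH"] = fake_bin + os.pathsep + child_env.get("PATH", "")

# The child process sees the modified PATH...
child_path = subprocess.check_output(
    [sys.executable, "-c", "import os; print(os.environ['PATH'])"],
    env=child_env,
    universal_newlines=True,
).strip()

# ...while this process's own PATH is untouched.
print(child_path.startswith(fake_bin))
print(os.environ.get("PATH", "").startswith(fake_bin))
```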

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18939
___



[issue18939] Venv docs regarding original python install

2013-09-05 Thread Graham Wideman

Graham Wideman added the comment:

Thanks R. David for your comments.

 It should also mention that the activation is per-shell-session,

.. which also has implications (or lack of effect) for launching from Windows 
Explorer, for example.

Seems like in practical use, one would need to set up a batch file or shell 
script to run a particular venv activate command and launch a command shell 
with that python environment already set up. Shell for python 2.7.5 and 
library XYZ etc.

Advice along these lines would be helpful.
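
One alternative to wrapping activate in a batch file is to invoke the environment's interpreter by its full path, which runs that environment without touching PATH. A minimal sketch; the helper name venv_python and the directory my-env are made up for illustration.

```python
import os
import sys


def venv_python(venv_dir):
    """Return the expected interpreter path inside a venv directory."""
    if sys.platform == "win32":
        return os.path.join(venv_dir, "Scripts", "python.exe")
    return os.path.join(venv_dir, "bin", "python")


# 'my-env' is a hypothetical environment directory.
print(venv_python("my-env"))
```

The resulting path could be used as the target of a Windows shortcut or passed to subprocess, so that a launch from Explorer does not depend on any shell session having been activated.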

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18939
___



[issue11553] Docs for: import, packages, site.py, .pth files

2011-06-11 Thread Graham Wideman

Graham Wideman initcont...@grahamwideman.com added the comment:

Hi Eric,  Thanks for starting to review this, and your responses are 
encouraging. Some comments inline below.

FWIW, along the way I accumulated my own notes on this topic, on some pages 
here:

grahamwideman.wikispaces.com
(Left navigation panel...)
Software development  Python  Organization for common modules

Might be of interest as feedback on the digging process I needed in order to 
get some clarity on these issues, and also shows my references.

 Exactly what variants of arguments are possible, and what are their effects?
Does http://docs.python.org/dev/library/functions#__import__ help? Does 
http://docs.python.org/dev/library/importlib ?

Well somewhat overkill -- because the matter of interest was args for from... 
and import, while the docs you mention are for more complicated underlying 
functions. (Interesting nonetheless.)

 Current docs are unclear on points such as:
 -- is __init__.py needed on subpackage directories?
Yes, it always has.  I think there was some discussion about removing them in 
py3k, but this was rejected.
I came to same conclusion.. but have seen it described otherwise (in at least 
one book), so good to state this explicitly.

 -- the __all__ variable: Does it act generally to limit visibility of a
 module or package's attributes, or does pertain only to the
 'from...import *' statement?
 Both. 

I'm pretty sure that's not correct -- pretty sure that __all__ only specifies 
what's included in from...import *, and does not prevent access via 
from...import specific_attrib.  But I may have tested incorrectly.

   
 Seriously misleading discussion of .pth files.  [snip]
Agreed. 

Cool -- I think it's well worth fixing this area for sure!

 In addsitepackages(), the library directory for Windows (the else clause)
 is shown as lower-case 'lib' instead of 'Lib'.
I don’t see any else clause in the 2.7 or 3.3 code.  Otherwise you’re right.

Sorry, the lowercase 'lib' issue is in 
  getsitepackages()... 
  if sys.platform in(...) ... 
  else:...
sitepackages.append(os.path.join(prefix, "lib", "site-packages"))

 sys
 Could helpfully point to a discussion of the typical items to
 be found in sys.path under normal circumstances
Hm, this would be very platform-specific.  What use cases would that help?

It would demystify how python already knows how to find various things under 
vanilla circumstances.

 'Installing Python Modules' document
 Windows has no concept of a user’s home directory,  and so on.
The author probably meant that there was no $HOME environment variable, ~ 
shortcut and all that.

Fair enough, but in actuality there *is* a user-specific location (on Windows) 
examined by site.py, which is in %APPDATA%\Python\.

 For Windows suggests 'prefix' (default: C:\Python) as an installation 
 directory.
 This is indeed one of the possible 'site-package' directories, but surely it 
 is
 deprecated in favor of C:\Python\Lib\site-packages, which this section does 
 not mention.
Don’t confuse the prefix and the install dir.  The directory for Python 
modules is computed as prefix + Lib/site-packages.

Currently, under Alternate installation: Windows (the prefix scheme), it says:
python setup.py install --prefix=\Temp\Python
to install modules to the \Temp\Python directory on the current drive.
Does this really mean install modules to \Temp\Python\Lib\site-packages?
(And as a side point, surely installing under the Temp directory is a strange 
location to pick for an example?)

That was my initial feedback; I think I’ve covered all of your points. 
Looking forward!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11553
___



[issue11553] Docs for: import, packages, site.py, .pth files

2011-06-11 Thread Graham Wideman

Graham Wideman initcont...@grahamwideman.com added the comment:

Hi Nick: Thanks for your additional points. Comments inline:

 __all__ only affects import *, and may also affect documentation tools (e.g. 
 pydoc will respect __all__ when deciding what to display). It has no effect 
 on attribute retrieval from modules.

That's indeed my understanding. So the doc (6. Simple statements) which says 
that __all__ determines the list of public names is a bit of a red herring.  
Attributes are accessible (ie: public) regardless of whether on the __all__ 
list.  Instead the __all__ list establishes the list of names imported by *, 
and makes those names reference-able without a module prefix. (Plus gives hints 
about intent to doc tools.)
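
This behavior is easy to check with a throwaway module. A minimal sketch; the module name demo_mod and its attributes are made up for illustration.

```python
import sys
import types

# Build a throwaway module in memory; 'demo_mod' is a made-up name.
demo = types.ModuleType("demo_mod")
exec("__all__ = ['listed']\nlisted = 1\nunlisted = 2\n", demo.__dict__)
sys.modules["demo_mod"] = demo

# 'import *' binds only the names in __all__ ...
namespace = {}
exec("from demo_mod import *", namespace)
star_names = sorted(name for name in namespace if not name.startswith("__"))
print(star_names)

# ... but names left out of __all__ remain accessible.
from demo_mod import unlisted
print(unlisted, demo.unlisted)
```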

 pkgutil.extend_path() is used to modify pkg.__path__ attributes, *not* 
 sys.path. 

Understood, and perhaps my point was obtuse.  I was pointing out that the doc 
for extend_path discusses .pkg entries which point to package dirs, and that 
this, it says, is like .pth files. I claim that an entry in a .pth file should 
NOT point to a package dir, but rather to one level up: to a dir that 
*contains* package dirs. (Pointing a .pth entry directly at a package dir will 
break package behavior by exposing the constituent modules to sys.path.)  Hence 
the doc for extend_path is misleadingly suggesting a wrong idea about .pth 
files.
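
The intended use of a .pth entry can be sketched with site.addsitedir() on a temporary directory. All directory, file and package names below (shared_code, mypkg, shared.pth) are made up for illustration.

```python
import os
import site
import sys
import tempfile

# Temporary stand-in for a site-packages directory.
site_dir = tempfile.mkdtemp()
container = os.path.join(site_dir, "shared_code")   # holds package dirs
package = os.path.join(container, "mypkg")          # an actual package
os.makedirs(package)
with open(os.path.join(package, "__init__.py"), "w") as f:
    f.write("VALUE = 42\n")

# The .pth entry names the container directory, not the package itself.
with open(os.path.join(site_dir, "shared.pth"), "w") as f:
    f.write("shared_code\n")

site.addsitedir(site_dir)   # processes .pth files found in site_dir

import mypkg
print(mypkg.VALUE)
print(os.path.abspath(container) in [os.path.abspath(p) for p in sys.path])
```

Had the .pth line named shared_code/mypkg instead, the files inside the package would have become top-level modules on sys.path, which is the problem described above.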

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11553
___



[issue11553] Docs for: import, packages, site.py, .pth files

2011-06-11 Thread Graham Wideman

Graham Wideman initcont...@grahamwideman.com added the comment:

 Public name is a term that describes a convention, not anything enforced by 
 the interpreter. 

And I guess that's really the main point. In other languages Public means 
accessible, and Private means not so.  In Python, Public means suggested for 
outside consumption, and Private means not so intended, but nonetheless 
accessible. If that was reiterated near the discussion of __all__ it would be 
most helpful.  

  Dirs mentioned in .pkg files *should* be added to the [...] pkg.__path__, 
 not sys.path. 
 That could probably be made clearer, but the docs aren't wrong as they stand.

Again I've not managed to draw attention to the exact point of contention. 
1. A dir added to a .pkg file evidently should be an actual package dir.  
2. A dir added to a .pth file should NOT be an actual package dir. It should be 
the dir at the level above.

Thus the entries in .pkg and .pth files point to different kinds of things, yet 
the doc I pointed to asserts they are the same in this regard.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11553
___



[issue11669] Clarify Lang Ref Compound statements footnote

2011-03-25 Thread Graham Wideman

New submission from Graham Wideman initcont...@grahamwideman.com:

In Language Ref section 7 Compound Statements:
http://docs.python.org/release/3.1.3/reference/compound_stmts.html
there's a footnote regarding what happens to unhandled exceptions in a 
try-except statement:

[1] The exception is propagated to the invocation stack only if there is no 
*finally* clause that negates the exception.

This is very unclearly worded, especially since the reader in need of this 
footnote is probably familiar with the *except* clause being the one to 
negate an exception, and may well think this footnote is in error.  This 
footnote could provide a more convincing explanation: 

[1] The exception is propagated to the invocation stack unless there is a 
finally clause which happens to raise another exception. That new exception 
causes the old exception to be lost.
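
The case the footnote describes can be reproduced in a few lines. A minimal sketch with made-up names; note that under Python 3 the superseded exception is not entirely lost, since it stays attached to the new one as __context__.

```python
def replace_exception():
    try:
        raise ValueError("original")
    finally:
        # Raising here supersedes the ValueError that is in flight.
        raise KeyError("raised in finally")


try:
    replace_exception()
except KeyError as exc:
    caught = exc

print(type(caught).__name__)
# In Python 3 the earlier exception is still reachable as __context__.
print(repr(caught.__context__))
```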

--
assignee: docs@python
components: Documentation
messages: 132072
nosy: docs@python, gwideman
priority: normal
severity: normal
status: open
title: Clarify Lang Ref Compound statements footnote
type: behavior
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 
3.3, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11669
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11553] Docs for: import, packages, site.py, .pth files

2011-03-15 Thread Graham Wideman

New submission from Graham Wideman initcont...@grahamwideman.com:

The overall scope of this issue is that current Python documentation gives 
vague, sometimes incorrect, information about the set of Python features 
involved in modularizing functionality.  This issue presents an obstacle to 
programmers making smooth transitions from a single module, to collections of 
modules and packages, then on to neatly organized common packages shared 
between projects.

The problem affects documentation of:

import and from...import statements

The Language Reference is way too complicated for the mainstream case. Exactly 
what variants of arguments are possible, and what are their effects? What are 
the interactions with package features, such as whether or not modules have 
been explicitly imported into package __init__.py?

sys.path
-
Typical constituents; range of alternatives for adding more dirs
   
Module site.py
--
Multiple serious errors in the file docstring, relating to site-packages 
directories and .pth files

.pth files
---
Incorrectly described in site.py, and then vaguely described in other docs.
Are .pth files processed everywhere on sys.path? Can they be iterative? (No to 
both).

package structure
-
Details of package structure have evidently changed over Python versions. 
Current docs are unclear on points such as:
-- is __init__.py needed on subpackage directories?
-- the __all__ variable: Does it act generally to limit visibility of a module 
or package's attributes, or does pertain only to the 'from...import *' 
statement?

Details:
=

Language Reference
---
 http://docs.python.org/py3k/reference/simple_stmts.html#the-import-statement
The description of the import statement is extensive, but dauntingly 
complicated for the reader trying to understand the mainstream case of simply 
importing modules or packages that are on sys.path.  This is because the 
algorithm for finding modules tries numerous esoteric strategies before falling 
back on the plain-old-file-system method. 

(Even now that I have a good understanding of the plain-old-file variations of 
import, I reread this and find it hard to comprehend, and disorganized and 
incomplete in presenting the available variations of the statement.)

Grammar issue: the grammar shown for the import statement shows:
relative_module ::=  "."* module | "."+

... which implies that relative module could have zero leading dots. I believe 
an actual relative path is required to have at least one dot (PEP 328).  
Evidently, in this grammar, 'relative_module' really means relative or 
absolute path to module or package, so it would be quite helpful to change to:

relative_path ::=  "."+ module | "."+
from_path ::= (relative_path | module)

etc.  (Really 'module' is not quite right here either since it's used to mean 
module-or-package.)

 
site.py:

Module site.py implements the site-package related features. The docstring has 
multiple problems with consequences in other docs.

1. Does not mention user-specific site-package directories (implemented by 
addusersitepackages() )

2. Seriously misleading discussion of .pth files.  In the docstring the example 
shows using pth files, called package configuration files in their comments, 
to point to actual package directories bar and foo located within the 
site-packages directory.  This is an absolutely incorrect use of pth files:  If 
foo and bar are packages in .../site-packages/, they do not need to be pointed 
to, they are already on sys.path.  

If the package dirs ARE pointed to by foo.pth and bar.pth, the modules inside 
them will be exposed directly to sys.path, possibly precipitating name 
collisions.  Further, programmers following this example will create packages 
in which import statements will appear to magically perform relative imports 
without leading dots, leading to confusion over how the import statement is 
supposed to work.

It may be that this discussion is held over from a time when package perhaps 
meant Just a Bunch of Files in a Directory?

3. The docstring (or other docs) should make clear that .pth files are ONLY 
processed within site-package directories (ie: only by site.py).

4. Bug: Minor: In addsitepackages(), the library directory for Windows (the 
else clause) is shown as lower-case 'lib' instead of 'Lib'. This has some 
possibility of causing problems when running from a case-sensitive server.  In 
any case, if read as documentation it is misleading.
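
The directories that site.py actually computes, including the user-specific one, can be listed from a running interpreter. A minimal sketch using functions from the site module; the output depends entirely on the installation.

```python
import site
import sys

# Site-wide site-packages directories computed by site.py.
for path in site.getsitepackages():
    print("site-packages:", path, "(on sys.path)" if path in sys.path else "")

# Per-user site-packages directory (may not exist on disk).
user_site = site.getusersitepackages()
print("user site:", user_site)
print("user site enabled:", site.ENABLE_USER_SITE)
```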

Tutorial
-
6. Modules:  http://docs.python.org/py3k/tutorial/modules.html  

1. Discussion (6.1.2. The Module Search Path) is good as far as it goes, but it 
doesn't mention the site-package directories.

2. Section 6.4. Packages:  Discussion of __init__.py does describe the purpose 
of these files

[issue11479] Add discussion of trailing slash in raw string to tutorial

2011-03-13 Thread Graham Wideman

Graham Wideman initcont...@grahamwideman.com added the comment:

Eli:  Excellent and thoughtful point. This would indeed be exactly the place to 
suggest os.path.join as an alternative.

In addition, there are still occasions where one needs to form a string with 
trailing backslash. Two examples:
1. When writing the string specifying root directory: r'C:\ '[:-1]
2. Using python to prepare command lines to run other command line programs, 
where an argument may require a final backslash to explicitly specify a target 
directory (as opposed to a file).
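
Both approaches can be sketched as follows. ntpath is used here so the example gives Windows-style results on any platform; on Windows itself os.path is the same module. The directory name is made up.

```python
import ntpath   # Windows flavour of os.path, importable on any platform

# An empty final component makes join() append a trailing separator.
target_dir = ntpath.join(r"C:\somedir", "")
print(target_dir)

# An ordinary (non-raw) literal can end in a backslash if it is doubled.
root = "C:\\"
print(root == r"C:\ "[:-1])
```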

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11479
___



[issue1271] Raw string parsing fails with backslash as last character

2011-03-12 Thread Graham Wideman

Graham Wideman initcont...@grahamwideman.com added the comment:

Thanks to all for your patient comments. I think I am resigned to raw-string 
forever being medium-rare-string :-).

Perhaps it's obvious once you get over the initial shock of non-rawness, but 
workarounds for the disallowed trailing backslash  include (note the final 
space character):

mydir = r"C:\somedir\ ".rstrip()   or...

mydir = r"C:\somedir\ "[:-1]

It might be worth mentioning one of these in the raw string docs to emphasize 
that there is this gotcha, that it's easy to fix, and prompting this as an 
idiom that becomes familiar in applications where it's needed.
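
For reference, the gotcha and both workarounds can be checked like this. A minimal sketch; the path is a made-up example, and note that .rstrip() removes all trailing whitespace, not just the one added space.

```python
# A raw literal ending in a single backslash does not compile.
source = 'mydir = r"C:\\somedir\\"'      # i.e. the text: mydir = r"C:\somedir\"
try:
    compile(source, "<demo>", "exec")
    compiled = True
except SyntaxError:
    compiled = False
print(compiled)

# Both workarounds yield the same string, ending in one backslash.
sliced = r"C:\somedir\ "[:-1]
stripped = r"C:\somedir\ ".rstrip()
print(sliced == stripped == "C:\\somedir\\")
```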

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1271
___



[issue1271] Raw string parsing fails with backslash as last character

2011-03-11 Thread Graham Wideman

Graham Wideman initcont...@grahamwideman.com added the comment:

@Glenn Linderman:  I too am usually quick to assume that innocent fixes may 
have serious unforeseen impacts, but in this case I'm not convinced.  What 
would matter is to enumerate the current behavior, and of that what would be 
changed.  You seem to have had experience with other raw-string 
features/gotchas -- please share! :-)

@David Murray: Excuse denseness on my part, but I'm not following the logic of 
your first paragraph.  I think you are saying that current raw string has to do 
something special to be able to contain the sequence backslash-quote, and this 
has the side effect of precluding that sequence appearing last in a string.  

But surely a completely-escape-free string could also contain backslash-quote 
just fine (assuming the string is surrounded by the other kind of quote).  So 
I'm thinking that the case you mention is not the driver here.  

It's conceivable there is some more complicated case where 
backslash-singlequote AND backslash-doublequote MUST appear literally in the 
same string.  However, it seems a little bizarre to worry about that case, but 
not worry about the simpler case of wanting both a plain singlequote and a 
plain doublequote in the same string.  Maybe there's some popular regular 
expression that calls for this complexity.

I concur that inspection of the parser (and the history and intent of this 
design) would be fascinating.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1271
___



[issue1271] Raw string parsing fails with backslash as last character

2011-03-09 Thread Graham Wideman

Graham Wideman initcont...@grahamwideman.com added the comment:

(Not clear how to reopen this issue. Hopefully my change here does that.)

OK, so as it currently stands, backslash at end of string is prohibited in the 
interests of allowing backslash to escape quotes that might be embedded within 
the string. 

But the embedded quote scenario doesn't work because the backslash remains in 
the string.  So the current state of play is plain broken.  
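
The point about the backslash remaining is easy to verify by comparing a raw literal with an ordinary one. A minimal sketch:

```python
raw = r'a\'b'       # raw literal: backslash before the quote
cooked = 'a\'b'     # ordinary literal: backslash is an escape

# The raw string keeps both the backslash and the quote.
print(len(raw), len(cooked))
print(list(raw))
print(list(cooked))
```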

Considering:
(a) We already have the ability to use either single or double quotes around 
the string, which gives the chance to use the other quote within the string. 
(b) The principle of least surprise for raw strings would be to have raw mean 
"Never Escape Anything".
(c) A backslash at the end of a string is a trap waiting to happen for Windows 
users.
...I think there is strong motivation to abandon the currently broken 
"backslash escapes quote" behavior and just let raw strings be totally raw.  
Furthermore, it's hard to imagine that such a move would break anything.

--
nosy: +gwideman
type:  -> behavior
versions: +Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3 -Python 
2.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1271
___



[issue11451] Raw string parsing fails with backslash as last character

2011-03-09 Thread Graham Wideman

New submission from Graham Wideman initcont...@grahamwideman.com:

This is a copy of issue 1271 because I couldn't find a way to reopen it.

So, repeating my comment here:

As it currently stands, backslash at end of string is prohibited, apparently in 
the interests of supposedly allowing backslash to escape quotes that might be 
embedded within the string. 

But the supposedly beneficial backslash-escaping-embedded quote behavior is 
broken because the backslash remains in the string.

Consider:
(a) We already have the ability to use either single or double quotes around 
the string, which gives the chance to use the other quote within the string. 
(b) The principle of least surprise for raw strings would be to have raw mean 
"Never Escape Anything".
(c) A backslash at the end of a string is currently a trap waiting to happen 
for Windows paths.
So I think there is strong motivation to abandon the currently broken 
"backslash escapes quote" behavior and just let raw strings be totally raw.  
Furthermore, it's hard to imagine that such a move would break anything.  
(Famous last words, I know... but I challenge anyone to contrive such a 
scenario!)

--
components: Interpreter Core
messages: 130443
nosy: QuantumTim, facundobatista, georg.brandl, gwideman
priority: normal
severity: normal
status: open
title: Raw string parsing fails with backslash as last character
type: behavior
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11451
___



[issue4033] python search path - .pth recursion

2011-03-08 Thread Graham Wideman

Changes by Graham Wideman initcont...@grahamwideman.com:


--
nosy: +gwideman

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4033
___



[issue11426] CSV examples can't close their files

2011-03-06 Thread Graham Wideman

New submission from Graham Wideman initcont...@grahamwideman.com:

On the csv doc page  (.../library/csv.html) most of the examples show creation 
of an anonymous file object within the csv.reader or csv.writer function, for 
example...

spamWriter = csv.writer(open('eggs.csv', 'w'), delimiter=' ',

This anonymity prevents later closing the file, which seems especially 
problematic for a writer.  It also confuses users as to whether there's some 
sort of close function on a csv.reader or csv.writer object which should be 
called, or perhaps some other magic behind the scenes.

I'm pretty sure that it's the doc that is incorrect here.  

This issue was raised parenthetically here 
http://bugs.python.org/issue7198#msg124678 by sjmachin, though mysteriously 
overlooked in his later suggested patch 
http://bugs.python.org/issue7198#msg126593

I suggest changing all examples to include the complete cycle of opening an 
explicit file, and later closing it.
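
One shape such an example could take, sketched for Python 3 with a temporary file so it can be run anywhere. The 'with' statement is used here as one way to guarantee the close; an explicit close() call would serve the same purpose.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "eggs.csv")

# The 'with' block closes the file even if writing fails.
with open(path, "w", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter=" ")
    writer.writerow(["Spam", "Lovely Spam", "Wonderful Spam"])

with open(path, newline="") as csvfile:
    rows = list(csv.reader(csvfile, delimiter=" "))

print(rows)
print(csvfile.closed)
```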

--
assignee: docs@python
components: Documentation
messages: 130228
nosy: docs@python, gwideman
priority: normal
severity: normal
status: open
title: CSV examples can't close their files
type: behavior
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11426
___