[issue44587] argparse BooleanOptionalAction displays default=SUPPRESS unlike other action types

2021-08-17 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

PR Opened.

A fix for this should be backported as well.  However, if you decide you don't 
want the refactor backported, you could instead just change the condition 
inside of BooleanOptionalAction so that it repeats all of the checks 
contained in the older versions' ArgumentDefaultsHelpFormatter.

--

___
Python tracker 
<https://bugs.python.org/issue44587>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44587] argparse BooleanOptionalAction displays default=SUPPRESS unlike other action types

2021-08-17 Thread Toshio Kuratomi


Change by Toshio Kuratomi :


--
pull_requests: +26275
pull_request: https://github.com/python/cpython/pull/27808

___
Python tracker 
<https://bugs.python.org/issue44587>
___



[issue44587] BooleanOptionalAction displays default=SUPPRESS unlike other action types

2021-07-08 Thread Toshio Kuratomi


New submission from Toshio Kuratomi :

This is related to https://bugs.python.org/issue38956 but has a different 
symptom.  The currently proposed fix for 38956 will not fix this, but my 
proposed fixes for this would also fix 38956.

I have the following code which uses BooleanOptionalAction along with a default 
of SUPPRESS (I use SUPPRESS because I merge these with settings from config and 
environment variables and then validate and add default values using a schema.  
SUPPRESS allows me to tell when a value was not specified at the command line 
by the user):

whole_site_parser.add_argument('--indexes',
                               dest='indexes',
                               action=BooleanOptionalAction,
                               default=argparse.SUPPRESS,
                               help='Test')


That code outputs:

  --indexes, --no-indexes
                        Test (default: ==SUPPRESS==)

Similar code that does not use BooleanOptionalAction does not show default: 
==SUPPRESS== even when formatter_class=ArgumentDefaultsHelpFormatter is used.

Looking at the code, this is occurring because BooleanOptionalAction has its 
own code to add default: (instead of leaving formatting as the 
responsibility of the formatter_class). The code in BooleanOptionalAction 
handles fewer of the cases where the default should be skipped than the 
ArgumentDefaultsHelpFormatter code does; SUPPRESS is one of the missing cases.

I can see two ways of fixing this:

(1) Remove the code from BooleanOptionalAction that adds the default values.  
It seems to violate the architecture of argparse, which delegates modifications 
to the help message to the formatter_class, so this is a special case which 
could cause issues for future modifications as well.

(2) If the usefulness of outputting the default values without changing the 
formatter_class is deemed too important to relinquish, then the code that 
ArgumentDefaultsHelpFormatter uses to determine when to skip adding the 
default to the help message can be extracted from ArgumentDefaultsHelpFormatter 
and called by both ArgumentDefaultsHelpFormatter and BooleanOptionalAction.
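Option (2) could look roughly like the sketch below. The helper name `should_append_default` is hypothetical, and its checks are only loosely modelled on the conditions ArgumentDefaultsHelpFormatter applies; it is not the code from the actual PR.

```python
import argparse

def should_append_default(action):
    """Hypothetical shared check: should ' (default: ...)' be added to the help?"""
    help_text = action.help or ""
    # The help already mentions the default explicitly.
    if "%(default)" in help_text:
        return False
    # SUPPRESS is a sentinel, not a value worth showing to the user.
    if action.default is argparse.SUPPRESS:
        return False
    # Options, and positionals that may be omitted, are the ones with a usable default.
    defaulting_nargs = (argparse.OPTIONAL, argparse.ZERO_OR_MORE)
    return bool(action.option_strings) or action.nargs in defaulting_nargs

# add_argument() returns the Action object, which makes the helper easy to try out.
parser = argparse.ArgumentParser()
suppressed = parser.add_argument("--indexes", default=argparse.SUPPRESS, help="Test")
plain = parser.add_argument("--depth", default=3, help="Depth")
explicit = parser.add_argument("--name", default="x", help="Name (default: %(default)s)")

print(should_append_default(suppressed))
print(should_append_default(plain))
print(should_append_default(explicit))
```

Both the formatter and the action would call the same function, so a case such as SUPPRESS only has to be handled in one place.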

In a review comment on a proposed fix 
(https://github.com/python/cpython/pull/17447/files#r429630944), Raymond 
Hettinger thought that outputting the default values was important to keep, 
although I'm not clear on whether he considered that the usefulness comes at 
the price of possibly violating argparse's architecture.  If he hasn't changed 
his mind, then #2 is probably the way to resolve this.

I can submit a PR for either of these once I know which direction to take (the 
first is just removing a few lines of code and I've already written the second 
one since it seemed like the direction that Raymond had last asked for).

Please let me know how you'd like me to proceed.

--
messages: 397184
nosy: a.badger
priority: normal
severity: normal
status: open
title: BooleanOptionalAction displays default=SUPPRESS unlike other action types
versions: Python 3.10, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue44587>
___



[issue44383] argparse.BooleanOptionalAction interacts poorly with ArgumentDefaultsHelpFormatter

2021-07-08 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

I believe this is a duplicate of https://bugs.python.org/issue38956 and could 
be closed in favor of that issue.

38956 also has a Pull Request to fix the issue which is awaiting a re-review.

--
nosy: +a.badger

___
Python tracker 
<https://bugs.python.org/issue44383>
___



[issue40917] pickle exceptions with mandatory keyword args will traceback

2020-06-11 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Thanks!  I confirm that your PR  https://github.com/python/cpython/pull/11580 
for https://bugs.python.org/issue27015 fixes this problem.

Closing this one.

--
resolution:  -> duplicate
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue40917>
___



[issue40917] pickle exceptions with mandatory keyword args will traceback

2020-06-08 Thread Toshio Kuratomi


Change by Toshio Kuratomi :


--
title: pickling exceptions with mandatory keyword args will traceback -> pickle 
exceptions with mandatory keyword args will traceback

___
Python tracker 
<https://bugs.python.org/issue40917>
___



[issue40917] pickling exceptions with mandatory keyword args will traceback

2020-06-08 Thread Toshio Kuratomi


New submission from Toshio Kuratomi :

I was trying to use multiprocessing (via a 
concurrent.futures.ProcessPoolExecutor) and encountered an error when pickling 
a custom Exception.  On closer examination I was able to create a simple test 
case that only involves pickle:


import pickle
class StrRegexError(Exception):
    def __init__(self, *, pattern):
        self.pattern = pattern

data = pickle.dumps(StrRegexError(pattern='test'))
instance = pickle.loads(data)


[pts/11@peru /srv/ansible]$ python3.8 ~/p.py
Traceback (most recent call last):
  File "/home/badger/p.py", line 7, in <module>
    instance = pickle.loads(data)
TypeError: __init__() missing 1 required keyword-only argument: 'pattern'

pickle can handle mandatory keyword args in other classes derived from object; 
it's only classes derived from Exception that have issues.
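As background, exceptions are rebuilt on unpickling by calling the class with the positional arguments stored in `args`, and keyword-only arguments never end up there. One possible workaround, sketched here as an assumption rather than as the fix that was eventually merged, is to give the exception its own `__reduce__`. The helper function `_rebuild_str_regex_error` is a made-up name.

```python
import pickle

def _rebuild_str_regex_error(pattern):
    # Module-level helper so that pickle can refer to it by name.
    return StrRegexError(pattern=pattern)

class StrRegexError(Exception):
    def __init__(self, *, pattern):
        super().__init__(pattern)
        self.pattern = pattern

    def __reduce__(self):
        # Tell pickle to call the helper with the pattern instead of
        # calling StrRegexError(*self.args), which would omit the keyword.
        return (_rebuild_str_regex_error, (self.pattern,))

data = pickle.dumps(StrRegexError(pattern='test'))
restored = pickle.loads(data)
print(type(restored).__name__, restored.pattern)
```

The same approach should carry over to ProcessPoolExecutor, since it relies on pickle to move exceptions between processes.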

--
components: Library (Lib)
messages: 371057
nosy: a.badger
priority: normal
severity: normal
status: open
title: pickling exceptions with mandatory keyword args will traceback
versions: Python 3.6, Python 3.7, Python 3.8

___
Python tracker 
<https://bugs.python.org/issue40917>
___



[issue36998] distutils sdist command fails to create MANIFEST if any filenames are undecodable

2019-05-21 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Figured out the answer to my last question while looking into fixing it.  The 
devguide documents both running tests via regrtest and running them via 
unittest test discovery.  regrtest works:

  ./python -m test -v distutils.test

But unittest doesn't:
  ./python -m unittest -v test.test_distutils
  ./python -m unittest -v distutils.test.test_file_utils
  # etc

I'll submit a separate PR to get that working.

--

___
Python tracker 
<https://bugs.python.org/issue36998>
___



[issue36998] distutils sdist command fails to create MANIFEST if any filenames are undecodable

2019-05-21 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Are the distutils unittests disabled?  And if so is there a reason?  I was 
looking to add test cases to my PR and found that I couldn't get them (or 
indeed any distutils unittests) to run when trying to only target the distutils 
unittests.

From looking at the code my guess is that the Python test suite was ported to 
use the load_tests protocol sometime after Python-3.2 but distutils was missed. 
I can submit a PR to change that unless there's a reason it is the way it is.
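For reference, the load_tests protocol mentioned above lets a module or package hand a ready-made suite to unittest's loader. The following is a generic sketch with a made-up test case, not the actual distutils test code:

```python
import unittest

class ExampleTest(unittest.TestCase):
    # Placeholder test case, only here so the suite has something to load.
    def test_truth(self):
        self.assertTrue(True)

def load_tests(loader, standard_tests, pattern):
    # unittest calls this hook, when it is defined, while loading the module,
    # and uses the returned suite in place of the default one.
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(ExampleTest))
    return suite

suite = load_tests(unittest.defaultTestLoader, None, None)
print(suite.countTestCases())
```

A test package that lacks this hook and also does not follow the default discovery layout can be runnable via regrtest yet invisible to `python -m unittest`.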

--
nosy: +vstinner

___
Python tracker 
<https://bugs.python.org/issue36998>
___



[issue36998] distutils sdist command fails to create MANIFEST if any filenames are undecodable

2019-05-21 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Okay, pushed a fix for regenerating the MANIFEST as well.

--

___
Python tracker 
<https://bugs.python.org/issue36998>
___



[issue36998] distutils sdist command fails to create MANIFEST if any filenames are undecodable

2019-05-21 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Uploading a minimal test case.

$ tar -xzvf test-case.tar.gz
$ python3.7 setup.py sdist
running sdist
running check
warning: sdist: standard file not found: should have one of README, README.txt, README.rst

reading manifest template 'MANIFEST.in'
writing manifest file 'MANIFEST'
Traceback (most recent call last):
  File "setup.py", line 27, in <module>
    packages=['hello'],
  File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/usr/lib64/python3.7/distutils/command/sdist.py", line 152, in run
    self.get_file_list()
  File "/usr/lib64/python3.7/distutils/command/sdist.py", line 208, in get_file_list
    self.write_manifest()
  File "/usr/lib64/python3.7/distutils/command/sdist.py", line 390, in write_manifest
    "writing manifest file '%s'" % self.manifest)
  File "/usr/lib64/python3.7/distutils/cmd.py", line 335, in execute
    util.execute(func, args, msg, dry_run=self.dry_run)
  File "/usr/lib64/python3.7/distutils/util.py", line 291, in execute
    func(*args)
  File "/usr/lib64/python3.7/distutils/file_util.py", line 236, in write_file
    f.write(line + "\n")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 11: surrogates not allowed

With PR applied:

$ ../../python setup.py sdist
running sdist
running check
warning: sdist: standard file not found: should have one of README, README.txt, README.rst

reading manifest template 'MANIFEST.in'
writing manifest file 'MANIFEST'
creating hello-1.0
creating hello-1.0/hello
creating hello-1.0/tests
creating hello-1.0/tests/data
making hard links in hello-1.0...
hard linking setup.py -> hello-1.0
hard linking hello/__init__.py -> hello-1.0/hello
hard linking tests/test_cases.py -> hello-1.0/tests
hard linking tests/data/1.bin -> hello-1.0/tests/data
hard linking tests/data/\udcff.bin -> hello-1.0/tests/data
Creating tar archive
removing 'hello-1.0' (and everything under it)

Making this minimal test case, though, I found that there's another error 
somewhere when MANIFEST has already been created (ie: the patched version works 
for the initial generation of MANIFEST but it doesn't work to *regenerate* the 
MANIFEST).  Looking into that now.

--
Added file: https://bugs.python.org/file48349/test-case.tar.gz

___
Python tracker 
<https://bugs.python.org/issue36998>
___



[issue36998] distutils sdist command fails to create MANIFEST if any filenames are undecodable

2019-05-21 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

From my initial description:

"An sdist may contain files whose names are undecodable in the current locale.  
For instance, the sdist might include some files for testing whose filenames 
are undecodable because that's the format of the input for that application."

--

___
Python tracker 
<https://bugs.python.org/issue36998>
___



[issue36998] distutils sdist command fails to create MANIFEST if any filenames are undecodable

2019-05-21 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

I like the idea of defaulting to UTF-8 (although I think you'll have to build 
consensus as to whether that's the right thing to do here) but it won't handle 
the use case here.  There's a need to handle files which are undecodable and 
encoding to utf-8 won't fix that.

--

___
Python tracker 
<https://bugs.python.org/issue36998>
___



[issue36998] distutils sdist command fails to create MANIFEST if any filenames are undecodable

2019-05-21 Thread Toshio Kuratomi


Change by Toshio Kuratomi :


--
keywords: +patch
pull_requests: +13378
stage:  -> patch review

___
Python tracker 
<https://bugs.python.org/issue36998>
___



[issue36998] distutils sdist command fails to create MANIFEST if any filenames are undecodable

2019-05-21 Thread Toshio Kuratomi


New submission from Toshio Kuratomi :

An sdist may contain files whose names are undecodable in the current locale.  
For instance, the sdist might include some files for testing whose filenames 
are undecodable because that's the format of the input for that application.

Currently, trying to create the sdist fails with output similar to this:

Traceback (most recent call last):
  File "setup.py", line 330, in <module>
    main()
  File "setup.py", line 325, in main
    setup(**setup_params)
  File "/home/badger/.local/lib/python3.5/site-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.5/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.5/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "setup.py", line 137, in run
    SDist.run(self)
  File "/usr/lib/python3.5/distutils/command/sdist.py", line 158, in run
    self.get_file_list()
  File "/usr/lib/python3.5/distutils/command/sdist.py", line 214, in get_file_list
    self.write_manifest()
  File "/usr/lib/python3.5/distutils/command/sdist.py", line 362, in write_manifest
    "writing manifest file '%s'" % self.manifest)
  File "/usr/lib/python3.5/distutils/cmd.py", line 336, in execute
    util.execute(func, args, msg, dry_run=self.dry_run)
  File "/usr/lib/python3.5/distutils/util.py", line 301, in execute
    func(*args)
  File "/usr/lib/python3.5/distutils/file_util.py", line 236, in write_file
    f.write(line + "\n")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 45-46: ordinal not in range(128)

(I replicated the failure case by setting my locale to POSIX and using a 
standard utf-8 filename but this also applies to having a filename that is not 
actually text in any locale... as I said, filenames used for testing can run 
the gamut of odd choices).

This traceback is interesting as it occurs during writing of the MANIFEST.  
That shows that the undecodable filename is read in correctly.  It's only when 
writing the file that it fails.  Some further debugging showed me that the 
filename is read in using the surrogateescape error handler.  So we can round 
trip the filename by using the surrogateescape error handler when writing it 
out.

I tested making the following change:

-f = open(filename, "w")
+f = open(filename, "w", errors="surrogateescape")

and sure enough, the sdist is now created correctly.
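The round trip can be shown in isolation. This is a small standalone sketch, not the distutils code itself; the filename and the temporary MANIFEST path are made up, and UTF-8 is passed explicitly so the result does not depend on the locale.

```python
import os
import tempfile

# A filename that is not valid UTF-8, decoded the way Python decodes
# undecodable filenames: the bad byte becomes the lone surrogate '\udcff'.
raw = b"tests/data/\xff.bin"
name = raw.decode("utf-8", errors="surrogateescape")

with tempfile.TemporaryDirectory() as tmp:
    manifest = os.path.join(tmp, "MANIFEST")

    # The default (strict) error handler refuses to encode the surrogate.
    try:
        with open(manifest, "w", encoding="utf-8", newline="\n") as f:
            f.write(name + "\n")
        strict_failed = False
    except UnicodeEncodeError:
        strict_failed = True

    # surrogateescape turns the surrogate back into the original byte.
    with open(manifest, "w", encoding="utf-8", newline="\n",
              errors="surrogateescape") as f:
        f.write(name + "\n")

    with open(manifest, "rb") as f:
        written = f.read()

print(strict_failed, written)
```

The bytes written to disk match the original filename bytes, which is what allows the MANIFEST to be read back later with the same error handler.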

I'll submit a PR to fix this.

--
components: Distutils
messages: 343074
nosy: a.badger, dstufft, eric.araujo
priority: normal
severity: normal
status: open
title: distutils sdist command fails to create MANIFEST if any filenames are 
undecodable
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue36998>
___



[issue35924] curses segfault resizing window

2019-05-12 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Hi Josiah, I've tested my sample program and it looks like the segmentation 
fault is fixed with ncurses-6.1-20190511: 
http://lists.gnu.org/archive/html/bug-ncurses/2019-05/msg00013.html

Are you able to give that a try and see whether it resolves the issue for you 
as well?

For the core devs: assuming this is fixed in a newer ncurses, how would you 
like to proceed with this bug?  I have a documentation PR to tell people about 
the bug in ncurses and the workaround: 
https://github.com/python/cpython/pull/13209  I can update that to mention the 
version of ncurses that this is fixed in if you want that.  Other than that, 
I'm not sure what more we can do.

--

___
Python tracker 
<https://bugs.python.org/issue35924>
___



[issue14353] Proper gettext support in locale module

2019-05-12 Thread Toshio Kuratomi


Change by Toshio Kuratomi :


--
pull_requests: +13173
stage:  -> patch review

___
Python tracker 
<https://bugs.python.org/issue14353>
___



[issue20044] gettext.install() ignores previous call to locale.setlocale()

2019-05-12 Thread Toshio Kuratomi

Toshio Kuratomi  added the comment:

I tested a small C program and found that setlocale takes precedence for 
LC_ALL, LC_MESSAGES, and LANG but not for LANGUAGE.

#include <libintl.h>
#include <locale.h>
#include <stdio.h>

int main(int argc, char **argv) {
    char *message1;

    //setlocale (LC_ALL, "");
    setlocale (LC_ALL, "pt_BR.utf-8");
    bindtextdomain ("testc", "/srv/python/cpython/tmp");
    textdomain ("testc");

    message1 = gettext("lemon");
    printf("%s\n", message1);
    return 0;
}

$ LC_ALL=es_MX.utf-8 LANGUAGE= LC_MESSAGES=es_MX.utf-8 LANG=es_MX.utf-8 ./test
limão

$ LANGUAGE=es_MX  LANG=es_MX.utf-8 ./test
limón


So this could be considered a bug in the stdlib's gettext.  If we fix it, we'll 
need to make sure that we continue to honor LANGUAGE, though.
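For comparison, the stdlib's gettext.find() picks languages purely from environment variables, checking LANGUAGE, LC_ALL, LC_MESSAGES, and LANG in that order, and does not consult setlocale(). The sketch below illustrates that lookup order only. The helper `find_catalog` is hypothetical, and the catalogs are empty placeholder files, which is enough because find() only checks that the file exists.

```python
import gettext
import os
import tempfile

ENV_VARS = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")

def find_catalog(localedir, **env):
    """Hypothetical helper: run gettext.find() with only the given variables set."""
    saved = {name: os.environ.get(name) for name in ENV_VARS}
    try:
        for name in ENV_VARS:
            os.environ.pop(name, None)
        os.environ.update(env)
        return gettext.find("testc", localedir)
    finally:
        # Put the caller's environment back the way it was.
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value

with tempfile.TemporaryDirectory() as localedir:
    for lang in ("es_MX", "pt_BR"):
        catalog_dir = os.path.join(localedir, lang, "LC_MESSAGES")
        os.makedirs(catalog_dir)
        open(os.path.join(catalog_dir, "testc.mo"), "wb").close()

    # LANGUAGE wins over LANG when it is set to a non-empty value.
    from_language = find_catalog(localedir, LANGUAGE="pt_BR", LANG="es_MX.utf-8")
    # An empty LANGUAGE is skipped, so LC_ALL decides.
    from_lc_all = find_catalog(localedir, LANGUAGE="", LC_ALL="es_MX.utf-8")

print(from_language)
print(from_lc_all)
```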

--
nosy: +a.badger

___
Python tracker 
<https://bugs.python.org/issue20044>
___



[issue36837] Make il8n tools available from `python -m`

2019-05-11 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Scratch what I said in 
https://bugs.python.org/issue36837#msg342005

GNU msgfmt does extract the charset correctly.  (My previous test failed to 
write any output so it was using the .mo file I had written out with msgfmt.py. 
I realized that this morning when I figured out why my C test program wasn't 
finding any message catalog.)

For reference the three ways to extract strings with the three tools are:
* pygettext.py test.py
* pybabel extract -o messages.pot test.py
* xgettext test.py -o messages.pot test.py

and the three ways to generate catalogs via the three tools are:
* msgfmt3.7.py  es_MX/LC_MESSAGES/domain.po
* msgfmt es_MX/LC_MESSAGES/testc.po -o es_MX/LC_MESSAGES/testc.mo
* pybabel compile -D test -d . [--use-fuzzy]

--

___
Python tracker 
<https://bugs.python.org/issue36837>
___



[issue24263] unittest cannot load module whose name starts with Unicode

2019-05-10 Thread Toshio Kuratomi

Toshio Kuratomi  added the comment:

From the description, I think the bug is that filenames that *begin* with 
non-ascii are not searched for tests.  Looking at the test_dir.tar.gz 
contents, this is the test case that I'd use:

Broken:

$ python3 -m unittest discover -vv -p '*.py'
test_走 (tests試驗.Test試驗.試驗) ... ok
test_走 (tests試驗.test試驗.試驗) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK

Corrected:
$ /srv/python/cpython/python -m unittest discover -vv -p '*.py'
test_走 (tests試驗.Test試驗.試驗) ... ok
test_走 (tests試驗.test試驗.試驗) ... ok
test_走 (tests試驗.試驗.試驗) ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.000s

OK


isidentifier() is used because filenames to be discovered must be importable 
and thus valid identifiers:  
https://docs.python.org/3/library/unittest.html#test-discovery
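The difference between the two checks can be sketched as follows. The regular expression is an assumption standing in for a filename check that requires an ASCII first character, and `importable` is a hypothetical helper, not unittest's own function.

```python
import os
import re

# Assumed stand-in for a check that only accepts an ASCII letter or underscore first.
ASCII_START = re.compile(r'[_a-z]\w*\.py$', re.IGNORECASE)

def importable(filename):
    """Hypothetical check: a .py file whose stem is a valid Python identifier."""
    stem, ext = os.path.splitext(filename)
    return ext == ".py" and stem.isidentifier()

for filename in ("試驗.py", "test試驗.py", "1abc.py"):
    print(filename, bool(ASCII_START.match(filename)), importable(filename))
```

A file such as 試驗.py is rejected by the ASCII-first pattern but accepted by the identifier check, while a name that cannot be imported, such as 1abc.py, is rejected by both.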

--

___
Python tracker 
<https://bugs.python.org/issue24263>
___



[issue36837] Make il8n tools available from `python -m`

2019-05-09 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

A note about the msgfmt problem.  It looks like GNU gettext's msgfmt has a 
similar problem but the msgfmt from pybabel does not.  This may mean that we 
need to change the gettext *Translation objects to be more tolerant of 
non-ascii encodings (perhaps defaulting to utf-8 instead of ascii).

--

___
Python tracker 
<https://bugs.python.org/issue36837>
___



[issue36837] Make il8n tools available from `python -m`

2019-05-09 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Note, I've been doing some tests of how our gettext module differs from GNU 
gettext and have run into a few bugs and missing features which make msgfmt 
unusable and limit pygettext's usefulness.

* msgfmt doesn't seem to store the charset from the .po file into the .mo file. 
 I think this might have been okay for the lgettext() and gettext() methods 
under Python2 as those probably passed the byte strings from the .mo files 
through verbatim.  Under Python3, however, we have to decode the byte strings 
to text and we can't do that without knowing the charset.  This leads to a 
UnicodeDecodeError on any .mo file which contains non-ascii characters (which 
is going to be the majority of them)

* So far, I have found that pygettext doesn't understand how to extract strings 
from ngettext().  This means that your code can't use plural forms if you want 
to use pygettext to extract the strings.

These deficiencies are probably things that need to be fixed if we're going to 
continue to promote these tools in the documentation.
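For context on the second point, this is the kind of ngettext() call whose strings an extractor has to recognize. It is a minimal sketch that uses NullTranslations so that it runs without any catalog; the function `describe` and the message text are made up.

```python
import gettext

# NullTranslations has no catalog and falls back to the English plural rule.
translations = gettext.NullTranslations()

def describe(count):
    # Both msgids must be extracted for plural forms to be translatable.
    return translations.ngettext("%d lemon", "%d lemons", count) % count

print(describe(1))
print(describe(3))
```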

--
nosy: +a.badger

___
Python tracker 
<https://bugs.python.org/issue36837>
___



[issue36310] pygettext3.7 Does Not Recognize gettext Calls Within fstrings

2019-05-09 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Eric, I'm CC'ing you on this issue because I'm not sure if you've considered 
f-strings and gettext and figured out a way to make them work together.  If you 
have, I can look into adding support for extracting the strings to pygettext 
but at the moment, I'm not sure if it's a style that we want to propagate or 
not.

The heart of the problem is that the gettext function has to run before string 
interpolation occurs.  With .format() and the other formatting methods in 
Python, this is achievable rather naturally.  For instance:

from gettext import gettext as _

first = "foo"
last = "baz"
foo = _("{first}, bar, and {last}").format(**globals())

will lead to the string first being gettext substituted like:

"{first}, bar, y {last}"

and then interpolated:

"foo, bar, y baz"

However, trying to do the same with f-strings translates more like this:

foo = _(f"{first}, bar, and {last}")
foo = _("{first}, bar, and {last}".format(**globals()))  # This is the equivalent of the f-string

So the interpolation happens first:

"foo, bar, and baz"

Then, when gettext substitution is tried, it won't be able to find the string 
it knows to look for ("{first}, bar, and {last}")  so no translation will occur.
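The ordering problem can be reproduced without a real message catalog. In this sketch a plain dictionary stands in for the catalog and `_` is a made-up lookup function, not gettext itself:

```python
# Stand-in catalog: maps the untranslated template to its translation.
CATALOG = {"{first}, bar, and {last}": "{first}, bar, y {last}"}

def _(message):
    # Fall back to the original string when there is no translation.
    return CATALOG.get(message, message)

first = "foo"
last = "baz"

# Translate the template first, then interpolate: the lookup succeeds.
translated = _("{first}, bar, and {last}").format(first=first, last=last)

# The f-string interpolates first, so the lookup key is no longer the template.
untranslated = _(f"{first}, bar, and {last}")

print(translated)
print(untranslated)
```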

Allie Fitter's code corrects this ordering problem but introduces other issues. 
 Taking the sample string:

foo = f'{_("{first}, bar, and {last}")}'

f-string interpolation runs first, but it sees that it has to invoke the _() 
function so the f-string machinery itself runs gettext:

f'{"{first}, bar, y {last}"}'

The machinery then simply returns that string so we end up with:

   '{first}, bar, y {last}'

which is not quite right but can be fixed by nesting f-strings:

foo = f'{_(f"{first}, bar, and {last}")}'

which results in:

f'{f"{first}, bar, y {last}"}'

which results in:

f'{"foo, bar, y baz"}'

And finally:

"foo, bar, y baz"

So, that recipe works but is that what we want to tell people to do?  It seems 
quite messy that we have to run the gettext function within the command and use 
nested f-strings so is there/should there be a different way to make this work?

--
nosy: +a.badger, eric.smith

___
Python tracker 
<https://bugs.python.org/issue36310>
___



[issue14353] Proper gettext support in locale module

2019-05-08 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

As this problem does not affect Python3 I think it's up to the 2.7 release 
manager to decide if it should be merged.  benjamin, what do you think?  If you 
want it, I'll open a PR on github for it.

--
nosy: +a.badger, benjamin.peterson

___
Python tracker 
<https://bugs.python.org/issue14353>
___



[issue6981] locale.getdefaultlocale() envvars default code and documentation mismatch

2019-05-08 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Hey doko, I was just looking through the oldest gettext bugs and found this bug 
open.  It was caused by your commits here: https://bugs.python.org/issue1166948 
.   It feels like we have a few choices:

* revert the LANGUAGE ordering change which would take us back to the 2.6 
behaviour. 
* update the documentation to reflect the new ordering [Since the change has 
been around for so long, I think this is my personal favorite]
* Remove LANGUAGE from setting the defaultlocale because the GNU gettext usage 
of this variable is actually very different than what we're doing here.  It 
seems like it should only affect LC_MESSAGES and should affect those only as a 
fallback.
* Revert the LANGUAGE ordering change to the beginning of the list but remove 
it from consideration as a source for the *encoding*.

What do you think?

--
nosy: +a.badger, doko

___
Python tracker 
<https://bugs.python.org/issue6981>
___



[issue35924] curses segfault resizing window

2019-05-08 Thread Toshio Kuratomi


Change by Toshio Kuratomi :


--
keywords: +patch
pull_requests: +13120
stage:  -> patch review

___
Python tracker 
<https://bugs.python.org/issue35924>
___



[issue35924] curses segfault resizing window

2019-05-08 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

My upstream (ncurses) bug report: 
http://lists.gnu.org/archive/html/bug-ncurses/2019-05/msg00010.html

--

___
Python tracker 
<https://bugs.python.org/issue35924>
___



[issue35924] curses segfault resizing window

2019-05-08 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

I've diagnosed this a bit further and have a workaround for you.  It appears 
that using addstr() with a string with embedded newlines is a piece of the 
problem.  If I modify your example program so that we add each line as a 
separate string instead of adding them as a single string with embedded 
newlines, we get the ncurses ERR on resize instead of a segfault:

import curses

def main(stdscr):
    y, x = curses.LINES//3, curses.COLS//3  # size is arbitrary
    box = '\n'.join('+'*x for _ in range(y))
    w = stdscr.subwin(y, x+1, y, x)
    while True:
        new_box = box[:]
        w.clear()
        for offset, line in enumerate(box.splitlines()):
            w.addstr(offset, 0, line)
        w.getch()  # not required, just avoids a hot loop

curses.wrapper(main)


I don't see anything in the curses specification that forbids embedded newlines 
in the string to addstr(), though, so I am still thinking that this is a bug in 
ncurses.

--

___
Python tracker 
<https://bugs.python.org/issue35924>
___



[issue36812] posix_spawnp returns error when used with file_actions

2019-05-08 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Yeah, I've verified what Victor said about the OS not giving us enough 
information to tell what file is causing the issue.  However, I wonder if we 
should change the error message to be less confusing?  I'm a godawful C 
programmer but maybe something like this:

-PyErr_SetFromErrnoWithFilenameObject(PyExc_OSError, path->object);
+if (file_actionsp != NULL) {
+    /* OSErrors can be triggered by the program being invoked or by a
+     * problem with the files in file_actions.  Change the default
+     * error message so as not to confuse the programmer
+     */
+    if (path->narrow != NULL) {
+        char *err_msg_fmt = "While spawning %s\0";
+        unsigned int err_msg_size = strlen(path->narrow) + strlen(err_msg_fmt) + 1;
+        char* err_msg = malloc(err_msg_size);
+
+        PyOS_snprintf(err_msg, err_msg_size, err_msg_fmt, path->narrow);
+        /* Slight abuse, we're sending an error message rather than
+         * a filename
+         */
+        PyErr_SetFromErrnoWithFilename(PyExc_OSError, err_msg);
+    }
+}
+else
+{
+    PyErr_SetFromErrnoWithFilenameObject(PyExc_OSError, path->object);
+}


Which leads to output like this:

>>> import os
>>> file_actions = [(os.POSIX_SPAWN_OPEN, 1, '.tmp/temp_file', os.O_CREAT | os.O_RDWR, 777)]
>>> os.posix_spawnp('whoami', ['whoami'], os.environ, file_actions=file_actions)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'While spawning whoami'


I can submit a PR for that and people can teach me how to fix my C if it's 
considered useful.

--

___
Python tracker 
<https://bugs.python.org/issue36812>
___



[issue36812] posix_spawnp returns error when used with file_actions

2019-05-07 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Ah okay, I'll see what information posix_spawnp() (the C function) returns on 
error for that case.

--

___
Python tracker 
<https://bugs.python.org/issue36812>
___



[issue35924] curses segfault resizing window

2019-05-07 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

I'm still debugging this but it may be an off-by-one error in ncurses, 
wresize.c.  I've found that if I modify the following section in ncurses, our 
problem goes away:

  /*
   * Dispose of unwanted memory.
   */
  if (!(win->_flags & _SUBWIN)) {
      if (ToCols == size_x) {
          for (row = ToLines + 1; row <= size_y; row++) {
              free(win->_line[row].text);
          }
      } else {
          for (row = 0; row <= size_y; row++) {
              free(win->_line[row].text);
          }
      }
  }

  free(win->_line);
  win->_line = new_lines;

Replacing:
  for (row = ToLines + 1; row <= size_y; row++) {
with:
  for (row = ToLines + 2; row <= size_y; row++) {

fixes this error.  ToLines is a parameter passed in to wresize.  wresize will 
reuse ToLines number of rows from the old structure in the new structure.  Due 
to that, I think that the chances are good that it is ncurses which is at fault 
here.  I will try to rewrite the test case into a C program and then submit a 
bug report to ncurses upstream.  I'm not sure that there's a way we can work 
around this until that's fixed.

--
nosy: +a.badger

___
Python tracker 
<https://bugs.python.org/issue35924>
___



[issue36656] Allow os.symlink(src, target, force=True) to prevent race conditions

2019-05-07 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

Additionally, the os module is supposed to closely follow the behaviour of the 
underlying operating system functions: 
https://docs.python.org/3/library/os.html  

> The design of all built-in operating system dependent modules of Python is 
> such that as long as the same functionality is available, it uses the same 
> interface; [..]

The POSIX symlink function on which this is based has made the decision not to 
overwrite an existing symlink (see the EEXIST error in 
https://pubs.opengroup.org/onlinepubs/009695399/functions/symlink.html or the man 
pages on symlink from one of the Linux distros: 
http://man7.org/linux/man-pages/man2/symlink.2.html ).  As with many other 
POSIX-derived filesystem functions, the technique you propose (relying on 
atomic filesystem renames) would seem to be the standard method of writing 
race-resistant code.  Due to the mandate for the os module, it feels like that 
belongs in a utility function in custom code or another module rather than 
in the os module itself.

A couple of thoughts on what you could do instead:

* A collection of utility functions that fixed race-conditions in filesystem 
handling could make a nice third party module on pypi.

* The stdlib shutil module provides an API that's supposed to make common use 
cases easier to implement than the os.* functions do.  Perhaps you could propose 
your idea to the python-ideas mailing list as a new function in that module and 
see what people think of that?
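
A minimal sketch of what such a utility function could look like, assuming a 
POSIX system where renaming over an existing path is atomic.  The name 
force_symlink is hypothetical; nothing like it exists in os or shutil:

```python
import os
import tempfile


def force_symlink(src, dst):
    """Point dst at src, replacing any existing dst in a single rename.

    Hypothetical helper, not part of the os or shutil modules.
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # Build the new link inside a private temporary directory that sits
    # next to dst, so the final rename never crosses filesystems.
    with tempfile.TemporaryDirectory(dir=dst_dir) as tmp_dir:
        tmp_link = os.path.join(tmp_dir, "link")
        os.symlink(src, tmp_link)
        # rename(2) semantics: dst is replaced in one step, so readers see
        # either the old link or the new one, never a missing path.
        os.replace(tmp_link, dst)
```

The temporary directory avoids having to invent a unique temporary link name, 
which would itself be open to a race.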

--
nosy: +a.badger

___
Python tracker 
<https://bugs.python.org/issue36656>
___



[issue36812] posix_spawnp returns error when used with file_actions

2019-05-07 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

The error message is reporting the path.  However, it is only the path 
component that is specified in the call to the function.  

This behaviour is not limited to the posix_spawnp() function but happens with 
any interface that can look up a command in the path.  For instance, here's 
what subprocess.Popen() gives me when I use it against a 0644 file that is 
present in my PATH:

>>> subprocess.Popen(['fever.py'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'fever.py'
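
The same behaviour can be reproduced in a throwaway directory.  This is only a 
sketch for a POSIX system; the function name is made up for illustration and 
the script is deliberately created without the execute bit:

```python
import os
import subprocess
import tempfile


def filename_reported_for_nonexecutable(name="fever.py"):
    """Return the filename attached to the PermissionError, or None."""
    with tempfile.TemporaryDirectory() as bin_dir:
        script = os.path.join(bin_dir, name)
        with open(script, "w") as f:
            f.write("#!/bin/sh\n")
        os.chmod(script, 0o644)  # readable but not executable
        try:
            # Only bin_dir is searched, via the PATH passed in env.
            subprocess.Popen([name], env={"PATH": bin_dir}).wait()
        except PermissionError as exc:
            return exc.filename
    return None
```

The exception carries only the name that was passed in, not the directory in 
which the lookup found the file.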

--
nosy: +a.badger

___
Python tracker 
<https://bugs.python.org/issue36812>
___



[issue24263] unittest cannot load module whose name starts with Unicode

2019-05-07 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

I've opened a new PR at https://github.com/python/cpython/pull/13149 with the 
commit from https://github.com/python/cpython/pull/1338 and some additional 
changes to address the review comments given by serhiy.storchaka and rbcollins

--
nosy: +a.badger

___
Python tracker 
<https://bugs.python.org/issue24263>
___



[issue24263] unittest cannot load module whose name starts with Unicode

2019-05-06 Thread Toshio Kuratomi


Change by Toshio Kuratomi :


--
pull_requests: +13064
stage: test needed -> patch review

___
Python tracker 
<https://bugs.python.org/issue24263>
___



[issue36165] DOC: ssl.rst is missing formatting on two links

2019-05-06 Thread Toshio Kuratomi


Change by Toshio Kuratomi :


--
keywords: +patch
pull_requests: +13046
stage: needs patch -> patch review

___
Python tracker 
<https://bugs.python.org/issue36165>
___



[issue36165] DOC: ssl.rst is missing formatting on two links

2019-05-06 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

I'll take a look at this one.

--
nosy: +a.badger

___
Python tracker 
<https://bugs.python.org/issue36165>
___



[issue34369] kqueue.control() documentation and implementation mismatch

2018-08-10 Thread Toshio Kuratomi


Toshio Kuratomi  added the comment:

I don't believe this (for kqueue.control, at least) is a regression from 
Argument Clinic.  Both the documentation and the behaviour are the same in 
Python-2.7.

--

___
Python tracker 
<https://bugs.python.org/issue34369>
___



[issue34369] kqueue.control() documentation and implementation mismatch

2018-08-10 Thread Toshio Kuratomi


New submission from Toshio Kuratomi :

The current kqueue documentation specifies that timeout is a keyword argument 
but it can only be passed as a positional argument right now:

>>> import select
>>> ko = select.kqueue()
>>> ko.control([1], 0, timeout=10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: control() takes no keyword arguments
>>> help(ko.control)
Help on built-in function control:

control(...) method of select.kqueue instance
    control(changelist, max_events[, timeout=None]) -> eventlist

    Calls the kernel kevent function.
    - changelist must be an iterable of kevent objects describing the changes
      to be made to the kernel's watch list or None.
    - max_events lets you specify the maximum number of events that the
      kernel will return.
    - timeout is the maximum time to wait in seconds, or else None,
      to wait forever. timeout accepts floats for smaller timeouts, too.

This may be related to https://bugs.python.org/issue3852 in which the 
max_events argument used to be documented as optional but the code made it 
mandatory.
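
Until the documentation and the code agree, passing every argument by position 
avoids the TypeError.  A small sketch; poll_kqueue and demo are illustrative 
names, and the code does nothing on platforms without kqueue:

```python
import select


def poll_kqueue(kq, changelist=None, max_events=0, timeout=0):
    """Call kq.control() with every argument passed positionally."""
    # timeout is passed by position because, as reported above,
    # control() rejects it as a keyword argument.
    return kq.control(changelist, max_events, timeout)


def demo():
    if not hasattr(select, "kqueue"):
        return None  # kqueue only exists on BSD-derived platforms
    kq = select.kqueue()
    try:
        return poll_kqueue(kq)
    finally:
        kq.close()
```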

--
components: Library (Lib)
messages: 323357
nosy: a.badger
priority: normal
severity: normal
status: open
title: kqueue.control() documentation and implementation mismatch
type: behavior
versions: Python 2.7, Python 3.7

___
Python tracker 
<https://bugs.python.org/issue34369>
___



[issue23966] More clearly expose/explain native and cross-build target information

2015-04-21 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Note for doko/barry: The multiarch/Tuples page should have a section on how the 
MultiArch Tuples interact with hwcaps (or a link to such a section in a 
different document).  The rationale for not using GNU triplets in 
MultiArch/Tuples currently says that we do not want separate entries for (as an 
example) i386 vs i686 instructions but does not say why.

https://wiki.debian.org/Multiarch/TheCaseForMultiarch#Mixed_ABIs_and_instruction_set_extensions
  says that the i386 vs i686 use case is probably better addressed by glibc's 
hwcaps but points back to MultiArch/Tuples for rationale.

A section of rationale and example to show how the multiarch tuple and hwcaps 
complement each other would fix that.

--
nosy: +a.badger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23966
___



[issue4963] mimetypes.guess_extension result changes after mimetypes.init()

2014-04-14 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Took a look at this and was able to reproduce it on Fedora Linux 20 and current 
cpython head.  It is somewhat random though.  I'm able to get reasonably 
consistent failures using image/jpeg and iterating the test case about 20 times.

Additionally, it looks like the data structure that 
mimetypes.guess_extension() is reading its extensions from is a list so it 
doesn't have to do with dictionary sort order.  It has something to do with the 
way the extensions are read in from the files and then given to add_type().

Talking to r.david.murray I think that this particular problem can be solved by 
simply sorting the list of extensions prior to guess_extension taking the first 
extension off of the list.
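
As a sketch of the idea (not the attached patch), the same effect can be had 
from outside the module by sorting the candidates that 
mimetypes.guess_all_extensions() returns.  The wrapper name 
stable_guess_extension is made up for illustration:

```python
import mimetypes


def stable_guess_extension(mime_type, strict=True):
    """Return the alphabetically first known extension, or None."""
    extensions = mimetypes.guess_all_extensions(mime_type, strict=strict)
    # Sorting removes any dependence on the order in which the
    # mime.types files happened to be read.
    return sorted(extensions)[0] if extensions else None
```

This makes the result repeatable; it does not make it the "best" extension, 
which is the separate question mentioned below.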

The question of what to do when the first extension in the list isn't the best 
extension should be resolved in Issue1043134.

I'll attach a patch with test case for this problem.

--
nosy: +a.badger
Added file: http://bugs.python.org/file34821/issue4963.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4963
___



[issue19846] Python 3 raises Unicode errors with the C locale

2013-12-13 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

It's not a bug for upstart, systemd, sysvinit, cron, etc to use LANG=C.  The 
POSIX locale is the only locale guaranteed to exist on a system.  Therefore 
these low level services should be using LANG=C.  Embedded systems, thin 
clients, and other low memory or low disk devices may benefit from shipping 
without any locales.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19846
___



[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

My impression was that python3 was supposed to help get rid of UnicodeError 
tracebacks, not mojibake.  If mojibake was the problem then we should never 
have gone down the surrogateescape path for input.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19977
___



[issue19846] Python 3 raises Unicode errors with the C locale

2013-12-10 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Looking at the glib code, this looks like the SO post is closer to the truth.  
The API documentation for g_filename_to_utf8() is over-simplified to the point 
of confusion.  This section of the glib API document is closer to what the code 
is doing: 
https://developer.gnome.org/glib/stable/glib-Character-Set-Conversion.html#file-name-encodings

* When encoding matters, glib and gtk functions will assume that char*'s that 
you pass to them point to strings which are encoded in utf-8.
* When char* are not utf8 you are responsible for converting them to utf8 to be 
used by the glib functions (if encoding matters).
* glib provides g_filename_to_utf8() for the special case of transforming 
filenames into the encoding that glib expects.  (Presumably because glib and 
gtk deal with non-utf8 unicode filenames more often than the equivalent 
environment variables, command line switches, etc).
* Contrary to the API docs for g_filename_to_utf8(), g_filename_to_utf8() will 
simply return a copy of the byte string it was passed unless 
G_FILENAME_ENCODING or G_BROKEN_FILENAMES is set.  If those are set, then the 
value of G_FILENAME_ENCODING might be used to attempt to decode the filename or 
the encoding specified in the user's locale might be used.

@haypo, I'm pretty sure from reading the code for g_get_filename_charsets() 
that you have the conditionals reversed.  What I'm seeing is:

if G_FILENAME_ENCODING:
    charset = the first charset listed in G_FILENAME_ENCODING
    if charset == '@locale':
        charset = charset of user's locale
elif G_BROKEN_FILENAMES:
    charset = charset of user's locale
else:
    charset = 'UTF-8'
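
The same branch order as runnable Python, for anyone who wants to poke at it.  
This mirrors my reading of the logic and is not glib's code; the function name 
and the idea of passing the environment in as a dict are illustrative 
assumptions:

```python
def filename_charset(environ, locale_charset):
    """Mirror the branch order described above (a reading, not glib's code)."""
    encodings = environ.get("G_FILENAME_ENCODING")
    if encodings:
        # Only the first entry of the comma-separated list is considered.
        charset = encodings.split(",")[0].strip()
        if charset == "@locale":
            charset = locale_charset
    elif "G_BROKEN_FILENAMES" in environ:
        charset = locale_charset
    else:
        charset = "UTF-8"
    return charset
```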

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19846
___



[issue19846] Python 3 raises Unicode errors with the C locale

2013-12-10 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Yes, it returns a list but, unless I'm missing something, in the general case 
it's the caller's responsibility to loop through the charsets to test for 
failure and try again.  This is not done automatically.

In the specific case we're talking about, first get_filename_charset() decides 
to only return the first entry in the list of charsets: 
https://git.gnome.org/browse/glib/tree/glib/gconvert.c#n1118

and then g_filename_to_utf8() disregards the charsets altogether because it 
sees that the filename is supposed to be utf-8 
https://git.gnome.org/browse/glib/tree/glib/gconvert.c#n1160

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19846
___



[issue19846] Python 3 raises Unicode errors with the C locale

2013-12-09 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Ahh... added to the nosy list and bug closed all before I got up for the day ;-)

A few words:

I do think that python is broken here.

I do not think that translating everything to utf-8 if ascii is the locale's 
encoding is the solution.

As I would state it, the problem is that python's boundary with the OS is not 
yet uniform.  If you set LC_ALL=C (note, LC_ALL=C is just one of multiple ways 
to break things.  For instance, LC_ALL=en_US.utf8 when dealing with latin-1 data 
will also break) then python will still *read* non-ascii data from the OS 
through some interfaces but it won't output it back to the OS.  ie:

$ mkdir unicode && cd unicode
$ python3 -c 'open("ñ.txt".encode("latin-1"), "w").close()'
$ LC_ALL=en_US.utf8 python3
>>> import os
>>> dir_listing = os.listdir('.')
>>> for entry in dir_listing: print(entry)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf1' in position 0: surrogates not allowed

Note that currently, input() and sys.stdin.read() won't read undecodable data 
so this is somewhat symmetrical but it seems to me that saying everything that 
interfaces with the OS except the standard streams will use surrogateescape on 
undecodable bytes is drawing a line in an unintuitive location.
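
The asymmetry can be shown without touching the filesystem or the locale.  A 
minimal sketch, using the codecs directly in place of os.listdir() and print():

```python
raw_name = b"\xf1.txt"  # "ñ.txt" encoded as latin-1; not valid UTF-8

# Reading: undecodable bytes become lone surrogates instead of raising.
name = raw_name.decode("utf-8", "surrogateescape")

# Writing with the strict handler fails on those surrogates ...
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ... while the handler that was used for reading round-trips the bytes.
round_tripped = name.encode("utf-8", "surrogateescape")
```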

(A further note to serhiy.storchaka: Your examples are not showing anything 
broken in other programs.  xterm is refusing both input and output that is 
non-ascii.  This is symmetric behaviour.  ls is doing its best to display a 
*human-readable* representation of bytes that it cannot convert in the current 
encoding.  It also provides the -b switch to see the octal values if you 
actually care.  Think of this like opening a binary file in less or another 
pager.)

(Further note for haypo -- On Fedora, the default of en_US is utf8, not 
ISO8859-1.)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19846
___



[issue17997] ssl.match_hostname(): sub string wildcard should not match IDNA prefix

2013-09-02 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

So, is this a security issue?  I've been wondering if I should apply the 
attached patch to the backports-ssl_match_hostname module on pypi.  I was 
hoping there'd be some information here as to whether this will be going into 
the stdlib in the future.

Thus far, ssl_match_hostname has just been a backport of the match_hostname 
function but if this is a security problem, I could press for us to diverge 
from the python3 stdlib.  It would be easier to make the case if this is seen 
as a critical problem that will need to be fixed even if the current patch 
might not be the eventual fix.

--
nosy: +a.badger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17997
___



[issue18713] Enable surrogateescape on stdin and stdout when appropriate

2013-08-20 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Nick and I had talked about this at a recent conference and came to it from 
different directions.  On the one hand, Nick made the point that any encoding 
of surrogateescape'd text to bytes via a different encoding is corrupting the 
data as a whole.  On the other hand, I made the point that raising an exception 
when doing something as basic as printing something that's text type was 
reintroducing the issues that python2 had wrt unicode, bytes, and encodings -- 
particularly with the exception being raised far from the source of the problem 
(when the data is introduced into the program).

After some thought, Nick came up with this solution.  The idea is that 
surrogateescape was originally accepted to allow roundtripping data from the OS 
and back when the OS considers it to be a string but python does not consider 
it to be text.  When that's the case, we know which encoding was used in the 
attempt to construct the text in python.  If that same encoding is used to 
re-encode the data on the way back to the OS, then we're successfully 
roundtripping the data we were given in the first place.  So this is just 
applying the original goal to another API.
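
A sketch of that roundtrip against a stand-in for stdout: a text stream layered 
over a byte sink.  write_like_stdout is a made-up name, and a real stdout would 
take its encoding from the locale rather than from a parameter:

```python
import io


def write_like_stdout(text, encoding="utf-8", errors="strict"):
    """Encode text the way a text stream over a byte sink would."""
    sink = io.BytesIO()
    # newline="" turns off newline translation so bytes compare exactly.
    stream = io.TextIOWrapper(sink, encoding=encoding, errors=errors,
                              newline="")
    stream.write(text)
    stream.flush()
    data = sink.getvalue()
    stream.detach()  # leave the BytesIO alone when the wrapper goes away
    return data
```

With errors="surrogateescape" and the same encoding that was used for decoding, 
the bytes that come out are the bytes that went in; with the default strict 
handler the write raises UnicodeEncodeError instead.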

--
nosy: +a.badger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18713
___



[issue17429] platform.platform() can throw Unicode error

2013-03-25 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Patch fixing the issues raised in r.david.murray's review:

* Merged _find_linux_release_file() back into linux_distribution() and broke 
out _UNIXCONFDIR module level variable to enable mocking of the unittest data
* Fix already present style issue in linux_distribution() code
* Separate test_dist_with_unicode() into test_dist_with_utf8 and 
test_dist_with_latin1
* Moved test data into the setUp() method and broke long lines to under 79 chars
* Switched from NamedTemporaryFile() to TemporaryDirectory() => I think this 
fixes the Windows incompatibility but I can mark the tests as Skip on Windows 
if that isn't true (I don't have a Windows box to test on).
* Removed Misc/NEWS portion of patch

* I've switched from os.environ['LC_ALL'] = temp_locale to 
setlocale(locale.LC_ALL, temp_locale).  Testing showed that the former did not 
provoke the bug while the latter does.  Could someone point me at documentation 
on the difference between these two?  I'd like to understand what the two 
different calls do differently so that I use them correctly in other unittests.

--
Added file: http://bugs.python.org/file29573/00175-platform-unicode.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17429
___



[issue16310] zipfile: allow surrogates in filenames

2013-03-21 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Version 2 of the patch

* fixes for the style problems noted by ezio.melotti

--
Added file: http://bugs.python.org/file29531/python3-zipfile-surrogate.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16310
___



[issue17429] platform.platform() can throw Unicode error

2013-03-20 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Okay, new version of the patch with a unittest.

Re: os-release; I don't believe the current code can handle that file.  It 
changes format from a simple string (in most Linux distros) to key-value pairs. 
 We'll probably need an update to the code to deal with that at some point in 
the future.
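
For reference, a rough sketch of what reading the key-value format could look 
like.  parse_os_release is a hypothetical helper, not something in platform.py, 
and it assumes values follow shell quoting rules:

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be shell-quoted; shlex undoes the quoting.
        info[key] = " ".join(shlex.split(value))
    return info
```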

--
Added file: http://bugs.python.org/file29508/00175-platform-unicode.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17429
___



[issue17429] platform.platform() can throw Unicode error

2013-03-20 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Added NEWS file.  Rebased against hg default.  Ready for review.

--
Added file: http://bugs.python.org/file29509/00175-platform-unicode.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17429
___



[issue16310] zipfile: allow surrogates in filenames

2013-03-20 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Okay, here's the first version of a patch to add surrogate support to a 
zipfile.  I think it's the minimum required to fix this bug.

When archiving, if a filename contains surrogateescape'd bytes, it switches to 
cp437 when it saves the filename into the zipfile.  This seems to be the 
strategy of other zip tools.  Nothing changes when unarchiving (probably to 
deal with what comes out of other tools).
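
Roughly, the archiving side behaves like the sketch below.  This is an 
illustration of the strategy and not the patch itself; encode_zip_filename is a 
made-up name, and it assumes the surrogates came from decoding with UTF-8 and 
surrogateescape:

```python
UTF8_FLAG = 0x800  # general purpose bit 11 in the ZIP format


def encode_zip_filename(name):
    """Return (raw_bytes, flag_bits) for storing name in an archive."""
    try:
        return name.encode("ascii"), 0
    except UnicodeEncodeError:
        pass
    try:
        return name.encode("utf-8"), UTF8_FLAG
    except UnicodeEncodeError:
        # Lone surrogates: recover the original bytes and store them with
        # bit 11 unset, so readers will treat them as cp437.
        return name.encode("utf-8", "surrogateescape"), 0
```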

The documentation is also updated to mention that unknown encodings are a 
problem that the zipfile module doesn't handle automatically for you.

I think we could do better but this is a major improvement over the status quo 
(no tracebacks).  Would someone care to review this for merge?  Then we could 
work on adding some notion of a user-specified encoding to override cp437 
encoding on dearchiving (which I think would satisfy issue10614 and 
issue10972).

The use case in issue10757 might be fixed by this patch (or this patch plus the 
user specified encoding).  Have to look a little harder at it.

--
keywords: +patch
Added file: http://bugs.python.org/file29517/python3-zipfile-surrogate.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16310
___



[issue16754] Incorrect shared library extension on linux

2013-03-19 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Matthias, Barry, and I looked at this at pycon today.  It looks a bit like the 
original intent was to have
SO = .so
SOABI = cpython-32mu

and then CPython extension module suffixes would be:

if SOABI:
    so_ext = ''.join(['.', SOABI, SO])
else:
    so_ext = SO

This would need to be used in distutils/command/build_ext.py.  We weren't sure 
if there are other places in the code that would need it as well but a quick 
build of a module which uses libraries and needs C extensions showed that this 
seems to work.
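
As a self-contained sketch of that composition (the function name is 
illustrative; only the SO and SOABI values come from the discussion above):

```python
def extension_suffix(so=".so", soabi=None):
    """Build the extension module suffix from the SO and SOABI values."""
    if soabi:
        # e.g. ".so" + "cpython-32mu" -> ".cpython-32mu.so"
        return "".join([".", soabi, so])
    return so
```

Keeping SO as the plain shared library suffix and deriving the extension module 
suffix from it means both values stay available to callers.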

The one worrisome question is whether more people have come to rely on the SO 
variable holding the extension module suffix or if more code was broken by the 
extension module suffix replacing the library suffix in the SO variable.  
Answering that might better show us whether to change these variables back to 
their original meanings or to create brand new variables that have the correct 
values.

We also discovered the reason the current version appears to work with 
python-pillow on Ubuntu boxes but not Fedora.  The find_library_files() code 
first checks for library names that would match with the shared library 
suffix.  If that fails, it falls back to looking for the static library 
suffix.  On Fedora, there are no static libraries so this function just fails 
to find the library.  On Ubuntu, the code finds the static libraries and 
returns that.  This causes the code in python-pillow to attempt to link to the 
library with -ljpeg, -lpng, etc...  Since the shared libraries actually are 
present, the compiler and linker use the shared versions even though python 
only found the static versions.
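
A simplified model of that lookup, to show why the wrong shared suffix goes 
unnoticed when static libraries are installed.  This is a sketch of the 
behaviour as described here and not the distutils implementation; the function 
name and signature are assumptions:

```python
import os


def find_library_file(dirs, name, shared_ext, static_ext=".a"):
    """Return the first matching library path, preferring shared libraries."""
    for directory in dirs:
        shared = os.path.join(directory, "lib" + name + shared_ext)
        static = os.path.join(directory, "lib" + name + static_ext)
        if os.path.exists(shared):
            return shared
        if os.path.exists(static):
            return static
    return None
```

With shared_ext set to the extension module suffix, the shared check can never 
match libjpeg.so, so the result depends entirely on whether libjpeg.a exists.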

--
nosy: +a.badger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16754
___



[issue17467] Enhancement: give mock_open readline() and readlines() methods

2013-03-19 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

Updated patch that includes unittests and fixes readlines() newline behaviour.

--
Added file: http://bugs.python.org/file29479/01000-mock_open-methods.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17467
___



[issue17467] Enhancement: give mock_open readline() and readlines() methods

2013-03-19 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

3rd version of the patch.

* Added some documentation to unittest.mock.rst
* Changed the code so that read, readline, and readlines all deplete the same 
copy of read_data.

--
Added file: http://bugs.python.org/file29484/01000-mock_open-methods.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17467
___



[issue17467] Enhancement: give mock_open readline() and readlines() methods

2013-03-18 Thread Toshio Kuratomi

New submission from Toshio Kuratomi:

unittest.mock provides a mock_open convenience function[1].  The convenience 
function handles file.read() but does not handle file.readline() or 
file.readlines().  I'll attach a patch that adds support for both of these 
methods.
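
For illustration, this is the kind of test code the enhancement is meant to 
allow.  It is a sketch that assumes a unittest.mock in which mock_open supports 
readline() and readlines(); first_line and all_lines are made-up functions 
under test:

```python
from unittest import mock


def first_line(path):
    with open(path) as f:
        return f.readline()


def all_lines(path):
    with open(path) as f:
        return f.readlines()


def demo():
    # A fresh mock_open per call, so each reader starts from the top.
    with mock.patch("builtins.open", mock.mock_open(read_data="one\ntwo\n")):
        head = first_line("ignored.txt")
    with mock.patch("builtins.open", mock.mock_open(read_data="one\ntwo\n")):
        lines = all_lines("ignored.txt")
    return head, lines
```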

[1]: http://docs.python.org/3/library/unittest.mock.html#mock-open

--
files: python3-mock_open-methods.patch
keywords: patch
messages: 184512
nosy: a.badger
priority: normal
severity: normal
status: open
title: Enhancement: give mock_open readline() and readlines() methods
versions: Python 3.3
Added file: http://bugs.python.org/file29454/python3-mock_open-methods.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17467
___



[issue17429] platform.platform() can throw Unicode error

2013-03-16 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

I'm at pycon.  I'll find someone during the sprints to teach me how the 
unittests are organized.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17429
___



[issue17429] platform.platform() can throw Unicode error

2013-03-15 Thread Toshio Kuratomi

New submission from Toshio Kuratomi:

Tested on python-3.2 and python-3.3.  platform.platform() looks for a file in 
/etc/ that looks like it will contain the name of the Linux distribution that 
python3 is running on.  Once found, it reads the contents of the file to have a 
name for the Linux distribution.  Most Linux distributions do create files 
inside of /etc/ with a single line which is the distribution name so this is a 
good heuristic.  However, these files are created by the operating system 
vendor and so they can have a different encoding than the encoding of the 
locale the user uses.  This means that if there are non-ascii characters inside 
the file, user code that invokes platform.platform() may throw a traceback.

Test:

$ LC_ALL=en_US.utf8 sudo echo ' Café' > /etc/fedora-release
$ LC_ALL=C python3
>>> import platform
>>> platform.platform()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/platform.py", line 1538, in platform
    distname,distversion,distid = dist('')
  File "/usr/lib64/python3.2/platform.py", line 358, in dist
    full_distribution_name=0)
  File "/usr/lib64/python3.2/platform.py", line 329, in linux_distribution
    firstline = f.readline()
  File "/usr/lib64/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 22: ordinal not in range(128)

It seems that the standard method of fixing these that we're promoting in 
python3 is to use surrogateescape.  I'll provide a patch that does that.
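
The core of the approach, as a sketch and not the patch itself.  
read_first_line is a hypothetical helper, and the utf-8 default is an 
assumption made for the example:

```python
def read_first_line(path, encoding="utf-8"):
    """Read the first line of a release file without ever raising on bytes
    that are invalid in the chosen encoding."""
    with open(path, encoding=encoding, errors="surrogateescape") as f:
        return f.readline().strip()
```

Bytes that cannot be decoded come back as lone surrogates, so the caller gets a 
string instead of a traceback and can still recover the original bytes.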

--
messages: 184234
nosy: a.badger
priority: normal
severity: normal
status: open
title: platform.platform() can throw Unicode error
versions: Python 3.2, Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17429
___



[issue17429] platform.platform() can throw Unicode error

2013-03-15 Thread Toshio Kuratomi

Changes by Toshio Kuratomi a.bad...@gmail.com:


--
keywords: +patch
Added file: http://bugs.python.org/file29416/00175-platform-unicode.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17429
___



[issue17429] platform.platform() can throw Unicode error

2013-03-15 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

I agree.  In my experience, utf-8 is the most common encoding.  Updated patch 
that defaults to utf-8 instead of the user's locale is attached.

--
Added file: http://bugs.python.org/file29420/00175-platform-unicode.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17429
___



[issue16310] zipfile: allow surrogates in filenames

2013-03-15 Thread Toshio Kuratomi

Toshio Kuratomi added the comment:

I found some standards docs that could bear on this:

http://www.pkware.com/documents/casestudies/APPNOTE.TXT

Appendix D:
D.1 The ZIP format has historically supported only the original IBM PC 
character encoding set, commonly referred to as IBM Code Page 437.
[..]
D.2 If general purpose bit 11 is unset, the file name and comment should 
conform to the original ZIP character encoding.  If general purpose bit 11 is 
set, the filename and comment must support The Unicode Standard, Version 4.1.0 
or greater using the character encoding form defined by the UTF-8 storage 
specification.
[..]

So there's two choices for a filename in a zipfile:

* bytes that make valid UTF-8 strings
* bytes that make valid strings in code page 437

http://en.wikipedia.org/wiki/Code_page_437#Standard_code_page

Code Page 437 takes up all 256 possible bit patterns available in a byte.

These two factors mean that if a filename in a zipfile is considered from the 
POV of a sequence of bytes, it can (according to the zipfile standard) contain 
any possible sequence of bytes.  If a filename is considered from the POV of a 
sequence of human characters, it can contain any possible sequence of unicode 
code points encoded as utf-8.  

The tricky bit: if the bytes are not valid utf-8 then officially the characters 
should be limited to the 256 characters of Code Page 437.   However, the client 
tools I've looked at exploit the fact that all bytes are possible to simply 
save the bytes that make up the filename into the zip file.
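To make the two cases concrete, here is a small sketch. The helper name `decode_member_name` is hypothetical (it is not part of the zipfile module); the flag value 0x800 is simply bit 11 written as a mask:

```python
UTF8_FLAG = 0x800  # general purpose bit 11 from the APPNOTE excerpt


def decode_member_name(raw_name, flag_bits):
    """Decode a stored file name according to general purpose bit 11.

    Hypothetical helper for illustration only.
    """
    if flag_bits & UTF8_FLAG:
        return raw_name.decode("utf-8")
    return raw_name.decode("cp437")


# Code Page 437 assigns a character to every byte value, so any byte
# sequence decodes without error and encodes back to the same bytes.
every_byte = bytes(range(256))
as_cp437 = every_byte.decode("cp437")
```

Because the cp437 round trip is lossless, a tool that writes arbitrary bytes with bit 11 unset still produces a name that decodes, just not necessarily to the characters the user intended.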

--
nosy: +a.badger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16310
___



[issue11200] Addition of abiflags breaks distutils

2011-02-22 Thread Toshio Kuratomi

Toshio Kuratomi a.bad...@gmail.com added the comment:

Distribute issue opened and patch based on Antoine's comments attached.

https://bitbucket.org/tarek/distribute/issue/191/distribute-fails-unittests-on-python-32

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11200
___



[issue11200] Addition of abiflags breaks distutils

2011-02-20 Thread Toshio Kuratomi

Toshio Kuratomi a.bad...@gmail.com added the comment:

Ha!  Your reply jogged my memory.  MvL mentioned exactly the potential for this 
backwards incompatibility here: 
http://mail.python.org/pipermail/python-dev/2010-December/106351.html when 
talking about whether other API changes could go into distutils to support 
accepted PEPs.

tarek, eric, since python-3.2 is now out, I assume that you're going to want to 
port distribute rather than back the changes out of distutils.  Do you want a 
bug report or is this high enough priority that you're already working on it?

--
status: pending -> open

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11200
___



[issue11200] Addition of abiflags breaks distutils

2011-02-12 Thread Toshio Kuratomi

New submission from Toshio Kuratomi a.bad...@gmail.com:

When trying to build distribute on the latest python-3.2rc I get the following 
traceback in the unittests (two others that are similar as well):

======================================================================
ERROR: test_develop (setuptools.tests.test_develop.TestDevelopTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.2/distutils/util.py", line 283, in subst_vars
    return re.sub(r'\$([a-zA-Z_][a-zA-Z_0-9]*)', _subst, s)
  File "/usr/lib/python3.2/re.py", line 167, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python3.2/distutils/util.py", line 280, in _subst
    return os.environ[var_name]
  File "/usr/lib/python3.2/os.py", line 450, in __getitem__
    value = self._data[self.encodekey(key)]
KeyError: b'abiflags'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "build/src/setuptools/tests/test_develop.py", line 52, in test_develop
    cmd.ensure_finalized()
  File "/usr/lib/python3.2/distutils/cmd.py", line 109, in ensure_finalized
    self.finalize_options()
  File "build/src/setuptools/command/develop.py", line 51, in finalize_options
    easy_install.finalize_options(self)
  File "build/src/setuptools/command/easy_install.py", line 225, in finalize_options
    self.expand_dirs()
  File "build/src/setuptools/command/easy_install.py", line 335, in expand_dirs
    'install_scripts', 'install_data',])
  File "build/src/setuptools/command/easy_install.py", line 323, in _expand_attrs
    val = subst_vars(val, self.config_vars)
  File "/usr/lib/python3.2/distutils/util.py", line 285, in subst_vars
    raise ValueError("invalid variable '$%s'" % var)
ValueError: invalid variable '$b'abiflags''

It seems that something in the addition of abiflags is causing distutils to 
search for abiflags in os.environ.  After talking with tarek on IRC we decided 
to open a bug here to see whether this is desirable change in behaviour within 
the stdlib.
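The fallback to os.environ can be sketched as follows. This is a simplified re-creation modeled on the frames in the traceback above, not a copy of distutils.util, and the path template in the demo is an assumption:

```python
import os
import re


def subst_vars(s, local_vars):
    """Expand $name references, preferring local_vars over os.environ."""

    def _subst(match):
        var_name = match.group(1)
        if var_name in local_vars:
            return str(local_vars[var_name])
        # Anything the caller did not supply is looked up in the
        # process environment; a missing key raises KeyError here.
        return os.environ[var_name]

    try:
        return re.sub(r"\$([a-zA-Z_][a-zA-Z_0-9]*)", _subst, s)
    except KeyError as var:
        raise ValueError("invalid variable '$%s'" % var)


# Works when the caller knows about abiflags ...
expanded = subst_vars(
    "lib/python$py_version_short$abiflags",
    {"py_version_short": "3.2", "abiflags": "m"},
)
```

A caller whose variable dictionary predates abiflags falls through to the environment lookup, which is where the KeyError in the report originates.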

The revision introducing abiflags is here: 
http://svn.python.org/view?view=rev&revision=85697

--
components: Library (Lib)
messages: 128448
nosy: a.badger, barry, doko, tarek
priority: normal
severity: normal
status: open
title: Addition of abiflags breaks distutils
versions: Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11200
___



[issue10838] subprocess __all__ is incomplete

2011-01-05 Thread Toshio Kuratomi

New submission from Toshio Kuratomi a.bad...@gmail.com:

I have a compatibility module for subprocess in python-2.7 for people who are 
stuck on python-2.4 (without check_call) and they got a traceback from trying 
to use compat.subprocess.list2cmdline().

In order to use the stdlib's subprocess if it's of a recent enough version, I 
check the version and import the symbols from there using "from subprocess 
import *" in the compat module.  Unfortunately, one of the people is using 
list2cmdline() in their code and list2cmdline() is not in __all__.  Comparing 
the output, there's a few things not in __all__ in both python-2.7 and in 
python-3.1:

(From python-2.7, but python-3.1 boils down to the same list):

>>> sorted([d for d in dir(subprocess) if not d.startswith('_')])
['CalledProcessError', 'MAXFD', 'PIPE', 'Popen', 'STDOUT', 'call', 
'check_call', 'check_output', 'errno', 'fcntl', 'gc', 'list2cmdline', 
'mswindows', 'os', 'pickle', 'select', 'signal', 'sys', 'traceback', 'types']
>>> sorted(subprocess.__all__)
['CalledProcessError', 'PIPE', 'Popen', 'STDOUT', 'call', 'check_call', 
'check_output']

So, MAXFD, list2cmdline, and mswindows seem to be left out.

These could either be made private (prepended with _) or added to __all__ to 
resolve this bug.  (I note that searching for subprocess.<any of those three> 
leads to some hits, so whether or not they're intended to be public, they are 
being used :-( )
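The comparison above can be automated. A minimal sketch that lists names which look public but are not exported, skipping imported modules; the result depends on the Python version it runs under, so no output is shown:

```python
import subprocess
import types

# Names without a leading underscore look public to a reader of dir().
looks_public = {name for name in dir(subprocess) if not name.startswith("_")}

# Drop imported modules (os, sys, ...), then subtract what __all__ exports.
not_exported = sorted(
    name
    for name in looks_public - set(subprocess.__all__)
    if not isinstance(getattr(subprocess, name), types.ModuleType)
)
```

Anything left in `not_exported` is invisible to "from subprocess import *" even though it has no underscore.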

--
components: Library (Lib)
messages: 125468
nosy: a.badger
priority: normal
severity: normal
status: open
title: subprocess __all__ is incomplete
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10838
___



[issue10838] subprocess __all__ is incomplete

2011-01-05 Thread Toshio Kuratomi

Toshio Kuratomi a.bad...@gmail.com added the comment:

IIRC, it was more along the lines of: all private names should be underscored.  
The difference being that we get to choose whether currently non-underscored 
names should get underscored, should be deprecated and then underscored, or 
should be made public, put into __all__, and properly documented.

I think there was general agreement that leaving them non-underscored but 
expecting people to treat them as private wasn't a good idea.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10838
___



[issue10838] subprocess __all__ is incomplete

2011-01-05 Thread Toshio Kuratomi

Toshio Kuratomi a.bad...@gmail.com added the comment:

For others' reference, there were three threads in November 2010 that touch on 
this:

  :About removing argparse.__all__ or adding more methods to it:
http://mail.python.org/pipermail/python-dev/2010-November/105147.html

  :Removing tk interface in pydoc:
http://mail.python.org/pipermail/python-dev/2010-November/105375.html

The most on topic thread is the one with Subject:
  :[Python-Dev] Breaking undocumented API:
http://mail.python.org/pipermail/python-dev/2010-November/105392.html

People broke threading a few times so you might have to search on the subject.

And ick.  The thread's more of a mess than I remembered.  Reading what Guido 
wrote last it seems like:

All private names should be prepended with _ .  Imported modules are the 
exception to this -- they're private unless included in __all__.  Reading 
between the lines I think it's also saying that not all public names need to be 
in __all__.

So to resolve this ticket:

1) Is this the actual consensus from the end of those threads?
2) Are the three names mentioned in this ticket public or private?
3a) If private, initiate deprecation and create underscore versions of the 
variables.
3b) If public, documentation and adding to __all__ are good but not necessary.
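For reference, a small sketch of how star-import treats underscores, imported modules and __all__. The module name `demo_mod` and its contents are hypothetical; only the three names from this ticket are reused as placeholders:

```python
import sys
import types


def star_import(source):
    """Return the names that 'from demo_mod import *' would bind."""
    mod = types.ModuleType("demo_mod")
    exec(source, mod.__dict__)
    sys.modules["demo_mod"] = mod
    try:
        namespace = {}
        exec("from demo_mod import *", namespace)
    finally:
        del sys.modules["demo_mod"]
    namespace.pop("__builtins__", None)
    return sorted(namespace)


body = (
    "import os\n"
    "MAXFD = 256\n"
    "def list2cmdline(seq): return ' '.join(seq)\n"
    "_private = 1\n"
)

# Without __all__: every non-underscore name leaks, including 'os'.
without_all = star_import(body)
# With __all__: only the listed names are exported.
with_all = star_import(body + "__all__ = ['list2cmdline']\n")
```

This is why imported modules need the exception described above: without __all__ they are exported by star-import like any other non-underscore name.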

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10838
___



[issue9561] distutils: set encoding to utf-8 for input and output files

2010-09-13 Thread Toshio Kuratomi

Toshio Kuratomi a.bad...@gmail.com added the comment:

> - RPM spec files, which use ASCII or UTF-8 according to
> http://en.opensuse.org/openSUSE:Specfile_guidelines#Specfile_Encoding but
> it’s not confirmed in
> http://www.rpm.org/max-rpm/s1-rpm-build-creating-spec-file.html (linked
> from the LSB site)
>
> UTF-8 is a superset of ASCII. If you use utf-8 but only write ascii
> characters, your output file will be written to utf-8... but it will be also
> encoded to ascii. It's magical :-)
>
> I know that, but it does not answer the question:  Is it okay for these files
> to use UTF-8?

rpm spec files are encoding agnostic similar to POSIX filesystems.  This causes 
no end of troubles for people writing python code that deals with python of 
course, as they cannot rely on the bytes that they are dealing with from one 
package to another to have the same encoding (Remember that things like 
dependency solvers have to compare the information from multiple packages to 
make their decisions).

Individual distributions will have different policies about encoding and the 
use of unicode in spec files to try and mitigate the problems.  For instance, 
Fedora specifies utf-8 in the spec files and additionally specifies that 
package names must be ascii.  (So if there's a package name: python-café, we 
would likely transcribe it as python-cafe when we made a package for it).

utf-8 is a good default for locales on POSIX systems so it's a good default for 
encoding spec files but I know there's some people out there who make their own 
packages that aren't utf-8.  I haven't checked but I also wouldn't be surprised 
if some Asian countries (where the bytes-per-character with utf-8 is high) have 
local distributions that use non-utf-8 encoding as well.  Whether either of 
these use cases needs to be catered to in distutils (when the support is going 
away in distutils2) I'll leave to someone else to decide.  My personal gut 
instinct is no but I'm not one of the people using a non-utf-8 locale.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9561
___



[issue1538778] pyo's are not overwritten by different optimization levels

2010-09-13 Thread Toshio Kuratomi

Toshio Kuratomi a.bad...@gmail.com added the comment:

It doesn't fix the problem as it falls into the third class of solutions (one 
that requires cooperation by the system administrator to diagnose and fix).

OTOH, at this point in time I'm putting all of my packages in system packages 
where the .pyos are pregenerated correctly so I personally won't be getting new 
bug reports on this problem so I personally don't need to see this fixed 
anymore.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1538778
___



[issue4359] at runtime, distutils uses buildtime files

2009-12-18 Thread Toshio Kuratomi

Toshio Kuratomi a.bad...@gmail.com added the comment:

Hey tarek, the main thrust of this bug for me was storing the data in an
inappropriate format and not having an API to get at it; things that I
think the sysconfig branch will address.  Does it make sense to have a
bug to track that progress?  Does it make sense for that bug to be this
one or a new one?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4359
___



[issue644744] bdist_rpm fails when installing man pages

2009-11-23 Thread Toshio Kuratomi

Toshio Kuratomi a.bad...@gmail.com added the comment:

sed is one of the programs we assume is always present when we build
packages in Fedora which is probably also what is wanted here.  (A
default install of Fedora will include sed but someone might be able to
create a minimal install that did not include it.) Note that within
Fedora we usually use a wildcard with man pages.  For example::

  %{_mandir}/man1/foo.1*

I'd suggest doing this rather than hardcoding .gz.  Automatic
compression of manpages could be disabled on other distros, set to
bzip2, compress, or xz instead.  Wildcarding the suffix will catch all
of these cases and be more future-proof.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue644744
___



[issue644744] bdist_rpm fails when installing man pages

2009-11-23 Thread Toshio Kuratomi

Toshio Kuratomi a.bad...@gmail.com added the comment:

Agreed.  The substitution is still needed.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue644744
___



[issue4006] os.getenv silently discards env variables with non-UTF-8 values

2008-11-24 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:

'''
@a.badger: The behaviour (drop non encodable strings) is not really a 
problem if you configure correctly your program and computer. Eg. you 
spoke about CGI-WSGI: if your website also speak UTF-8, you will be 
able to read all environment variables. So this issue is not 
important, it only appears when your website/OS is not well 
configured. I mean the problem is not in Python but outside Python. 
The PATH variable contains directory names, if you have only names 
encodable in your filesystem encoding (UTF-8 most of the time), you 
will be able to use the PATH variable. If a directory has an non 
decodable name, rename the directory but don't try to fix Python!
'''

The idea that having mixed encodings on a system is a misconfiguration 
is a fallacy.

1) In a multiuser setup, each user has a choice of what encoding to use.
 So mixed encodings are both possible and valid.

2) In a legacy system, your operating system may have all utf-8 naming
for the core OS but all of the old data files are being mounted with
another encoding that the legacy programs on the host expect.

3) On an nfs mount, data may come from users on different machines from
widely separated areas using different system encodings.

4) The same thing as 1-3 but applied to any of the data a site may be
passing via an environment variable rather than just file and directory
names.

5) An application may have to deal with different encodings from the
system default due to limitations of another program.  Since one of
python's many uses is as a glue language, it needs to be able to deal
with these quirks.

6) The application you're interfacing may just be using bytes rather
than text in the environment variables.

Let me put it this way:

If I write a file in a latin-1 encoding and put it on my system that has
a utf-8 system encoding what does python-3 do?

1) If I try to open it as a text file: open('filename', 'r') it throws
a UnicodeDecodeError when I attempt to read some non-utf-8 characters
from it.

2) As a programmer I then know to open it as binary open('filename',
'rb') and do my own decoding of the data now that I've been made aware
that I must take this corner case into account.
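The two steps can be reproduced with a throwaway file. A minimal sketch, assuming utf-8 stands in for the system encoding; the file name and contents are invented:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "latin1.txt")
    with open(path, "wb") as f:
        f.write("café\n".encode("latin-1"))  # 0xe9 is not valid utf-8 here

    # Step 1: reading as text fails loudly instead of dropping data.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        text_error = None
    except UnicodeDecodeError as exc:
        text_error = exc

    # Step 2: read the bytes and decode them explicitly.
    with open(path, "rb") as f:
        raw = f.read()

text = raw.decode("latin-1")
```

The exception in step 1 is what tells the programmer that step 2 is needed; a silently dropped value gives no such hint.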

Some notes:
1) This seems to be the right general procedure to take when handling
things that are usually text but can contain arbitrary bytes.

2) This makes use of python's exception infrastructure to tell the
programmer plainly what's going wrong instead of silently ignoring
values that the programmer may not have encountered in their test data
but could exist in the real world.  Would you rather get a bug report
from a user that says: "FooApp gives me a UnicodeDecodeError traceback
pointing at line 345" (how open() works) or "FooApp never authenticates
me" (which you then have to track down to the fact that the credentials
on the user's system are being passed in an env var and are not in the
system encoding)?

3) This learns the correct lesson from python-2's unicode problems: Stop
the mixture of bytes and unicode at the border so the programmer can be
explicit about how to deal with the odd-ball data there.  It does not
become squeamish about throwing a Unicode Exception which is the wrong
lesson to learn from python-2.

4) It also doesn't refuse to acknowledge that the world outside python
is not as simple and elegant as the world inside python and allows the
programmer to write an interface to that world instead of forcing them
to go outside of python to deal with it.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4006
___



[issue4006] os.getenv silently discards env variables with non-UTF-8 values

2008-11-24 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:

> The bug tracker is maybe not the right place to discuss a new Python3
> feature.

It's a bug!  But if you guys want it to be a feature, then what mailing
list do I need to join?  Is there one devoted to Unicode or is
python-dev where I need to go?

>> 1) return mixed unicode and byte types in os.environ
> One goal of Python3 was to avoid mixing bytes and characters (bytes/str).

As stated, in my evaluation of the four options, +1 to this, option #1 takes
us back to the problems encountered in python-2.

>> 2) return only byte types in os.environ
> os.environ contains text (characters) and so should decoded as unicode.

This is correct but is not accurate :-)  os.environ, the python variable,
contains only unicode because that's the way it's coded.  However, the Unix
environment which os.environ attempts to give access to contains bytes which
are almost always representable as characters.  The two caveats are:

1) There's nothing that constrains it to characters -- putting byte sequences
   that do not include null in the environment is valid.

2) The characters in the environment may be mixed encodings, sometimes due to
   things outside of the user's control.

>> 3) raise an exception if someone attempts to access an environment
>> variable that cannot be decoded to unicode via the system encoding and
>> allow the value to be accessed as a byte string via another method.
>> 4) silently ignore the non-decodable variables when accessing os.environ
>> the normal way but have another method of accessing it that returns all
>> values as byte strings.
>
> Why not for (3).

Do you mean, "I support 3"?  Or did you not finish a thought here?

> But what would be the another method (4) to access byte
> string? The problem of having two methods is that you need consistent
> objects.

This is exactly the problem I was talking about in my analysis of #4 in the
previous comment.  This problem plagues the new os.listdir() method as well by
introducing a construct that programmers can use that doesn't give all the
information (os.listdir('.')) but also doesn't warn the programmer when the
information is not being shown.

> Imagine that you have os.environ (unicode) and os.environb (bytes).
>
> Example 1:
>   os.environb['PATH'] = b'\xff\xff\xff\xff'
> What is the value in os.environ['PATH']?

Since option 4 mimics the os.listdir() method, accessing os.environ['PATH']
would give you a KeyError.  ie, the value was silently dropped just as
os.listdir('.') does.

> Example 2:
>   os.environb['PATH'] = b'têst'
> What is the value in os.environ['PATH']?

This doesn't work in python3 since byte strings can only be ASCII literals.

> Example 3:
>   os.environ['PATH'] = 'têst'
> What is the value in os.environb['PATH']?

Dependent on the default system encoding.  Assuming utf-8 encoding,
os.environb['PATH'] == b't\xc3\xaast'

> Example 4:
>  should I use os.environ['PATH'] or os.environb['PATH'] to get the current
>  PATH?

Should you use os.listdir('.') or os.listdir(b'.') to get the list of files in
the current directory?

This is where treating pathnames, environment variables and etc as strings
instead of bytes becomes non-simple.  Now you have to decide what you really
want to know (and possibly keep two slightly different values if you want to
know two things.)

If you want to keep the path in order to look up commands that the user can
run you want os.environb['PATH'] since this is exactly what the shell will use
when the user types a command at the commandline.

If you want to display the elements of the PATH for the user, you probably
want this::

  try:
      # Under option 4, an undecodable PATH is simply absent here.
      path = os.environ['PATH'].split(':')
  except KeyError:
      try:
          # Fall back to the raw bytes of the environment.
          temp_path = os.environb['PATH'].split(b':')
      except KeyError:
          path = DEFAULT_PATH
      else:
          path = []
          for directory in temp_path:
              # Decode for display only; bad bytes become U+FFFD.
              path.append(str(directory,
                              sys.getdefaultencoding(), 'replace'))
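The display half of that snippet can be tried on its own without touching the real environment. A minimal sketch; `display_path` is a hypothetical helper and the sample PATH is invented:

```python
def display_path(raw_path, encoding="utf-8"):
    """Split a bytes PATH and decode each entry for display only.

    Undecodable bytes become U+FFFD, so the result must not be used
    to look up commands; keep the bytes for that.
    """
    return [part.decode(encoding, "replace") for part in raw_path.split(b":")]


shown = display_path(b"/usr/bin:/opt/\xff\xff")
```

Keeping the bytes for lookups and a lossy string for display is the "two slightly different values" situation described above.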

> It introduces many new cases (bugs?) that have to be prepared and tested.

Those bugs are *already present*.  Without taking one of the four options,
there's simply no way to code a solution.  Take the above code and imagine
that there's no way to access the user's PATH variable when a
non-default-encoding character is present in the PATH.  That means that you're
always stuck with the value of DEFAULT_PATH instead of being able to display
something reasonable to the user.

(Note, these examples are pretty much the same for option #3 or option #4.  The
value of option #3 becomes apparent when you use os.getenv('PATH') instead of
os.environ['PATH'])

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4006
___



[issue4006] os.getenv silently discards env variables with non-UTF-8 values

2008-11-23 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:

Pardon, but when you close something as wontfix it's polite to say why.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4006
___



[issue1208304] urllib2's urlopen() method causes a memory leak

2008-11-23 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:

I tried to repeat the test in http://bugs.python.org/msg60749 and found
that the descriptors will close if you read from the file before closing.

so this leads to open descriptors::

  import urllib2
  f = urllib2.urlopen('http://www.google.com')
  f.close()

while this does not::

  import urllib2
  f = urllib2.urlopen('http://www.google.com')
  f.read(1)
  f.close()

--
nosy: +a.badger

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue1208304
___



[issue1208304] urllib2's urlopen() method causes a memory leak

2008-11-23 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:

One further data point.   On two rhel5 systems with identical kernels,
both x86_64, both python-2.4.3... basically, everything I've thought to
check identical, I ran the test code with f.read() in an infinite loop.
 One system only has one TCP socket in use at a time.  The other one has
multiple TCP sockets in use, but they all close eventually.

/usr/sbin/lsof -p INTERPRETER_PID|wc -l reported 

96 67 97 63 91 62 94 78

on subsequent runs.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue1208304
___



[issue4006] os.getenv silently discards env variables with non-UTF-8 values

2008-11-23 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:

Is it a bug?  If so, then it should be retargeted to 3.1 instead of
closed wontfix.  If it's not a bug then there should be an explanation
of why it's not a bug.

As for fixing it there are several inelegant methods that are better
than silently ignoring the problem:

1) return mixed unicode and byte types in os.environ
2) return only byte types in os.environ
3) raise an exception if someone attempts to access an environment
variable that cannot be decoded to unicode via the system encoding and
allow the value to be accessed as a byte string via another method.
4) silently ignore the non-decodable variables when accessing os.environ
the normal way but have another method of accessing it that returns all
values as byte strings.

#4 is closest to what was done with os.listdir().  However, I think that
approach is wrong for os.listdir() and os.environ because it leads to
code that works in simple testing but can start failing mysteriously
when it becomes used in more environments.  The os.listdir() method will
lead to lots of people having to write code that uses the byte methods
on Unix and does its own conversion because it's the only thing
guaranteed to work on Unix and the unicode methods on Windows because
it's the only thing guaranteed to work there.  It degenerates to case #2
except harder to debug and requiring more platform specific knowledge of
the programmer.

#3 seems like the best choice to me as it provides a way for the
programmer to discover what's wrong and provide a fix, but people seem to
have learned the wrong lessons from the python2 UnicodeEncode/Decode
problems, so that might not have a large following other than me.
#2 is conceptually correct since environment variables are a point where
you're receiving bytes from a non-python environment.  However, it's
very annoying for the common case where everything in the environment
has a single encoding.

#1 is the easiest for simplistic code to deal with but seems to violate
the python3 philosophy the most.  I don't like it as it takes us to one
of the real failings of python2's unicode handling: Not knowing what
type of data you're going to get back from a method and therefore not
knowing if you have to convert it before passing it on.  Please don't do
this one as it's two steps forward and one step backwards from where we
are now.
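To make option #3 concrete, here is a toy sketch. The class `StrictEnviron` and its `get_bytes` method are hypothetical names for illustration, not an existing or proposed stdlib API, and the sample variables are invented:

```python
class StrictEnviron:
    """Toy model of option 3: raise on undecodable values, but keep
    a separate accessor that always returns the raw bytes."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = dict(raw)  # bytes name -> bytes value
        self._encoding = encoding

    def __getitem__(self, name):
        value = self._raw[name.encode(self._encoding)]
        # UnicodeDecodeError propagates, telling the caller why.
        return value.decode(self._encoding)

    def get_bytes(self, name):
        return self._raw[name.encode(self._encoding)]


env = StrictEnviron({b"HOME": b"/home/user", b"PATH": b"/opt/\xff\xff/bin"})
home = env["HOME"]
raw_path = env.get_bytes("PATH")
```

Accessing `env["PATH"]` raises UnicodeDecodeError rather than KeyError, so the caller learns that the variable exists and can switch to `get_bytes`.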

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4006
___



[issue4006] os.getenv silently discards env variables with non-UTF-8 values

2008-11-23 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:

I'm sorry but "For the moment, this case is just not supported." is not
an explanation of why this is not a bug.  It is a statement that the
interpreter cannot handle a situation that has arisen.

If you said, "We don't believe that any computer has mixed encodings
that can show up in environment variables" that would be an explanation
of why this is not a bug and I could then give counter-examples of
computers that have mixed encodings in their environment variables.  So
what's the reason this is not a bug?

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4006
___



[issue4359] at runtime, distutils uses buildtime files

2008-11-19 Thread Toshio Kuratomi

New submission from Toshio Kuratomi [EMAIL PROTECTED]:

When using some distutils functions, distutils attempts to use buildtime
files like Makefile and pyconfig*.h as data sources.  For instance, this
snippet::

  from distutils.command.install import install
  from distutils.core import Distribution
  dist = Distribution({'name': 'foopkg'})
  cmd = install(dist)
  cmd.ensure_finalized()

There's two reasons this should change.

1) Some Linux distributions separate the python runtime and buildtime
files and put the buildtime files in a -devel package.  Depending on
these buildtime files means that the -devel package can be needed for
running python scripts.  For instance, here's the traceback that occurs
when the previous commands are run without python-devel on Fedora Linux::

  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/lib/python2.5/distutils/cmd.py", line 117, in ensure_finalized
      self.finalize_options()
    File "/usr/lib/python2.5/distutils/command/install.py", line 273, in finalize_options
      (prefix, exec_prefix) = get_config_vars('prefix', 'exec_prefix')
    File "/usr/lib/python2.5/distutils/sysconfig.py", line 493, in get_config_vars
      func()
    File "/usr/lib/python2.5/distutils/sysconfig.py", line 352, in _init_posix
      raise DistutilsPlatformError(my_msg)
  distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/lib/python2.5/config/Makefile (No such file or directory)

2) keeping the information in a Makefile and *.h files and then having
regular expressions pull the information out is fragile and not what the
tools were meant for.  Using a defined data format is much better.

The variables necessary for building extensions should be placed in a
data file of some sort.  This can be built by the configure script at
the same time as it's substituting variables into the Makefile and
pyconfig files.

xml is good for interoperability and we have good modules in the std
library for that now.  .ini is less verbose and we have modules to deal
with that as well.
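As a rough illustration of the .ini variant, assuming a made-up section name and only the two variables that appear in the traceback (the real set of build variables is much larger, and this layout is not a proposal from the report):

```python
import configparser

# Hypothetical data file that configure could emit next to the Makefile.
SAMPLE = """\
[build]
prefix = /usr
exec_prefix = /usr
"""


def load_build_vars(text):
    """Parse build variables from .ini text into a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["build"])


build_vars = load_build_vars(SAMPLE)
```

Reading a defined format like this needs no regular expressions and no development package installed at runtime.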

--
components: Distutils
messages: 76083
nosy: a.badger
severity: normal
status: open
title: at runtime, distutils uses buildtime files
type: behavior
versions: Python 2.5

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4359
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4036] Support bytes for subprocess.Popen()

2008-10-15 Thread Toshio Kuratomi

Changes by Toshio Kuratomi [EMAIL PROTECTED]:


--
nosy: +a.badger

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4036
___



[issue4126] remove not decodable environment variables

2008-10-15 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:


About your subprocess example: we choose to refuse it because we don't 
mix bytes (your non decodable PATH) and unicode ('myapp.sh')

If python3 is doing things right we shouldn't be mixing bytes and
unicode here:

1) the programmer is only sending unicode to subprocess, not a mixture
of bytes and unicode.

2) Python should be converting the arguments to subprocess.call() into
bytes before combining them with PATH, at least on Unix.  The conversion
to bytes is something Python has to do at some point before looking on
the filesystem for the command as filenames are a sequence of bytes in Unix.
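
A minimal sketch of that idea, assuming a Unix-like system and a Python version that provides os.fsencode() (the helper name find_in_path is made up for illustration and is not how subprocess is implemented): the command name is converted to bytes once, and the PATH lookup then happens entirely in bytes, so an undecodable directory entry is not a problem.

```python
import os
from typing import Optional


def find_in_path(command: str, path: bytes) -> Optional[bytes]:
    """Return the first executable match for `command` in `path`, as bytes."""
    name = os.fsencode(command)  # str -> bytes using the filesystem encoding
    for directory in path.split(os.fsencode(os.pathsep)):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or b".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

It could be called as find_in_path('myapp.sh', os.environb.get(b'PATH', b'')) on platforms where os.environb exists.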

Note: your patch for #4036 looks like the right thing to do for the args
argument but as you point out, that doesn't have bearing on the environment.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4126
___



[issue4126] remove not decodable environment variables

2008-10-14 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:

Yep :-)  I am against throwing away valid data just because we can't
interpret it automatically.  Environment variables in Unix hold bytes.
Those bytes are usually ASCII characters; however, they do not have to
be.  This is a case of being on the border between python and the
outside world so we need to be able to pass in bytes if the user
requests it.

Let's say that you have a local directory of: /home/\xff/username/bin in
your PATH environment variable and a command named myapp.sh in there.
At the shell you can happily run myapp.sh and it will do its thing.
Now you open your python shell and do:
subprocess.call(['myapp.sh'])

and it doesn't work.  This is non-intuitive behaviour for people who are
used to how the shell works.  All this patch will do is take away the
workaround of subprocess.call(['bash', 'myapp.sh'])


I tested Python 2.5: b is also removed, but Python 2.6 keeps the 
variable b.

I just tested python-2.5.1 and b is kept, not removed.

--
nosy: +a.badger

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4126
___



[issue4006] os.getenv silently discards env variables with non-UTF-8 values

2008-10-02 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:

It's not a feature it's a bug! :-)  (I hope you meant to have a smiley
too ;-)

As stated in the os.listdir() related bug, on Unix filesystems filenames
are a sequence of bytes.  The system encoding allows the user-level
tools to display the filenames as characters instead of byte sequences
and allows you to manipulate the filenames using characters instead of
byte sequences.  But if you change your locale the user level tools will
interpret the byte sequences as different characters and allow you free
access to create files in a different encoding.

So in order to work correctly on Unix you must be able to accept byte
sequences in place of filenames.

The sad fact of the matter is that while we can be all unicode with data
and strings inside of python we will always have to be prepared to
handle supposed strings as byte sequences when talking to some things
outside of ourselves.  Sometimes the border has a specification that
tells us what encoding to expect and we can do conversion automatically.
But when it doesn't we have to be prepared to 1) tell the user that the
data exists even though it isn't of the expected string type and 2) make
the byte sequence available to the user.
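
The two requirements can be sketched as follows. This is only an illustration of the principle, with a hypothetical helper name (decode_or_keep) and an assumed default encoding of utf-8; it is not an existing os-module API.

```python
from typing import Union


def decode_or_keep(raw: bytes, encoding: str = "utf-8") -> Union[str, bytes]:
    """Decode boundary data when possible; otherwise hand back the raw bytes.

    Returning bytes (rather than dropping the value) lets the caller see
    that the data exists and still get at the original byte sequence.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw


for value in (b"caf\xc3\xa9", b"caf\xe9"):
    result = decode_or_keep(value)
    kind = "text" if isinstance(result, str) else "undecodable bytes"
    print(kind, repr(result))
```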

Silently pretending that the data doesn't exist at all is a bug (maybe a
minor bug depending on how often we expect the situation to arise but
still a bug.)

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4006
___



[issue3991] urllib.request.urlopen does not handle non-ASCII characters

2008-10-01 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:

Oh, that's cool.  I've been fine with this being a request for a needed
function to quote and unquote full urls rather than a bug in urlopen().

I think IRIs are a distraction here, though.  The RFC for IRIs even
says that specifications that call for URIs and do not mention IRIs
should not take IRIs.  So there's definitely a need for a function to
quote a full URI.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3991
___



[issue4006] os.getenv silently discards env variables with non-UTF-8 values

2008-10-01 Thread Toshio Kuratomi

New submission from Toshio Kuratomi [EMAIL PROTECTED]:

On a Linux system with a locale setting whose encoding is utf-8, if you
set an environment variable to have a non-utf-8 character, that
environment variable silently does not appear in os.environ::

  mkdir ñ
  convmv -f utf-8 -t latin-1 --notest ñ
  for i in * ; do export PATH=$PATH:$i ; done
  echo $PATH
  /usr/lib/qt-3.3/bin:/usr/kerberos/bin:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/home/badger/bin:�
  python3.0
  Python 3.0rc1 (r30rc1:66499, Sep 28 2008, 08:21:09)
  [GCC 4.3.0 20080428 (Red Hat 4.3.0-8)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import os
  >>> os.environ['PATH']
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/lib/python3.0/os.py", line 389, in __getitem__
      return self.data[self.keymap(key)]
  KeyError: 'PATH'

I'm uncertain of the impact of this.  It was brought up in a discussion
of sending non-ASCII data to a CGI-WSGI script where the data would be
transferred via os.environ.
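
For comparison, a minimal sketch of what keeping such a value could look like, assuming Python 3.2 or later on Unix, where os.environb exposes the raw bytes and os.environ decodes them with the surrogateescape error handler. The variable name DEMO_NON_UTF8 is hypothetical.

```python
import os


def roundtrip_env(name: bytes, value: bytes) -> bytes:
    """Store raw bytes in the environment and read them back via os.environ."""
    os.environb[name] = value                 # bytes view (Unix only)
    as_text = os.environ[os.fsdecode(name)]   # str view of the same variable
    # fsencode() reverses the surrogateescape decoding, recovering the bytes.
    return os.fsencode(as_text)


print(roundtrip_env(b"DEMO_NON_UTF8", b"caf\xe9"))
```

With this behaviour the variable stays visible in os.environ instead of silently disappearing, and the original bytes remain recoverable.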

--
components: Unicode
messages: 74118
nosy: a.badger
severity: normal
status: open
title: os.getenv silently discards env variables with non-UTF-8 values
type: behavior

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4006
___



[issue3991] urllib.request.urlopen does not handle non-ASCII characters

2008-09-30 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:

The purpose of such a function would be to take something that is not a
valid uri but 1) is a common way of expressing the way to get to the
resource and 2) follows certain rules, and turn that into something that
is a valid uri.  Non-ASCII strings in the path are a good example of
this since there is a well defined method to encode the strings into the
URL if you are given a character encoding to apply to it.

My first, naive thought is that if the input can be parsed by
urlparse(), then there is a very good chance that we have the ability to
escape the string properly.  Looking at the invalid uri that I gave, for
instance, if you additionally specified an encoding for the path element
there's no reason a function couldn't do the escaping.
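
A minimal sketch of such a function, assuming the caller supplies the encoding and that only the path component needs escaping. The name quote_full_url is hypothetical; the query string, fragment, and any non-ASCII host name are passed through untouched here, which a real implementation would have to address.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_full_url(url: str, encoding: str = "utf-8") -> str:
    """Percent-encode the path of `url`, leaving the other components alone."""
    parts = urlsplit(url)
    # quote() encodes the text with `encoding` and escapes the resulting bytes.
    path = quote(parts.path, safe="/", encoding=encoding)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(quote_full_url("http://localhost/u/½ñ.html"))
print(quote_full_url("http://localhost/u/½ñ.html", encoding="latin-1"))
```

The two calls produce different escapes for the same characters, which is why the encoding has to be an explicit input rather than something the function guesses.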

What are example inputs that you are concerned about?  I'll see if I can
come up with code that works with them.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3991
___



[issue3991] urllib.request.urlopen does not handle non-ASCII characters

2008-09-29 Thread Toshio Kuratomi

Toshio Kuratomi [EMAIL PROTECTED] added the comment:

Possibly.  This is a change from python-2.x's urlopen() which escaped
the URL automatically, though.  I can see the case for having the user
call an escape function themselves instead of having urlopen() perform
the escape for them.  However, that function would need to be written.
(The present parse.quote() method only quotes correctly if only the path
component is passed; there's no function to take a full URL and quote it
appropriately.)
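
To illustrate the limitation, here is a small sketch using urllib.parse.quote() with its default arguments as they behave in current Python 3 releases: passing a whole URL escapes the colon after the scheme, while passing only the path gives a usable result.

```python
from urllib.parse import quote

url = "http://localhost/u/½ñ.html"

whole = quote(url)                 # the ':' after the scheme gets escaped too
path_only = quote("/u/½ñ.html")    # only the non-ASCII characters are escaped

print(whole)
print(path_only)
```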

Without such a function, a whole lot of code bases will have to reinvent
the wheel creating functions to parse the path out, run it through
urllib.parse.quote() and then pass the result to urllib.request.urlopen().

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3991
___



[issue3991] urllib.request.urlopen does not handle non-ASCII characters

2008-09-28 Thread Toshio Kuratomi

New submission from Toshio Kuratomi [EMAIL PROTECTED]:

Tested on python-3.0rc1 -- Linux Fedora 9

I wanted to make sure that python3.0 would handle URLs in different
encodings.  So I created two files on an apache server which were named
½ñ.html.  One of the filenames was encoded in utf-8 and the other in
latin-1.  Then I tried the following::

  from urllib.request import urlopen
  url = 'http://localhost/u/½ñ.html'
  urlopen(url.encode('utf-8')).read()

  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/lib/python3.0/urllib/request.py", line 122, in urlopen
      return _opener.open(url, data, timeout)
    File "/usr/lib/python3.0/urllib/request.py", line 350, in open
      req.timeout = timeout
  AttributeError: 'bytes' object has no attribute 'timeout'

The same thing happens if I give None for the two optional arguments
(data and timeout).

Next I tried using a raw Unicode string::

  >>> urlopen(url).read()
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/lib/python3.0/urllib/request.py", line 122, in urlopen
      return _opener.open(url, data, timeout)
    File "/usr/lib/python3.0/urllib/request.py", line 359, in open
      response = self._open(req, data)
    File "/usr/lib/python3.0/urllib/request.py", line 377, in _open
      '_open', req)
    File "/usr/lib/python3.0/urllib/request.py", line 337, in _call_chain
      result = func(*args)
    File "/usr/lib/python3.0/urllib/request.py", line 1082, in http_open
      return self.do_open(http.client.HTTPConnection, req)
    File "/usr/lib/python3.0/urllib/request.py", line 1068, in do_open
      h.request(req.get_method(), req.get_selector(), req.data, headers)
    File "/usr/lib/python3.0/http/client.py", line 843, in request
      self._send_request(method, url, body, headers)
    File "/usr/lib/python3.0/http/client.py", line 860, in _send_request
      self.putrequest(method, url, **skips)
    File "/usr/lib/python3.0/http/client.py", line 751, in putrequest
      self._output(request.encode('ascii'))
  UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)

So, in python-3.0rc1, this method is badly broken.
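
The second failure can be reproduced in isolation. In this sketch the request line is a simplified stand-in for what http.client builds internally (an assumption, not copied from the library); encoding it as ASCII fails at the two non-ASCII characters of the path.

```python
# Assumption: a simplified request line, not the exact string http.client sends.
request_line = "GET /u/½ñ.html HTTP/1.1"

failure = None
try:
    request_line.encode("ascii")
except UnicodeEncodeError as exc:
    # exc.start is inclusive and exc.end is exclusive, so (7, 9) covers
    # the characters at positions 7 and 8.
    failure = (exc.start, exc.end)

print(failure)
```

The range lines up with the "position 7-8" reported in the traceback: the unescaped path characters reach the ASCII encoding step unchanged.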

--
components: Unicode
messages: 73982
nosy: a.badger
severity: normal
status: open
title: urllib.request.urlopen does not handle non-ASCII characters
versions: Python 3.0

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3991
___



[issue2211] Cookie.Morsel interface needs update

2008-08-04 Thread Toshio Kuratomi

Changes by Toshio Kuratomi [EMAIL PROTECTED]:


--
nosy: +a.badger

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2211
___