Re: Mutating an HTML file with BeautifulSoup

2022-08-21 Thread Buck Evan
I've had much success doing round trips through the lxml.html parser.

https://lxml.de/lxmlhtml.html

I ditched bs for lxml long ago and never regretted it.

If you find that you have a bunch of invalid html that lxml inadvertently
"fixes", I would recommend adding a stutter-step to your project: perform a
noop roundtrip thru lxml on all files. I'd then analyze any diff by
progressively excluding changes via `grep -vP`.
Unless I'm mistaken, all such changes should fall into no more than a dozen
groups.
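
To make the stutter-step concrete, here is a minimal sketch of the no-op round trip and the diff. It assumes lxml is installed for the parsing half; `lxml_roundtrip` and `roundtrip_diff` are names invented for this illustration, not part of lxml.

```python
import difflib


def lxml_roundtrip(text):
    """Parse and re-serialize HTML without making any deliberate change."""
    # Imported lazily so the diff helper below also works without lxml.
    import lxml.html

    tree = lxml.html.fromstring(text)
    return lxml.html.tostring(tree, encoding="unicode")


def roundtrip_diff(original, roundtrip):
    """Return unified-diff lines between a document and its round trip."""
    rewritten = roundtrip(original)
    return list(
        difflib.unified_diff(
            original.splitlines(),
            rewritten.splitlines(),
            fromfile="original",
            tofile="roundtrip",
            lineterm="",
        )
    )
```

Calling `roundtrip_diff(text, lxml_roundtrip)` on each file gives the lines that the parser "fixed"; those are the lines you would then whittle down with `grep -vP`.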




On Fri, Aug 19, 2022, 1:34 PM Chris Angelico  wrote:

> What's the best way to precisely reconstruct an HTML file after
> parsing it with BeautifulSoup?
>
> Using the Alice example from the BS4 docs:
>
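> >>> html_doc = """<html><head><title>The Dormouse's story</title></head>
> <body>
> <p class="title"><b>The Dormouse's story</b></p>
>
> <p class="story">Once upon a time there were three little sisters; and
> their names were
> <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
> <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
> <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
> and they lived at the bottom of a well.</p>
>
> <p class="story">...</p>
> """
> >>> print(soup)
> <html><head><title>The Dormouse's story</title></head>
> <body>
> <p class="title"><b>The Dormouse's story</b></p>
> <p class="story">Once upon a time there were three little sisters; and
> their names were
> <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
> <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
> <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
> and they lived at the bottom of a well.</p>
> <p class="story">...</p>
> </body></html>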
> >>>
>
> Note two distinct changes: firstly, whitespace has been removed, and
> secondly, attributes are reordered (I think alphabetically). There are
> other canonicalizations being done, too.
>
> I'm trying to make some automated changes to a huge number of HTML
> files, with minimal diffs so they're easy to validate. That means that
> spurious changes like these are very much unwanted. Is there a way to
> get BS4 to reconstruct the original precisely?
>
> The mutation itself would be things like finding an anchor tag and
> changing its href attribute. Fairly simple changes, but might alter
> the length of the file (eg changing "http://example.com/" into
> "https://example.com/"). I'd like to do them intelligently rather than
> falling back on element.sourceline and element.sourcepos, but worst
> case, that's what I'll have to do (which would be fiddly).
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>


Re: New Python implementation

2021-02-15 Thread Buck Evan
On Thu, Feb 11, 2021 at 1:49 PM dn via Python-list 
wrote:

> When I first met it, one of the concepts I found difficult to 'wrap my
> head around' was the idea that "open software" allowed folk to fork the
> original work and 'do their own thing'. My thinking was (probably)
> "surely, the original is the authoritative version". Having other
> versions seemed an invitation to confusion and dilution.
>
> However, as soon as (open) software is made available, other people
> start making it 'better' - whatever their own definition of "better".
>
> Yes, it is both a joy and a complication.
>
> ...
>
> Wishing you well. It seems (to (neos-ignorant) me at least) an ambitious
> project. There are certainly times when 'execution speed' becomes a
> major criteria. Many of us will look forward to (your development of) a
> solution. Please let us know when it's ready for use/trials...
>

Well put! Thank you for this thoughtful and informative message. You
obviously put substantial work into it.


Re: Explicit vararg values

2018-09-22 Thread Buck Evan
Received?

On Sun, Sep 16, 2018 at 3:39 PM Buck Evan  wrote:

> I started to send this to python-ideas, but I'm having second thoughts.
> Does this have merit?
>
> ---
> I stumble on this a lot, and I see it in many python libraries:
>
> def f(*args, **kwargs):
> ...
>
> f(*[list comprehension])
> f(**mydict)
>
> It always seems a shame to carefully build up an object in order to
> explode it, just to pack it into a near-identical object.
>
> Today I was fiddling with the new python3.7 inspect.signature
> functionality when I ran into this case:
>
> def f(**kwargs): pass
> sig = inspect.signature(f)
> print(sig.bind(a=1, b=2))
>
> The output is "<BoundArguments (kwargs={'a': 1, 'b': 2})>". I found this a
> bit humorous since anyone attempting to bind values in this way, using
> f(kwargs={'a': 1, 'b': 2}) will be sorely disappointed. I also wondered
> why BoundArguments didn't print '**kwargs' since that's the __str__ of that
> parameter object.
>
> The syntax I'm proposing is:
>f(**kwargs={'a': 1, 'b': 2})
>
> as a synonym of f(a=1, b=2) when an appropriate dictionary is already on
> hand.
>
> ---
> I can argue for this another way as well.
>
> 1)
> When both caller and callee have a known number of values to pass/receive,
> that's the usual syntax:
> def f(x) and f(1)
>
> 2)
> When the caller has a fixed set of values, but the callee wants to handle
> a variable number:   def f(*args) and f(1)
>
> 3)
> Caller has a variable number of arguments (varargs) but the call-ee is
> fixed, that's the splat operator: def f(x) and f(*args)
>
> 4)
> When cases 2 and 3 cross paths, and we have a vararg in both the caller and
> callee, right now we're forced to splat both sides: def f(*args) and
> f(*args), but I'd like the option of opting-in to passing along my list
> as-is with no splat or collection operations involved: def f(*args) and
> f(*args=args)
>
> Currently the pattern to handle case 4 neatly is to define two versions of
> a vararg function:
>
> def f(*args, **kwargs):
>     return _f(args, kwargs)
>
> def _f(args, kwargs):
>     ...
>
> Such that when internal callers hit case 4, there's a simple and
> efficient way forward -- use the internal de-vararg'd  definition of f.
> External callers have no such option though, without breaking protected api
> convention.
>
> My proposal would simplify this implementation as well as allowing users
> to make use of a similar calling convention that was only provided
> privately before.
>
> Examples:
>
> log(*args) and _log(args) in logging.Logger
> format and vformat of string.Formatter
>


Explicit vararg values

2018-09-17 Thread Buck Evan
I started to send this to python-ideas, but I'm having second thoughts.
Does this have merit?

---
I stumble on this a lot, and I see it in many python libraries:

def f(*args, **kwargs):
    ...

f(*[list comprehension])
f(**mydict)

It always seems a shame to carefully build up an object in order to explode
it, just to pack it into a near-identical object.

Today I was fiddling with the new python3.7 inspect.signature functionality
when I ran into this case:

def f(**kwargs): pass
sig = inspect.signature(f)
print(sig.bind(a=1, b=2))

The output is "<BoundArguments (kwargs={'a': 1, 'b': 2})>". I found this a
bit humorous since anyone attempting to bind values in this way, using
f(kwargs={'a': 1, 'b': 2}) will be sorely disappointed. I also wondered
why BoundArguments didn't print '**kwargs' since that's the __str__ of that
parameter object.
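
For anyone who wants to poke at this, here is a small runnable version of the case above. It only uses the documented `inspect` API; it illustrates the current behaviour and not the proposed syntax, which is not valid Python.

```python
import inspect


def f(**kwargs):
    pass


sig = inspect.signature(f)
bound = sig.bind(a=1, b=2)

# The keyword arguments are collected under the parameter's name...
print(bound.arguments["kwargs"])
# ...while the parameter object itself prints with its ** prefix.
print(sig.parameters["kwargs"])
# BoundArguments can re-explode the values for a real call.
f(*bound.args, **bound.kwargs)
```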

The syntax I'm proposing is:
   f(**kwargs={'a': 1, 'b': 2})

as a synonym of f(a=1, b=2) when an appropriate dictionary is already on
hand.

---
I can argue for this another way as well.

1)
When both caller and callee have a known number of values to pass/receive,
that's the usual syntax:
def f(x) and f(1)

2)
When the caller has a fixed set of values, but the callee wants to handle a
variable number:   def f(*args) and f(1)

3)
Caller has a variable number of arguments (varargs) but the call-ee is
fixed, that's the splat operator: def f(x) and f(*args)

4)
When cases 2 and 3 cross paths, and we have a vararg in both the caller and
callee, right now we're forced to splat both sides: def f(*args) and
f(*args), but I'd like the option of opting-in to passing along my list
as-is with no splat or collection operations involved: def f(*args) and
f(*args=args)

Currently the pattern to handle case 4 neatly is to define two versions of
a vararg function:

def f(*args, **kwargs):
    return _f(args, kwargs)

def _f(args, kwargs):
    ...

Such that when internal callers hit case 4, there's a simple and efficient
way forward -- use the internal de-vararg'd  definition of f. External
callers have no such option though, without breaking protected api
convention.

My proposal would simplify this implementation as well as allowing users to
make use of a similar calling convention that was only provided privately
before.

Examples:

log(*args) and _log(args) in logging.Logger
format and vformat of string.Formatter


[issue34706] Signature.from_callable sometimes drops subclassing

2018-09-16 Thread Buck Evan


Change by Buck Evan :


--
type:  -> behavior

___
Python tracker 
<https://bugs.python.org/issue34706>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34706] Signature.from_callable sometimes drops subclassing

2018-09-16 Thread Buck Evan


New submission from Buck Evan :

Specifically in the case of a class that does not override its constructor 
signature inherited from object.

Github PR incoming shortly.
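
A rough way to observe the case being described, as a sketch only: `MySignature`, `Plain` and `WithInit` are hypothetical names, and which class comes back for `Plain` depends on the Python version you run it on.

```python
import inspect


class MySignature(inspect.Signature):
    """Hypothetical Signature subclass, used only for this illustration."""


class Plain:
    """A class that keeps the constructor it inherits from object."""


class WithInit:
    def __init__(self, x):
        self.x = x


for cls in (Plain, WithInit):
    sig = MySignature.from_callable(cls)
    # Report which class each result actually is; the report above is about
    # the Plain case coming back as a bare inspect.Signature.
    print(cls.__name__, type(sig).__name__, sig)
```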

--
components: Library (Lib)
messages: 325501
nosy: bukzor
priority: normal
severity: normal
status: open
title: Signature.from_callable sometimes drops subclassing
versions: Python 3.7

___
Python tracker 
<https://bugs.python.org/issue34706>
___



[issue24085] large memory overhead when pyc is recompiled

2015-05-04 Thread Buck Evan

Buck Evan added the comment:

@serhiy.storchaka This is a very stable piece of a legacy code base, so we're 
not keen to refactor it so dramatically, although we could. 

We've worked around this issue by compiling pyc files ahead of time and taking 
extra care that they're preserved through deployment. This isn't blocking our 
2.7 transition anymore.
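
For reference, the ahead-of-time compile step can be done with the stdlib `compileall` module. This is a sketch under assumptions: the temporary directory and `legacy_module.py` stand in for a real deployment tree, and on Python 3 the output lands in `__pycache__` whereas 2.7 writes the `.pyc` next to the source.

```python
import compileall
import pathlib
import tempfile

# Stand-in for a deployment directory; the module name is made up.
deploy_dir = pathlib.Path(tempfile.mkdtemp())
(deploy_dir / "legacy_module.py").write_text(
    "DATA = [{'k': i} for i in range(10)]\n"
)

# Byte-compile everything under the directory before shipping it.
ok = compileall.compile_dir(str(deploy_dir), quiet=1)

compiled = sorted(p.name for p in deploy_dir.rglob("*.pyc"))
print(bool(ok), compiled)
```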

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24085
___



[issue24085] large memory overhead when pyc is recompiled

2015-05-01 Thread Buck Evan

Buck Evan added the comment:

New data: The memory consumption seems to be in the compiler rather than the 
marshaller:


```
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
16032
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
16032
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
16032

$ python -c 'import repro'
16032

$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
8984
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
8984
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
8984
```

We were trying to use PYTHONDONTWRITEBYTECODE as a workaround to this issue, 
but it didn't help us because of this.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24085
___



[issue24085] large memory overhead when pyc is recompiled

2015-04-30 Thread Buck Evan

New submission from Buck Evan:

In the attached example I show that there's a significant memory overhead 
present whenever a pre-compiled pyc is not present.

This only occurs with more than 5225 objects (dictionaries in this case)
allocated. At 13756 objects, the mysterious pyc overhead is 50% of memory
usage.

I've reproduced this issue in python 2.6, 2.7, 3.4. I imagine it's present in 
all cpythons.


$ python -c 'import repro'
16736
$ python -c 'import repro'
8964
$ python -c 'import repro'
8964

$ rm *.pyc; python -c 'import repro'
16740
$ rm *.pyc; python -c 'import repro'
16736
$ rm *.pyc; python -c 'import repro'
16740

--
files: repro.py
messages: 242281
nosy: bukzor
priority: normal
severity: normal
status: open
title: large memory overhead when pyc is recompiled
versions: Python 3.4
Added file: http://bugs.python.org/file39238/repro.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24085
___



[issue24085] large memory overhead when pyc is recompiled

2015-04-30 Thread Buck Evan

Buck Evan added the comment:

Also, we've reproduced this in both linux and osx.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24085
___



[issue5945] PyMapping_Check returns 1 for lists

2015-02-02 Thread Buck Golemon

Buck Golemon added the comment:

We've hit this problem today.

What are we supposed to do in the meantime?

--
nosy: +bukzor

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5945
___



[issue22722] inheritable pipes are unwieldy without os.pipe2

2014-10-24 Thread Buck Golemon

New submission from Buck Golemon:

In order to make an inheritable pipe, the code is quite a bit different between 
posixes that implement pipe2 and those that don't (osx, mainly). I believe the 
officially-supported path is to call os.pipe() then os.setinheritable(). This 
seems objectionable since set_inheritable() code is invoked twice, where I'd 
prefer to invoke it zero times (or at most once).

Would it be acceptable to implement a pipe2 shim for those platforms?
If so, I'll (attempt to) provide a patch.

Alternatively, can we change the signature of os.pipe() to 
os.pipe(flags=O_CLOEXEC) ?  In my opinion, such a function could be implemented 
via pipe2 on those platforms that provide it, obviating any need for an 
os.pipe2.

Please tell me which patch to provide, if any.
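
For concreteness, this is roughly what the portable path looks like today, as I understand it. `inheritable_pipe` is a name made up for this sketch; `os.pipe`, `os.set_inheritable` and `os.get_inheritable` are the real 3.4 functions.

```python
import os


def inheritable_pipe():
    """Return a (read_fd, write_fd) pair that child processes inherit."""
    read_fd, write_fd = os.pipe()  # both ends start out non-inheritable
    os.set_inheritable(read_fd, True)
    os.set_inheritable(write_fd, True)
    return read_fd, write_fd


r, w = inheritable_pipe()
print(os.get_inheritable(r), os.get_inheritable(w))
os.close(r)
os.close(w)
```

On platforms that have it, `os.pipe2(0)` gets the same result in a single call, which is the asymmetry this report is about.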

--
messages: 229947
nosy: bukzor
priority: normal
severity: normal
status: open
title: inheritable pipes are unwieldy without os.pipe2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22722
___



[issue22723] visited-link styling is not accessible

2014-10-24 Thread Buck Golemon

New submission from Buck Golemon:

The color needs to be adjusted such that it has at least 3:1 luminance contrast 
versus the surrounding non-link text. (See non-inheritable 
https://docs.python.org/3/library/os.html#os.dup)

See also:
 * http://www.w3.org/TR/WCAG20/#visual-audio-contrast-without-color
 * 
http://www.w3.org/WAI/WCAG20/Techniques/working-examples/G183/link-contrast.html

Given that the surrounding text is #222, the a:visited color should be bumped 
from #30306f to #6363bb in order to meet the 3:1 luminance-contrast guideline 
while preserving the hue and saturation.

By the same calculation, the un-visited links are slightly too dark and should 
be bumped from #00608f to #0072aa

Validation was done here: 
http://juicystudio.com/services/luminositycontrastratio.php

Luminance adjustments done here: http://colorizer.org/
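
The check can also be reproduced offline. Below is a sketch of the WCAG 2.0 relative-luminance and contrast-ratio formulas as I read them from the guideline; `relative_luminance` and `contrast_ratio` are names chosen for this example.

```python
def relative_luminance(hex_color):
    """WCAG 2.0 relative luminance of an sRGB color like '#222' or '#30306f'."""
    digits = hex_color.lstrip("#")
    if len(digits) == 3:  # expand shorthand: '222' -> '222222'
        digits = "".join(ch * 2 for ch in digits)
    channels = [int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Ratio of the lighter luminance to the darker one, each offset by 0.05."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Current and proposed link colors against the #222 body text.
for link in ("#30306f", "#6363bb", "#00608f", "#0072aa"):
    print(link, round(contrast_ratio(link, "#222"), 2))
```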

--
assignee: docs@python
components: Documentation
messages: 229952
nosy: bukzor, docs@python
priority: normal
severity: normal
status: open
title: visited-link styling is not accessible

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22723
___



[issue22722] inheritable pipes are unwieldy without os.pipe2

2014-10-24 Thread Buck Golemon

Buck Golemon added the comment:

I notice that dup2 grew an `inheritable=True` argument in 3.4.
This might be a good precedent to use here, as a third option.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22722
___



[issue22723] visited-link styling is not accessible

2014-10-24 Thread Buck Golemon

Buck Golemon added the comment:

Proposed patch attached.

--
keywords: +patch
Added file: http://bugs.python.org/file37006/link-color.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22723
___



[issue22455] idna/punycode give wrong results on narrow builds

2014-09-21 Thread Buck Golemon

New submission from Buck Golemon:

I have fixed the issue in my branch here:
https://github.com/bukzor/cpython/commit/013e689731ba32319f05a62a602f01dd7d7f2e83

I don't propose it as a patch, but as a proof of concept and point of 
discussion.

If there's no chance of shipping a fix in 2.7.9, feel free to close.

--
messages: 227240
nosy: bukzor
priority: normal
severity: normal
status: open
title: idna/punycode give wrong results on narrow builds
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22455
___



Re: python 3.44 float addition bug?

2014-06-23 Thread buck
It used to be that the best way to compare floating point numbers while 
disregarding the inherent epsilon was to use `str(x) == str(y)`. It looks like 
that workaround doesn't work anymore in 3.4.

What's the recommended way to do this now?

>>> format(.01 + .01 + .01 + .01 + .01 + .01, 'g') == format(.06, 'g')
True
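
One candidate worth comparing against the `format` trick is a tolerance-based check. This is a sketch, assuming Python 3.5 or later, where `math.isclose` exists; it is not available in 3.4 itself.

```python
import math

total = 0.01 + 0.01 + 0.01 + 0.01 + 0.01 + 0.01

# Exact comparison of the stored binary values.
print(total == 0.06)
# Relative-tolerance comparison (default rel_tol is 1e-09).
print(math.isclose(total, 0.06))
# Compare after rounding both sides to 6 significant digits.
print(format(total, "g") == format(0.06, "g"))
```

The tolerance version states how much error is acceptable, instead of relying on how many digits a particular string conversion happens to print.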


On Saturday, June 21, 2014 12:24:24 PM UTC-7, Ned Deily wrote:
 In article 
 
 captjjmrkpd5k__h9qg12q+arafzvan6egudtmedge2ccaqe...@mail.gmail.com,
 
  Chris Angelico ros...@gmail.com wrote:
 
  Also, when you're looking at how things print out, consider looking at
 
  two things: the str() and the repr(). Sometimes just print(p)
 
  doesn't give you all the info, so you might instead want to write your
 
  loop thus:
 
  
 
  z = 0.01
  p = 0.0
  for i in range(19):
      p += z
      print(str(p) + " -- " + repr(p))
 
  Sometimes you can get extra clues that way, although in this instance
 
  I think you won't.
 
 
 
 Actually, I think this is one case where you would get extra clues (or 
 
 extra headscratching) if you run the code with various releases of 
 
 Python.
 
 
 
 $ python2.6 b.py
 
 0.01 -- 0.01
 
 0.02 -- 0.02
 
 0.03 -- 0.02
 
 0.04 -- 0.040001
 
 0.05 -- 0.050003
 
 0.06 -- 0.060005
 
 0.07 -- 0.070007
 
 0.08 -- 0.080002
 
 0.09 -- 0.089997
 
 0.1 -- 0.02
 
 0.11 -- 0.10999
 
 0.12 -- 0.11998
 
 0.13 -- 0.12998
 
 0.14 -- 0.13999
 
 0.15 -- 0.14999
 
 0.16 -- 0.16
 
 0.17 -- 0.17001
 
 0.18 -- 0.18002
 
 0.19 -- 0.19003
 
 
 
 $ python2.7 b.py
 
 0.01 -- 0.01
 
 0.02 -- 0.02
 
 0.03 -- 0.03
 
 0.04 -- 0.04
 
 0.05 -- 0.05
 
 0.06 -- 0.060005
 
 0.07 -- 0.07
 
 0.08 -- 0.08
 
 0.09 -- 0.09
 
 0.1 -- 0.0
 
 0.11 -- 0.10999
 
 0.12 -- 0.11998
 
 0.13 -- 0.12998
 
 0.14 -- 0.13999
 
 0.15 -- 0.15
 
 0.16 -- 0.16
 
 0.17 -- 0.17
 
 0.18 -- 0.18002
 
 0.19 -- 0.19003
 
 
 
 $ python3.4 b.py
 
 0.01 -- 0.01
 
 0.02 -- 0.02
 
 0.03 -- 0.03
 
 0.04 -- 0.04
 
 0.05 -- 0.05
 
 0.060005 -- 0.060005
 
 0.07 -- 0.07
 
 0.08 -- 0.08
 
 0.09 -- 0.09
 
 0.0 -- 0.0
 
 0.10999 -- 0.10999
 
 0.11998 -- 0.11998
 
 0.12998 -- 0.12998
 
 0.13999 -- 0.13999
 
 0.15 -- 0.15
 
 0.16 -- 0.16
 
 0.17 -- 0.17
 
 0.18002 -- 0.18002
 
 0.19003 -- 0.19003
 
 
 
 What's going on here is that in Python 2.7 the repr() of floats was 
 
 changed to use the minimum number of digits to accurately roundtrip the 
 
 number under correct rounding.  For compatibility reasons, the str() 
 
 representation was not changed for 2.7.  But in Python 3.2, str() was 
 
 changed to be identical to repr() for floats.  It's important to keep in 
 
 mind that the actual binary values stored in float objects are the same 
 
 across all of these releases; only the representation of them as decimal 
 
 characters varies.
 
 
 
 https://docs.python.org/2.7/whatsnew/2.7.html#other-language-changes
 
 
 
 http://bugs.python.org/issue9337
 
 
 
 -- 
 
  Ned Deily,
 
  n...@acm.org


[issue1243678] httplib gzip support

2014-01-24 Thread Buck Golemon

Buck Golemon added the comment:

I believe this issue is still extant.

The tip httplib client neither sends accept-encoding gzip nor supports 
content-encoding gzip.

http://hg.python.org/cpython/file/tip/Lib/http/client.py#l1012

There is a diff to httplib in this attached patch, where there was none in 
#1675951.
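
Until the client handles this itself, callers have to do both halves by hand. A minimal sketch of that, using a simulated response body so it runs without a network connection; `REQUEST_HEADERS` and `decode_body` are names invented for the example.

```python
import gzip

# Header a client would add to advertise gzip support.
REQUEST_HEADERS = {"Accept-Encoding": "gzip"}


def decode_body(body, content_encoding):
    """Undo Content-Encoding: gzip on a response body, if present."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


# Simulated response so the sketch does not need a server.
compressed = gzip.compress(b"hello world")
print(decode_body(compressed, "gzip"))
print(decode_body(b"hello world", None))
```

With `http.client` the headers dict would go to `conn.request(..., headers=REQUEST_HEADERS)` and the encoding would come from `response.getheader("Content-Encoding")`.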

--
nosy: +Buck.Golemon

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1243678
___



Re: graphical python

2014-01-19 Thread buck
On Sunday, January 19, 2014 12:19:29 AM UTC-8, Ian wrote:
 On Sat, Jan 18, 2014 at 10:40 PM, buck w***@gmail.com wrote:
 
  I'm trying to work through Skiena's algorithms handbook, and note that the 
  author often uses graphical representations of the diagrams to help 
  understand (and even debug) the algorithms. I'd like to reproduce this in 
  python.
 
 
 
  How would you go about this? pyQt, pygame and pyglet immediately come to 
  mind, but if I go that route the number of people that I can share my work 
  with becomes quite limited, as compared to the portability of javascript 
  projects.
 
 
 
  I guess my question really is: has anyone had success creating an 
  interactive graphical project in the browser using python?
 
 
 
  Is this a dream I should give up on, and just do this project in 
  coffeescript/d3?
 
 
 
 You should be able to do something without much fuss using HTML 5 and
 
 either Pyjamas (which compiles Python code to Javascript) or Brython
 
 (a more or less complete implementation of Python within Javascript).
 
 For example, see the clock demo on the Brython web page.
 
 
 
 Pyjamas is the more established and probably more stable of the two
 
 projects, but you should be aware that there are currently two active
 
 forks of Pyjamas and some controversy surrounding the project
 
 leadership.

Thanks Ian. 
Have you personally used pyjs successfully?
It's ominous that the examples pages are broken...

I was impressed with the accuracy of the Brython implementation. I hope they're 
able to decrease the web weight in future versions.


graphical python

2014-01-18 Thread buck
I'm trying to work through Skiena's algorithms handbook, and note that the 
author often uses graphical representations of the diagrams to help understand 
(and even debug) the algorithms. I'd like to reproduce this in python.

How would you go about this? pyQt, pygame and pyglet immediately come to mind, 
but if I go that route the number of people that I can share my work with 
becomes quite limited, as compared to the portability of javascript projects.

I guess my question really is: has anyone had success creating an interactive 
graphical project in the browser using python?

Is this a dream I should give up on, and just do this project in 
coffeescript/d3?


Re: latin1 and cp1252 inconsistent?

2012-11-17 Thread buck
On Friday, November 16, 2012 4:33:14 PM UTC-8, Nobody wrote:
 On Fri, 16 Nov 2012 13:44:03 -0800, buck wrote:
 IOW: Microsoft's embrace, extend, extinguish strategy has been too
 successful and now we have to deal with it. If HTML content is tagged as
 using ISO-8859-1, it's more likely that it's actually Windows-1252 content
 generated by someone who doesn't know the difference.

Yes that's exactly what it says.

 Given that the only differences between the two are for code points which
 are in the C1 range (0x80-0x9F), which should never occur in HTML, parsing
 ISO-8859-1 as Windows-1252 should be harmless.

"Should" is a wish. The reality is that documents (and especially URLs) exist 
that can be decoded with latin1, but will backtrace with cp1252. I see this as 
a sign that a small refactorization of cp1252 is in order. The proposal is to 
change those UNDEFINED entries to control entries, as is done here:

http://dvcs.w3.org/hg/encoding/raw-file/tip/index-windows-1252.txt

and here:

ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt

This is in line with the unicode standard, which says: 
http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf

 There are 65 code points set aside in the Unicode Standard for compatibility 
 with the C0
 and C1 control codes defined in the ISO/IEC 2022 framework. The ranges of 
 these code
 points are U+0000..U+001F, U+007F, and U+0080..U+009F, which correspond to 
 the 8-bit
 controls 0x00 to 0x1F (C0 controls), 0x7F (delete), and 0x80 to 0x9F (C1 
 controls), 
 respectively ... There is a simple, one-to-one mapping between 7-bit (and 
 8-bit) control
 codes and the Unicode control codes: every 7-bit (or 8-bit) control code is 
 numerically
 equal to its corresponding Unicode code point.

IOW: Bytes with undefined semantics in the C0/C1 range are control codes, 
which decode to the unicode-point of equal value.

This is exactly the section which allows latin1 to decode 0x81 to U+81, even 
though ISO-8859-1 explicitly does not define semantics for that byte (6.2 
ftp://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf)


latin1 and cp1252 inconsistent?

2012-11-16 Thread buck
Latin1 has a block of 32 undefined characters.
Windows-1252 (aka cp1252) fills in 27 of these characters but leaves five 
undefined: 0x81, 0x8D, 0x8F, 0x90, 0x9D

The byte 0x81 decoded with latin1 gives the Unicode code point U+0081.
Decoding the same byte with windows-1252 yields a stack trace with 
`UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 0: 
character maps to undefined`

This seems inconsistent to me, given that this byte is equally undefined in the 
two standards.
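
Here is the difference as a short runnable sketch. It only relies on the stdlib codecs; the last loop asks the cp1252 codec which bytes it refuses, so nothing is hard-coded.

```python
byte = b"\x81"

# latin1 maps every byte to the code point with the same value.
print(repr(byte.decode("latin1")))

# cp1252 has no mapping for this byte, so strict decoding fails.
try:
    byte.decode("cp1252")
except UnicodeDecodeError as exc:
    print("cp1252:", exc.reason)

# Collect every byte that the cp1252 codec cannot decode.
undefined = [b for b in range(256) if not bytes([b]).decode("cp1252", "ignore")]
print([hex(b) for b in undefined])
```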

Also, the html5 standard says:

When a user agent [browser] would otherwise use a character encoding given in 
the first column [ISO-8859-1, aka latin1] of the following table to either 
convert content to Unicode characters or convert Unicode characters to bytes, 
it must instead use the encoding given in the cell in the second column of the 
same row [windows-1252, aka cp1252].

http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#character-encodings-0


The current implementation of windows-1252 isn't usable for this purpose (a 
replacement of latin1), since it will throw an error in cases that latin1 would 
succeed.


Re: latin1 and cp1252 inconsistent?

2012-11-16 Thread buck
On Friday, November 16, 2012 2:34:32 PM UTC-8, Ian wrote:
 On Fri, Nov 16, 2012 at 2:44 PM,  buck wrote:
 
  Latin1 has a block of 32 undefined characters.
 
 
 These characters are not undefined.  0x80-0x9f are the C1 control
 codes in Latin-1, much as 0x00-0x1f are the C0 control codes, and
 their Unicode mappings are well defined.

They are indeed undefined: ftp://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf

 The shaded positions in the code table correspond
to bit combinations that do not represent graphic
characters. Their use is outside the scope of
ISO/IEC 8859; it is specified in other International
Standards, for example ISO/IEC 6429.


However it's reasonable for 0x81 to decode to U+81 because the unicode standard 
says: http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf

 The semantics of the control codes are generally determined by the 
application with which they are used. However, in the absence of specific 
application uses, they may be interpreted according to the control function 
semantics specified in ISO/IEC 6429:1992.


 You can use a non-strict error handling scheme to prevent the error.
 >>> b'hello \x81 world'.decode('cp1252', 'replace')
 'hello \ufffd world'

This creates a non-reversible encoding, and loss of data, which isn't 
acceptable for my application.
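
What I'd want instead is something like the following sketch: a custom decode error handler that maps each undefined byte to the code point of equal value, the way latin1 does. The handler name "c1-passthrough" and the function `passthrough_undefined` are invented for this example; `codecs.register_error` is the real hook.

```python
import codecs


def passthrough_undefined(exc):
    """Decode error handler: map each undecodable byte to the same code point."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(b) for b in bad), exc.end


# "c1-passthrough" is a handler name invented for this sketch.
codecs.register_error("c1-passthrough", passthrough_undefined)

text = b"hello \x81 world".decode("cp1252", "c1-passthrough")
print(repr(text))
```

No defined cp1252 byte decodes to those C1 code points, so the original byte can still be recovered from the decoded text, though encoding back would need a matching encode-side handler.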


[issue15009] urlsplit can't round-trip relative-host urls.

2012-07-05 Thread Buck Golemon

Buck Golemon buck.gole...@amd.com added the comment:

Let's examine x://

absolute-URI  = scheme ":" hier-part [ "?" query ]
hier-part     = "//" authority path-abempty

So this is okay if authority and path-abempty can both be empty strings.

authority     = [ userinfo "@" ] host [ ":" port ]
host          = IP-literal / IPv4address / reg-name
reg-name      = *( unreserved / pct-encoded / sub-delims )
path-abempty  = *( "/" segment )

Yep.

And the same applies for x:///y, except that path-abempty matches /y
instead of nothing.

This means these are in fact valid urls per RFC3986, counter to your claim.

--
nosy: +bukzor

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15009
___



[issue15009] urlsplit can't round-trip relative-host urls.

2012-06-07 Thread Buck Golemon

Buck Golemon b...@yelp.com added the comment:

Well, I think the real issue is that you can't enumerate the protocols that use 
netloc. All protocols are allowed to have a netloc. The smb: protocol 
certainly does, but it's not in the list.

The core issue is that smb:/foo and smb:///foo are different urls, and should 
be represented differently when split. The /// form has a netloc, it's just the 
empty-string. The single-slash form has no netloc, so I propose that 
urlsplit('smb:/foo') return SplitResult(scheme='smb', netloc=None, path='/foo', 
query='', fragment='')

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15009
___



[issue15009] urlsplit can't round-trip relative-host urls.

2012-06-05 Thread Buck Golemon

New submission from Buck Golemon b...@yelp.com:

1) As long as x is valid, I expect that urlunsplit(urlsplit(x)) == x
2) yelp:///foo is a well-formed (albeit odd) url. It is similar to file:///tmp: 
it specifies the /foo resource, on the current host, using the yelp protocol 
(defined on mobile devices).

>>> from urlparse import urlsplit, urlunsplit
>>> urlunsplit(urlsplit('yelp:///foo'))
'yelp:/foo'

Urlparse / unparse has the same bug:

>>> urlunparse(urlparse('yelp:///foo'))
'yelp:/foo'

The file: protocol seems to be special-cased, in an inappropriate manner:

>>> urlunsplit(urlsplit('file:///tmp'))
'file:///tmp'
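
The same check on Python 3, where the functions live in `urllib.parse`, can be sketched as below. `roundtrips` is a helper name made up for this example, and what gets printed for the non-file schemes depends on the Python version in use.

```python
from urllib.parse import urlsplit, urlunsplit


def roundtrips(url):
    """True if splitting and re-joining gives back the identical string."""
    return urlunsplit(urlsplit(url)) == url


for url in ("file:///tmp", "yelp:///foo", "smb:/foo", "smb:///foo"):
    print(url, "->", urlunsplit(urlsplit(url)), roundtrips(url))
```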

--
components: Library (Lib)
messages: 162378
nosy: Buck.Golemon
priority: normal
severity: normal
status: open
title: urlsplit can't round-trip relative-host urls.
versions: Python 2.6, Python 2.7, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15009
___



sum() requires number, not simply __add__

2012-02-23 Thread Buck Golemon
I feel like the design of sum() is inconsistent with other language
features of python. Often python doesn't require a specific type, only
that the type implement certain methods.

Given a class that implements __add__ why should sum() not be able to
operate on that class?

We can fix this in a backward-compatible way, I believe.

Demonstration:
I'd expect these two error messages to be identical, but they are
not.

  >>> class C(object): pass
  >>> c = C()
  >>> sum((c,c))
TypeError: unsupported operand type(s) for +: 'int' and 'C'
  >>> c + c
TypeError: unsupported operand type(s) for +: 'C' and 'C'
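
The 'int' in the first message comes from sum() starting its accumulator at 0. One workaround that needs no change to sum() is for the class to accept that 0 in __radd__. A minimal sketch, using a hypothetical Money class that is not from this thread:

```python
class Money(object):
    """Hypothetical example class; only addition is implemented."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)

    def __radd__(self, other):
        # sum() begins with 0 + first_item; treat that 0 as the identity.
        if other == 0:
            return self
        return NotImplemented


total = sum([Money(150), Money(250)])
```

This only helps for classes you control; for third-party types, passing an explicit start value to sum() is the usual route.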


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: sum() requires number, not simply __add__

2012-02-23 Thread Buck Golemon
On Feb 23, 1:19 pm, Buck Golemon b...@yelp.com wrote:
 I feel like the design of sum() is inconsistent with other language
 features of python. Often python doesn't require a specific type, only
 that the type implement certain methods.

 Given a class that implements __add__ why should sum() not be able to
 operate on that class?

 We can fix this in a backward-compatible way, I believe.

 Demonstration:
     I'd expect these two error messages to be identical, but they are
 not.

       class C(object): pass
       c = C()
       sum((c,c))
     TypeError: unsupported operand type(s) for +: 'int' and 'C'
      c + c
     TypeError: unsupported operand type(s) for +: 'C' and 'C'

Proposal:

def sum(values, base=0):
    values = iter(values)

    try:
        # next() works on Python 2.6+ and on 3.x
        result = next(values)
    except StopIteration:
        # empty input: fall back to the base value
        return base

    for value in values:
        result += value
    return result
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: sum() requires number, not simply __add__

2012-02-23 Thread Buck Golemon
On Feb 23, 1:32 pm, Chris Rebert c...@rebertia.com wrote:
 On Thu, Feb 23, 2012 at 1:19 PM, Buck Golemon b...@yelp.com wrote:
  I feel like the design of sum() is inconsistent with other language
  features of python. Often python doesn't require a specific type, only
  that the type implement certain methods.

  Given a class that implements __add__ why should sum() not be able to
  operate on that class?

 The time machine strikes again! sum() already can. You just need to
 specify an appropriate initial value (the empty list in this example)
 for the accumulator :

 Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
 [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> sum([[1,2],[3,4]], [])
 [1, 2, 3, 4]

 Cheers,
 Chris
 --http://rebertia.com

Thanks. I did not know that!

My proposal is still *slightly* superior in two ways:

1) It reduces the number of __add__ operations by one
2) The second argument isn't strictly necessary, if you don't mind
that the 'null sum' will produce zero.

def sum(values, base=0):
    values = iter(values)

    try:
        # next() works on Python 2.6+ and on 3.x
        result = next(values)
    except StopIteration:
        # empty input: fall back to the base value
        return base

    for value in values:
        result += value

    return result
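
A Python 3 flavoured sketch of the same idea, under the hypothetical name sum_any so that it does not shadow the builtin. It differs in one assumed detail: it uses result = result + value rather than +=, because += on a mutable first element (a list, say) would modify the caller's object in place.

```python
def sum_any(values, base=0):
    """Add up values using only +; return base if values is empty."""
    iterator = iter(values)

    try:
        result = next(iterator)
    except StopIteration:
        return base

    for value in iterator:
        # Rebind instead of += so the first element is never mutated.
        result = result + value
    return result


first = [1, 2]
flattened = sum_any([first, [3, 4]])
```

With this variant the first list is still [1, 2] after the call, which would not be the case with +=.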
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Debugging a difficult refcount issue.

2011-12-19 Thread buck
This is what I came up with:
https://gist.github.com/1496028

We'll see if it helps, tomorrow.


On Sunday, December 18, 2011 6:01:50 PM UTC-8, buck wrote:
 Thanks Jack. I think printf is what it will come down to. I plan to put a 
 little code into PyDict_New to print the id and the line at which it was 
 allocated. Hopefully this will show me all the possible suspects and I can 
 figure it out from there.
 
 I hope figuring out the file and line-number from within that code isn't too 
 hard.
 
 
 On Sunday, December 18, 2011 9:52:46 AM UTC-8, Jack Diederich wrote:
  I don't have any great advice, that kind of issue is hard to pin down.
   That said, do try using a python compile with --with-pydebug enabled,
  with that you can turn your unit tests on and off to pinpoint where
  the refcounts are getting messed up.  It also causes python to use
  plain malloc()s so valgrind becomes useful.  Worst case add assertions
  and printf()s in the places you think are most janky.
  
  -Jack
  
  On Sat, Dec 17, 2011 at 11:17 PM, buck work...@gmail.com wrote:
   I'm getting a fatal python error "Fatal Python error: GC object already 
   tracked"[1].
  
   Using gdb, I've pinpointed the place where the error is detected. It is 
   an empty dictionary which is marked as in-use. This is somewhat helpful 
   since I can reliably find the memory address of the dict, but it does not 
   help me pinpoint the issue. I was able to find the piece of code that 
   allocates the problematic dict via a malloc/LD_PRELOAD interposer, but 
   that code was pure python. I don't think it was the cause.
  
   I believe that the dict was deallocated, cached, and re-allocated via 
   PyDict_New to a C routine with bad refcount logic, then the above error 
   manifests when the dict is again deallocated, cached, and re-allocated.
  
   I tried to pinpoint this intermediate allocation with a similar 
   PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].
  
   How should I go about debugging this further? I've been completely stuck 
   on this for two days now :(
  
   [1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
   [2] 
   http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload
   --
   http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Debugging a difficult refcount issue.

2011-12-18 Thread buck
On Saturday, December 17, 2011 11:55:13 PM UTC-8, Paul Rubin wrote:
 buck workit...@gmail.com writes:
  I tried to pinpoint this intermediate allocation with a similar
  PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].
 
 Did you try a gdb watchpoint?

I didn't try that, since that piece of code is run millions of times, and I 
don't know the dict-id I'm looking for until after the problem has occurred.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Debugging a difficult refcount issue.

2011-12-18 Thread buck
Thanks Jack. I think printf is what it will come down to. I plan to put a 
little code into PyDict_New to print the id and the line at which it was 
allocated. Hopefully this will show me all the possible suspects and I can 
figure it out from there.

I hope figuring out the file and line-number from within that code isn't too 
hard.


On Sunday, December 18, 2011 9:52:46 AM UTC-8, Jack Diederich wrote:
 I don't have any great advice, that kind of issue is hard to pin down.
  That said, do try using a python compile with --with-pydebug enabled,
 with that you can turn your unit tests on and off to pinpoint where
 the refcounts are getting messed up.  It also causes python to use
 plain malloc()s so valgrind becomes useful.  Worst case add assertions
 and printf()s in the places you think are most janky.
 
 -Jack
 
 On Sat, Dec 17, 2011 at 11:17 PM, buck workit...@gmail.com wrote:
  I'm getting a fatal python error "Fatal Python error: GC object already 
  tracked"[1].
 
  Using gdb, I've pinpointed the place where the error is detected. It is an 
  empty dictionary which is marked as in-use. This is somewhat helpful since 
  I can reliably find the memory address of the dict, but it does not help me 
  pinpoint the issue. I was able to find the piece of code that allocates the 
  problematic dict via a malloc/LD_PRELOAD interposer, but that code was pure 
  python. I don't think it was the cause.
 
  I believe that the dict was deallocated, cached, and re-allocated via 
  PyDict_New to a C routine with bad refcount logic, then the above error 
  manifests when the dict is again deallocated, cached, and re-allocated.
 
  I tried to pinpoint this intermediate allocation with a similar 
  PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].
 
  How should I go about debugging this further? I've been completely stuck on 
  this for two days now :(
 
  [1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
  [2] 
  http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload
  --
  http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list


Debugging a difficult refcount issue.

2011-12-17 Thread buck
I'm getting a fatal python error "Fatal Python error: GC object already 
tracked"[1].

Using gdb, I've pinpointed the place where the error is detected. It is an 
empty dictionary which is marked as in-use. This is somewhat helpful since I 
can reliably find the memory address of the dict, but it does not help me 
pinpoint the issue. I was able to find the piece of code that allocates the 
problematic dict via a malloc/LD_PRELOAD interposer, but that code was pure 
python. I don't think it was the cause.

I believe that the dict was deallocated, cached, and re-allocated via 
PyDict_New to a C routine with bad refcount logic, then the above error 
manifests when the dict is again deallocated, cached, and re-allocated.

I tried to pinpoint this intermediate allocation with a similar 
PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].

How should I go about debugging this further? I've been completely stuck on 
this for two days now :(

[1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
[2] 
http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Pythonification of the asterisk-based collection packing/unpacking syntax

2011-12-17 Thread buck
I like the spirit of this. Let's look at your examples.

 Examples of use: 
 head, tail::tuple = ::sequence 
 def foo(args::list, kwargs::dict): pass 
 foo(::args, ::kwargs)

My initial reaction was "nonono!", but this is simply because of the ugliness. 
The double-colon is very visually busy.

I find that your second example is inconsistent with the others. If we say that 
the variable-name is always on the right-hand-side, we get:

 def foo(list::args, dict::kwargs): pass 

This nicely mirrors other languages (such as in your C# example:  float foo) 
as well as the old python behavior (prefixing variables with */** to modify the 
assignment).

As for the separator, let's examine the available ascii punctuation. Excluding 
valid variable characters, whitespace, and operators, we have:

! -- ok.
" -- can't use this. Would look like a string.
# -- no. Would look like a comment.
$ -- ok.
' -- no. Would look like a string.
( -- no. Would look like a function.
) -- no. Would look like ... bad syntax.
, -- no. Would indicate a separate item in the variable list.
. -- no. Would look like an attribute.
: -- ok, maybe. Seems confusing in a colon-terminated statement.
; -- no, just no.
? -- ok.
@ -- ok.
[ -- no. Would look like indexing.
] -- no.
` -- no. Would look like a string?
{ -- too strange
} -- too strange
~ -- ok.

That leaves these. Which one looks least strange?

float ! x = 1
float $ x = 1
float ? x = 1
float @ x = 1

The last one looks decorator-ish, but maybe that's proper. The implementation 
of this would be quite decorator-like: take the normal value of x, pass it 
through the indicated function, assign that value back to x.

Try these on for size.

 head, @tuple tail = sequence 
 def foo(@list args, @dict kwargs): pass 
 foo(@args, @kwargs)

For backward compatibility, we could say that the unary * is identical to @list 
and unary ** is identical to @dict.
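
For comparison, Python 3's extended unpacking (PEP 3132) always binds the starred name to a list, so getting a tuple takes an explicit second step. The sketch below spells out the "pass it through the indicated function, assign it back" reading of `head, @tuple tail = sequence`. The helper unpack_head_tail is hypothetical and only illustrates the semantics; it is not proposed syntax.

```python
def unpack_head_tail(sequence, convert=tuple):
    """Split sequence into its first item and the rest, converting the rest."""
    head, *tail = sequence      # tail is always a list at this point
    return head, convert(tail)  # the step that "@tuple tail" would perform


head, tail = unpack_head_tail([1, 2, 3])
```

Passing convert=list gives the behaviour that a bare * has today.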

-buck
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Development tools and practices for Pythonistas

2011-05-05 Thread buck
I use hg for even 50-line standalone python scripts. It's very well suited to 
these small environments, and scales up nicely.


cd /my/working/dir
hg init
hg add myscript.py
hg ci -m 'added myscript'

It's that simple, and now you can go back if you make a terrible mistake, and 
you can post it to bitbucket and share with the world if you like, almost as 
easily.

--Buck
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [OT] From svn to something else?

2011-05-05 Thread buck
This is what made me choose Mercurial in my recent search.

http://www.python.org/dev/peps/pep-0374/

There is a tremendous amount of detail there. In summary, hg and git are both 
very good, and essentially equal in features. The only salient difference is 
that hg is implemented in python, so they went with that. I did the same, and 
I'm quite happy. It's basically svn with the shiny new distributed features 
added.
-- 
http://mail.python.org/mailman/listinfo/python-list


multiprocessing: file-like object

2011-04-28 Thread buck
I've been having issues with getting a file-like object to work with 
multiprocessing. Since the details are quite lengthy, I've posted them on 
stackoverflow here: 
http://stackoverflow.com/questions/5821880/python-multiprocessing-synchronizing-file-like-object

I hope I'm not being super rude by cross-posting, but I thought some of you 
would be interested in the question, and I'd be delighted to get some ideas.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Equivalent code to the bool() built-in function

2011-04-28 Thread buck
I'm not not touching you!
-- 
http://mail.python.org/mailman/listinfo/python-list


[issue8326] Cannot import name SemLock on Ubuntu

2011-04-26 Thread Buck Golemon

Buck Golemon buck.gole...@amd.com added the comment:

@Barry: Yes, it's still a problem.

The ubuntu 10.10 python2.7 still has no multiprocessing.
Since the EOL is April 2012, it needs to be fixed.

It may be considered an invalid python bug, since it seems to be strictly 
related to Ubuntu packaging, but I thought the python maintainers should know.

--Buck

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8326
___



[issue8326] Cannot import name SemLock on Ubuntu

2011-04-24 Thread Buck Golemon

Buck Golemon buck.gole...@amd.com added the comment:

python2.7.1+ from mercurial supports sem_open (and multiprocessing) just fine.

doko: Could you help us figure out why the ubuntu 10.10 python2.7 build has 
this issue? I believe this issue should be assigned to you?

Relevant lines from the config.log:


configure:9566: checking for sem_open
configure:9566: gcc -pthread -o conftest -g -O2   conftest.c -lpthread -ldl  >&5
configure:9566: $? = 0
configure:9566: result: yes

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8326
___



[issue8326] Cannot import name SemLock on Ubuntu lucid

2011-04-14 Thread Buck Golemon

Buck Golemon buck.gole...@amd.com added the comment:

 Isn't this an Ubuntu problem if sem_open only works with some specific 
 kernels?

sem_open works fine (python2.6 is using it), but the python2.7 build process 
didn't detect it properly. This is either a bug with Ubuntu's python2.7 build 
configuration, or with python2.7's feature detection for sem_open.

I'm not sure which.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8326
___



[issue8326] Cannot import name SemLock on Ubuntu

2011-04-14 Thread Buck Golemon

Changes by Buck Golemon buck.gole...@amd.com:


--
title: Cannot import name SemLock on Ubuntu lucid -> Cannot import name SemLock 
on Ubuntu

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8326
___



[issue8326] Cannot import name SemLock on Ubuntu

2011-04-14 Thread Buck Golemon

Buck Golemon buck.gole...@amd.com added the comment:

 I suggest that you try to build from the above mercurial repository and see 
 if the problem persists.

How do I know the configuration options that the Ubuntu packager used?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8326
___



[issue8326] Cannot import name SemLock on Ubuntu lucid

2011-04-13 Thread Buck Golemon

Buck Golemon buck.gole...@amd.com added the comment:

On Ubuntu 10.10 (maverick), python2.6 is functioning correctly, but python2.7 
is giving this error again.


$ /usr/bin/python2.7
>>> from multiprocessing.synchronize import Semaphore
ImportError: This platform lacks a functioning sem_open implementation, 
therefore, the required synchronization primitives needed will not function, 
see issue 3770.
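
A small check like the following can be run against each installed interpreter to see which builds are affected. It is a sketch only; semaphore_import_works is a hypothetical helper name, and it tests nothing beyond whether the import shown above succeeds.

```python
def semaphore_import_works():
    """Return True if multiprocessing's semaphore primitives can be imported."""
    try:
        from multiprocessing.synchronize import Semaphore  # noqa: F401
    except ImportError:
        # Raised on builds where sem_open was not detected at configure time.
        return False
    return True


if __name__ == "__main__":
    print(semaphore_import_works())
```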

--
nosy: +bukzor

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8326
___



[issue9583] PYTHONOPTIMIZE = 0 is not honored

2010-09-20 Thread Buck Golemon

Buck Golemon buck.gole...@amd.com added the comment:

Minimal demo:

$ setenv PYTHONOPTIMIZE 0
$ python3.1 -OO -c "print(__debug__)"
False


I've used this code to get the desired functionality:

if [[ $TESTING == 1 || ${PYTHONOPTIMIZE-2} =~ '^(0*|)$' ]]; then
    # someone is requesting no optimization
    export -n PYTHONOPTIMIZE
    opt=''
elif [[ $PYTHONOPTIMIZE ]]; then
    # someone is setting their own optimization
    opt=''
else
    # optimization by default
    opt='-O'
fi

exec "$INSTALL_BASE/bin/python2.6" $opt "$@"
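
The three branches of that wrapper can be restated in Python, which makes the intended behaviour easier to test. This is a sketch under assumptions: choose_optimization is a hypothetical name, and it assumes the regular expression is meant to match an empty value or a run of zeros.

```python
import re


def choose_optimization(environ):
    """Return (flag, child_env) following the wrapper's three branches."""
    value = environ.get("PYTHONOPTIMIZE")
    child_env = dict(environ)

    # Branch 1: TESTING=1, or PYTHONOPTIMIZE set to "" or to zeros only.
    if environ.get("TESTING") == "1" or (
        value is not None and re.fullmatch(r"0*", value)
    ):
        child_env.pop("PYTHONOPTIMIZE", None)  # mirrors `export -n`
        return "", child_env

    # Branch 2: the caller chose a non-empty, non-zero level themselves.
    if value:
        return "", child_env

    # Branch 3: the variable is unset, so optimize by default.
    return "-O", child_env
```

The point of removing the variable in the first branch is that the interpreter never sees PYTHONOPTIMIZE=0 at all, which sidesteps the behaviour reported in this issue.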

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9583
___



[issue9583] Document startup option/environment interaction

2010-09-20 Thread Buck Golemon

Buck Golemon buck.gole...@amd.com added the comment:

If I understand this code, it means that PYTHONOPTIMIZE set to 1 or 2 works as 
expected, but set to 0, gives a flag value of 1.

static int
add_flag(int flag, const char *envs)
{
    int env = atoi(envs);
    if (flag < env)
        flag = env;
    if (flag < 1)
        flag = 1;
    return flag;
}


Read literally, the man page indicates that any integer value will give a flag 
value of 2. 

I agree my shell script is probably unusual, but I believe setting this 
environment value to zero and expecting the feature to be off (given no 
contradicting options) is reasonable.

I petition to remove the second if statement above (environment value of 0 
yields no flag).
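
The effect of that second if statement is easy to see in a Python model of the C function above. This is a sketch, not the interpreter's code: int() with a fallback of 0 is an assumption standing in for atoi(), which also accepts strings with trailing garbage.

```python
def add_flag(flag, envs):
    """Model of the C add_flag(): combine a command-line flag with an env value."""
    try:
        env = int(envs)
    except ValueError:
        env = 0  # assumption: stands in for atoi() returning 0 on non-numbers
    if flag < env:
        flag = env
    if flag < 1:
        flag = 1  # the branch that turns an environment value of "0" into 1
    return flag


# With no -O on the command line (flag 0), compare a few environment values.
results = {value: add_flag(0, value) for value in ("", "0", "1", "2")}
```

In this model "0" and the empty string both yield 1, which is the behaviour the petition above asks to change.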

I'd also love to provide a numeric argument to -O, to dynamically set this 
value more readily, but that is lower importance.

I can implement these and run the unit tests if required.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9583
___



[issue9583] Document startup option/environment interaction

2010-09-20 Thread Buck Golemon

Buck Golemon buck.gole...@amd.com added the comment:

that number of times isn't exactly accurate either, since 0 is effectively 
interpreted as 1.

This change would only adversely affect people who use no -O option, set 
PYTHONOPTIMIZE to '0', and need optimization.
I feel like that falls into the realm of version differences, but that's your 
decision.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9583
___



[issue9583] Document startup option/environment interaction

2010-09-20 Thread Buck Golemon

Buck Golemon buck.gole...@amd.com added the comment:

The file is here:
   http://svn.python.org/view/python/trunk/Python/pythonrun.c?view=markup

The second if statement is doing exactly what I find troubling: set the flag 
even if the incoming value is 0.
I guess this is to handle the empty string case, such as:

setenv PYTHONDEBUG
./myscript.py

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9583
___



[issue9583] PYTHONOPTIMIZE = 0 is not honored

2010-08-12 Thread Buck Golemon

New submission from Buck Golemon buck.gole...@amd.com:

In our environment, we have a wrapper which enables optimization by default 
(-OO). Most commandline tools which have a mode-changing flag such as this, 
also have a flag to do the opposite ( see: ls -t -U, wget -nv -v,  ).

I'd like to implement one or both of:
1) Add a -D option which is the opposite of -O. python -OO -D gives an 
optimization level of 1.
2) Honor PYTHONOPTIMIZE = 0. At the least, the man page needs to describe how 
these two methods interact.

--
components: Interpreter Core
messages: 113717
nosy: bukzor
priority: normal
severity: normal
status: open
title: PYTHONOPTIMIZE = 0 is not honored
type: behavior
versions: Python 2.6

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9583
___



Re: organizing your scripts, with plenty of re-use

2009-10-13 Thread Buck
On Oct 12, 4:30 pm, Carl Banks pavlovevide...@gmail.com wrote:
 On Oct 12, 11:24 am, Buck workithar...@gmail.com wrote:

  On Oct 10, 9:44 am, Gabriel Genellina gagsl-...@yahoo.com.ar
  wrote:

   The good thing is that, if the backend package is properly installed  
   somewhere in the Python path ... it still works with no modifications.

  I'd like to get to zero-installation if possible. It's easy with
  simple python scripts, why not packages too? I know the technical
  reasons, but I haven't heard any practical reasons.

 No it's purely technical.  Well mostly technical (there's a minor
 issue of how a script would figure out its root).  No language is
 perfect, not even Python, and sometimes you just have to deal with
 things the way they are.

Python is the closest I've seen. I'd like to deal with this wart if we
can.

 We're trying to help you with workarounds, but it seems like you just
 want to vent more than you want an actual solution.

Steven had the nicest workaround (with the location = __import__
('__main__').__file__ trick), but none of them solve the problem of
the OP: organization of runnable scripts. So far it's been required to
place all runnable scripts directly above any used packages. The
workaround that Gabriel has been touting requires this too.

Maybe it seems like I'm venting when I shoot down these workarounds,
but my real aim is to find some sort of consensus, either that there
is a solution, or an unsolved problem. I'd be delighted with a
solution, but none have been acceptable so far (as I explained in
aggravating detail earlier).

If I can find consensus that this is a real problem, not just my
personal nit-pick, then I'd be willing to donate my time to design,
write and push through a PEP for this purpose. I believe it can be
done neatly with just three new standard functions, but it's premature
to discuss that.

  If the reasons are purely technical, it smells like a PEP to me.

 Good luck with that.  I'd wholeheartedly support a good alternative,
Thanks.

 I just want to warn you that it's not a simple issue to fix, it would
 involve spectacular and highly backwards-incompatible changes.
 --Carl Banks

I don't believe that's true, but I think that's a separate discussion.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: organizing your scripts, with plenty of re-use

2009-10-13 Thread Buck
On Oct 12, 3:34 pm, Gabriel Genellina gagsl-...@yahoo.com.ar
wrote:
 En Mon, 12 Oct 2009 15:24:34 -0300, Buck workithar...@gmail.com escribió:

  On Oct 10, 9:44 am, Gabriel Genellina gagsl-...@yahoo.com.ar
  wrote:
  The good thing is that, if the backend package is properly installed  
  somewhere in the Python path ... it still works with no modifications.

  I'd like to get to zero-installation if possible. It's easy with
  simple python scripts, why not packages too? I know the technical
  reasons, but I haven't heard any practical reasons.

  If the reasons are purely technical, it smells like a PEP to me.

 That's what I meant to say. It IS a zero-installation schema, and it also  
 works if you properly install the package. Quoting Steven D'Aprano  
 (changing names slightly):

 You would benefit greatly from separating the interface from
 the backend. You should arrange matters so that the users see something
 like this:

 project/
 +-- animal
 +-- mammal
 +-- reptile
 +-- somepackagename/
      +-- __init__.py
      +-- animals.py
      +-- mammals/
          +-- __init__.py
          +-- horse.py
          +-- otter.py
      +-- reptiles/
          +-- __init__.py
          +-- gator.py
          +-- newt.py
      +-- misc/
          +-- __init__.py
          +-- lungs.py
          +-- swimming.py

 where the front end is made up of three scripts animal, mammal and
 reptile, and the entire backend is in a package. [ignore the rest]

 By example, the `animal` script would contain:

  from somepackagename import animals
 animals.main()

 or perhaps something more elaborate, but in any case, the script imports  
 whatever it needs from the `somepackagename` package.

 The above script can be run:

 a) directly from the `project` directory; this could be a checked out copy  
  from svn, or a tar file extracted in /tmp, or whatever. No need to install  
 anything, it just works.

 b) alternatively, you may install somepackagename into site-packages (or  
 the user site directory, or any other location along the Python path), and  
 copy the scripts into /usr/bin (or any other location along the system  
 PATH), and it still works.

 The key is to put all the core functionality into a package, and place the  
 package where Python can find it. Also, it's a good idea to use relative  
 imports from inside the package. There is no need to juggle with sys.path  
 nor even set PYTHONPATH nor import __main__ nor play any strange games; it  
 Just Works (tm).

 --
 Gabriel Genellina

Hi Gabriel. This is very thoughtful. Thanks.

As in the OP, when I have 50 different runnable scripts, it becomes
necessary to arrange them in directories. How would you do that in
your scheme? Currently it looks like they're required to live directly
above the package containing their code.

--Buck
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: organizing your scripts, with plenty of re-use

2009-10-13 Thread Buck
On Oct 13, 9:37 am, Ethan Furman et...@stoneleaf.us wrote:
 Buck wrote:
  I'd like to get to zero-installation if possible. It's easy with
  simple python scripts, why not packages too? I know the technical
  reasons, but I haven't heard any practical reasons.

 I don't think we mean the same thing by zero-installation... seems to
 me that if you have to copy it, check it out, or anything to get the
 code from point A to point 'usable on your computer', then you have done
 some sort of installation.

I think most people would agree that installation is whatever you need
to do between downloading the software and being able to use it. For
GNU packages, it's './configure && make && make install'. For Python
packages, it's usually './setup.py install'.

  Steven had the nicest workaround (with the location = __import__
  ('__main__').__file__ trick), but none of them solve the problem of
  the OP: organization of runnable scripts. So far it's been required to
  place all runnable scripts directly above any used packages. The
  workaround that Gabriel has been touting requires this too.

 Wha?  Place all runnable scripts directly above any used packages?  I
 must have missed something major in this thread.  The only thing
 necessary is to have the package being imported to be somewhere in
 PYTHONPATH.

The only way to get your packages on the PYTHONPATH currently is to:
   * install the packages to site-packages  (I don't have access)
   * edit the PYTHONPATH in all users' environments  (again, no access)
   * create some boilerplate that edits sys.path at runtime (various
problems in previous post)
   * put your scripts directly above the package (this seems best so
far, but forces a flat hierarchy of scripts)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: organizing your scripts, with plenty of re-use

2009-10-12 Thread Buck
On Oct 10, 9:44 am, Gabriel Genellina gagsl-...@yahoo.com.ar
wrote:
 The good thing is that, if the backend package is properly installed  
 somewhere in the Python path ... it still works with no modifications.

I'd like to get to zero-installation if possible. It's easy with
simple python scripts, why not packages too? I know the technical
reasons, but I haven't heard any practical reasons.

If the reasons are purely technical, it smells like a PEP to me.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: organizing your scripts, with plenty of re-use

2009-10-05 Thread Buck
On Oct 5, 11:29 am, Robert Kern robert.k...@gmail.com wrote:
 On 2009-10-05 12:42 PM, Buck wrote:



  With the package layout, you would just do:

      from parrot.sleeping import sleeping_in_a_bed
      from parrot.feeding.eating import eat_cracker

  This is really much more straightforward than you are making it out to be.

  As in the OP, I need things to Just Work without installation
  requirements.
  The reason for this is that I'm in a large corporate environment
  servicing many groups with their own custom environments.

 The more ad hoc hacks you use rather than the standard approaches, the harder
 it is going to be for you to support those custom environments.

I too would prefer a standard approach but there doesn't seem to be an
acceptable one.

 I do believe that you and Stef are exceptions. The vast majority of Python
 users seem to be able to grasp packages well enough.

You're failing to differentiate between a python programmer and a
system's users. I understand packages well enough, but I need to
reduce the users' requirements down to simply running a command. I
don't see a way to do that as of now without a large amount of
boilerplate code in every script.

I've considered installing the thing to the PYTHONPATH as most people
suggest, but this has two drawbacks:
  * Extremely hard to push thru my IT department. Possibly impossible.
  * Local checkouts of scripts use the main installation, rather than
the local, possibly revised package code. This necessitates the
boilerplate that installation to the PYTHONPATH was supposed to avoid.
  * We can work around the previous point by requiring a user-owned
dev installation of Python, but this raises the bar to entry past most
of my co-developers' threshold. They are more comfortable with tcsh and
perl...

I think the issue here is that the current python-package system works
well enough for the core python devs but leaves normal python
developers without many options beyond "all scripts in one directory"
or "tons of boilerplate everywhere".
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: organizing your scripts, with plenty of re-use

2009-10-05 Thread Buck
Thanks. I think we're getting closer to the core of this.

To restate my problem more simply:

My core goal is to have my scripts in some sort of organization better
than a single directory, and still have plenty of re-use between them.
The only way I can see to implement this is to have 10+ lines of
unintelligible hard-coded boilerplate in every runnable script.
That doesn't seem reasonable or pythonic.


On Oct 5, 12:34 pm, Robert Kern robert.k...@gmail.com wrote:
 I would like to see an example of such boilerplate. I do not understand why
 packages would require more than any other organization scheme.

This example is from the 2007 post I referenced in my OP. I'm pretty
sure he meant 'dirname' rather than 'basename', and even then it
doesn't quite work.

http://mail.python.org/pipermail/python-3000/2007-April/006814.html
  import os,sys
  sys.path.insert(1, os.path.basename(os.path.basename(__file__)))


This is from a co-worker trying to address this topic:
  import os, sys
  binpath = binpath or os.path.dirname(os.path.realpath(sys.argv[0]))
  libpath = os.path.join(binpath, 'lib')

  verinfo = sys.version_info
  pythonver = 'python%d.%d' % (verinfo[0], verinfo[1])
  sys.path.append(os.path.join(libpath, pythonver, 'site-packages'))
  sys.path.append(libpath)


This is my personal code:

  from sys import path
  from os.path import abspath, islink, realpath, dirname, normpath, join
  f = __file__
  # continue working even if the script is symlinked and then compiled
  if f.endswith('.pyc'): f = f[:-1]
  if islink(f): f = realpath(f)
  here = abspath(dirname(f))
  libpath = join(here, '..', 'lib')
  libpath = normpath(libpath)
  path.insert(1, libpath)
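
For what it's worth, the same logic can be wrapped in functions so it can be exercised on its own. This is only a sketch, assuming the bin/ and ../lib layout from the snippet above; the names lib_dir_for and add_lib_to_path are made up for illustration, and it always calls realpath rather than checking islink first.

```python
import os
import sys


def lib_dir_for(script_file):
    """Return the normalized ../lib directory relative to script_file's directory."""
    f = script_file
    # Assumes the compiled file sits beside its source: foo.pyc -> foo.py
    if f.endswith('.pyc'):
        f = f[:-1]
    # realpath resolves symlinks and is harmless on a regular file
    f = os.path.realpath(f)
    here = os.path.dirname(f)
    return os.path.normpath(os.path.join(here, '..', 'lib'))


def add_lib_to_path(script_file):
    """Put the lib directory on sys.path once and return it."""
    libpath = lib_dir_for(script_file)
    if libpath not in sys.path:
        sys.path.insert(1, libpath)
    return libpath
```

A script would call add_lib_to_path(__file__) before its other imports. Note that this does not remove the boilerplate: the helper has to be importable itself, so it still ends up pasted into every runnable script.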


> $ export PYTHONPATH=~/LocalToolCheckouts/:$PYTHONPATH
> This is a simple no-installation way to use the normal
> Python package mechanism that works well if you don't actually need to build
> anything.

This seems simple to you, but my users are electrical engineers and
know just enough UNIX commands to get by. Most are afraid of Python.
Half of them will assume the script is borked when they see an
"ImportError: No module named foo". Another 20% will then read the
README and set their environment wrong (setenv PYTHONPATH foo). The rest will get
it to work after half an hour but never use it again because it was
too complicated. I could fix the error message to tell them exactly
what to do, but at that point I might as well re-write the above
boilerplate code.
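
To make that concrete, here is roughly what "fixing the error message" would look like. It is a sketch only: the function name import_or_explain and the wording of the message are invented for illustration, and the directory passed in is whatever the README tells users to put on PYTHONPATH.

```python
import sys


def import_or_explain(module_name, hint_dir):
    """Import module_name, or exit with setup instructions instead of a traceback."""
    try:
        return __import__(module_name)
    except ImportError:
        # Hypothetical message text; tcsh syntax because that is what the users run
        sys.stderr.write(
            "Cannot find the '%s' package.\n"
            "If you use tcsh, run:\n"
            "    setenv PYTHONPATH %s\n"
            "and then run this script again.\n" % (module_name, hint_dir)
        )
        sys.exit(1)
```

Every script would still need these lines at the top, which is about as much text as the path boilerplate it is meant to replace.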

I'm overstating my case here for emphasis, but it's essentially true.
--Buck


Re: Module updating plans for Python 3.1: feedparser, MySQLdb

2009-08-01 Thread Buck
I use MySQLdb quite a bit in my work. I could volunteer to help update
it. Are there any particular bugs we're talking about or just a
straight port to 3.0?

--Buck

On Jul 31, 6:32 pm, John Nagle na...@animats.com wrote:
> Any progress on updating feedparser and MySQLdb for Python 3.x in the
> foreseeable future?
>
> Feedparser shouldn't be that hard; it's just that nobody is working on it.
> MySQLdb is known to be hard, and that may be a while.
>
> John Nagle



[issue2613] inconsistency with bare * in parameter list

2008-06-04 Thread Buck Golemon

Buck Golemon [EMAIL PROTECTED] added the comment:

/agree

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2613
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2613] inconsistency with bare * in parameter list

2008-05-27 Thread Buck Golemon

Buck Golemon [EMAIL PROTECTED] added the comment:

If there's no difference then they should work the same?
I agree there's probably little value in 'fixing' it.

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2613
__



[issue2950] silly readline module problem

2008-05-23 Thread Buck Golemon

Buck Golemon [EMAIL PROTECTED] added the comment:

I'm not sure what your problem is, but comp.lang.python might be a
better place to ask. It's not clear that this is a bug yet.

http://groups.google.com/group/comp.lang.python/topics

--
nosy: +bgolemon

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2950
__



Python installation problem

2007-03-02 Thread Ray Buck
I've been trying to install Mailman, which requires a newer version 
of the Python language compiler (p-code generator?) than the one I 
currently have on my linux webserver/gateway box.


It's running a ClarkConnect 2.01 package based on Red Hat 7.2 linux.

I downloaded the zipped tarball (Python-2.4.4.tgz), ran gunzip, then 
un-tarred it in /usr/local.  Then (logged in as root) from 
/usr/local/Python-2.4.4 I ran the configure script which appeared to 
run properly.  At least there were no error messages that I 
saw.  Then I attempted to run make install and ended up with an 
error make *** Error 1.  It was right at the libinstall section 
of the make, so I did some googling and came up with the following command:

[EMAIL PROTECTED] Python-2.4.4]# make libinstall inclinstall

After thrashing for about 5 minutes, I got basically the same message:
Compiling /usr/local/lib/python2.4/zipfile.py ...
make: *** [libinstall] Error 1

I dunno if this is relevant, but I have Python 2.2.2 in the 
/usr/Python-2.2.2 directory.  Do I have to blow this away in order to 
install the newer distro?  Or do I need to install the new one in /usr 
instead of /usr/local?


Although I'm a retired programmer (mainframes), I'm still learning 
this linux stuff.  I guess that makes me a noob...I hope you'll take 
that into consideration.


Thanks,

Ray

Python installation problem (sorry if this is a dup)

2007-02-28 Thread Ray Buck
I've been trying to install Mailman, which requires a newer version 
of the Python language compiler (p-code generator?) than the one I 
currently have on my linux webserver/gateway box.

It's running a ClarkConnect 2.01 package based on Red Hat 7.2 linux.

I downloaded the zipped tarball (Python-2.4.4.tgz), ran gunzip, then 
un-tarred it in /usr/local.  Then (logged in as root) from 
/usr/local/Python-2.4.4 I ran the configure script which appeared to 
run properly.  At least there were no error messages that I 
saw.  Then I attempted to run make install and ended up with an 
error make *** Error 1.  It was right at the libinstall section 
of the make, so I did some googling and came up with the following command:
[EMAIL PROTECTED] Python-2.4.4]# make libinstall inclinstall

After thrashing for about 5 minutes, I got basically the same message:
Compiling /usr/local/lib/python2.4/zipfile.py ...
make: *** [libinstall] Error 1

I dunno if this is relevant, but I have Python 2.2.2 in the 
/usr/Python-2.2.2 directory.  Do I have to blow this away in order to 
install the newer distro?  Or do I need to install the new one in /usr 
instead of /usr/local?

Although I'm a retired programmer (mainframes), I'm still learning 
this linux stuff.  I guess that makes me a noob...I hope you'll take 
that into consideration.

Thanks,

Ray



Re: Programming Language for Systems Administrator

2005-04-12 Thread Buck Nuggets
> I also tried SAP-DB before.

Now known as (or was, last time I checked) MaxDB by MySQL

and formerly known as the pre-relational dbms 'Adabas'.  I think the
only reason for its continued existence is that SAP was hoping for a
very low cost, low-end database years ago.  However, the database world
has changed substantially over the last ten years: you can get
postgresql and firebird for nothing, and db2 and oracle are often under
$1000 for a small server.

With that in mind I can't think of a database that's more of a has-been
than maxdb.  Maybe something from the 70s like IMS-DB or Model 204?

buck



Re: database in python ?

2005-04-12 Thread Buck Nuggets
> In truth, although postgres has more features, MySQL is probably
> better for someone who is just starting to use databases to develop
> for: the chances are higher that anyone using their code will have
> MySQL than Postgres, and they aren't going to need the features
> that Postgresql has that MySQL doesn't.  IMO, this has changed
> since only a year or two ago, when MySQL didn't support foreign-key
> constraints.

mysql does deserve serious consideration now that it supports
transactions.  However, keep in mind:

1.  mysql doesn't support transactions - one of its io layers (innodb)
does.  If you're hoping to get your application hosted you will find
that most mysql installations don't support innodb.  And due to the
bugs in mysql, when you attempt to create a transaction-safe table in
mysql if innodb isn't available it will just silently create it in
myisam, and your transactions will be silently ignored.

2.  mysql is still missing quite a few database basics - views are the
most amazing omission, but the list also includes triggers and stored
procedures.  Although most of these features are included in
the new beta, they aren't yet available in production.

3.  mysql has an enormous number of non-standard features such as
comment formatting, how nulls work, concatenation operator, etc.  This
means that you'll learn non-standard sql, and most likely write
non-portable sql.

4.  additionally, mysql has a peculiar set of bugs - in which the
database will change your data and report no exception.  These bugs
were probably a reflection of mysql's marketing message that the
database should do nothing but persist data, and data quality was the
responsibility of the application.  This self-serving message appears
to have been dropped now that they are catching up with other products,
but there's a legacy of cruft that still remains.  Examples of these
errors include:  silent truncation of strings to fit max varchar
length, allows invalid dates, truncation of numeric data to fit max
numeric values, etc.

5.  cost: mysql isn't expensive, but it isn't free either.  Whether or
not you get to use it for free depends on how you interpret their
licensing info and faq.  MySQL's recommendation if you're confused (and
many are) is to license the product or call one of their reps.
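
As a rough illustration of point 4: if the database won't reject bad values, the application has to check them before every insert. The sketch below is only that, a sketch; the function names and the column limits are hypothetical and not taken from any real schema or from the MySQL API.

```python
import datetime


def check_varchar(value, max_len):
    """Refuse a string that would otherwise be silently cut to max_len."""
    if len(value) > max_len:
        raise ValueError("string of length %d exceeds varchar(%d)"
                         % (len(value), max_len))
    return value


def check_date(year, month, day):
    """Refuse impossible dates; datetime.date raises ValueError for e.g. Feb 30."""
    return datetime.date(year, month, day)


def check_int(value, lo=-2147483648, hi=2147483647):
    """Refuse a number outside the range of a signed 32-bit integer column."""
    if not lo <= value <= hi:
        raise ValueError("%d does not fit in the column range" % value)
    return value
```

With a database that enforces its own declarations, none of this code would be needed.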

Bottom line - mysql has a lot of marketshare, is improving, and I'm sure
that it'll eventually be a credible product.  But right now it has a
wide range of inexcusable problems.

More info at http://sql-info.de/mysql/gotchas.html

buck



Re: database in python ?

2005-04-12 Thread Buck Nuggets
> It's not a bug if you didn't RTFM.

Maybe it's not a bug if it's the only DBMS you've ever used and you
actually believe that overriding explicit and critical declaratives is a
valid design choice.  But it is a bug if it's still only partially
supported in a beta version that nobody is yet hosting.

But maybe this release will actually fix ten years of negligence in one
fell swoop - and all these issues will be easily eliminated.  But just
in case that turns out to be difficult, and there's some reason it has
taken all this time to achieve, just wait and see what this guy finds:
   http://sql-info.de/mysql/gotchas.html

BTW, you should upgrade, they're now on 5.0.3.  Their support site
appears to be down right now (timeouts) so I can't check the new bug
list, but since 5.0.2 is beta, it may have introduced more problems
than it solved.

buck
