Re: Mutating an HTML file with BeautifulSoup
I've had much success doing round trips through the lxml.html parser. https://lxml.de/lxmlhtml.html I ditched bs for lxml long ago and never regretted it. If you find that you have a bunch of invalid html that lxml inadvertently "fixes", I would recommend adding a stutter-step to your project: perform a no-op roundtrip through lxml on all files. I'd then analyze any diff by progressively excluding changes via `grep -vP`. Unless I'm mistaken, all such changes should fall into no more than a dozen groups.

On Fri, Aug 19, 2022, 1:34 PM Chris Angelico wrote:
> What's the best way to precisely reconstruct an HTML file after
> parsing it with BeautifulSoup?
>
> Using the Alice example from the BS4 docs:
>
> >>> html_doc = """<html><head><title>The Dormouse's story</title></head>
> <body>
> <p class="title"><b>The Dormouse's story</b></p>
>
> <p class="story">Once upon a time there were three little sisters; and
> their names were
> <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
> <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
> <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
> and they lived at the bottom of a well.</p>
>
> <p class="story">...</p>
> """
> >>> print(soup)
> <html><head><title>The Dormouse's story</title></head>
> <body>
> <p class="title"><b>The Dormouse's story</b></p>
> <p class="story">Once upon a time there were three little sisters; and
> their names were
> <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
> <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
> <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
> and they lived at the bottom of a well.</p>
> <p class="story">...</p>
> </body></html>
> >>>
>
> Note two distinct changes: firstly, whitespace has been removed, and
> secondly, attributes are reordered (I think alphabetically). There are
> other canonicalizations being done, too.
>
> I'm trying to make some automated changes to a huge number of HTML
> files, with minimal diffs so they're easy to validate. That means that
> spurious changes like these are very much unwanted. Is there a way to
> get BS4 to reconstruct the original precisely?
>
> The mutation itself would be things like finding an anchor tag and
> changing its href attribute.
> Fairly simple changes, but might alter
> the length of the file (eg changing "http://example.com/" into
> "https://example.com/"). I'd like to do them intelligently rather than
> falling back on element.sourceline and element.sourcepos, but worst
> case, that's what I'll have to do (which would be fiddly).
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
--
https://mail.python.org/mailman/listinfo/python-list
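The sourceline/sourcepos fallback Chris mentions can be sketched with only the stdlib html.parser, which also tracks positions. This is a hypothetical helper (the name upgrade_hrefs and the double-quoted-attribute assumption are mine, not from the thread):

```python
from html.parser import HTMLParser

class HrefLocator(HTMLParser):
    """Record the line number and attributes of every <a> start tag."""
    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            lineno, _offset = self.getpos()
            self.anchors.append((lineno, dict(attrs)))

def upgrade_hrefs(html_text):
    # Edit the raw lines in place so every untouched byte (whitespace,
    # attribute order, quoting style) survives verbatim in the diff.
    locator = HrefLocator()
    locator.feed(html_text)
    lines = html_text.split("\n")
    for lineno, attrs in locator.anchors:
        href = attrs.get("href", "")
        if href.startswith("http://"):
            old = 'href="%s"' % href
            new = 'href="%s"' % ("https://" + href[len("http://"):])
            lines[lineno - 1] = lines[lineno - 1].replace(old, new, 1)
    return "\n".join(lines)
```

It is fiddly in exactly the way the post predicts: it assumes one edit per line and double-quoted attributes, but the resulting diff touches only the hrefs.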
Re: New Python implementation
On Thu, Feb 11, 2021 at 1:49 PM dn via Python-list wrote: > When I first met it, one of the concepts I found difficult to 'wrap my > head around' was the idea that "open software" allowed folk to fork the > original work and 'do their own thing'. My thinking was (probably) > "surely, the original is the authoritative version". Having other > versions seemed an invitation to confusion and dilution. > > However, as soon as (open) software is made available, other people > start making it 'better' - whatever their own definition of "better". > > Yes, it is both a joy and a complication. > > ... > > Wishing you well. It seems (to (neos-ignorant) me at least) an ambitious > project. There are certainly times when 'execution speed' becomes a > major criteria. Many of us will look forward to (your development of) a > solution. Please let us know when it's ready for use/trials... > Well put! Thank you for this thoughtful and informative message. You obviously put substantial work into it. -- https://mail.python.org/mailman/listinfo/python-list
Re: Explicit vararg values
Received?

On Sun, Sep 16, 2018 at 3:39 PM Buck Evan wrote:
> I started to send this to python-ideas, but I'm having second thoughts.
> Does this have merit?
>
> ---
> I stumble on this a lot, and I see it in many python libraries:
>
> def f(*args, **kwargs):
>     ...
>
> f(*[list comprehension])
> f(**mydict)
>
> It always seems a shame to carefully build up an object in order to
> explode it, just to pack it into a near-identical object.
>
> Today I was fiddling with the new python3.7 inspect.signature
> functionality when I ran into this case:
>
> def f(**kwargs): pass
> sig = inspect.signature(f)
> print(sig.bind(a=1, b=2))
>
> The output is "<BoundArguments (kwargs={'a': 1, 'b': 2})>". I found this a
> bit humorous since anyone attempting to bind values in this way, using
> f(kwargs={'a': 1, 'b': 2}) will be sorely disappointed. I also wondered
> why BoundArguments didn't print '**kwargs' since that's the __str__ of that
> parameter object.
>
> The syntax I'm proposing is:
> f(**kwargs={'a': 1, 'b': 2})
>
> as a synonym of f(a=1, b=2) when an appropriate dictionary is already on
> hand.
>
> ---
> I can argue for this another way as well.
>
> 1)
> When both caller and callee have a known number of values to pass/receive,
> that's the usual syntax: def f(x) and f(1)
>
> 2)
> When the caller has a fixed set of values, but the callee wants to handle
> a variable number: def f(*args) and f(1)
>
> 3)
> Caller has a variable number of arguments (varargs) but the callee is
> fixed, that's the splat operator: def f(x) and f(*args)
>
> 4)
> When case 1 and 3 cross paths, and we have a vararg in both the caller and
> callee, right now we're forced to splat both sides: def f(*args) and
> f(*args), but I'd like the option of opting in to passing along my list
> as-is with no splat or collection operations involved: def f(*args) and
> f(*args=args)
>
> Currently the pattern to handle case 4 neatly is to define two versions of
> a vararg function:
>
> def f(*args, **kwargs):
>     return _f(args, kwargs)
>
> def _f(args, kwargs):
>     ...
>
> Such that when internal callers hit case 4, there's a simple and
> efficient way forward -- use the internal de-vararg'd definition of f.
> External callers have no such option though, without breaking protected api
> convention.
>
> My proposal would simplify this implementation as well as allowing users
> to make use of a similar calling convention that was only provided
> privately before.
>
> Examples:
>
> log(*args) and _log(args) in logging.Logger
> format and vformat of string.Formatter
--
https://mail.python.org/mailman/listinfo/python-list
Explicit vararg values
I started to send this to python-ideas, but I'm having second thoughts. Does this have merit?

---
I stumble on this a lot, and I see it in many python libraries:

def f(*args, **kwargs):
    ...

f(*[list comprehension])
f(**mydict)

It always seems a shame to carefully build up an object in order to explode it, just to pack it into a near-identical object.

Today I was fiddling with the new python3.7 inspect.signature functionality when I ran into this case:

def f(**kwargs): pass
sig = inspect.signature(f)
print(sig.bind(a=1, b=2))

The output is "<BoundArguments (kwargs={'a': 1, 'b': 2})>". I found this a bit humorous since anyone attempting to bind values in this way, using f(kwargs={'a': 1, 'b': 2}) will be sorely disappointed. I also wondered why BoundArguments didn't print '**kwargs' since that's the __str__ of that parameter object.

The syntax I'm proposing is:

f(**kwargs={'a': 1, 'b': 2})

as a synonym of f(a=1, b=2) when an appropriate dictionary is already on hand.

---
I can argue for this another way as well.

1) When both caller and callee have a known number of values to pass/receive, that's the usual syntax: def f(x) and f(1)

2) When the caller has a fixed set of values, but the callee wants to handle a variable number: def f(*args) and f(1)

3) Caller has a variable number of arguments (varargs) but the callee is fixed, that's the splat operator: def f(x) and f(*args)

4) When case 1 and 3 cross paths, and we have a vararg in both the caller and callee, right now we're forced to splat both sides: def f(*args) and f(*args), but I'd like the option of opting in to passing along my list as-is with no splat or collection operations involved: def f(*args) and f(*args=args)

Currently the pattern to handle case 4 neatly is to define two versions of a vararg function:

def f(*args, **kwargs):
    return _f(args, kwargs)

def _f(args, kwargs):
    ...

Such that when internal callers hit case 4, there's a simple and efficient way forward -- use the internal de-vararg'd definition of f.
External callers have no such option though, without breaking protected api convention.

My proposal would simplify this implementation as well as allowing users to make use of a similar calling convention that was only provided privately before.

Examples:

log(*args) and _log(args) in logging.Logger
format and vformat of string.Formatter
--
https://mail.python.org/mailman/listinfo/python-list
[issue34706] Signature.from_callable sometimes drops subclassing
Change by Buck Evan : -- type: -> behavior ___ Python tracker <https://bugs.python.org/issue34706> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue34706] Signature.from_callable sometimes drops subclassing
New submission from Buck Evan : Specifically in the case of a class that does not override its constructor signature inherited from object. Github PR incoming shortly. -- components: Library (Lib) messages: 325501 nosy: bukzor priority: normal severity: normal status: open title: Signature.from_callable sometimes drops subclassing versions: Python 3.7 ___ Python tracker <https://bugs.python.org/issue34706> ___
[issue24085] large memory overhead when pyc is recompiled
Buck Evan added the comment: @serhiy.storchaka This is a very stable piece of a legacy code base, so we're not keen to refactor it so dramatically, although we could. We've worked around this issue by compiling pyc files ahead of time and taking extra care that they're preserved through deployment. This isn't blocking our 2.7 transition anymore. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24085 ___
[issue24085] large memory overhead when pyc is recompiled
Buck Evan added the comment:

New data: The memory consumption seems to be in the compiler rather than the marshaller:

```
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
16032
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
16032
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
16032
$ python -c 'import repro'
16032
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
8984
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
8984
$ PYTHONDONTWRITEBYTECODE=1 python -c 'import repro'
8984
```

We were trying to use PYTHONDONTWRITEBYTECODE as a workaround to this issue, but it didn't help us because of this.

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24085 ___
[issue24085] large memory overhead when pyc is recompiled
New submission from Buck Evan:

In the attached example I show that there's a significant memory overhead present whenever a pre-compiled pyc is not present. This only occurs with more than 5225 objects (dictionaries in this case) allocated. At 13756 objects, the mysterious pyc overhead is 50% of memory usage. I've reproduced this issue in python 2.6, 2.7, 3.4. I imagine it's present in all cpythons.

$ python -c 'import repro'
16736
$ python -c 'import repro'
8964
$ python -c 'import repro'
8964
$ rm *.pyc; python -c 'import repro'
16740
$ rm *.pyc; python -c 'import repro'
16736
$ rm *.pyc; python -c 'import repro'
16740

-- files: repro.py messages: 242281 nosy: bukzor priority: normal severity: normal status: open title: large memory overhead when pyc is recompiled versions: Python 3.4 Added file: http://bugs.python.org/file39238/repro.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24085 ___
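The attached repro.py is not preserved in the archive, but the measurement style can be approximated like this (an assumption about what the script did; note ru_maxrss is reported in KiB on Linux and bytes on macOS, so treat the number as relative):

```python
import resource

def max_rss():
    # Peak resident set size of this process, which is the kind of
    # single number the transcripts above print after each import.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Thousands of small dict literals stand in for the module whose
# import showed the pyc-recompilation overhead.
objs = [{"index": i} for i in range(14000)]
print(max_rss())
```

Comparing this number with and without a pre-built pyc on disk is the experiment described in the report.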
[issue24085] large memory overhead when pyc is recompiled
Buck Evan added the comment: Also, we've reproduced this in both linux and osx. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24085 ___
[issue5945] PyMapping_Check returns 1 for lists
Buck Golemon added the comment: We've hit this problem today. What are we supposed to do in the meantime? -- nosy: +bukzor ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5945 ___
[issue22722] inheritable pipes are unwieldy without os.pipe2
New submission from Buck Golemon:

In order to make an inheritable pipe, the code is quite a bit different between posixes that implement pipe2 and those that don't (osx, mainly). I believe the officially-supported path is to call os.pipe() then os.set_inheritable(). This seems objectionable since set_inheritable() is invoked twice, where I'd prefer to invoke it zero times (or at most once).

Would it be acceptable to implement a pipe2 shim for those platforms? If so, I'll (attempt to) provide a patch. Alternatively, can we change the signature of os.pipe() to os.pipe(flags=O_CLOEXEC)? In my opinion, such a function could be implemented via pipe2 on those platforms that provide it, obviating any need for an os.pipe2. Please tell me which patch to provide, if any.

-- messages: 229947 nosy: bukzor priority: normal severity: normal status: open title: inheritable pipes are unwieldy without os.pipe2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22722 ___
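The proposed shim can be sketched in a few lines. This is a sketch of the idea in the report, not the API the stdlib actually adopted; only O_CLOEXEC is emulated, and a real version would need to map O_NONBLOCK too:

```python
import os

def pipe2_shim(flags=0):
    # Use the real thing where the platform provides it (Linux).
    if hasattr(os, "pipe2"):
        return os.pipe2(flags)
    # Fallback for platforms without pipe2 (e.g. macOS):
    # os.pipe() fds are non-inheritable by default since Python 3.4,
    # so flip inheritability to match the requested flags.
    r, w = os.pipe()
    inheritable = not (flags & os.O_CLOEXEC)
    os.set_inheritable(r, inheritable)
    os.set_inheritable(w, inheritable)
    return r, w
```

The non-atomicity of the fallback (a window between pipe() and set_inheritable()) is exactly why pipe2 exists as a syscall.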
[issue22723] visited-link styling is not accessible
New submission from Buck Golemon:

The color needs to be adjusted such that it has at least 3:1 luminance contrast versus the surrounding non-link text. (See non-inheritable https://docs.python.org/3/library/os.html#os.dup)

See also:
* http://www.w3.org/TR/WCAG20/#visual-audio-contrast-without-color
* http://www.w3.org/WAI/WCAG20/Techniques/working-examples/G183/link-contrast.html

Given that the surrounding text is #222, the a:visited color should be bumped from #30306f to #6363bb in order to meet the 3:1 luminance-contrast guideline while preserving the hue and saturation. By the same calculation, the un-visited links are slightly too dark and should be bumped from #00608f to #0072aa.

Validation was done here: http://juicystudio.com/services/luminositycontrastratio.php
Luminance adjustments done here: http://colorizer.org/

-- assignee: docs@python components: Documentation messages: 229952 nosy: bukzor, docs@python priority: normal severity: normal status: open title: visited-link styling is not accessible ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22723 ___
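The 3:1 figure comes from the WCAG relative-luminance formula, which is short enough to sketch; this reproduces the report's comparison, not the linked validator's exact rounding:

```python
def relative_luminance(hex_color):
    # WCAG 2.0 relative luminance of an sRGB color like "#30306f".
    def linearize(byte):
        c = byte / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)

def contrast_ratio(fg, bg):
    # WCAG contrast ratio: (L_lighter + 0.05) / (L_darker + 0.05).
    hi, lo = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)

print(contrast_ratio("#30306f", "#222222"))  # well below 3.0
print(contrast_ratio("#6363bb", "#222222"))  # just above 3.0
```

Against the surrounding #222 text, the original #30306f fails the guideline and the proposed #6363bb clears it by a small margin, matching the report.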
[issue22722] inheritable pipes are unwieldy without os.pipe2
Buck Golemon added the comment: I notice that dup2 grew an `inheritable=True` argument in 3.4. This might be a good precedent to use here, as a third option. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22722 ___
[issue22723] visited-link styling is not accessible
Buck Golemon added the comment: Proposed patch attached. -- keywords: +patch Added file: http://bugs.python.org/file37006/link-color.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22723 ___
[issue22455] idna/punycode give wrong results on narrow builds
New submission from Buck Golemon: I have fixed the issue in my branch here: https://github.com/bukzor/cpython/commit/013e689731ba32319f05a62a602f01dd7d7f2e83 I don't propose it as a patch, but as a proof of concept and point of discussion. If there's no chance of shipping a fix in 2.7.9, feel free to close. -- messages: 227240 nosy: bukzor priority: normal severity: normal status: open title: idna/punycode give wrong results on narrow builds versions: Python 2.7 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22455 ___
Re: python 3.44 float addition bug?
It used to be that the best way to compare floating point numbers while disregarding the inherent epsilon was to use `str(x) == str(y)`. It looks like that workaround doesn't work anymore in 3.4. What's the recommended way to do this now?

>>> format(.01 + .01 + .01 + .01 + .01 + .01, 'g') == format(.06, 'g')
True

On Saturday, June 21, 2014 12:24:24 PM UTC-7, Ned Deily wrote:
In article <captjjmrkpd5k__h9qg12q+arafzvan6egudtmedge2ccaqe...@mail.gmail.com>, Chris Angelico ros...@gmail.com wrote:

Also, when you're looking at how things print out, consider looking at two things: the str() and the repr(). Sometimes just print(p) doesn't give you all the info, so you might instead want to write your loop thus:

z = 0.01
p = 0.0
for i in range(19):
    p += z
    print(str(p) + " -- " + repr(p))

Sometimes you can get extra clues that way, although in this instance I think you won't.

Actually, I think this is one case where you would get extra clues (or extra headscratching) if you run the code with various releases of Python.
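For what it's worth, later Pythons (3.5+) grew math.isclose, which answers the question asked here more directly than any string round-trip trick:

```python
import math

# Six additions of 0.01 accumulate binary rounding error, so the sum
# is not bit-for-bit equal to the literal 0.06.
total = 0.01 + 0.01 + 0.01 + 0.01 + 0.01 + 0.01
print(total == 0.06)              # False: exact comparison fails
print(math.isclose(total, 0.06))  # True: equal within relative tolerance
```

Unlike str() or format(..., 'g') comparisons, the tolerance is explicit and tunable (rel_tol/abs_tol keyword arguments).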
$ python2.6 b.py
0.01 -- 0.01
0.02 -- 0.02
0.03 -- 0.02
0.04 -- 0.040001
0.05 -- 0.050003
0.06 -- 0.060005
0.07 -- 0.070007
0.08 -- 0.080002
0.09 -- 0.089997
0.1 -- 0.02
0.11 -- 0.10999
0.12 -- 0.11998
0.13 -- 0.12998
0.14 -- 0.13999
0.15 -- 0.14999
0.16 -- 0.16
0.17 -- 0.17001
0.18 -- 0.18002
0.19 -- 0.19003

$ python2.7 b.py
0.01 -- 0.01
0.02 -- 0.02
0.03 -- 0.03
0.04 -- 0.04
0.05 -- 0.05
0.06 -- 0.060005
0.07 -- 0.07
0.08 -- 0.08
0.09 -- 0.09
0.1 -- 0.0
0.11 -- 0.10999
0.12 -- 0.11998
0.13 -- 0.12998
0.14 -- 0.13999
0.15 -- 0.15
0.16 -- 0.16
0.17 -- 0.17
0.18 -- 0.18002
0.19 -- 0.19003

$ python3.4 b.py
0.01 -- 0.01
0.02 -- 0.02
0.03 -- 0.03
0.04 -- 0.04
0.05 -- 0.05
0.060005 -- 0.060005
0.07 -- 0.07
0.08 -- 0.08
0.09 -- 0.09
0.0 -- 0.0
0.10999 -- 0.10999
0.11998 -- 0.11998
0.12998 -- 0.12998
0.13999 -- 0.13999
0.15 -- 0.15
0.16 -- 0.16
0.17 -- 0.17
0.18002 -- 0.18002
0.19003 -- 0.19003

What's going on here is that in Python 2.7 the repr() of floats was changed to use the minimum number of digits to accurately roundtrip the number under correct rounding. For compatibility reasons, the str() representation was not changed for 2.7. But in Python 3.2, str() was changed to be identical to repr() for floats. It's important to keep in mind that the actual binary values stored in float objects are the same across all of these releases; only the representation of them as decimal characters varies.

https://docs.python.org/2.7/whatsnew/2.7.html#other-language-changes
http://bugs.python.org/issue9337

--
Ned Deily, n...@acm.org
--
https://mail.python.org/mailman/listinfo/python-list
[issue1243678] httplib gzip support
Buck Golemon added the comment: I believe this issue is still extant. The tip httplib client neither sends accept-encoding gzip nor supports content-encoding gzip. http://hg.python.org/cpython/file/tip/Lib/http/client.py#l1012 There is a diff to httplib in this attached patch, where there was none in #1675951. -- nosy: +Buck.Golemon ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1243678 ___
Re: graphical python
On Sunday, January 19, 2014 12:19:29 AM UTC-8, Ian wrote: On Sat, Jan 18, 2014 at 10:40 PM, buck w***@gmail.com wrote: I'm trying to work through Skienna's algorithms handbook, and note that the author often uses graphical representations of the diagrams to help understand (and even debug) the algorithms. I'd like to reproduce this in python. How would you go about this? pyQt, pygame and pyglet immediately come to mind, but if I go that route the number of people that I can share my work with becomes quite limited, as compared to the portability of javascript projects. I guess my question really is: has anyone had success creating an interactive graphical project in the browser using python? Is this a dream I should give up on, and just do this project in coffeescript/d3? You should be able to do something without much fuss using HTML 5 and either Pyjamas (which compiles Python code to Javascript) or Brython (a more or less complete implementation of Python within Javascript). For example, see the clock demo on the Brython web page. Pyjamas is the more established and probably more stable of the two projects, but you should be aware that there are currently two active forks of Pyjamas and some controversy surrounding the project leadership. Thanks Ian. Have you personally used pyjs successfully? It's ominous that the examples pages are broken... I was impressed with the accuracy of the Brython implementation. I hope they're able to decrease the web weight in future versions. -- https://mail.python.org/mailman/listinfo/python-list
graphical python
I'm trying to work through Skiena's algorithms handbook, and note that the author often uses graphical representations of the diagrams to help understand (and even debug) the algorithms. I'd like to reproduce this in python. How would you go about this? pyQt, pygame and pyglet immediately come to mind, but if I go that route the number of people that I can share my work with becomes quite limited, as compared to the portability of javascript projects. I guess my question really is: has anyone had success creating an interactive graphical project in the browser using python? Is this a dream I should give up on, and just do this project in coffeescript/d3? -- https://mail.python.org/mailman/listinfo/python-list
Re: latin1 and cp1252 inconsistent?
On Friday, November 16, 2012 4:33:14 PM UTC-8, Nobody wrote: On Fri, 16 Nov 2012 13:44:03 -0800, buck wrote: IOW: Microsoft's embrace, extend, extinguish strategy has been too successful and now we have to deal with it. If HTML content is tagged as using ISO-8859-1, it's more likely that it's actually Windows-1252 content generated by someone who doesn't know the difference. Yes that's exactly what it says. Given that the only differences between the two are for code points which are in the C1 range (0x80-0x9F), which should never occur in HTML, parsing ISO-8859-1 as Windows-1252 should be harmless. "Should" is a wish. The reality is that documents (and especially URLs) exist that can be decoded with latin1, but will backtrace with cp1252. I see this as a sign that a small refactorization of cp1252 is in order. The proposal is to change those UNDEFINED entries to control entries, as is done here: http://dvcs.w3.org/hg/encoding/raw-file/tip/index-windows-1252.txt and here: ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt This is in line with the unicode standard, which says: http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf "There are 65 code points set aside in the Unicode Standard for compatibility with the C0 and C1 control codes defined in the ISO/IEC 2022 framework. The ranges of these code points are U+0000..U+001F, U+007F, and U+0080..U+009F, which correspond to the 8-bit controls 0x00 to 0x1F (C0 controls), 0x7F (delete), and 0x80 to 0x9F (C1 controls), respectively ... There is a simple, one-to-one mapping between 7-bit (and 8-bit) control codes and the Unicode control codes: every 7-bit (or 8-bit) control code is numerically equal to its corresponding Unicode code point." IOW: Bytes with undefined semantics in the C0/C1 range are control codes, which decode to the unicode-point of equal value.
This is exactly the section which allows latin1 to decode 0x81 to U+81, even though ISO-8859-1 explicitly does not define semantics for that byte (6.2 ftp://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf) -- http://mail.python.org/mailman/listinfo/python-list
latin1 and cp1252 inconsistent?
Latin1 has a block of 32 undefined characters. Windows-1252 (aka cp1252) fills in 27 of these characters but leaves five undefined: 0x81, 0x8D, 0x8F, 0x90, 0x9D.

The byte 0x81 decoded with latin1 gives the unicode 0x81. Decoding the same byte with windows-1252 yields a stack trace with `UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 0: character maps to <undefined>`.

This seems inconsistent to me, given that this byte is equally undefined in the two standards.

Also, the html5 standard says: "When a user agent [browser] would otherwise use a character encoding given in the first column [ISO-8859-1, aka latin1] of the following table to either convert content to Unicode characters or convert Unicode characters to bytes, it must instead use the encoding given in the cell in the second column of the same row [windows-1252, aka cp1252]." http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#character-encodings-0

The current implementation of windows-1252 isn't usable for this purpose (a replacement of latin1), since it will throw an error in cases that latin1 would succeed.
--
http://mail.python.org/mailman/listinfo/python-list
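The asymmetry described above is easy to reproduce (the exact exception text may vary slightly by Python version):

```python
raw = b"\x81"

# latin-1 passes every byte through: 0x81 decodes to U+0081.
print(repr(raw.decode("latin-1")))

# cp1252 has no mapping for 0x81, so strict decoding raises.
try:
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    print("cp1252 refused:", exc)
```

This is the round-trip-safety gap the post complains about: any bytes decode under latin-1, while five byte values make cp1252 raise.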
Re: latin1 and cp1252 inconsistent?
On Friday, November 16, 2012 2:34:32 PM UTC-8, Ian wrote: On Fri, Nov 16, 2012 at 2:44 PM, buck wrote: Latin1 has a block of 32 undefined characters. These characters are not undefined. 0x80-0x9f are the C1 control codes in Latin-1, much as 0x00-0x1f are the C0 control codes, and their Unicode mappings are well defined.

They are indeed undefined: ftp://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf "The shaded positions in the code table correspond to bit combinations that do not represent graphic characters. Their use is outside the scope of ISO/IEC 8859; it is specified in other International Standards, for example ISO/IEC 6429."

However it's reasonable for 0x81 to decode to U+81 because the unicode standard says: http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf "The semantics of the control codes are generally determined by the application with which they are used. However, in the absence of specific application uses, they may be interpreted according to the control function semantics specified in ISO/IEC 6429:1992."

You can use a non-strict error handling scheme to prevent the error.

>>> b'hello \x81 world'.decode('cp1252', 'replace')
'hello \ufffd world'

This creates a non-reversible encoding, and loss of data, which isn't acceptable for my application.
--
http://mail.python.org/mailman/listinfo/python-list
[issue15009] urlsplit can't round-trip relative-host urls.
Buck Golemon buck.gole...@amd.com added the comment:

Let's examine x://

absolute-URI = scheme ":" hier-part [ "?" query ]
hier-part = "//" authority path-abempty

So this is okay if authority and path-abempty can both be empty strings.

authority = [ userinfo "@" ] host [ ":" port ]
host = IP-literal / IPv4address / reg-name
reg-name = *( unreserved / pct-encoded / sub-delims )
path-abempty = *( "/" segment )

Yep. And the same applies for x:///y, except that path-abempty matches "/y" instead of nothing. This means these are in fact valid urls per RFC3986, counter to your claim.

-- nosy: +bukzor ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15009 ___
[issue15009] urlsplit can't round-trip relative-host urls.
Buck Golemon b...@yelp.com added the comment:

Well I think the real issue is that you can't enumerate the protocols that use netloc. All protocols are allowed to have a netloc. The smb: protocol certainly does, but it's not in the list.

The core issue is that smb:/foo and smb:///foo are different urls, and should be represented differently when split. The /// form has a netloc, it's just the empty string. The single-slash form has no netloc, so I propose that urlsplit('smb:/foo') return SplitResult(scheme='smb', netloc=None, path='/foo', query='', fragment='')

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15009 ___
[issue15009] urlsplit can't round-trip relative-host urls.
New submission from Buck Golemon b...@yelp.com:

1) As long as x is valid, I expect that urlunsplit(urlsplit(x)) == x
2) yelp:///foo is a well-formed (albeit odd) url. It is similar to file:///tmp: it specifies the /foo resource, on the current host, using the yelp protocol (defined on mobile devices).

>>> from urlparse import urlsplit, urlunsplit
>>> urlunsplit(urlsplit('yelp:///foo'))
'yelp:/foo'

Urlparse / unparse has the same bug:

>>> urlunparse(urlparse('yelp:///foo'))
'yelp:/foo'

The file: protocol seems to be special-cased, in an inappropriate manner:

>>> urlunsplit(urlsplit('file:///tmp'))
'file:///tmp'

-- components: Library (Lib) messages: 162378 nosy: Buck.Golemon priority: normal severity: normal status: open title: urlsplit can't round-trip relative-host urls. versions: Python 2.6, Python 2.7, Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15009 ___
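The parsing side of the report, in Python 3 spelling (urlparse became urllib.parse; whether urlunsplit restores the empty netloc depends on the Python version, so no claim is made about the round trip here):

```python
from urllib.parse import urlsplit

# The empty authority in yelp:///foo is parsed: netloc is the empty
# string (present but empty), and the path keeps its leading slash.
parts = urlsplit("yelp:///foo")
print(parts.scheme, repr(parts.netloc), parts.path)
```

The report's point is that this SplitResult is indistinguishable from the one for yelp:/foo, which has no authority at all, so unsplit cannot know which form to emit.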
sum() requires number, not simply __add__
I feel like the design of sum() is inconsistent with other language features of python. Often python doesn't require a specific type, only that the type implement certain methods. Given a class that implements __add__ why should sum() not be able to operate on that class? We can fix this in a backward-compatible way, I believe.

Demonstration: I'd expect these two error messages to be identical, but they are not.

>>> class C(object): pass
>>> c = C()
>>> sum((c,c))
TypeError: unsupported operand type(s) for +: 'int' and 'C'
>>> c + c
TypeError: unsupported operand type(s) for +: 'C' and 'C'

--
http://mail.python.org/mailman/listinfo/python-list
Re: sum() requires number, not simply __add__
On Feb 23, 1:19 pm, Buck Golemon b...@yelp.com wrote:
> I feel like the design of sum() is inconsistent with other language features of python. Often python doesn't require a specific type, only that the type implement certain methods. Given a class that implements __add__ why should sum() not be able to operate on that class? We can fix this in a backward-compatible way, I believe.
>
> Demonstration: I'd expect these two error messages to be identical, but they are not.
>
> >>> class C(object): pass
> >>> c = C()
> >>> sum((c,c))
> TypeError: unsupported operand type(s) for +: 'int' and 'C'
> >>> c + c
> TypeError: unsupported operand type(s) for +: 'C' and 'C'

Proposal:

def sum(values, base=0):
    values = iter(values)
    try:
        result = values.next()
    except StopIteration:
        return base
    for value in values:
        result += value
    return result

--
http://mail.python.org/mailman/listinfo/python-list
Re: sum() requires number, not simply __add__
On Feb 23, 1:32 pm, Chris Rebert c...@rebertia.com wrote:
> On Thu, Feb 23, 2012 at 1:19 PM, Buck Golemon b...@yelp.com wrote:
> > I feel like the design of sum() is inconsistent with other language features of python. Often python doesn't require a specific type, only that the type implement certain methods. Given a class that implements __add__ why should sum() not be able to operate on that class?
>
> The time machine strikes again! sum() already can. You just need to specify an appropriate initial value (the empty list in this example) for the accumulator:
>
> Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
> [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> sum([[1,2],[3,4]], [])
> [1, 2, 3, 4]
>
> Cheers,
> Chris
> --
> http://rebertia.com

Thanks. I did not know that!

My proposal is still *slightly* superior in two ways:
1) It reduces the number of __add__ operations by one
2) The second argument isn't strictly necessary, if you don't mind that the 'null sum' will produce zero.

def sum(values, base=0):
    values = iter(values)
    try:
        result = values.next()
    except StopIteration:
        return base
    for value in values:
        result += value
    return result

--
http://mail.python.org/mailman/listinfo/python-list
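The proposal above is Python 2 (values.next()); a Python 3 spelling, under the hypothetical name sum_any to avoid shadowing the builtin, works on any __add__-bearing type:

```python
def sum_any(values, base=0):
    # Start folding from the first element, so non-empty inputs never
    # touch the numeric default -- the two advantages claimed above.
    values = iter(values)
    try:
        result = next(values)
    except StopIteration:
        return base
    for value in values:
        result = result + value
    return result
```

For lists this behaves like sum(lists, []) but skips the initial [] + first_list addition; for an empty input it falls back to base.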
Re: Debugging a difficult refcount issue.
This is what I came up with: https://gist.github.com/1496028

We'll see if it helps, tomorrow.

On Sunday, December 18, 2011 6:01:50 PM UTC-8, buck wrote:
> Thanks Jack. I think printf is what it will come down to. I plan to put a
> little code into PyDict_New to print the id and the line at which it was
> allocated. Hopefully this will show me all the possible suspects and I can
> figure it out from there. I hope figuring out the file and line-number from
> within that code isn't too hard.
>
> On Sunday, December 18, 2011 9:52:46 AM UTC-8, Jack Diederich wrote:
> > I don't have any great advice, that kind of issue is hard to pin down.
> > That said, do try using a python compiled with --with-debug enabled; with
> > that you can turn your unit tests on and off to pinpoint where the
> > refcounts are getting messed up. It also causes python to use plain
> > malloc()s, so valgrind becomes useful. Worst case, add assertions and
> > printf()s in the places you think are most janky.
> > -Jack
> >
> > On Sat, Dec 17, 2011 at 11:17 PM, buck work...@gmail.com wrote:
> > > I'm getting a fatal python error: "Fatal Python error: GC object
> > > already tracked" [1]. Using gdb, I've pinpointed the place where the
> > > error is detected. It is an empty dictionary which is marked as in-use.
> > > This is somewhat helpful since I can reliably find the memory address
> > > of the dict, but it does not help me pinpoint the issue.
> > >
> > > I was able to find the piece of code that allocates the problematic
> > > dict via a malloc/LD_PRELOAD interposer, but that code was pure python.
> > > I don't think it was the cause. I believe that the dict was
> > > deallocated, cached, and re-allocated via PyDict_New to a C routine
> > > with bad refcount logic; then the above error manifests when the dict
> > > is again deallocated, cached, and re-allocated. I tried to pinpoint
> > > this intermediate allocation with a similar PyDict_New/LD_PRELOAD
> > > interposer, but that isn't working for me [2].
> > >
> > > How should I go about debugging this further? I've been completely
> > > stuck on this for two days now :(
> > >
> > > [1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
> > > [2] http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload
Re: Debugging a difficult refcount issue.
On Saturday, December 17, 2011 11:55:13 PM UTC-8, Paul Rubin wrote:
> buck workit...@gmail.com writes:
> > I tried to pinpoint this intermediate allocation with a similar
> > PyDict_New/LD_PRELOAD interposer, but that isn't working for me [2].
>
> Did you try a gdb watchpoint?

I didn't try that, since that piece of code is run millions of times, and I don't know the dict-id I'm looking for until after the problem has occurred.
Re: Debugging a difficult refcount issue.
Thanks Jack. I think printf is what it will come down to. I plan to put a little code into PyDict_New to print the id and the line at which it was allocated. Hopefully this will show me all the possible suspects and I can figure it out from there. I hope figuring out the file and line-number from within that code isn't too hard.

On Sunday, December 18, 2011 9:52:46 AM UTC-8, Jack Diederich wrote:
> I don't have any great advice, that kind of issue is hard to pin down.
> That said, do try using a python compiled with --with-debug enabled; with
> that you can turn your unit tests on and off to pinpoint where the
> refcounts are getting messed up. It also causes python to use plain
> malloc()s, so valgrind becomes useful. Worst case, add assertions and
> printf()s in the places you think are most janky.
> -Jack
>
> On Sat, Dec 17, 2011 at 11:17 PM, buck work...@gmail.com wrote:
> > I'm getting a fatal python error: "Fatal Python error: GC object already
> > tracked" [1]. Using gdb, I've pinpointed the place where the error is
> > detected. It is an empty dictionary which is marked as in-use. This is
> > somewhat helpful since I can reliably find the memory address of the
> > dict, but it does not help me pinpoint the issue.
> >
> > I was able to find the piece of code that allocates the problematic dict
> > via a malloc/LD_PRELOAD interposer, but that code was pure python. I
> > don't think it was the cause. I believe that the dict was deallocated,
> > cached, and re-allocated via PyDict_New to a C routine with bad refcount
> > logic; then the above error manifests when the dict is again
> > deallocated, cached, and re-allocated. I tried to pinpoint this
> > intermediate allocation with a similar PyDict_New/LD_PRELOAD interposer,
> > but that isn't working for me [2].
> >
> > How should I go about debugging this further? I've been completely stuck
> > on this for two days now :(
> >
> > [1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
> > [2] http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload
Debugging a difficult refcount issue.
I'm getting a fatal python error: "Fatal Python error: GC object already tracked" [1]. Using gdb, I've pinpointed the place where the error is detected. It is an empty dictionary which is marked as in-use. This is somewhat helpful since I can reliably find the memory address of the dict, but it does not help me pinpoint the issue.

I was able to find the piece of code that allocates the problematic dict via a malloc/LD_PRELOAD interposer, but that code was pure python. I don't think it was the cause. I believe that the dict was deallocated, cached, and re-allocated via PyDict_New to a C routine with bad refcount logic; then the above error manifests when the dict is again deallocated, cached, and re-allocated. I tried to pinpoint this intermediate allocation with a similar PyDict_New/LD_PRELOAD interposer, but that isn't working for me [2].

How should I go about debugging this further? I've been completely stuck on this for two days now :(

[1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
[2] http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload
Re: Pythonification of the asterisk-based collection packing/unpacking syntax
I like the spirit of this. Let's look at your examples.

> Examples of use:
>
>     head, tail::tuple = ::sequence
>     def foo(args::list, kwargs::dict): pass
>     foo(::args, ::kwargs)

My initial reaction was "nonono!", but this is simply because of the ugliness. The double-colon is very visually busy.

I find that your second example is inconsistent with the others. If we say that the variable-name is always on the right-hand-side, we get:

    def foo(list::args, dict::kwargs): pass

This nicely mirrors other languages (such as in your C# example: "float foo") as well as the old python behavior (prefixing variables with */** to modify the assignment).

As for the separator, let's examine the available ascii punctuation. Excluding valid variable characters, whitespace, and operators, we have:

    !  -- ok.
    "  -- can't use this. Would look like a string.
    #  -- no. Would look like a comment.
    $  -- ok.
    '  -- no. Would look like a string.
    (  -- no. Would look like a function.
    )  -- no. Would look like ... bad syntax.
    ,  -- no. Would indicate a separate item in the variable list.
    .  -- no. Would look like an attribute.
    :  -- ok, maybe. Seems confusing in a colon-terminated statement.
    ;  -- no, just no.
    ?  -- ok.
    @  -- ok.
    [  -- no. Would look like indexing.
    ]  -- no.
    `  -- no. Would look like a string?
    {  -- too strange
    }  -- too strange
    ~  -- ok.

That leaves these. Which one looks least strange?

    float ! x = 1
    float $ x = 1
    float ? x = 1
    float @ x = 1

The last one looks decorator-ish, but maybe that's proper. The implementation of this would be quite decorator-like: take the normal value of x, pass it through the indicated function, assign that value back to x.

Try these on for size.

    head, @tuple tail = sequence
    def foo(@list args, @dict kwargs): pass
    foo(@args, @kwargs)

For backward compatibility, we could say that the unary * is identical to @list and unary ** is identical to @dict.

-buck
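For the record, Python kept the asterisk rather than adopting a new separator: PEP 3132 (extended iterable unpacking, Python 3.0) allows a starred name in the target list. The thread's examples, in real Python 3:

```python
sequence = [1, 2, 3, 4]
head, *tail = sequence           # the starred target collects into a list
assert head == 1 and tail == [2, 3, 4]

*init, last = sequence           # the star may appear anywhere in the target list
assert init == [1, 2, 3] and last == 4

def foo(*args, **kwargs):        # packing in a signature
    return args, kwargs

a, k = foo(1, 2, x=3)            # and foo(*a, **k) unpacks at a call site
assert a == (1, 2) and k == {'x': 3}
print("ok")
```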
Re: Development tools and practices for Pythonistas
I use hg for even 50-line standalone python scripts. It's very well suited to these small environments, and scales up nicely.

    cd /my/working/dir
    hg init
    hg add myscript.py
    hg ci -m 'added myscript'

It's that simple, and now you can go back if you make a terrible mistake, and you can post it to bitbucket and share with the world if you like, almost as easily.

--Buck
Re: [OT] From svn to something else?
This is what made me choose Mercurial in my recent search. http://www.python.org/dev/peps/pep-0374/ There is a tremendous amount of detail there. In summary, hg and git are both very good, and essentially equal in features. The only salient difference is that hg is implemented in python, so they went with that. I did the same, and I'm quite happy. It's basically svn with the shiny new distributed features added. -- http://mail.python.org/mailman/listinfo/python-list
multiprocessing: file-like object
I've been having issues getting a file-like object to work with multiprocessing. Since the details are quite lengthy, I've posted them on stackoverflow here: http://stackoverflow.com/questions/5821880/python-multiprocessing-synchronizing-file-like-object

I hope I'm not being super rude by cross-posting, but I thought some of you would be interested in the question, and I'd be delighted to get some ideas.
Re: Equivalent code to the bool() built-in function
I'm not not touching you! -- http://mail.python.org/mailman/listinfo/python-list
[issue8326] Cannot import name SemLock on Ubuntu
Buck Golemon buck.gole...@amd.com added the comment:

@Barry: Yes, it's still a problem. The ubuntu 10.10 python2.7 still has no multiprocessing. Since the EOL is April 2012, it needs to be fixed. It may be considered an invalid python bug, since it seems to be strictly related to Ubuntu packaging, but I thought the python maintainers should know.

--Buck

--
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8326 ___
___ Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8326] Cannot import name SemLock on Ubuntu
Buck Golemon buck.gole...@amd.com added the comment:

python2.7.1+ from mercurial supports sem_open (and multiprocessing) just fine.

doko: Could you help us figure out why the ubuntu 10.10 python2.7 build has this issue? I believe this issue should be assigned to you?

Relevant lines from the config.log:

    configure:9566: checking for sem_open
    configure:9566: gcc -pthread -o conftest -g -O2 conftest.c -lpthread -ldl >&5
    configure:9566: $? = 0
    configure:9566: result: yes

--
Python tracker: http://bugs.python.org/issue8326
[issue8326] Cannot import name SemLock on Ubuntu lucid
Buck Golemon buck.gole...@amd.com added the comment:

> Isn't this an Ubuntu problem if sem_open only works with some specific
> kernels?

sem_open works fine (python2.6 is using it), but the python2.7 build process didn't detect it properly. This is either a bug with Ubuntu's python2.7 build configuration, or with python2.7's feature detection for sem_open. I'm not sure which.

--
Python tracker: http://bugs.python.org/issue8326
[issue8326] Cannot import name SemLock on Ubuntu
Changes by Buck Golemon buck.gole...@amd.com:

-- title: Cannot import name SemLock on Ubuntu lucid -> Cannot import name SemLock on Ubuntu

--
Python tracker: http://bugs.python.org/issue8326
[issue8326] Cannot import name SemLock on Ubuntu
Buck Golemon buck.gole...@amd.com added the comment:

> I suggest that you try to build from the above mercurial repository and
> see if the problem persists.

How do I know the configuration options that the Ubuntu packager used?

--
Python tracker: http://bugs.python.org/issue8326
[issue8326] Cannot import name SemLock on Ubuntu lucid
Buck Golemon buck.gole...@amd.com added the comment:

On Ubuntu 10.10 (maverick), python2.6 is functioning correctly, but python2.7 is giving this error again.

    $ /usr/bin/python2.7
    >>> from multiprocessing.synchronize import Semaphore
    ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.

--
nosy: +bukzor

--
Python tracker: http://bugs.python.org/issue8326
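A quick probe (a sketch) for whether a given build has working POSIX semaphores; on the broken Ubuntu build, the import raises the ImportError quoted above:

```python
# multiprocessing.synchronize fails to import on builds without a working sem_open.
try:
    import multiprocessing.synchronize  # noqa: F401
    status = 'sem_open OK'
except ImportError as exc:
    status = 'sem_open broken: %s' % exc
print(status)
```

Running this under each interpreter (/usr/bin/python2.6 vs /usr/bin/python2.7 in the report) isolates which build is at fault.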
[issue9583] PYTHONOPTIMIZE = 0 is not honored
Buck Golemon buck.gole...@amd.com added the comment:

Minimal demo:

    $ setenv PYTHONOPTIMIZE 0
    $ python3.1 -OO -c 'print(__debug__)'
    False

I've used this code to get the desired functionality:

    if [[ $TESTING == 1 || ${PYTHONOPTIMIZE-2} =~ '^(0*|)$' ]]; then
        # someone is requesting no optimization
        export -n PYTHONOPTIMIZE
        opt=''
    elif [[ $PYTHONOPTIMIZE ]]; then
        # someone is setting their own optimization
        opt=''
    else
        # optimization by default
        opt='-O'
    fi
    exec $INSTALL_BASE/bin/python2.6 $opt $@

--
Python tracker: http://bugs.python.org/issue9583
[issue9583] Document startup option/environment interaction
Buck Golemon buck.gole...@amd.com added the comment:

If I understand this code, it means that PYTHONOPTIMIZE set to 1 or 2 works as expected, but set to 0, gives a flag value of 1.

    static int add_flag(int flag, const char *envs)
    {
        int env = atoi(envs);
        if (flag < env)
            flag = env;
        if (flag < 1)
            flag = 1;
        return flag;
    }

Read literally, the man page indicates that any integer value will give a flag value of 2. I agree my shell script is probably unusual, but I believe setting this environment value to zero and expecting the feature to be off (given no contradicting options) is reasonable.

I petition to remove the second if statement above (environment value of 0 yields no flag). I'd also love to provide a numeric argument to -O, to dynamically set this value more readily, but that is lower importance. I can implement these and run the unit tests if required.

--
Python tracker: http://bugs.python.org/issue9583
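Keeping to Python for illustration, a hypothetical port of the C helper shows the behavior the report objects to, namely that an explicit 0 from the environment is still bumped up to 1:

```python
def add_flag(flag, envs):
    """Hypothetical Python port of CPython's C helper add_flag(flag, envs)."""
    try:
        env = int(envs)
    except ValueError:
        env = 0  # C atoi() yields 0 for non-numeric or empty strings
    if flag < env:
        flag = env
    if flag < 1:
        flag = 1  # the contested step: an explicit "0" still produces 1
    return flag

print(add_flag(0, '0'), add_flag(0, '2'), add_flag(1, '0'))  # 1 2 1
```

Deleting the second `if` would make `add_flag(0, '0')` return 0, which is the petitioned behavior.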
[issue9583] Document startup option/environment interaction
Buck Golemon buck.gole...@amd.com added the comment: that number of times isn't exactly accurate either, since 0 is effectively interpreted as 1. This change would only adversely affect people who use no -O option, set PYTHONOPTIMIZE to '0', and need optimization. I feel like that falls into the realm of version differences, but that's your decision. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9583 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9583] Document startup option/environment interaction
Buck Golemon buck.gole...@amd.com added the comment:

The file is here: http://svn.python.org/view/python/trunk/Python/pythonrun.c?view=markup

The second if statement is doing exactly what I find troubling: set the flag even if the incoming value is 0. I guess this is to handle the empty-string case, such as:

    setenv PYTHONDEBUG
    ./myscript.py

--
Python tracker: http://bugs.python.org/issue9583
[issue9583] PYTHONOPTIMIZE = 0 is not honored
New submission from Buck Golemon buck.gole...@amd.com:

In our environment, we have a wrapper which enables optimization by default (-OO). Most commandline tools which have a mode-changing flag such as this also have a flag to do the opposite (see: ls -t / -U, wget -nv / -v, etc.). I'd like to implement one or both of:

1) Add a -D option which is the opposite of -O. "python -OO -D" gives an optimization level of 1.
2) Honor PYTHONOPTIMIZE=0.

At the least, the man page needs to describe how these two methods interact.

--
components: Interpreter Core
messages: 113717
nosy: bukzor
priority: normal
severity: normal
status: open
title: PYTHONOPTIMIZE = 0 is not honored
type: behavior
versions: Python 2.6

--
Python tracker: http://bugs.python.org/issue9583
Re: organizing your scripts, with plenty of re-use
On Oct 12, 4:30 pm, Carl Banks pavlovevide...@gmail.com wrote:
> On Oct 12, 11:24 am, Buck workithar...@gmail.com wrote:
> > On Oct 10, 9:44 am, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:
> > > The good thing is that, if the backend package is properly installed
> > > somewhere in the Python path ... it still works with no modifications.
> >
> > I'd like to get to zero-installation if possible. It's easy with simple
> > python scripts, why not packages too? I know the technical reasons, but
> > I haven't heard any practical reasons.
>
> No it's purely technical. Well, mostly technical (there's a minor issue of
> how a script would figure out its root). No language is perfect, not even
> Python, and sometimes you just have to deal with things the way they are.

Python is the closest I've seen. I'd like to deal with this wart if we can.

> We're trying to help you with workarounds, but it seems like you just want
> to vent more than you want an actual solution.

Steven had the nicest workaround (with the location = __import__('__main__').__file__ trick), but none of them solve the problem of the OP: organization of runnable scripts. So far it's been required to place all runnable scripts directly above any used packages. The workaround that Gabriel has been touting requires this too.

Maybe it seems like I'm venting when I shoot down these workarounds, but my real aim is to find some sort of consensus: either that there is a solution, or an unsolved problem. I'd be delighted with a solution, but none have been acceptable so far (as I explained in aggravating detail earlier). If I can find consensus that this is a real problem, not just my personal nit-pick, then I'd be willing to donate my time to design, write, and push through a PEP for this purpose. I believe it can be done neatly with just three new standard functions, but it's premature to discuss that.

> > If the reasons are purely technical, it smells like a PEP to me.
>
> Good luck with that. I'd wholeheartedly support a good alternative,

Thanks.

> I just want to warn you that it's not a simple issue to fix; it would
> involve spectacular and highly backwards-incompatible changes.
> --Carl Banks

I don't believe that's true, but I think that's a separate discussion.

--
http://mail.python.org/mailman/listinfo/python-list
Re: organizing your scripts, with plenty of re-use
On Oct 12, 3:34 pm, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:
> En Mon, 12 Oct 2009 15:24:34 -0300, Buck workithar...@gmail.com escribió:
> > I'd like to get to zero-installation if possible. It's easy with simple
> > python scripts, why not packages too? I know the technical reasons, but I
> > haven't heard any practical reasons. If the reasons are purely technical,
> > it smells like a PEP to me.
>
> That's what I meant to say. It IS a zero-installation schema, and it also
> works if you properly install the package. Quoting Steven D'Aprano
> (changing names slightly): You would benefit greatly from separating the
> interface from the backend. You should arrange matters so that the users
> see something like this:
>
>     project/
>     +-- animal
>     +-- mammal
>     +-- reptile
>     +-- somepackagename/
>         +-- __init__.py
>         +-- animals.py
>         +-- mammals/
>         |   +-- __init__.py
>         |   +-- horse.py
>         |   +-- otter.py
>         +-- reptiles/
>         |   +-- __init__.py
>         |   +-- gator.py
>         |   +-- newt.py
>         +-- misc/
>             +-- __init__.py
>             +-- lungs.py
>             +-- swimming.py
>
> where the front end is made up of three scripts (animal, mammal and
> reptile) and the entire backend is in a package. [ignore the rest]
>
> By example, the `animal` script would contain:
>
>     from somepackagename import animals
>     animals.main()
>
> or perhaps something more elaborate, but in any case, the script imports
> whatever it needs from the `somepackagename` package. The above script
> can be run:
>
> a) directly from the `project` directory; this could be a checked out
> copy from svn, or a tar file extracted in /tmp, or whatever. No need to
> install anything, it just works.
>
> b) alternatively, you may install somepackagename into site-packages (or
> the user site directory, or any other location along the Python path),
> and copy the scripts into /usr/bin (or any other location along the
> system PATH), and it still works.
>
> The key is to put all the core functionality into a package, and place
> the package where Python can find it. Also, it's a good idea to use
> relative imports from inside the package. There is no need to juggle with
> sys.path nor even set PYTHONPATH nor import __main__ nor play any strange
> games; it Just Works (tm).
> --
> Gabriel Genellina

Hi Gabriel. This is very thoughtful. Thanks.

As in the OP, when I have 50 different runnable scripts, it becomes necessary to arrange them in directories. How would you do that in your scheme? Currently it looks like they're required to live directly above the package containing their code.

--Buck

--
http://mail.python.org/mailman/listinfo/python-list
Re: organizing your scripts, with plenty of re-use
On Oct 13, 9:37 am, Ethan Furman et...@stoneleaf.us wrote:
> Buck wrote:
> > I'd like to get to zero-installation if possible. It's easy with simple
> > python scripts, why not packages too? I know the technical reasons, but
> > I haven't heard any practical reasons.
>
> I don't think we mean the same thing by zero-installation... seems to me
> that if you have to copy it, check it out, or anything to get the code
> from point A to point 'usable on your computer', then you have done some
> sort of installation.

I think most people would agree that installation is whatever you need to do between downloading the software and being able to use it. For GNU packages, it's './configure; make; make install'. For Python packages, it's usually './setup.py install'.

> > Steven had the nicest workaround (with the location =
> > __import__('__main__').__file__ trick), but none of them solve the
> > problem of the OP: organization of runnable scripts. So far it's been
> > required to place all runnable scripts directly above any used packages.
> > The workaround that Gabriel has been touting requires this too.
>
> Wha? Place all runnable scripts directly above any used packages? I must
> have missed something major in this thread. The only thing necessary is
> to have the package being imported be somewhere in PYTHONPATH.

The only ways to get your packages on the PYTHONPATH currently are to:

* install the packages to site-packages (I don't have access)
* edit the PYTHONPATH in all users' environment (again, no access)
* create some boilerplate that edits sys.path at runtime (various problems in previous post)
* put your scripts directly above the package (this seems best so far, but forces a flat hierarchy of scripts)

--
http://mail.python.org/mailman/listinfo/python-list
Re: organizing your scripts, with plenty of re-use
On Oct 10, 9:44 am, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:
> The good thing is that, if the backend package is properly installed
> somewhere in the Python path ... it still works with no modifications.

I'd like to get to zero-installation if possible. It's easy with simple python scripts, why not packages too? I know the technical reasons, but I haven't heard any practical reasons. If the reasons are purely technical, it smells like a PEP to me.

--
http://mail.python.org/mailman/listinfo/python-list
Re: organizing your scripts, with plenty of re-use
On Oct 5, 11:29 am, Robert Kern robert.k...@gmail.com wrote:
> On 2009-10-05 12:42 PM, Buck wrote:
>
> With the package layout, you would just do:
>
>     from parrot.sleeping import sleeping_in_a_bed
>     from parrot.feeding.eating import eat_cracker
>
> This is really much more straightforward than you are making it out to be.

As in the OP, I need things to Just Work without installation requirements. The reason for this is that I'm in a large corporate environment servicing many groups with their own custom environments.

> The more ad hoc hacks you use rather than the standard approaches, the
> harder it is going to be for you to support those custom environments.

I too would prefer a standard approach, but there doesn't seem to be an acceptable one.

> I do believe that you and Stef are exceptions. The vast majority of
> Python users seem to be able to grasp packages well enough.

You're failing to differentiate between a python programmer and a system's users. I understand packages well enough, but I need to reduce the users' requirements down to simply running a command. I don't see a way to do that as of now without a large amount of boilerplate code in every script.

I've considered installing the thing to the PYTHONPATH as most people suggest, but this has two drawbacks:

* Extremely hard to push thru my IT department. Possibly impossible.
* Local checkouts of scripts use the main installation, rather than the local, possibly revised package code. This necessitates the boilerplate that installation to the PYTHONPATH was supposed to avoid.
* We can work around the previous point by requiring a user-owned dev installation of Python, but this raises the bar to entry past most of my co-developers' threshold. They are more comfortable with tcsh and perl...

I think the issue here is that the current python-package system works well enough for the core python devs, but leaves normal python developers without many options beyond "all scripts in one directory" or tons of boilerplate everywhere.

--
http://mail.python.org/mailman/listinfo/python-list
Re: organizing your scripts, with plenty of re-use
Thanks. I think we're getting closer to the core of this. To restate my problem more simply:

My core goal is to have my scripts in some sort of organization better than a single directory, and still have plenty of re-use between them. The only way I can see to implement this is to have 10+ lines of unintelligible hard-coded boilerplate in every runnable script. That doesn't seem reasonable or pythonic.

On Oct 5, 12:34 pm, Robert Kern robert.k...@gmail.com wrote:
> I would like to see an example of such boilerplate. I do not understand
> why packages would require more than any other organization scheme.

This example is from the 2007 post I referenced in my OP. I'm pretty sure he meant 'dirname' rather than 'basename', and even then it doesn't quite work. http://mail.python.org/pipermail/python-3000/2007-April/006814.html

    import os, sys
    sys.path.insert(1, os.path.basename(os.path.basename(__file__)))

This is from a co-worker trying to address this topic:

    import os, sys
    binpath = binpath or os.path.dirname(os.path.realpath(sys.argv[0]))
    libpath = os.path.join(binpath, 'lib')
    verinfo = sys.version_info
    pythonver = 'python%d.%d' % (verinfo[0], verinfo[1])
    sys.path.append(os.path.join(libpath, pythonver, 'site-packages'))
    sys.path.append(libpath)

This is my personal code:

    from sys import path
    from os.path import abspath, islink, realpath, dirname, normpath, join

    f = __file__
    # continue working even if the script is symlinked and then compiled
    if f.endswith('.pyc'):
        f = f[:-1]
    if islink(f):
        f = realpath(f)
    here = abspath(dirname(f))
    libpath = join(here, '..', 'lib')
    libpath = normpath(libpath)
    path.insert(1, libpath)

> $ export PYTHONPATH=~/LocalToolCheckouts/:$PYTHONPATH
>
> This is a simple no-installation way to use the normal Python package
> mechanism that works well if you don't actually need to build anything.

This seems simple to you, but my users are electrical engineers and know just enough UNIX commands to get by. Most are afraid of Python. Half of them will assume the script is borked when they see an "ImportError: No module named foo". Another 20% will then read the README and set their environment wrong (setenv PYTHONPATH foo). The rest will get it to work after half an hour but never use it again because it was too complicated.

I could fix the error message to tell them exactly what to do, but at that point I might as well re-write the above boilerplate code. I'm overstating my case here for emphasis, but it's essentially true.

--Buck

--
http://mail.python.org/mailman/listinfo/python-list
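For comparison, the boilerplate variants in that thread all boil down to roughly this sketch (the bin/ and lib/ layout is an assumption taken from the co-worker's version; adjust to taste):

```python
import os
import sys

# Resolve the script's real path, even through symlinks and .pyc invocation.
script = __file__
if script.endswith('.pyc'):
    script = script[:-1]
script = os.path.realpath(script)

# Assume a <root>/bin/<script> layout with importable code in <root>/lib.
root = os.path.dirname(os.path.dirname(script))
libpath = os.path.join(root, 'lib')
if libpath not in sys.path:
    sys.path.insert(1, libpath)
print(libpath)
```

Pasting even this reduced version into every runnable script is exactly the repetition the thread is complaining about.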
Re: Module updating plans for Python 3.1: feedparser, MySQLdb
I use MySQLdb quite a bit in my work. I could volunteer to help update it. Are there any particular bugs we're talking about or just a straight port to 3.0? --Buck On Jul 31, 6:32 pm, John Nagle na...@animats.com wrote: Any progress on updating feedparser and MySQLdb for Python 3.x in the foreseeable future? Feedparser shouldn't be that hard; it's just that nobody is working on it. MySQLdb is known to be hard, and that may be a while. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
[issue2613] inconsistency with bare * in parameter list
Buck Golemon [EMAIL PROTECTED] added the comment: /agree ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue2613 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue2613] inconsistency with bare * in parameter list
Buck Golemon [EMAIL PROTECTED] added the comment: If there's no difference then they should work the same? I agree there's probably little value in 'fixing' it. __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue2613 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue2950] silly readline module problem
Buck Golemon [EMAIL PROTECTED] added the comment: I'm not sure what your problem is, but comp.lang.python might be a better place to ask. It's not clear that this is a bug yet. http://groups.google.com/group/comp.lang.python/topics -- nosy: +bgolemon __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue2950 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Python installation problem
I've been trying to install Mailman, which requires a newer version of the Python language compiler (p-code generator?) than the one I currently have on my linux webserver/gateway box. It's running a ClarkConnect 2.01 package based on Red Hat 7.2 linux.

I downloaded the zipped tarball (Python-2.4.4.tgz), ran gunzip, then un-tarred it in /usr/local. Then (logged in as root) from /usr/local/Python-2.4.4 I ran the configure script, which appeared to run properly; at least there were no error messages that I saw. Then I attempted to run "make install" and ended up with an error: "make: *** Error 1". It was right at the libinstall section of the make, so I did some googling and came up with the following command:

    [EMAIL PROTECTED] Python-2.4.4]# make libinstall inclinstall

After thrashing for about 5 minutes, I got basically the same message:

    Compiling /usr/local/lib/python2.4/zipfile.py ...
    make: *** [libinstall] Error 1

I dunno if this is relevant, but I have Python 2.2.2 in the /usr/Python-2.2.2 directory. Do I have to blow this away in order to install the newer distro? Or do I need to install the new one in /usr instead of /usr/local?

Although I'm a retired programmer (mainframes), I'm still learning this linux stuff. I guess that makes me a noob... I hope you'll take that into consideration.

Thanks, Ray

--
http://mail.python.org/mailman/listinfo/python-list
Re: Programming Language for Systems Administrator
I also tried SAP-DB before. It's now known as (or was, last time I checked) MaxDB by MySQL, and was formerly known as the pre-relational DBMS 'Adabas'. I think the only reason for its continued existence is that SAP was hoping for a very low-cost, low-end database years ago. However, the database world has changed substantially over the last ten years: you can get PostgreSQL and Firebird for nothing, and DB2 and Oracle are often under $1000 for a small server. With that in mind I can't think of a database that's more of a has-been than MaxDB. Maybe something from the 70s like IMS-DB or Model 204? buck
Re: database in python ?
In truth, although Postgres has more features, MySQL is probably better for someone who is just starting to use databases to develop for: the chances are higher that anyone using their code will have MySQL than Postgres, and they aren't going to need the features that PostgreSQL has that MySQL doesn't. IMO, this has changed since only a year or two ago, when MySQL didn't support foreign-key constraints. MySQL does deserve serious consideration now that it supports transactions. However, keep in mind:

1. MySQL itself doesn't support transactions - one of its storage engines (InnoDB) does. If you're hoping to get your application hosted, you will find that most MySQL installations don't support InnoDB. And due to a bug in MySQL, when you attempt to create a transaction-safe table and InnoDB isn't available, it will just silently create the table in MyISAM, and your transactions will be silently ignored.

2. MySQL is still missing quite a few database basics - views are the most amazing omission, but the list also includes triggers and stored procedures. Although most of these features are included in the new beta, they aren't yet available in a production release.

3. MySQL has an enormous number of non-standard features: comment formatting, how NULLs work, the concatenation operator, etc. This means that you'll learn non-standard SQL, and most likely write non-portable SQL.

4. Additionally, MySQL has a peculiar set of bugs in which the database will change your data and report no exception. These bugs were probably a reflection of MySQL's marketing message that the database should do nothing but persist data, and that data quality was the responsibility of the application. This self-serving message appears to have been dropped now that they are catching up with other products, but a legacy of cruft still remains. Examples of these errors include: silent truncation of strings to fit the max varchar length, acceptance of invalid dates, truncation of numeric data to fit max numeric values, etc.

5. Cost: MySQL isn't expensive, but it isn't free either. Whether or not you get to use it for free depends on how you interpret their licensing info and FAQ. MySQL's recommendation if you're confused (and many are) is to license the product or call one of their reps.

Bottom line - MySQL has a lot of market share, is improving, and I'm sure it'll eventually be a credible product. But right now it has a wide range of inexcusable problems. More info at http://sql-info.de/mysql/gotchas.html buck
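[Until the database enforces this itself, the silent-truncation problem in point 4 can at least be caught in application code by validating field lengths before issuing the INSERT. A minimal sketch - the column names and widths here are hypothetical, not from any real schema:]

```python
# Hypothetical varchar widths for the columns we intend to insert into.
# MySQL of this era would silently truncate anything longer.
COLUMN_WIDTHS = {"name": 40, "email": 60}

def check_row(row):
    """Return a list of (column, actual_length, limit) violations
    for one row dict, so the caller can refuse the INSERT."""
    violations = []
    for col, limit in COLUMN_WIDTHS.items():
        value = row.get(col, "")
        if len(value) > limit:
            violations.append((col, len(value), limit))
    return violations

# A 50-char name against a 40-char column is flagged instead of
# being quietly chopped by the server.
print(check_row({"name": "x" * 50, "email": "ok@example.com"}))
```

[It's a workaround, not a fix - the point of a constraint is that the database enforces it - but it at least turns silent data loss into a visible error in your own code.]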
Re: database in python ?
It's not a bug if you didn't RTFM. Maybe it's not a bug if it's the only DBMS you've ever used and you actually believe that overriding explicit critical declaratives is a valid design choice. But it is a bug if it's still only partially supported in a beta version that nobody is yet hosting. Maybe this release will actually fix ten years of negligence in one fell swoop, and all these issues will be easily eliminated. But just in case that turns out to be difficult, and there's some reason it has taken all this time to achieve, just wait and see what this guy finds: http://sql-info.de/mysql/gotchas.html BTW, you should upgrade; they're now on 5.0.3. Their support site appears to be down right now (timeouts) so I can't check the new bug list, but since 5.0.2 is a beta, it may have introduced more problems than it solved. buck