[issue34870] Core dump when Python VSCode debugger is attached
New submission from Per Lundberg:

My code has recently started triggering a core dump in the Python executable when the VSCode debugger is attached. This doesn't happen right away; it seems to happen more or less _after_ the program is done executing (I just placed a breakpoint and stepped through it). The program in question is this: https://github.com/hiboxsystems/trac-to-gitlab/blob/master/migrate.py

To help in debugging this, I installed python2.7-dbg and gdb-python2 on my Debian machine, and re-ran the script using this version. Here is the GDB output when analyzing the backtrace:

$ gdb /usr/bin/python2.7-dbg core
GNU gdb (Debian 8.1-4+b1) 8.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/python2.7-dbg...done.
[New LWP 19749]
[New LWP 19744]
[New LWP 19747]
[New LWP 19754]
[New LWP 19751]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/python2.7-dbg -m ptvsd --host localhost --port 43959 migrate.py --only'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 PyEval_EvalFrameEx (f=0x7f815c002310, throwflag=0) at ../Python/ceval.c:3347 3347if (tstate->frame->f_exc_type != NULL) [Current thread is 1 (Thread 0x7f815bfff700 (LWP 19749))] The python backtrace looks like this: (gdb) py-bt Traceback (most recent call first): File "/usr/lib/python2.7/threading.py", line 371, in wait self._acquire_restore(saved_state) File "/usr/lib/python2.7/Queue.py", line 177, in get self.not_empty.wait(remaining) File "/home/per/.vscode/extensions/ms-python.python-2018.8.0/pythonFiles/experimental/ptvsd/ptvsd/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 458, in _on_run cmd = self.cmdQueue.get(1, 0.1) File "/home/per/.vscode/extensions/ms-python.python-2018.8.0/pythonFiles/experimental/ptvsd/ptvsd/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 319, in run self._on_run() File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap self.__bootstrap_inner() And the C-level backtrace: (gdb) bt #0 PyEval_EvalFrameEx (f=Frame 0x7f815c002310, for file /usr/lib/python2.7/threading.py, line 371, in wait (), throwflag=0) at ../Python/ceval.c:3347 #1 0x5624534af42c in PyEval_EvalCodeEx (co=0x7f816216e7d0, globals={'current_thread': None, '_BoundedSemaphore': None, 'currentThread': None, '_Timer': None, '_format_exc': None, 'Semaphore': None, '_deque': None, 'activeCount': None, '_profile_hook': None, '_sleep': None, '_trace_hook': None, 'ThreadError': None, '_enumerate': None, '_start_new_thread': None, 'BoundedSemaphore': None, '_shutdown': None, '__all__': None, '_original_start_new_thread': None, '_Event': None, 'active_count': None, '__package__': None, '_Condition': None, '_RLock': None, '_test': None, 'local': None, '__doc__': None, 'Condition': None, '_Verbose': None, '_DummyThread': None, 'Thread': None, 'warnings': None, '__builtins__': {'bytearray': None, 'IndexError': None, 'all': None, 'help': None, 'vars': None, 'SyntaxError': None, 
'unicode': None, 'UnicodeDecodeError': None, 'memoryview': None, 'isinstance': None, 'copyright': None, 'NameError': None, 'BytesWarning': None, 'dict': None, 'input': None, 'oct': None, 'bin': None, 'SystemExit': None, 'StandardError': No ne, 'format': None, 'repr': None, 'sor...(truncated), locals=0x0, args=0x562454463068, argcount=2, kws=0x562454463078, kwcount=0, defs=0x7f8162116408, defcount=1, closure=0x0) at ../Python/ceval.c:3604 #2 0x5624534b23a7 in fast_function (func=, pp_stack=0x7f815bffd3e8, n=2, na=2, nk=0) at ../Python/ceval.c:4467 #3 0x5624534b1f8a in call_function (pp_stack=0x7f815bffd3e8, oparg=1) at ../Python/ceval.c:4392 #4 0x5624534ac45d in PyEval_EvalFrameEx ( f=Frame 0x562454462eb0, for file /usr/lib/python2.7/Queue.py, line 177, in get (self=, maxsize=0, all_tasks_done=<_Condition(_Verbose__verbose=False, _Condition__lock=, acquire=, _Condition__waiters=[], release=) at remote 0x7f81
[issue25144] 3.5 Win install fails with "TARGETDIR"
Per Fryking <fryk...@gmail.com> added the comment: I got the same issue with the 3.6 installer from python.org. The thing is that I can't elevate the privileges to be administrator, so I'm stuck. Uploading the log. Running Windows 7. -- nosy: +Per Fryking Added file: https://bugs.python.org/file47278/Python 3.6.3 (32-bit)_20171120135800.log ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue25144> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue22544] Inconsistent cmath.log behaviour
Per Brodtkorb added the comment: This is not only a problem for division. It also applies to multiplication, as exemplified here:

complex(0,inf) + 1        # expect 1+infj
Out[16]: (1+infj)
(complex(0,inf) + 1)*1    # expect 1+infj
Out[17]: (nan+infj)
complex(inf, 0) + 1j      # expect inf+1j
Out[18]: (inf+1j)
(complex(inf, 0) + 1j)*1  # expect inf+1j
Out[19]: (inf+nanj)

-- nosy: +pbrod versions: +Python 2.7 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22544 ___
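For context on where the nan comes from: multiplying by the integer 1 promotes it to the complex (1+0j) and performs a full complex multiply, so the real part is computed as 1*1 - inf*0, and inf*0 is nan under IEEE-754 rules. A short sketch (plain CPython, no external libraries) reproducing the first multiplication case above:

```python
import math

inf = float("inf")

z = complex(0, inf) + 1   # (1+infj)
w = z * 1                 # promoted to (1+0j), full complex multiply
# real part: 1*1 - inf*0 = 1 - nan = nan
# imag part: 1*0 + inf*1 = inf
assert math.isnan(w.real)
assert math.isinf(w.imag)
```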
[issue14507] Segfault with starmap and izip combo
New submission from Per Myren progr...@gmail.com:

The following code crashes with a segfault on Python 2.7.2:

from operator import add
from itertools import izip, starmap
a = b = [1]
for i in xrange(10):
    a = starmap(add, izip(a, b))
list(a)

It also crashes with Python 3.2.2:

from operator import add
from itertools import starmap
a = b = [1]
for i in range(10):
    a = starmap(add, zip(a, b))
list(a)

-- components: Library (Lib) messages: 157576 nosy: progrper priority: normal severity: normal status: open title: Segfault with starmap and izip combo type: crash versions: Python 2.7, Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14507 ___
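For what it's worth, a workaround sketch (this sidesteps the crash rather than fixing CPython): materializing each stage with list() keeps the result a plain list instead of building a deeply nested chain of lazy starmap/zip objects.

```python
from operator import add
from itertools import starmap

a = b = [1]
for i in range(10):
    # list() forces each stage eagerly, so no deep lazy chain builds up
    a = list(starmap(add, zip(a, b)))

print(a)  # [11]: ten passes, each adding b's single element 1
```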
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: Ah, I thought that he had reused most of the original C code in _lzmamodule.c rather than replacing it with Python code, but I see that not being the case now (only slight fragments ;). Oh well, I thought that I'd still earned a note with some slight credit at least, but I guess I won't go postal or anything over the lack of either.. :p -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: Not meaning to sound petty, but wouldn't it be common etiquette to retain some original copyright notice from the original code intact..? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___
[issue12013] file /usr/local/lib/python3.1/lib-dynload/_socket.so: symbol inet_aton: referenced symbol not found
Per Rosengren per.roseng...@gmail.com added the comment: On Linux:

nm -C /lib/libc.so.6 | grep ' inet_aton'
000cbce0 W inet_aton

This means that when Python is built with GCC (as on Linux), inet_aton is in the system libc. If you build with GCC on Solaris, inet_aton will be taken from the GCC lib dir. You need to put that GCC lib dir in your LD_LIBRARY_PATH when you run Python. -- nosy: +Per.Rosengren ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12013 ___
[issue12394] packaging: generate scripts from callable (dotted paths)
Changes by Per Cederqvist ce...@lysator.liu.se: -- nosy: +ceder ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12394 ___
[issue11817] berkeley db 5.1 support
New submission from Per Øyvind Karlsen peroyv...@mandriva.org: This patch adds support for berkeley db >= 5.1. -- components: Extension Modules files: Python-2.7.1-berkeley-db-5.1.patch keywords: patch messages: 133442 nosy: proyvind priority: normal severity: normal status: open title: berkeley db 5.1 support versions: Python 2.7 Added file: http://bugs.python.org/file21601/Python-2.7.1-berkeley-db-5.1.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11817 ___
[issue11817] berkeley db 5.1 support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: Forgot some additional config checks in setup.py in the previous patch.. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11817 ___
[issue11817] berkeley db 5.1 support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: sloppysloppy... fix previous patch -- Added file: http://bugs.python.org/file21602/Python-2.7.1-berkeley-db-5.1.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11817 ___
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: I've uploaded a new version of the patch to http://codereview.appspot.com/2724043/ now. I'd be okay with doing maintenance directly against the CPython repository btw. :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: LZMAFile, LZMACompressor and LZMADecompressor are all inspired by and written to be as similar as possible to bz2's, for easier use and maintenance. I must admit that I haven't really put much thought into alternate ways to implement them beyond monkey see, monkey do.. ;) LZMAOptions is a bit awkwardly written, but it doesn't serve documentation purposes only: it also exposes these values for max, min etc. to Python (ie. as used by its regression tests), and they are also used when processing the various compression options passed. IMO it does serve a useful purpose, but it certainly wouldn't hurt from being rewritten in some better way... -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___
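For reference, the bz2-mirroring design discussed here resembles what eventually shipped as the standard library's lzma module in Python 3.3. A quick round-trip against that final stdlib API (shown for comparison, not part of this patch):

```python
import lzma

data = b"xz compressor support" * 100

# One-shot helpers; .xz container format by default
blob = lzma.compress(data)
assert lzma.decompress(blob) == data

# Incremental objects, mirroring bz2.BZ2Compressor's shape
comp = lzma.LZMACompressor()
blob2 = comp.compress(data) + comp.flush()
assert lzma.decompress(blob2) == data

# Highly repetitive input compresses well
assert len(blob) < len(data)
```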
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: Hehe, don't feel guilty on my part at least, I had already implemented it like this long before. :p I guess I could rewrite it following these suggestions, but I probably won't be able to finish it in time for the 3.2 beta. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: All fixed now. :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: Here's a patch with the latest code, generated against the py3k branch; it comes with Doc/library/lzma.rst as well now. -- keywords: +patch Added file: http://bugs.python.org/file19405/py3k-lzmamodule.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: here's Lib/test/teststring.lzma, required by the test suite. -- Added file: http://bugs.python.org/file19406/teststring.lzma ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: here's Lib/test/teststring.xz, required by the test suite. -- Added file: http://bugs.python.org/file19407/teststring.xz ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: I've (finally) finalized the API and prepared pyliblzma to be ready for inclusion now. The code can be found in the 'py3k' branch referred to earlier. Someone else (don't remember who :p) volunteered for writing the PEP earlier, so I leave it up to that person to write the PEP; I won't be able to get around to doing so myself in the near future.. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: I've ported pyliblzma to py3k now and also implemented the missing functionality I mentioned earlier; for anyone interested in my progress, the branch is found at: https://code.launchpad.net/~proyvind/pyliblzma/py3k I still need to fix some memory leaks (a side effect of the new PyUnicode/PyBytes change I'm not 100% familiar with yet ;) and various memory errors reported by valgrind etc., but things are starting to look quite nice already. :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___
[issue4015] [patch] make installed scripts executable on windows
Per pybugs.pho...@safersignup.com added the comment: On POSIX the interpreter will be read from the first line of a file. On Windows the interpreter will be read from the Registry key HKEY_CLASSES_ROOT\.file-extension . So the correct way to associate an interpreter with a file is to invent a file-extension for every interpreter. Like /usr/bin/python, /usr/bin/python3 and /usr/bin/python3.1 on POSIX, there should be .py, .py3 and .py31 on Windows! I attached an example registry patch to register extensions for 2.5, 2.6 and 3.1. If you want to use it, you need to adjust the paths! I propose to change all Python Windows installers to install versioned extensions. If you want a switcher application, it should read the first line of the script and match it against .*/python(.*)$. So the default POSIX #!/usr/bin/python3.1 can be kept unchanged. With that regex the app-path can be read from HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\regex-match\InstallPath\. BTW. It would be nice if Python would call itself Python 3.1 instead of python in the Open with...-list! The current naming is problematic if you install more than one Python version. -- nosy: +phobie Added file: http://bugs.python.org/file17481/hklm_python_extensions.reg ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4015 ___
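A minimal sketch of the switcher logic described above, using the regex from the message. The function name and sample shebangs are invented for illustration, and the registry lookup is only sketched in a comment:

```python
import re

def version_from_shebang(first_line):
    """Extract the version suffix from a '#!...python<ver>' shebang line.

    Returns e.g. '3.1' for '#!/usr/bin/python3.1', '' for a bare
    'python', or None when the line is not a python shebang at all.
    """
    m = re.match(r"#!.*/python(.*)$", first_line.strip())
    return m.group(1) if m else None

# The switcher would then read the interpreter path from
# HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\<match>\InstallPath
print(version_from_shebang("#!/usr/bin/python3.1"))
```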
[issue5689] please support lzma compression as an extension and in the tarfile module
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: if you're already looking at issue6715, then I don't get why you're asking.. ;) quoting from msg106433: For my code, feel free to use your own/any other license you'd like or even public domain (if the license of bz2module.c that much of it's derived from permits of course)! The reason why I picked LGPLv3 in the past was simply just because liblzma at the time was licensed under it, so I just picked the same for simplicity. I've actually already dual-licensed it under the python license in addition on the project page though, but I just forgot updating the module's metadata as well before I released 0.5.3 last month.. Martin: For LGPL (or even GPL for that matter, disregarding linking restrictions) libraries you don't have to distribute the sources of those libraries at all (they're already made available by others, so that would be quite overly redundant, uh?;). LGPL actually doesn't even care at all about the license of your software as long as you only dynamically link against it. I don't really get what the issue would be even if liblzma were still LGPL, it doesn't prohibit you from distributing a dynamically linked library along with python either if necessary (which of course would be of convenience on win32..).. tsktsk, discussions about python module for xz compression should anyways be kept at issue6715 as this one is about the tarfile module ;p -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5689 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: Yeah, I guess I can anyways just break the current API right away to make it compatible with future changes; I figured out long ago how it should look. It's not like I have to implement the actual functionality to ensure compatibility, no-op works like a charm. ;) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___
[issue5689] please support lzma compression as an extension and in the tarfile module
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: I'm the author of the pyliblzma module, and if desired, I'd be happy to help out adapting pyliblzma for inclusion with Python. Most of its code is based on bz2module.c, so it shouldn't be very far away from being good 'nuff. What I see as required is:

* clean out use of C99 types etc.
* clean up the LZMAOptions class (this is the biggest difference from the bz2 module: as the filter supports a wide range of various options, everything related, such as parsing, API documentation etc., was placed in its own class. I've yet to receive any feedback on this decision or find any remote equivalents out there to draw inspiration from;)
* While most of the liblzma API has been implemented, support for multiple/alternate filters still remains to be implemented. When done it will also cause some breakage with the current pyliblzma API.

I plan on doing these things sooner or later anyways; it's pretty much just a matter of motivation and priorities standing in the way, and actual interest from others would certainly have a positive effect on this. ;) For other alternatives to the LGPL liblzma, you really don't have any: keep in mind that LZMA is merely the algorithm, while xz (and LZMA_alone, used for '.lzma', now obsolete, but still supported) are the actual formats you want support for. The LZMA SDK does not provide any compatibility for this. -- nosy: +proyvind ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5689 ___
[issue5689] please support lzma compression as an extension and in the tarfile module
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: PS: pylzma uses the LZMA SDK, which is not what you want. pyliblzma (not the same module ;) OTOH uses liblzma, which is the library used by xz/lzma utils. You'll find it available at http://launchpad.net/pyliblzma -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5689 ___
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: Ooops, I kinda should've commented on this issue here in stead, rather than in issue5689, so I'll just copy-paste it here as well: I'm the author of the pyliblzma module, and if desired, I'd be happy to help out adapting pyliblzma for inclusion with python. Most of it's code is based on bz2module.c, so it shouldn't be very far away from being good 'nuff. What I see as required is: * clean out use of C99 types etc. * clean up the LZMAOptions class (this is the biggest difference from the bz2 module, as the filter supports a wide range of various options, everything related such as parsing, api documentation etc. was placed in it's own class, I've yet to receive any feedback on this decission or find any remote equivalents out there to draw inspiration from;) * While most of the liblzma API has been implemented, support for multiple/alternate filters still remains to be implemented. When done it will also cause some breakage with the current pyliblzma API. I plan on doing these things sooner or later anyways, it's pretty much just a matter of motivation and priorities standing in the way, actual interest from others would certainly have a positive effect on this. ;) For other alternatives to the LGPL liblzma, you really don't have any, keep in mind that LZMA is merely the algorithm, while xz (and LZMA_alone, used for '.lzma', now obsolete, but still supported) are the actual format you want support for. The LZMA SDK does not provide any compatibility for this. -- nosy: +proyvind ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue6715] xz compressor support
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: ah, you're right, I forgot that the license for the library had changed as well (motivated by attempt of pleasing BSD people IIRC;), in the past the library was LGPL while only the 'xz' util was public domain.. For my code, feel free to use your own/any other license you'd like or even public domain (if the license of bz2module.c that much of it's derived from permits of course)! I guess everyone should be happy now then. :) Btw. for review, I think the code already available should be pretty much good 'nuff for an initial review. Some feedback on things not derived from bz2module.c would be nice, especially the LZMAOptions class would be nice as it's where most of the remaining work required for adding additional filters support. Would kinda blow if I did the work using an approach that would be dismissed as utterly rubbish. ;) Oh well, it's out there available for anyone already, I probably won't(/shouldn't;) have time for it in a month at least, do as you please meanwhile. :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6715 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Re: creating pipelines in python
Thanks to all for your replies. i want to clarify what i mean by a pipeline. a major feature i am looking for is the ability to chain functions or scripts together, where the output of one script -- which is usually a file -- is required for another script to run. so one script has to wait for the other. i would like to do this over a cluster, where some of the scripts are distributed as separate jobs on a cluster but the results are then collected together. so the ideal library would have easily facilities for expressing this things: script X and Y run independently, but script Z depends on the output of X and Y (which is such and such file or file flag). is there a way to do this? i prefer not to use a framework that requires control of the clusters etc. like Disco, but something that's light weight and simple. right now ruffus seems most relevant but i am not sure -- are there other candidates? thank you. On Nov 23, 4:02 am, Paul Rudin paul.nos...@rudin.co.uk wrote: per perfr...@gmail.com writes: hi all, i am looking for a python package to make it easier to create a pipeline of scripts (all in python). what i do right now is have a set of scripts that produce certain files as output, and i simply have a master script that checks at each stage whether the output of the previous script exists, using functions from the os module. this has several flaws and i am sure someone has thought of nice abstractions for making these kind of wrappers easier to write. does anyone have any recommendations for python packages that can do this? Not entirely what you're looking for, but the subprocess module is easier to work with for this sort of thing than os. See e.g. http://docs.python.org/library/subprocess.html#replacing-shell-pipeline -- http://mail.python.org/mailman/listinfo/python-list
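For the record, with a modern Python standard library (concurrent.futures post-dates this thread) the X/Y/Z dependency described above can be expressed without any cluster framework: submit X and Y to an executor and let Z block on their results. All script and file names below are invented for illustration; on a real cluster the executor submission would be replaced by job submission:

```python
import concurrent.futures
import os
import tempfile

def script_x(outdir):
    # stands in for an independent script producing a file
    path = os.path.join(outdir, "x.out")
    with open(path, "w") as f:
        f.write("result of X\n")
    return path

def script_y(outdir):
    path = os.path.join(outdir, "y.out")
    with open(path, "w") as f:
        f.write("result of Y\n")
    return path

def script_z(x_path, y_path, outdir):
    # Z depends on the output files of both X and Y
    path = os.path.join(outdir, "z.out")
    with open(x_path) as fx, open(y_path) as fy, open(path, "w") as fz:
        fz.write(fx.read() + fy.read())
    return path

def run_pipeline(outdir):
    with concurrent.futures.ThreadPoolExecutor() as pool:
        fx = pool.submit(script_x, outdir)  # X and Y run independently
        fy = pool.submit(script_y, outdir)
        # .result() blocks until each prerequisite has finished
        return script_z(fx.result(), fy.result(), outdir)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run_pipeline(d))
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor (or a cluster scheduler's submit call) keeps the same dependency structure.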
creating pipelines in python
hi all, i am looking for a python package to make it easier to create a pipeline of scripts (all in python). what i do right now is have a set of scripts that produce certain files as output, and i simply have a master script that checks at each stage whether the output of the previous script exists, using functions from the os module. this has several flaws and i am sure someone has thought of nice abstractions for making these kind of wrappers easier to write. does anyone have any recommendations for python packages that can do this? thanks. -- http://mail.python.org/mailman/listinfo/python-list
efficiently splitting up strings based on substrings
I'm trying to efficiently split strings based on what substrings they are made up of. I have a set of strings that are comprised of known substrings. For example, a, b, and c are substrings that are not identical to each other, e.g.:

a = "0" * 5
b = "1" * 5
c = "2" * 5

Then my_string might be:

my_string = a + b + c

I am looking for an efficient way to solve the following problem. Suppose I have a short string x that is a substring of my_string. I want to split the string x into blocks based on what substrings (i.e. a, b, or c) the chunks of x fall into. To illustrate this, suppose x = "00111". Then I can detect where x starts in my_string using my_string.find(x). But I don't know how to partition x into blocks depending on the substrings. What I want to get out in this case is: "00", "111". If x were "00122", I'd want to get out "00", "1", "22". Is there an easy way to do this? I can't simply split x on a, b, or c because these might not be contained in x. I want to avoid doing something inefficient like looking at all substrings of my_string etc. I wouldn't mind using regular expressions for this but I cannot think of an easy regular expression for this problem. I looked at the string module in the library but did not see anything that seemed related, but I might have missed it. Any help on this would be greatly appreciated. Thanks.
Re: efficiently splitting up strings based on substrings
On Sep 5, 6:42 pm, Rhodri James rho...@wildebst.demon.co.uk wrote: On Sat, 05 Sep 2009 22:54:41 +0100, per perfr...@gmail.com wrote: I'm trying to efficiently split strings based on what substrings they are made up of. i have a set of strings that are comprised of known substrings. For example, a, b, and c are substrings that are not identical to each other, e.g.: a = 0 * 5 b = 1 * 5 c = 2 * 5 Then my_string might be: my_string = a + b + c i am looking for an efficient way to solve the following problem. suppose i have a short string x that is a substring of my_string. I want to split the string x into blocks based on what substrings (i.e. a, b, or c) chunks of s fall into. to illustrate this, suppose x = 00111. Then I can detect where x starts in my_string using my_string.find(x). But I don't know how to partition x into blocks depending on the substrings. What I want to get out in this case is: 00, 111. If x were 00122, I'd want to get out 00,1, 22. is there an easy way to do this? i can't simply split x on a, b, or c because these might not be contained in x. I want to avoid doing something inefficient like looking at all substrings of my_string etc. i wouldn't mind using regular expressions for this but i cannot think of an easy regular expression for this problem. I looked at the string module in the library but did not see anything that seemd related but i might have missed it. I'm not sure I understand your question exactly. You seem to imply that the order of the substrings of x is consistent. If that's the case, this ought to help: import re x = 00122 m = re.match(r(0*)(1*)(2*), x) m.groups() ('00', '1', '22') y = 00111 m = re.match(r(0*)(1*)(2*), y) m.groups() ('00', '111', '') You'll have to filter out the empty groups for yourself, but that's no great problem. -- Rhodri James *-* Wildebeest Herder to the Masses The order of the substrings is consistent but what if it's not 0, 1, 2 but a more complicated string? e.g. 
a = "1030405", b = "1babcf", c = "fUUIUP", then the substring x might be "4051ba", in which case using a regexp with (1*) will not work, since both the a and b substrings begin with the character 1. Your solution would work if that weren't a possibility, so what you wrote is definitely the kind of solution I am looking for. I am just not sure how to solve it in the general case, where the substrings might be similar to each other (but not so similar that you can't tell where the substring came from).
Re: efficiently splitting up strings based on substrings
On Sep 5, 7:07 pm, Rhodri James rho...@wildebst.demon.co.uk wrote: On Sat, 05 Sep 2009 23:54:08 +0100, per perfr...@gmail.com wrote: On Sep 5, 6:42 pm, Rhodri James rho...@wildebst.demon.co.uk wrote: On Sat, 05 Sep 2009 22:54:41 +0100, per perfr...@gmail.com wrote: I'm trying to efficiently split strings based on what substrings they are made up of. i have a set of strings that are comprised of known substrings. For example, a, b, and c are substrings that are not identical to each other, e.g.: a = 0 * 5 b = 1 * 5 c = 2 * 5 Then my_string might be: my_string = a + b + c i am looking for an efficient way to solve the following problem. suppose i have a short string x that is a substring of my_string. I want to split the string x into blocks based on what substrings (i.e. a, b, or c) chunks of s fall into. to illustrate this, suppose x = 00111. Then I can detect where x starts in my_string using my_string.find(x). But I don't know how to partition x into blocks depending on the substrings. What I want to get out in this case is: 00, 111. If x were 00122, I'd want to get out 00,1, 22. is there an easy way to do this? i can't simply split x on a, b, or c because these might not be contained in x. I want to avoid doing something inefficient like looking at all substrings of my_string etc. i wouldn't mind using regular expressions for this but i cannot think of an easy regular expression for this problem. I looked at the string module in the library but did not see anything that seemd related but i might have missed it. I'm not sure I understand your question exactly. You seem to imply that the order of the substrings of x is consistent. If that's the case, this ought to help: import re x = 00122 m = re.match(r(0*)(1*)(2*), x) m.groups() ('00', '1', '22') y = 00111 m = re.match(r(0*)(1*)(2*), y) m.groups() ('00', '111', '') You'll have to filter out the empty groups for yourself, but that's no great problem. 
The order of the substrings is consistent but what if it's not 0, 1, 2 but a more complicated string? e.g. a = 1030405, b = 1babcf, c = fUUIUP then the substring x might be 4051ba, in which case using a regexp with (1*) will not work since both a and b substrings begin with the character 1. Right. This looks approximately nothing like what I thought your problem was. Would I be right in thinking that you want to match substrings of your potential substrings against the string x? I'm sufficiently confused that I think I'd like to see what your use case actually is before I make more of a fool of myself. -- Rhodri James *-* Wildebeest Herder to the Masses it's exactly the same problem, except there are no constraints on the strings. so the problem is, like you say, matching the substrings against the string x. in other words, finding out where x aligns to the ordered substrings abc, and then determine what chunk of x belongs to a, what chunk belongs to b, and what chunk belongs to c. so in the example i gave above, the substrings are: a = 1030405, b = 1babcf, c = fUUIUP, so abc = 10304051babcffUUIUP given a substring like 4051ba, i'd want to split it into the chunks a, b, and c. in this case, i'd want the result to be: [405, 1ba] -- i.e. 405 is the chunk of x that belongs to a, and 1ba the chunk that belongs to be. in this case, there are no chunks of c. if x instead were 4051babcffUU, the right output is: [405, 1babcf, fUU], which are the corresponding chunks of a, b, and c that make up x respectively. i'm not sure how to approach this. any ideas/tips would be greatly appreciated. thanks again. -- http://mail.python.org/mailman/listinfo/python-list
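The general case described above can be solved without regexes by aligning x inside the concatenation and cutting x wherever a substring boundary falls inside its span. A minimal sketch (the function name is mine, not from the thread; note that find() returns only the first occurrence, so the alignment is ambiguous if x occurs more than once):

```python
def split_by_substrings(x, parts):
    """Split x into the chunks contributed by each part of ''.join(parts)."""
    whole = ''.join(parts)
    start = whole.find(x)
    if start == -1:
        return None  # x is not a substring of the concatenation
    end = start + len(x)
    chunks = []
    offset = 0  # running start position of the current part inside `whole`
    for part in parts:
        p_start, p_end = offset, offset + len(part)
        # overlap of this part's span [p_start, p_end) with x's span [start, end)
        lo, hi = max(p_start, start), min(p_end, end)
        if lo < hi:
            chunks.append(x[lo - start:hi - start])
        offset = p_end
    return chunks

a, b, c = '1030405', '1babcf', 'fUUIUP'
print(split_by_substrings('4051ba', [a, b, c]))        # ['405', '1ba']
print(split_by_substrings('4051babcffUU', [a, b, c]))  # ['405', '1babcf', 'fUU']
```

This is linear in the length of the concatenation per query (the cost of find()), with no assumption that the parts start with distinct characters.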
allowing output of code that is unittested?
hi all, i am using the standard unittest module to unit test my code. my code contains several print statements which i noticed are repressed when i call my unit tests using: if __name__ == '__main__': suite = unittest.TestLoader().loadTestsFromTestCase(TestMyCode) unittest.TextTestRunner(verbosity=2).run(suite) is there a way to allow all the print statements in the code that is being run by the unit test functions to be printed to stdio? i want to be able to see the output of the tested code, in addition to the output of the unit testing framework. thank you. -- http://mail.python.org/mailman/listinfo/python-list
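For what it's worth, the stdlib TextTestRunner does not itself swallow stdout; on Python 2.7+ and 3.x capture is controlled by an explicit `buffer` flag (False by default, so prints pass through; True captures them and replays them only for failing tests). A sketch, with a stand-in `add` function since the original code isn't shown:

```python
import unittest

def add(x, y):
    print('add called')  # diagnostic output we want to see during the tests
    return x + y

class TestMyCode(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)

if __name__ == '__main__':
    suite = unittest.TestLoader().loadTestsFromTestCase(TestMyCode)
    # buffer=False (the default) lets print output through, interleaved
    # with the test report; buffer=True would suppress it for passing tests.
    unittest.TextTestRunner(verbosity=2, buffer=False).run(suite)
```

If prints still vanish, the suppression is likely coming from whichever tool invokes the tests (an IDE or a third-party runner), not from unittest.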
fastest native python database?
hi all, i'm looking for a native python package to run a very simple data base. i was originally using cpickle with dictionaries for my problem, but i was making dictionaries out of very large text files (around 1000MB in size) and pickling was simply too slow. i am not looking for fancy SQL operations, just very simple data base operations (doesn't have to be SQL style) and my preference is for a module that just needs python and doesn't require me to run a separate data base like Sybase or MySQL. does anyone have any recommendations? the only candidates i've seen are snaklesql and buzhug... any thoughts/benchmarks on these? any info on this would be greatly appreciated. thank you -- http://mail.python.org/mailman/listinfo/python-list
Re: fastest native python database?
i would like to add to my previous post that if an option like SQLite with a python interface (pysqlite) would be orders of magnitude faster than naive python options, i'd prefer that. but if that's not the case, a pure python solution without dependencies on other things would be the best option. thanks for the suggestion, will look into gadfly in the meantime. On Jun 17, 11:38 pm, Emile van Sebille em...@fenx.com wrote: On 6/17/2009 8:28 PM per said... hi all, i'm looking for a native python package to run a very simple data base. i was originally using cpickle with dictionaries for my problem, but i was making dictionaries out of very large text files (around 1000MB in size) and pickling was simply too slow. i am not looking for fancy SQL operations, just very simple data base operations (doesn't have to be SQL style) and my preference is for a module that just needs python and doesn't require me to run a separate data base like Sybase or MySQL. You might like gadfly... http://gadfly.sourceforge.net/gadfly.html Emile does anyone have any recommendations? the only candidates i've seen are snaklesql and buzhug... any thoughts/benchmarks on these? any info on this would be greatly appreciated. thank you -- http://mail.python.org/mailman/listinfo/python-list
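On the SQLite option mentioned above: the sqlite3 module has shipped with CPython since 2.5, so it needs no separate server process and no extra install, which meets the "just needs python" requirement. A minimal sketch (table and key names are mine):

```python
import sqlite3

# In-memory database for the example; pass a filename instead to persist to disk.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)')
conn.executemany('INSERT INTO kv VALUES (?, ?)',
                 [('alpha', '1'), ('beta', '2')])
conn.commit()
row = conn.execute('SELECT value FROM kv WHERE key = ?', ('beta',)).fetchone()
print(row[0])
conn.close()
```

For million-key lookups this will generally beat loading a pickled dict, since the index lives on disk and only the rows you ask for are read.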
generating random tuples in python
hi all, i am generating a list of random tuples of numbers between 0 and 1 using the rand() function, as follows: for i in range(0, n): rand_tuple = (rand(), rand(), rand()) mylist.append(rand_tuple) when i generate this list, some of the random tuples might be very close to each other, numerically. for example, i might get: (0.553, 0.542, 0.654) and (0.581, 0.491, 0.634) so the two tuples are close to each other in that all of their numbers have similar magnitudes. how can i maximize the amount of numeric distance between the elements of this list, but still make sure that all the tuples have numbers strictly between 0 and 1 (inclusive)? in other words i want the list of random numbers to be arbitrarily different (which is why i am using rand()) but as different from other tuples in the list as possible. thank you for your help -- http://mail.python.org/mailman/listinfo/python-list
Re: generating random tuples in python
On Apr 20, 11:08 pm, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: On Mon, 20 Apr 2009 11:39:35 -0700, per wrote: hi all, i am generating a list of random tuples of numbers between 0 and 1 using the rand() function, as follows: for i in range(0, n): rand_tuple = (rand(), rand(), rand()) mylist.append(rand_tuple) when i generate this list, some of the random tuples might be very close to each other, numerically. for example, i might get: [...] how can i maximize the amount of numeric distance between the elements of this list, but still make sure that all the tuples have numbers strictly between 0 and 1 (inclusive)? Well, the only way to *maximise* the distance between the elements is to set them to (0.0, 0.5, 1.0). in other words i want the list of random numbers to be arbitrarily different (which is why i am using rand()) but as different from other tuples in the list as possible. That means that the numbers you are generating will no longer be uniformly distributed, they will be biased. That's okay, but you need to describe *how* you want them biased. What precisely do you mean by maximizing the distance? For example, here's one strategy: you need three random numbers, so divide the complete range 0-1 into three: generate three random numbers between 0 and 1/3.0, called x, y, z, and return [x, 1/3.0 + y, 2/3.0 + z]. You might even decide to shuffle the list before returning them. But note that you might still happen to get (say) [0.332, 0.334, 0.668] or similar. That's the thing with randomness. -- Steven i realize my example in the original post was misleading. i don't want to maximize the difference between individual members of a single tuple -- i want to maximize the difference between distinct tuples. in other words, it's ok to have (.332, .334, .38), as long as the other tuple is, say, (.52, .6, .9), which is very different from (.332, .334, .38). i want the members of a given tuple to be arbitrary, e.g. 
something like (rand(), rand(), rand()) but that the tuples be very different from each other. to be more formal by very different, i would be happy if they were maximally distant in ordinary euclidean space... so if you just plot the 3-tuples on x, y, z i want them to all be very different from each other. i realize this is obviously biased and that the tuples are not uniformly distributed -- that's exactly what i want... any ideas on how to go about this? thank you. -- http://mail.python.org/mailman/listinfo/python-list
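One common heuristic for "random but spread out in euclidean space" is best-candidate (Mitchell's) sampling: for each new tuple, draw several random candidates and keep the one whose nearest already-chosen tuple is farthest away. It does not truly maximize pairwise distance (that would be a deterministic packing problem), but it biases the points apart while keeping them random. A sketch (function name and the candidate count are mine):

```python
import random

def best_candidate_points(n, dim=3, candidates=20, rng=random):
    """Greedily pick n points in [0,1]^dim, each the best of `candidates`
    random draws, judged by distance to its nearest existing point."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    points = [tuple(rng.random() for _ in range(dim))]
    for _ in range(n - 1):
        best = max(
            (tuple(rng.random() for _ in range(dim)) for _ in range(candidates)),
            key=lambda cand: min(dist2(cand, p) for p in points),
        )
        points.append(best)
    return points

pts = best_candidate_points(10)
```

Raising `candidates` spreads the points more at the cost of more distance computations; low-discrepancy sequences (e.g. Halton) are a deterministic alternative.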
loading program's global variables in ipython
hi all, i have a file that declares some global variables, e.g. myglobal1 = 'string' myglobal2 = 5 and then some functions. i run it using ipython as follows: [1] %run myfile.py i notice then that myglobal1 and myglobal2 are not imported into python's interactive namespace. i'd like them too -- how can i do this? (note my file does not contain a __name__ == '__main__' clause.) thanks. -- http://mail.python.org/mailman/listinfo/python-list
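Behavior here varies with the IPython version: `%run -i myfile.py` explicitly runs the file in the interactive namespace, so top-level names like myglobal1 land there. Outside IPython, the same effect is `execfile('myfile.py', globals())` on Python 2, or `exec` on the file's text on Python 3. A sketch (the temp file below merely stands in for myfile.py):

```python
import os
import tempfile

# create a stand-in for myfile.py
path = os.path.join(tempfile.mkdtemp(), 'myfile.py')
with open(path, 'w') as f:
    f.write("myglobal1 = 'string'\nmyglobal2 = 5\n")

# run the file's top level in *our* namespace, so its globals land here
# (Python 2 spelling: execfile(path, globals()))
with open(path) as f:
    exec(f.read(), globals())

print(myglobal1)
print(myglobal2)
```

The absence of an `if __name__ == '__main__'` clause doesn't matter for this; it only affects what runs, not where the names go.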
splitting a large dictionary into smaller ones
hi all, i have a very large dictionary object that is built from a text file that is about 800 MB -- it contains several million keys. ideally i would like to pickle this object so that i wouldn't have to parse this large file to compute the dictionary every time i run my program. however currently the pickled file is over 300 MB and takes a very long time to write to disk - even longer than recomputing the dictionary from scratch. i would like to split the dictionary into smaller ones, containing only hundreds of thousands of keys, and then try to pickle them. is there a way to easily do this? i.e. is there an easy way to make a wrapper for this such that i can access this dictionary as just one object, but underneath it's split into several? so that i can write my_dict[k] and get a value, or set my_dict[m] to some value without knowing which sub dictionary it's in. if there aren't known ways to do this, i would greatly appreciate any advice/examples on how to write this data structure from scratch, reusing as much of the dict() class as possible. thanks. -- http://mail.python.org/mailman/listinfo/python-list
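The wrapper asked for above is straightforward: route each key to one of several plain dicts by hash, so each shard can be pickled to its own smaller file. A sketch (class name and file layout are mine; the stdlib shelve module is the ready-made alternative, a dbm-backed dict on disk that avoids repickling everything):

```python
import pickle

class ShardedDict(object):
    """Dict-like wrapper that spreads keys over several plain dicts."""
    def __init__(self, n_shards=16):
        self.shards = [dict() for _ in range(n_shards)]
    def _shard(self, key):
        return self.shards[hash(key) % len(self.shards)]
    def __getitem__(self, key):
        return self._shard(key)[key]
    def __setitem__(self, key, value):
        self._shard(key)[key] = value
    def __contains__(self, key):
        return key in self._shard(key)
    def __len__(self):
        return sum(len(s) for s in self.shards)
    def dump(self, prefix):
        # hypothetical file layout: prefix.0, prefix.1, ...
        for i, shard in enumerate(self.shards):
            with open('%s.%d' % (prefix, i), 'wb') as f:
                pickle.dump(shard, f, pickle.HIGHEST_PROTOCOL)

d = ShardedDict()
d['spam'] = 1
d['eggs'] = 2
```

Using `pickle.HIGHEST_PROTOCOL` (a binary protocol) rather than the default text protocol is itself a large speedup for dicts this size.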
Re: splitting a large dictionary into smaller ones
On Mar 22, 10:51 pm, Paul Rubin http://phr...@nospam.invalid wrote: per perfr...@gmail.com writes: i would like to split the dictionary into smaller ones, containing only hundreds of thousands of keys, and then try to pickle them. That already sounds like the wrong approach. You want a database. fair enough - what native python database would you recommend? i prefer not to install anything commercial or anything other than python modules -- http://mail.python.org/mailman/listinfo/python-list
parsing tab separated data efficiently into numpy/pylab arrays
hi all, what's the most efficient / preferred python way of parsing tab separated data into arrays? for example if i have a file containing two columns one corresponding to names the other numbers: col1\t col 2 joe\t 12.3 jane \t 155.0 i'd like to parse into an array() such that i can do: mydata[:, 0] and mydata[:, 1] to easily access all the columns. right now i can iterate through the file, parse it manually using the split('\t') command and construct a list out of it, then convert it to arrays. but there must be a better way? also, my first column is just a name, and so it is variable in length -- is there still a way to store it as an array so i can access: mydata [:, 0] to get all the names (as a list)? thank you. -- http://mail.python.org/mailman/listinfo/python-list
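A mixed string/float table can't live in one homogeneous numeric array, but per-column lists (or arrays) give the same `mydata[:, 0]`-style access. A pure-Python sketch using the csv module (the inline text stands in for the file); with numpy, `numpy.genfromtxt(fname, delimiter='\t', names=True, dtype=None)` should give a structured array addressable as `data['col1']`:

```python
import csv
import io

text = "col1\tcol2\njoe\t12.3\njane\t155.0\n"  # stand-in for the data file

reader = csv.reader(io.StringIO(text), delimiter='\t')
header = next(reader)          # ['col1', 'col2']
rows = list(reader)

names = [r[0] for r in rows]           # the variable-length string column
values = [float(r[1]) for r in rows]   # the numeric column

print(names)   # ['joe', 'jane']
print(values)  # [12.3, 155.0]
```

Only the numeric column benefits from becoming a real array; keeping the names as a plain list is the usual compromise.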
[issue5411] add xz compression support to distutils
Per Øyvind Karlsen peroyv...@mandriva.org added the comment: hmm, I'm unsure about how this should be done.. I guess such a test would belong in Lib/distutils/test_dist.py, but I'm uncertain about how it should be done, ie. should it be a test for doing 'bdist', 'bdist_rpm' and 'sdist' for each of the formats supported? I cannot seem to find any tests for the currently supported formats and such tests would introduce dependencies on the tools used to compress with these formats.. -- message_count: 2.0 - 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5411 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
speeding up reading files (possibly with cython)
hi all, i have a program that essentially loops through a text file that's about 800 MB in size, containing tab separated data... my program parses this file and stores its fields in a dictionary of lists. for line in file: split_values = line.strip().split('\t') # do stuff with split_values currently, this is very slow in python, even if all i do is break up each line using split() and store its values in a dictionary, indexing by one of the tab separated values in the file. is this just an overhead of python that's inevitable? do you guys think that switching to cython might speed this up, perhaps by optimizing the main for loop? or is this not a viable option? thank you. -- http://mail.python.org/mailman/listinfo/python-list
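Before reaching for cython, it's worth letting the csv module do the splitting: its reader is implemented in C, which is usually faster than calling `line.strip().split('\t')` per line in pure Python, and it handles the trailing newline for you. A sketch (the inline sample stands in for the 800 MB file; `setdefault` builds the dictionary of lists):

```python
import csv
import io

sample = "k1\t10\t20\nk2\t30\t40\nk1\t50\t60\n"  # stand-in for the real file

table = {}
for fields in csv.reader(io.StringIO(sample), delimiter='\t'):
    # index by the first tab-separated value, as in the post
    table.setdefault(fields[0], []).append(fields[1:])

print(table['k1'])  # [['10', '20'], ['50', '60']]
```

With a real file, pass the open file object straight to csv.reader; the remaining cost is then mostly the per-row Python object creation, which is where cython could still help if this isn't fast enough.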
[issue5411] add xz compression support to distutils
New submission from Per Øyvind Karlsen peroyv...@mandriva.org: Here's a patch that adds support for xz compression: http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/python/current/SOURCES/Python-2.6.1-distutils-xz-support.patch?view=log -- assignee: tarek components: Distutils messages: 83072 nosy: proyvind, tarek severity: normal status: open title: add xz compression support to distutils type: feature request versions: Python 2.6 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5411 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
setting PYTHONPATH to override system wide site-packages
hi all, i recently installed a new version of a package using python setup.py install --prefix=/my/homedir on a system where i don't have root access. the old package still resides in /usr/lib/python2.5/site- packages/ and i cannot erase it. i set my python path as follows in ~/.cshrc setenv PYTHONPATH /path/to/newpackage but whenever i go to python and import the module, the version in site- packages is loaded. how can i override this setting and make it so python loads the version of the package that's in my home dir? thanks. -- http://mail.python.org/mailman/listinfo/python-list
Re: setting PYTHONPATH to override system wide site-packages
On Feb 28, 11:24 pm, Carl Banks pavlovevide...@gmail.com wrote: On Feb 28, 7:30 pm, per perfr...@gmail.com wrote: hi all, i recently installed a new version of a package using python setup.py install --prefix=/my/homedir on a system where i don't have root access. the old package still resides in /usr/lib/python2.5/site-packages/ and i cannot erase it. i set my python path as follows in ~/.cshrc setenv PYTHONPATH /path/to/newpackage but whenever i go to python and import the module, the version in site-packages is loaded. how can i override this setting and make it so python loads the version of the package that's in my home dir? What happens when you run the command print sys.path from the Python prompt? /path/to/newpackage should be the second item, and should be listed in front of the site-packages dir. What happens when you run print os.environ['PYTHONPATH'] at the Python interpreter? It's possible that the sysadmin installed a script that removes PYTHONPATH environment variable before invoking Python. What happens when you type which python at the csh prompt? What happens when you type ls /path/to/newpackage at your csh prompt? Is the module you're trying to import there? Your approach should work. These are just suggestions on how to diagnose the problem; we can't really help you figure out what's wrong without more information. Carl Banks hi, i am setting it programmatically now, using: import sys sys.path = [] sys.path now looks exactly like what it looked like before, except the second element is my directory. yet when i do import mymodule print mymodule.__version__ i still get the old version... any other ideas? -- http://mail.python.org/mailman/listinfo/python-list
Re: setting PYTHONPATH to override system wide site-packages
On Feb 28, 11:53 pm, per perfr...@gmail.com wrote: On Feb 28, 11:24 pm, Carl Banks pavlovevide...@gmail.com wrote: On Feb 28, 7:30 pm, per perfr...@gmail.com wrote: hi all, i recently installed a new version of a package using python setup.py install --prefix=/my/homedir on a system where i don't have root access. the old package still resides in /usr/lib/python2.5/site- packages/ and i cannot erase it. i set my python path as follows in ~/.cshrc setenv PYTHONPATH /path/to/newpackage but whenever i go to python and import the module, the version in site- packages is loaded. how can i override this setting and make it so python loads the version of the package that's in my home dir? What happens when you run the command print sys.path from the Python prompt? /path/to/newpackage should be the second item, and shoud be listed in front of the site-packages dir. What happens when you run print os.eviron['PYTHONPATH'] at the Python interpreter? It's possible that the sysadmin installed a script that removes PYTHONPATH environment variable before invoking Python. What happens when you type which python at the csh prompt? What happens when you type ls /path/to/newpackage at your csh prompt? Is the module you're trying to import there? You approach should work. These are just suggestions on how to diagnose the problem; we can't really help you figure out what's wrong without more information. Carl Banks hi, i am setting it programmatically now, using: import sys sys.path = [] sys.path now looks exactly like what it looked like before, except the second element is my directory. yet when i do import mymodule print mymodule.__version__ i still get the old version... any other ideas? 
in case it helps, it gives me this warning when i try to import the module /usr/lib64/python2.5/site-packages/pytz/__init__.py:29: UserWarning: Module dateutil was already imported from /usr/lib64/python2.5/site- packages/dateutil/__init__.pyc, but /usr/lib/python2.5/site-packages is being added to sys.path from pkg_resources import resource_stream -- http://mail.python.org/mailman/listinfo/python-list
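Two likely causes in the thread above: `--prefix=/my/homedir` installs into `/my/homedir/lib/python2.5/site-packages`, so PYTHONPATH must name that subdirectory rather than the prefix itself; and (as the pkg_resources warning suggests) the stale copy may already be imported before the path is fixed, in which case the cached module in sys.modules wins. A sketch (the path is hypothetical):

```python
import sys

# Put the private install dir *ahead* of the system site-packages so it
# wins the import search.  sys.path[0] is usually the script's own
# directory, so position 0 or 1 both work here.
sys.path.insert(0, '/my/homedir/lib/python2.5/site-packages')

# If the stale copy was already imported in this session, drop the cached
# module first, then re-import:
#   sys.modules.pop('mymodule', None)
#   import mymodule
```

Also note that packages installed with easy_install-style .pth files can reorder sys.path at startup, which can put the system copy back in front regardless of PYTHONPATH.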
optimizing large dictionaries
hello i have an optimization question about python. i am iterating through a file and counting the number of repeated elements. the file has on the order of tens of millions of elements... i create a dictionary that maps elements of the file that i want to count to their number of occurrences. so i iterate through the file and for each line extract the elements (simple text operation) and see if it has an entry in the dict: for line in file: try: elt = MyClass(line) # extract elt from line... my_dict[elt] += 1 except KeyError: my_dict[elt] = 1 i am using try/except since it is supposedly faster (though i am not sure about this? is this really true in Python 2.5?). the only 'twist' is that my elt is an instance of a class (MyClass) with 3 fields, all numeric. the class is hashable, and so my_dict[elt] works well. the __repr__ and __hash__ methods of my class simply return the str() representation of self, while __str__ just joins the numeric fields into a concatenated string: class MyClass: def __str__(self): return "%s-%s-%s" % (self.field1, self.field2, self.field3) def __repr__(self): return str(self) def __hash__(self): return hash(str(self)) is there anything that can be done to speed up this simple code? right now it is taking well over 15 minutes to process, on a 3 Ghz machine with lots of RAM (though this is all taking CPU power, not RAM at this point.) any general advice on how to optimize large dicts would be great too thanks for your help. -- http://mail.python.org/mailman/listinfo/python-list
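One concrete speedup for the loop above: the `__hash__` shown builds a fresh formatted string on every lookup, but a plain tuple of the fields hashes directly in C, and collections.defaultdict removes the try/except entirely. A sketch (the three-field lines are a stand-in for the real file and MyClass):

```python
from collections import defaultdict

counts = defaultdict(int)  # missing keys start at 0, so no KeyError handling

# stand-in for the real file: each line carries three numeric fields
lines = ['1 2 3', '4 5 6', '1 2 3']
for line in lines:
    f1, f2, f3 = line.split()
    key = (f1, f2, f3)   # tuples hash field-by-field in C -- no
                         # "%s-%s-%s" string round-trip per lookup
    counts[key] += 1

print(counts[('1', '2', '3')])  # 2
```

If the class instances are still needed elsewhere, keeping a tuple-valued `key()` method and counting on that, rather than hashing `str(self)`, gives the same win.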
Re: optimizing large dictionaries
thanks to everyone for the excellent suggestions. a few follow up q's: 1] is Try-Except really slower? my dict actually has two layers, so my_dict[aKey][bKeys]. the aKeys are very small (less than 100) where as the bKeys are the ones that are in the millions. so in that case, doing a Try-Except on aKey should be very efficient, since often it will not fail, where as if I do: if aKey in my_dict, that statement will get executed for each aKey. can someone definitely say whether Try-Except is faster or not? My benchmarks aren't conclusive and i hear it both ways from several people (though majority thinks TryExcept is faster). 2] is there an easy way to have nested defaultdicts? ie i want to say that my_dict = defaultdict(defaultdict(int)) -- to reflect the fact that my_dict is a dictionary, whose values are dictionary that map to ints. but that syntax is not valid. 3] more importantly, is there likely to be a big improvement for splitting up one big dictionary into several smaller ones? if so, is there a straight forward elegant way to implement this? the way i am thinking is to just fix a number of dicts and populate them with elements. then during retrieval, try the first dict, if that fails, try the second, if not the third, etc... but i can imagine how that's more likely to lead to bugs / debugging give the way my code is setup so i am wondering whether it is really worth it. if it can lead to a factor of 2 difference, i will definitely implement it -- does anyone have experience with this? On Jan 15, 5:58 pm, Steven D'Aprano st...@remove-this- cybersource.com.au wrote: On Thu, 15 Jan 2009 23:22:48 +0100, Christian Heimes wrote: is there anything that can be done to speed up this simply code? right now it is taking well over 15 minutes to process, on a 3 Ghz machine with lots of RAM (though this is all taking CPU power, not RAM at this point.) 
class MyClass(object): # a new style class with slots saves some memory __slots__ = ("field1", "field2", "field3") I was curious whether using slots would speed up attribute access. >>> class Parrot(object): ... def __init__(self, a, b, c): ... self.a = a ... self.b = b ... self.c = c ... >>> class SlottedParrot(object): ... __slots__ = 'a', 'b', 'c' ... def __init__(self, a, b, c): ... self.a = a ... self.b = b ... self.c = c ... >>> p = Parrot(23, "something", [1, 2, 3]) >>> sp = SlottedParrot(23, "something", [1, 2, 3]) >>> from timeit import Timer >>> setup = "from __main__ import p, sp" >>> t1 = Timer('p.a, p.b, p.c', setup) >>> t2 = Timer('sp.a, sp.b, sp.c', setup) >>> min(t1.repeat()) 0.83308887481689453 >>> min(t2.repeat()) 0.62758088111877441 That's not a bad improvement. I knew that __slots__ was designed to reduce memory consumption, but I didn't realise they were faster as well. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
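Question 2 in the follow-up above (nested defaultdicts) has a one-line answer: defaultdict takes a zero-argument *factory*, so the inner defaultdict must be wrapped in a lambda. `defaultdict(defaultdict(int))` fails because `defaultdict(int)` is an instance, not a callable factory. A sketch using the aKey/bKey naming from the post:

```python
from collections import defaultdict

# outer dict of inner dicts-of-ints: my_dict[aKey][bKey] += 1 just works,
# creating both levels on demand
my_dict = defaultdict(lambda: defaultdict(int))

my_dict['aKey']['bKey'] += 1
my_dict['aKey']['bKey'] += 1
print(my_dict['aKey']['bKey'])  # 2
```

Note that merely *reading* a missing key also creates it (that's how default factories work), which matters if you later iterate over the keys.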
efficient interval containment lookup
hello, suppose I have two lists of intervals, one significantly larger than the other. For example listA = [(10, 30), (5, 25), (100, 200), ...] might contain thousands of elements while listB (of the same form) might contain hundreds of thousands or millions of elements. I want to count how many intervals in listB are contained within every listA. For example, if listA = [(10, 30), (600, 800)] and listB = [(20, 25), (12, 18)] is the input, then the output should be that (10, 30) has 2 intervals from listB contained within it, while (600, 800) has 0. (Elements of listB can be contained within many intervals in listA, not just one.) What is an efficient way to this? One simple way is: for a_range in listA: for b_range in listB: is_within(b_range, a_range): # accumulate a counter here where is_within simply checks if the first argument is within the second. I'm not sure if it's more efficient to have the iteration over listA be on the outside or listB. But perhaps there's a way to index this that makes things more efficient? I.e. a smart way of indexing listA such that I can instantly get all of its elements that are within some element of listB, maybe? Something like a hash, where this look up can be close to constant time rather than an iteration over all lists... if there's any built-in library functions that can help in this it would be great. any suggestions on this would be awesome. thank you. -- http://mail.python.org/mailman/listinfo/python-list
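Since containment only needs both endpoints inside the query interval, a sort plus bisect goes a long way here: sort listB by start once, then for each listA interval jump straight to the first listB candidate and scan only while starts remain inside it. A sketch (function name is mine; worst case is still O(|A|·|B|) when intervals are huge, but typically far fewer candidates are scanned):

```python
from bisect import bisect_left

def containment_counts(listA, listB):
    """For each (a, b) in listA, count (c, d) in listB with a <= c and d <= b."""
    bs = sorted(listB)                  # sort listB by start point
    starts = [s for s, _ in bs]
    counts = []
    for lo, hi in listA:
        i = bisect_left(starts, lo)     # first B interval starting at >= lo
        n = 0
        while i < len(bs) and bs[i][0] <= hi:
            if bs[i][1] <= hi:          # end also inside -> contained
                n += 1
            i += 1
        counts.append(n)
    return counts

listA = [(10, 30), (600, 800)]
listB = [(20, 25), (12, 18)]
print(containment_counts(listA, listB))  # [2, 0]
```

The O(log |B|) bisect replaces the inner scan's startup cost; the scan length then depends on how many B intervals *start* inside each A interval, not on |B|.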
Re: efficient interval containment lookup
thanks for your replies -- a few clarifications and questions. the is_within operation is containment, i.e. (a,b) is within (c,d) iff a >= c and b <= d. Note that I am not looking for intervals that overlap... this is why interval trees seem to me to not be relevant, as the overlapping interval problem is way harder than what I am trying to do. Please correct me if I'm wrong on this... Scott Daniels, I was hoping you could elaborate on your comment about bisect. I am trying to use it as follows: I try to grid my space (since my intervals have an upper and lower bound) into segments (e.g. of 100) and then I take these bins and put them into a bisect list, so that it is sorted. Then when a new interval comes in, I try to place it within one of those bins. But this is getting messy: I don't know if I should place it there by its beginning number or end number. Also, if I have an interval that overlaps my boundaries -- i.e. (900, 1010) when my first interval is (0, 1000), I may miss some items from listB when i make my count. Is there an elegant solution to this? Gridding like you said seemed straight forward but now it seems complicated.. I'd like to add that this is *not* a homework problem, by the way. On Jan 12, 4:05 pm, Robert Kern robert.k...@gmail.com wrote: [Apologies for piggybacking, but I think GMane had a hiccup today and missed the original post] [Somebody wrote]: suppose I have two lists of intervals, one significantly larger than the other. For example listA = [(10, 30), (5, 25), (100, 200), ...] might contain thousands of elements while listB (of the same form) might contain hundreds of thousands or millions of elements. I want to count how many intervals in listB are contained within every listA. For example, if listA = [(10, 30), (600, 800)] and listB = [(20, 25), (12, 18)] is the input, then the output should be that (10, 30) has 2 intervals from listB contained within it, while (600, 800) has 0. 
(Elements of listB can be contained within many intervals in listA, not just one.) Interval trees. http://en.wikipedia.org/wiki/Interval_tree -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
Re: efficient interval containment lookup
On Jan 12, 10:58 pm, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: On Mon, 12 Jan 2009 14:49:43 -0800, Per Freem wrote: thanks for your replies -- a few clarifications and questions. the is_within operation is containment, i.e. (a,b) is within (c,d) iff a >= c and b <= d. Note that I am not looking for intervals that overlap... this is why interval trees seem to me to not be relevant, as the overlapping interval problem is way harder than what I am trying to do. Please correct me if I'm wrong on this... To test for contained intervals: a >= c and b <= d To test for overlapping intervals: not (b < c or a > d) Not exactly what I would call way harder. -- Steven hi Steven, i found an implementation (which is exactly how i'd write it based on the description) here: http://hackmap.blogspot.com/2008/11/python-interval-tree.html when i use this however, it comes out either significantly slower or equal to a naive search. my naive search just iterates through a smallish list of intervals and for each one says whether they overlap with each of a large set of intervals. 
here is the exact code i used to make the comparison, plus the code at the link i have above: class Interval(): def __init__(self, start, stop): self.start = start self.stop = stop import random import time num_ints = 3 init_intervals = [] for n in range(0, num_ints): start = int(round(random.random() *1000)) end = start + int(round(random.random()*500+1)) init_intervals.append(Interval(start, end)) num_ranges = 900 ranges = [] for n in range(0, num_ranges): start = int(round(random.random() *1000)) end = start + int(round(random.random()*500+1)) ranges.append((start, end)) #print init_intervals tree = IntervalTree(init_intervals) t1 = time.time() for r in ranges: tree.find(r[0], r[1]) t2 = time.time() print "interval tree: %.3f" % ((t2-t1)*1000.0) t1 = time.time() for r in ranges: naive_find(init_intervals, r[0], r[1]) t2 = time.time() print "brute force: %.3f" % ((t2-t1)*1000.0) on one run, i get: interval tree: 8584.682 brute force: 8201.644 is there anything wrong with this implementation? it seems very right to me but i am no expert. any help on this would be really helpful. -- http://mail.python.org/mailman/listinfo/python-list
Re: efficient interval containment lookup
i forgot to add, my naive_find is: def naive_find(intervals, start, stop): results = [] for interval in intervals: if interval.start >= start and interval.stop <= stop: results.append(interval) return results On Jan 12, 11:55 pm, Per Freem perfr...@yahoo.com wrote: On Jan 12, 10:58 pm, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: On Mon, 12 Jan 2009 14:49:43 -0800, Per Freem wrote: thanks for your replies -- a few clarifications and questions. the is_within operation is containment, i.e. (a,b) is within (c,d) iff a >= c and b <= d. Note that I am not looking for intervals that overlap... this is why interval trees seem to me to not be relevant, as the overlapping interval problem is way harder than what I am trying to do. Please correct me if I'm wrong on this... To test for contained intervals: a >= c and b <= d To test for overlapping intervals: not (b < c or a > d) Not exactly what I would call way harder. -- Steven hi Steven, i found an implementation (which is exactly how i'd write it based on the description) here:http://hackmap.blogspot.com/2008/11/python-interval-tree.html when i use this however, it comes out either significantly slower or equal to a naive search. my naive search just iterates through a smallish list of intervals and for each one says whether they overlap with each of a large set of intervals. 
here is the exact code i used to make the comparison, plus the code at the link i have above: class Interval(): def __init__(self, start, stop): self.start = start self.stop = stop import random import time num_ints = 3 init_intervals = [] for n in range(0, num_ints): start = int(round(random.random() *1000)) end = start + int(round(random.random()*500+1)) init_intervals.append(Interval(start, end)) num_ranges = 900 ranges = [] for n in range(0, num_ranges): start = int(round(random.random() *1000)) end = start + int(round(random.random()*500+1)) ranges.append((start, end)) #print init_intervals tree = IntervalTree(init_intervals) t1 = time.time() for r in ranges: tree.find(r[0], r[1]) t2 = time.time() print interval tree: %.3f %((t2-t1)*1000.0) t1 = time.time() for r in ranges: naive_find(init_intervals, r[0], r[1]) t2 = time.time() print brute force: %.3f %((t2-t1)*1000.0) on one run, i get: interval tree: 8584.682 brute force: 8201.644 is there anything wrong with this implementation? it seems very right to me but i am no expert. any help on this would be relly helpful. -- http://mail.python.org/mailman/listinfo/python-list
Re: efficient interval containment lookup
hi brent, thanks very much for your informative reply -- didn't realize this about the size of the interval. thanks for the bx-python link. could you (or someone else) explain why the size of the interval makes such a big difference? i don't understand why it affects efficiency so much... thanks. On Jan 13, 12:24 am, brent bpede...@gmail.com wrote: On Jan 12, 8:55 pm, Per Freem perfr...@yahoo.com wrote: On Jan 12, 10:58 pm, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: On Mon, 12 Jan 2009 14:49:43 -0800, Per Freem wrote: thanks for your replies -- a few clarifications and questions. the is_within operation is containment, i.e. (a,b) is within (c,d) iff a = c and b = d. Note that I am not looking for intervals that overlap... this is why interval trees seem to me to not be relevant, as the overlapping interval problem is way harder than what I am trying to do. Please correct me if I'm wrong on this... To test for contained intervals: a = c and b = d To test for overlapping intervals: not (b c or a d) Not exactly what I would call way harder. -- Steven hi Steven, i found an implementation (which is exactly how i'd write it based on the description) here:http://hackmap.blogspot.com/2008/11/python-interval-tree.html when i use this however, it comes out either significantly slower or equal to a naive search. my naive search just iterates through a smallish list of intervals and for each one says whether they overlap with each of a large set of intervals. 
>> here is the exact code i used to make the comparison, plus the code at the link i have above:
>>
>> class Interval():
>>     def __init__(self, start, stop):
>>         self.start = start
>>         self.stop = stop
>>
>> import random
>> import time
>>
>> num_ints = 3
>> init_intervals = []
>> for n in range(0, num_ints):
>>     start = int(round(random.random()*1000))
>>     end = start + int(round(random.random()*500+1))
>>     init_intervals.append(Interval(start, end))
>>
>> num_ranges = 900
>> ranges = []
>> for n in range(0, num_ranges):
>>     start = int(round(random.random()*1000))
>>     end = start + int(round(random.random()*500+1))
>>     ranges.append((start, end))
>>
>> #print init_intervals
>> tree = IntervalTree(init_intervals)
>> t1 = time.time()
>> for r in ranges:
>>     tree.find(r[0], r[1])
>> t2 = time.time()
>> print "interval tree: %.3f" % ((t2-t1)*1000.0)
>>
>> t1 = time.time()
>> for r in ranges:
>>     naive_find(init_intervals, r[0], r[1])
>> t2 = time.time()
>> print "brute force: %.3f" % ((t2-t1)*1000.0)
>>
>> on one run, i get:
>> interval tree: 8584.682
>> brute force: 8201.644
>>
>> is there anything wrong with this implementation? it seems right to me but i am no expert. any help on this would be really helpful.

hi, the tree is inefficient when the interval is large. as the size of the interval shrinks to much less than the expanse of the tree, the tree will be faster. changing 500 to 50 in both cases in your script, i get:

interval tree: 3233.404
brute force: 9807.787

so the tree will work for limited cases. but it's quite simple. check the tree in bx-python: http://bx-python.trac.bx.psu.edu/browser/trunk/lib/bx/intervals/opera... for a more robust implementation.
-brentp
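brent's point about query size can be seen in a minimal sketch of a centered interval tree (the same idea as the hackmap recipe linked above; the class names here are illustrative, not that recipe's exact API). A query that straddles many node centers must descend into both subtrees at most levels, so a wide query degrades toward brute-force cost:

```python
import random

class Interval(object):
    def __init__(self, start, stop):
        self.start, self.stop = start, stop

def naive_find(intervals, start, stop):
    # brute force: scans every interval on every query
    return [iv for iv in intervals if iv.stop >= start and iv.start <= stop]

class IntervalNode(object):
    def __init__(self, intervals):
        # split around the median start point
        self.center = sorted(iv.start for iv in intervals)[len(intervals) // 2]
        here, left, right = [], [], []
        for iv in intervals:
            if iv.stop < self.center:
                left.append(iv)
            elif iv.start > self.center:
                right.append(iv)
            else:
                here.append(iv)  # straddles the center
        self.here = here
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None

    def find(self, start, stop):
        found = [iv for iv in self.here if iv.stop >= start and iv.start <= stop]
        # a narrow query takes only one of these branches at most levels;
        # a query wider than the tree's expanse takes both every time
        if self.left and start < self.center:
            found += self.left.find(start, stop)
        if self.right and stop > self.center:
            found += self.right.find(start, stop)
        return found
```

With 500-wide queries over a roughly 1000-wide domain, as in the script above, nearly every query straddles most centers, which matches the timings reported in the thread.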
RE: listdir reports [Error 1006] The volume for a file has been externally altered so that the opened file is no longer valid
FYI: the '/*.*' is part of the error message returned.

-----Original Message-----
From: ch...@rebertia.com [mailto:ch...@rebertia.com] On Behalf Of Chris Rebert
Sent: Wednesday, January 07, 2009 6:40 PM
To: Per Olav Kroka
Cc: python-list@python.org
Subject: Re: listdir reports [Error 1006] The volume for a file has been externally altered so that the opened file is no longer valid

> PS: Why does the listdir() function add '*.*' to the path?

Don't know what you're talking about. It doesn't do any globbing or add '*.*' to the path. Its exclusive purpose is to list the contents of a directory, so /in a sense/ it does add '*.*', but then not adding '*.*' would make the function completely useless given its purpose.

> PS2: Why does the listdir() function add '/*.*' to the path on windows and not '\\*.*' ?

You can use either directory separator (\ or /) with the Python APIs on Windows. 'c:\WINDOWS\' works just as well as 'c:/WINDOWS/'.

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com
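Chris's point about separators can be checked without a Windows box via ntpath, the Windows flavor of os.path, which is importable on any platform:

```python
import ntpath

# ntpath treats '/' and '\' as equivalent separators, just as the
# Windows file APIs do; normpath canonicalizes both to backslashes
p1 = ntpath.normpath("c:/WINDOWS/system32")
p2 = ntpath.normpath("c:\\WINDOWS\\system32")
print(p1 == p2)  # True
```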
[issue3810] os.chdir() et al: is the path str or bytes?
New submission from Per Cederqvist [EMAIL PROTECTED]:

The documentation at http://docs.python.org/dev/3.0/library/os.html#os.chdir doesn't specify if the path argument to os.chdir() should be a str or a bytes, or if maybe both are acceptable. This is true for most of the file-manipulating functions in the os module. os.listdir() talks about Unicode objects; it should probably talk about bytes and str instead.

assignee: georg.brandl
components: Documentation
messages: 72820
nosy: ceder, georg.brandl
severity: normal
status: open
title: os.chdir() et al: is the path str or bytes?
versions: Python 3.0

Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3810

Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
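For context, Python 3 eventually settled this question: most os functions accept either type, and the return type follows the argument, with os.fsencode()/os.fsdecode() converting between the two. A sketch of the settled behavior:

```python
import os
import tempfile

# os.listdir returns names of the same type as the path you pass in
d = tempfile.mkdtemp()
open(os.path.join(d, "hello.txt"), "w").close()

assert os.listdir(d) == ["hello.txt"]                # str in, str out
assert os.listdir(os.fsencode(d)) == [b"hello.txt"]  # bytes in, bytes out
assert os.fsdecode(os.fsencode(d)) == d
```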
[issue2315] TimedRotatingFileHandler does not account for daylight savings time
New submission from Per Cederqvist [EMAIL PROTECTED]:

If TimedRotatingFileHandler is instructed to roll over the log at midnight or on a certain weekday, it needs to consider when daylight savings time starts and ends. The current code just blindly adds self.interval to self.rolloverAt, totally ignoring that sometimes it should add 23 or 25 hours instead of 24 hours. (I suspect that the implementation would be simpler if you use the datetime module, rather than attempt to patch the existing code.)

components: Library (Lib)
messages: 63622
nosy: ceder
severity: normal
status: open
title: TimedRotatingFileHandler does not account for daylight savings time
type: behavior
versions: Python 2.5

Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2315
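The datetime-based approach the report suggests might look like this hypothetical helper (not the stdlib's actual code): calendar arithmetic lands on the next midnight regardless of whether the intervening day has 23, 24 or 25 hours once the result is converted back through local time:

```python
import datetime

def next_midnight(now):
    # the next local midnight, by calendar arithmetic rather than
    # adding a fixed 24 * 3600 seconds to the previous rollover time
    tomorrow = now.date() + datetime.timedelta(days=1)
    return datetime.datetime.combine(tomorrow, datetime.time.min)
```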
[issue2316] TimedRotatingFileHandler names files incorrectly if nothing is logged during an interval
New submission from Per Cederqvist [EMAIL PROTECTED]:

If nothing is logged during an interval, the TimedRotatingFileHandler will give bad names to future log files. The enclosed example program sets up a logger that rotates the log every second. It then logs a few messages with sleeps of 1, 2, 4, 1 and 1 seconds between the messages. The log files will have names that increase with one second per log file, but the content for the last file will be generated a different second. An example run produced the message '2008-03-17 09:16:06: 1 sec later' in a log file named badlogdir/logfile.2008-03-17_09-16-02.

This problem was likely introduced in revision 42066. The root cause is that self.rolloverAt is increased by self.interval in doRollover - but if nothing was logged for a while, it should be increased by a multiple of self.interval.

messages: 63624
nosy: ceder
severity: normal
status: open
title: TimedRotatingFileHandler names files incorrectly if nothing is logged during an interval

Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2316
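The fix the report implies can be sketched as a hypothetical helper (not the actual patch): advance past every missed interval instead of by exactly one:

```python
def advance_rollover(rollover_at, interval, now):
    # if several intervals passed with nothing logged, skip all of
    # them, so the next file name reflects the current period
    while rollover_at <= now:
        rollover_at += interval
    return rollover_at
```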
[issue2316] TimedRotatingFileHandler names files incorrectly if nothing is logged during an interval
Per Cederqvist [EMAIL PROTECTED] added the comment:

The attached program will generate log messages with a timestamp that are logged into a file with an unexpected extension. To run:

mkdir badlogdir
python badlogger.py

Running the program takes about 9 seconds.

Added file: http://bugs.python.org/file9687/badlogger.py

Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2316
[issue2316] TimedRotatingFileHandler names files incorrectly if nothing is logged during an interval
Changes by Per Cederqvist [EMAIL PROTECTED]:

components: +Library (Lib)
type: -> behavior
versions: +Python 2.5

Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2316
[issue2317] TimedRotatingFileHandler logic for removing files wrong
New submission from Per Cederqvist [EMAIL PROTECTED]:

There are three issues with log file removal in the TimedRotatingFileHandler class:

- Removal will stop working in the year 2100, as the code assumes that timestamps start with ".20".
- If you run an application with backupCount set to a high number, and then restart it with a lower number, the code will still not remove as many log files as you expect. It will never remove more than one file when it rotates the log.
- It assumes that no other files match baseFilename + ".20*", so make sure that you don't log to both "log" and "log.20th.century.fox" in the same directory!

Suggested fix: use os.listdir() instead of glob.glob(), filter all file names using a proper regexp, sort the result, and use a while loop to remove files until the result is small enough. To reduce the risk of accidentally removing an unrelated file, the filter regexp should be based on the logging interval, just as the filename is.

My suggested fix means that old files may not be removed if you change the interval. I think that is an acceptable behavior, but it should probably be documented to avoid future bug reports on this subject. :-)

components: Library (Lib)
messages: 63626
nosy: ceder
severity: normal
status: open
title: TimedRotatingFileHandler logic for removing files wrong
type: behavior
versions: Python 2.5

Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2317
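The suggested fix might be sketched like this (files_to_delete is a hypothetical helper, not the stdlib's code; filenames are passed in as a list so the selection logic is separable from os.listdir):

```python
import re

def files_to_delete(filenames, base, backup_count,
                    stamp=r"\d{4}-\d{2}-\d{2}"):
    # match only base + "." + a timestamp in the rotation's own format,
    # so 'log.20th.century.fox' is never mistaken for a backup and no
    # century prefix like '.20' is hard-coded
    rx = re.compile(re.escape(base) + r"\." + stamp + r"$")
    matching = sorted(n for n in filenames if rx.match(n))
    if backup_count >= len(matching):
        return []
    # remove the oldest files until only backup_count remain
    return matching[:len(matching) - backup_count]
```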
[issue2318] TimedRotatingFileHandler: rotate every month, or every year
New submission from Per Cederqvist [EMAIL PROTECTED]:

In my current project, I would like to rotate log files on the 1st of every month. The TimedRotatingFileHandler class cannot do this, even though it tries to be very generic. I imagine that other projects would like to rotate the log file every year. That can also not be done.

components: Library (Lib)
messages: 63627
nosy: ceder
severity: normal
status: open
title: TimedRotatingFileHandler: rotate every month, or every year
type: feature request

Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2318
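The first-of-next-month rollover the request asks for is simple calendar arithmetic (a hypothetical helper, sketching what such a handler would compute):

```python
import datetime

def next_month_start(now):
    # midnight on the first day of the following month
    if now.month == 12:
        return datetime.datetime(now.year + 1, 1, 1)
    return datetime.datetime(now.year, now.month + 1, 1)
```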
Program eating memory, but only on one machine?
Hi Everybody:

I'm having a difficult time figuring out a memory use problem. I have a python program that makes use of numpy and also calls a small C module I wrote because part of the simulation needed to loop and I got a massive speedup by putting that loop in C. I'm basically manipulating a bunch of matrices, so nothing too fancy.

That aside, when the simulation runs, it typically uses a relatively small amount of memory (about 1.5% of my 4GB of RAM on my linux desktop) and this never increases. It can run for days without increasing beyond this, running many many parameter set iterations. This is what happens both on my Ubuntu Linux machine with the following Python specs:

Python 2.4.4c1 (#2, Oct 11 2006, 20:00:03)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.version.version
'1.0rc1'

and also on my Apple MacBook with the following Python specs:

Python 2.4.3 (#1, Apr 7 2006, 10:54:33)
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.version.version
'1.0.1.dev3435'

Well, that is the case on two of my test machines, but not on the one machine that I really wish would work, my lab's cluster, which would give me a 20-fold increase in the number of processes I could run. On that machine, each process is using 2GB of RAM after about 1 hour (and the cluster MOM eventually kills them). I can watch the process eat RAM at each iteration and never relinquish it. Here's the Python spec of the cluster:

Python 2.4.4 (#1, Jan 21 2007, 12:09:48)
[GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-49)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.version.version
'1.0.1'

It also showed the same issue with the April 2006 2.4.3 release of python. I have tried using the gc module to force garbage collection after each iteration, but no change.
I've done many newsgroup/google searches looking for known issues, but none found. The only major difference I can see is that our cluster is stuck on a really old version of gcc with the RedHat Enterprise that's on there, but I found no suggestions of memory issues online.

So, does anyone have any suggestions for how I can debug this problem? If my program ate up memory on all machines, then I would know where to start and would blame some horrible programming on my end. This just seems like a less straightforward problem.

Thanks for any help,
Per
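One way to narrow down a machine-specific leak like this is to measure per-iteration growth in resident memory. A sketch using the Unix-only resource module (check_leak and its step callback are illustrative names; note that ru_maxrss is in kilobytes on Linux but bytes on macOS):

```python
import gc
import resource

def rss_kb():
    # high-water resident set size of this process
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def check_leak(step, iterations=5):
    # run one simulation iteration repeatedly; steady growth across
    # iterations (even after forced collection) points at a real leak
    # rather than normal allocator behavior
    gc.collect()
    before = rss_kb()
    for _ in range(iterations):
        step()
        gc.collect()
    return rss_kb() - before
```

Running this separately with and without the C-module call is a quick way to tell the C extension apart from the Python/numpy side.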
Re: Program eating memory, but only on one machine?
Wolfgang Draxinger <wdraxinger at darkstargames.de> writes:
> > So, does anyone have any suggestions for how I can debug this problem?
>
> Have a look at the version numbers of the GCC used. Probably something in your C code fails if it interacts with GCC 3.x.x. It's hardly Python eating memory, this is probably your C module. GC won't help here, since then you must add this into your C module.
>
> > If my program ate up memory on all machines, then I would know where to start and would blame some horrible programming on my end. This just seems like a less straightforward problem.
>
> GCC 3.x.x brings other runtime libs than GCC 4.x.x, I would check into that direction.

Thank you for the suggestions. Since my C module is such a small part of the simulations, I can just comment out the call to that module completely (though I am still loading it) and fill in what the results would have been with random values. Sadly, the program still eats up memory on our cluster. Still, it could be something related to compiling Python with the older GCC.

I'll see if I can make a really small example program that eats up memory on our cluster. That way we'll have something easy to work with.

Thanks,
Per
Re: Program eating memory, but only on one machine? (Solved, sort of)
Per B. Sederberg <persed at princeton.edu> writes:
> I'll see if I can make a really small example program that eats up memory on our cluster. That way we'll have something easy to work with.

Now this is weird. I figured out the bug and it turned out that every time you call numpy.setmember1d in the latest stable release of numpy it was using up a ton of memory and never releasing it. I replaced every instance of setmember1d with my own method below and I have zero increase in memory. It's not the most efficient of code, but it gets the job done...

def ismember(a, b):
    ainb = zeros(len(a), dtype=bool)
    for item in b:
        ainb = ainb | (a == item)
    return ainb

I'll now go post this problem on the numpy forums.

Best,
Per
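If numpy is not actually required, a plain-Python replacement runs in roughly linear time by hashing one side (a sketch; unlike the workaround above it returns a list of bools rather than a numpy array):

```python
def ismember_set(a, b):
    # O(len(a) + len(b)) on average, versus the O(len(a) * len(b))
    # elementwise loop in the numpy-based workaround
    bset = set(b)
    return [item in bset for item in a]
```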
(question) How to use python get access to google search without query quota limit
I am doing a Natural Language Processing project for academic use. I think google's rich retrieval information and query-segment might be of help, so I downloaded the google api, but there is a query limit (1000/day). How can I write python code to simulate browser-like activity to submit more than 10k queries in one day? Applying for more than 10 licence keys and changing them whenever a query-quota exception is raised is not a neat idea...
Re: (question) How to use python get access to google search without query quota limit
Yeah, thanks Am. I can be considered an advanced google user, presumably... but I am not an advanced programmer yet. If everyone could generate an unlimited number of queries, soon the user-query data, which I believe is google's biggest advantage, would be in chaos. Can they simply ignore some queries from a certain licence key, or so on, so that they can keep their user-query statistics normal and yet give cranky queriers a reasonable response?
Is there such an idiom?
http://jaynes.colorado.edu/PythonIdioms.html

"Use dictionaries for searching, not lists. To find items in common between two lists, make the first into a dictionary and then look for items in the second in it. Searching a list for an item is linear-time, while searching a dict for an item is constant time. This can often let you reduce search time from quadratic to linear."

Is this correct?

s = [1, 2, 3, 4, 5, ...]
t = [4, 5, 6, 8, ...]

How do I find whether there is/are common item(s) between the two lists in linear time? And how do I find the number of common items between the two lists in linear time?
Re: Is there such an idiom?
Thanks Ron, surely set is the simplest way to understand the question, to see whether there is a non-empty intersection. But I did the following thing in a silly way, still not sure whether it is going to be linear time.

def foo():
    l = [...]
    s = [...]
    dic = {}
    for i in l:
        dic[i] = 0
    k = 0
    while k < len(s):
        if s[k] in dic:
            return True
        else:
            pass
        k += 1
    if k == len(s):
        return False

I am still a rookie, and partly just migrated from Haskell... I am not clear about how making one of the lists a dictionary is helpful.
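The dict-based version above works, but sets make the linear-time intent explicit. A sketch of the idiom the thread is circling, in modern syntax:

```python
def has_common(s, t):
    # building the set is O(len(s)); each membership probe during
    # isdisjoint is O(1) on average, so the whole test is linear
    return not set(s).isdisjoint(t)

def num_common(s, t):
    # number of distinct common items, also linear on average
    return len(set(s) & set(t))
```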
serial port server cnhd38
To whom it may concern,

The serial port server 'cnhd38' has been terminated (on whose initiative, I don't know). It affects the users of (at least) the following nodes: cnhd36, cnhd44, cnhd45, cnhd46, cnhd47. The new terminal server to use is called 'msp-t01'. The port numbers that are of interest for the nodes mentioned above are as follows:

port 17: this port is shared between:
  cnhd44/etm4 serial port (via riscwatch), currently connected here.
  cnhd36/console port
port 18: this port goes to cnhd44/console port
port 19: this port goes to cnhd45/console port
port 20: this port goes to cnhd47/console port
port 21: this port goes to cnhd46/console port

To connect to a port, just enter the following command:

telnet msp-t01 <prefix><portnumber>

... an extra enter should give you the prompt. <prefix> is always 20, <portnumber> is the port number. Example, connect to cnhd47/console port:

telnet msp-t01 2020

br
/Per
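The port-naming convention above (the fixed prefix "20" followed by the two-digit port number) can be captured in a tiny hypothetical helper:

```python
def console_port(port_number, prefix="20"):
    # e.g. port 20 on cnhd47's console -> telnet msp-t01 2020
    return int("%s%02d" % (prefix, port_number))
```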
test
sdfdsafasd