Re: Trying to parse a HUGE(1gb) xml file
Adam Tauno Williams, 20.12.2010 20:49:
> On Mon, 2010-12-20 at 11:34 -0800, spaceman-spiff wrote:
>> This is a rather long post, but I wanted to include all the details &
>> everything I have tried so far myself, so please bear with me & read
>> the entire boringly long post. I am trying to parse a ginormous
>> (~1GB) XML file.
>
> Do that hundreds of times a day.
>
>> 0. I am a Python & XML n00b, & have been relying on the excellent
>> beginner book DIP (Dive Into Python 3) by MP (Mark Pilgrim). (Mark, if
>> you are reading this, you are AWESOME & so is your witty & humorous
>> writing style.)
>> 1. Almost all examples of parsing XML in Python I have seen start off
>> with these 4 lines of code:
>>
>>     import xml.etree.ElementTree as etree

Try

    import xml.etree.cElementTree as etree

instead. Note the leading "c", which hints at the C implementation of
ElementTree. It's much faster and much more memory friendly than the
Python implementation.

>>     tree = etree.parse('*path_to_ginormous_xml*')
>>     root = tree.getroot() #my huge xml has 1 root at the top level
>>     print root
>
> Yes, this is a terrible technique; most examples are crap.

>> 2. In the 2nd line of code above, as Mark explains in DIP, the parse
>> function builds & returns a tree object, in memory (RAM), which
>> represents the entire document. I tried this code, which works fine
>> for a small file (~1MB), but when I run this simple 4-line .py code in
>> a terminal for my HUGE target file (1GB), nothing happens. In a
>> separate terminal, I run the top command, & I can see a python
>> process, with memory (the VIRT column) increasing from 100MB all the
>> way up to 2100MB.
>
> Yes, this is using DOM. DOM is evil and the enemy, full-stop.

Actually, ElementTree is not "DOM", it's modelled after the XML Infoset.
While I agree that DOM is, well, maybe not "the enemy", but not exactly
beautiful either, ElementTree is really a good thing, likely also in
this case.

>> I am guessing, as this happens (over the course of 20-30 mins), the
>> tree representation is slowly being built in memory, but even after
>> 30-40 mins, nothing happens. I don't get an error, seg fault or
>> out_of_memory exception.
>
> You need to process the document as a stream of elements; aka SAX.

IMHO, this is the worst advice you can give.

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
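The cElementTree swap suggested above is usually written with a guarded import, so the same script still runs where the C accelerator module is unavailable (a common idiom of the period). The tiny document here is a stand-in for illustration, not the 1GB file:

```python
try:
    import xml.etree.cElementTree as etree  # C implementation: faster, leaner
except ImportError:
    import xml.etree.ElementTree as etree   # pure-Python fallback (Python 3
                                            # uses a C accelerator automatically)

# parse a toy document from a string instead of a file path
root = etree.fromstring('<switch><port id="1"/></switch>')
```

Either import gives the same API, so the rest of the code does not change.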
Re: Redundant importing of modules
On 21 Dec, 03:03, Steve Holden wrote:
> On 12/20/2010 8:36 PM, Jshgwave wrote:
>> When writing a function that uses a module such as NumPy, it is
>> tempting to include the statement "import numpy" or "import numpy as
>> np" in the definition of the function, in case the function is used
>> in a script that hasn't already imported NumPy.

(answering the OP - the post didn't show up here on c.l.py):

This is actually totally useless. The global namespace of a function is
the namespace of the module in which it has been defined, not the
namespace of the module where the function is called.
--
http://mail.python.org/mailman/listinfo/python-list
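A quick sketch of the point, using the stdlib math module in place of NumPy so it runs anywhere (the module name helpers.py is hypothetical):

```python
# helpers.py (hypothetical module): the import lives in the module
# where the function is *defined*.
import math

def circle_area(r):
    # "math" is resolved in this module's global namespace, not in the
    # namespace of whatever module calls circle_area().
    return math.pi * r * r
```

A script that does `from helpers import circle_area` can call the function without ever importing math itself, which is why repeating the import inside the function body buys nothing.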
Re: Trying to parse a HUGE(1gb) xml file
spaceman-spiff, 20.12.2010 21:29:
> I am sorry I left out what exactly I am trying to do.
> 0. Goal: I am looking for a specific element... there are several
> 10s/100s of occurrences of that element in the 1GB XML file. The
> contents of the XML are just a dump of config parameters from a packet
> switch (although IMHO the contents of the XML don't matter). I need to
> detect them & then, for each one, I need to copy all the content
> between the element's start & end tags & create a smaller XML file.

Then cElementTree's iterparse() is your friend. It allows you to
basically iterate over the XML tags while it's building an in-memory
tree from them. That way, you can either remove subtrees from the tree
if you don't need them (to save memory) or otherwise handle them in any
way you like, such as serialising them into a new file (and then
deleting them).

Also note that the iterparse implementation in lxml.etree allows you to
specify a tag name to restrict the iterator to these tags. That's
usually a lot faster, but it also means that you need to take more care
to clean up the parts of the tree that the iterator stepped over.

Depending on your requirements and the amount of manual code
optimisation that you want to invest, either cElementTree or lxml.etree
may perform better for you.

It seems that you already found the article by Liza Daly about high
performance XML processing with Python. Give it another read, it has a
couple of good hints and examples that will help you here.

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
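A minimal sketch of the iterparse() approach described above; the tag name 'port' and the tiny in-memory document are hypothetical stand-ins for the real switch-config elements:

```python
import io

try:
    import xml.etree.cElementTree as etree   # Python 2: fast C implementation
except ImportError:
    import xml.etree.ElementTree as etree    # Python 3 fallback

def split_out(source, tag):
    """Collect a serialised copy of every <tag> subtree, streaming."""
    pieces = []
    for event, elem in etree.iterparse(source, events=('end',)):
        if elem.tag == tag:
            pieces.append(etree.tostring(elem))
            elem.clear()  # drop the subtree so memory use stays flat
    return pieces

# toy stand-in for the 1GB switch dump
doc = io.BytesIO(b'<switch><port id="1"/><port id="2"/></switch>')
parts = split_out(doc, 'port')
```

In the real script each collected piece would be written out as its own smaller XML file instead of being accumulated in a list.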
how to inter-working on process in window
Hi all, how do I send an ESC key to a process on Windows? I already have
the PID of the process, but I don't know how to send the ESC key to it.
Please help.

Ha
--
http://mail.python.org/mailman/listinfo/python-list
Re: On 07/13/2010 02:18 PM, Adam Mercer wrote:That version of M2Crypto does not
I was getting the same error trying to build M2Crypto 0.20.2 for Python 2.5 on a Win 7 laptop, so I pulled down the trunk, and it did build properly using minGW and Swig. However, when I try to "python setup.py install", python simply gives the same complaint that python was built in visual studio 2003, and will not install M2Crypto. Any help would be greatly appreciated, as I have been trying to get this to work for days. Does anyone have a build for 0.20.2 that works with python 2.5? I found so many other builds, but not that one. Thanks, Bob S. > On Tuesday, July 13, 2010 5:18 PM Adam Mercer wrote: > Hi > > I am trying to build M2Crypto on Mac OS X 10.6.4 against python2.5 > (python2.6 fails in the same way), with SWIG 2.0.0 and OpenSSL 1.0.0a > and it is failing with the following: > > 105 :info:build swigging SWIG/_m2crypto.i to SWIG/_m2crypto_wrap.c > 106 :info:build swig -python > -I/opt/local/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 > -I/opt/local/include -includeall -o SWIG/_m2crypto_wrap.c > SWIG/_m2crypto.i > 107 :info:build SWIG/_bio.i:64: Warning 454: Setting a > pointer/reference variable may leak memory. > 108 :info:build SWIG/_rand.i:19: Warning 454: Setting a > pointer/reference variable may leak memory. > 109 :info:build SWIG/_evp.i:156: Warning 454: Setting a > pointer/reference variable may leak memory. > 110 :info:build SWIG/_dh.i:36: Warning 454: Setting a > pointer/reference variable may leak memory. > 111 :info:build SWIG/_rsa.i:43: Warning 454: Setting a > pointer/reference variable may leak memory. > 112 :info:build SWIG/_dsa.i:31: Warning 454: Setting a > pointer/reference variable may leak memory. > 113 :info:build SWIG/_ssl.i:207: Warning 454: Setting a > pointer/reference variable may leak memory. > 114 :info:build SWIG/_x509.i:313: Warning 454: Setting a > pointer/reference variable may leak memory. > 115 :info:build SWIG/_pkcs7.i:42: Warning 454: Setting a > pointer/reference variable may leak memory. 
> 116 :info:build SWIG/_pkcs7.i:42: Warning 454: Setting a > pointer/reference variable may leak memory. > 117 :info:build SWIG/_util.i:9: Warning 454: Setting a > pointer/reference variable may leak memory. > 118 :info:build SWIG/_ec.i:111: Warning 454: Setting a > pointer/reference variable may leak memory. > 119 :info:build SWIG/_engine.i:162: Warning 454: Setting a > pointer/reference variable may leak memory. > 120 :info:build creating build/temp.macosx-10.6-x86_64-2.5 > 121 :info:build creating build/temp.macosx-10.6-x86_64-2.5/SWIG > 122 :info:build /usr/bin/gcc-4.2 -fno-strict-aliasing -mno-fused-madd > -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes > -I/opt/local/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 > -I/opt/local/include > -I/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_python_py25-m2crypto/work/M2Crypto-0.20.2/SWIG > -c SWIG/_m2crypto_wrap.c -o > build/temp.macosx-10.6-x86_64-2.5/SWIG/_m2crypto_wrap.o -DTHREADING > 123 :info:build SWIG/_m2crypto_wrap.c: In function 'rand_pseudo_bytes': > 124 :info:build SWIG/_m2crypto_wrap.c:3899: warning: pointer targets > in passing argument 1 of 'PyString_FromStringAndSize' differ in > signedness > 125 :info:build SWIG/_m2crypto_wrap.c: In function 'pkcs5_pbkdf2_hmac_sha1': > 126 :info:build SWIG/_m2crypto_wrap.c:3973: warning: pointer targets > in passing argument 1 of 'PyString_FromStringAndSize' differ in > signedness > 127 :info:build SWIG/_m2crypto_wrap.c: In function 'bytes_to_key': > 128 :info:build SWIG/_m2crypto_wrap.c:4132: warning: pointer targets > in passing argument 1 of 'PyString_FromStringAndSize' differ in > signedness > 129 :info:build SWIG/_m2crypto_wrap.c: In function 'sign_final': > 130 :info:build SWIG/_m2crypto_wrap.c:4228: warning: pointer targets > in passing argument 1 of 'PyString_FromStringAndSize' differ in > signedness > 131 :info:build SWIG/_m2crypto_wrap.c: In function 'pkey_as_der': > 
132 :info:build SWIG/_m2crypto_wrap.c:4300: warning: pointer targets > in passing argument 1 of 'PyString_FromStringAndSize' differ in > signedness > 133 :info:build SWIG/_m2crypto_wrap.c: In function 'pkey_get_modulus': > 134 :info:build SWIG/_m2crypto_wrap.c:4333: warning: value computed is not > used > 135 :info:build SWIG/_m2crypto_wrap.c:4358: warning: value computed is not > used > 136 :info:build SWIG/_m2crypto_wrap.c: In function 'AES_crypt': > 137 :info:build SWIG/_m2crypto_wrap.c:: warning: pointer targets > in passing argument 1 of 'PyString_FromStringAndSize' differ in > signedness > 138 :info:build SWIG/_m2crypto_wrap.c: At top level: > 139 :info:build SWIG/_m2crypto_wrap.c:5846: error: expected '=', ',', > ';', 'asm' or '__attribute__' before '*' token > 140 :info:build SWIG/_m2crypto_wrap.c:5850: error: expected ')' before '*' > token > 141 :info:build SWIG/_m2crypto_wrap.c:5854:
Re: Modifying an existing excel spreadsheet
On Dec 20, 9:56 pm, Ed Keith wrote:
> I have a user supplied 'template' Excel spreadsheet. I need to create
> a new excel spreadsheet based on the supplied template, with data
> filled in.
>
> I found the tools here http://www.python-excel.org/, and
> http://sourceforge.net/projects/pyexcelerator/. I have been trying to
> use the former, since the latter seems to be devoid of documentation
> (not even any docstrings).
>
> My first thought was to copy the template, open the copy, modify it
> and save the modifications. But it looks like if I open an existing
> spreadsheet it must be read only. So I tried to open the template,
> copy it to a new spreadsheet and write the new spreadsheet, but I
> can't seem to copy the images, and it looks like copying the
> formatting is going to be difficult.
>
> Can anyone give me any tips or advice?
>
> Thanks in advance,
>
> -EdK
>
> Ed Keith
> e_...@yahoo.com
> Blog: edkeith.blogspot.com

Have you tried: http://groups.google.com/group/python-excel and
searching the archives for "template"? Similar questions have come up
before there.

hth
Jon
--
http://mail.python.org/mailman/listinfo/python-list
Re: Bug in fixed_point?!
On 12/20/10 10:03 PM, C Barrington-Leigh wrote:
> I cannot figure out what I'm doing wrong. The following does not
> return a fixed point:
>
> from numpy import exp
> from scipy import optimize
> xxroot = optimize.fixed_point(lambda xx: exp(-2.0*xx)/2.0, 1.0,
>                               args=(), xtol=1e-12, maxiter=500)
> print ' %f solves fixed point, ie f(%f)=%f ?' % (xxroot, xxroot,
>                                                  exp(-2.0*xxroot)/2.0)

You will want to ask scipy questions on the scipy-user mailing list:

http://www.scipy.org/Mailing_Lists

When you do, please provide the information that Terry Reedy asked for.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco
--
http://mail.python.org/mailman/listinfo/python-list
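Incidentally, the map in question is a contraction (|f'(x)| = e^(-2x) < 1 for x > 0), so even naive successive substitution finds its fixed point. A minimal sketch of that, independent of scipy's Steffensen-accelerated implementation:

```python
import math

def fixed_point(f, x0, xtol=1e-12, maxiter=500):
    # plain successive substitution: x_{n+1} = f(x_n)
    x = x0
    for _ in range(maxiter):
        nxt = f(x)
        if abs(nxt - x) < xtol:
            return nxt
        x = nxt
    raise RuntimeError('no convergence after %d iterations' % maxiter)

root = fixed_point(lambda x: math.exp(-2.0 * x) / 2.0, 1.0)
# root satisfies root == exp(-2*root)/2 to roughly xtol
```

Comparing this result against what scipy returns is a quick way to see whether the library call or the problem setup is at fault.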
Re: class inheritance
JLundell wrote:
> On Saturday, March 13, 2010 9:03:36 AM UTC-8, Jonathan Lundell wrote:
>> I've got a subclass of fractions.Fraction called Value; it's a mostly
>> trivial class, except that it overrides __eq__ to mean 'nearly
>> equal'. However, since Fraction's operations result in a Fraction,
>> not a Value, I end up with stuff like this:
>>
>>     x = Value(1) + Value(2)
>>
>> where x is now a Fraction, not a Value, and x == y uses
>> Fraction.__eq__ rather than Value.__eq__. This appears to be standard
>> Python behavior (int does the same thing).
>>
>> I've worked around it by overriding __add__, etc, with functions that
>> invoke Fraction but coerce the result. But that's tedious; there are
>> a lot of methods to override. So I'm wondering: is there a more
>> efficient way to accomplish what I'm after?
>
> I recently implemented a different approach to this. I've got:
>
>     class Rational(fractions.Fraction):
>
> ... and some methods of my own, including my own __new__ and __str__
> (which is one of the reasons I need the class). Then, after (outside)
> the class definition, comes some code inspired by something similar I
> noticed in the Python Cookbook. There are two things going on here.
> One is, of course, the automation at import time. The other is that
> the wrapper gets a Fraction instance and simply overrides __class__,
> rather than creating yet another Rational and unbinding the interim
> Fraction. Seems to work quite well.
[snip]

Another option is to use a metaclass:

    class Perpetuate(ABCMeta):
        def __new__(metacls, cls_name, cls_bases, cls_dict):
            if len(cls_bases) > 1:
                raise TypeError("multiple bases not allowed")
            result_class = type.__new__(metacls, cls_name, cls_bases,
                                        cls_dict)
            base_class = cls_bases[0]
            known_attr = set()
            for attr in cls_dict.keys():
                known_attr.add(attr)
            for attr in base_class.__dict__.keys():
                if attr in ('__new__',):
                    continue
                code = getattr(base_class, attr)
                if callable(code) and attr not in known_attr:
                    setattr(result_class, attr,
                            metacls._wrap(base_class, code))
                elif attr not in known_attr:
                    setattr(result_class, attr, code)
            return result_class

        @staticmethod
        def _wrap(base, code):
            def wrapper(*args, **kwargs):
                if args:
                    cls = args[0]
                result = code(*args, **kwargs)
                if type(result) == base:
                    return cls.__class__(result)
                elif isinstance(result, (tuple, list, set)):
                    new_result = []
                    for partial in result:
                        if type(partial) == base:
                            new_result.append(cls.__class__(partial))
                        else:
                            new_result.append(partial)
                    result = result.__class__(new_result)
                elif isinstance(result, dict):
                    for key in result:
                        value = result[key]
                        if type(value) == base:
                            result[key] = cls.__class__(value)
                return result
            wrapper.__name__ = code.__name__
            wrapper.__doc__ = code.__doc__
            return wrapper

then the actual class becomes:

    class CloseFraction(Fraction):
        __metaclass__ = Perpetuate
        def __eq__(x, y):
            return abs(x - y) < 1  # season to taste
        def __repr__(x):
            return "CloseFraction(%d, %d)" % (x.numerator, x.denominator)

Perpetuate needs to handle multiple inheritance better, but it might
meet your needs at this point. Sample run:

    --> n = CloseFraction(3, 2)
    --> n
    CloseFraction(3, 2)
    --> print n
    3/2
    --> m = CloseFraction(9, 4)
    --> m
    CloseFraction(9, 4)
    --> n == m
    True
    --> n - m
    CloseFraction(-3, 4)
    --> n + m
    CloseFraction(15, 4)
    --> n.real
    CloseFraction(3, 2)
    --> n.imag
    0  # this is an int

Hope this helps!

~Ethan~
--
http://mail.python.org/mailman/listinfo/python-list
Re: help with link parsing?
On Dec 20, 7:14 pm, "Littlefield, Tyler" wrote:
> Hello all,
> I have a question. I guess this worked pre 2.6; I don't remember the
> last time I used it, but it was a while ago, and now it's failing.
> Anyone mind looking at it and telling me what's going wrong? Also, is
> there a quick way to match on a certain site? like links from
> google.com and only output those?
>
> #!/usr/bin/env python
>
> #This program is free software: you can redistribute it and/or modify
> #it under the terms of the GNU General Public License as published
> #by the Free Software Foundation, either version 3 of the License, or
> #(at your option) any later version.
>
> #This program is distributed in the hope that it will be useful, but
> #WITHOUT ANY WARRANTY; without even the implied warranty of
> #MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> #General Public License for more details.
> #
> #You should have received a copy of the GNU General Public License
> #along with this program. If not, see http://www.gnu.org/licenses/.
>
> """
> This script will parse out all the links in an html document and write
> them to a textfile.
> """
> import sys, optparse
> import htmllib, formatter
>
> #program class declarations:
> class Links(htmllib.HTMLParser):
>     def __init__(self, formatter):
>         htmllib.HTMLParser.__init__(self, formatter)
>         self.links = []
>     def start_a(self, attrs):
>         if (len(attrs) > 0):
>             for a in attrs:
>                 if a[0] == "href":
>                     self.links.append(a[1])
>                     print a[1]
>                     break
>
> def main(argv):
>     if (len(argv) != 3):
>         print("Error:\n" + argv[0] + " .\nParses for all links and
>               saves them to .")
>         return 1
>     lcount = 0
>     format = formatter.NullFormatter()
>     html = Links(format)
>     print "Retrieving data:"
>     page = open(argv[1], "r")
>     print "Feeding data to parser:"
>     html.feed(page.read())
>     page.close()
>     print "Writing links:"
>     output = open(argv[2], "w")
>     for i in (html.links):
>         output.write(i + "\n")
>         lcount += 1
>     output.close()
>     print("Wrote " + str(lcount) + " links to " + argv[2] + ".");
>     print("done.")
>
> if (__name__ == "__main__"):
>     #we call the main function passing a list of args, and exit with
>     #the return code passed back.
>     sys.exit(main(sys.argv))
>
> --
> Thanks,
> Ty

This doesn't answer your original question, but excluding the command
line handling, how's this for you?:

    import lxml.html
    from urlparse import urlsplit

    doc = lxml.html.parse('http://www.google.com')
    print map(urlsplit, doc.xpath('//a/@href'))

[SplitResult(scheme='http', netloc='www.google.co.uk', path='/imghp',
query='hl=en&tab=wi', fragment=''), SplitResult(scheme='http',
netloc='video.google.co.uk', path='/', query='hl=en&tab=wv',
fragment=''), SplitResult(scheme='http', netloc='maps.google.co.uk',
path='/maps', query='hl=en&tab=wl', fragment=''),
SplitResult(scheme='http', netloc='news.google.co.uk', path='/nwshp',
query='hl=en&tab=wn', fragment=''), ...]

Much nicer IMHO, plus lxml.html has iterlinks() and other convenience
functions for handling HTML.

hth
Jon.
--
http://mail.python.org/mailman/listinfo/python-list
[RELEASED] Python 3.2 beta 2
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On behalf of the Python development team, I'm happy to announce the second beta preview release of Python 3.2. Python 3.2 is a continuation of the efforts to improve and stabilize the Python 3.x line. Since the final release of Python 2.7, the 2.x line will only receive bugfixes, and new features are developed for 3.x only. Since PEP 3003, the Moratorium on Language Changes, is in effect, there are no changes in Python's syntax and built-in types in Python 3.2. Development efforts concentrated on the standard library and support for porting code to Python 3. Highlights are: * numerous improvements to the unittest module * PEP 3147, support for .pyc repository directories * PEP 3149, support for version tagged dynamic libraries * PEP 3148, a new futures library for concurrent programming * PEP 384, a stable ABI for extension modules * PEP 391, dictionary-based logging configuration * an overhauled GIL implementation that reduces contention * an extended email package that handles bytes messages * countless fixes regarding bytes/string issues; among them full support for a bytes environment (filenames, environment variables) * many consistency and behavior fixes for numeric operations * a sysconfig module to access configuration information * a pure-Python implementation of the datetime module * additions to the shutil module, among them archive file support * improvements to pdb, the Python debugger For a more extensive list of changes in 3.2, see http://docs.python.org/3.2/whatsnew/3.2.html To download Python 3.2 visit: http://www.python.org/download/releases/3.2/ Please consider trying Python 3.2 with your code and reporting any bugs you may notice to: http://bugs.python.org/ Enjoy! 
- -- Georg Brandl, Release Manager georg at python.org (on behalf of the entire python-dev team and 3.2's contributors) -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) iEYEARECAAYFAk0Q/aAACgkQN9GcIYhpnLDf8gCgkLGAsE+T3R505jZc1RxXDYsa NSsAnRGaFjeTm9o2Z5O8FuIzTUG8t1PT =hHzz -END PGP SIGNATURE- -- http://mail.python.org/mailman/listinfo/python-list
seeking pygtk bindings for gtkdatabox
Hello, a search for the Python bindings for gtkdatabox led nowhere. Does
anyone know who is maintaining/working on/hosting such a package?
Thanks in advance.

Steven
--
http://mail.python.org/mailman/listinfo/python-list
Re: Sending XML to a WEB Service and Getting Response Back
On 12/20/2010 11:45 PM, Ian Kelly wrote:
> On 12/20/2010 11:34 PM, John Nagle wrote:
>> SOAPpy is way out of date. The last update on SourceForge was in
>> 2001.
>
> 2007, actually: http://sourceforge.net/projects/pywebsvcs/files/
>
> And there is repository activity within the past 9 months. Still,
> point taken.

The original SOAPpy was at

http://sourceforge.net/projects/soapy/files/

but was apparently abandoned in 2001. Someone else picked it up and
moved it to

http://sourceforge.net/projects/pywebsvcs/files/SOAP.py/

where it was last updated in 2005. ZSI was last updated in 2007. Users
are still submitting bug reports, but nobody is answering. Somebody
posted "Who maintains the pywebsvcs webpage?" in February 2009, but no
one answered them.

There's also "Python SOAP"

http://sourceforge.net/projects/pythonsoap/

abandoned in 2005.

The "suds" module

http://sourceforge.net/projects/python-suds/

was last updated in March, 2010. That version will work with Python
2.6, and probably 2.7. There's very little project activity, but at
least it's reasonably current.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list
Scanning directories for new files?
Hey everyone. I'm in the midst of writing a parser to clean up incoming files, remove extra data that isn't needed, normalize some values, etc. The base files will be uploaded via FTP. How does one go about scanning a directory for new files? For now we're looking to run it as a cron job but eventually would like to move away from that into making it a service running in the background. -- http://mail.python.org/mailman/listinfo/python-list
Re: Sending XML to a WEB Service and Getting Response Back
Thanks for the response all. I tried exploring suds (which seems to be
the current one) and I hit problems right away. I will now try urllib
or httplib. I have asked for help in the suds forum. Hope somebody
replies.

When I try to create a client, the error is as follows.

>>> from suds.client import Client
>>> url = 'http://10.251.4.33:8041/DteEnLinea/ws/EnvioGuia.jws'
>>> client = Client(url)
Traceback (most recent call last):
  File "", line 1, in
  File "suds/client.py", line 112, in __init__
    self.wsdl = reader.open(url)
  File "suds/reader.py", line 152, in open
    d = self.fn(url, self.options)
  File "suds/wsdl.py", line 136, in __init__
    d = reader.open(url)
  File "suds/reader.py", line 79, in open
    d = self.download(url)
  File "suds/reader.py", line 101, in download
    return sax.parse(string=content)
  File "suds/sax/parser.py", line 136, in parse
    sax.parse(source)
  File "/usr/local/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/local/lib/python2.7/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/local/lib/python2.7/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: :1:62: syntax error
>>>
[3] + Stopped (SIGTSTP) python

This seems to be an old problem that persists across versions.

Regards,
Anurag

On Wed, Dec 22, 2010 at 12:40 AM, John Nagle wrote:
> On 12/20/2010 11:45 PM, Ian Kelly wrote:
>> On 12/20/2010 11:34 PM, John Nagle wrote:
>>> SOAPpy is way out of date. The last update on SourceForge was in
>>> 2001.
>>
>> 2007, actually: http://sourceforge.net/projects/pywebsvcs/files/
>>
>> And there is repository activity within the past 9 months. Still,
>> point taken.
>
> The original SOAPpy was at
>
> http://sourceforge.net/projects/soapy/files/
>
> but was apparently abandoned in 2001. Someone else picked it up and
> moved it to
>
> http://sourceforge.net/projects/pywebsvcs/files/SOAP.py/
>
> where it was last updated in 2005. ZSI was last updated in 2007.
> Users are still submitting bug reports, but nobody is answering.
> Somebody posted "Who maintains the pywebsvcs webpage?" in February
> 2009, but no one answered them.
>
> There's also "Python SOAP"
>
> http://sourceforge.net/projects/pythonsoap/
>
> abandoned in 2005.
>
> The "suds" module
>
> http://sourceforge.net/projects/python-suds/
>
> was last updated in March, 2010. That version will work with Python
> 2.6, and probably 2.7. There's very little project activity, but at
> least it's reasonably current.
>
> John Nagle
> --
> http://mail.python.org/mailman/listinfo/python-list
--
http://mail.python.org/mailman/listinfo/python-list
Re: Scanning directories for new files?
On Dec 21, 7:17 pm, Matty Sarro wrote:
> Hey everyone.
> I'm in the midst of writing a parser to clean up incoming files,
> remove extra data that isn't needed, normalize some values, etc. The
> base files will be uploaded via FTP.
> How does one go about scanning a directory for new files? For now
> we're looking to run it as a cron job but eventually would like to
> move away from that into making it a service running in the
> background.

Not a direct answer, but I would choose the approach of letting the FTP
server know when a new file has been added. For instance:

http://www.pureftpd.org/project/pure-ftpd - "Any external shell script
can be called after a successful upload. Virus scanners and database
archival can easily be set up."

Of course, there are loads more servers that I'm sure will have
callback events or similar. Although, yes, monitoring the file system
is completely possible.

hth
Jon.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Scanning directories for new files?
On 21.12.2010 20:17, Matty Sarro wrote:
> Hey everyone.
> I'm in the midst of writing a parser to clean up incoming files,
> remove extra data that isn't needed, normalize some values, etc. The
> base files will be uploaded via FTP.
> How does one go about scanning a directory for new files? For now
> we're looking to run it as a cron job but eventually would like to
> move away from that into making it a service running in the
> background.

When you say cron, I assume you're running Linux.

One approach would be to os.walk() the directory in question, filling a
dict with the absolute name of the file as key and the output from
stat() as content. Then re-scan regularly and check for changes in
mtime, ctime etc.

A less resource-consuming approach would be to use Linux' inotify
infrastructure, which can be used from Python:

https://github.com/seb-m/pyinotify

And, your service is only an import away :-)

https://github.com/seb-m/pyinotify/blob/master/python2/examples/daemon.py
--
http://mail.python.org/mailman/listinfo/python-list
Re: Scanning directories for new files?
On Tue, 21 Dec 2010 14:17:40 -0500, Matty Sarro wrote:

> Hey everyone.
> I'm in the midst of writing a parser to clean up incoming files,
> remove extra data that isn't needed, normalize some values, etc. The
> base files will be uploaded via FTP.
> How does one go about scanning a directory for new files? For now
> we're looking to run it as a cron job but eventually would like to
> move away from that into making it a service running in the
> background.

Make sure the files are initially uploaded using a name that the parser
isn't looking for and rename them when the upload is finished. This way
the parser won't try to process a partially loaded file.

If you are uploading to a *nix machine, the rename can move the file
between directories provided both directories are in the same filing
system. Under those conditions rename is always an atomic operation
with no copying involved. This would allow you to, say, upload the file
to "temp/myfile" and rename it to "uploaded/myfile", with your parser
only scanning the uploaded directory and, presumably, renaming
processed files to move them to a third directory ready for further
processing.

I've used this technique reliably with files arriving via FTP at quite
high rates.

--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
--
http://mail.python.org/mailman/listinfo/python-list
Re: If/then style question
I'd bet you would stress your point Steven! But you don't need to persuade me, I do already agree. I just meant to say that, when the advantage is little, there's no need to rewrite a working function. And that with modern CPUs, if tests take so little time, that even some redundant one is not so much of a nuisance. in your working example, the "payload" is just a couple of integer calculations, that take very little time too. So the overhead due to redundant if tests does show clearly. And also in that not-really-real situation, 60% overhead just meant less than 3 seconds. Just for the sake of discussion, I tried to give both functions some plough to pull, and a worst-case situation too: >>> t1 = Timer('for x in range(100): print func1(0),', ... 'from __main__ import func1') >>> >>> t2 = Timer('for x in range(100): print func2(0),', ... 'from __main__ import func2') >>> >>> min(t1.repeat(number=1, repeat=1)) -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 53.011015366479114 >>> min(t2.repeat(number=1, repeat=1)) -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 47.55442856564332 that accounts for a scant 11% overhead, on more than one million tests per cycle. That said, let's make really clear that I would heartily prefer func2 to func1, based both on readability and speed. Thank you for having spent some time playing with me! Francesco On 19/12/2010 1.05, Steven D'Aprano wrote: Well, let's try it with a working (albeit contrived) example. 
This is just an example -- obviously I wouldn't write the function like
this in real life, I'd use a while loop, but to illustrate the issue it
will do.

def func1(n):
    result = -1
    done = False
    n = (n+1)//2
    if n%2 == 1:
        result = n
        done = True
    if not done:
        n = (n+1)//2
        if n%2 == 1:
            result = n
            done = True
    if not done:
        n = (n+1)//2
        if n%2 == 1:
            result = n
            done = True
    if not done:
        for i in range(100):
            if not done:
                n = (n+1)//2
                if n%2 == 1:
                    result = n
                    done = True
    return result

def func2(n):
    n = (n+1)//2
    if n%2 == 1:
        return n
    n = (n+1)//2
    if n%2 == 1:
        return n
    n = (n+1)//2
    if n%2 == 1:
        return n
    for i in range(100):
        n = (n+1)//2
        if n%2 == 1:
            return n
    return -1

Not only is the second far more readable than the first, but it's also
significantly faster:

>>> from timeit import Timer
>>> t1 = Timer('for i in range(20): x = func1(i)',
...            'from __main__ import func1')
>>> t2 = Timer('for i in range(20): x = func2(i)',
...            'from __main__ import func2')
>>> min(t1.repeat(number=10, repeat=5))
7.3219029903411865
>>> min(t2.repeat(number=10, repeat=5))
4.530779838562012

The first function does approximately 60% more work than the second,
all of it unnecessary overhead.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Scanning directories for new files?
On Tue, 21 Dec 2010 14:17:40 -0500, Matty Sarro wrote:

> Hey everyone.
> I'm in the midst of writing a parser to clean up incoming files,
> remove extra data that isn't needed, normalize some values, etc. The
> base files will be uploaded via FTP.
> How does one go about scanning a directory for new files? For now
> we're looking to run it as a cron job but eventually would like to
> move away from that into making it a service running in the
> background.

You can try pyinotify. Pyinotify is a Python module for monitoring
filesystem changes. Pyinotify relies on a Linux kernel feature (merged
in kernel 2.6.13) called inotify. inotify is an event-driven notifier;
its notifications are exported from kernel space to user space through
three system calls. pyinotify binds these system calls and provides an
implementation on top of them, offering a generic and abstract way to
manipulate those functionalities.

I'm assuming you're using Linux. You seem to be at least using UNIX
(cron).

Read more at: http://pyinotify.sourceforge.net/

Steven
--
http://mail.python.org/mailman/listinfo/python-list
Re: Sending XML to a WEB Service and Getting Response Back
On 12/21/2010 12:10 PM, John Nagle wrote:
> The original SOAPpy was at http://sourceforge.net/projects/soapy/files/
> but was apparently abandoned in 2001. Someone else picked it up and
> moved it to http://sourceforge.net/projects/pywebsvcs/files/SOAP.py/

These are unrelated projects, AFAICT. The former was released as version 0.1 on 4/27/01. According to the changelog, the first public release of the latter was version 0.5 on 4/17/01.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Python 3.2 beta 2
I wonder if Unladen Swallow is still being considered for merger with Python 3.3. Is it?

On Dec 21, 4:18 pm, Georg Brandl wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On behalf of the Python development team, I'm happy to announce the
> second beta preview release of Python 3.2.
>
> Python 3.2 is a continuation of the efforts to improve and stabilize the
> Python 3.x line. Since the final release of Python 2.7, the 2.x line
> will only receive bugfixes, and new features are developed for 3.x only.
>
> Since PEP 3003, the Moratorium on Language Changes, is in effect, there
> are no changes in Python's syntax and built-in types in Python 3.2.
> Development efforts concentrated on the standard library and support for
> porting code to Python 3. Highlights are:
>
> * numerous improvements to the unittest module
> * PEP 3147, support for .pyc repository directories
> * PEP 3149, support for version tagged dynamic libraries
> * PEP 3148, a new futures library for concurrent programming
> * PEP 384, a stable ABI for extension modules
> * PEP 391, dictionary-based logging configuration
> * an overhauled GIL implementation that reduces contention
> * an extended email package that handles bytes messages
> * countless fixes regarding bytes/string issues; among them full
>   support for a bytes environment (filenames, environment variables)
> * many consistency and behavior fixes for numeric operations
> * a sysconfig module to access configuration information
> * a pure-Python implementation of the datetime module
> * additions to the shutil module, among them archive file support
> * improvements to pdb, the Python debugger
>
> For a more extensive list of changes in 3.2, see
>
>     http://docs.python.org/3.2/whatsnew/3.2.html
>
> To download Python 3.2 visit:
>
>     http://www.python.org/download/releases/3.2/
>
> Please consider trying Python 3.2 with your code and reporting any bugs
> you may notice to:
>
>     http://bugs.python.org/
>
> Enjoy!
> - --
> Georg Brandl, Release Manager
> georg at python.org
> (on behalf of the entire python-dev team and 3.2's contributors)
--
http://mail.python.org/mailman/listinfo/python-list
Re: Python 3.2 beta 2
On 21.12.2010 22:56, Luis M. González wrote:
> I wonder if Unladen Swallow is still being considered for merger with
> Python 3.3.
> Is it?

3.2 isn't even released yet, and 3.3 will appear 18 months after it (so in summer 2012). It's much too early to tell.

OTOH, to answer your literal question: most certainly. At least you seem to be considering it, so it's certainly being considered by somebody.

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
Funny __future__ imports
from __future__ import space_shuttle
DeprecationWarning: will be removed in next release

Post yours!
--
http://mail.python.org/mailman/listinfo/python-list
Re: True lists in python?
Duncan Booth writes:
> I guess you might be able to do it with a double-linked list provided
> that when traversing the list you always keep two nodes around to
> determine the direction. e.g. instead of asking for node6.nextNode() you
> ask for node6.nextNode(previous=node1) and then the code can return
> whichever sibling wasn't given. That would make reversal (assuming you
> have both nodes) O(1), but would make traversing the list slower.

There used to be a trick to implement doubly linked lists with the same memory footprint as singly linked ones: instead of each node storing two pointers (one to the next node, one to the previous one), you just store one value:

    (previous node) xor (next node)

This means that when traversing the list, you need to always remember which node you are coming from. But it also makes these lists kind of symmetrical.

--
Arnaud
--
http://mail.python.org/mailman/listinfo/python-list
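[Python has no pointer XOR, but the trick can be sketched with list indices standing in for pointers. Everything below (`build`, `traverse`, index 0 as NIL) is made up for illustration, not any standard API:]

```python
# Sketch of the XOR-linked-list trick using list indices as "pointers".
# pool[0] is a dummy NIL entry; real nodes live at indices 1..n.
# Each node stores a single link field: prev_index XOR next_index.

def build(values):
    pool = [(None, 0)]  # index 0 acts as the null pointer
    n = len(values)
    for i, v in enumerate(values, start=1):
        prev_i = i - 1                   # 0 (NIL) for the first node
        next_i = i + 1 if i < n else 0   # 0 (NIL) for the last node
        pool.append((v, prev_i ^ next_i))
    return pool

def traverse(pool, start, came_from=0):
    # Walking requires remembering where we came from: the other
    # neighbour of a node is its link field XOR the previous index.
    out = []
    cur = start
    while cur != 0:
        value, link = pool[cur]
        out.append(value)
        came_from, cur = cur, link ^ came_from
    return out

pool = build(["a", "b", "c"])
forward = traverse(pool, start=1)   # ['a', 'b', 'c']
backward = traverse(pool, start=3)  # ['c', 'b', 'a']
```

[Starting from either end gives the list in either order with the same single link field per node, which is the "kind of symmetrical" property above.]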
Specialisation / Interests
Hi all,

Was thinking tonight (now this morning my time): what would the "long time" posters on c.l.p consider their specialities -- the topics they respond to and offer serious advice on? For instance:

- Raymond Hettinger for algorithms in collections and itertools
- MRAB for regexes (never seen him duck a post where re was (not) required)
- the "effbot" for PIL & ElementTree
- Mark Hammond for work on win32
- Mark Dickinson for floating point/number theory

etc... Then so many others! I'm leaving a huge amount out, so no rudeness intended -- but what do you think, guys and gals?

Cheers,
Jon.
--
http://mail.python.org/mailman/listinfo/python-list
Re: [python-committers] [RELEASED] Python 3.2 beta 2
On Wed, Dec 22, 2010 at 6:18 AM, Georg Brandl wrote: > Since PEP 3003, the Moratorium on Language Changes, is in effect, there > are no changes in Python's syntax and built-in types in Python 3.2. Minor nit - we actually did tweak a few of the builtin types a bit (mostly the stuff to improve Sequence ABC conformance and to make range objects more list-like) Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia -- http://mail.python.org/mailman/listinfo/python-list
Re: Funny __future__ imports
On 21/12/2010 22:17, Daniel da Silva wrote:
> from __future__ import space_shuttle
> DeprecationWarning: will be removed in next release
>
> Post yours!

from __future__ import time_machine
ImportError: time_machine in use by import
--
http://mail.python.org/mailman/listinfo/python-list
Re: Funny __future__ imports
On 12/21/2010 6:38 PM MRAB said...
> On 21/12/2010 22:17, Daniel da Silva wrote:
>> from __future__ import space_shuttle
>> DeprecationWarning: will be removed in next release
>>
>> Post yours!
>
> from __future__ import time_machine
> ImportError: time_machine in use by import

from __future__ import improved_realestate_market
ValueError: realestate market depreciated

:)
--
http://mail.python.org/mailman/listinfo/python-list
Re: Sending XML to a WEB Service and Getting Response Back
On 12/21/2010 11:26 AM, Anurag Chourasia wrote:
> Thanks for the response all. I tried exploring suds (which seems to be
> the current one) and I hit problems right away. I will now try urllib
> or httplib. I have asked for help in the suds forum. Hope somebody
> replies.
>
> When I try to create a client, the error is as follows.
>
> from suds.client import Client
> url = 'http://10.251.4.33:8041/DteEnLinea/ws/EnvioGuia.jws'
> client = Client(url)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "suds/client.py", line 112, in __init__
>     self.wsdl = reader.open(url)
>   File "suds/reader.py", line 152, in open
>     d = self.fn(url, self.options)
>   File "suds/wsdl.py", line 136, in __init__
>     d = reader.open(url)
>   File "suds/reader.py", line 79, in open
>     d = self.download(url)
>   File "suds/reader.py", line 101, in download
>     return sax.parse(string=content)
>   File "suds/sax/parser.py", line 136, in parse
>     sax.parse(source)
>   File "/usr/local/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
>     xmlreader.IncrementalParser.parse(self, source)
>   File "/usr/local/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
>     self.feed(buffer)
>   File "/usr/local/lib/python2.7/xml/sax/expatreader.py", line 211, in feed
>     self._err_handler.fatalError(exc)
>   File "/usr/local/lib/python2.7/xml/sax/handler.py", line 38, in fatalError
>     raise exception
> xml.sax._exceptions.SAXParseException: <unknown>:1:62: syntax error
> [3] + Stopped (SIGTSTP) python
>
> This seems to be an old problem spanning versions.
>
> Regards,
> Anurag

Try posting a URL that isn't on network 10. That's some local network at your end.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list
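[A SAXParseException at line 1 of the WSDL often means the server returned something other than XML (an HTML error or login page, say). One way to check is to fetch the URL yourself and eyeball the first bytes. Sketch below -- `looks_like_wsdl` is a hypothetical helper, not part of suds, and the check is only a heuristic:]

```python
# Sanity-check what the server actually returns before handing it to
# suds.  To fetch for real you would read the first couple hundred
# bytes from urlopen(url); here we just test the heuristic on canned
# response fragments.

def looks_like_wsdl(first_bytes):
    head = first_bytes.lstrip()
    # A WSDL document is XML, so it should begin with an XML
    # declaration or its root element -- not '<html' or '<!DOCTYPE'
    # as an HTML error page would.
    return head.startswith('<?xml') or head.startswith('<definitions')

wsdl_head = '<?xml version="1.0"?><definitions/>'
html_head = '<html><body>Login required</body></html>'
ok = looks_like_wsdl(wsdl_head)   # True
bad = looks_like_wsdl(html_head)  # False
```

[If the check fails, the problem is the endpoint, not suds -- which fits John's point that 10.x.x.x is a private network unreachable from outside.]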
Re: Bug in fixed_point?!
On Dec 21, 9:36 am, Robert Kern wrote:
> When you do, please provide the information that Terry Reedy asked for.

Sorry; quite right. For completeness I'll post here as well as over on scipy. Here's the actual code:

    from scipy import optimize
    from math import exp

    xxroot = optimize.fixed_point(lambda xx: exp(-2.0*xx)/2.0, 1.0,
                                  args=(), xtol=1e-12, maxiter=500)
    print ' %f solves fixed point, ie f(%f)=%f ?' % (xxroot, xxroot, exp(-2.0*xxroot)/2.0)

Here is the output:

    Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
    In [1]: run tmp.py
     0.332058 solves fixed point, ie f(0.332058)=0.257364 ?
--
http://mail.python.org/mailman/listinfo/python-list
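[For comparison, a hand-rolled plain iteration with no acceleration does converge to a genuine fixed point of this map, which makes the 0.332058 result look like a bug in scipy's accelerated method. `naive_fixed_point` below is just an illustrative helper, not scipy API:]

```python
from math import exp

def naive_fixed_point(f, x0, xtol=1e-12, maxiter=500):
    # Plain fixed-point iteration: x_{k+1} = f(x_k).  It converges for
    # this map because |f'(x)| = e**(-2x) < 1 near the fixed point.
    x = x0
    for _ in range(maxiter):
        x_next = f(x)
        if abs(x_next - x) < xtol:
            return x_next
        x = x_next
    raise RuntimeError("did not converge")

f = lambda x: exp(-2.0*x)/2.0
x = naive_fixed_point(f, 1.0)
# x is about 0.283572 (half the Lambert-W solution of u*e**u == 1),
# and f(x) equals x to within the tolerance -- unlike the 0.332058
# that scipy's fixed_point reported above.
```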