Re: The future of Python immutability
John Nagle wrote: Python's concept of immutability is useful, but it could be more general. In the beginning, strings, tuples, and numbers were immutable, and everything else was mutable. That was simple enough. But over time, Python has acquired more immutable types - immutable sets and immutable byte arrays. Each of these is a special case. Python doesn't have immutable objects as a general concept, but it may be headed in that direction. There was some fooling around with an immutability API associated with NumPy back in 2007, but that was removed. As more immutable types are added, a more general approach may be useful. Suppose, for discussion purposes, we had general immutable objects. Objects inherited from immutableobject instead of object would be unchangeable once __init__ had returned. Where does this take us? Immutability is interesting for threaded programs, because immutable objects can be shared without risk. Consider a programming model where objects shared between threads must be either immutable or synchronized in the sense that Java uses the term.

Yes, this is one of the reasons I am currently learning Haskell. I am not yet anywhere near proficient, but the reason I am looking into FP is because of some of the claims of the FP community, particularly Erlang, regarding the benefits of pure FP with respect to multi-threading. It's a shame this post came right now since I'm not really up-to-speed enough with Haskell to comment on it with respect to multi-threading.

context I program Perl, Java and C++ for my day job. I've spent a lot of time making multithreaded programs work correctly and have even experienced POE on a large project. So my comments below are based on experience of these languages. /context

Such programs are free of most race conditions, without much programmer effort to make them so.

I disagree. They are not free of most race conditions, and it still takes a lot of effort. Where did you get this idea from?
Have you been reading some Java primer that attempts to make it sound easy?

Java synchronized turned out to be a headache partly because trying to figure out how to lock all the little stuff being passed around was a headache. But Java doesn't have immutable objects. Python does, and that can be exploited to make thread-based programming cleaner.

This is nothing to do with Java; any multithreaded language that has mutable shared state has exactly the same problems. Can we talk about threading rather than Java please? Additionally, Java provides a lot more than monitors (synchronized) for controlling multiple threads. Java does have immutable objects. Strings in Java are immutable for example, as are the object-based numeric types, Bytes, Characters etc. There are lots and lots of immutable types in Java and you can make your own by creating a class with no mutator methods and declaring it final.

The general idea is that synchronized objects would have built-in locks which would lock at entry to any function of the object and unlock at exit. The lock would also unlock at explicit waits. A Queue object would be a good example of a synchronized object. With this mechanism, multi-thread programs with shared data structures can be written with little or no explicit locking by the programmer. If the restrictions are made a bit stricter, strict enough that threads cannot share mutable unsynchronized data, removal of the global interpreter lock is potentially possible. This is a route to improved performance on modern multi-core CPUs.

Right, this is where I would love to have had more experience with Haskell. Yes, as soon as you get to a situation where no thread can access shared state that is mutable your problems go away. You're also getting no work done, because the threads, whilst they may be performing lots of interesting calculations, have no way of allowing the rest of the program, or operating system, to know about it.
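For what it's worth, the monitor-style "synchronized object" being described can be sketched in today's Python. This is a toy illustration of the idea, not a real language feature; the class names are invented for the example:

```python
import threading

class Synchronized:
    """Subclasses behave like monitors: a per-instance lock is
    acquired on entry to any public method and released on exit.
    A sketch of the idea discussed above, not a Python feature."""

    def __init__(self):
        # RLock so one synchronized method may call another
        self._monitor_lock = threading.RLock()

    def __getattribute__(self, name):
        attr = object.__getattribute__(self, name)
        # wrap public methods so each call holds the monitor lock
        if callable(attr) and not name.startswith('_'):
            lock = object.__getattribute__(self, '_monitor_lock')
            def locked(*args, **kwargs):
                with lock:
                    return attr(*args, **kwargs)
            return locked
        return attr

class Counter(Synchronized):
    def __init__(self):
        super().__init__()
        self.value = 0

    def increment(self):
        # the read-modify-write is protected by the monitor lock
        self.value += 1
```

With this, several threads can hammer on one Counter without explicit locking at the call sites, which is the "little or no explicit locking by the programmer" property being discussed.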
You can, today, in any language that provides threads, make any number of threaded programs that do not contain any race conditions; it's just that most of them are terribly dull and uninteresting. I'd love for someone from the Haskell/Erlang/other pure FP community to provide some canonical example of how this is achieved in pure FP. I'll get there soon but I'm not going to skip ahead in my reading, I'm still trying to learn the basics.

So, in response to your point of trying to get an immutable API so that Python can easily have multi-threaded programs that do not present race conditions I would say the following: That is not the challenge, that's the easy part. The challenge is getting useful information out of a system that has only been fed immutable objects.

Regards, Nigel -- http://mail.python.org/mailman/listinfo/python-list
Re: The future of Python immutability
Stefan Behnel wrote: Nigel Rantor wrote: John Nagle wrote: Immutability is interesting for threaded programs, because immutable objects can be shared without risk. Consider a programming model where objects shared between threads must be either immutable or synchronized in the sense that Java uses the term. Such programs are free of most race conditions, without much programmer effort to make them so.

I disagree. They are not free of most race conditions, and it still takes a lot of effort. Where did you get this idea from? Have you been reading some Java primer that attempts to make it sound easy?

Read again what he wrote. In a language with only immutable data types (which doesn't mean that you can't efficiently create modified versions of a data container), avoiding race conditions is trivial. The most well known example is clearly Erlang. Adding synchronised data structures to that will not make writing race conditions much easier.

My comment you quoted was talking about Java and the use of synchronized. If that was unclear I apologise. Please feel free to read the entirety of my post before replying.

n -- http://mail.python.org/mailman/listinfo/python-list
Re: evolution [was Re: An assessment of the Unicode standard]
r wrote: I'd like to present a bug report to evolution, obviously the garbage collector is malfunctioning. I think most people think that when they read the drivel that you generate. I'm done with your threads and posts. *plonk* -- http://mail.python.org/mailman/listinfo/python-list
Re: An assessment of the Unicode standard
Hendrik van Rooyen wrote: On Sunday 30 August 2009 22:46:49 Dennis Lee Bieber wrote: Rather elitist viewpoint... Why don't we just drop nukes on some 60% of populated landmasses that don't have a western culture and avoid the whole problem? Now yer talking, boyo! It will surely help with the basic problem which is the heavy infestation of people on the planet! :-) bait On two conditions: 1) We drop some test bombs on Slough to satisfy Betjeman. 2) We strap both Xah and r to aforementioned bombs. /bait switch Also, I'm surprised no-one has mentioned Esperanto yet. Sounds like something r and Xah would *love*. Slightly off-topic - does anyone have a good recipe for getting thunderbird to kill whole threads for good? Either based on a rule or just some extension I can use? The Xah/r threads are like car crashes, I can't help but watch but my time could be better spent and I don't want to unsub the whole list. /switch Cheers, n -- http://mail.python.org/mailman/listinfo/python-list
Re: Need help with Python scoping rules
kj wrote: Needless to say, I'm pretty beat by this point. Any help would be appreciated. Thanks, Based on your statement above, and the fact that multiple people have now explained *exactly* why your attempt at recursion hasn't worked, it might be a good idea to step back, accept the advice and walk away instead of trying to convince people that the language forbids recursion and doesn't provide decent OO encapsulation. Otherwise I'd wager you'll soon be appearing in multiple kill-files. n -- http://mail.python.org/mailman/listinfo/python-list
Re: zip codes
MRAB wrote: Sjoerd Mullender wrote: Martin P. Hellwig wrote: Shailen wrote: Is there any Python module that helps with US and foreign zip-code lookups? I'm thinking of something that provides basic mappings of zip to cities, city to zips, etc. Since this kind of information is so often used for basic user-registration, I'm assuming functionality of this sort must be available for Python. Any suggestions will be much appreciated. There might be an associated can of worms here, for example in the Netherlands zip codes are actually copyrighted and require a license if you want to do something with them, on the other hand you get a nice SQL formatted db to use it. I don't know how this works in other countries but I imagine that it is likely to be generally the same. Also in The Netherlands, ZIP codes are much more fine-grained than in some other countries: ZIP code plus house number together are sufficient to uniquely identify an address. I.e. you don't need the street name. E.g., my work address has ZIP code 1098 XG and house number 123, so together they indicate that I work at Science Park 123, Amsterdam. In other words, a simple city - ZIP mapping is not sufficient. The same comment applies to UK postcodes, which are also alphanumeric. My home postcode, for example, is shared with only 3 other houses, IIRC.

Kind of off-topic...but nevertheless... Yes, the UK postcode database (PAF) can be bought from the Royal Mail for a fee. The data cannot be copyrighted, but the version they maintain and distribute is. As an aside, the PAF has finer-grained information than simply the postal code: every letterbox in the UK has (or is meant to have) a DPS (delivery point suffix), so that given a postcode and DPS you can uniquely identify an individual letterbox even when, for example, a house has been split into multiple flats.
So, nastily, you *can* identify individual letterboxes, but the Royal Mail does not publicise the fact, so you cannot actually look at a post code on a letter and determine the letterbox it is intended for. Shame really. n -- http://mail.python.org/mailman/listinfo/python-list
Re: callable virtual method
Jean-Michel Pichavant wrote: Your solution will work, for sure. The problem is that it will dumb down the Base class interface, multiplying the number of methods by 2. This would not be an issue in many cases, in mine there's already too much meaningful methods in my class for me to add artificial ones. Thanks for the tip anyway. I suggest you reconsider. You asked a question and have been given a standard way of achieving the desired outcome. It's common in OO to use a Template pattern like this. If you're not interested in finding out how loads of people have already solved the problem then why ask? The methods that require overriding can be prefixed with an underscore so that people get a hint that they are an implementation detail rather than part of the public interface. I don't see your problem, other than a vague aesthetic unease. Regards, n -- http://mail.python.org/mailman/listinfo/python-list
Re: callable virtual method
Jean-Michel Pichavant wrote: Nigel Rantor wrote: Jean-Michel Pichavant wrote: Your solution will work, for sure. The problem is that it will dumb down the Base class interface, multiplying the number of methods by 2. This would not be an issue in many cases, in mine there's already too much meaningful methods in my class for me to add artificial ones. Thanks for the tip anyway. I suggest you reconsider. You asked a question and have been given a standard way of achieving the desired outcome. It's common in OO to use a Template pattern like this. If you're not interested in finding out how loads of people have already solved the problem then why ask? The methods that require overriding can be prefixed with an underscore so that people get a hint that they are an implementation detail rather than part of the public interface. I don't see your problem, other than a vague aesthetic unease. Regards, n I understand how refuting some obvious solution may look just stupid. You're right, I shouldn't have asked.

I never said it seemed stupid. I was merely curious as to why you'd ask a question and ignore solutions.

By the way I'd like to know if I am alone to find that

class Stream:
    def start
    def stop
    def reset

is better than

class Stream:
    def start
    def _start
    def stop
    def _stop
    def reset
    def _reset

(try to figure out with 20+ methods) What you call aesthetic may sometimes fall into readability.

Depends on what you mean by better. Do you mean pleasing to your eye or performs the task you want it to? Assuming you are taking the aesthetic viewpoint I think that in this case it will depend on how you set out your code. Realise that all of the underscore methods for your class are boilerplate, they simply raise an exception. They can all be at the end of the file, commented as an entire block to be left alone. Editing the main body of code is then fairly easy, and uncluttered... e.g.
#
# Stream class blah blah blah
#
class Stream:
    def start
    def stop
    def reset

    #
    # stubs to be over-ridden in sub-classes, add one for each
    # method that requires overriding.
    #
    def _start
    def _stop
    def _reset

Regards, Nigel

p.s. Please take this in the spirit it is offered. I'm trying to stop you from ignoring a good suggestion, not make you feel like a fool. -- http://mail.python.org/mailman/listinfo/python-list
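The skeleton above can be made concrete. Here is a minimal runnable sketch of the template-method arrangement being suggested; the method bodies are invented for illustration:

```python
class Stream:
    # public interface: common bookkeeping lives here, and the
    # subclass-specific part is delegated to an underscore stub
    def start(self):
        self._log("starting")
        self._start()

    def stop(self):
        self._log("stopping")
        self._stop()

    def _log(self, msg):
        print("%s: %s" % (type(self).__name__, msg))

    #
    # stubs to be over-ridden in sub-classes, add one for each
    # method that requires overriding.
    #
    def _start(self):
        raise NotImplementedError("_start must be overridden")

    def _stop(self):
        raise NotImplementedError("_stop must be overridden")


class FileStream(Stream):
    def __init__(self):
        self.running = False

    def _start(self):
        self.running = True

    def _stop(self):
        self.running = False
```

Callers only ever use `start()`/`stop()`; a subclass that forgets to override a stub fails loudly with NotImplementedError rather than silently doing nothing.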
Re: cross platform method Re: How to get the total size of a local hard disk?
Tim Harig wrote: warning font=small print This is a joke. Do not take it seriously. I do not actually suggest anybody use this method to measure the size of their drive. I do not take any responsibility for any damages incurred by using this method. I will laugh at you if you do. Offer not valid in AK, HI, Puerto Rico, or U.S. Virgin Islands. /warning Like most jokes it's not really funny if you have to explain it. But I appreciate that you're worried that anyone who would actually follow the advice would also probably be rabidly litigious even if they were one of those rare-breed of living brain-donors. n -- http://mail.python.org/mailman/listinfo/python-list
Re: Connection tester
Sparky wrote: Hey! I am developing a small application that tests multiple websites and compares their response time. Some of these sites do not respond to a ping and, for the measurement to be standardized, all sites must have the same action performed upon them. Another problem is that not all of the sites have the same page size and I am not interested in how long it takes to load a page but instead just how long it takes for the website to respond. Finally, I am looking to keep this script platform independent, if at all possible.

Yes, lots of people block ICMP so you can't use it to reliably tell whether a machine is there or not. At least three possible solutions:

1) Perform a HEAD request against the document root. This is likely to be a static page and making it a HEAD request will make most responses take similar times.

2) Perform an OPTIONS request as specified in the RFC below for the * resource. This doesn't always work.

3) Perform a request you believe will fail so that you are provided with a 4XX error code; the only time this should take any appreciable time is when someone has cute server-generated error pages.

HTTP/1.1 RFC - http://www.ietf.org/rfc/rfc2616.txt

n -- http://mail.python.org/mailman/listinfo/python-list
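Option 1 is easy to sketch with the standard library. This is a hedged example, not the OP's code; it times connect plus headers, which is close to "time for the site to respond" rather than time to transfer a whole page:

```python
import http.client
import time

def response_time(host, port=80, method="HEAD", path="/", timeout=10):
    """Time a single HTTP request against host:port.

    Returns (seconds, status).  The timing covers connect, request
    and response headers, not the body, so page size mostly drops
    out of the measurement.
    """
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, path)
        status = conn.getresponse().status
    finally:
        conn.close()
    return time.monotonic() - start, status
```

For option 2 you would call it with `method="OPTIONS", path="*"`, with the caveat noted above that not every server supports it.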
Re: Winter Madness - Passing Python objects as Strings
Hendrik van Rooyen wrote: Nigel Rantor wi...@wiggly.org wrote: It just smells to me that you've created this elaborate and brittle hack to work around the fact that you couldn't think of any other way of getting the thread to change its behaviour whilst waiting on input. I am beginning to think that you are a troll, as all your comments are haughty and disparaging, while you either take no trouble to follow, or are incapable of following, my explanations. In the event that this is not the case, please try to understand my reply to Skip, and then suggest a way that will perform better in my use case, out of your vast arsenal of better, quicker, more reliable, portable and comprehensible ways of doing it. Well, why not have a look at Gabriel's response. That seems like a much more portable way of doing it if nothing else. I'm not trolling, you just seem to be excited about something that sounds like a fundamentally bad idea. n -- http://mail.python.org/mailman/listinfo/python-list
Re: Winter Madness - Passing Python objects as Strings
Hendrik van Rooyen wrote: If you have any interest, contact me and I will send you the source. Maybe you could tell people what the point is... n -- http://mail.python.org/mailman/listinfo/python-list
Re: Winter Madness - Passing Python objects as Strings
Hendrik van Rooyen wrote: Nigel Rantor wi...@wiggly.org wrote: Hendrik van Rooyen wrote: If you have any interest, contact me and I will send you the source. Maybe you could tell people what the point is... Well its a long story, but you did ask... [snip] Maybe I should have said why should people care or why would someone use this or what problem does this solve Your explanation doesn't make a whole lot of sense to me, I'm sure it does to you. Why, for example, would someone use your system to pass objects between processes (I think this is the main thing you are providing?) rather than POSH or some other system? Regards, n -- http://mail.python.org/mailman/listinfo/python-list
Re: Winter Madness - Passing Python objects as Strings
Hendrik van Rooyen wrote: It is not something that would find common use - in fact, I have never, until I started struggling with my current problem, ever even considered the possibility of converting a pointer to a string and back to a pointer again, and I would be surprised if anybody else on this list has done so in the past, in a context other than debugging. Okay, well, I think that's probably because it sounds like a fairly good way of making things slower and hard to port to another interpreter. Obviously depending on how you're achieving this. If you need to pass information to a thread then I would suggest there's better, quicker, more reliable, portable and comprehensible ways of doing it. It just smells to me that you've created this elaborate and brittle hack to work around the fact that you couldn't think of any other way of getting the thread to change its behaviour whilst waiting on input. Just my 0.02p n -- http://mail.python.org/mailman/listinfo/python-list
Re: binary file compare...
Adam Olsen wrote: On Apr 16, 11:15 am, SpreadTooThin bjobrie...@gmail.com wrote: And yes he is right CRCs hashing all have a probability of saying that the files are identical when in fact they are not. Here's the bottom line. It is either: A) Several hundred years of mathematics and cryptography are wrong. The birthday problem as described is incorrect, so a collision is far more likely than 42 trillion trillion to 1. You are simply the first person to have noticed it. B) Your software was buggy, or possibly the input was maliciously produced. Or, a really tiny chance that your particular files contained a pattern that provoked bad behaviour from MD5. Finding a specific limitation of the algorithm is one thing. Claiming that the math is fundamentally wrong is quite another. You are confusing yourself about probabilities young man. Just because something is extremely unlikely does not mean it can't happen on the first attempt. This is true *no matter how big the numbers are*. If you persist in making these ridiculous claims that people *cannot* have found collisions then as I said, that's up to you, but I'm not going to employ you to do anything except make tea. Thanks, Nigel -- http://mail.python.org/mailman/listinfo/python-list
Re: binary file compare...
Adam Olsen wrote: On Apr 16, 4:27 pm, Rhodri James rho...@wildebst.demon.co.uk wrote: On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen rha...@gmail.com wrote: On Apr 16, 3:16 am, Nigel Rantor wig...@wiggly.org wrote: Okay, before I tell you about the empirical, real-world evidence I have could you please accept that hashes collide and that no matter how many samples you use the probability of finding two files that do collide is small but not zero. I'm afraid you will need to back up your claims with real files. So that would be a no then. If the implementation of dicts in Python, say, were to assert as you are that the hashes aren't going to collide, then I'd have to walk away from it. There's no point in using something that guarantees a non-zero chance of corrupting your data. Python's hash is only 32 bits on a 32-bit box, so even 2**16 keys (or 65 thousand) will give you a decent chance of a collision. In contrast MD5 needs 2**64, and a *good* hash needs 2**128 (SHA-256) or 2**256 (SHA-512). The two are at totally different extremes. I'm just going to go ahead and take the above as an admission by you that the chance of collision is non-zero, and that if we accept that fact you cannot rely on a hash function to tell you if two files are identical. Thanks. There is *always* a non-zero chance of corruption, due to software bugs, hardware defects, or even operator error. It is only in that broader context that you can realize just how minuscule the risk is. Please explain why you're dragging the notion of corruption into this when it seems to be beside the point? Can you explain to me why you justify great lengths of paranoia, when the risk is so much lower? Because in the real world, where I work, in practical, real, factual terms I have seen it happen. Not once. Not twice. But many, many times. 
Why are you advocating a solution to the OP's problem that is more computationally expensive than a simple byte-by-byte comparison and doesn't guarantee to give the correct answer? For single, one-off comparison I have no problem with a byte-by-byte comparison. There's a decent chance the files won't be in the OS's cache anyway, so disk IO will be your bottleneck. Only if you're doing multiple comparisons is a hash database justified. Even then, if you expect matching files to be fairly rare I won't lose any sleep if you're paranoid and do a byte-by-byte comparison anyway. New vulnerabilities are found, and if you don't update promptly there is a small (but significant) chance of a malicious file leading to collision. If I have a number of files then I would certainly use a hash as a quick test, but if two files hash to the same value I still have to go compare them. Hashing in this case saves time doing a comparison for each file but it doesn't obviate the need to do a byte-by-byte comparison to see if the files that hash to the same value are actually the same. That's not my concern though. What I'm responding to is Nigel Rantor's grossly incorrect statements about probability. The chance of collision, in our life time, is *insignificant*. Please tell me which statements? That the chance of two files hashing to the same value is non-zero? You admit as much above. Also, please remember I gave you a way of verifying what I said: go crawl the web for images, pages, whatever, build a hash DB and tell me how long it takes you to find a collision using MD5 (which is the hash I was talking about when I told you I had real-world experience to back up the theoretical claim that collisions occur). Regards, Nigel -- http://mail.python.org/mailman/listinfo/python-list
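The "hash as a quick test, then compare the candidates byte-by-byte" scheme described above is short to write down. A sketch, assuming the standard library only (function names are mine):

```python
import filecmp
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def files_identical(path_a, path_b):
    """A hash mismatch proves the files differ.  A hash match only
    says they *may* be identical, so confirm byte-by-byte."""
    if md5_of(path_a) != md5_of(path_b):
        return False
    return filecmp.cmp(path_a, path_b, shallow=False)
```

In a many-file setting you would store the hashes in a database keyed by digest, and run the byte-by-byte confirmation only within each bucket of matching digests.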
Re: binary file compare...
Adam Olsen wrote: On Apr 16, 3:16 am, Nigel Rantor wig...@wiggly.org wrote: Adam Olsen wrote: On Apr 15, 12:56 pm, Nigel Rantor wig...@wiggly.org wrote: Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk of a hardware or software failure producing an incorrect result. Not when you're using them to compare lots of files. Trust me. Been there, done that, got the t-shirt. Using hash functions to tell whether or not files are identical is an error waiting to happen. But please, do so if it makes you feel happy, you'll just eventually get an incorrect result and not know it. Please tell us what hash you used and provide the two files that collided. MD5 If your hash is 256 bits, then you need around 2**128 files to produce a collision. This is known as a Birthday Attack. I seriously doubt you had that many files, which suggests something else went wrong. Okay, before I tell you about the empirical, real-world evidence I have could you please accept that hashes collide and that no matter how many samples you use the probability of finding two files that do collide is small but not zero. I'm afraid you will need to back up your claims with real files. Although MD5 is a smaller, older hash (128 bits, so you only need 2**64 files to find collisions), and it has substantial known vulnerabilities, the scenario you suggest where you *accidentally* find collisions (and you imply multiple collisions!) would be a rather significant finding. No. It wouldn't. It isn't. The files in question were millions of audio files. I no longer work at the company where I had access to them so I cannot give you examples, and even if I did Data Protection regulations wouldn't have allowed it. If you still don't believe me you can easily verify what I'm saying by doing some simple experiments.
Go spider the web for images, keep collecting them until you get an MD5 hash collision. It won't take long. Please help us all by justifying your claim. Now, please go and re-read my request first and admit that everything I have said so far is correct. Mind you, since you use MD5 I wouldn't be surprised if your files were maliciously produced. As I said before, you need to consider upgrading your hash every few years to avoid new attacks. Good grief, this is nothing to do with security concerns, this is about someone suggesting to the OP that they use a hash function to determine whether or not two files are identical. Regards, Nige -- http://mail.python.org/mailman/listinfo/python-list
Re: binary file compare...
Adam Olsen wrote: On Apr 15, 12:56 pm, Nigel Rantor wig...@wiggly.org wrote: Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk of a hardware or software failure producing an incorrect result. Not when you're using them to compare lots of files. Trust me. Been there, done that, got the t-shirt. Using hash functions to tell whether or not files are identical is an error waiting to happen. But please, do so if it makes you feel happy, you'll just eventually get an incorrect result and not know it. Please tell us what hash you used and provide the two files that collided. MD5 If your hash is 256 bits, then you need around 2**128 files to produce a collision. This is known as a Birthday Attack. I seriously doubt you had that many files, which suggests something else went wrong. Okay, before I tell you about the empirical, real-world evidence I have could you please accept that hashes collide and that no matter how many samples you use the probability of finding two files that do collide is small but not zero. Which is the only thing I've been saying. Yes, it's unlikely. Yes, it's possible. Yes, it happens in practice. If you are of the opinion though that a hash function can be used to tell you whether or not two files are identical then you are wrong. It really is that simple. I'm not sitting here discussing this for my health, I'm just trying to give the OP the benefit of my experience, I have worked with other people who insisted on this route and had to find out the hard way that it was a Bad Idea (tm). They just wouldn't be told. Regards, Nige -- http://mail.python.org/mailman/listinfo/python-list
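The "small but not zero" claim being argued over can be put into numbers with the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/2N) for n items drawn from a space of N possible hash values. A quick sketch (not part of the original thread):

```python
import math

def collision_probability(n_items, hash_bits):
    """Approximate chance that at least two of n_items random values
    collide in a space of 2**hash_bits (birthday approximation).
    Uses expm1 so tiny probabilities don't round down to zero."""
    space = 2.0 ** hash_bits
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))
```

For a 32-bit hash, around 2**17 keys already make a collision likely, while for 128-bit MD5 the accidental probability for even millions of random files is astronomically small yet strictly positive; both sides of this thread are, in a sense, reading the same formula.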
Re: binary file compare...
Martin wrote: On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: The checksum does look at every byte in each file. Checksumming isn't a way to avoid looking at each byte of the two files, it is a way of mapping all the bytes to a single number. My understanding of the original question was a way to determine whether 2 files are equal or not. Creating a checksum of 1-n files and comparing those checksums IMHO is a valid way to do that. I know it's a (one way) mapping between a (possibly) longer byte sequence and another one, how does checksumming not take each byte in the original sequence into account. The fact that two md5 hashes are equal does not mean that the sources they were generated from are equal. To do that you must still perform a byte-by-byte comparison which is much less work for the processor than generating an md5 or sha hash. If you insist on using a hashing algorithm to determine the equivalence of two files you will eventually realise that it is a flawed plan because you will eventually find two files with different contents that nonetheless hash to the same value. The more files you test with the quicker you will find out this basic truth. This is not complex, it's a simple fact about how hashing algorithms work. n -- http://mail.python.org/mailman/listinfo/python-list
Re: binary file compare...
Grant Edwards wrote: We all rail against premature optimization, but using a checksum instead of a direct comparison is premature unoptimization. ;) And more than that, will provide false positives for some inputs. So, basically it's a worse-than-useless approach for determining if two files are the same. n -- http://mail.python.org/mailman/listinfo/python-list
Re: binary file compare...
Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk of a hardware or software failure producing an incorrect result. Not when you're using them to compare lots of files. Trust me. Been there, done that, got the t-shirt. Using hash functions to tell whether or not files are identical is an error waiting to happen. But please, do so if it makes you feel happy, you'll just eventually get an incorrect result and not know it. n -- http://mail.python.org/mailman/listinfo/python-list
Re: Ordered Sets
Aahz wrote: In article 9a5d59e1-2798-4864-a938-9b39792c5...@s9g2000prg.googlegroups.com, Raymond Hettinger pyt...@rcn.com wrote: Here's a new, fun recipe for you guys: http://code.activestate.com/recipes/576694/ That is *sick* and perverted. I'm not sure why. Would it be less sick if it had been called UniqueList ? n -- http://mail.python.org/mailman/listinfo/python-list
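In case the name really is the sticking point: the behaviour in question is easy to sketch. This toy is a deliberate simplification and is not Hettinger's recipe (which also supports O(1) removal via a doubly linked list):

```python
class UniqueList:
    """A container that ignores duplicates while preserving
    insertion order -- a toy sketch, not the linked recipe."""

    def __init__(self, iterable=()):
        self._items = []   # preserves insertion order
        self._seen = set() # gives O(1) membership tests
        for item in iterable:
            self.add(item)

    def add(self, item):
        if item not in self._seen:
            self._seen.add(item)
            self._items.append(item)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._seen

    def __len__(self):
        return len(self._items)
```

Whether that is less sick than an OrderedSet is left to the reader.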
Re: file locking...
bruce wrote: Hi. Got a bit of a question/issue that I'm trying to resolve. I'm asking this of a few groups so bear with me. I'm considering a situation where I have multiple processes running, and each process is going to access a number of files in a dir. Each process accesses a unique group of files, and then writes the group of files to another dir. I can easily handle this by using a form of locking, where I have the processes lock/read a file and only access the group of files in the dir based on the open/free status of the lockfile. However, the issue with the approach is that it's somewhat synchronous. I'm looking for something that might be more asynchronous/parallel, in that I'd like to have multiple processes each access a unique group of files from the given dir as fast as possible. I don't see how this is synchronous if you have a lock per file. Perhaps you've missed something out of your description of your problem. So.. Any thoughts/pointers/comments would be greatly appreciated. Any pointers to academic research, etc.. would be useful. I'm not sure you need academic papers here. One trivial solution to this problem is to have a single process determine the complete set of files that require processing then fork off children, each with a different set of files to process. The parent then just waits for them to finish and does any post-processing required. A more concrete problem statement may of course change the solution... n -- http://mail.python.org/mailman/listinfo/python-list
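The trivial solution above can be sketched with multiprocessing so it stays portable; the per-file work here is a stand-in, and the function names are mine:

```python
import multiprocessing

def process_group(file_group, results):
    # stand-in for the real per-file work; report what we handled
    for path in file_group:
        results.put(path)

def process_all(all_files, n_workers=4):
    """Parent determines the complete set of files, splits it into
    disjoint groups, forks one child per group, then waits for them."""
    results = multiprocessing.Queue()
    groups = [all_files[i::n_workers] for i in range(n_workers)]
    children = [multiprocessing.Process(target=process_group,
                                        args=(group, results))
                for group in groups]
    for child in children:
        child.start()
    # collect exactly one result per input file, then reap children
    processed = [results.get() for _ in range(len(all_files))]
    for child in children:
        child.join()
    return processed
```

No locking is needed because the groups are disjoint by construction. On platforms that spawn rather than fork (e.g. Windows) this needs the usual `if __name__ == '__main__':` guard around the call to `process_all`.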
Re: file locking...
koranthala wrote: On Mar 1, 2:28 pm, Nigel Rantor wig...@wiggly.org wrote: bruce wrote: Hi. Got a bit of a question/issue that I'm trying to resolve. I'm asking this of a few groups so bear with me. I'm considering a situation where I have multiple processes running, and each process is going to access a number of files in a dir. Each process accesses a unique group of files, and then writes the group of files to another dir. I can easily handle this by using a form of locking, where I have the processes lock/read a file and only access the group of files in the dir based on the open/free status of the lockfile. However, the issue with the approach is that it's somewhat synchronous. I'm looking for something that might be more asynchronous/parallel, in that I'd like to have multiple processes each access a unique group of files from the given dir as fast as possible. I don't see how this is synchronous if you have a lock per file. Perhaps you've missed something out of your description of your problem. So.. Any thoughts/pointers/comments would be greatly appreciated. Any pointers to academic research, etc.. would be useful. I'm not sure you need academic papers here. One trivial solution to this problem is to have a single process determine the complete set of files that require processing then fork off children, each with a different set of files to process. The parent then just waits for them to finish and does any post-processing required. A more concrete problem statement may of course change the solution... n Using twisted might also be helpful. Then you can avoid the problems associated with threading too. No one mentioned threads. I can't see how Twisted in this instance isn't like using a sledgehammer to crack a nut. n -- http://mail.python.org/mailman/listinfo/python-list
Re: file locking...
Hi Bruce, Excuse me if I'm a little blunt below. I'm ill and grumpy... bruce wrote: hi nigel... using any kind of file locking process requires that i essentially have a gatekeeper, allowing a single process to enter, access the files at a time... I don't believe this is a necessary condition. That would only be the case if you allowed yourself a single lock. i can easily setup a file read/write lock process where a client app gets/locks a file, and then copies/moves the required files from the initial dir to a tmp dir. after the move/copy, the lock is released, and the client can go ahead and do whatever with the files in the tmp dir.. this process allows multiple clients to operate in a pseudo-parallel manner... i'm trying to figure out if there's a much better/faster approach that might be available.. which is where the academic/research issue was raised.. I'm really not sure why you want to move the files around. Here are two different approaches from the one I initially gave you that deal perfectly well with a directory where files are constantly being added. In both approaches we are going to try and avoid using OS-specific locking mechanisms, advisory locking, flock etc. So it should work everywhere as long as you also have write access to the filesystem you're on. Approach 1 - Constant Number of Processes This requires no central manager but for every file lock requires a few OS calls. Start up N processes with the same working directory WORK_DIR. Each process then follows this algorithm: - sleep for some small random period. - scan the WORK_DIR for a FILE that does not have a corresponding LOCK_FILE - open LOCK_FILE in append mode and write our PID into it. - close LOCK_FILE - open LOCK_FILE - read first line from LOCK_FILE and compare to our PID - if the PID we just read from the LOCK_FILE matches ours then we may process the corresponding FILE otherwise another process beat us to it. 
- repeat After processing a file completely you can remove it and the lockfile at the same time. As long as filenames follow some pattern then you can simply say that the LOCK_FILE for FILE is called FILE.lock e.g. WORK_DIR : /home/wiggly/var/work FILE : /home/wiggly/var/work/data_2354272.dat LOCK_FILE : /home/wiggly/var/work/data_2354272.dat.lock Approach 2 - Managed Processes Here we have a single main process that spawns children. The children listen for filenames on a pipe that the parent has open to them. The parent constantly scans the WORK_DIR for new files to process and as it finds one it sends that filename to a child process. You can either be clever about the children and ensure they tell the parent when they're free or just pass them work in a round-robin fashion. I hope the two above descriptions make sense, let me know if they don't. n the issue that i'm looking at is analogous to a FIFO, where i have lots of files being shoved in a dir from different processes.. on the other end, i want to allow multiple client processes to access unique groups of these files as fast as possible.. access being fetch/gather/process/delete the files. each file is only handled by a single client process. thanks..
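The heart of Approach 1 above, the append-then-verify claim on a lock file, can be sketched in a few lines of Python (the function name and `.lock` suffix are illustrative; this assumes a POSIX filesystem where short appends land whole):

```python
import os

def try_lock(path, pid):
    """Attempt to claim `path` using the append-then-verify protocol:
    every contender appends its PID to the lock file, then the one
    whose PID ended up on the first line wins. Returns True if this
    process won the file.
    """
    lock_path = path + ".lock"
    # Append our PID; several processes may do this concurrently.
    with open(lock_path, "a") as f:
        f.write("%d\n" % pid)
    # Whoever's PID is on the first line won the race.
    with open(lock_path) as f:
        first = f.readline().strip()
    return first == str(pid)
```

A process that gets True processes the file and then removes both the file and its lock file; a process that gets False simply moves on to the next candidate.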
Re: file locking...
zugnush wrote: You could do something like this so that every process will know if the file belongs to it without prior coordination, it means a lot of redundant hashing though. In [36]: import md5 In [37]: pool = 11 In [38]: process = 5 In [39]: [f for f in glob.glob('*') if int(md5.md5(f).hexdigest(),16) % pool == process ] Out[39]: You're also relying on the hashing being perfectly distributed, otherwise some processes aren't going to be performing useful work even though there is useful work to perform. In other words, why would you rely on a scheme that limits some processes to certain parts of the data? If we're already talking about trying to get away without some global lock for synchronisation this seems to go against the original intent of the problem... n -- http://mail.python.org/mailman/listinfo/python-list
Re: code challenge: generate minimal expressions using only digits 1,2,3
Trip Technician wrote: anyone interested in looking at the following problem. if you can give me a good reason why this is not homework I'd love to hear it...I just don't see how this is a real problem. we are trying to express numbers as minimal expressions using only the digits one two and three, with conventional arithmetic. so for instance 33 = 2^(3+2)+1 = 3^3+(3*2) are both minimal, using 4 digits but 33 = ((3+2)*2+1)*3 using 5 is not. I have tried coding a function to return the minimal representation for any integer, but haven't cracked it so far. The naive first attempt is to generate lots of random strings, eval() them and sort by size and value. this is inelegant and slow. Wow. Okay, what other ways have you tried so far? Or are you beating your head against the search the entire problem space solution still? This problem smells a lot like factorisation, so I would think of it in terms of wanting to reduce the target number using as few operations as possible. If you allow exponentiation that's going to be your biggest hitter so you know that the best you can do using 2 digits is n^n where n is the largest digit you allow yourself. Are you going to allow things like n^n^n or not? n -- http://mail.python.org/mailman/listinfo/python-list
Re: code challenge: generate minimal expressions using only digits 1,2,3
Luke Dunn wrote: yes power towers are allowed right, okay, without coding it here's my thought. factorise the numbers you have but only allowing primes that exist in your digit set. then take that factorisation and turn any repeated runs of digits multiplied by themselves into power-towers any remainder can then be created in other ways, starting with a way other than exponentiation that is able to create the largest number, i.e. multiplication, then addition... I've not got time to put it into code right now but it shouldn't be too hard... e.g. digits : 3, 2, 1 n : 10 10 = 2*5 - but we don't have 5... 10 = 3*3 + 1 10 = 3^2+1 3 digits n : 27 27 = 3*3*3 27 = 3^3 2 digits n : 33 33 = 3*3*3 + 6 33 = 3*3*3 + 3*2 33 = 3^3+3*2 4 digits exponentiation, multiplication, division, addition and subtraction. Brackets when necessary but length is sorted on number of digits not number of operators plus digits. I always try my homework myself first. in 38 years of life I've learned only to do what i want, if I wanted everyone else to do my work for me I'd be a management consultant ! On Fri, Feb 20, 2009 at 3:52 PM, Luke Dunn luke.d...@gmail.com mailto:luke.d...@gmail.com wrote: I am teaching myself coding. No university or school, so i guess its homework if you like. i am interested in algorithms generally, after doing some of Project Euler. Of course my own learning process is best served by just getting on with it but sometimes you will do that while other times you might just choose to ask for help. if no one suggests then i will probably shelve it and come back to it myself when I'm fresh. no it's not a real world problem but my grounding is in math so i like pure stuff anyway. don't see how that is a problem, as a math person i accept the validity of pure research conducted just for curiosity and aesthetic satisfaction. it often finds an application later anyway Thanks for your helpful suggestion of trying other methods and i will do that in time. 
my motive was to share an interesting problem because a human of moderate math education can sit down with this and find minimal solutions easily but the intuition they use is quite subtle, hence the idea of converting the human heuristic into an algorithm became of interest, and particularly a recursive one. i find that the development of a piece of recursion usually comes as an 'aha', and since i hadn't had such a moment, i thought i'd turn the problem loose on the public. also i found no online reference to this problem so it seemed ripe for sharing. On Fri, Feb 20, 2009 at 3:39 PM, Nigel Rantor wig...@wiggly.org mailto:wig...@wiggly.org wrote: Trip Technician wrote: anyone interested in looking at the following problem. if you can give me a good reason why this is not homework I'd love to hear it...I just don't see how this is a real problem. we are trying to express numbers as minimal expressions using only the digits one two and three, with conventional arithmetic. so for instance 33 = 2^(3+2)+1 = 3^3+(3*2) are both minimal, using 4 digits but 33 = ((3+2)*2+1)*3 using 5 is not. I have tried coding a function to return the minimal representation for any integer, but haven't cracked it so far. The naive first attempt is to generate lots of random strings, eval() them and sort by size and value. this is inelegant and slow. Wow. Okay, what other ways have you tried so far? Or are you beating your head against the search the entire problem space solution still? This problem smells a lot like factorisation, so I would think of it in terms of wanting to reduce the target number using as few operations as possible. If you allow exponentiation that's going to be your biggest hitter so you know that the best you can do using 2 digits is n^n where n is the largest digit you allow yourself. Are you going to allow things like n^n^n or not? n -- http://mail.python.org/mailman/listinfo/python-list
Re: code challenge: generate minimal expressions using only digits 1,2,3
Trip Technician wrote: yes n^n^n would be fine. agree it is connected to factorisation. building a tree of possible expressions is my next angle. I think building trees of the possible expressions as a couple of other people have suggested is simply a more structured way of doing what you're currently doing. Right now you're throwing darts at the problem space, and hoping that the next point you hit will be a more optimal solution. If you enumerate all the expression trees you are just ensuring you don't miss any solutions. I think the algorithm/heuristic I just posted should get you to the answer quicker though... n -- http://mail.python.org/mailman/listinfo/python-list
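One structured way to do the enumeration discussed above is to index levels by digit count, so the first level containing the target yields a minimal expression by construction. The operator set and the guards on exponentiation below are my own choices, not the thread's, tuned to keep the search space small:

```python
from itertools import product

def minimal_expr(target, digits=(1, 2, 3), max_digits=6):
    """Level n holds values reachable with exactly n digits; since we
    check levels in increasing order, the first hit is minimal."""
    reach = {1: {d: str(d) for d in digits}}
    if target in reach[1]:
        return 1, reach[1][target]
    for n in range(2, max_digits + 1):
        level = {}
        for k in range(1, n):
            # Combine every value using k digits with every value
            # using n-k digits, so the result uses exactly n digits.
            for (a, ea), (b, eb) in product(reach[k].items(),
                                            reach[n - k].items()):
                cands = [(a + b, "(%s+%s)" % (ea, eb)),
                         (a - b, "(%s-%s)" % (ea, eb)),
                         (a * b, "(%s*%s)" % (ea, eb))]
                if b != 0 and a % b == 0:       # exact division only
                    cands.append((a // b, "(%s/%s)" % (ea, eb)))
                if 1 < abs(a) < 100 and 0 < b < 10:  # keep powers sane
                    cands.append((a ** b, "(%s^%s)" % (ea, eb)))
                for v, e in cands:
                    level.setdefault(v, e)
        reach[n] = level
        if target in level:
            return n, level[target]
    return None
```

For 33 this comes back with a 4-digit expression such as 3^3+3*2, matching the example given at the start of the thread.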
Re: To Troll or Not To Troll (aka: as keyword woes)
James Stroud wrote: Andreas Waldenburger wrote: Is it me, or has c.l.p. developed a slightly harsher tone recently? (Haven't been following for a while.) Yep. I can only post here for about a week or two until someone blows a cylinder and gets ugly because they interpreted something I said as a criticism of the language and took it personally by extension. Then I have to take a 4 month break because I'm VERY prone to reciprocating--nastily. I think its a symptom of the language's maturing, getting popular, and a minority fraction* of the language's most devout advocates developing an egotism that complements their python worship in a most unsavory way. I wish they would instead spend their energy volunteering to moderate this list and culling out some of the spam. *No names were mentioned in the making of this post. I joined this list some time ago, I am not a regular python user. I have maintained my list subscription because when I'm bored the flames here are very entertaining. I don't think I need to mention specifics really. Oh, and the weekly thread about immutable default arguments is a cracker...more please. n -- http://mail.python.org/mailman/listinfo/python-list
Re: Exhaustive Unit Testing
Roy Smith wrote: There's a well known theory in studies of the human brain which says people are capable of processing about 7 +/- 2 pieces of information at once. It's not about processing multiple tasks, it's about the number of things that can be held in working memory. n -- http://mail.python.org/mailman/listinfo/python-list
Re: How to best explain a subtle difference between Python and Perl ?
Jonathan Gardner wrote: [...eloquent and interesting discussion of variable system snipped...] Is Python's variable system better than perl's? It depends on which way you prefer. As for me, being a long-time veteran of perl and Python, I don't think having a complicated variable system such as perl's adds anything to the language. Python's simplicity in this regard is not only sufficient, but preferable. Very well put. I am not however sure I agree with your very final thought. I am a long-time C/C++/Java/Perl developer. I know some python too. The Python system is the same as the Java system, apart from Java's primitive types, which is a completely different discussion that I really don't want to get into right now. So, everything is by reference. I understand, and agree that a simple system is good. And maybe even preferable. But it isn't always sufficient. Some algorithms are much easier to write if you know that your parameters are going to be copied and that the function may use them as local variables without having to explicitly create copies. You can also reason more easily about what side-effects the function could have if you know it cannot possibly modify your parameters. Other systems out there require pointer-like semantics (for example CORBA out and inout parameters) which have to be kludged in languages like Java to pass in wrapper objects/boxes that can be assigned values. Whilst it may be easier to learn a system like python/java, in the end the amount of time spent learning the system is normally dwarfed by the time spent using the system to build software. I would rather have a type system that is as expressive as possible. Also, note that it is entirely possible to create many, many, many interesting and useful things in Perl without having to resort to references. They are a relatively new feature after all. Just my 0.02p n -- http://mail.python.org/mailman/listinfo/python-list
Re: You advice please
Calvin Spealman wrote: Ruby (on Rails) people love to talk about Ruby (on Rails). Python people are too busy getting things done to talk as loudly. Have you read this list? I would suggest your comment indicates not. Throwaway comments like yours that are pithy, emotional and devoid of any factual content are just the kind of thing that makes lists such as this less useful than they could be. You are acting as a source of noise, not signal. I'm sure you don't want to be considered in that manner, so perhaps you should think about adding something to the conversation instead. Before you reply please think about what you plan on saying, you'll be helping not only me but yourself and anyone who reads your post. n -- http://mail.python.org/mailman/listinfo/python-list
Re: You advice please
Fredrik Lundh wrote: Nigel Rantor wrote: Throwaway comments like yours that are pithy, emotional and devoid of any factual content are just the kind of thing that makes lists such as this less useful than they could be. Oh, please. It's a fact that Python advocacy is a lot more low-key than the advocacy of certain potentially competing technologies. It's always been that way. Too many Europeans involved, most likely. Your opinion. We simply disagree on this point. I'm not sure what the comment about Europeans even means though. Have you read this list? I would suggest your comment indicates not. This list is a Python forum. Calvin (who's a long time contributor to this forum, which you would have known if you'd actually followed the list for some time) was talking about the real world. I did not mean in a how long have you been here way. I apologise. I meant in a have you not seen how much traffic, including rabid fanboys, this list gets? You're right, I should have been much clearer on that point. n -- http://mail.python.org/mailman/listinfo/python-list
Re: You advice please
Calvin Spealman wrote: God forbid I try to make a joke. Ah, sorry, sense of humour failure for me today obviously. n -- http://mail.python.org/mailman/listinfo/python-list
Re: How to best explain a subtle difference between Python and Perl ?
Palindrom wrote: ### Python ### liste = [1,2,3] def foo( my_list ): my_list = [] The above points the my_list reference at a different object, in this case a newly created list. It does not modify the liste object, it points my_list at a completely different object. ### Perl ### @lst = (1,2,3); $liste = \@lst; foo($liste); print "@lst\n"; sub foo { my ($my_list) = @_; @{$my_list} = (); } The above code *dereferences* $my_list and assigns an empty list to its referent (@lst). The two code examples are not equivalent. An equivalent Perl example would be as follows: ### Perl ### @lst = (1,2,3); $liste = \@lst; foo($liste); print "@lst\n"; sub foo { my ($my_list) = @_; $my_list = []; } The above code does just what the Python code does. It assigns a newly created list object to the $my_list reference. Any changes to this now have no effect on @lst because $my_list no longer points there. n -- http://mail.python.org/mailman/listinfo/python-list
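The distinction can be shown entirely in Python as well: rebinding the parameter name leaves the caller's list alone, while mutating through it does not. A small demonstration (mine, not from the thread):

```python
def rebind(my_list):
    # Points the local name at a brand-new list; the caller's
    # object is untouched.
    my_list = []

def mutate(my_list):
    # Modifies the object the caller passed in, in place.
    del my_list[:]

liste = [1, 2, 3]
rebind(liste)
print(liste)  # [1, 2, 3] -- unchanged

liste = [1, 2, 3]
mutate(liste)
print(liste)  # [] -- emptied in place
```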
Re: Terminate a python script from linux shell / bash script
Gros Bedo wrote: Thank you guys for your help. My problem is that I project to use this command to terminate a script when uninstalling the software, so I can't store the PID. This command will be integrated in the spec file of the RPM package. Here's the script I'll use, it may help someone else: #!/bin/sh # PYTHON SCRIPT PROCESS KILLER by GBO v0.1 # This script will look for all the lines containing $SOFTWARENAME in the process list, and close them SOFTWARENAME='yoursoftware' #This is case insensitive JOBPRESENT=$(ps -ef | grep -i $SOFTWARENAME | grep -v grep) echo $JOBPRESENT ps -ef | grep -i $SOFTWARENAME | grep -v grep | awk '{print $2}' | xargs kill If you have a long running process you wish to be able to kill at a later date the normal way of doing it would be for the script itself to write its own PID to a file that you can then inspect from a different process and use to kill it. So, my_daemon.py might shove its PID into /var/run/my_daemon.pid And later my_daemon_killer.py (or indeed my_daemon_killer.sh) would read the PID out of /var/run/my_daemon.pid and pass that to a kill command. Using ps/grep in the way you're trying to do is always going to be inexact and people will not thank you for killing processes they wanted running. n -- http://mail.python.org/mailman/listinfo/python-list
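The PID-file approach described above takes only a few lines at each end. A sketch (the path and function names are illustrative, and this assumes a POSIX system):

```python
import os
import signal

PID_FILE = "/var/run/my_daemon.pid"  # path is illustrative

def write_pid(pid_file=PID_FILE):
    """Called by the daemon at startup so it can be found later."""
    with open(pid_file, "w") as f:
        f.write(str(os.getpid()))

def kill_daemon(pid_file=PID_FILE, sig=signal.SIGTERM):
    """Read the recorded PID and signal exactly that process,
    rather than pattern-matching the output of ps."""
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, sig)
```

This kills precisely the process that wrote the file, which is the point: no other process matching a name pattern is at risk.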
Re: new style class
gert wrote: Could not one of you just say @staticmethod for once damnit :) why were you asking if you knew the answer? yeesh -- http://mail.python.org/mailman/listinfo/python-list
Re: new style class
gert wrote: On Nov 2, 12:27 pm, Boris Borcic [EMAIL PROTECTED] wrote: gert wrote: class Test(object): def execute(self,v): return v def escape(v): return v if __name__ == '__main__': gert = Test() print gert.m1('1') print Test.m2('2') Why doesn't this new style class work in python 2.5.1 ? why should it ? I don't know I thought it was supported from 2.2? I think what Boris was saying, in his exceedingly unhelpful way, was: why should it work when you're calling methods that do not exist? I don't see 'm1' or 'm2' defined for the class 'Test'. n -- http://mail.python.org/mailman/listinfo/python-list
Re: modules and generated code
J. Clifford Dyer wrote: Maybe I'm missing something obvious, but it sounds like you are over-complicating the idea of inheritance. Do you just want to create a subclass of the other class? Nope, that isn't my problem. I have an IDL file that is used to generate a set of stub and skeleton code that is not human-modifiable. Eventually I would like to have my IDL in source control and have a setup script able to generate my stubs and skels and install them for me. At the same time I want to produce code that uses this code but in the same package. In Java or Perl I can easily create a couple of packages/modules like this: package org.mine.package; [...class definitions...] and then somewhere else package org.mine.otherpackage; [...class definitions...] These can be compiled into separate Jar files and just work. Since Python is the final target though I don't want to put it all in one directory because then I need to be clever when I regenerate the generated code, I don't want old python modules lying around that are no longer in the IDL. Blowing the entire generated directory away is the best way of doing this, so I don't want my implementation code in there. Basically, I want the same top-level package to have bits of code in different directories, but because Python requires the __init__.py file it only picks up the first one in PYTHONPATH. I'm not sure if that makes sense, my brain is already toast from meetings today. n -- http://mail.python.org/mailman/listinfo/python-list
Re: modules and generated code
Peter Otten wrote: Nigel Rantor wrote: So, if I have a tool that generates python code for me (in my case, CORBA stubs/skels) in a particular package is there a way of placing my own code under the same package hierarchy without all the code living in the same directory structure. http://docs.python.org/lib/module-pkgutil.html Ooh, thanks for that. Yep, looks like that should work, but it doesn't. :-/ Do you have any idea whether other __init__.py scripts from the same logical module will still be run in this case? The generated code uses its init script to pull in other code. Off to tinker some more with this. n -- http://mail.python.org/mailman/listinfo/python-list
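For reference, the pkgutil idiom Peter linked can be sketched end-to-end. The layout below mirrors the one described in the original post; all directory and module names are illustrative:

```python
import os
import sys
import tempfile

# Two separate trees, one generated and one hand-written, that both
# contain a package called top_level_package.
base = tempfile.mkdtemp()
init_body = ("from pkgutil import extend_path\n"
             "__path__ = extend_path(__path__, __name__)\n")
for tree, module in (("generated", "generated_code_module"),
                     ("implementation", "implementation_code_module")):
    pkg_dir = os.path.join(base, tree, "top_level_package")
    os.makedirs(pkg_dir)
    # The same two-line idiom goes into BOTH copies of __init__.py.
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(init_body)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write("WHO = %r\n" % tree)
    sys.path.insert(0, os.path.join(base, tree))

# Both halves now import under the one package name, even though they
# live in different directories.
from top_level_package import generated_code_module
from top_level_package import implementation_code_module
print(generated_code_module.WHO, implementation_code_module.WHO)
```

Note the caveat discussed later in the thread still applies: only the first __init__.py found actually executes, so any startup code in the other copy will not run on its own.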
modules and generated code
Hi all, Python newbie here with what I hope is a blindingly obvious question that I simply can't find the answer for in the documentation. So, if I have a tool that generates python code for me (in my case, CORBA stubs/skels) in a particular package is there a way of placing my own code under the same package hierarchy without all the code living in the same directory structure. Ideally I would like something like the following: package_dir/ top_level_package/ generated_code_package/ implementation_code_package/ but have two distinct directories that hold them so that I can simply delete the generated code and regenerate it without worrying that anything got left behind. So, I want something like: generated_package_dir/ top_level_package/ generated_code_package/ implementation_package_dir/ top_level_package/ implementation_code_package/ Whilst I can create this structure, and add 'generated_package_dir' and 'implementation_package_dir' to PYTHONPATH the fact that both directories contain 'top_level_package' seems to be causing clashes, perhaps because there are multiple __init__.py files for 'top_level_package'? I know that this is possible in Java, Perl and C++ so I am finding it hard to believe I can't do the same in Python, I just think I'm too new to know how. I have spent most of this morning searching through all the docs I can find, searching on USENET and the web to no avail. Any help or pointers greatly appreciated. Regards, n -- http://mail.python.org/mailman/listinfo/python-list
Re: modules and generated code
Peter Otten wrote: Nigel Rantor wrote: Peter Otten wrote: Nigel Rantor wrote: So, if I have a tool that generates python code for me (in my case, CORBA stubs/skels) in a particular package is there a way of placing my own code under the same package hierarchy without all the code living in the same directory structure. http://docs.python.org/lib/module-pkgutil.html Yep, looks like that should work, but it doesn't. :-/ Do you have any idea whether other __init__.py scripts from the same logical module will still be run in this case? I don't think it will. Yeah, I am getting that impression. Gah! The generated code uses its init script to pull in other code. You could invoke it explicitly via execfile("/path/to/generated/package/__init__.py") in the static package/__init__.py. Hmm, yes, that works. It's not pretty though, it seems to be finding the file relative to the current directory, I suppose writing a bit of code that figures out where this package is located and modifying it won't be too hard. And, at the risk of being flamed or sounding like a troll, this seems like something that should be easy to do...other languages manage it quite neatly. Up until this point I was really liking my exposure to Python :-/ I wonder if there is any more magic that I'm missing, the thing is the pkgutil method looks exactly like what I want, except for not executing the additional __init__.py files in the added directories. Thanks for the help so far Peter, if anyone has a prettier solution then I'm all ears. Cheers, n -- http://mail.python.org/mailman/listinfo/python-list