On Thu, 17 Jul 2008, Monika Jisswel wrote:

> I see no problem, if you open very BIG files then your memory will get
> filled up & your system will halt,
I'm going to disagree with you on this one.

First, in general, it is not the case that opening a very large file will cause memory to fill up. In general, only the portion of the file that is being processed needs to actually be in memory. A tarfile appears to be an exception to that general rule. The question is, why? Can that exception be avoided, so that a program processing a tar file is as well-behaved in terms of resource consumption as a program that processes other types of files?

Second, although most resource-constraint problems can be addressed by buying more of the resource, that's not usually a good approach. Where, as here, the constraint surfaces only in one rare case (processing a tarfile), the better approach is to find out whether something is out of whack in that one case. Simply adding resources is a poor use of, um, resources, for a couple of reasons. I'd rather spend my money on a nice dinner than on adding memory to a computer system that is perfectly adequate in every other way, other than in processing a single file. And adding resources, whether memory, disk, or CPU, is a band-aid: it gets you over this hump, but not the next. If I add enough memory to process a 4 GB file, great, I can now process a 4 GB file. But if I get a 6 GB file in a couple of months, I have to pull out the checkbook again. Managing the resource utilization, on the other hand, is a scalable scheme.

> can you buy more food than your fridge can handle , and write to a list
> asking to invistigate the problem ?

This is such an off-the-wall analogy that it's difficult to respond to, but what the heck. First, I'm not writing to the list to ask it to investigate the problem. I'm writing to the list to find out what tools are available so that *I* can investigate the problem. Second, under this analogy, you're looking at a scenario where food is put into a refrigerator in containers, and when the food is consumed, the empty containers are left in the refrigerator. Your solution might be to keep buying more or larger refrigerators. Mine would be to see if I can get rid of all the empty containers that are uselessly occupying space, so I can use the space for its real purpose (refrigerating food) rather than chilling empty containers for no reason.

Back to the real Python case: now that I can monitor my memory usage, I can try various strategies to solve the problem, and I can do it with a subset of the data. Instead of running the program on a 4 GB file and waiting to see whether it blows up or halts my system 15 minutes in, after processing a couple of gigabytes, I can run it on a much smaller 60 MB file and track its effects.

For anyone who cares about the real issue: it seems that tarfile.py caches every member it processes in an internal list. The list isn't actually used when accessing the file as an iterator, so by reinitializing it to [], the memory consumption problem is avoided. This breaks the other methods of the module, which are used to extract particular members directly, but in my case that's okay. I'm currently experimenting to see if I can come up with a patch that will either allow both means of accessing the members (as an iterator and directly), or, as a less desirable alternative, allow access as an iterator when a parameter like cache=False is specified and raise an exception if the other methods are used. Thanks to a couple of tarfile.py experts on comp.lang.python for their insight on this.
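In case it helps anyone hitting the same wall, here is a minimal sketch of the iterator-style workaround described above. It assumes the internal cache in question is TarFile's (essentially undocumented) members list, which is where the stock tarfile.py appends each member it reads; the archive name is made up.

    import tarfile

    tar = tarfile.open("huge-archive.tar")   # hypothetical file name
    while True:
        member = tar.next()     # read only the next member's header
        if member is None:
            break               # end of archive
        # ... process the member here, e.g. via tar.extractfile(member) ...
        tar.members = []        # drop the internal cache so memory stays flat
    tar.close()

Obviously, clearing the cache this way gives up methods like getmembers() and getnames(), which rely on that list, so it only makes sense when the archive is consumed purely as a stream.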