Re: Object cleanup

2012-05-31 Thread psaff...@googlemail.com
Thanks for all the responses. It looks like none of the BeautifulSoup objects have __del__ methods, so I don't think that can be the problem. To answer your other question, guppy (or, more specifically, Heapy) was the best match I came up with when looking for a memory profiler for Python:
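
A minimal sketch of typical guppy/Heapy usage (assuming guppy is installed; as the older thread below notes, where the h.heap() call is made affects what it reports):

    from guppy import hpy

    h = hpy()
    # ... build the BeautifulSoup objects under investigation ...
    print h.heap()    # summary of live objects, grouped by type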

Object cleanup

2012-05-30 Thread psaff...@googlemail.com
I am writing a screen scraping application using BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/ (which is fantastic, by the way). I have an object that has two methods, each of which loads an HTML document and scrapes out some information, putting strings from the HTML documents
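
For reference, a minimal sketch of that kind of scrape (BeautifulSoup 3-era API; the URL and the tags pulled out are illustrative, not the poster's):

    import urllib2
    from BeautifulSoup import BeautifulSoup

    html = urllib2.urlopen("http://example.com/page.html").read()
    soup = BeautifulSoup(html)
    # pull some strings out of the document, as the poster describes
    strings = [td.string for td in soup.findAll("td") if td.string]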

Overlapping region resolution

2009-05-21 Thread psaff...@googlemail.com
This may be an algorithmic question, but I'm trying to code it in Python, so... I have a list of pairwise regions, each with an integer start and end and a float data point. There may be overlaps between the regions. I want to resolve this into an ordered list with no overlapping regions. My
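
A small sketch of one way to do this: split the regions at every boundary point and keep one value per elementary interval. Taking the max where regions overlap is an assumption; the post doesn't say how overlapping data points should be combined:

    def resolve(regions):
        # regions is a list of (start, end, value) with possible overlaps
        points = sorted(set(p for s, e, _ in regions for p in (s, e)))
        out = []
        for s, e in zip(points, points[1:]):
            vals = [v for rs, re_, v in regions if rs < e and re_ > s]
            if vals:
                out.append((s, e, max(vals)))   # combining rule: assumed max
        return out

    print resolve([(0, 10, 1.0), (5, 15, 2.0)])
    # -> [(0, 5, 1.0), (5, 10, 2.0), (10, 15, 2.0)]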

Multiprocessing Pool and functions with many arguments

2009-04-29 Thread psaff...@googlemail.com
I'm trying to get to grips with the multiprocessing module, having only used Parallel Python before. Based on this example: http://docs.python.org/library/multiprocessing.html#using-a-pool-of-workers what happens if I want my f to take more than one argument? I want to have a list of tuples of
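
The usual workaround: Pool.map passes exactly one argument, so wrap f in a helper that unpacks a tuple (the helper name is illustrative):

    from multiprocessing import Pool

    def f(a, b):
        return a * b

    def f_star(args):
        # must live at module top level so the workers can pickle it
        return f(*args)

    if __name__ == '__main__':
        pool = Pool(4)
        print pool.map(f_star, [(1, 2), (3, 4), (5, 6)])
        # -> [2, 12, 30]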

Re: CSV performance

2009-04-29 Thread psaff...@googlemail.com
    rows = fh.read().split()
    coords = numpy.array(map(int, rows[1::3]), dtype=int)
    points = numpy.array(map(float, rows[2::3]), dtype=float)
    chromio.writelines(map(chrommap.__getitem__, rows[::3]))

My original version is about 15 seconds. This version is about 9. The chunks version posted by

CSV performance

2009-04-27 Thread psaff...@googlemail.com
I'm using the CSV library to process a large amount of data - 28 files, each of 130MB. Just reading in the data from one file and filing it into very simple data structures (numpy arrays and a cStringIO) takes around 10 seconds. If I just slurp one file into a string, it only takes about a second,

Re: CSV performance

2009-04-27 Thread psaff...@googlemail.com
Thanks for your replies. Many apologies for not including the right information first time around. More information is below. I have tried running it just on the csv read:

    import time
    import csv
    afile = "largefile.txt"
    t0 = time.clock()
    print "working at file", afile
    reader =
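
A completed version of that timing harness, for reference (the truncated reader line is filled in here as an assumption, as is the tab delimiter):

    import time
    import csv

    afile = "largefile.txt"
    t0 = time.clock()
    print "working at file", afile
    reader = csv.reader(open(afile, "rb"), delimiter="\t")
    for row in reader:
        pass
    print "csv read took", time.clock() - t0, "seconds"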

mod_python form upload: permission denied sometimes...

2009-04-24 Thread psaff...@googlemail.com
I have a mod_python application that takes a POST file upload from a form. It works fine from my machine, other machines in my office and my home machine. It does not work from my boss's machine in a different city - he gets "You don't have permission to access this on this server". In the logs,

Re: Memory efficient tuple storage

2009-03-19 Thread psaff...@googlemail.com
In the end, I used a cStringIO object to store the chromosomes - because there are only 23, I can use one character for each chromosome and represent the whole lot with a giant string and a dictionary to say what each character means. Then I used numpy arrays for the data and coordinates. This
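
A sketch of the scheme described (chrommap and chromio follow the poster's code in the CSV-performance reply above; the mapping entries and file layout are illustrative):

    from cStringIO import StringIO
    import numpy

    chrommap = {"chr1": "A", "chr2": "B", "chrX": "W"}   # ~23 entries in full
    chromio = StringIO()
    coords, points = [], []

    for line in open("datafile.txt"):
        chrom, pos, val = line.split()     # assumed three-column layout
        chromio.write(chrommap[chrom])     # one character per chromosome
        coords.append(int(pos))
        points.append(float(val))

    coords = numpy.array(coords, dtype=int)
    points = numpy.array(points, dtype=float)
    chroms = chromio.getvalue()            # one giant string, one char per row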

Parallel processing on shared data structures

2009-03-19 Thread psaff...@googlemail.com
I'm filing 160 million data points into a set of bins based on their position. At the moment, this takes just over an hour using interval trees. I would like to parallelise this to take advantage of my quad core machine. I have some experience of Parallel Python, but PP seems to only really work
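
A minimal sketch of one way to parallelise this with the standard multiprocessing module (simple fixed-width bins stand in for the interval trees; the chunk count and bin width are illustrative):

    from multiprocessing import Pool
    from collections import defaultdict

    BIN_WIDTH = 1000

    def bin_chunk(positions):
        counts = defaultdict(int)
        for p in positions:
            counts[p // BIN_WIDTH] += 1
        return counts

    if __name__ == '__main__':
        positions = range(1000000)                  # stand-in for the real data
        chunks = [positions[i::4] for i in range(4)]
        pool = Pool(4)
        merged = defaultdict(int)
        for counts in pool.map(bin_chunk, chunks):  # bin each chunk in parallel
            for b, n in counts.items():
                merged[b] += n                      # merge per-chunk results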

Memory efficient tuple storage

2009-03-13 Thread psaff...@googlemail.com
I'm reading in some rather large files (28 files each of 130MB). Each file is a genome coordinate (chromosome (string) and position (int)) and a data point (float). I want to read these into a list of coordinates (each a tuple of (chromosome, position)) and a list of data points. This has taught

Re: Memory efficient tuple storage

2009-03-13 Thread psaff...@googlemail.com
Thanks for all the replies. First of all, can anybody recommend a good way to show memory usage? I tried heapy, but couldn't make much sense of the output and it didn't seem to change too much for different usages. Maybe I was just making the h.heap() call in the wrong place. I also tried

Which core am I running on?

2009-02-09 Thread psaff...@googlemail.com
Is there some way I can get at this information at run-time? I'd like to use it to tag diagnostic output dumped during runs using Parallel Python. Peter
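
A Linux-only sketch that asks glibc directly via ctypes (assumes a glibc recent enough to provide sched_getcpu):

    import ctypes

    libc = ctypes.CDLL("libc.so.6")
    print "running on core", libc.sched_getcpu()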

Re: Which core am I running on?

2009-02-09 Thread psaff...@googlemail.com
On 9 Feb, 12:24, Gerhard Häring g...@ghaering.de wrote: Looks like I have answered a similar question once, btw. ;-) Ah, yes - thanks. I did Google for it, but obviously didn't have the right search term. Cheers, Peter

Too many open files

2009-02-09 Thread psaff...@googlemail.com
I'm building a pipeline involving a number of shell tools. In each case, I create a temporary file using tempfile.mkstemp() and invoke a command (cmd /tmp/tmpfile) on it using subprocess.Popen. At the end of each section, I call close() on the file handles and use os.remove() to delete them. Even
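
A common cause of this leak, sketched below: mkstemp() returns an OS-level file descriptor as well as a path, and that descriptor must be closed with os.close() - closing a separately opened file object doesn't release it ("cmd" is a stand-in for the real tool):

    import os
    import subprocess
    import tempfile

    fd, path = tempfile.mkstemp()
    try:
        subprocess.Popen(["cmd", path]).wait()
    finally:
        os.close(fd)        # without this, descriptors accumulate
        os.remove(path)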

Re: Which core am I running on?

2009-02-09 Thread psaff...@googlemail.com
On 9 Feb, 12:24, Gerhard Häring g...@ghaering.de wrote: http://objectmix.com/python/631346-parallel-python.html Hmm. In fact, this doesn't seem to work for pp. When I run the code below, it says everything is running on the one core.

    import pp
    import random
    import time
    from string import

Re: mod_python: delay in files changing after alteration

2009-01-12 Thread psaff...@googlemail.com
On 6 Jan, 23:31, Graham Dumpleton graham.dumple...@gmail.com wrote: Thus, any changes to modules/packages installed on sys.path require a full restart of Apache to ensure they are loaded by all Apache child worker processes. That will be it. I'm pulling in some libraries of my own from

subprocess.Popen stalls

2009-01-12 Thread psaff...@googlemail.com
I'm building a bioinformatics application using the ipcress tool: http://www.ebi.ac.uk/~guy/exonerate/ipcress.man.html I'm using subprocess.Popen to execute ipcress, which takes a group of files full of DNA sequences and returns some analysis on them. Here's a code fragment: cmd =

Re: subprocess.Popen stalls

2009-01-12 Thread psaff...@googlemail.com
On 12 Jan, 15:33, mk mrk...@gmail.com wrote: Better use communicate() method: Oh yes - it's right there in the documentation. That worked perfectly. Many thanks, Peter
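
A minimal sketch of the fix: communicate() drains both pipes to completion, avoiding the deadlock where one pipe buffer fills while the process blocks writing to the other (the ipcress arguments are illustrative):

    import subprocess

    p = subprocess.Popen(["ipcress", "primers.txt", "sequences.fa"],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    print out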

mod_python: delay in files changing after alteration

2009-01-05 Thread psaff...@googlemail.com
Maybe this is an apache question, in which case apologies. I am running mod_python 3.3.1-3 on apache 2.2.9-7. It works fine, but I find that when I alter a source file during development, it sometimes takes 5 seconds or so for the changes to be seen. This might sound trivial, but when debugging

Re: Selecting a different superclass

2008-12-18 Thread psaff...@googlemail.com
On 17 Dec, 20:33, Chris Rebert c...@rebertia.com wrote: superclass = TraceablePointSet if tracing else PointSet Perfect - many thanks. Good to know I'm absolved from evil, also ;) Peter
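
The suggested pattern in context, sketched (the class names come from the thread; the bodies and the derived class are illustrative):

    class PointSet(object):
        pass

    class TraceablePointSet(PointSet):
        pass    # say, logs the provenance of every point it's given

    tracing = True    # e.g. from a command-line flag

    superclass = TraceablePointSet if tracing else PointSet

    class Analysis(superclass):    # base class chosen at class-creation time
        pass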

Selecting a different superclass

2008-12-17 Thread psaff...@googlemail.com
This might be a pure OO question, but I'm doing it in Python so I'll ask here. I'm writing a number crunching bioinformatics application. Read lots of numbers from files; merge, median and munge; draw plots. I've found that the most critical part of this work is validation and traceability -