Elaina Ann Hyde wrote:
> Thanks for all the helpful hints, I really like the idea of using
> distances instead of a limit. Walter was right that the 'i != j'
> condition was causing problems. I think that Alan and Steven's use of
> the index separately was great, as it makes this much easier to test
> (and yes, 'astropysics' is a valid package; it's in there for later,
> when I convert astrophysical coordinates and whatnot -- pretty great
> but a little buggy, FYI). So I thought, hey, why not try to do a
> little of all these ideas, and, if you'll forgive the change in
> syntax, I think the problem is that the file might really just be too
> big to handle, and I'm not sure I have the right idea with the
> best_match:
> The errors are as follows:
>
>   dat2 = asciitable.read(y, Reader=asciitable.NoHeader, data_start=4,
>                          fill_values=['nan', '-9.999'])
>   File "/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/asciitable-0.8.0-py2.7.egg/asciitable/ui.py",
>   line 131, in read
>     dat = _guess(table, new_kwargs)
>   File "/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/asciitable-0.8.0-py2.7.egg/asciitable/ui.py",
>   line 175, in _guess
>     dat = reader.read(table)
>   File "/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/asciitable-0.8.0-py2.7.egg/asciitable/core.py",
>   line 841, in read
>     self.lines = self.inputter.get_lines(table)
>   File "/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/asciitable-0.8.0-py2.7.egg/asciitable/core.py",
>   line 158, in get_lines
>     lines = table.splitlines()
>   MemoryError
>
> ----------------------
> So this means I don't have enough memory to run through the large
> file? Even if I just read it in with asciitable I get this problem. I
> looked again, and the large file is 1.5GB of text lines, so very
> large. I was thinking of trying to tell the read function to skip
> lines that are too far away; the file is much, much bigger than the
> area I need. Thanks for the comments so far.
> ~Elaina

Hmm, 1.5GB would be about 30,000 bytes per line if the 50,000 lines you
mentioned before are correct. What does

    $ wc <bigfile>

say? Can you give the first few lines of <bigfile> here or on
pastebin.com?

I don't have asciitable installed, but a quick look into the code
suggests it consumes a lot more memory than necessary to solve your
problem. If the file format is simple, a viable alternative may be to
extract the interesting columns manually, together with the line index.
Once you have the best matches, you can build the result from <bigfile>
and the indices of the best matches.
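To make the column-extraction idea concrete, here is a minimal sketch. It assumes a whitespace-delimited file with four header lines (matching the `data_start=4` in your asciitable call) and with the two coordinates in the first two columns; the function names, column positions, and the plain Euclidean distance are all assumptions for illustration, not your actual format:

```python
# Sketch: stream <bigfile> once, keeping only the two coordinate
# columns plus the line index, instead of loading 1.5GB at once.

def extract_coords(path, xcol=0, ycol=1, skip=4):
    """Yield (line_index, x, y) for each parseable data line."""
    with open(path) as f:
        for i, line in enumerate(f):
            if i < skip:               # assumed header length (data_start=4)
                continue
            fields = line.split()
            try:
                x = float(fields[xcol])
                y = float(fields[ycol])
            except (IndexError, ValueError):
                continue               # skip malformed lines / fill values
            yield i, x, y

def best_match(path, x0, y0):
    """Return (line_index, squared_distance) of the row closest to (x0, y0)."""
    best = None
    for i, x, y in extract_coords(path):
        d2 = (x - x0) ** 2 + (y - y0) ** 2
        if best is None or d2 < best[1]:
            best = (i, d2)
    return best
```

With the winning line indices in hand, a second pass over <bigfile> (or Python's `linecache` module for a handful of lines) can pull out the full matching rows.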
Alternatively, you can split <bigfile> into a few parts, calculate the
best matches for each part, and finally calculate the best matches of
the partial best matches combined.

_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
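The split-and-combine idea above can be sketched as follows. This is a toy illustration, not the list's actual code: it takes any iterable of (index, x, y) rows, finds the best match within each fixed-size chunk, and then takes the best of the partial winners; the names and the squared-Euclidean distance are assumptions:

```python
from itertools import islice

def best_in_chunk(rows, x0, y0):
    """rows: iterable of (index, x, y); return (d2, index) for the closest row."""
    best = None
    for i, x, y in rows:
        d2 = (x - x0) ** 2 + (y - y0) ** 2
        if best is None or d2 < best[0]:
            best = (d2, i)
    return best

def best_match_chunked(rows, x0, y0, chunksize=100000):
    """Best match over all rows, computed one chunk at a time."""
    rows = iter(rows)
    partials = []
    while True:
        chunk = list(islice(rows, chunksize))   # next part of the big file
        if not chunk:
            break
        partials.append(best_in_chunk(chunk, x0, y0))
    # the best of the partial best matches is the global best match
    return min(partials)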
