On Tue, Jun 16, 2009 at 07:54:44PM -0700, Istvan Albert wrote:
-> On Jun 16, 10:22 am, "C. Titus Brown" <[email protected]> wrote:
-> 
-> > Questions & comments welcome!  Watch the github space for updates and
-> > bugfixes.
-> 
-> One possible issue with this approach is that it always unpacks all
-> fields, even if one has no interest in using them. Especially the
-> attribute columns are less frequently used but have a strong effect on
-> performance.
-> 
-> This can lead to somewhat sluggish performance - most data sources
-> distribute GFF files that happen to store a lot of attributes - but
-> all the user is interested in is separating by strand or operating on
-> intervals (at least this is very common in the type of analyses that I
-> run). The parser will be substantially slower (possibly one or two
-> orders of magnitude) than just splitting manually. A quick test (6
-> attributes, 100K lines) finishes in 12 seconds vs 1 second for a
-> csv.DictReader or 0.5 seconds for a csv.reader. As long as the GFF
-> files are short this is not really a problem, but for larger files it
-> will be noticeable.

OK, I've added a parse_attributes option.  With attribute parsing turned
off, run time drops by roughly 35% (48 seconds rather than 75) for my
1m-row GMAP input file.
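
For readers who want to see the general idea, here is a minimal sketch of
a reader with an optional attribute-parsing step. It is not the actual
pygr code: the function name read_gff, its signature, the column names,
and the GFF3-style "key=value;key=value" attribute syntax are all
assumptions made for illustration.

```python
import csv

# Hypothetical reader, NOT the pygr implementation. Column names follow the
# usual nine GFF columns; the attribute syntax is assumed to be GFF3-style.
GFF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def read_gff(lines, parse_attributes=True):
    """Yield one dict per feature line; attributes are split only on request."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, comments/directives, and rows without 9 columns.
        if len(row) < 9 or row[0].startswith("#"):
            continue
        rec = dict(zip(GFF_COLUMNS, row))
        rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])
        if parse_attributes:
            # The expensive step: turn "ID=g1;Name=foo" into a dict.
            rec["attributes"] = dict(
                field.split("=", 1)
                for field in rec["attributes"].split(";")
                if "=" in field
            )
        yield rec
```

Calling read_gff(handle, parse_attributes=False) leaves the ninth column
as the raw string, which is enough when the analysis only needs strand or
interval coordinates.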

cheers,
--titus
-- 
C. Titus Brown, [email protected]
