On Jun 16, 10:22 am, "C. Titus Brown" <[email protected]> wrote:

> Questions & comments welcome!  Watch the github space for updates and
> bugfixes.

One possible issue with this approach is that it always unpacks all
fields, even when the user has no interest in them. The attribute
column in particular is used less often than the others, yet parsing
it has a strong effect on performance.

This can lead to somewhat sluggish performance: most data sources
distribute GFF files that happen to store a lot of attributes, but
often all the user is interested in is separating by strand or
operating on intervals (at least this is very common in the type of
analyses that I run). The parser will be substantially slower
(possibly one or two orders of magnitude) than just splitting
manually. A quick test (6 attributes, 100K lines) finishes in 12
seconds, vs 1 second for a csv.DictReader or 0.5 seconds for a
csv.reader. As long as the GFF files are short this is not really a
problem, but for larger files it will be noticeable.
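
For reference, a comparison along these lines could be set up roughly
as below. This is only a sketch on synthetic data, not the test that
produced the numbers above: the column names, the make_gff generator
and the parse_attributes helper are all made up for illustration, and
the attribute format is assumed to be GFF3-style key=value pairs.

```python
import csv
import io
import timeit

# Hypothetical names for the nine tab-separated GFF columns.
GFF_FIELDS = ["seqid", "source", "type", "start", "end",
              "score", "strand", "phase", "attributes"]


def make_gff(n_lines=1000, n_attrs=6):
    """Build synthetic GFF text with n_attrs key=value attributes per line."""
    attrs = ";".join("key%d=value%d" % (i, i) for i in range(n_attrs))
    row = "\t".join(["chr1", "test", "gene", "100", "200",
                     ".", "+", ".", attrs])
    return "\n".join([row] * n_lines) + "\n"


def parse_attributes(text):
    """Split 'k1=v1;k2=v2' into a dict (illustrative helper only)."""
    return dict(pair.split("=", 1) for pair in text.split(";") if pair)


def read_plain(data):
    """Split columns only; the attribute column stays a raw string."""
    return list(csv.reader(io.StringIO(data), delimiter="\t"))


def read_dict(data):
    """Same as read_plain, but each row is keyed by column name."""
    return list(csv.DictReader(io.StringIO(data),
                               fieldnames=GFF_FIELDS, delimiter="\t"))


def read_eager(data):
    """Split columns and always unpack the attribute column as well."""
    rows = []
    for row in csv.reader(io.StringIO(data), delimiter="\t"):
        row[8] = parse_attributes(row[8])
        rows.append(row)
    return rows


if __name__ == "__main__":
    data = make_gff()
    for func in (read_plain, read_dict, read_eager):
        elapsed = timeit.timeit(lambda: func(data), number=10)
        print(func.__name__, elapsed)
```

The absolute timings will depend on the machine and on the real
parser's per-line work, so only the relative differences are of
interest.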

Thanks for the embedded links and docs in the code. Those are very
useful; I learned some new things about GFF that I did not know
before.

best,

Istvan

Not sure what the right solution is; maybe a flag that needs to be
turned on to get the attribute splitting behavior.
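
To make the flag idea concrete, here is a minimal sketch. None of
these names come from the existing parser: GFFRecord, read_gff and
the split_attributes flag are hypothetical, and the attribute column
is assumed to hold GFF3-style key=value pairs separated by
semicolons. The raw attribute string is kept on the record and is
only split when the flag is set or when the attributes are first
accessed.

```python
class GFFRecord(object):
    """One GFF line; the attribute column is split only on demand."""

    def __init__(self, fields):
        self.seqid, self.source, self.type = fields[0:3]
        self.start, self.end = int(fields[3]), int(fields[4])
        self.score, self.strand, self.phase = fields[5:8]
        # Keep the raw string; splitting it is the expensive part.
        self._raw_attributes = fields[8] if len(fields) > 8 else ""
        self._attributes = None

    @property
    def attributes(self):
        """Parse the attribute column on first access and cache it."""
        if self._attributes is None:
            self._attributes = dict(
                pair.split("=", 1)
                for pair in self._raw_attributes.split(";")
                if "=" in pair
            )
        return self._attributes


def read_gff(lines, split_attributes=False):
    """Yield GFFRecord objects, skipping blank and comment lines.

    With split_attributes=True the attributes are unpacked up front;
    otherwise they are left unparsed until someone asks for them.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        record = GFFRecord(line.rstrip("\n").split("\t"))
        if split_attributes:
            record.attributes  # force the parse now
        yield record
```

With this layout, code that only filters by strand or works on
start/end never pays for the attribute split, while code that needs
the attributes still gets them through the same record interface.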
