Hi,
my users are starting to take comparability seriously and have begun downsampling.
But the random_lines_two_pass.py tool seems to be very slow with large
input files, e.g. BED files downsampled from about 40 million reads to 33 million reads:

https://bitbucket.org/galaxy/galaxy-dist/src/2469c53051ea/tools/filters/random_lines_two_pass.py?at=default

I don't understand the rationale behind deleting the chosen positions from the
array: in most programming languages, deleting from the middle of an array is an
O(n) operation, so doing it once per sampled line gets expensive.
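
To illustrate what I mean, here is a rough sketch (my own code, not the tool's):
deleting each chosen position from a Python list shifts every later element,
whereas a partial Fisher-Yates style swap picks the same random positions
without any shifting.

import random

# Sketch only: removing each chosen position from a list shifts all later
# elements, so picking k lines this way costs roughly O(n * k).
def pick_by_deletion(line_count, k):
    positions = list(range(line_count))
    chosen = []
    for _ in range(k):
        idx = random.randrange(len(positions))
        chosen.append(positions[idx])
        del positions[idx]  # O(current length) shift on every draw
    return chosen

# A partial Fisher-Yates shuffle avoids the shifts: swap the chosen element
# into the shrinking tail instead of deleting it, giving O(n + k) overall.
def pick_by_partial_shuffle(line_count, k):
    positions = list(range(line_count))
    last = line_count - 1
    chosen = []
    for _ in range(k):
        idx = random.randrange(last + 1)
        chosen.append(positions[idx])
        positions[idx], positions[last] = positions[last], positions[idx]
        last -= 1
    return chosen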

Properly benchmarking the two random sampling methods was too difficult for me,
so I simply removed the get_random_by_subtraction method,
and my users are happy.
Has anybody actually benchmarked this?
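
For what it's worth, this is the kind of rough timing I had in mind;
pick_by_deletion below is a stand-in of my own, not the function from the
script, and random.sample is just what I would expect a fast path to use.

import random
import timeit

def pick_by_deletion(line_count, k):
    # stand-in for a deletion-based selection; not the tool's actual code
    positions = list(range(line_count))
    return [positions.pop(random.randrange(len(positions))) for _ in range(k)]

# scaled down from the 40M -> 33M case; the deletion version is already
# noticeably slower at this size
n, k = 100_000, 80_000
t_del = timeit.timeit(lambda: pick_by_deletion(n, k), number=1)
t_sample = timeit.timeit(lambda: sorted(random.sample(range(n), k)), number=1)
print("deletion-based: %.2f s, random.sample: %.2f s" % (t_del, t_sample))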

thank you very much,
ido


