Re: GSoC - 6th coding week

2014-06-27 Thread Ketil Malde
> what is the typical usage pattern of that list?
For me, it's almost always streaming through it, building up some other data structure as I run along.
> if (memory) space is very tight, then the result of the parse
> could just be a list of file offsets, to be used later,
> when elements are a

Re: GSoC - 6th coding week

2014-06-27 Thread Christian Höner zu Siederdissen
Hi, for mmap-ing, there are xyz-mmap libraries on Hackage. In general, for libraries accessing this kind of data (huge bio databases), we want both efficient streaming and efficient loading of the whole dataset.
Regards, Christian
* Johannes Waldmann [27.06.2014 15:58]:
> Hi,
>
> >> what is me

Re: GSoC - 6th coding week

2014-06-27 Thread Johannes Waldmann
Hi,
>> what is meant by "the parsing is lazy" exactly?
> I don't know, did I use that term?
Yes, in the docs http://hackage.haskell.org/package/blastxml-0.3.1/docs/Bio-BlastXML.html
>> You want a BlastResult with a lazy list of results
>> (containing BlastRecords with a lazy list of hits, etc)?

Re: GSoC - 6th coding week

2014-06-27 Thread Ketil Malde
> (I'm interested in XML processing as well -
> also large files, though not for bio stuff)
> can you show a test case (actual source code,
> XML input data, and your performance measurements)?
Probably - the data file I used is a bit large (eight gigs), so probably not ideal to ship around as a

Re: GSoC - 6th coding week

2014-06-27 Thread Johannes Waldmann
On 06/27/2014 12:31 PM, Ketil Malde wrote:
> performance of the blastxml library,
> which parses Blast XML output files.
(I'm interested in XML processing as well - also large files, though not for bio stuff) can you show a test case (actual source code, XML input data, and your performance mea

Re: GSoC - 6th coding week

2014-06-27 Thread Ketil Malde
One of the issues we've discussed briefly was the performance of the blastxml library, which parses Blast XML output files. These files are typically large, and there is quite a bit of overhead in parsing them, which poses a challenge for transalign. It uses Neil Mitchell's 'tagsoup' library

GSoC - 6th coding week

2014-06-23 Thread Sarah Berkemer
Hello! Here is my blog post for the 6th coding week. This is what I did last week: http://biohaskell.org/GSoC_blog/Week_5 And this is my plan for this week: http://biohaskell.org/GSoC_blog/Weeks_6and7 Over the last weeks I changed a lot in the code to decrease the time and space consumption. The a