Hello, I'm resurrecting this thread in light of the htslib + samtools 1.0 release.

I see in the code of hfile.c that the buffer capacity is capped at 32k. This
is not good for parallel file systems, where the typical block size is 4MB.

Also, is there a way to control block sizes at runtime, or do we need to
compile a "special" version for the cluster? If we do need a special version,
it would be nice to have a central constant to modify to control the block
size.

Thanks
Louis

On 13-08-28 05:40 AM, John Marshall wrote:
> On 26 Aug 2013, at 18:11, Louis Letourneau wrote:
>>    I was having performance issues with mpileup on our GPFS cluster. I
>> traced it back to the way BAMs are processed. Headers (18 bytes) are
>> read first, then the rest of the block. This is not efficient on
>> distributed FS.
>>
>>   I saw that there was a TODO for I/O buffering in the implementation of
>> KNET. In the meantime I forced KNET off by removing the define and
>> forced buffering on fread with setvbuf. In my case, to 4MB.
> 
> As it happens, one of the things I have been doing in the last couple of 
> weeks is implementing I/O buffering for htslib's low-level file access.  This 
> is a new layer above knetfile, collecting all the I/O system calls in one 
> place so we have the opportunity to detect I/O errors robustly, using 
> fstat(2) on local files to determine appropriate buffer sizes as mentioned in 
> that thread from last October, and allowing for format autodetection even on 
> pipes by peeking at the buffer.
> 
> Have a look at the io branch at https://github.com/samtools/htslib if you're 
> interested.  There's an upcoming commit to plug the BAM/SAM etc I/O into it, 
> and then it will be interesting to see what if any performance changes there 
> are on Lustre and other distributed file systems.
> 
> Cheers,
> 
>     John
> 
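
For readers following along, the two techniques mentioned in the quoted
messages (forcing a large stdio buffer with setvbuf, and asking fstat(2) for
a preferred block size) can be sketched roughly as below. This is only an
illustration, not htslib's actual code: the helper names preferred_bufsize
and open_buffered are hypothetical, the 32768-byte fallback is an assumption,
and st_blksize is whatever the file system chooses to report, which may or
may not match the 4MB block size discussed above.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fstat() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: return the I/O block size the file system reports
 * for this stream via fstat(2), or `fallback` if it cannot be determined. */
static size_t preferred_bufsize(FILE *fp, size_t fallback)
{
    struct stat st;

    assert(fp != NULL);
    if (fstat(fileno(fp), &st) == 0 && st.st_blksize > 0)
        return (size_t) st.st_blksize;
    return fallback;
}

/* Hypothetical helper: open `path` for reading as a fully buffered stream.
 * A `size` of 0 means "use whatever fstat(2) suggests"; otherwise `size`
 * bytes are requested (for example 4u << 20 for 4MB). */
static FILE *open_buffered(const char *path, size_t size)
{
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return NULL;
    if (size == 0)
        size = preferred_bufsize(fp, 32768);  /* assumed fallback */

    /* setvbuf() must be called before any read on the stream; passing
     * NULL lets the C library allocate the buffer itself. */
    if (setvbuf(fp, NULL, _IOFBF, size) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

A caller on the cluster might use open_buffered(path, 4u << 20) to request a
4MB buffer, and open_buffered(path, 0) elsewhere to defer to the file system.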

_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
