Re: [Pytables-users] Performance advice/Blosc

Anthony Scopatz Mon, 12 Mar 2012 19:48:18 -0700

On Mon, Mar 12, 2012 at 1:37 PM, Jimmy Paillet <jimmy.pail...@gmail.com> wrote:


> I did change BLOSC_MAX_TYPESIZE as suggested, and indeed the compression
> ratio of blosc was greatly improved, to the level of lzo more or less...
>
> No apparent side effects...
>
> I still only get one thread when decompressing though.
> Is there a requirement to use the MPI version of the HDF5 libraries for
> blosc to be multithreaded?
>

Hi Jimmy,

There is currently no dependency on MPI.  We are using pthreads only,
as I recall.
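
A minimal sketch for checking how many OS-level threads the process has
around a read. It assumes Linux (it reads /proc/<pid>/status), and the
helper names are made up for illustration; they are not part of PyTables
or Blosc. Since pthreads are ordinary OS threads, any Blosc workers should
show up in this count, though when they get created may depend on the
Blosc version.

```python
import os
import re
from typing import Optional


def parse_thread_count(status_text: str) -> Optional[int]:
    """Return the 'Threads:' field from the text of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)\s*$", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def current_thread_count(pid: Optional[int] = None) -> Optional[int]:
    """OS-level thread count of a process (Linux only); None if unavailable."""
    path = "/proc/%d/status" % (os.getpid() if pid is None else pid)
    try:
        with open(path) as handle:
            return parse_thread_count(handle.read())
    except OSError:
        # Not Linux, or the process is gone: report "unknown" instead of failing.
        return None


if __name__ == "__main__":
    before = current_thread_count()
    # ... run the compressed read here, e.g. a broad query on the table ...
    after = current_thread_count()
    print("threads before/after read:", before, after)
```

If the count stays at 1 during a long decompression, that would at least
confirm the observation independently of top or htop.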

Hope this helps track down your issue.

Be Well
Anthony


>
> Best,
> Jimmy
>
> 2012/2/11 <pytables-users-requ...@lists.sourceforge.net>
>
>>
>> Today's Topics:
>>
>>   1. Performance advice/Blosc (Francesc Alted)
>>   2. Re: Performance advice/Blosc (Francesc Alted)
>>   3. Re: Pytables-users Digest, Vol 69, Issue 2 (Luc Kesters)
>>   4. Re: Pytables-users Digest, Vol 69, Issue 2 (Anthony Scopatz)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 10 Feb 2012 11:40:12 +0100
>> From: Francesc Alted <fal...@pytables.org>
>> Subject: [Pytables-users] Performance advice/Blosc
>> To: pytables-users@lists.sourceforge.net
>> Message-ID: <c0acf694-cd3e-4408-9778-f1abe91b1...@pytables.org>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> Apparently sent from an unsubscribed address.
>>
>> Begin forwarded message:
>>
>> > From: pytables-users-boun...@lists.sourceforge.net
>> > Subject: Auto-discard notification
>> > Date: February 10, 2012 12:14:45 AM GMT+01:00
>> > To: pytables-users-ow...@lists.sourceforge.net
>> >
>> > The attached message has been automatically discarded.
>> > From: Jimmy Paillet <jimmy.pail...@gmail.com>
>> > Subject: Performance advice/Blosc
>> > Date: February 10, 2012 12:14:19 AM GMT+01:00
>> > To: pytables-users@lists.sourceforge.net
>> >
>> >
>> > Hey,
>> >
>> > I'd like to ask some advice about pytables data organization and
>> compression performance....
>> >
>> > My data set is just a big table (500Mrows, 45 columns), the file size
>> is 70GB, compressed with blosc-4... the compression ratio is around 2-3.
>> > Several ultralight indexes.
>> > Python 2.5, pytables 2.3.1 ubuntu 8.04 64 bits, 4core Intel Xeon 12GB
>> RAM.
>> >
>> > The file is on a NAS, which I am linked to with a GbE link.
>> > Performance was not that bad for a max IO bandwidth of 90 MB/s
>> >
>> > To see how it would scale with I/O speed, I set up a 3-SSD RAID 0
>> (sequential read speeds up to 660 MB/s)
>> > I got a bit disappointed. Yes, very selective queries that can use
>> indexes are very much faster on the RAID (up to 6 times).
>> > However, broader queries are almost on par with the speed I got from
>> the NAS system, which seemed weird as it's getting close
>> > to sequential reads. These are the queries I wanted to speed up!
>> >
>> > It seems I can't get past 80-90 MB/s when reading a compressed h5.
>> > It's roughly the same with lzo or blosc (except lzo compressed 2 times
>> more)...
>> > Does that number seem reasonable? Reading from lzo and especially
>> blosc on the web, it looks a bit underwhelming in comparison....
>> > Am I missing something?
>> >
>> > One of my issues I believe is that I can't get more than one
>> decompressing blosc thread, even though I set tables.setBloscMaxThreads(6).
>> > Any ideas of what is happening here?
>> >
>> > On uncompressed files, I can reach the 600MB/s limit when doing reads.
>>  But since I get files that are 2 to 6 times bigger,
>> > I often end up with similar performances. So I wonder how to scale my
>> system.
>> >
>> > Thanks for any input.
>> > J.
>> >
>> >
>> >
>> >
>>
>> -- Francesc Alted
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Fri, 10 Feb 2012 11:55:05 +0100
>> From: Francesc Alted <fal...@pytables.org>
>> Subject: Re: [Pytables-users] Performance advice/Blosc
>> To: pytables-users@lists.sourceforge.net
>> Message-ID: <7599edd4-ac43-46a9-a6be-f7a112979...@pytables.org>
>> Content-Type: text/plain; charset=us-ascii
>>
>> On Feb 10, 2012, at 11:40 AM, Francesc Alted wrote:
>>
>> > Apparently sent from an unsubscribed address.
>> >
>> > Begin forwarded message:
>> >
>> >> From: pytables-users-boun...@lists.sourceforge.net
>> >> Subject: Auto-discard notification
>> >> Date: February 10, 2012 12:14:45 AM GMT+01:00
>> >> To: pytables-users-ow...@lists.sourceforge.net
>> >>
>> >> The attached message has been automatically discarded.
>> >> From: Jimmy Paillet <jimmy.pail...@gmail.com>
>> >> Subject: Performance advice/Blosc
>> >> Date: February 10, 2012 12:14:19 AM GMT+01:00
>> >> To: pytables-users@lists.sourceforge.net
>> >>
>> >>
>> >> Hey,
>> >>
>> >> I'd like to ask some advice about pytables data organization and
>> compression performance....
>> >>
>> >> My data set is just a big table (500Mrows, 45 columns), the file size
>> is 70GB, compressed with blosc-4... the compression ratio is around 2-3.
>> >> Several ultralight indexes.
>> >> Python 2.5, pytables 2.3.1 ubuntu 8.04 64 bits, 4core Intel Xeon 12GB
>> RAM.
>> >>
>> >> The file is on a NAS, which I am linked to with a GbE link.
>> >> Performance was not that bad for a max IO bandwidth of 90 MB/s
>> >>
>> >> To see how it would scale with I/O speed, I set up a 3-SSD RAID 0
>> (sequential read speeds up to 660 MB/s)
>> >> I got a bit disappointed. Yes, very selective queries that can use
>> indexes are very much faster on the RAID (up to 6 times).
>> >> However, broader queries are almost on par with the speed I got from
>> the NAS system, which seemed weird as it's getting close
>> >> to sequential reads. These are the queries I wanted to speed up!
>> >>
>> >> It seems I can't get past 80-90 MB/s when reading a compressed h5.
>> >> It's roughly the same with lzo or blosc (except lzo compressed 2 times
>> more)...
>> >> Does that number seem reasonable? Reading from lzo and especially
>> blosc on the web, it looks a bit underwhelming in comparison....
>> >> Am I missing something?
>>
>> The problem here is probably related to the size of your records,
>> which is likely larger than 256 bytes; for such sizes the shuffle
>> filter becomes inactive (I have found that shuffling such large records
>> tends to carry too high a performance penalty).
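
A rough sketch of what byte shuffling does and why skipping it for wide
records can hurt the compression ratio. This is not Blosc's code: zlib
stands in for the Blosc compressor, the 255-byte default cutoff is an
assumption playing the role of BLOSC_MAX_TYPESIZE, and the 300-byte record
made of 75 slowly changing int32 counters is invented test data.

```python
import struct
import zlib


def shuffle(buf: bytes, typesize: int) -> bytes:
    """Group byte 0 of every item together, then byte 1, and so on."""
    if len(buf) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    return b"".join(buf[i::typesize] for i in range(typesize))


def unshuffle(buf: bytes, typesize: int) -> bytes:
    """Inverse of shuffle(): scatter each byte group back into the items."""
    if len(buf) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    n_items = len(buf) // typesize
    out = bytearray(len(buf))
    for i in range(typesize):
        out[i::typesize] = buf[i * n_items:(i + 1) * n_items]
    return bytes(out)


def compress(buf: bytes, typesize: int, max_typesize: int = 255) -> bytes:
    """Shuffle only if typesize <= max_typesize (assumed cutoff), then deflate."""
    if typesize <= max_typesize:
        buf = shuffle(buf, typesize)
    return zlib.compress(buf)


# Invented data: 2000 records, each holding 75 little-endian int32 counters.
N_FIELDS = 75
record = struct.Struct("<%di" % N_FIELDS)  # 300 bytes per record
rows = b"".join(
    record.pack(*[i + 1000 * j for j in range(N_FIELDS)]) for i in range(2000)
)

without_shuffle = compress(rows, record.size)                  # cutoff hit
with_shuffle = compress(rows, record.size, max_typesize=512)   # cutoff raised

print("raw bytes:       ", len(rows))
print("shuffle skipped: ", len(without_shuffle))
print("shuffle applied: ", len(with_shuffle))
```

How large the gap is on real data depends entirely on the columns, so
treat the numbers this prints as an illustration only.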
>>
>> In case you want to experiment by yourself, you can raise the value of
>> BLOSC_MAX_TYPESIZE in:
>>
>> https://github.com/PyTables/PyTables/blob/master/blosc/blosc.h#L42
>>
>> and recompile.  I'm guessing here, but I don't think this would cause
>> other problems.  At any rate, if you do so, I'm curious about the results,
>> so please post them here.
>>
>> The best solution would be to implement column-wise tables (the current
>> ones in PyTables are row-wise).
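
To make the row-wise versus column-wise distinction concrete, here is a
small sketch using only the struct module. The two-column record (an int32
id and a float64 value) and the function names are hypothetical; this says
nothing about how PyTables lays out tables internally, it only shows why a
scan over one column touches fewer bytes in a column-wise layout.

```python
import struct

# Hypothetical record: int32 id followed by float64 value, no padding.
ROW = struct.Struct("<id")


def to_rows(ids, values) -> bytes:
    """Row-wise layout: the fields of each record are stored together."""
    return b"".join(ROW.pack(i, v) for i, v in zip(ids, values))


def to_columns(ids, values) -> dict:
    """Column-wise layout: each field is stored as its own contiguous buffer."""
    return {
        "id": struct.pack("<%di" % len(ids), *ids),
        "value": struct.pack("<%dd" % len(values), *values),
    }


def values_from_rows(buf: bytes) -> list:
    """Reading one column from rows still walks every whole record."""
    return [value for _id, value in ROW.iter_unpack(buf)]


def values_from_columns(columns: dict) -> list:
    """Reading one column from the column layout touches only that buffer."""
    buf = columns["value"]
    return list(struct.unpack("<%dd" % (len(buf) // 8), buf))


ids = list(range(1000))
values = [i * 0.5 for i in ids]
rows = to_rows(ids, values)
columns = to_columns(ids, values)

print("bytes scanned for 'value', row-wise:   ", len(rows))
print("bytes scanned for 'value', column-wise:", len(columns["value"]))
```

Each column buffer also has a small, uniform item size, which is the case
where a shuffle filter is normally left enabled.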
>>
>> >> One of my issues I believe is that I can't get more than one
>> decompressing blosc thread, even though I set tables.setBloscMaxThreads(6).
>> >> Any ideas of what is happening here?
>>
>> Hmm, that should work.  Why are you saying that this does not work?
>>
>> >>
>> >> On uncompressed files, I can reach the 600MB/s limit when doing reads.
>>  But since I get files that are 2 to 6 times bigger,
>> >> I often end up with similar performances. So I wonder how to scale my
>> system.
>> >>
>> >> Thanks for any input.
>> >> J.
>>
>> -- Francesc Alted
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
>
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>
