I did change BLOSC_MAX_TYPESIZE as suggested, and indeed the compression
ratio of blosc improved greatly, more or less to the level of lzo...
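
For the record, the change was just the one #define in blosc/blosc.h
(the new value, 1024, is an arbitrary pick on my part; any value above
my record size would do):

```c
/* blosc/blosc.h -- raise the maximum typesize for which the shuffle
   filter is applied (the shipped default is 256 bytes). */
#define BLOSC_MAX_TYPESIZE 1024  /* was 256 */
```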

No apparent side effects...

I still only get one thread when decompressing, though.
Is there a requirement to use the MPI version of the HDF5 libraries for
blosc to be multithreaded?
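
A quick back-of-the-envelope check with the numbers from my first mail
(85 MB/s observed, ~2.5x compression ratio -- both rough figures)
suggests why a single decompression thread would cap things even on the
fast RAID:

```python
# Rough bottleneck estimate: if reads of compressed data top out at
# ~85 MB/s and the compression ratio is ~2.5, the decompressor must
# emit ~212 MB/s of uncompressed data.  That is a plausible ceiling
# for one decompression thread, and it would explain why a 660 MB/s
# disk array doesn't help broad queries.
compressed_mb_s = 85          # observed read rate on compressed files
ratio = 2.5                   # approximate blosc compression ratio
uncompressed_mb_s = compressed_mb_s * ratio
print(uncompressed_mb_s)      # 212.5 MB/s of decompressed output
```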

Best,
Jimmy

2012/2/11 <pytables-users-requ...@lists.sourceforge.net>

>
> Today's Topics:
>
>   1. Performance advice/Blosc (Francesc Alted)
>   2. Re: Performance advice/Blosc (Francesc Alted)
>   3. Re: Pytables-users Digest, Vol 69, Issue 2 (Luc Kesters)
>   4. Re: Pytables-users Digest, Vol 69, Issue 2 (Anthony Scopatz)
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 10 Feb 2012 11:55:05 +0100
> From: Francesc Alted <fal...@pytables.org>
> Subject: Re: [Pytables-users] Performance advice/Blosc
> To: pytables-users@lists.sourceforge.net
> Message-ID: <7599edd4-ac43-46a9-a6be-f7a112979...@pytables.org>
> Content-Type: text/plain; charset=us-ascii
>
> On Feb 10, 2012, at 11:40 AM, Francesc Alted wrote:
>
> > Apparently sent from an unsubscribed address.
> >
> > Begin forwarded message:
> >
> >> From: pytables-users-boun...@lists.sourceforge.net
> >> Subject: Auto-discard notification
> >> Date: February 10, 2012 12:14:45 AM GMT+01:00
> >> To: pytables-users-ow...@lists.sourceforge.net
> >>
> >> The attached message has been automatically discarded.
> >> From: Jimmy Paillet <jimmy.pail...@gmail.com>
> >> Subject: Performance advice/Blosc
> >> Date: February 10, 2012 12:14:19 AM GMT+01:00
> >> To: pytables-users@lists.sourceforge.net
> >>
> >>
> >> Hey,
> >>
> >> I'd like to ask for some advice about PyTables data organization and
> compression performance...
> >>
> >> My data set is just a big table (500M rows, 45 columns); the file size
> is 70 GB, compressed with blosc-4... the compression ratio is around 2-3.
> >> Several ultralight indexes.
> >> Python 2.5, PyTables 2.3.1, Ubuntu 8.04 64-bit, 4-core Intel Xeon, 12 GB
> RAM.
> >>
> >> The file is on a NAS, which I am connected to over a GbE link.
> >> Performance was not bad given a max I/O bandwidth of 90 MB/s.
> >>
> >> To see how it would scale with I/O speed, I set up a 3-SSD RAID 0
> (sequential read speeds up to 660 MB/s).
> >> I was a bit disappointed. Yes, very selective queries that can use
> indexes are much faster on the RAID (up to 6 times).
> >> However, broader queries are almost on par with the speed I got from
> the NAS, which seems weird as they are close
> >> to sequential reads. These are the queries I wanted to speed up!
> >>
> >> It seems I can't get past 80-90 MB/s when reading a compressed h5 file.
> >> It's roughly the same with lzo or blosc (except lzo compresses 2 times
> more)...
> >> Does that number seem reasonable? From what I read about lzo and
> especially blosc on the web, it looks a bit underwhelming in comparison...
> >> Am I missing something?
>
> The problem here is probably related to the size of your records, which
> is likely larger than 256 bytes; for such sizes the shuffle filter becomes
> inactive (this is because I have found that it tends to introduce too much
> of a performance penalty).
>
> In case you want to experiment by yourself, you can raise the value of
> BLOSC_MAX_TYPESIZE in:
>
> https://github.com/PyTables/PyTables/blob/master/blosc/blosc.h#L42
>
> and recompile.  I'm guessing here, but I don't think this should cause
> other problems.  At any rate, if you do try it, I'm curious about the
> results, so please post them here.
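
To make sure I understood what shuffle buys, I wrote a rough pure-Python
sketch of the byte-shuffle transform (not the real Blosc implementation,
which is in C and SIMD-optimized): it groups the i-th byte of every
element together, so the highly repetitive high-order bytes of small,
regular typesizes end up adjacent and compress well.

```python
def byte_shuffle(data, typesize):
    """Transpose a flat byte string of fixed-size elements so that all
    first bytes come first, then all second bytes, and so on."""
    nelems = len(data) // typesize
    return bytes(data[e * typesize + b]
                 for b in range(typesize)   # byte position within element
                 for e in range(nelems))    # element index

# Two little-endian 4-byte ints that share their zero high bytes:
# shuffling clusters the compressible zeros into one long run.
raw = bytes([1, 0, 0, 0, 2, 0, 0, 0])
print(byte_shuffle(raw, 4))   # b'\x01\x02\x00\x00\x00\x00\x00\x00'
```

With a 360-byte record as the "element", each byte position contributes
only one byte per row, so the transform buys much less.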
>
> The best solution would be to implement column-wise tables (the current
> ones in PyTables are row-wise).
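
That matches my numbers: with 45 columns of (I'm assuming) 8 bytes each,
which is roughly my schema, each row is well past the 256-byte limit, so
shuffle would indeed be skipped:

```python
# Row-wise storage: the "typesize" Blosc sees is the full record size.
ncols = 45       # columns in my table
col_bytes = 8    # assuming float64/int64 columns
row_size = ncols * col_bytes
print(row_size)            # 360 bytes per row
print(row_size > 256)      # True: beyond the default BLOSC_MAX_TYPESIZE,
                           # so the shuffle filter is not applied
```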
>
> >> One of my issues I believe is that I can't get more than one
> decompressing blosc thread, even though I set tables.setBloscMaxThreads(6).
> >> Any ideas of what is happening here?
>
> Hmm, that should work.  What makes you say that it does not?
>
> >>
> >> On uncompressed files, I can reach the 600 MB/s limit when doing
> reads.  But since the files are 2 to 6 times bigger,
> >> I often end up with similar performance. So I wonder how to scale my
> system.
> >>
> >> Thanks for any input.
> >> J.
>
> -- Francesc Alted
>
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
