On Mon, May 14, 2012 at 3:05 PM, Francesc Alted <fal...@pytables.org> wrote:
[snip]
> However, do not expect to use all your cores at full speed in these cases,
> as the reductions in numexpr can only make use of one thread (this is
> because this has not been implemented yet, not due to an intrinsic
> limitation of numexpr).
>
Hello Francesc,
Not to sidetrack the discussion too much, but is there a ticket open for
this in numexpr? It seems that at least for certain reductions (sum, prod,
etc.), splitting the work up over many cores would be pretty easy. I may be
wrong about this though ;)
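
To make the idea concrete, here is a minimal sketch of what I mean by
splitting up a sum reduction: each worker reduces its own chunk, and the
partial sums are added at the end. This is plain Python, not how numexpr
works internally, and the names partial_sum_sq and chunked_sum_sq are made
up for this example. With pure-Python workers the GIL prevents any real
speedup; the point is only that the decomposition gives the same answer.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_sq(a, b, start, stop):
    """Sum of (a[i] - b[i])**2 over the chunk [start, stop)."""
    return sum((a[i] - b[i]) ** 2 for i in range(start, stop))


def chunked_sum_sq(a, b, nworkers=4):
    """Split the reduction into about nworkers chunks and add the partial sums."""
    n = len(a)
    step = max(1, -(-n // nworkers))  # ceiling division, at least 1
    bounds = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        # Each task reduces one chunk; results come back in chunk order.
        parts = pool.map(lambda se: partial_sum_sq(a, b, *se), bounds)
        return sum(parts)


# Hypothetical data standing in for the two table columns.
a = [float(i) for i in range(1000)]
b = [float(i) + 0.5 for i in range(1000)]
total = chunked_sum_sq(a, b)
```

Because addition is associative (up to floating-point rounding), the order
in which the partial sums are combined does not matter, which is why sum and
prod look like the easy cases to me.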
Be Well
Anthony
>
> Francesc
>
>
>
>
> I hope this helps. If you need other tips on speeding up the
> sum operation, please let us know.
>
> Be Well
> Anthony
>
> Timer unit: 1e-06 s
>
> File: pytables_expr_test.py
> Function: fn at line 66
> Total time: 1.63254 s
>
> Line #      Hits         Time    Per Hit   % Time  Line Contents
> ================================================================
>     66                                             def fn(p, h5table):
>     67                                                 '''
>     68                                                 actual function we are going to minimize. It consists of
>     69                                                 the pytables Table object and a list of parameters.
>     70                                                 '''
>     71         1           14       14.0      0.0      uv = h5table.colinstances
>     72
>     73                                                 # store parameters in a dict object with names
>     74                                                 # like p0, p1, p2, etc. so they can be used in
>     75                                                 # the Expr object.
>     76         4           21        5.2      0.0      for i in xrange(len(p)):
>     77         3           19        6.3      0.0          k = 'p'+str(i)
>     78         3           14        4.7      0.0          uv[k] = p[i]
>     79
>     80                                                 # systematic shift on b is a polynomial in a
>     81         1            4        4.0      0.0      db = 'p0 * a*a + p1 * a + p2'
>     82
>     83                                                 # the element-wise function
>     84         1            6        6.0      0.0      fn_str = '(a - (b + %s))**2' % db
>     85
>     86         1        16427    16427.0      1.0      expr = Expr(fn_str,uservars=uv)
>     87         1        21438    21438.0      1.3      expr.eval()
>     88
>     89                                                 # returning the "sum of squares"
>     90         1      1594600  1594600.0     97.7      return sum(expr)
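
In the profile, almost all of the time (97.7%) goes to the final sum(expr)
line, while Expr() and expr.eval() together account for about 2%. The
builtin sum() walks the result one element at a time in the interpreter. A
minimal sketch of the difference, assuming NumPy is available and using
synthetic arrays in place of the table columns (the array size and seed are
arbitrary choices for illustration):

```python
import numpy as np

# Synthetic stand-ins for the table columns 'a' and 'b' (assumption: the
# real values would come from the Table, e.g. through Expr.eval()).
rng = np.random.default_rng(0)
a = rng.uniform(0, 10, size=10_000)
b = a + rng.normal(0, 0.1, size=a.size)
p0, p1, p2 = 0.0, 0.0, 0.0

# Same element-wise expression as fn_str in fn().
resid_sq = (a - (b + (p0 * a * a + p1 * a + p2))) ** 2

# Row-at-a-time reduction: the interpreter handles every element.
slow_total = sum(resid_sq)

# Vectorized reduction: a single call, the loop runs in compiled code.
fast_total = resid_sq.sum()
```

Both totals agree up to rounding. In fn() the analogous change would
presumably be to reduce the array returned by expr.eval() with its .sum()
method instead of calling sum(expr); that is untested against this file.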
>
>
>
>
> On Mon, May 14, 2012 at 1:59 PM, Johann Goetz <jgo...@ucla.edu> wrote:
>
>> SHORT VERSION:
>>
>> Please take a look at the fn() function in the attached file (pasted
>> below). When I run this with 10M events or more I notice that the total CPU
>> usage never goes above the percentage I get using single-threaded eval().
>> Am I at some other limit or can I improve performance by doing something
>> else?
>>
>> LONG VERSION:
>>
>> I have been trying to use the tables.Expr object to speed up a
>> sophisticated calculation over an entire dataset (a pytables Table object).
>> The calculation took so long that I had to create a simple example to make
>> sure I knew what I was doing. I apologize in advance for the lengthy code
>> below, but I wanted the example to mimic exactly what I'm trying to do and
>> to be totally self-contained.
>>
>> I have attached a file (and pasted it below) in which I create a hdf5
>> file with a single large Table of two columns. As you can see, I'm not
>> worried about writing speed at all - I'm concerned about read speed.
>>
>> I would like to draw your attention to the fn() function. This is where I
>> evaluate a "chi-squared" value on the dataset. My strategy is to populate
>> the "h5table.colinstances" dict object with several parameters which I call
>> p0, p1, etc and then create the Expr object using these and the column
>> names from the Table.
>>
>> If I create 10M rows (77 MB file) in the Table (with the command below),
>> the evaluation seems to be CPU bound (one of my cores is at 100% - the
>> others are idle) and it takes about 7 seconds (about 10 MB/s). Similarly, I
>> get about 70 seconds for 100M events.
>>
>> python pytables_expr_test.py 10000000
>> python pytables_expr_test.py 100000000
>>
>> So my question: It seems to me that I am not fully using the CPU power
>> available on my computer (see next paragraph). Am I missing something or
>> doing something wrong in the fn() function below?
>>
>> A few side notes: My hard disk is capable of over 200 MB/s in sequential
>> reading (sustained and tested with large files using the iozone program). I
>> have two 4-core CPUs on this machine, but the total CPU usage during eval()
>> never goes above the percentage I get using single-threaded mode with
>> "numexpr.set_num_threads(1)".
>>
>> I am using pytables 2.3.1 and numexpr 2.0.1
>>
>> --
>> Johann T. Goetz, PhD. <http://sites.google.com/site/theodoregoetz/>
>> jgo...@ucla.edu
>> Nefkens Group, UCLA Dept. of Physics & Astronomy
>> Hall-B, Jefferson Lab, Newport News, VA
>>
>>
>> ### BEGIN file: pytables_expr_test.py
>>
>> from tables import openFile, Expr
>>
>> ### Control of the number of threads used when issuing the
>> ### Expr::eval() command
>> #import numexpr
>> #numexpr.set_num_threads(2)
>>
>> def create_ntuple_file(filename, npoints, pmodel):
>>     '''
>>     create an hdf5 file with a single table which contains
>>     npoints number of rows of type row_t (defined below)
>>     '''
>>     from numpy import random, poly1d
>>     from tables import IsDescription, Float32Col
>>
>>     class row_t(IsDescription):
>>         '''
>>         the rows of the table to be created
>>         '''
>>         a = Float32Col()
>>         b = Float32Col()
>>
>>     def append_row(h5row, pmodel):
>>         '''
>>         consider this a single "event" being appended
>>         to the dataset (table)
>>         '''
>>         h5row['a'] = random.uniform(0,10)
>>
>>         h5row['b'] = h5row['a'] # reality (or model)
>>         h5row['b'] = h5row['b'] - poly1d(pmodel)(h5row['a']) # systematics
>>         h5row['b'] = h5row['b'] + random.normal(0,0.1) # noise
>>
>>         h5row.append()
>>
>>     h5file = openFile(filename, 'w')
>>     h5table = h5file.createTable('/', 'table', row_t, "Data")
>>     h5row = h5table.row
>>
>>     # recording data to file...
>>     for n in xrange(npoints):
>>         append_row(h5row, pmodel)
>>
>>     h5file.close()
>>
>> def create_ntuple_file_if_needed(filename, npoints, pmodel):
>>     '''
>>     looks to see if the file is already there and if so,
>>     it makes sure its the right size. Otherwise, it
>>     removes the existing file and creates a new one.
>>     '''
>>     from os import path, remove
>>
>>     print 'model parameters:', pmodel
>>
>>     if path.exists(filename):
>>         h5file = openFile(filename, 'r')
>>         h5table = h5file.root.table
>>         if len(h5table) != npoints:
>>             h5file.close()
>>             remove(filename)
>>
>>     if not path.exists(filename):
>>         create_ntuple_file(filename, npoints, pmodel)
>>
>> def fn(p, h5table):
>>     '''
>>     actual function we are going to minimize. It consists of
>>     the pytables Table object and a list of parameters.
>>     '''
>>     uv = h5table.colinstances
>>
>>     # store parameters in a dict object with names
>>     # like p0, p1, p2, etc. so they can be used in
>>     # the Expr object.
>>     for i in xrange(len(p)):
>>         k = 'p'+str(i)
>>         uv[k] = p[i]
>>
>>     # systematic shift on b is a polynomial in a
>>     db = 'p0 * a*a + p1 * a + p2'
>>
>>     # the element-wise function
>>     fn_str = '(a - (b + %s))**2' % db
>>
>>     expr = Expr(fn_str,uservars=uv)
>>     expr.eval()
>>
>>     # returning the "sum of squares"
>>     return sum(expr)
>>
>> if __name__ == '__main__':
>>     '''
>>     usage:
>>         python pytables_expr_test.py [npoints]
>>
>>     Hint: try this with 10M points
>>     '''
>>     from sys import argv
>>     from time import time
>>
>>     npoints = 1000000
>>     if len(argv) > 1:
>>         npoints = int(argv[1])
>>
>>     filename = 'tmp.'+str(npoints)+'.hdf5'
>>
>>     pmodel = [-0.04,0.002,0.001]
>>
>>     print 'creating file (if it doesn\'t exist)...'
>>     create_ntuple_file_if_needed(filename, npoints, pmodel)
>>
>>     h5file = openFile(filename, 'r')
>>     h5table = h5file.root.table
>>
>>     print 'evaluating function'
>>     starttime = time()
>>     print fn([0.,0.,0.], h5table)
>>     print 'evaluated file in',time()-starttime,'seconds.'
>>
>> #EOF
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Pytables-users mailing list
>> Pytables-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>
>>
>
>
> --
> Francesc Alted
>
>
>