Hey Anthony,
Thanks a lot for this. With your map() method, the unpacking step that used
zip() is now around 30000 times faster!
BEFORE:
(database)DEBUG:BOVESPA.VISTA.PETR4: took 0.096931 seconds to do everything else
(database)DEBUG:BOVESPA.VISTA.PETR4: took 0.780372 seconds to ZIP
AFTER:
(database)DEBUG:BOVESPA.VISTA.PETR4: took 0.073058 seconds to do everything else
(database)DEBUG:BOVESPA.VISTA.PETR4: took 0.000024 seconds to ZIP
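
For reference, a rough sketch of what the unpacking step could look like with
field access instead of zip(). The Quotes class below is a hypothetical
stand-in for timedata.Quotes, the data is made up, and the syntax is Python 3;
the actual changed code is not shown in this message.

```python
import numpy as np


class Quotes(object):
    """Hypothetical stand-in for timedata.Quotes: one attribute per column."""
    pass


# Made-up structured array with the seven columns unpacked in quote_GetData.
FIELDS = ["timestamp", "open", "close", "high", "low", "volume", "numTrades"]
L = np.zeros(5, dtype=[(name, "f8") for name in FIELDS])
L["timestamp"] = np.arange(5)

Q = Quotes()
if len(L) > 0:
    # One ndarray per column instead of one tuple per column.
    for name in L.dtype.names:
        setattr(Q, name, L[name])
```

Note that with this approach each attribute of Q holds a numpy array rather
than a tuple, so any downstream code that relies on tuple behaviour would need
checking.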
On Fri, Mar 22, 2013 at 12:35 PM, Anthony Scopatz <scop...@gmail.com> wrote:
> On Fri, Mar 22, 2013 at 7:11 AM, Julio Trevisan
> <juliotrevi...@gmail.com> wrote:
>
>> Hi,
>>
>> I just joined this list, I am using PyTables for my project and it works
>> great and fast.
>>
>> I am just trying to optimize some parts of the program and I noticed that
>> zipping the tuples to get one tuple per column takes much longer than
>> reading the data itself. The thing is that readWhere() returns one tuple
>> per row, whereas I need one tuple per column, so I have to use the zip()
>> function to achieve this. Is there a way to skip this zip() operation?
>> Please see below:
>>
>>
>> def quote_GetData(self, period, name, dt1, dt2):
>>     """Returns timedata.Quotes object.
>>
>>     Arguments:
>>       period -- value from within infogetter.QuotePeriod
>>       name -- quote symbol
>>       dt1, dt2 -- datetime.datetime or timestamp values
>>
>>     """
>>     t = time.time()
>>     node = self.quote_GetNode(period, name)
>>     ts1 = misc.datetime2timestamp(dt1)
>>     ts2 = misc.datetime2timestamp(dt2)
>>
>>     L = node.readWhere( \
>>         "(timestamp/1000 >= %f) & (timestamp/1000 <= %f)" % \
>>         (ts1/1000, ts2/1000))
>>     rowNum = len(L)
>>     Q = timedata.Quotes()
>>     print "%s: took %f seconds to do everything else" % (name,
>>         time.time()-t)
>>
>>     t = time.time()
>>     if rowNum > 0:
>>         (Q.timestamp, Q.open, Q.close, Q.high, Q.low, Q.volume, \
>>          Q.numTrades) = zip(*L)
>>     print "%s: took %f seconds to ZIP" % (name, time.time()-t)
>>     return Q
>>
>> *And the printout:*
>> BOVESPA.VISTA.PETR4: took 0.068788 seconds to do everything else
>> BOVESPA.VISTA.PETR4: took 0.379910 seconds to ZIP
>>
>
> Hi Julio,
>
> The problem here isn't zip (packing and un-packing are generally
> fast operations -- they happen *all* the time in Python). Nor is the
> problem specifically with PyTables. Rather this is an issue with how you
> are using numpy structured arrays (look them up). Basically, this is slow
> because you are creating a list of column tuples where every element is a
> Python object of the corresponding type. For example upcasting every
> 32-bit integer to a Python int is very expensive!
>
> What you *should* be doing is keeping the columns as numpy arrays, which
> keeps the memory layout small, contiguous, and fast, and if done right does
> not require a copy (which you are doing now).
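
A minimal sketch of that difference, assuming a small made-up structured
array (the field names and values are illustrative only, and the syntax is
Python 3). It shows that zip(*rows) builds one tuple per column from
individual scalar objects, while indexing by field name returns an ndarray
that still refers to the original data:

```python
import numpy as np

# Hypothetical structured array standing in for the result of readWhere().
rows = np.array(
    [(1, 10.0, 100), (2, 20.0, 200), (3, 30.0, 300)],
    dtype=[("timestamp", "i8"), ("close", "f8"), ("volume", "i4")],
)

# zip(*rows) walks every row and every element, producing tuples.
zipped = list(zip(*rows))
zipped_close = zipped[1]

# Field access returns a single ndarray for the whole column.
close = rows["close"]

# Writing through the field array changes the original array,
# which shows that no copy of the column was made.
close[0] = 99.0
original_changed = bool(rows["close"][0] == 99.0)
```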
>
> The value of L here is a structured array. Say I have some other
> structured array i with 4 fields; the right way to do this is to pull
> out each field individually by indexing:
>
> a, b, c, d = i['a'], i['b'], i['c'], i['d']
>
> or more generally (for all fields):
>
> a, b, c, d = map(lambda x: i[x], i.dtype.names)
>
> or for some list of fields:
>
> a, c, b = map(lambda x: i[x], ['a', 'c', 'b'])
>
> Timing both your original method and the new one gives:
>
> In [47]: timeit a, b, c, d = zip(*i)
> 1000 loops, best of 3: 1.3 ms per loop
>
> In [48]: timeit a, b, c, d = map(lambda x: i[x], i.dtype.names)
> 100000 loops, best of 3: 2.3 µs per loop
>
> So the method I propose is 500x-1000x faster. Using numpy
> idiomatically is very important!
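
A sketch for reproducing this kind of comparison outside IPython, using the
standard timeit module. The array here is made up (1000 rows, four fields
named a to d as in the example above), the helper names by_zip and by_field
are hypothetical, and a list comprehension is used in place of map() with a
lambda; it is written for Python 3. Absolute timings will vary by machine and
numpy version.

```python
import timeit

import numpy as np

# Hypothetical 4-field structured array with 1000 rows of zeros.
i = np.zeros(1000, dtype=[("a", "i4"), ("b", "f8"), ("c", "f8"), ("d", "i4")])


def by_zip():
    # One tuple per column, built element by element.
    return tuple(zip(*i))


def by_field():
    # One ndarray per column, taken directly from the structured array.
    return [i[name] for name in i.dtype.names]


t_zip = timeit.timeit(by_zip, number=20)
t_field = timeit.timeit(by_field, number=20)
print("zip: %.6f s, field access: %.6f s" % (t_zip, t_field))
```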
>
> Be Well
> Anthony
>
>
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Everyone hates slow websites. So do we.
>> Make your web apps faster with AppDynamics
>> Download AppDynamics Lite for free today:
>> http://p.sf.net/sfu/appdyn_d2d_mar
>> _______________________________________________
>> Pytables-users mailing list
>> Pytables-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/pytables-users