Re: [Pytables-users] Reading single column from table

Julio Trevisan Mon, 08 Apr 2013 10:44:47 -0700

Hey Anthony

Thanks a lot for this. Your method with map() works around 30000 times
faster!



BEFORE:
(database)DEBUG:BOVESPA.VISTA.PETR4: took 0.096931 seconds to do everything else
(database)DEBUG:BOVESPA.VISTA.PETR4: took 0.780372 seconds to ZIP


AFTER:
(database)DEBUG:BOVESPA.VISTA.PETR4: took 0.073058 seconds to do everything else
(database)DEBUG:BOVESPA.VISTA.PETR4: took 0.000024 seconds to ZIP
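
For anyone finding this thread later, here is a minimal sketch of the
column-wise approach, written for Python 3 and plain NumPy so it runs without
a PyTables file. The Quotes class, the fill_quotes helper, the dtypes, and the
sample rows are all hypothetical stand-ins, not the actual project code. Only
the field names come from the function quoted below.

```python
import numpy as np

# Hypothetical stand-in for the structured array returned by the table query.
# Field names follow the Quotes attributes in the quoted function; the dtypes
# and values are assumptions made up for this example.
quote_dtype = np.dtype([
    ("timestamp", "f8"), ("open", "f8"), ("close", "f8"),
    ("high", "f8"), ("low", "f8"), ("volume", "i8"), ("numTrades", "i4"),
])
rows = np.array(
    [(1.0, 10.0, 10.5, 10.8, 9.9, 1000, 12),
     (2.0, 10.5, 10.2, 10.6, 10.1, 800, 9)],
    dtype=quote_dtype,
)


class Quotes(object):
    """Hypothetical placeholder for timedata.Quotes."""


def fill_quotes(rows):
    """Attach one NumPy column per field to a Quotes object, without zip()."""
    q = Quotes()
    for field in rows.dtype.names:
        # rows[field] is a NumPy array holding that column for every row
        setattr(q, field, rows[field])
    return q


q = fill_quotes(rows)
print(q.close)
```

One side effect of this form is that an empty result needs no special case:
indexing an empty structured array by field name simply gives empty columns.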





On Fri, Mar 22, 2013 at 12:35 PM, Anthony Scopatz <scop...@gmail.com> wrote:

> On Fri, Mar 22, 2013 at 7:11 AM, Julio Trevisan 
> <juliotrevi...@gmail.com> wrote:
>
>> Hi,
>>
>> I just joined this list, I am using PyTables for my project and it works
>> great and fast.
>>
>> I am just trying to optimize some parts of the program and I noticed that
>> zipping the tuples to get one tuple per column takes much longer than
>> reading the data itself. The thing is that readWhere() returns one tuple
>> per row, whereas I need one tuple per column, so I have to use the zip()
>> function to achieve this. Is there a way to skip this zip() operation?
>> Please see below:
>>
>>
>>     def quote_GetData(self, period, name, dt1, dt2):
>>         """Returns timedata.Quotes object.
>>
>>         Arguments:
>>           period -- value from within infogetter.QuotePeriod
>>           name -- quote symbol
>>           dt1, dt2 -- datetime.datetime or timestamp values
>>
>>         """
>>         t = time.time()
>>         node = self.quote_GetNode(period, name)
>>         ts1 = misc.datetime2timestamp(dt1)
>>         ts2 = misc.datetime2timestamp(dt2)
>>
>>         L = node.readWhere( \
>>                    "(timestamp/1000 >= %f) & (timestamp/1000 <= %f)" % \
>>                    (ts1/1000, ts2/1000))
>>         rowNum = len(L)
>>         Q = timedata.Quotes()
>>         print "%s: took %f seconds to do everything else" % (
>>             name, time.time()-t)
>>
>>         t = time.time()
>>         if rowNum > 0:
>>             (Q.timestamp, Q.open, Q.close, Q.high, Q.low, Q.volume, \
>>              Q.numTrades) = zip(*L)
>>         print "%s: took %f seconds to ZIP" % (name, time.time()-t)
>>         return Q
>>
>> *And the printout:*
>> BOVESPA.VISTA.PETR4: took 0.068788 seconds to do everything else
>> BOVESPA.VISTA.PETR4: took 0.379910 seconds to ZIP
>>
>
> Hi Julio,
>
> The problem here isn't zip (packing and unpacking are generally
> fast operations -- they happen *all* the time in Python). Nor is the
> problem specifically with PyTables. Rather, this is an issue with how you
> are using numpy structured arrays (look them up). Basically, this is slow
> because you are creating a list of column tuples where every element is a
> Python object of the corresponding type. For example, upcasting every
> 32-bit integer to a Python int is very expensive!
>
> What you *should* be doing is keeping the columns as numpy arrays, which
> keeps the memory layout small, contiguous, fast, and if done right does not
> require a copy (which you are doing now).
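
A small sketch of the "no copy" point, assuming a made-up two-field array (the
name arr and its dtype are illustrative only): indexing a structured array by
field name gives a view onto the same buffer, while zip() builds new tuples
with one scalar object per element.

```python
import numpy as np

# Illustrative structured array; the field names and dtypes are assumptions.
arr = np.zeros(5, dtype=[("a", "i4"), ("b", "f8")])
arr["a"] = np.arange(5)

# Field access returns a view that shares memory with arr (no copy).
col = arr["a"]
print(np.shares_memory(col, arr))

# Writing through the view is visible in the original array.
col[0] = 42
print(arr["a"][0])

# zip() walks every record and builds one new tuple per column.
a_tuple, b_tuple = zip(*arr)
print(type(a_tuple), len(a_tuple))
```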
>
> The value of L here is a structured array.  So say I have some
> other structured array i with 4 fields (a, b, c, d); the right way to do
> this is to pull out each field individually by indexing
>
> a, b, c, d = i['a'], i['b'], i['c'], i['d']
>
> or more generally (for all fields):
>
> a, b, c, d = map(lambda x: i[x], i.dtype.names)
>
> or for some list of fields:
>
> a, c, b = map(lambda x: i[x], ['a', 'c', 'b'])
>
> Timing both your original method and the new one gives:
>
> In [47]: timeit a, b, c, d = zip(*i)
> 1000 loops, best of 3: 1.3 ms per loop
>
> In [48]: timeit a, b, c, d = map(lambda x: i[x], i.dtype.names)
> 100000 loops, best of 3: 2.3 µs per loop
>
> So the method I propose is 500x-1000x faster.  Using numpy
> idiomatically is very important!
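
The comparison can be reproduced outside IPython with the standard timeit
module. This is only a sketch: the array length (1000) and the field dtypes
are assumptions, since the size of the array in the session above is not
stated, so the absolute numbers will differ from the quoted ones.

```python
import timeit

import numpy as np

# Assumed test array: 1000 rows, 4 fields. The size and dtypes are guesses.
i = np.zeros(1000, dtype=[("a", "i4"), ("b", "f8"), ("c", "i8"), ("d", "f4")])


def via_zip():
    # One tuple per column, built element by element.
    a, b, c, d = zip(*i)
    return a, b, c, d


def via_fields():
    # One NumPy array per column, obtained by field name.
    a, b, c, d = [i[name] for name in i.dtype.names]
    return a, b, c, d


t_zip = timeit.timeit(via_zip, number=20)
t_fields = timeit.timeit(via_fields, number=20)
print("zip: %.6f s, fields: %.6f s (20 runs each)" % (t_zip, t_fields))
```

The list comprehension does the same job as the map() call above. On Python 3
map() is lazy, but tuple unpacking still consumes it, so either spelling works.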
>
> Be Well
> Anthony
>
>
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Everyone hates slow websites. So do we.
>> Make your web apps faster with AppDynamics
>> Download AppDynamics Lite for free today:
>> http://p.sf.net/sfu/appdyn_d2d_mar
>> _______________________________________________
>> Pytables-users mailing list
>> Pytables-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>
>>
>

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
