I am glad =)
On Apr 8, 2013 12:44 PM, "Julio Trevisan" <juliotrevi...@gmail.com> wrote:
> Hey Anthony
>
> Thanks a lot for this. Your method with map() runs around 30,000 times
> faster!
>
>
> BEFORE:
> (database)DEBUG:BOVESPA.VISTA.PETR4: took 0.096931 seconds to do
> everything else
> (database)DEBUG:BOVESPA.VISTA.PETR4: took 0.780372 seconds to ZIP
>
>
> AFTER:
> (database)DEBUG:BOVESPA.VISTA.PETR4: took 0.073058 seconds to do
> everything else
> (database)DEBUG:BOVESPA.VISTA.PETR4: took 0.000024 seconds to ZIP
>
>
>
>
>
> On Fri, Mar 22, 2013 at 12:35 PM, Anthony Scopatz <scop...@gmail.com> wrote:
>
>> On Fri, Mar 22, 2013 at 7:11 AM, Julio Trevisan
>> <juliotrevi...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I just joined this list, I am using PyTables for my project and it works
>>> great and fast.
>>>
>>> I am just trying to optimize some parts of the program and I noticed
>>> that zipping the tuples to get one tuple per column takes much longer than
>>> reading the data itself. The thing is that readWhere() returns one tuple
>>> per row, whereas I need one tuple per column, so I have to use the zip()
>>> function to achieve this. Is there a way to skip this zip() operation?
>>> Please see below:
>>>
>>>
>>> def quote_GetData(self, period, name, dt1, dt2):
>>>     """Returns timedata.Quotes object.
>>>
>>>     Arguments:
>>>     period -- value from within infogetter.QuotePeriod
>>>     name -- quote symbol
>>>     dt1, dt2 -- datetime.datetime or timestamp values
>>>
>>>     """
>>>     t = time.time()
>>>     node = self.quote_GetNode(period, name)
>>>     ts1 = misc.datetime2timestamp(dt1)
>>>     ts2 = misc.datetime2timestamp(dt2)
>>>
>>>     L = node.readWhere(
>>>         "(timestamp/1000 >= %f) & (timestamp/1000 <= %f)" %
>>>         (ts1/1000, ts2/1000))
>>>     rowNum = len(L)
>>>     Q = timedata.Quotes()
>>>     print "%s: took %f seconds to do everything else" % \
>>>         (name, time.time() - t)
>>>
>>>     t = time.time()
>>>     if rowNum > 0:
>>>         (Q.timestamp, Q.open, Q.close, Q.high, Q.low,
>>>          Q.volume, Q.numTrades) = zip(*L)
>>>     print "%s: took %f seconds to ZIP" % (name, time.time() - t)
>>>     return Q
>>>
>>> *And the printout:*
>>> BOVESPA.VISTA.PETR4: took 0.068788 seconds to do everything else
>>> BOVESPA.VISTA.PETR4: took 0.379910 seconds to ZIP
>>>
>>
>> Hi Julio,
>>
>> The problem here isn't zip (packing and un-packing are generally
>> fast operations -- they happen *all* the time in Python). Nor is the
>> problem specifically with PyTables. Rather, this is an issue with how you
>> are using numpy structured arrays (look them up). Basically, this is slow
>> because you are creating a list of column tuples where every element is a
>> Python object of the corresponding type. For example, upcasting every
>> 32-bit integer to a Python int is very expensive!
>>
>> What you *should* be doing is keeping the columns as numpy arrays, which
>> keeps the memory layout compact, contiguous, and fast, and if done right
>> does not require a copy (which you are doing now).
>>
>> The value of L here is a structured array. So say I have some
>> other structured array i with 4 fields; the right way to do this is to
>> pull out each field individually by indexing:
>>
>> a, b, c, d = i['a'], i['b'], i['c'], i['d']
>>
>> or more generally (for all fields):
>>
>> a, b, c, d = map(lambda x: i[x], i.dtype.names)
>>
>> or for some list of fields:
>>
>> a, c, b = map(lambda x: i[x], ['a', 'c', 'b'])
>>
>> Timing both your original method and the new one gives:
>>
>> In [47]: timeit a, b, c, d = zip(*i)
>> 1000 loops, best of 3: 1.3 ms per loop
>>
>> In [48]: timeit a, b, c, d = map(lambda x: i[x], i.dtype.names)
>> 100000 loops, best of 3: 2.3 µs per loop
>>
>> So the method I propose is roughly 500-1000x faster. Using numpy
>> idiomatically is very important!
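[Editor's note: applied back to quote_GetData(), the zip(*L) line becomes a set of field lookups. A hedged sketch with a made-up structured array standing in for the readWhere() result; the dtype fields mirror the Quotes attributes used above, but all values are invented:]

```python
import numpy as np

# Fake readWhere() result for illustration; field order matches the
# unpacking order used in quote_GetData().
L = np.array(
    [(1000, 10.0, 10.5, 11.0, 9.5, 500, 7),
     (2000, 10.5, 10.7, 10.9, 10.2, 600, 9)],
    dtype=[('timestamp', 'i8'), ('open', 'f8'), ('close', 'f8'),
           ('high', 'f8'), ('low', 'f8'), ('volume', 'i8'),
           ('numTrades', 'i4')])

# Instead of zip(*L): one array view per column, no per-element upcasting.
timestamp, open_, close, high, low, volume, numTrades = \
    (L[name] for name in L.dtype.names)

print(timestamp.tolist())  # [1000, 2000]
print(volume.tolist())     # [500, 600]
```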
>>
>> Be Well
>> Anthony
>>
>>
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Everyone hates slow websites. So do we.
>>> Make your web apps faster with AppDynamics
>>> Download AppDynamics Lite for free today:
>>> http://p.sf.net/sfu/appdyn_d2d_mar
>>> _______________________________________________
>>> Pytables-users mailing list
>>> Pytables-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>>
>>>
>>
>>
>
>
> ------------------------------------------------------------------------------
> Minimize network downtime and maximize team effectiveness.
> Reduce network management and security costs. Learn how to hire
> the most talented Cisco Certified professionals. Visit the
> Employer Resources Portal
> http://www.cisco.com/web/learning/employer_resources/index.html
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>