Re: [Pytables-users] pyTable index from c++

Jim Knoll Fri, 09 Nov 2012 13:27:08 -0800

Thanks for taking the time.

Most of our tables are very wide (lots of columns) and simple conditions are 
common, so that is why in-kernel querying makes almost no difference for me.


-----Original Message-----
From: Francesc Alted [mailto:[email protected]] 
Sent: Friday, November 09, 2012 11:27 AM
To: [email protected]
Subject: Re: [Pytables-users] pyTable index from c++

Well, the expected performance of in-kernel (numexpr-powered) queries with 
respect to regular (Python) queries largely depends on where the bottleneck 
is. If your table has a lot of columns, then the bottleneck is going to be 
more on the I/O side (tables are stored row-wise, so a query on a single 
column still reads whole rows from disk), and you cannot expect a large 
difference in performance. However, if your table has a small number of 
columns, then it is more likely that the bottleneck is the CPU, and your 
chances of seeing a difference are higher.

Of course, having complex queries (i.e. queries that take conditions 
over several columns, or just combinations of conditions in the same 
column) makes the query more CPU intensive, and in-kernel normally wins 
by a comfortable margin.
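
As a rough illustration of the two query styles being compared, here is a 
minimal sketch. It assumes the PyTables 3.x snake_case API (open_file, 
create_table; the 2.x releases current at the time of this thread spelled 
these openFile and createTable). The Tick schema, the column names and the 
helper functions are hypothetical, made up for this example. It shows how 
the queries are written, not how fast they run.

```python
import tables as tb


class Tick(tb.IsDescription):
    # Hypothetical two-column schema, for illustration only.
    symbol = tb.StringCol(8)
    price = tb.Float64Col()


def build_table(path, n=10_000):
    """Create a small table in which every tenth row has symbol b"AAA"."""
    h5 = tb.open_file(path, mode="w")
    table = h5.create_table("/", "ticks", Tick)
    row = table.row
    for i in range(n):
        row["symbol"] = b"AAA" if i % 10 == 0 else b"BBB"
        row["price"] = float(i)
        row.append()
    table.flush()
    return h5, table


def regular_query(table, symbol, min_price):
    # Regular query: each row is handed to the Python interpreter,
    # which evaluates the condition one row at a time.
    return [row.nrow for row in table
            if row["symbol"] == symbol and row["price"] > min_price]


def in_kernel_query(table, symbol, min_price):
    # In-kernel query: the condition string is compiled by numexpr and
    # evaluated on blocks of rows; condvars supplies the constants.
    cond = "(symbol == target) & (price > min_price)"
    condvars = {"target": symbol, "min_price": min_price}
    return [row.nrow for row in table.where(cond, condvars=condvars)]
```

Both functions should return the same row numbers for the same arguments. 
Timing each call (for example with time.perf_counter()) on your own tables 
is the way to find out whether the CPU or the I/O is the bottleneck.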

Finally, what indexing does is reduce the number of rows on which the 
conditions have to be evaluated, so depending on the cardinality of the 
query and the associated index, you can get more or less speedup.
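
The effect of an index on the number of rows examined can be sketched with 
a toy model. This is not how PyTables implements its indexes; it is a 
plain-Python stand-in that keeps a sorted copy of one column and uses binary 
search to find the candidate rows. All function names are hypothetical.

```python
from bisect import bisect_left


def build_index(values):
    """Return (sorted_values, row_numbers) for a single column."""
    order = sorted(range(len(values)), key=values.__getitem__)
    return [values[i] for i in order], order


def full_scan(values, lo, hi):
    """No index: the condition lo <= v < hi is evaluated on every row."""
    hits = [i for i, v in enumerate(values) if lo <= v < hi]
    return hits, len(values)


def indexed_scan(index, lo, hi):
    """Toy index: binary search narrows the rows that must be visited."""
    sorted_values, order = index
    start = bisect_left(sorted_values, lo)
    stop = bisect_left(sorted_values, hi)
    hits = sorted(order[start:stop])
    # The second item is the number of rows actually visited.
    return hits, stop - start
```

For example, over a column holding 1000 distinct integers from 0 to 999, a 
selective range such as 100 <= v < 110 makes the indexed scan visit 10 rows 
instead of 1000. A range that matches every row makes both scans visit all 
1000 rows, and the index buys nothing.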

Francesc

On 11/9/12 5:12 PM, Jim Knoll wrote:
>
> Thanks for the reply. I will put some investigation of C++ access on 
> my list of items to look at over the slow holiday season.
>
> For the short term we will store a C++-ready index as a different 
> table object in the same h5 file. It will work, just with a bit of 
> wasted disk space.
>
> One follow-up question:
>
> Why would my performance of
>
>     for row in node.where('stringField == "SomeString"'):
>
> *not* be noticeably faster than
>
>     for row in node:
>         if row['stringField'] == "SomeString":
>
> Specifically when there is no index. I understand and see the speed 
> improvement only when I have an index. I expected to see some benefit 
> from numexpr even with no index, and I expected node.where() to be much 
> faster. What I see is identical performance. Is the numexpr benefit only 
> seen for complex math like (floatField ** intField > otherFloatField)? 
> I did not see that to be the case on my first attempt. It seems that I 
> only benefit from an index.
>
> *From:*Anthony Scopatz [mailto:[email protected]]
> *Sent:* Friday, November 09, 2012 12:24 AM
> *To:* Discussion list for PyTables
> *Subject:* Re: [Pytables-users] pyTable index from c++
>
> On Thu, Nov 8, 2012 at 10:19 PM, Jim Knoll wrote:
>
> I love the index function and promote the internal use of PyTables at 
> my company. The availability of an indexed method to speed up searches 
> is the main reason why.
>
> We are a mixed shop, using C++ to create H5 files (just for the raw 
> speed; we need to keep up with streaming data). End users start with 
> Python and PyTables to consume the data (often after we have created 
> indexes from Python with table.cols.col1.createIndex()).
>
> Sometimes the users come up with something we want to do thousands of 
> times, and performance is critical. Then we fall back to C++. 
> We can use our own index method but would like to make double use of 
> the PyTables index.
>
> I know the Python table.where() is implemented in C.
>
> Hi Jim,
>
> This is only kind of true. Querying (i.e. all of the where*() methods) 
> is actually mostly written in Python in the tables.py and 
> expressions.py files. However, these methods make use of numexpr [1].
>
>     Is there a way to access that from C or C++? I don't mind if I
>     need to do work to get the result; I think in my case the work may
>     be worth it.
>
> *PLAN 1:* One possibility concerns the parts of PyTables that are 
> written in Python. We could maybe try (without making any edits to 
> these files) to convert them to Cython. This has the advantage that for 
> Cython files, if you write the appropriate C++ header file and link 
> against the shared library correctly, it is possible to access certain 
> functions from C/C++. BUT, I am not sure how much of a speed boost you 
> would get out of this, since you would still be calling out to the 
> Python interpreter to get these results. You are just calling Python's 
> virtual machine from C++ rather than calling it from Python (like 
> normal). Still, you would basically get access to these functions 
> acting on tables from C++.
>
> *PLAN 2:* Alternatively, numexpr itself is mostly written in C++ 
> already. You should be able to call core numexpr functions directly. 
> However, you would have to feed it data that you read from the tables 
> yourself. These could even be table indexes. On a personal note, if 
> you get code working that does this, I would be interested in seeing 
> your implementation. (I have another project where I have tables that 
> I want to query from C++)
>
> Let us know what route you ultimately end up taking or if you have any 
> further questions!
>
> Be Well
>
> Anthony
>
> 1. http://code.google.com/p/numexpr/source/browse/#hg%2Fnumexpr
>
>     ------------------------------------------------------------------------
>
>     Jim Knoll
>     Data Developer
>
>     Spot Trading L.L.C
>     440 South LaSalle St., Suite 2800
>     Chicago, IL 60605
>     Office: 312.362.4550
>     Direct: 312-362-4798
>     Fax: 312.362.4551
>     www.spottradingllc.com
>
>     
> ------------------------------------------------------------------------------
>     Everyone hates slow websites. So do we.
>     Make your web apps faster with AppDynamics
>     Download AppDynamics Lite for free today:
>     http://p.sf.net/sfu/appdyn_d2d_nov
>     _______________________________________________
>     Pytables-users mailing list
>     [email protected]
>     <mailto:[email protected]>
>     https://lists.sourceforge.net/lists/listinfo/pytables-users
>


-- 
Francesc Alted

