Re: [Pytables-users] In-kernal for subset?

2012-08-16 Thread Adam Dershowitz
From: Anthony Scopatz <scop...@gmail.com>
Reply-To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
Date: Wednesday, August 15, 2012 11:29 PM
To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
Subject: Re: [Pytables-users] In-kernal for subset?

On Thu, Aug 16, 2012 at 1:06 AM, Adam Dershowitz 
<adershow...@exponent.com> wrote:
From: Anthony Scopatz <scop...@gmail.com>
Reply-To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
Date: Wednesday, August 15, 2012 2:47 PM
To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
Subject: Re: [Pytables-users] In-kernal for subset?

On Wed, Aug 15, 2012 at 12:33 PM, Adam Dershowitz 
<adershow...@exponent.com> wrote:
I am trying to find all cases where a value transitions above a threshold.  So 
my code first calls getWhereList to find values that are above the threshold, 
then it uses that list to find immediately prior values that are below.  The 
code is working, but the second part, searching through just a smaller subset, 
is much slower (the first search is on the order of 1 second, while the second 
takes a minute).
Is there any way to get this second part of the search in-kernel?  Or any more 
general way to do a search for values above a threshold, where the prior value 
is below?
Essentially, what I am looking for is a way to speed up that second search for 
all rows in a previously defined list, where a condition is applied to the table.

My table is just seconds and values, in chronological order.

Here is the code that I am using now:

import tables as tb

h5data = tb.openFile("AllData.h5", "r")
table1 = h5data.root.table1

# Find all values above threshold:
thelist = table1.getWhereList("Value > 150")

# From the above list find all values where the immediately prior value is below:
transition = []
for i in thelist:
    if (table1[i-1]['Value'] < 150) and (i != 0):
        transition.append(i)

Hey Adam,

Sorry for taking a while to respond.  Assuming you don't mind one of these 
being >= or <=, you don't really need the second loop with a little index 
arithmetic:

import numpy as np
inds = np.array(thelist)
dinds = inds[1:] - inds[:-1]
transition = dinds[(1 < dinds)]

This should get you an array of all of the transition indices since wherever 
the difference in indices is greater than 1 the Value must have dropped below 
the threshold and then returned back up.

Be Well
Anthony



Thanks much for the response.  At first it didn't work, but it gave me the 
right idea, and now I have it working.  There were two problems above.  1)  I 
believe that you had a typo and the last line should have been inds[(1 < … 
and not dinds[(1 < …  Otherwise you just get back the deltas instead of the 
actual index values.

Whoops, serves me right for hacking this out so quickly!

But that still returned an array that wasn't working.  It turns out, after 
some thought, that it was actually offset by one.  Prepending a value 
to dinds (greater than 1, since the first value above the threshold 
must always be a transition or the first table entry) seems to solve the 
problem.  Here is the code that seems to work:

import numpy as np
inds = np.array(thelist)
dinds = np.append([2], inds[1:] - inds[:-1])
trans = inds[(1 < dinds)]
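
For readers following along, here is a minimal sketch that checks the index-difference trick against an explicit loop. It assumes a small in-memory NumPy array in place of the table's Value column; the function names and sample values are hypothetical, and np.flatnonzero stands in for getWhereList.

```python
import numpy as np

def transitions_by_loop(values, threshold):
    """Reference: row i is a transition if it is above the threshold
    and the row before it is not."""
    above = np.flatnonzero(values > threshold)
    return [int(i) for i in above if i != 0 and values[i - 1] <= threshold]

def transitions_by_index_diff(values, threshold):
    """Index arithmetic only: a gap larger than 1 between consecutive
    hits means the value dropped to or below the threshold in between."""
    above = np.flatnonzero(values > threshold)  # stand-in for getWhereList
    if above.size == 0:
        return above
    # Prepend 2 so the first hit always counts as the start of a run.
    dinds = np.append([2], above[1:] - above[:-1])
    return above[dinds > 1]

# Hypothetical sample data, in chronological order.
values = np.array([100, 160, 170, 120, 155, 140, 149, 151, 200, 90])
threshold = 150

trans = transitions_by_index_diff(values, threshold)
loop_trans = transitions_by_loop(values, threshold)
```

One difference to keep in mind: if row 0 is itself above the threshold, the index-difference version reports it, while the loop with the `i != 0` check skips it.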

Now, I am still curious, more for academic reasons since the code now works, 
whether there would be a way to speed up the second loop above.  It seems like 
there are other cases where index arithmetic might not work, so is there a way 
to do an in-kernel search through just a subset of a table?

So the issue is that we rely on numexpr here for our in-kernel queries, and 
numexpr doesn't support indexing at all.  There may be hope for this in the 
future (see numba).  So the goal here is to do whatever you can to avoid 
queries that rely on comparing two different indexes of the same data.
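
Short of an in-kernel query, the second search can probably still be sped up by fetching all of the prior rows in one batched read instead of one read per loop iteration. The sketch below is an illustration under assumptions, not a tested PyTables recipe: a NumPy array stands in for the Value column so it runs without an HDF5 file, and the variable names are hypothetical. On a real table, the fancy-indexing step would correspond to a single coordinate read (readCoordinates in the PyTables 2.x naming, read_coordinates in 3.x).

```python
import numpy as np

# Hypothetical in-memory stand-in for the table's Value column.
values = np.array([100, 160, 170, 120, 155, 140, 149, 151, 200, 90])
threshold = 150

# Stand-in for getWhereList: row numbers whose value is above the threshold.
hits = np.flatnonzero(values > threshold)

# Row 0 has no prior row, so drop it (mirrors the `i != 0` check).
hits = hits[hits > 0]

# One batched fetch of every prior value, rather than a read per row.
prior_values = values[hits - 1]

# Keep the hits whose prior value is below the threshold.
batched_transitions = hits[prior_values < threshold]
```

Unlike the index-difference approach, this keeps the strict "prior value below threshold" test from the original loop.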

If you really wanted to do this quickly and in kernel, you could probably store 
two copies of the data. Call 'a' the original and 'b' a copy of 'a' that is 
offset by 1 index and has a dummy value at the end (to make them the same 
size).  Then you could do something like:

tb.Expr('a == b')

This would only work on Array, CArray, and EArray data.  You might be able to 
get it to work using Tables with something like:

tb.Expr('a == b', uservars={'a': atable, 'b': btable})
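
To make the two-copies idea concrete for the threshold problem, here is a minimal sketch using plain NumPy arrays in place of the on-disk arrays. The sample data, the choice of dummy value, and the condition (a <= threshold) & (b > threshold) are assumptions made for illustration; the thread only gives 'a == b' as the general form. Whether the equivalent expression string performs well through tb.Expr would need to be checked against a real file.

```python
import numpy as np

threshold = 150

# 'a' is the original data (hypothetical sample values).
a = np.array([100, 160, 170, 120, 155, 140, 149, 151, 200, 90])

# 'b' is 'a' offset by one index: b[i] == a[i + 1].  The dummy value at
# the end repeats the last element, so it can never look like a crossing.
b = np.append(a[1:], a[-1])

# Element-wise test with no indexing inside the expression:
# row i is at or below the threshold and row i + 1 is above it.
mask = (a <= threshold) & (b > threshold)

# Shift by one to report the row that is above the threshold.
shifted_transitions = np.flatnonzero(mask) + 1
```

The cost of this design is storing the data twice; the benefit is that the comparison becomes a purely element-wise expression, which is the form numexpr handles.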

I hope this helps.
Be Well
Anthony



Yes, it helps explain the issue.  I appreciate the info.

--Adam


Re: [Pytables-users] In-kernal for subset?

2012-08-15 Thread Anthony Scopatz
On Wed, Aug 15, 2012 at 12:33 PM, Adam Dershowitz
<adershow...@exponent.com> wrote:

  I am trying to find all cases where a value transitions above a
 threshold.  So my code first calls getWhereList to find values that are
 above the threshold, then it uses that list to find immediately prior
 values that are below.  The code is working, but the second part, searching
 through just a smaller subset, is much slower (the first search is on the
 order of 1 second, while the second takes a minute).
 Is there any way to get this second part of the search in-kernel?  Or any
 more general way to do a search for values above a threshold, where the
 prior value is below?
 Essentially, what I am looking for is a way to speed up that second search
 for all rows in a previously defined list, where a condition is applied to
 the table.

  My table is just seconds and values, in chronological order.

  Here is the code that I am using now:

  import tables as tb

  h5data = tb.openFile("AllData.h5", "r")
  table1 = h5data.root.table1

  # Find all values above threshold:
  thelist = table1.getWhereList("Value > 150")

  # From the above list find all values where the immediately prior value
  # is below:
  transition = []
  for i in thelist:
      if (table1[i-1]['Value'] < 150) and (i != 0):
          transition.append(i)


Hey Adam,

Sorry for taking a while to respond.  Assuming you don't mind one of these
being >= or <=, you don't really need the second loop with a little index
arithmetic:

import numpy as np
inds = np.array(thelist)
dinds = inds[1:] - inds[:-1]
transition = dinds[(1 < dinds)]

This should get you an array of all of the transition indices since
wherever the difference in indices is greater than 1 the Value must have
dropped below the threshold and then returned back up.

Be Well
Anthony



  Thanks,





Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users