Re: [Pytables-users] PyTables in-kernel query using Time64Col returns wrong results

Anthony Scopatz Mon, 15 Apr 2013 16:08:35 -0700

Hi Charles,

This is very likely a bug: Time64Col values are not being converted to
float64 for the query itself.  Under the covers, HDF5 and PyTables
represent Time64 as POSIX times, which are structs of two 4-byte ints [1].
These have a very different memory layout from a standard float64, which
is why the comparison is failing.
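
As a rough illustration of the layout mismatch (a sketch only, assuming the
two-4-byte-int layout described above with seconds first and microseconds
second, little-endian; the variable names are made up for this example),
reinterpreting those 8 bytes as a float64 gives a number that has nothing
to do with the original timestamp:

```python
import struct

# Hypothetical timestamp split into whole seconds and microseconds,
# mirroring the "struct of two 4-byte ints" layout assumed above.
secs, usecs = 1343779431, 853000

# Pack the two 32-bit ints into 8 little-endian bytes.
raw = struct.pack("<ii", secs, usecs)

# Reinterpret the very same 8 bytes as one little-endian float64.
as_float = struct.unpack("<d", raw)[0]

# The value a float-based comparison would expect to see.
expected = secs + usecs / 1e6

print(expected, as_float)
```

The reinterpreted value is nowhere near 1343779431.853, so a comparison
against a float literal cannot select the rows you would expect.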


numexpr doesn't support the time64 datatype, nor does it support bit shift
operators.  This makes it difficult to impossible to use time64 columns
properly from within a query right now.

I'll open a ticket for this, but if you want something working right now,
using Float64Col is probably your best bet.  This is what I have always
done, and it works just fine.  I think that the Time64 stuff is in there
largely for C/HDF5 compliance.  Sorry about the confusion.
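
For the Float64Col route, something along these lines could be used to turn
datetimes into float epoch seconds and build the condition string for an
interval query. This is only a sketch: it assumes Python 3 (for
datetime.timezone), and the helper names to_epoch_seconds and
interval_condition are invented for this example, not part of PyTables.

```python
from datetime import datetime, timezone


def to_epoch_seconds(dt):
    """Convert a timezone-aware datetime to float seconds since the epoch."""
    return dt.timestamp()


def interval_condition(column, start, stop):
    """Build an inclusive interval condition string for a float column."""
    return "(%s >= %r) & (%s <= %r)" % (
        column, to_epoch_seconds(start),
        column, to_epoch_seconds(stop),
    )


# Hypothetical one-minute window on the 'update_seconds' column.
start = datetime(2012, 8, 1, 0, 0, 0, tzinfo=timezone.utc)
stop = datetime(2012, 8, 1, 0, 1, 0, tzinfo=timezone.utc)
condition = interval_condition("update_seconds", start, stop)
print(condition)
```

The resulting string is what you would pass to readWhere() on a table whose
timestamp columns are declared as Float64Col.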

Be Well
Anthony

1. http://pubs.opengroup.org/onlinepubs/000095399/basedefs/sys/time.h.html


On Mon, Apr 15, 2013 at 2:20 PM, Charles de Villiers <chas...@yahoo.com> wrote:

> Hi Anthony,
>
> Thanks for your response.
>
> I had come across that discussion, but I don't think the floating-point
> precision thing really explains my results, because I'm querying for
> intervals, not instants.
> If I have a table containing, say, one-second samples between 500.0 and
> 1500.0, and I use a where clause like this:
> '(update_seconds >= 1000.0) & (update_seconds <= 1060.0)'
> then I expect to get at least 58 samples, even with floating-point
> 'fuzziness' - but in fact I get none.
> However, I have now tried the approach of storing my epoch seconds in
> Float64Cols and that seems to be working just fine.
> The question I'm left with is - just what does a Time64Col represent?
> Since there's no standard Python Time class with a float representation, I
> just guessed I could assign it float seconds a la time.time(), but
> Float64 works just as well for that (and as it turns out, better). How
> could you use a Time64Col in practice?
>
> Thanks again,
>
> Charles de Villiers
>
> "They have computers, and they may have other weapons of mass destruction."
> (Janet Reno)
>
>   ------------------------------
>  *From:* Anthony Scopatz <scop...@gmail.com>
> *To:* Charles de Villiers <chas...@yahoo.com>; Discussion list for
> PyTables <pytables-users@lists.sourceforge.net>
> *Sent:* Monday, April 15, 2013 5:13 PM
> *Subject:* Re: [Pytables-users] PyTables in-kernel query using Time64Col
> returns wrong results
>
> Hi Charles,
>
> We just discussed this last week and I am too lazy to retype it all so
> here is a link to the archive post [1].
>
> Be Well
> Anthony
>
> 1. http://sourceforge.net/mailarchive/message.php?msg_id=30708089
>
>
> On Mon, Apr 15, 2013 at 9:20 AM, Charles de Villiers <chas...@yahoo.com> wrote:
>
>
> I'm using PyTables 2.4.0 and Python 2.7. I've got a database that contains
> the following typical table:
>
> /anc/asc_wind_speed (Table(87591,), shuffle, blosc(3)) 'Wind speed'
>   description := {
>   "value_seconds": Time64Col(shape=(), dflt=0.0, pos=0),
>   "update_seconds": Time64Col(shape=(), dflt=0.0, pos=1),
>   "status": UInt8Col(shape=(), dflt=0, pos=2),
>   "value": Float64Col(shape=(), dflt=0.0, pos=3)}
>   byteorder := 'little'
>   chunkshape := (2621,)
>   autoIndex := True
>   colindexes := {
>     "update_seconds": Index(9, full, shuffle, zlib(1)).is_CSI=True,
>     "value": Index(9, full, shuffle, zlib(1)).is_CSI=True}
>
> I populate the timestamp columns using float seconds.
> The data looks OK in my IPython session:
>
> array([(1343779432.2160001, 1343779431.8529999, 0, 5.2975000000000003),
>        (1343779433.2190001, 1343779432.9430001, 0, 5.7474999999999996),
>        (1343779434.217, 1343779433.9809999, 0, 5.8600000000000003), ...,
>        (1343866301.934, 1343866301.5139999, 0, 3.8424999999999998),
>        (1343866302.934, 1343866302.5799999, 0, 4.0599999999999996),
>        (1343866303.934, 1343866303.642, 0, 3.7825000000000002)],
>       dtype=[('value_seconds', '<f8'), ('update_seconds', '<f8'),
>              ('status', '|u1'), ('value', '<f8')])
>
> .. but when I try to do an in-kernel search using the indexed column
> 'update_seconds', everything goes pear-shaped:
>
> len(wstable.readWhere('(update_seconds <= 1343866303.642)'))
> 0
>
> i.e. I get 0 rows returned when I was expecting all 87591 of them.
> Occasionally I do manage to get some rows with a '>=' query, but the
> timestamp columns are then returned as huge floats (~10^79). It seems that
> there is some implicit type-conversion going on that causes the Time64Col
> values to be misinterpreted. Can someone spot my mistake, or should I
> forget about Time64Cols and convert them all to Float64 (and how do I do
> this?)
>
>
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>
>
>
>
