[Pytables-users] ReadWhere() with a Time64Col in the condition

2013-04-10 Thread Julio Trevisan
Hi,

I am using a Time64Col called timestamp in a condition, and I noticed
that the condition does not work (i.e., no rows are selected) if I write
something like:

    for row in node.where('timestamp == %f' % t):
        ...

However, I had the idea of dividing the values by, say, 1000, and it does
work:

    for row in node.where('timestamp/1000 == %f' % (t/1000)):
        ...

However, this doesn't seem to be an elegant solution. Could someone
please point out a better one?

Could this be related to the fact that my column name is "timestamp"? I ask
this because I use a program called HDFView to browse the HDF5 file. This
program refuses to show the first column when it is called "timestamp", but
shows it when it is called "id". I don't know whether the two facts are
related.

I don't know if this is useful information, but the conversion of a typical
t to string gives something like this:

 print '%f' % t
1365597435.00
--
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] ReadWhere() with a Time64Col in the condition

2013-04-10 Thread Anthony Scopatz
On Wed, Apr 10, 2013 at 7:44 AM, Julio Trevisan juliotrevi...@gmail.com wrote:

> Hi,
>
> I am using a Time64Col called timestamp in a condition, and I noticed
> that the condition does not work (i.e., no rows are selected) if I write
> something like:
>
>     for row in node.where('timestamp == %f' % t):
>         ...
>
> However, I had the idea of dividing the values by, say, 1000, and it does
> work:
>
>     for row in node.where('timestamp/1000 == %f' % (t/1000)):
>         ...
>
> However, this doesn't seem to be an elegant solution. Could someone
> please point out a better one?


Hello Julio,

While this may not be the most elegant solution it is probably one of the
most appropriate.  The problem here likely stems from the fact that
floating point numbers (which are how Time64Cols are stored) are not exact
representations of the desired value.  For example:

In [1]: 1.1 + 2.2
Out[1]: 3.3000000000000003

So when you divide by some constant order of magnitude, you are chopping
off the error associated with floating-point precision. You are creating
a bin of this constant's size around the target value that is close
enough to count as equivalent. There are other mechanisms for alleviating
this issue: dividing and multiplying back, (x/10)*10 == y; right shifting
(platform dependent); or taking the difference and requiring it to be less
than some tolerance, abs(x - y) < t. You get the idea. You have to mitigate
this effect somehow.
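
As an illustration of the tolerance idea, here is a minimal sketch in plain
Python. The helper names nearly_equal and window_condition are hypothetical,
the half-second tolerance is an arbitrary choice, and the commented-out call
at the end assumes that node is a PyTables table whose where() accepts a
condition string together with a dictionary of condition variables.

```python
def nearly_equal(x, y, tol=1e-6):
    """Return True when x and y differ by at most tol (absolute tolerance)."""
    return abs(x - y) <= tol


def window_condition(column, target, tol):
    """Build a range condition around target instead of an exact equality.

    Returns the condition string and the variables it refers to, so the
    float values are passed as numbers rather than formatted into text.
    """
    condition = "(%s >= lo) & (%s <= hi)" % (column, column)
    condvars = {"lo": target - tol, "hi": target + tol}
    return condition, condvars


total = 1.1 + 2.2
print(total == 3.3)              # False: the sum carries a rounding error
print(nearly_equal(total, 3.3))  # True: the error is far below the tolerance

t = 1365597435.0
condition, condvars = window_condition("timestamp", t, 0.5)
print(condition)                 # (timestamp >= lo) & (timestamp <= hi)

# Assumed usage with a PyTables table (not executed here):
# for row in node.where(condition, condvars):
#     ...
```

Passing the bounds as condition variables also avoids the rounding that
happens when a float is formatted with %f, which keeps only six decimal
places.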

For more information please refer to:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html


> Could this be related to the fact that my column name is "timestamp"? I
> ask this because I use a program called HDFView to browse the HDF5 file.
> This program refuses to show the first column when it is called
> "timestamp", but shows it when it is called "id". I don't know whether
> the two facts are related.


This is probably unrelated.

Be Well
Anthony





Re: [Pytables-users] ReadWhere() with a Time64Col in the condition

2013-04-10 Thread Anthony Scopatz
On Wed, Apr 10, 2013 at 11:40 AM, Julio Trevisan juliotrevi...@gmail.com wrote:

> Hi Anthony
>
> Thanks again. If it is a problem related to floating-point precision, I
> might use an Int64Col instead, since I don't need the timestamp's
> milliseconds.

Another good plan, since integers are exact ;)

> Julio
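
A minimal sketch of the integer approach, assuming the timestamps are Unix
epoch seconds and the column is declared as an Int64Col. The helper
to_epoch_seconds is hypothetical, and the commented-out query assumes node is
a PyTables table.

```python
def to_epoch_seconds(t):
    """Drop the fractional part of a non-negative float epoch timestamp."""
    # int() truncates toward zero, which discards sub-second precision.
    return int(t)


t = 1365597435.0
key = to_epoch_seconds(t)

# Integers compare exactly, so a plain equality test is safe here.
condition = "timestamp == %d" % key
print(condition)  # timestamp == 1365597435

# Assumed usage with a PyTables table (not executed here):
# for row in node.where(condition):
#     ...
```

The values must be truncated the same way when they are written to the
column, otherwise the stored integers and the query key will not line up.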



