[Pytables-users] ReadWhere() with a Time64Col in the condition
Hi,

I am using a Time64Col called timestamp in a condition, and I noticed that the condition does not work (i.e., no rows are selected) if I write something like:

    for row in node.where('timestamp == %f' % t):
        ...

However, I had the idea of dividing the values by, say, 1000, and then it does work:

    for row in node.where('timestamp/1000 == %f' % (t/1000)):
        ...

This doesn't seem to be an elegant solution, though. Could someone please point me to a better one?

Could this be related to the fact that my column is named timestamp? I ask because I use a program called HDFView to browse the HDF5 file, and it refuses to show the first column when it is called timestamp, but shows it when it is called id. I don't know whether the two facts are related.

I don't know if this is useful information, but converting a typical t to a string gives something like this:

    print '%f' % t
    1365597435.00

--
Precog is a next-generation analytics platform capable of advanced analytics on semi-structured data. The platform includes APIs for building apps and a phenomenal toolset for data science. Developers can use our toolset for easy data analysis visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
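A minimal sketch of one place where precision can slip in when a condition is built with `'%f'`: that format keeps only six digits after the decimal point, so the number written into the condition string is not necessarily the float that is stored. The timestamp below is hypothetical (not taken from the thread), chosen to have a sub-microsecond fractional part; `repr()` is shown for comparison because it produces a string that parses back to the identical float.

```python
# Hypothetical timestamp with a sub-microsecond fractional part (not from the thread).
t = 1365597435.123456789

# '%f' rounds to six digits after the decimal point.
formatted = '%f' % t
round_tripped = float(formatted)

print(formatted)
# The number parsed back from the formatted string differs from the stored float,
# so an exact '==' against it would select nothing.
print(round_tripped == t)

# repr() gives the shortest string that parses back to the very same float.
print(float(repr(t)) == t)
```

Whether this matters for a given table depends on the stored values: with whole-second timestamps the `'%f'` string is exact. Passing the value through the `condvars` argument of `Table.where()` instead of string formatting avoids the text round trip altogether.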
Re: [Pytables-users] ReadWhere() with a Time64Col in the condition
On Wed, Apr 10, 2013 at 7:44 AM, Julio Trevisan <juliotrevi...@gmail.com> wrote:

> Hi, I am using a Time64Col called timestamp in a condition, and I noticed that the condition does not work (i.e., no rows are selected) if I write something like:
> ...
> However, this doesn't seem to be an elegant solution. Please could someone point out a better solution to this?

Hello Julio,

While this may not be the most elegant solution, it is probably one of the most appropriate. The problem here likely stems from the fact that floating-point numbers (which is how Time64Cols are stored) are not exact representations of the desired value. For example:

    In [1]: 1.1 + 2.2
    Out[1]: 3.3000000000000003

So when you divide by some constant order of magnitude, you are chopping off the error associated with floating-point precision: you are creating a bin of this constant's size around the target value that is close enough to count as equivalent.

There are other mechanisms for alleviating this issue: dividing and multiplying back, (x/10)*10 == y; right shifting (platform dependent); or taking the difference and requiring it to be less than some tolerance, abs(x - y) < tol. You get the idea: you have to mitigate this effect somehow. For more information, please refer to:

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

> Could this be related to the fact that my column name is timestamp? I ask this because I use a program called HDFView to browse the HDF5 file. This program refuses to show the first column when it is called timestamp, but shows it when it is called id. I don't know if the facts are related or not.

This is probably unrelated.

Be Well,
Anthony
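A minimal sketch of the tolerance approach described above, assuming the table `node` and the column `timestamp` from the thread. The helper `timestamp_window()` and the variable names `lo`/`hi` are hypothetical; the sketch writes abs(timestamp - t) <= tol as a closed range and passes the bounds through the `condvars` argument of `Table.where()` instead of formatting them into the string. The half-second default tolerance is also an assumption, sensible only if the timestamps are whole seconds.

```python
import math


def timestamp_window(t, tol=0.5):
    """Hypothetical helper: build a (condition, condvars) pair that selects
    rows whose 'timestamp' lies within tol of t, i.e. abs(timestamp - t) <= tol,
    expressed as a closed range."""
    condition = '(timestamp >= lo) & (timestamp <= hi)'
    condvars = {'lo': t - tol, 'hi': t + tol}
    return condition, condvars


# Exact equality on floats fails, while a tolerance test succeeds.
print(1.1 + 2.2 == 3.3)
print(abs((1.1 + 2.2) - 3.3) < 1e-9)
print(math.isclose(1.1 + 2.2, 3.3))

condition, condvars = timestamp_window(1365597435.0)
print(condition, condvars)

# With PyTables ('node' is the table from the thread; not executed here):
# for row in node.where(condition, condvars=condvars):
#     ...
```

Writing the tolerance as two plain comparisons keeps the condition simple and avoids depending on which functions the condition language supports.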
Re: [Pytables-users] ReadWhere() with a Time64Col in the condition
On Wed, Apr 10, 2013 at 11:40 AM, Julio Trevisan <juliotrevi...@gmail.com> wrote:

> Hi Anthony,
>
> Thanks again. If it is a problem related to floating-point precision, I might use an Int64Col instead, since I don't need the timestamp milliseconds.
>
> Julio

Another good plan, since integers are exact ;)
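A minimal sketch of the Int64Col idea, assuming sub-second precision really is not needed. The helper `to_int_seconds()` and the sample values are hypothetical; the helper rounds to the nearest second rather than truncating, so a float that sits just below a whole number because of representation error still maps to the intended second. Once both the stored values and the query value are integers, an exact `==` comparison is reliable.

```python
def to_int_seconds(t):
    """Hypothetical helper: convert a float POSIX timestamp to the nearest
    whole second, suitable for storing in an Int64Col."""
    return int(round(t))


# 0.1 * 3 * 10 evaluates to 3.0000000000000004, not exactly 3.0.
noisy = 0.1 * 3 * 10
print(noisy == 3.0)                 # float comparison fails
print(to_int_seconds(noisy) == 3)   # integer comparison succeeds

# Illustrative data: two values meant to be the same second, one with a small float error.
raw = [1365597435.0, 1365597434.9999995, 1365597436.0]
stored = [to_int_seconds(x) for x in raw]
query = to_int_seconds(1365597435.0)
print([s for s in stored if s == query])

# With PyTables, 'timestamp' would be declared as tables.Int64Col() and queried as
# node.where('timestamp == q', condvars={'q': query})  (not executed here)
```

Rounding rather than truncating is the design choice that matters here: truncation would map 1365597434.9999995 to the previous second.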