Re: [Pytables-users] Some method like a "table.readWhereSorted"

2013-04-10 Thread Dr. Louis Wicker
I am also interested in this capability, if it exists in some way...

Lou

On Apr 10, 2013, at 12:35 PM, Julio Trevisan wrote:

> Hi,
> 
> Is there a way that I could have the ability of readWhere (i.e., specify 
> condition, and fast result) but also using a CSIndex so that the rows come 
> sorted in a particular order?
> 
> I checked readSorted(), but it is iterative and does not allow specifying a 
> condition.
> 
> Julio
> --
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> ___
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users


| Dr. Louis J. Wicker
| NSSL/WRDD  Rm 4366
| National Weather Center
| 120 David L. Boren Boulevard, Norman, OK 73072
|
| E-mail:   louis.wic...@noaa.gov
| HTTP:     http://www.nssl.noaa.gov/~lwicker
| Phone:    (405) 325-6340
| Fax:      (405) 325-6780
|
| 
|  For every complex problem, there is a solution that is simple, 
|  neat, and wrong.
|
|   -- H. L. Mencken
|

| "The contents  of this message are mine personally and
| do not reflect any position of  the Government or NOAA."





[Pytables-users] Some method like a "table.readWhereSorted"

2013-04-10 Thread Julio Trevisan
Hi,

Is there a way that I could have the ability of readWhere (i.e., specify
condition, and fast result) but also using a CSIndex so that the rows come
sorted in a particular order?

I checked readSorted(), but it is iterative and does not allow specifying a
condition.

Julio
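
A possible workaround, offered as a minimal sketch rather than a built-in
feature: filter with read_where() (the PyTables 3 spelling of readWhere) and
then sort the matching rows in memory. The helper name read_where_sorted is
hypothetical, the approach does not use the CSIndex at all, and it assumes
the matching rows fit in memory.

```python
from operator import itemgetter


def read_where_sorted(table, condition, sortby, condvars=None):
    """Return the rows matching `condition`, ordered by column `sortby`.

    `table` is assumed to be a PyTables Table (anything with a compatible
    read_where() method works). The sort happens in memory, not via an index.
    """
    # read_where() evaluates the condition and returns all matching rows
    # at once (a NumPy structured array for a real PyTables table).
    rows = table.read_where(condition, condvars=condvars)
    # Each row can be indexed by field name, so a key-based sort is enough.
    return sorted(rows, key=itemgetter(sortby))
```

With a real table this would be called as, for example,
read_where_sorted(table, "price > 10", "timestamp"); both column names are
made up for illustration. For large results, numpy.sort(rows, order=sortby)
on the structured array is the vectorised equivalent.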


Re: [Pytables-users] ReadWhere() with a Time64Col in the condition

2013-04-10 Thread Julio Trevisan
Thanks again :)

Re: [Pytables-users] ReadWhere() with a Time64Col in the condition

2013-04-10 Thread Anthony Scopatz
On Wed, Apr 10, 2013 at 11:40 AM, Julio Trevisan wrote:

> Hi Anthony
>
> Thanks again. If it is a problem related to floating-point precision, I
> might use an Int64Col instead, since I don't need the timestamp milliseconds.
>

Another good plan since integers are exact ;)

Re: [Pytables-users] ReadWhere() with a Time64Col in the condition

2013-04-10 Thread Julio Trevisan
Hi Anthony

Thanks again. If it is a problem related to floating-point precision, I
might use an Int64Col instead, since I don't need the timestamp milliseconds.

Julio


Re: [Pytables-users] ReadWhere() with a Time64Col in the condition

2013-04-10 Thread Anthony Scopatz
On Wed, Apr 10, 2013 at 7:44 AM, Julio Trevisan wrote:

> Hi,
>
> I am using a Time64Col called "timestamp" in a condition, and I noticed
> that the condition does not work (i.e., no rows are selected) if I write
> something like:
>
>     for row in node.where("timestamp == %f" % t):
>         ...
>
> However, I had the idea of dividing the values by, say, 1000, and it does
> work:
>
>     for row in node.where("timestamp/1000 == %f" % (t/1000)):
>         ...
>
> However, this doesn't seem to be an elegant solution. Please could someone
> point out a better solution to this?
>

Hello Julio,

While this may not be the most elegant solution it is probably one of the
most appropriate.  The problem here likely stems from the fact that
floating point numbers (which are how Time64Cols are stored) are not exact
representations of the desired value.  For example:

In [1]: 1.1 + 2.2
Out[1]: 3.3000000000000003

So when you divide by some constant order of magnitude, you are chopping
off the error associated with floating point precision.  You are creating
a bin of this constant's size around the target value that is "close
enough" to count as equivalent.  There are other mechanisms for alleviating
this issue: dividing and multiplying back, (x/10)*10 == y; right shifting
(platform dependent); or taking the difference and requiring it to be within
some tolerance, abs(x - y) <= t.  You get the idea.  You have to mitigate
this effect somehow.

For more information please refer to:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
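
The tolerance idea above can be sketched in plain Python. The helper name
window_condition and the 0.5-second tolerance are assumptions made for
illustration; the string it builds is meant to be passed to Table.where()
in place of an exact == test.

```python
import math

# Binary floating point cannot represent most decimal fractions exactly,
# so a computed value can differ slightly from the literal one expects.
total = 1.1 + 2.2
print(total == 3.3)              # False
print(math.isclose(total, 3.3))  # True


def window_condition(column, t, tol=0.5):
    """Build a range condition that matches values within `tol` of `t`.

    The name and the default tolerance are illustrative, not PyTables API.
    """
    return "(%s >= %r) & (%s <= %r)" % (column, t - tol, column, t + tol)


cond = window_condition("timestamp", 1365597435.0)
print(cond)  # (timestamp >= 1365597434.5) & (timestamp <= 1365597435.5)
```

The resulting string would then be used as in the original loop, for example
"for row in node.where(cond): ...". A range test like this avoids relying on
the formatted value matching the stored float bit for bit.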


> Could this be related to the fact that my column name is "timestamp"? I
> ask this because I use a program called HDFView to browse the HDF5 file.
> This program refuses to show the first column when it is called
> "timestamp", but shows it when it is called "id". I don't know if the facts
> are related or not.
>

This is probably unrelated.

Be Well
Anthony


>
> I don't know if this is useful information, but the conversion of a
> typical "t" to string gives something like this:
>
> >>> print "%f" % t
> 1365597435.000000


[Pytables-users] ReadWhere() with a Time64Col in the condition

2013-04-10 Thread Julio Trevisan
Hi,

I am using a Time64Col called "timestamp" in a condition, and I noticed
that the condition does not work (i.e., no rows are selected) if I write
something like:

    for row in node.where("timestamp == %f" % t):
        ...

However, I had the idea of dividing the values by, say, 1000, and it does
work:

    for row in node.where("timestamp/1000 == %f" % (t/1000)):
        ...

However, this doesn't seem to be an elegant solution. Please could someone
point out a better solution to this?

Could this be related to the fact that my column name is "timestamp"? I ask
this because I use a program called HDFView to browse the HDF5 file. This
program refuses to show the first column when it is called "timestamp", but
shows it when it is called "id". I don't know if the facts are related or
not.

I don't know if this is useful information, but the conversion of a typical
"t" to string gives something like this:

>>> print "%f" % t
1365597435.000000