On Thu, 9 Dec 2021 at 07:00, Wenyi Huang <[email protected]> wrote:
> Thanks, Joris.
>
> I was misled by the pandas -> Arrow Conversion table:
> https://arrow.apache.org/docs/python/pandas.html#pandas-arrow-conversion
> which does not show datetime.time -> time64 conversion.
>
Thanks for pointing that out. I opened a JIRA to fix this:
https://issues.apache.org/jira/browse/ARROW-15043

> A followup question is:
>
> I have a table with a timestamp column in pandas/arrow and saved as
> parquet format.
> Is there a way to filter on the timestamp.time when reading the parquet?
> For example ('x.time', '=', '10:00:00') or ('x.time', '=',
> datetime.time(10)).
>
> Currently, I am doing so by saving both a timestamp column and a time
> column, which is kind of duplicate in information.
>
Unfortunately that is not yet possible, so your workaround might be the
best solution for now. All the required functionality actually exists, and
the dataset filtering API can work with functions on field names, but we
still need to fit the pieces together to be able to construct such a filter
in Python:
https://issues.apache.org/jira/browse/ARROW-12060
Joris
>
> Thanks,
> Wenyi
>
> On Mon, Dec 6, 2021 at 7:10 AM Joris Van den Bossche <
> [email protected]> wrote:
>
>> On Fri, 3 Dec 2021 at 23:36, Wenyi Huang <[email protected]> wrote:
>> >
>> > Hi Arrow Team,
>> >
>> > What is the best data type to save Time of the Day if I want to use
>> Pandas DataFrame, but dump data to parquet (or other formats) via PyArrow?
>> >
>> > I see that pandas Arrow conversion does not convert datetime.time nor
>> timedelta.
>>
>> Can you show an example? Because I think that this conversion should
>> handle both cases:
>>
>> >>> import datetime
>> >>> import pandas as pd
>> >>> import pyarrow as pa
>> >>> df = pd.DataFrame({"time": [datetime.time(9)],
>> ...                    "timedelta": [pd.Timedelta("9 hours")]})
>> >>> df
>>        time       timedelta
>> 0  09:00:00 0 days 09:00:00
>>
>> >>> pa.table(df)
>> pyarrow.Table
>> time: time64[us]
>> timedelta: duration[ns]
>> ----
>> time: [[09:00:00.000000]]
>> timedelta: [[32400000000000]]
>>
>> The resulting Arrow table has columns with time and duration type
>> (duration is the Arrow equivalent for timedelta).
>>
>> Joris
>>
>> >
>> > The use case is that I want to save the time of the day column. So that
>> while reading (parquet or other formats), I can filter by time.
>> >
>> > Best,
>> > Wenyi
>> >
>> >
>>
>
>
> --
> Wenyi Huang
> AI & Quantitative Researcher @Citadel, LLC,
> LinkedIn: https://www.linkedin.com/in/harrywy/
> Google Scholar: https://scholar.google.com/citations?user=K-RWg7gAAAAJ&hl=en
> Email: [email protected]
>
>
>