[ https://issues.apache.org/jira/browse/ARROW-15748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497471#comment-17497471 ]
Joris Van den Bossche edited comment on ARROW-15748 at 2/24/22, 3:26 PM:
-------------------------------------------------------------------------

[~coady] thanks for the report!

The link you provide for the actual behaviour points to the C++ docs, and while those indeed use "day", the bindings in Python _do_ use "second": https://github.com/apache/arrow/blob/094c5ba186cddd69d4aa83de5ed2b62d4ed07081/python/pyarrow/_compute.pyx#L892

Now, the confusing part is that this options class is (I assume) not instantiated when no options are passed at all, and in that case the defaults from C++ are used. You can see this in the following example:

{code:python}
>>> import pandas as pd
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> arr = pa.array([pd.Timestamp("2012-01-01 09:01:02.123456")])
>>> pc.round_temporal(arr)  # <--- indeed uses "day" by default
<pyarrow.lib.TimestampArray object at 0x7f5d7b56a040>
[
  2012-01-01 00:00:00.000000
]
>>> pc.round_temporal(arr, unit="second")  # <--- manually specifying "second" still works
<pyarrow.lib.TimestampArray object at 0x7f5d7a67fd00>
[
  2012-01-01 09:01:02.000000
]
>>> pc.round_temporal(arr, multiple=5)  # <--- but when specifying a different option, it now actually defaults to "second" ...
<pyarrow.lib.TimestampArray object at 0x7f5d7b548b80>
[
  2012-01-01 09:01:00.000000
]
{code}

Long story short, the simple conclusion is of course still that we should align the defaults in C++ and Python.

> [Python] Round temporal options default unit is `day` but documented as `second`.
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-15748
>                 URL: https://issues.apache.org/jira/browse/ARROW-15748
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 8.0.0
>            Reporter: A. Coady
>            Priority: Minor
>
> The [python documentation for round temporal options|https://arrow.apache.org/docs/dev/python/generated/pyarrow.compute.RoundTemporalOptions.html] says the default unit is `second`, but the [actual behavior|https://arrow.apache.org/docs/dev/cpp/api/compute.html#classarrow_1_1compute_1_1_round_temporal_options] is a default of `day`.
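To make the mechanism described in the comment concrete, here is a minimal pure-Python sketch that does not use pyarrow at all. All names in it (`CPP_DEFAULTS`, `RoundOptionsSketch`, `round_temporal_sketch`, `AlignedRoundOptions`, `round_temporal_aligned`) are hypothetical and are not pyarrow API. Rounding is simplified, as an assumption, to fixed-length units counted from the Unix epoch with ties rounding up, which is not necessarily Arrow's exact kernel semantics. Under those assumptions it reproduces the three results from the comment's example: no options gives the backend default ("day"), while passing any single option makes the Python-side default ("second") fill in the rest.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the C++ kernel defaults (unit="day").
CPP_DEFAULTS = {"multiple": 1, "unit": "day"}

# Only fixed-length units are modelled in this sketch.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

EPOCH = datetime(1970, 1, 1)


def _round(ts, multiple, unit):
    # Round to the nearest multiple of (multiple * unit), counted from the
    # Unix epoch; ties round up. Works in whole microseconds.
    step_us = multiple * UNIT_SECONDS[unit] * 1_000_000
    delta = ts - EPOCH
    us = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    rounded = (us + step_us // 2) // step_us * step_us
    return EPOCH + timedelta(microseconds=rounded)


class RoundOptionsSketch:
    """Hypothetical Python-side options class whose own default
    (unit="second") disagrees with CPP_DEFAULTS, as in the report."""

    def __init__(self, multiple=1, unit="second"):
        self.multiple = multiple
        self.unit = unit


def round_temporal_sketch(ts, **kwargs):
    if not kwargs:
        # No options passed: no options object is built, so the
        # backend defaults apply (unit="day").
        return _round(ts, **CPP_DEFAULTS)
    # Any option passed: the options object fills in the *Python*
    # defaults for everything else (unit="second").
    opts = RoundOptionsSketch(**kwargs)
    return _round(ts, opts.multiple, opts.unit)


class AlignedRoundOptions:
    """Sketch of aligning the defaults: the options class reads them from
    the same table the no-options path would use, so both paths agree."""

    def __init__(self, multiple=CPP_DEFAULTS["multiple"], unit=CPP_DEFAULTS["unit"]):
        self.multiple = multiple
        self.unit = unit


def round_temporal_aligned(ts, **kwargs):
    # Always build the options object; defaults come from one place.
    opts = AlignedRoundOptions(**kwargs)
    return _round(ts, opts.multiple, opts.unit)
```

With `ts = datetime(2012, 1, 1, 9, 1, 2, 123456)`, `round_temporal_sketch(ts)` gives midnight while `round_temporal_sketch(ts, multiple=5)` gives 09:01:00. `round_temporal_aligned` rounds to a day boundary in both cases. Whether the real fix changes the Python default or the C++ one is not decided by this sketch; it only shows why a single source of defaults removes the inconsistency.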