I couldn't find the docs for pyarrow.compute.scalar, but from reading the source code I can say this:
pyarrow.scalar [1] creates an instance of a pyarrow.*Scalar class from a
Python object. pyarrow.compute.scalar [2] creates an Arrow compute
Expression wrapping a scalar object.

You rarely need pyarrow.compute.scalar because when you pass an Arrow
Scalar or a Python object where an Expression is expected, it gets
automatically wrapped by Expression._expr_or_scalar() [3].

[1] https://arrow.apache.org/docs/python/generated/pyarrow.scalar.html#pyarrow.scalar
[2] https://github.com/apache/arrow/blob/main/python/pyarrow/compute.py#L718
[3] https://github.com/apache/arrow/blob/main/python/pyarrow/_compute.pyx#L2494

--
Felipe

On Mon, May 27, 2024 at 11:43 AM Adrian Garcia Badaracco <[email protected]> wrote:
>
> These seem to be two different things, but there’s nothing in the docs
> explaining what the difference is. Some things like pyarrow.dataset.dataset
> seem to work with either or even a mix (for partitions / fragments).
>
> ```python
> from datetime import datetime, timezone
> import pyarrow as pa
> import pyarrow.compute as pc
>
> v = datetime(2000, 1, 1, tzinfo=timezone.utc)
> print(v)  # 2000-01-01 00:00:00+00:00
>
> print(pa.scalar(v, pa.timestamp('ns', tz='UTC')))  # 2000-01-01 00:00:00+00:00
>
> print(pc.scalar(v))  # 2000-01-01 00:00:00.000000Z
> # according to the docs this should be a bool, int, float or str, but at
> # runtime a datetime is accepted
> # seems to assume UTC but can't set ns precision
> ```
>
> Could someone clarify what the differences are, and if they’re on purpose or
> accidental, etc.?
