Thank you. So it sounds like the rule is: always use pyarrow.scalar. Do you
know if libraries (like something using or creating a pyarrow dataset) are
expected to handle both?

On Mon, May 27, 2024 at 6:26 PM Felipe Oliveira Carvalho <
[email protected]> wrote:

> I couldn't find the docs for compute.scalar, but by checking the
> source code I can say this:
>
> pyarrow.scalar [1] creates an instance of a pyarrow.*Scalar class from
> a Python object.
> pyarrow.compute.scalar [2] creates an Arrow compute Expression
> wrapping a scalar object.
>
> You rarely need pyarrow.compute.scalar because when you pass an Arrow
> Scalar or a Python object where an Expression is expected, it gets
> automatically wrapped by Expression._expr_or_scalar() [3].
>
> [1]
> https://arrow.apache.org/docs/python/generated/pyarrow.scalar.html#pyarrow.scalar
> [2]
> https://github.com/apache/arrow/blob/main/python/pyarrow/compute.py#L718
> [3]
> https://github.com/apache/arrow/blob/main/python/pyarrow/_compute.pyx#L2494
>
> --
> Felipe
>
> On Mon, May 27, 2024 at 11:43 AM Adrian Garcia Badaracco
> <[email protected]> wrote:
> >
> > These seem to be two different things, but there’s nothing in the docs
> > explaining what the difference is. Some things like pyarrow.dataset.dataset
> > seem to work with either or even a mix (for partitions / fragments).
> >
> > ```python
> > from datetime import datetime, timezone
> > import pyarrow as pa
> > import pyarrow.compute as pc
> >
> > v = datetime(2000, 1, 1, tzinfo=timezone.utc)
> > print(v)  # 2000-01-01 00:00:00+00:00
> >
> > print(pa.scalar(v, pa.timestamp('ns', tz='UTC')))  # 2000-01-01 00:00:00+00:00
> >
> > print(pc.scalar(v))  # 2000-01-01 00:00:00.000000Z
> > # according to the docs this should be a bool, int, float or str, but at
> > # runtime a datetime is accepted
> > # seems to assume UTC but can't set ns precision
> > ```
> >
> > Could someone clarify what the differences are, and if they’re on
> > purpose or accidental, etc.?
>
