Hi David,

Thanks for your reply. I'll keep an eye on that PR.

On Wed, 13 Jul 2022 at 17:43, David Li <[email protected]> wrote:

> At the moment I think it's mostly metadata, but there is a PR that
> validates non-nullable fields indeed do not contain nulls. [1]
>
> There are places in compute kernels that optimize based on the
> presence/absence of nulls but they do so mostly by looking at the physical
> data and not the type (so the optimization will still apply if there just
> happen to not be nulls).
>
> [1]: https://github.com/apache/arrow/pull/12706
>
> On Mon, Jul 11, 2022, at 17:20, Arthur Andres wrote:
> > Hi all,
> >
> > Is the behaviour of pa.Field.nullable documented somewhere?
> >
> > I had some expectations of what it does. For example it should make sure
> > that you can't have null/missing value in a column that is declared with
> > nullable=False. But it doesn't seem to be the case.
> >
> > ```
> > import pyarrow as pa
> >
> > schema = pa.schema(
> >     [
> >         pa.field("nullable_true", pa.string(), nullable=True),
> >         pa.field("nullable_false", pa.string(), nullable=False),
> >     ]
> > )
> >
> > table = pa.Table.from_arrays(
> >     [
> >         pa.array(["", "foo", None], pa.string()),
> >         pa.array(["", "foo", None], pa.string()),
> >     ],
> >     schema=schema,
> > )
> >
> > assert table.schema == schema
> > assert table['nullable_true'].null_count == 1
> > assert table['nullable_false'].null_count == 1
> > assert table.validate() is None
> > assert table.validate(full=True) is None
> > ```
> >
> > The only place where I've seen the nullable flag being used is when
> > casting nested column from nullable to non-nullable:
> >
> > ```
> > import pyarrow as pa
> >
> > struct_array = pa.StructArray.from_arrays(
> >     [
> >         pa.array(["", "foo", None], pa.string()),
> >     ],
> >     names=["nested_col_level_1"],
> > )
> > nested_table = pa.Table.from_arrays(
> >     [struct_array], names=["nested_col_level_0"]
> > )
> > assert nested_table.validate(full=True) is None
> > assert nested_table.validate() is None
> >
> > nested_table.cast(
> >     pa.schema(
> >         [
> >             pa.field(
> >                 "nested_col_level_0",
> >                 pa.struct(
> >                     [pa.field("nested_col_level_1", pa.string(), nullable=False)]
> >                 ),
> >             )
> >         ]
> >     )
> > )
> > ```
> >
> > Thanks for your help!
