This is an automated email from the ASF dual-hosted git repository.

paleolimbot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-nanoarrow.git


The following commit(s) were added to refs/heads/main by this push:
     new fcc540a8 docs(python): Update Python bindings readme (#474)
fcc540a8 is described below

commit fcc540a8fabe03a38f07e010f8a72c733e18a4a8
Author: Dewey Dunnington <de...@dunnington.ca>
AuthorDate: Fri May 17 16:22:00 2024 -0300

    docs(python): Update Python bindings readme (#474)
    
    The previous readme was written for the previous release and is
    outdated!
---
 python/README.ipynb | 480 +++++++++++++++++++++++++++++++++++++++++-----------
 python/README.md    | 318 ++++++++++++++++++++++++++--------
 2 files changed, 624 insertions(+), 174 deletions(-)

diff --git a/python/README.ipynb b/python/README.ipynb
index 0f13829a..5d62065b 100644
--- a/python/README.ipynb
+++ b/python/README.ipynb
@@ -36,11 +36,19 @@
     "\n",
     "## Installation\n",
     "\n",
-    "Python bindings for nanoarrow are not yet available on PyPI. You can install via\n",
-    "URL (requires a C compiler):\n",
+    "The nanoarrow Python bindings are available from [PyPI](https://pypi.org/) and\n",
+    "[conda-forge](https://conda-forge.org/):\n",
     "\n",
-    "```bash\n",
-    "python -m pip install \"git+https://github.com/apache/arrow-nanoarrow.git#egg=nanoarrow&subdirectory=python\"\n",
+    "```shell\n",
+    "pip install nanoarrow\n",
+    "conda install nanoarrow -c conda-forge\n",
+    "```\n",
+    "\n",
+    "Development versions (based on the `main` branch) are also available:\n",
+    "\n",
+    "```shell\n",
+    "pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ \\\n",
+    "    --prefer-binary --pre nanoarrow\n",
     "```\n",
     "\n",
     "If you can import the namespace, you're good to go!"
@@ -48,7 +56,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 46,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -56,102 +64,326 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Low-level C library bindings\n",
-    "\n",
-    "The Arrow C Data and Arrow C Stream interfaces are comprised of three structures: the `ArrowSchema` which represents a data type of an array, the `ArrowArray` which represents the values of an array, and an `ArrowArrayStream`, which represents zero or more `ArrowArray`s with a common `ArrowSchema`.\n",
+    "## Data types, arrays, and array streams\n",
     "\n",
-    "### Schemas\n",
-    "\n",
-    "Use `nanoarrow.c_schema()` to convert an object to an `ArrowSchema` and wrap it as a Python object. This works for any object implementing the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface.html) (e.g., `pyarrow.Schema`, `pyarrow.DataType`, and `pyarrow.Field`)."
+    "The Arrow C Data and Arrow C Stream interfaces are comprised of three structures: the `ArrowSchema` which represents a data type of an array, the `ArrowArray` which represents the values of an array, and an `ArrowArrayStream`, which represents zero or more `ArrowArray`s with a common `ArrowSchema`. These concepts map to the `nanoarrow.Schema`, `nanoarrow.Array`, and `nanoarrow.ArrayStream` in the Python package."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 47,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "<nanoarrow.c_lib.CSchema decimal128(10, 3)>\n",
-       "- format: 'd:10,3'\n",
-       "- name: ''\n",
-       "- flags: 2\n",
-       "- metadata: NULL\n",
-       "- dictionary: NULL\n",
-       "- children[0]:"
+       "<Schema> int32"
       ]
      },
-     "execution_count": 2,
+     "execution_count": 47,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "import pyarrow as pa\n",
-    "schema = na.c_schema(pa.decimal128(10, 3))\n",
-    "schema"
+    "na.int32()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 48,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "nanoarrow.Array<int32>[3]\n",
+       "1\n",
+       "2\n",
+       "3"
+      ]
+     },
+     "execution_count": 48,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "na.Array([1, 2, 3], na.int32())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `nanoarrow.Array` can accommodate arrays with any number of chunks, reflecting the reality that many array containers (e.g., `pyarrow.ChunkedArray`, `polars.Series`) support this."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 49,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "nanoarrow.Array<int32>[6]\n",
+       "1\n",
+       "2\n",
+       "3\n",
+       "4\n",
+       "5\n",
+       "6"
+      ]
+     },
+     "execution_count": 49,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "chunked = na.Array.from_chunks([[1, 2, 3], [4, 5, 6]], na.int32())\n",
+    "chunked"
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "You can extract the fields of a `CSchema` object one at a time or parse it into a view to extract deserialized parameters."
+    "Whereas chunks of an `Array` are always fully materialized when the object is constructed, the chunks of an `ArrayStream` have not necessarily been resolved yet."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 50,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "nanoarrow.ArrayStream<int32>"
+      ]
+     },
+     "execution_count": 50,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "stream = na.ArrayStream(chunked)\n",
+    "stream"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 51,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "nanoarrow.Array<int32>[3]\n",
+      "1\n",
+      "2\n",
+      "3\n",
+      "nanoarrow.Array<int32>[3]\n",
+      "4\n",
+      "5\n",
+      "6\n"
+     ]
+    }
+   ],
+   "source": [
+    "with stream:\n",
+    "    for chunk in stream:\n",
+    "        print(chunk)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `nanoarrow.ArrayStream` also provides an interface to nanoarrow's [Arrow IPC](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc) reader:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 52,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "nanoarrow.ArrayStream<non-nullable struct<commit: string, time: timestamp('us', 'UTC'), files: int3...>"
+      ]
+     },
+     "execution_count": 52,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "url = \"https://github.com/apache/arrow-experiments/raw/main/data/arrow-commits/arrow-commits.arrows\"\n",
+    "na.ArrayStream.from_url(url)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "These objects implement the [Arrow PyCapsule interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html) for both producing and consuming and are interchangeable with `pyarrow` objects in many cases:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 53,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "pyarrow.Field<: int32>"
+      ]
+     },
+     "execution_count": 53,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import pyarrow as pa\n",
+    "\n",
+    "pa.field(na.int32())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 54,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "<pyarrow.lib.ChunkedArray object at 0x12a49a250>\n",
+       "[\n",
+       "  [\n",
+       "    1,\n",
+       "    2,\n",
+       "    3\n",
+       "  ],\n",
+       "  [\n",
+       "    4,\n",
+       "    5,\n",
+       "    6\n",
+       "  ]\n",
+       "]"
+      ]
+     },
+     "execution_count": 54,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "pa.chunked_array(chunked)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "<pyarrow.lib.Int32Array object at 0x11b552500>\n",
+       "[\n",
+       "  4,\n",
+       "  5,\n",
+       "  6\n",
+       "]"
+      ]
+     },
+     "execution_count": 55,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "pa.array(chunked.chunk(1))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 56,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "nanoarrow.Array<int64>[3]\n",
+       "10\n",
+       "11\n",
+       "12"
+      ]
+     },
+     "execution_count": 56,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "na.Array(pa.array([10, 11, 12]))"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 57,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "<nanoarrow.c_lib.CSchemaView>\n",
-       "- type: 'decimal128'\n",
-       "- storage_type: 'decimal128'\n",
-       "- decimal_bitwidth: 128\n",
-       "- decimal_precision: 10\n",
-       "- decimal_scale: 3\n",
-       "- dictionary_ordered: False\n",
-       "- map_keys_sorted: False\n",
-       "- nullable: True\n",
-       "- storage_type_id: 24\n",
-       "- type_id: 24"
+       "<Schema> string"
       ]
      },
-     "execution_count": 3,
+     "execution_count": 57,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "na.c_schema_view(schema)"
+    "na.Schema(pa.string())"
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Advanced users can allocate an empty `CSchema` and populate its contents by passing its `._addr()` to a schema-exporting function."
+    "## Low-level C library bindings\n",
+    "\n",
+    "The nanoarrow Python package also provides lower level wrappers around Arrow C interface structures. You can create these using `nanoarrow.c_schema()`, `nanoarrow.c_array()`, and `nanoarrow.c_array_stream()`.\n",
+    "\n",
+    "### Schemas\n",
+    "\n",
+    "Use `nanoarrow.c_schema()` to convert an object to an `ArrowSchema` and wrap it as a Python object. This works for any object implementing the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface.html) (e.g., `pyarrow.Schema`, `pyarrow.DataType`, and `pyarrow.Field`)."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 58,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "<nanoarrow.c_lib.CSchema int32>\n",
-       "- format: 'i'\n",
+       "<nanoarrow.c_schema.CSchema decimal128(10, 3)>\n",
+       "- format: 'd:10,3'\n",
        "- name: ''\n",
        "- flags: 2\n",
        "- metadata: NULL\n",
@@ -159,15 +391,41 @@
        "- children[0]:"
       ]
      },
-     "execution_count": 4,
+     "execution_count": 58,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "na.c_schema(pa.decimal128(10, 3))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Using `c_schema()` is a good fit for testing and for ephemeral schema objects that are being passed from one library to another. To extract the fields of a schema in a more convenient form, use `Schema()`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 59,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(10, 3)"
+      ]
+     },
+     "execution_count": 59,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "schema = na.allocate_c_schema()\n",
-    "pa.int32()._export_to_c(schema._addr())\n",
-    "schema"
+    "schema = na.Schema(pa.decimal128(10, 3))\n",
+    "schema.precision, schema.scale"
    ]
   },
   {
@@ -190,29 +448,28 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 60,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "<nanoarrow.c_lib.CArray string>\n",
+       "<nanoarrow.c_array.CArray string>\n",
        "- length: 4\n",
        "- offset: 0\n",
        "- null_count: 1\n",
-       "- buffers: (3678035706048, 3678035705984, 3678035706112)\n",
+       "- buffers: (4754305168, 4754307808, 4754310464)\n",
        "- dictionary: NULL\n",
        "- children[0]:"
       ]
      },
-     "execution_count": 5,
+     "execution_count": 60,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "array = na.c_array(pa.array([\"one\", \"two\", \"three\", None]))\n",
-    "array"
+    "na.c_array([\"one\", \"two\", \"three\", None], na.string())"
    ]
   },
   {
@@ -220,67 +477,87 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "You can extract the fields of a `CArray` one at a time or parse it into a view to extract deserialized content:"
+    "Using `c_array()` is a good fit for testing and for ephemeral array objects that are being passed from one library to another. For a higher level interface, use `Array()`:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 61,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "<nanoarrow.c_lib.CArrayView>\n",
-       "- storage_type: 'string'\n",
-       "- length: 4\n",
-       "- offset: 0\n",
-       "- null_count: 1\n",
-       "- buffers[3]:\n",
-       "  - validity <bool[1 b] 11100000>\n",
-       "  - data_offset <int32[20 b] 0 3 6 11 11>\n",
-       "  - data <string[11 b] b'onetwothree'>\n",
-       "- dictionary: NULL\n",
-       "- children[0]:"
+       "['one', 'two', 'three', None]"
       ]
      },
-     "execution_count": 6,
+     "execution_count": 61,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "na.c_array_view(array)"
+    "array = na.Array([\"one\", \"two\", \"three\", None], na.string())\n",
+    "array.to_pylist()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 62,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(nanoarrow.c_lib.CBufferView(bool[1 b] 11100000),\n",
+       " nanoarrow.c_lib.CBufferView(int32[20 b] 0 3 6 11 11),\n",
+       " nanoarrow.c_lib.CBufferView(string[11 b] b'onetwothree'))"
+      ]
+     },
+     "execution_count": 62,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "array.buffers"
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Like the `CSchema`, you can allocate an empty one and access its address with `_addr()` to pass to other array-exporting functions."
+    "Advanced users can create arrays directly from buffers using `c_array_from_buffers()`:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 63,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "3"
+       "<nanoarrow.c_array.CArray string>\n",
+       "- length: 2\n",
+       "- offset: 0\n",
+       "- null_count: 0\n",
+       "- buffers: (0, 5002908320, 4999694624)\n",
+       "- dictionary: NULL\n",
+       "- children[0]:"
       ]
      },
-     "execution_count": 7,
+     "execution_count": 63,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "array = na.allocate_c_array()\n",
-    "pa.array([1, 2, 3])._export_to_c(array._addr(), array.schema._addr())\n",
-    "array.length"
+    "na.c_array_from_buffers(\n",
+    "    na.string(),\n",
+    "    2,\n",
+    "    [None, na.c_buffer([0, 3, 6], na.int32()), b\"abcdef\"]\n",
+    ")"
    ]
   },
   {
@@ -290,30 +567,29 @@
    "source": [
     "### Array streams\n",
     "\n",
-    "You can use `nanoarrow.c_array_stream()` to wrap an object representing a sequence of `CArray`s with a common `CSchema` to an `ArrowArrayStream` and wrap it as a Python object. This works for any object implementing the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface.html) (e.g., `pyarrow.RecordBatchReader`)."
+    "You can use `nanoarrow.c_array_stream()` to wrap an object representing a sequence of `CArray`s with a common `CSchema` to an `ArrowArrayStream` and wrap it as a Python object. This works for any object implementing the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface.html) (e.g., `pyarrow.RecordBatchReader`, `pyarrow.ChunkedArray`)."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 64,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "<nanoarrow.c_lib.CArrayStream>\n",
-       "- get_schema(): struct<some_column: int32>"
+       "<nanoarrow.c_array_stream.CArrayStream>\n",
+       "- get_schema(): struct<col1: int64>"
       ]
      },
-     "execution_count": 8,
+     "execution_count": 64,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "pa_array_child = pa.array([1, 2, 3], pa.int32())\n",
-    "pa_array = pa.record_batch([pa_array_child], names=[\"some_column\"])\n",
-    "reader = pa.RecordBatchReader.from_batches(pa_array.schema, [pa_array])\n",
+    "pa_batch = pa.record_batch({\"col1\": [1, 2, 3]})\n",
+    "reader = pa.RecordBatchReader.from_batches(pa_batch.schema, [pa_batch])\n",
     "array_stream = na.c_array_stream(reader)\n",
     "array_stream"
    ]
@@ -328,25 +604,25 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 65,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "<nanoarrow.c_lib.CArray struct<some_column: int32>>\n",
+      "<nanoarrow.c_array.CArray struct<col1: int64>>\n",
       "- length: 3\n",
       "- offset: 0\n",
       "- null_count: 0\n",
       "- buffers: (0,)\n",
       "- dictionary: NULL\n",
       "- children[1]:\n",
-      "  'some_column': <nanoarrow.c_lib.CArray int32>\n",
+      "  'col1': <nanoarrow.c_array.CArray int64>\n",
       "    - length: 3\n",
       "    - offset: 0\n",
       "    - null_count: 0\n",
-      "    - buffers: (0, 3678035837056)\n",
+      "    - buffers: (0, 2642948588352)\n",
       "    - dictionary: NULL\n",
       "    - children[0]:\n"
      ]
@@ -358,34 +634,34 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "You can also get the address of a freshly-allocated stream to pass to a suitable exporting function:"
+    "Use `ArrayStream()` for a higher level interface:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 66,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "<nanoarrow.c_lib.CArrayStream>\n",
-       "- get_schema(): struct<some_column: int32>"
+       "nanoarrow.Array<non-nullable struct<col1: int64>>[3]\n",
+       "{'col1': 1}\n",
+       "{'col1': 2}\n",
+       "{'col1': 3}"
       ]
      },
-     "execution_count": 10,
+     "execution_count": 66,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "array_stream = na.allocate_c_array_stream()\n",
-    "reader._export_to_c(array_stream._addr())\n",
-    "array_stream"
+    "reader = pa.RecordBatchReader.from_batches(pa_batch.schema, [pa_batch])\n",
+    "na.ArrayStream(reader).read_all()"
    ]
   },
   {
@@ -408,11 +684,13 @@
     "\n",
     "```shell\n",
     "# Install dependencies\n",
-    "pip install -e .[test]\n",
+    "pip install -e \".[test]\"\n",
     "\n",
     "# Run tests\n",
     "pytest -vvx\n",
-    "```"
+    "```\n",
+    "\n",
+    "CMake is currently required to ensure that the vendored copy of nanoarrow in the Python package stays in sync with the nanoarrow sources in the working tree."
    ]
   }
  ],
@@ -432,7 +710,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.12.3"
   },
   "orig_nbformat": 4
  },
diff --git a/python/README.md b/python/README.md
index 42b4e390..f279a095 100644
--- a/python/README.md
+++ b/python/README.md
@@ -29,11 +29,19 @@ interfaces.
 
 ## Installation
 
-Python bindings for nanoarrow are not yet available on PyPI. You can install via
-URL (requires a C compiler):
+The nanoarrow Python bindings are available from [PyPI](https://pypi.org/) and
+[conda-forge](https://conda-forge.org/):
 
-```bash
-python -m pip install "git+https://github.com/apache/arrow-nanoarrow.git#egg=nanoarrow&subdirectory=python"
+```shell
+pip install nanoarrow
+conda install nanoarrow -c conda-forge
+```
+
+Development versions (based on the `main` branch) are also available:
+
+```shell
+pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ \
+    --prefer-binary --pre nanoarrow
 ```
 
 If you can import the namespace, you're good to go!
@@ -43,72 +51,207 @@ If you can import the namespace, you're good to go!
 import nanoarrow as na
 ```
 
-## Low-level C library bindings
+## Data types, arrays, and array streams
 
-The Arrow C Data and Arrow C Stream interfaces are comprised of three structures: the `ArrowSchema` which represents a data type of an array, the `ArrowArray` which represents the values of an array, and an `ArrowArrayStream`, which represents zero or more `ArrowArray`s with a common `ArrowSchema`.
+The Arrow C Data and Arrow C Stream interfaces are comprised of three structures: the `ArrowSchema` which represents a data type of an array, the `ArrowArray` which represents the values of an array, and an `ArrowArrayStream`, which represents zero or more `ArrowArray`s with a common `ArrowSchema`. These concepts map to the `nanoarrow.Schema`, `nanoarrow.Array`, and `nanoarrow.ArrayStream` in the Python package.
+
+
+```python
+na.int32()
+```
+
+
+
+
+    <Schema> int32
 
-### Schemas
 
-Use `nanoarrow.c_schema()` to convert an object to an `ArrowSchema` and wrap it as a Python object. This works for any object implementing the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface.html) (e.g., `pyarrow.Schema`, `pyarrow.DataType`, and `pyarrow.Field`).
+
+
+```python
+na.Array([1, 2, 3], na.int32())
+```
+
+
+
+
+    nanoarrow.Array<int32>[3]
+    1
+    2
+    3
+
+
+
+The `nanoarrow.Array` can accommodate arrays with any number of chunks, reflecting the reality that many array containers (e.g., `pyarrow.ChunkedArray`, `polars.Series`) support this.
+
+
+```python
+chunked = na.Array.from_chunks([[1, 2, 3], [4, 5, 6]], na.int32())
+chunked
+```
+
+
+
+
+    nanoarrow.Array<int32>[6]
+    1
+    2
+    3
+    4
+    5
+    6
+
+
+
+Whereas chunks of an `Array` are always fully materialized when the object is constructed, the chunks of an `ArrayStream` have not necessarily been resolved yet.
+
+
+```python
+stream = na.ArrayStream(chunked)
+stream
+```
+
+
+
+
+    nanoarrow.ArrayStream<int32>
+
+
+
+
+```python
+with stream:
+    for chunk in stream:
+        print(chunk)
+```
+
+    nanoarrow.Array<int32>[3]
+    1
+    2
+    3
+    nanoarrow.Array<int32>[3]
+    4
+    5
+    6
+
+
+The `nanoarrow.ArrayStream` also provides an interface to nanoarrow's [Arrow IPC](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc) reader:
+
+
+```python
+url = "https://github.com/apache/arrow-experiments/raw/main/data/arrow-commits/arrow-commits.arrows"
+na.ArrayStream.from_url(url)
+```
+
+
+
+
+    nanoarrow.ArrayStream<non-nullable struct<commit: string, time: timestamp('us', 'UTC'), files: int3...>
+
+
+
+These objects implement the [Arrow PyCapsule interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html) for both producing and consuming and are interchangeable with `pyarrow` objects in many cases:
 
 
 ```python
 import pyarrow as pa
-schema = na.c_schema(pa.decimal128(10, 3))
-schema
+
+pa.field(na.int32())
 ```
 
 
 
 
-    <nanoarrow.c_lib.CSchema decimal128(10, 3)>
-    - format: 'd:10,3'
-    - name: ''
-    - flags: 2
-    - metadata: NULL
-    - dictionary: NULL
-    - children[0]:
+    pyarrow.Field<: int32>
+
+
+
+
+```python
+pa.chunked_array(chunked)
+```
+
+
+
+
+    <pyarrow.lib.ChunkedArray object at 0x12a49a250>
+    [
+      [
+        1,
+        2,
+        3
+      ],
+      [
+        4,
+        5,
+        6
+      ]
+    ]
+
+
+
+
+```python
+pa.array(chunked.chunk(1))
+```
+
+
+
+
+    <pyarrow.lib.Int32Array object at 0x11b552500>
+    [
+      4,
+      5,
+      6
+    ]
+
+
+
+
+```python
+na.Array(pa.array([10, 11, 12]))
+```
+
+
+
 
+    nanoarrow.Array<int64>[3]
+    10
+    11
+    12
 
 
-You can extract the fields of a `CSchema` object one at a time or parse it into a view to extract deserialized parameters.
 
 
 ```python
-na.c_schema_view(schema)
+na.Schema(pa.string())
 ```
 
 
 
 
-    <nanoarrow.c_lib.CSchemaView>
-    - type: 'decimal128'
-    - storage_type: 'decimal128'
-    - decimal_bitwidth: 128
-    - decimal_precision: 10
-    - decimal_scale: 3
-    - dictionary_ordered: False
-    - map_keys_sorted: False
-    - nullable: True
-    - storage_type_id: 24
-    - type_id: 24
+    <Schema> string
+
+
+
+## Low-level C library bindings
 
+The nanoarrow Python package also provides lower level wrappers around Arrow C interface structures. You can create these using `nanoarrow.c_schema()`, `nanoarrow.c_array()`, and `nanoarrow.c_array_stream()`.
 
+### Schemas
 
-Advanced users can allocate an empty `CSchema` and populate its contents by passing its `._addr()` to a schema-exporting function.
+Use `nanoarrow.c_schema()` to convert an object to an `ArrowSchema` and wrap it as a Python object. This works for any object implementing the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface.html) (e.g., `pyarrow.Schema`, `pyarrow.DataType`, and `pyarrow.Field`).
 
 
 ```python
-schema = na.allocate_c_schema()
-pa.int32()._export_to_c(schema._addr())
-schema
+na.c_schema(pa.decimal128(10, 3))
 ```
 
 
 
 
-    <nanoarrow.c_lib.CSchema int32>
-    - format: 'i'
+    <nanoarrow.c_schema.CSchema decimal128(10, 3)>
+    - format: 'd:10,3'
     - name: ''
     - flags: 2
     - metadata: NULL
@@ -117,6 +260,21 @@ schema
 
 
 
+Using `c_schema()` is a good fit for testing and for ephemeral schema objects that are being passed from one library to another. To extract the fields of a schema in a more convenient form, use `Schema()`:
+
+
+```python
+schema = na.Schema(pa.decimal128(10, 3))
+schema.precision, schema.scale
+```
+
+
+
+
+    (10, 3)
+
+
+
 The `CSchema` object cleans up after itself: when the object is deleted, the underlying `ArrowSchema` is released.
 
 ### Arrays
@@ -125,72 +283,83 @@ You can use `nanoarrow.c_array()` to convert an array-like object to an `ArrowAr
 
 
 ```python
-array = na.c_array(pa.array(["one", "two", "three", None]))
-array
+na.c_array(["one", "two", "three", None], na.string())
 ```
 
 
 
 
-    <nanoarrow.c_lib.CArray string>
+    <nanoarrow.c_array.CArray string>
     - length: 4
     - offset: 0
     - null_count: 1
-    - buffers: (3678035706048, 3678035705984, 3678035706112)
+    - buffers: (4754305168, 4754307808, 4754310464)
     - dictionary: NULL
     - children[0]:
 
 
 
-You can extract the fields of a `CArray` one at a time or parse it into a view to extract deserialized content:
+Using `c_array()` is a good fit for testing and for ephemeral array objects that are being passed from one library to another. For a higher level interface, use `Array()`:
 
 
 ```python
-na.c_array_view(array)
+array = na.Array(["one", "two", "three", None], na.string())
+array.to_pylist()
 ```
 
 
 
 
-    <nanoarrow.c_lib.CArrayView>
-    - storage_type: 'string'
-    - length: 4
-    - offset: 0
-    - null_count: 1
-    - buffers[3]:
-      - validity <bool[1 b] 11100000>
-      - data_offset <int32[20 b] 0 3 6 11 11>
-      - data <string[11 b] b'onetwothree'>
-    - dictionary: NULL
-    - children[0]:
+    ['one', 'two', 'three', None]
+
+
+
+
+```python
+array.buffers
+```
+
+
 
 
+    (nanoarrow.c_lib.CBufferView(bool[1 b] 11100000),
+     nanoarrow.c_lib.CBufferView(int32[20 b] 0 3 6 11 11),
+     nanoarrow.c_lib.CBufferView(string[11 b] b'onetwothree'))
 
-Like the `CSchema`, you can allocate an empty one and access its address with `_addr()` to pass to other array-exporting functions.
+
+
+Advanced users can create arrays directly from buffers using `c_array_from_buffers()`:
 
 
 ```python
-array = na.allocate_c_array()
-pa.array([1, 2, 3])._export_to_c(array._addr(), array.schema._addr())
-array.length
+na.c_array_from_buffers(
+    na.string(),
+    2,
+    [None, na.c_buffer([0, 3, 6], na.int32()), b"abcdef"]
+)
 ```
 
 
 
 
-    3
+    <nanoarrow.c_array.CArray string>
+    - length: 2
+    - offset: 0
+    - null_count: 0
+    - buffers: (0, 5002908320, 4999694624)
+    - dictionary: NULL
+    - children[0]:
 
 
 
 ### Array streams
 
-You can use `nanoarrow.c_array_stream()` to wrap an object representing a sequence of `CArray`s with a common `CSchema` to an `ArrowArrayStream` and wrap it as a Python object. This works for any object implementing the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface.html) (e.g., `pyarrow.RecordBatchReader`).
+You can use `nanoarrow.c_array_stream()` to wrap an object representing a sequence of `CArray`s with a common `CSchema` to an `ArrowArrayStream` and wrap it as a Python object. This works for any object implementing the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface.html) (e.g., `pyarrow.RecordBatchReader`, `pyarrow.ChunkedArray`).
 
 
 ```python
-pa_array_child = pa.array([1, 2, 3], pa.int32())
-pa_array = pa.record_batch([pa_array_child], names=["some_column"])
-reader = pa.RecordBatchReader.from_batches(pa_array.schema, [pa_array])
+pa_batch = pa.record_batch({"col1": [1, 2, 3]})
+reader = pa.RecordBatchReader.from_batches(pa_batch.schema, [pa_batch])
 array_stream = na.c_array_stream(reader)
 array_stream
 ```
@@ -198,8 +367,8 @@ array_stream
 
 
 
-    <nanoarrow.c_lib.CArrayStream>
-    - get_schema(): struct<some_column: int32>
+    <nanoarrow.c_array_stream.CArrayStream>
+    - get_schema(): struct<col1: int64>
 
 
 
@@ -211,36 +380,37 @@ for array in array_stream:
     print(array)
 ```
 
-    <nanoarrow.c_lib.CArray struct<some_column: int32>>
+    <nanoarrow.c_array.CArray struct<col1: int64>>
     - length: 3
     - offset: 0
     - null_count: 0
     - buffers: (0,)
     - dictionary: NULL
     - children[1]:
-      'some_column': <nanoarrow.c_lib.CArray int32>
+      'col1': <nanoarrow.c_array.CArray int64>
         - length: 3
         - offset: 0
         - null_count: 0
-        - buffers: (0, 3678035837056)
+        - buffers: (0, 2642948588352)
         - dictionary: NULL
         - children[0]:
 
 
-You can also get the address of a freshly-allocated stream to pass to a suitable exporting function:
+Use `ArrayStream()` for a higher level interface:
 
 
 ```python
-array_stream = na.allocate_c_array_stream()
-reader._export_to_c(array_stream._addr())
-array_stream
+reader = pa.RecordBatchReader.from_batches(pa_batch.schema, [pa_batch])
+na.ArrayStream(reader).read_all()
 ```
 
 
 
 
-    <nanoarrow.c_lib.CArrayStream>
-    - get_schema(): struct<some_column: int32>
+    nanoarrow.Array<non-nullable struct<col1: int64>>[3]
+    {'col1': 1}
+    {'col1': 2}
+    {'col1': 3}
 
 
 
@@ -264,3 +434,5 @@ pip install -e ".[test]"
 # Run tests
 pytest -vvx
 ```
+
+CMake is currently required to ensure that the vendored copy of nanoarrow in the Python package stays in sync with the nanoarrow sources in the working tree.
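
The `array.buffers` output and the `c_array_from_buffers()` call in the README diff above both rely on the Arrow string layout: a validity bitmap, an int32 offsets buffer, and a data buffer. The following is a minimal pure-Python sketch of how those three buffers decode into values. It uses only the standard library and is not nanoarrow's implementation; `decode_utf8_array` is a hypothetical helper name. It assumes little-endian int32 offsets, a validity bitmap packed least-significant-bit first, and that the `11100000` shown in the README lists the bits in element order (so the underlying byte is `0b00000111`).

```python
import struct


def decode_utf8_array(validity, offsets, data, length):
    """Decode an Arrow-style string array from its three buffers (illustrative only)."""
    # A string array of `length` elements has length + 1 int32 offsets.
    starts = struct.unpack(f"<{length + 1}i", offsets)
    values = []
    for i in range(length):
        # A missing validity buffer (None) means every element is valid;
        # otherwise element i is valid when bit (i % 8) of byte (i // 8) is set.
        is_valid = validity is None or (validity[i // 8] >> (i % 8)) & 1
        if is_valid:
            values.append(data[starts[i]:starts[i + 1]].decode("utf-8"))
        else:
            values.append(None)
    return values


# Buffers corresponding to ["one", "two", "three", None]
validity = bytes([0b00000111])
offsets = struct.pack("<5i", 0, 3, 6, 11, 11)
data = b"onetwothree"
print(decode_utf8_array(validity, offsets, data, 4))

# Buffers corresponding to the c_array_from_buffers() example (no validity buffer)
print(decode_utf8_array(None, struct.pack("<3i", 0, 3, 6), b"abcdef", 2))
```

The second call mirrors the `[None, na.c_buffer([0, 3, 6], na.int32()), b"abcdef"]` argument in the README: offsets 0, 3, and 6 split the six data bytes into two three-byte strings.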
