(arrow-datafusion-python) branch upgrade-to-support-311 updated (270bb89 -> b734332)

2023-10-29 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from 270bb89  upgrade
 add 501acff  Allow for multiple input files per table instead of a single file (#519)
 add c6a7af5  Add support for window function bindings (#521)
 add d7fcea2  small clippy fix (#524)
 add fc3c24b  Prepare 32.0.0 Release (#525)
 add aaaeeb1  First pass at getting architectured builds working (#350)
 add da6c183  Remove libprotobuf dep (#527)
 add b734332  Merge branch 'main' into upgrade-to-support-311

No new revisions were added by this update.

Summary of changes:
 .github/workflows/conda.yml|  92 ---
 CHANGELOG.md   |  38 -
 Cargo.lock | 229 -
 Cargo.toml |   3 +-
 docs/build.sh => conda/recipes/bld.bat |  14 +-
 conda/recipes/build.sh |  84 ++
 conda/recipes/meta.yaml|  31 +++-
 datafusion/__init__.py |   4 +-
 datafusion/input/location.py   |  10 +-
 datafusion/tests/test_input.py |   2 +-
 pyproject.toml |   1 +
 src/common/data_type.rs|  18 ++
 src/common/schema.rs   |  16 +-
 src/expr.rs|  13 ++
 src/expr/window.rs | 294 +
 src/functions.rs   |   4 +-
 src/lib.rs |  15 +-
 src/sql/logical.rs |   8 +-
 src/window_frame.rs| 110 
 19 files changed, 700 insertions(+), 286 deletions(-)
 copy docs/build.sh => conda/recipes/bld.bat (72%)
 mode change 100755 => 100644
 create mode 100644 conda/recipes/build.sh
 create mode 100644 src/expr/window.rs
 delete mode 100644 src/window_frame.rs



[arrow-datafusion-python] branch upgrade-to-support-311 updated (91e8f3e -> 270bb89)

2023-10-18 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from 91e8f3e  format with black
 add 270bb89  upgrade

No new revisions were added by this update.

Summary of changes:
 requirements.txt | 308 ---
 1 file changed, 155 insertions(+), 153 deletions(-)



[arrow-datafusion-python] branch upgrade-to-support-311 updated (1a18099 -> 91e8f3e)

2023-10-18 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from 1a18099  upgrade to support python 3.11
 add 91e8f3e  format with black

No new revisions were added by this update.

Summary of changes:
 benchmarks/db-benchmark/groupby-datafusion.py | 16 ++
 benchmarks/db-benchmark/join-datafusion.py| 24 ++--
 benchmarks/tpch/tpch.py   |  4 +-
 datafusion/input/base.py  |  8 +--
 datafusion/input/location.py  |  4 +-
 datafusion/tests/test_aggregation.py  | 34 +++
 datafusion/tests/test_context.py  |  8 +--
 datafusion/tests/test_dataframe.py|  8 +--
 datafusion/tests/test_functions.py| 81 ++-
 datafusion/tests/test_input.py|  4 +-
 datafusion/tests/test_substrait.py|  4 +-
 dev/release/check-rat-report.py   |  4 +-
 dev/release/generate-changelog.py |  8 +--
 examples/sql-on-polars.py |  4 +-
 examples/sql-using-python-udaf.py |  8 +--
 examples/substrait.py |  8 +--
 16 files changed, 52 insertions(+), 175 deletions(-)



[arrow-datafusion-python] branch upgrade-to-support-311 updated (61cfee5 -> 1a18099)

2023-10-18 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


omit 61cfee5  upgrade to support python 3.11
 add 3711c73  Add Python script for generating changelog (#383)
 add 82b4a95  Update for DataFusion 25.0.0 (#386)
 add d912db5  Add Expr::Case when_then_else support to rex_call_operands function (#388)
 add 5664a1e  Introduce BaseSessionContext abstract class (#390)
 add 931cabc  CRUD Schema support for `BaseSessionContext` (#392)
 add 51158bd  CRUD Table support for `BaseSessionContext` (#394)
 add 1174969  Prepare for 26.0.0 release (#410)
 add c0be61b  LogicalPlan.to_variant() make public (#412)
 add 3f81513  Prepare 27.0.0 release (#423)
 add 58bdbd8  File based input utils (#433)
 add 5793db3  Upgrade to 28.0.0-rc1 (#434)
 add 93f8063  Introduces utility for obtaining SqlTable information from a file like location (#398)
 add 309fc48  feat: expose offset in python API (#437)
 add ffd1541  Use DataFusion 28 (#439)
 add 92ca34b  Build Linux aarch64 wheel (#443)
 add 1fde8e4  feat: add case function (#447) (#448)
 add e34d203  enhancement(docs): Add user guide (#432) (#445)
 add 37c91f4  docs: include pre-commit hooks section in contributor guide (#455)
 add e1b3740  feat: add compression options (#456)
 add 0b22c97  Upgrade to DF 28.0.0-rc1 (#457)
 add 217ede8  feat: add register_json (#458)
 add 499f045  feat: add basic compression configuration to write_parquet (#459)
 add 9c643bf  feat: add example of reading parquet from s3 (#460)
 add e24dc75  feat: add register_avro and read_table (#461)
 add bc62aaf  feat: add missing scalar math functions (#465)
 add 944b1c9  build(deps): bump arduino/setup-protoc from 1 to 2 (#452)
 add b4d383b  Revert "build(deps): bump arduino/setup-protoc from 1 to 2 (#452)" (#474)
 add 0d7c19e  Minor: fix wrongly copied function description (#497)
 add af4f758  Upgrade to Datafusion 31.0.0 (#491)
 add beabf26  Add `isnan` and `iszero` (#495)
 add a47712e  Update CHANGELOG and run cargo update (#500)
 add 41d65d1  Improve release process documentation (#505)
 add 106786a  add Binary String Functions (#494)
 add c574d68  build(deps): bump mimalloc from 0.1.38 to 0.1.39 (#502)
 add 31241f8  build(deps): bump syn from 2.0.32 to 2.0.35 (#503)
 add 9ef0a57  build(deps): bump syn from 2.0.35 to 2.0.37 (#506)
 add 8e430ab  Use latest DataFusion (#511)
 add 4c7b14c  add bit_and,bit_or,bit_xor,bool_add,bool_or (#496)
 add 804d0eb  Use DataFusion 32 (#515)
 add a91188c  add first_value last_value (#498)
 add 484ed11  build(deps): bump regex-syntax from 0.7.5 to 0.8.1 (#517)
 add c4675b7  build(deps): bump pyo3-build-config from 0.19.2 to 0.20.0 (#516)
 add 5ec45dd  add regr_* functions (#499)
 add 399fa75  feat: expose PyWindowFrame (#509)
 add c2768d8  Add random missing bindings (#522)
 add 59140f2  build(deps): bump rustix from 0.38.18 to 0.38.19 (#523)
 add 1a18099  upgrade to support python 3.11

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (61cfee5)
\
 N -- N -- N   refs/heads/upgrade-to-support-311 (1a18099)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.
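
For readers less familiar with this failure mode, a minimal sketch of how such a
history arises (branch and commit names here are hypothetical, not taken from the
emails above):

    # Publish commits on top of the shared base B; these become the O revisions.
    git checkout -b my-topic origin/main
    git commit --allow-empty -m "O1"
    git commit --allow-empty -m "O2"
    git push origin my-topic

    # Rewrite the branch back to B and publish different commits (the N revisions).
    git reset --hard origin/main
    git commit --allow-empty -m "N1"
    git push --force origin my-topic
    # The O commits survive in the repository only while some other reference
    # (another branch, a tag, a pull request head) still points at them.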

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build.yml|   30 +
 .gitignore |8 +-
 CHANGELOG.md   |   93 +-
 Cargo.lock | 2247 +++-
 Cargo.toml |   24 +-
 README.md  |6 +-
 datafusion/__init__.py |2 +
 datafusion/context.py  |  142 ++
 datafusion/cudf.py |   54 +-
 .../rust_fmt.sh => datafusion/input/__init__.py|9 +-
 datafusion/{tests/conftest.py => input/base.py}|   49 +-
 datafusion/input/location.py   |   87 +
 datafusion/pandas.py   |   44 +-
 datafusion/polars.py   |   36 +-
 datafusion/tests/test_aggregation.py   |   70 

[arrow-datafusion] branch Jimexist-patch-1 created (now d8ce32ee1d)

2023-09-16 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch Jimexist-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


  at d8ce32ee1d Create codeql.yml

This branch includes the following new commits:

 new d8ce32ee1d Create codeql.yml

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
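
The distinction is easy to reproduce locally (branch name hypothetical; the commit
id is the one referenced above): an "add" only moves a reference onto an object the
repository already stores, while a "new" revision uploads an object no reference
contained before.

    # "add": create and push a branch pointing at an existing commit;
    # no new objects enter the repository, only a new reference.
    git branch my-copy d8ce32ee1d
    git push origin my-copy

    # "new": push a commit that no other reference contains yet.
    git checkout -b my-topic
    git commit --allow-empty -m "brand new work"
    git push origin my-topic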




[arrow-datafusion] 01/01: Create codeql.yml

2023-09-16 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch Jimexist-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

commit d8ce32ee1d7274460c3a0e3866d5816272bb24f5
Author: Jiayu Liu 
AuthorDate: Sat Sep 16 22:40:10 2023 +0800

Create codeql.yml
---
 .github/workflows/codeql.yml | 82 
 1 file changed, 82 insertions(+)

diff --git a/.github/workflows/codeql.yml b/.github/workflows/codeql.yml
new file mode 100644
index 00..31bc0a5810
--- /dev/null
+++ b/.github/workflows/codeql.yml
@@ -0,0 +1,82 @@
+# For most projects, this workflow file will not need changing; you simply need
+# to commit it to your repository.
+#
+# You may wish to alter this file to override the set of languages analyzed,
+# or to provide custom queries or build logic.
+#
+# ******** NOTE ********
+# We have attempted to detect the languages in your repository. Please check
+# the `language` matrix defined below to confirm you have the correct set of
+# supported CodeQL languages.
+#
+name: "CodeQL"
+
+on:
+  push:
+    branches: [ "main" ]
+  pull_request:
+    # The branches below must be a subset of the branches above
+    branches: [ "main" ]
+  schedule:
+    - cron: '19 10 * * 6'
+
+jobs:
+  analyze:
+    name: Analyze
+    # Runner size impacts CodeQL analysis time. To learn more, please see:
+    #   - https://gh.io/recommended-hardware-resources-for-running-codeql
+    #   - https://gh.io/supported-runners-and-hardware-resources
+    #   - https://gh.io/using-larger-runners
+    # Consider using larger runners for possible analysis time improvements.
+    runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
+    timeout-minutes: ${{ (matrix.language == 'swift' && 120) || 360 }}
+    permissions:
+      actions: read
+      contents: read
+      security-events: write
+
+    strategy:
+      fail-fast: false
+      matrix:
+        language: [ 'python' ]
+        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby', 'swift' ]
+        # Use only 'java' to analyze code written in Java, Kotlin or both
+        # Use only 'javascript' to analyze code written in JavaScript, TypeScript or both
+        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support
+
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v3
+
+    # Initializes the CodeQL tools for scanning.
+    - name: Initialize CodeQL
+      uses: github/codeql-action/init@v2
+      with:
+        languages: ${{ matrix.language }}
+        # If you wish to specify custom queries, you can do so here or in a config file.
+        # By default, queries listed here will override any specified in a config file.
+        # Prefix the list here with "+" to use these queries and those in the config file.
+
+        # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
+        # queries: security-extended,security-and-quality
+
+    # Autobuild attempts to build any compiled languages (C/C++, C#, Go, Java, or Swift).
+    # If this step fails, then you should remove it and run the build manually (see below)
+    - name: Autobuild
+      uses: github/codeql-action/autobuild@v2
+
+    # ℹ️ Command-line programs to run using the OS shell.
+    # See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
+
+    # If the Autobuild fails above, remove it and uncomment the following three lines
+    # and modify them (or add more) to build your code if your project uses a compiled language.
+
+    # - run: |
+    #     echo "Run, Build Application using script"
+    #     ./location_of_script_within_repo/buildscript.sh
+
+    - name: Perform CodeQL Analysis
+      uses: github/codeql-action/analyze@v2
+      with:
+        category: "/language:${{matrix.language}}"



[arrow-datafusion] branch add-greatest-least updated (8f6812626c -> 15bc806f0f)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


 discard 8f6812626c fix coerce tests
 discard 6a4a04dbb0 coerce rules
 discard f8996c29fe add more unit tests
 discard 3ff0dd47c5 fix issue
 discard 74fb1fa8ad [built-in function] add greatest and least
 add 5ddcbc42c1 Resolve contradictory requirements by conversion of ordering sensitive aggregators (#6482)
 add 815413c4a4 fix: ignore panics if racing against catalog/schema changes (#6536)
 add 15bc806f0f [built-in function] add greatest and least

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (8f6812626c)
\
 N -- N -- N   refs/heads/add-greatest-least (15bc806f0f)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 datafusion/core/src/catalog/information_schema.rs  |  71 +++--
 datafusion/core/src/execution/context.rs   |   5 +-
 .../core/src/physical_plan/aggregates/mod.rs   | 338 +++--
 .../tests/sqllogictests/test_files/aggregate.slt   |   4 +-
 .../tests/sqllogictests/test_files/explain.slt |   2 +-
 .../tests/sqllogictests/test_files/groupby.slt | 205 +
 .../physical-expr/src/aggregate/first_last.rs  |  14 +-
 datafusion/physical-expr/src/aggregate/mod.rs  |  11 +
 datafusion/physical-expr/src/lib.rs|   6 +-
 datafusion/physical-expr/src/sort_expr.rs  |   5 +-
 datafusion/physical-expr/src/type_coercion.rs  |   5 +-
 datafusion/physical-expr/src/utils.rs  |  38 +++
 datafusion/physical-expr/src/window/aggregate.rs   |   6 +-
 datafusion/physical-expr/src/window/built_in.rs|   4 +-
 .../physical-expr/src/window/sliding_aggregate.rs  |   6 +-
 datafusion/physical-expr/src/window/window_expr.rs |  13 -
 16 files changed, 575 insertions(+), 158 deletions(-)



[arrow-datafusion] branch add-greatest-least updated (6a4a04dbb0 -> 8f6812626c)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


from 6a4a04dbb0 coerce rules
 add 8f6812626c fix coerce tests

No new revisions were added by this update.

Summary of changes:
 datafusion/physical-expr/src/type_coercion.rs | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)



[arrow-datafusion] branch add-greatest-least updated (f8996c29fe -> 6a4a04dbb0)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


from f8996c29fe add more unit tests
 add 6a4a04dbb0 coerce rules

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 52 +-
 datafusion/physical-expr/src/type_coercion.rs  |  2 +-
 2 files changed, 36 insertions(+), 18 deletions(-)



[arrow-datafusion] branch add-greatest-least updated (3ff0dd47c5 -> f8996c29fe)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


from 3ff0dd47c5 fix issue
 add f8996c29fe add more unit tests

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)



[arrow-datafusion] branch add-greatest-least updated (74fb1fa8ad -> 3ff0dd47c5)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


from 74fb1fa8ad [built-in function] add greatest and least
 add 3ff0dd47c5 fix issue

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[arrow-datafusion] branch add-greatest-least updated (1b4e6c862d -> 74fb1fa8ad)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


 discard 1b4e6c862d fi
 discard 8a300c6d67 [built-in function] add greatest and least
 add 74fb1fa8ad [built-in function] add greatest and least

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (1b4e6c862d)
\
 N -- N -- N   refs/heads/add-greatest-least (74fb1fa8ad)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 3 +++
 1 file changed, 3 insertions(+)



[arrow-datafusion] branch add-greatest-least updated (8a300c6d67 -> 1b4e6c862d)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


from 8a300c6d67 [built-in function] add greatest and least
 add 1b4e6c862d fi

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 45 --
 1 file changed, 36 insertions(+), 9 deletions(-)



[arrow-datafusion] branch add-greatest-least updated (011d02afe0 -> 8a300c6d67)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


 discard 011d02afe0 Update datafusion/expr/src/type_coercion/functions.rs
 discard 30b575a4df [built-in function] add greatest and least
 add 8a300c6d67 [built-in function] add greatest and least

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (011d02afe0)
\
 N -- N -- N   refs/heads/add-greatest-least (8a300c6d67)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 10 ++
 1 file changed, 10 insertions(+)



[arrow-datafusion] branch add-greatest-least updated (30b575a4df -> 011d02afe0)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


from 30b575a4df [built-in function] add greatest and least
 add 011d02afe0 Update datafusion/expr/src/type_coercion/functions.rs

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[arrow-datafusion] branch add-greatest-least updated (85bda832ed -> 30b575a4df)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


 discard 85bda832ed fix unit test
 discard e6de9eb51f simplify code
 discard fe107096f6 variadic equal
 discard 538e085f42 [built-in function] add greatest and least
 add 30b575a4df [built-in function] add greatest and least

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (85bda832ed)
\
 N -- N -- N   refs/heads/add-greatest-least (30b575a4df)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)



[arrow-datafusion] branch add-greatest-least updated (e6de9eb51f -> 85bda832ed)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


from e6de9eb51f simplify code
 add 85bda832ed fix unit test

No new revisions were added by this update.

Summary of changes:
 datafusion/physical-expr/src/type_coercion.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[arrow-datafusion] branch add-greatest-least updated (538e085f42 -> e6de9eb51f)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


from 538e085f42 [built-in function] add greatest and least
 add fe107096f6 variadic equal
 add e6de9eb51f simplify code

No new revisions were added by this update.

Summary of changes:
 Cargo.toml |  2 +-
 datafusion/expr/src/function.rs|  9 ---
 datafusion/expr/src/function_err.rs|  2 +-
 datafusion/expr/src/signature.rs   | 11 
 datafusion/expr/src/type_coercion/functions.rs | 19 +-
 datafusion/physical-expr/Cargo.toml|  2 +-
 .../physical-expr/src/comparison_expressions.rs| 30 ++
 7 files changed, 47 insertions(+), 28 deletions(-)



[arrow-datafusion] branch add-greatest-least updated (4a2c33829e -> 538e085f42)

2023-06-03 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


 discard 4a2c33829e fix unit test
 discard a567b85f86 fix clippy
 discard b6f9f35b96 [built-in function] add greatest and least
 add 3466522779 Minor: Clean up `use`s to point at real crates (#6515)
 add 21a14a1af3 Standardize RUST_LOG configuration test setup (#6506)
 add e6af36a540 Fix new clippy lint (#6535)
 add d9d06a4433 feat: datafusion-cli support executes sql with escaped characters (#6498)
 add 5ec14e1757 Minor: Add EXCEPT/EXCLUDE to SQL guide (#6512)
 add 859251b4a2 fix: error instead of panic when date_bin interval is 0 (#6522)
 add d450dc1dca Add link to Python Bindings (#6532)
 add 9d22054a1f feat: fix docs (#6534)
 add 538e085f42 [built-in function] add greatest and least

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (4a2c33829e)
\
 N -- N -- N   refs/heads/add-greatest-least (538e085f42)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 Cargo.toml |   2 +
 README.md  |   2 +-
 datafusion-cli/Cargo.lock  |  12 +-
 datafusion-cli/Cargo.toml  |   2 +-
 datafusion-cli/src/exec.rs |  26 ++--
 datafusion-cli/src/helper.rs   | 160 ++---
 datafusion/core/src/catalog/information_schema.rs  |   2 +-
 datafusion/core/src/catalog/listing_schema.rs  |  16 ++-
 datafusion/core/src/execution/mod.rs   |   8 +-
 .../aggregates/bounded_aggregate_stream.rs |  12 +-
 .../core/src/physical_plan/aggregates/mod.rs   |   7 +-
 .../src/physical_plan/aggregates/no_grouping.rs|   4 +-
 .../core/src/physical_plan/aggregates/row_hash.rs  |   9 +-
 datafusion/core/src/physical_plan/analyze.rs   |   2 +-
 .../core/src/physical_plan/coalesce_batches.rs |   2 +-
 .../core/src/physical_plan/coalesce_partitions.rs  |   2 +-
 datafusion/core/src/physical_plan/common.rs|   2 +-
 datafusion/core/src/physical_plan/empty.rs |   2 +-
 datafusion/core/src/physical_plan/explain.rs   |   2 +-
 .../core/src/physical_plan/file_format/avro.rs |   2 +-
 .../core/src/physical_plan/file_format/csv.rs  |   2 +-
 .../core/src/physical_plan/file_format/json.rs |   2 +-
 .../core/src/physical_plan/file_format/mod.rs  |   2 +-
 .../core/src/physical_plan/file_format/parquet.rs  |   2 +-
 datafusion/core/src/physical_plan/filter.rs|   2 +-
 datafusion/core/src/physical_plan/insert.rs|   2 +-
 .../core/src/physical_plan/joins/cross_join.rs |   6 +-
 .../src/physical_plan/joins/nested_loop_join.rs|   4 +-
 .../src/physical_plan/joins/sort_merge_join.rs |   6 +-
 .../src/physical_plan/joins/symmetric_hash_join.rs |   2 +-
 datafusion/core/src/physical_plan/limit.rs |   2 +-
 datafusion/core/src/physical_plan/memory.rs|   2 +-
 datafusion/core/src/physical_plan/mod.rs   |   2 +-
 datafusion/core/src/physical_plan/planner.rs   |   6 +-
 datafusion/core/src/physical_plan/projection.rs|   2 +-
 .../core/src/physical_plan/repartition/mod.rs  |   4 +-
 datafusion/core/src/physical_plan/sorts/sort.rs|  12 +-
 .../physical_plan/sorts/sort_preserving_merge.rs   |   2 +-
 datafusion/core/src/physical_plan/streaming.rs |   2 +-
 datafusion/core/src/physical_plan/union.rs |   2 +-
 datafusion/core/src/physical_plan/unnest.rs|   2 +-
 datafusion/core/src/physical_plan/values.rs|   2 +-
 .../windows/bounded_window_agg_exec.rs |   2 +-
 .../src/physical_plan/windows/window_agg_exec.rs   |   4 +-
 datafusion/core/tests/memory_limit.rs  |   1 +
 datafusion/core/tests/parquet/filter_pushdown.rs   |   7 -
 datafusion/core/tests/parquet/mod.rs   |   7 +
 datafusion/core/tests/sql/expr.rs  |  27 
 datafusion/core/tests/sql/subqueries.rs|   6 -
 datafusion/core/tests/sql_integration.rs   |   7 +
 .../tests/sqllogictests/test_files/timestamps.slt  |  21 ++-
 datafusion/execution/src/lib.rs|   2 +
 datafusion/expr/src/conditional_expressions.rs |   2 +-
 datafusion/ex

[arrow-rs] 01/01: add min and max kernel

2023-06-02 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch add-min-max-kernel
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit 0ee26ec52d2831168404f492c3ac20e04044fa63
Author: Jiayu Liu 
AuthorDate: Sat Jun 3 09:22:34 2023 +0800

add min and max kernel
---
 arrow-ord/src/lib.rs |  1 +
 arrow-ord/src/min_max.rs | 48 
 2 files changed, 49 insertions(+)

diff --git a/arrow-ord/src/lib.rs b/arrow-ord/src/lib.rs
index 62338c022..e1eec2c3c 100644
--- a/arrow-ord/src/lib.rs
+++ b/arrow-ord/src/lib.rs
@@ -44,6 +44,7 @@
 //!
 
 pub mod comparison;
+pub mod min_max;
 pub mod ord;
 pub mod partition;
 pub mod sort;
diff --git a/arrow-ord/src/min_max.rs b/arrow-ord/src/min_max.rs
new file mode 100644
index 0..1a34e9ddf
--- /dev/null
+++ b/arrow-ord/src/min_max.rs
@@ -0,0 +1,48 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Functions to get min and max across arrays and scalars
+
+/// Perform min operation on two dynamic [`Array`]s.
+///
+/// Only when two arrays are of the same type the comparison is valid.
+pub fn min_dyn(left: &dyn Array, right: &dyn Array) -> Result<ArrayRef, ArrowError> {
+    unimplemented!()
+}
+
+/// Perform max operation on two dynamic [`Array`]s.
+///
+/// Only when two arrays are of the same type the comparison is valid.
+pub fn max_dyn(left: &dyn Array, right: &dyn Array) -> Result<ArrayRef, ArrowError> {
+    unimplemented!()
+}
+
+/// Perform min operation on a dynamic [`Array`] and a scalar value.
+pub fn min_dyn_scalar<T>(left: &dyn Array, right: T) -> Result<ArrayRef, ArrowError>
+where
+    T: num::ToPrimitive + std::fmt::Debug,
+{
+    unimplemented!()
+}
+
+/// Perform max operation on a dynamic [`Array`] and a scalar value.
+pub fn max_dyn_scalar<T>(left: &dyn Array, right: T) -> Result<ArrayRef, ArrowError>
+where
+    T: num::ToPrimitive + std::fmt::Debug,
+{
+    unimplemented!()
+}



[arrow-rs] branch add-min-max-kernel created (now 0ee26ec52)

2023-06-02 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-min-max-kernel
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


  at 0ee26ec52 add min and max kernel

This branch includes the following new commits:

 new 0ee26ec52 add min and max kernel

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[arrow-datafusion] branch add-greatest-least updated (a567b85f86 -> 4a2c33829e)

2023-06-02 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


from a567b85f86 fix clippy
 add 4a2c33829e fix unit test

No new revisions were added by this update.

Summary of changes:
 datafusion/core/tests/sql/expr.rs | 2 ++
 1 file changed, 2 insertions(+)



[arrow-datafusion] branch add-greatest-least updated (b6f9f35b96 -> a567b85f86)

2023-06-02 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


from b6f9f35b96 [built-in function] add greatest and least
 add a567b85f86 fix clippy

No new revisions were added by this update.

Summary of changes:
 datafusion/physical-expr/src/comparison_expressions.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[arrow-datafusion] branch add-greatest-least updated (5214393d1f -> b6f9f35b96)

2023-06-02 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


omit 5214393d1f [built-in function] add greatest and least
 add b6f9f35b96 [built-in function] add greatest and least

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (5214393d1f)
\
 N -- N -- N   refs/heads/add-greatest-least (b6f9f35b96)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/function.rs| 9 +-
 .../physical-expr/src/comparison_expressions.rs|   175 +-
 datafusion/proto/proto/proto_descriptor.bin|   Bin 85877 -> 0 bytes
 datafusion/proto/src/datafusion.rs |  2820 ---
 datafusion/proto/src/datafusion.serde.rs   | 22781 ---
 datafusion/proto/src/logical_plan/from_proto.rs| 6 +-
 6 files changed, 109 insertions(+), 25682 deletions(-)
 delete mode 100644 datafusion/proto/proto/proto_descriptor.bin
 delete mode 100644 datafusion/proto/src/datafusion.rs
 delete mode 100644 datafusion/proto/src/datafusion.serde.rs



[arrow-datafusion] 01/01: [built-in function] add greatest and least

2023-06-02 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

commit 5214393d1fbf80fc7438fada6ff6563b949aeaf4
Author: Jiayu Liu 
AuthorDate: Fri Jun 2 15:52:38 2023 +0800

[built-in function] add greatest and least
---
 datafusion/core/tests/sql/expr.rs  |   6 +
 datafusion/expr/src/built_in_function.rs   |  10 ++
 datafusion/expr/src/comparison_expressions.rs  |  35 ++
 datafusion/expr/src/expr_fn.rs |  24 +++-
 datafusion/expr/src/function.rs|  15 ++-
 datafusion/expr/src/lib.rs |   1 +
 .../physical-expr/src/comparison_expressions.rs| 133 +
 datafusion/physical-expr/src/functions.rs  |   5 +-
 datafusion/physical-expr/src/lib.rs|   1 +
 datafusion/proto/proto/datafusion.proto|   2 +
 datafusion/proto/proto/proto_descriptor.bin| Bin 0 -> 85877 bytes
 .../src/{generated/prost.rs => datafusion.rs}  |   6 +
 .../{generated/pbjson.rs => datafusion.serde.rs}   |   6 +
 datafusion/proto/src/generated/pbjson.rs   |   6 +
 datafusion/proto/src/generated/prost.rs|   6 +
 datafusion/proto/src/logical_plan/from_proto.rs|  17 ++-
 datafusion/proto/src/logical_plan/to_proto.rs  |   2 +
 docs/source/user-guide/sql/sql_status.md   |   3 +
 18 files changed, 273 insertions(+), 5 deletions(-)

diff --git a/datafusion/core/tests/sql/expr.rs b/datafusion/core/tests/sql/expr.rs
index 6783670545..c432f62572 100644
--- a/datafusion/core/tests/sql/expr.rs
+++ b/datafusion/core/tests/sql/expr.rs
@@ -200,6 +200,12 @@ async fn binary_bitwise_shift() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn test_comparison_func_expressions() -> Result<()> {
+    test_expression!("greatest(1,2,3)", "3");
+    test_expression!("least(1,2,3)", "1");
+}
+
 #[tokio::test]
 async fn test_interval_expressions() -> Result<()> {
 // day nano intervals
diff --git a/datafusion/expr/src/built_in_function.rs b/datafusion/expr/src/built_in_function.rs
index 3911939b4c..d4ca93ba24 100644
--- a/datafusion/expr/src/built_in_function.rs
+++ b/datafusion/expr/src/built_in_function.rs
@@ -205,6 +205,10 @@ pub enum BuiltinScalarFunction {
     Struct,
     /// arrow_typeof
     ArrowTypeof,
+    /// greatest
+    Greatest,
+    /// least
+    Least,
 }
 
 lazy_static! {
@@ -328,6 +332,8 @@ impl BuiltinScalarFunction {
             BuiltinScalarFunction::Struct => Volatility::Immutable,
             BuiltinScalarFunction::FromUnixtime => Volatility::Immutable,
             BuiltinScalarFunction::ArrowTypeof => Volatility::Immutable,
+            BuiltinScalarFunction::Greatest => Volatility::Immutable,
+            BuiltinScalarFunction::Least => Volatility::Immutable,
 
             // Stable builtin functions
             BuiltinScalarFunction::Now => Volatility::Stable,
@@ -414,6 +420,10 @@ fn aliases(func: &BuiltinScalarFunction) -> &'static [&'static str] {
         BuiltinScalarFunction::Upper => &["upper"],
         BuiltinScalarFunction::Uuid => &["uuid"],
 
+        // comparison functions
+        BuiltinScalarFunction::Greatest => &["greatest"],
+        BuiltinScalarFunction::Least => &["least"],
+
         // regex functions
         BuiltinScalarFunction::RegexpMatch => &["regexp_match"],
         BuiltinScalarFunction::RegexpReplace => &["regexp_replace"],
diff --git a/datafusion/expr/src/comparison_expressions.rs b/datafusion/expr/src/comparison_expressions.rs
new file mode 100644
index 00..c7f13f04f0
--- /dev/null
+++ b/datafusion/expr/src/comparison_expressions.rs
@@ -0,0 +1,35 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::datatypes::DataType;
+
+/// Currently supported types by the comparison function.
+pub static SUPPORTED_COMPARISON_TYPES: &[DataType] = &[
+    DataType::Boolean,
+    DataT

[arrow-datafusion] branch add-greatest-least created (now 5214393d1f)

2023-06-02 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


  at 5214393d1f [built-in function] add greatest and least

This branch includes the following new commits:

 new 5214393d1f [built-in function] add greatest and least

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[arrow-datafusion-python] branch upgrade-to-support-311 updated (fcbc976 -> 61cfee5)

2023-05-10 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


 discard fcbc976  python -m pip install --require-hashes --no-deps -r requirements.txt
 discard 9a57d38  update pip before install
 discard 242f25d  remove unused 311
 discard 4ab1763  remove empty line
 discard 36c4eb7  upgrade to support python 3.11
 add 61cfee5  upgrade to support python 3.11

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (fcbc976)
\
 N -- N -- N   refs/heads/upgrade-to-support-311 (61cfee5)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .github/workflows/test.yaml  |  2 +
 datafusion/__init__.py   |  4 +-
 datafusion/cudf.py   |  4 +-
 datafusion/pandas.py |  4 +-
 datafusion/polars.py | 12 ++
 datafusion/tests/generic.py  | 12 ++
 datafusion/tests/test_aggregation.py | 32 ---
 datafusion/tests/test_config.py  |  5 +--
 datafusion/tests/test_context.py |  4 +-
 datafusion/tests/test_dataframe.py   | 36 +
 datafusion/tests/test_functions.py   | 77 
 datafusion/tests/test_sql.py | 28 -
 datafusion/tests/test_substrait.py   | 12 ++
 pyproject.toml   |  6 +++
 requirements.in  |  1 +
 15 files changed, 64 insertions(+), 175 deletions(-)



[arrow-datafusion-python] branch upgrade-to-support-311 updated (9a57d38 -> fcbc976)

2023-05-10 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from 9a57d38  update pip before install
 add fcbc976  python -m pip install --require-hashes --no-deps -r requirements.txt

No new revisions were added by this update.

Summary of changes:
 .github/workflows/docs.yaml | 6 +++---
 .github/workflows/test.yaml | 4 ++--
 README.md   | 2 +-
 dev/release/verify-release-candidate.sh | 2 +-
 docs/source/index.rst   | 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)
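
The `--require-hashes` install named in the commit above is the consuming half of a
pip-tools workflow; a sketch of both halves, assuming requirements.in lists the
direct dependencies (as it does in this repository):

    # Resolve and pin the entire dependency tree, recording a sha256 hash for
    # every artifact; this writes the header visible in the requirements.txt
    # diff later in this digest.
    python -m pip install pip-tools
    pip-compile --generate-hashes --output-file=requirements.txt requirements.in

    # Install exactly the pinned set: pip rejects any download whose hash is
    # not listed, and --no-deps prevents pulling in anything unpinned.
    python -m pip install --require-hashes --no-deps -r requirements.txt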



[arrow-datafusion-python] branch upgrade-to-support-311 updated (36c4eb7 -> 9a57d38)

2023-05-10 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from 36c4eb7  upgrade to support python 3.11
 add 4ab1763  remove empty line
 add 242f25d  remove unused 311
 add 9a57d38  update pip before install

No new revisions were added by this update.

Summary of changes:
 .github/workflows/test.yaml|   2 +
 datafusion/tests/test_dataframe.py |   1 -
 requirements-311.txt   | 199 -
 3 files changed, 2 insertions(+), 200 deletions(-)
 delete mode 100644 requirements-311.txt



[arrow-datafusion-python] 01/01: upgrade to support python 3.11

2023-05-10 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

commit 36c4eb71b6fce5273a225a955395d5f2d164b595
Author: Jiayu Liu 
AuthorDate: Wed May 10 21:31:59 2023 +0800

upgrade to support python 3.11
---
 .github/workflows/build.yml |   4 +-
 .github/workflows/conda.yml |   2 +-
 .github/workflows/dev.yml   |   2 +-
 .github/workflows/docs.yaml |   4 +-
 .github/workflows/test.yaml |  24 ++-
 README.md   |   6 +-
 dev/release/verify-release-candidate.sh |   2 +-
 docs/README.md  |   2 +-
 docs/source/index.rst   |  12 +-
 pyproject.toml  |   3 +-
 requirements-310.txt| 249 
 requirements.txt| 199 +
 12 files changed, 229 insertions(+), 280 deletions(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index fe06b9c..50f2f15 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -47,7 +47,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.10"]
+        python-version: ["3.11"]
         os: [macos-latest, windows-latest]
     steps:
       - uses: actions/checkout@v3
@@ -106,7 +106,7 @@ jobs:
     strategy:
       fail-fast: false
      matrix:
-        python-version: ["3.10"]
+        python-version: ["3.11"]
     steps:
       - uses: actions/checkout@v3
 
diff --git a/.github/workflows/conda.yml b/.github/workflows/conda.yml
index 9853230..6fcc1f5 100644
--- a/.github/workflows/conda.yml
+++ b/.github/workflows/conda.yml
@@ -24,7 +24,7 @@ jobs:
         with:
           miniforge-variant: Mambaforge
           use-mamba: true
-          python-version: "3.10"
+          python-version: "3.11"
           channel-priority: strict
       - name: Install dependencies
         run: |
diff --git a/.github/workflows/dev.yml b/.github/workflows/dev.yml
index 05cf8ce..6457ff4 100644
--- a/.github/workflows/dev.yml
+++ b/.github/workflows/dev.yml
@@ -29,6 +29,6 @@ jobs:
       - name: Setup Python
         uses: actions/setup-python@v4
         with:
-          python-version: "3.10"
+          python-version: "3.11"
       - name: Audit licenses
         run: ./dev/release/run-rat.sh .
diff --git a/.github/workflows/docs.yaml b/.github/workflows/docs.yaml
index d9e7ad4..fb422ce 100644
--- a/.github/workflows/docs.yaml
+++ b/.github/workflows/docs.yaml
@@ -35,7 +35,7 @@ jobs:
       - name: Setup Python
         uses: actions/setup-python@v4
         with:
-          python-version: "3.10"
+          python-version: "3.11"
 
       - name: Install Protoc
         uses: arduino/setup-protoc@v1
@@ -48,7 +48,7 @@ jobs:
           set -x
           python3 -m venv venv
           source venv/bin/activate
-          pip install -r requirements-310.txt
+          pip install -r requirements.txt
           pip install -r docs/requirements.txt
       - name: Build Datafusion
         run: |
diff --git a/.github/workflows/test.yaml b/.github/workflows/test.yaml
index f672c81..8dd2b6a 100644
--- a/.github/workflows/test.yaml
+++ b/.github/workflows/test.yaml
@@ -33,15 +33,13 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
         python-version:
+          - "3.7"
+          - "3.8"
+          - "3.9"
           - "3.10"
+          - "3.11"
         toolchain:
           - "stable"
-          # we are not that much eager in walking on the edge yet
-          # - nightly
-        # build stable for only 3.7
-        include:
-          - python-version: "3.7"
-            toolchain: "stable"
     steps:
       - uses: actions/checkout@v3
 
@@ -55,7 +53,7 @@
       - name: Install Protoc
         uses: arduino/setup-protoc@v1
         with:
-          version: '3.x'
+          version: "3.x"
           repo-token: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Setup Python
@@ -71,24 +69,24 @@
 
       - name: Check Formatting
         uses: actions-rs/cargo@v1
-        if: ${{ matrix.python-version == '3.10' && matrix.toolchain == 'stable' }}
+        if: ${{ matrix.python-version == '3.11' && matrix.toolchain == 'stable' }}
         with:
           command: fmt
           args: -- --check
 
       - name: Run Clippy
         uses: actions-rs/cargo@v1
-        if: ${{ matrix.python-version == '3.10' && matrix.toolchain == 'stable' }}
+        if: ${{ matrix.python-version == '3.11' && matrix.toolchain == 'stable' }}
         with:
           command: clippy
           args: --all-targets --all-features 

[arrow-datafusion-python] branch upgrade-to-support-311 created (now 36c4eb7)

2023-05-10 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


  at 36c4eb7  upgrade to support python 3.11

This branch includes the following new commits:

 new 36c4eb7  upgrade to support python 3.11

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[arrow-datafusion-python] branch update-maturin deleted (was 4c40868)

2023-05-10 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch update-maturin
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


 was 4c40868  migrate maturin meta

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.
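
Because other references still contain 4c40868, the commit can be recovered; a
quick way to verify reachability after a branch deletion:

    # List every branch or tag that still contains the deleted branch's tip;
    # any output means the commit is still reachable (and fetchable).
    git branch -a --contains 4c40868
    git tag --contains 4c40868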



[arrow-datafusion-python] branch main updated (5984bc7 -> 21ad90f)

2023-05-09 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from 5984bc7  build(deps): bump mimalloc from 0.1.36 to 0.1.37 (#361)
 add 21ad90f  build(deps): bump regex-syntax from 0.6.29 to 0.7.1 (#334)

No new revisions were added by this update.

Summary of changes:
 Cargo.lock | 2 +-
 Cargo.toml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)



[arrow-datafusion-python] branch main updated (228b6e5 -> 5984bc7)

2023-05-09 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from 228b6e5  build(deps): bump uuid from 1.3.1 to 1.3.2 (#359)
 add 5984bc7  build(deps): bump mimalloc from 0.1.36 to 0.1.37 (#361)

No new revisions were added by this update.

Summary of changes:
 Cargo.lock | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)



[arrow-datafusion-python] branch main updated (9c75d03 -> 228b6e5)

2023-05-09 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from 9c75d03  Prepare 24.0.0 Release (#376)
 add 228b6e5  build(deps): bump uuid from 1.3.1 to 1.3.2 (#359)

No new revisions were added by this update.

Summary of changes:
 Cargo.lock | 4 ++--
 Cargo.toml | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)



[arrow-datafusion-python] branch update-maturin updated (d1a87a7 -> 4c40868)

2023-05-09 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch update-maturin
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from d1a87a7  upgrade maturin to 0.15.1
 add 4c40868  migrate maturin meta

No new revisions were added by this update.

Summary of changes:
 Cargo.toml | 3 ---
 pyproject.toml | 1 +
 2 files changed, 1 insertion(+), 3 deletions(-)



[arrow-datafusion-python] 01/01: upgrade maturin to 0.15.1

2023-05-09 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch update-maturin
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

commit d1a87a7da57f05055f4eb03cd268b81cf57c7f20
Author: Jiayu Liu 
AuthorDate: Wed May 10 08:43:32 2023 +0800

upgrade maturin to 0.15.1
---
 .github/workflows/build.yml|   8 +-
 conda/environments/datafusion-dev.yaml |  48 ++---
 conda/recipes/meta.yaml|   6 +-
 docs/README.md |  15 +-
 pyproject.toml |   2 +-
 requirements-310.txt   | 340 ++---
 requirements.in|   4 +-
 requirements.txt   | 284 ---
 8 files changed, 231 insertions(+), 476 deletions(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index c667dab..fe06b9c 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -64,7 +64,7 @@ jobs:
         run: python -m pip install --upgrade pip
 
       - name: Install maturin
-        run: pip install maturin==0.14.2
+        run: pip install maturin==0.15.1
 
       - run: rm LICENSE.txt
       - name: Download LICENSE.txt
@@ -76,7 +76,7 @@ jobs:
       - name: Install Protoc
         uses: arduino/setup-protoc@v1
         with:
-          version: '3.x'
+          version: "3.x"
           repo-token: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Build Python package
@@ -125,7 +125,7 @@ jobs:
         run: python -m pip install --upgrade pip
 
       - name: Install maturin
-        run: pip install maturin==0.14.2
+        run: pip install maturin==0.15.1
 
       - run: rm LICENSE.txt
       - name: Download LICENSE.txt
@@ -137,7 +137,7 @@ jobs:
       - name: Install Protoc
         uses: arduino/setup-protoc@v1
         with:
-          version: '3.x'
+          version: "3.x"
           repo-token: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Build Python package
diff --git a/conda/environments/datafusion-dev.yaml b/conda/environments/datafusion-dev.yaml
index d9405e4..ceab504 100644
--- a/conda/environments/datafusion-dev.yaml
+++ b/conda/environments/datafusion-dev.yaml
@@ -16,29 +16,29 @@
 # under the License.
 
 channels:
-- conda-forge
+  - conda-forge
 dependencies:
-- black
-- flake8
-- isort
-- maturin
-- mypy
-- numpy
-- pyarrow
-- pytest
-- toml
-- importlib_metadata
-- python>=3.10
-# Packages useful for building distributions and releasing
-- mamba
-- conda-build
-- anaconda-client
-# Packages for documentation building
-- sphinx
-- pydata-sphinx-theme==0.8.0
-- myst-parser
-- jinja2
-# GPU packages
-- cudf
-- cudatoolkit=11.8
+  - black
+  - flake8
+  - isort
+  - maturin>=0.15
+  - mypy
+  - numpy
+  - pyarrow>=11.0.0
+  - pytest
+  - toml
+  - importlib_metadata
+  - python>=3.10
+  # Packages useful for building distributions and releasing
+  - mamba
+  - conda-build
+  - anaconda-client
+  # Packages for documentation building
+  - sphinx
+  - pydata-sphinx-theme==0.8.0
+  - myst-parser
+  - jinja2
+  # GPU packages
+  - cudf
+  - cudatoolkit=11.8
 name: datafusion-dev
diff --git a/conda/recipes/meta.yaml b/conda/recipes/meta.yaml
index 48e95eb..e2bb8be 100644
--- a/conda/recipes/meta.yaml
+++ b/conda/recipes/meta.yaml
@@ -35,12 +35,12 @@ build:
 
 requirements:
   host:
-    - python >=3.6
-    - maturin >=0.14,<0.15
+    - python >=3.7
+    - maturin >=0.15,<0.16
     - libprotobuf =3
     - pip
   run:
-    - python >=3.6
+    - python >=3.7
     - pyarrow >=11.0.0
 
 test:
diff --git a/docs/README.md b/docs/README.md
index 04f46a9..8527858 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -19,17 +19,18 @@
 
 # DataFusion Documentation
 
-This folder contains the source content of the [python api](./source/api).
+This folder contains the source content of the [Python API](./source/api).
 This is published to https://arrow.apache.org/datafusion-python/ by a GitHub action
 when changes are merged to the main branch.
 
 ## Dependencies
 
 It's recommended to install build dependencies and build the documentation
-inside a Python virtualenv.
+inside a Python `venv`.
 
-- Python
-- `pip3 install -r requirements.txt`
+```bash
+python -m pip install -r requirements-310.txt
+```
 
 ## Build & Preview
 
@@ -57,8 +58,6 @@ version of the docs, follow these steps:
 2. Clone the arrow-site repo
 3. Checkout to the `asf-site` branch (NOT `master`)
 4. Copy build artifacts into `arrow-site` repo's `datafusion` folder with a command such as
-
-- `cp -rT ./build/html/ ../../arrow-site/datafusion/` (doesn't work on mac)
-- `rsync -avzr ./build/html/ ../../arrow-site/datafusion/`
-
+   - `cp -rT ./build/html/ ../../arrow-site/datafusion/` (doesn't work on mac)
+   - `rsync -avzr ./build/html/ ../../arrow-site/datafusion/`
 5. Commit changes in `arrow-site` and send a PR.
diff --git 

[arrow-datafusion-python] branch update-maturin created (now d1a87a7)

2023-05-09 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch update-maturin
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


  at d1a87a7  upgrade maturin to 0.15.1

This branch includes the following new commits:

 new d1a87a7  upgrade maturin to 0.15.1

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
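
The maturin pin that this branch updates is the tool that compiles the Rust
extension and packages the Python wheel; a minimal local workflow, sketched under
the assumption of an activated virtualenv in a checkout of this repository:

    # Build the extension in release mode and install it into the current venv.
    python -m pip install "maturin==0.15.1"
    maturin develop --release

    # Or produce distributable wheels, as the CI "Build Python package" step does.
    maturin build --release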




[arrow-datafusion-python] branch update-310 updated (afba137 -> 2a21299)

2023-01-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch update-310
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from afba137  update requirements with upgrade
 add 2a21299  apply black update

No new revisions were added by this update.

Summary of changes:
 datafusion/__init__.py |  4 +--
 datafusion/tests/generic.py| 12 ++-
 datafusion/tests/test_dataframe.py | 28 
 datafusion/tests/test_functions.py | 65 --
 datafusion/tests/test_sql.py   | 24 --
 dev/release/check-rat-report.py|  4 +--
 6 files changed, 32 insertions(+), 105 deletions(-)



[arrow-datafusion-python] 02/02: update requirements with upgrade

2023-01-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch update-310
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

commit afba137fcca0297329cb48853f74132a95222bf7
Author: Jiayu Liu 
AuthorDate: Sun Jan 15 09:39:31 2023 +

update requirements with upgrade
---
 requirements.txt | 207 +--
 1 file changed, 172 insertions(+), 35 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index 805f521..46c6dbf 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -2,66 +2,203 @@
 # This file is autogenerated by pip-compile with Python 3.10
 # by the following command:
 #
-#    pip-compile
+#    pip-compile --generate-hashes --output-file=requirements.txt
 #
-attrs==21.2.0
+attrs==22.2.0 \
+    --hash=sha256:29e95c7f6778868dbd49170f98f8818f78f3dc5e0e37c0b1f474e3561b240836 \
+    --hash=sha256:c9227bfc2f01993c03f68db37d1d15c9690188323c067c641f1a35ca58185f99
     # via pytest
-black==21.9b0
+black==22.12.0 \
+    --hash=sha256:101c69b23df9b44247bd88e1d7e90154336ac4992502d4197bdac35dd7ee3320 \
+    --hash=sha256:159a46a4947f73387b4d83e87ea006dbb2337eab6c879620a3ba52699b1f4351 \
+    --hash=sha256:1f58cbe16dfe8c12b7434e50ff889fa479072096d79f0a7f25e4ab8e94cd8350 \
+    --hash=sha256:229351e5a18ca30f447bf724d007f890f97e13af070bb6ad4c0a441cd7596a2f \
+    --hash=sha256:436cc9167dd28040ad90d3b404aec22cedf24a6e4d7de221bec2730ec0c97bcf \
+    --hash=sha256:559c7a1ba9a006226f09e4916060982fd27334ae1998e7a38b3f33a37f7a2148 \
+    --hash=sha256:7412e75863aa5c5411886804678b7d083c7c28421210180d67dfd8cf1221e1f4 \
+    --hash=sha256:77d86c9f3db9b1bf6761244bc0b3572a546f5fe37917a044e02f3166d5aafa7d \
+    --hash=sha256:82d9fe8fee3401e02e79767016b4907820a7dc28d70d137eb397b92ef3cc5bfc \
+    --hash=sha256:9eedd20838bd5d75b80c9f5487dbcb06836a43833a37846cf1d8c1cc01cef59d \
+    --hash=sha256:c116eed0efb9ff870ded8b62fe9f28dd61ef6e9ddd28d83d7d264a38417dcee2 \
+    --hash=sha256:d30b212bffeb1e252b31dd269dfae69dd17e06d92b87ad26e23890f3efea366f
     # via -r requirements.in
-click==8.0.3
+click==8.1.3 \
+    --hash=sha256:7682dc8afb30297001674575ea00d1814d808d6a36af415a82bd481d37ba7b8e \
+    --hash=sha256:bb4d8133cb15a609f44e8213d9b391b0809795062913b383c62be0ee95b1db48
     # via black
-flake8==4.0.1
+exceptiongroup==1.1.0 \
+    --hash=sha256:327cbda3da756e2de031a3107b81ab7b3770a602c4d16ca618298c526f4bec1e \
+    --hash=sha256:bcb67d800a4497e1b404c2dd44fca47d3b7a5e5433dbab67f96c1a685cdfdf23
+    # via pytest
+flake8==6.0.0 \
+    --hash=sha256:3833794e27ff64ea4e9cf5d410082a8b97ff1a06c16aa3d2027339cd0f1195c7 \
+    --hash=sha256:c61007e76655af75e6785a931f452915b371dc48f56efd765247c8fe68f2b181
     # via -r requirements.in
-iniconfig==1.1.1
+iniconfig==2.0.0 \
+    --hash=sha256:2d91e135bf72d31a410b17c16da610a82cb55f6b0477d1a902134b24a455b8b3 \
+    --hash=sha256:b6a85871a79d2e3b22d2d1b94ac2824226a63c6b741c88f7ae975f18b6778374
     # via pytest
-isort==5.9.3
+isort==5.11.4 \
+    --hash=sha256:6db30c5ded9815d813932c04c2f85a360bcdd35fed496f4d8f35495ef0a261b6 \
+    --hash=sha256:c033fd0edb91000a7f09527fe5c75321878f98322a77ddcc81adbd83724afb7b
     # via -r requirements.in
-maturin==0.14.2
+maturin==0.14.10 \
+    --hash=sha256:11b8550ceba5b81465a18d06f0d3a4cfc1cd6cbf68eda117c253bbf3324b1264 \
+    --hash=sha256:2f097a63f3bed20a7da56fc7ce4d44ef8376ee9870604da16b685f2d02c87c79 \
+    --hash=sha256:4946ad7545ba5fc0ad08bc98bc8e9f6ffabb6ded71db9ed282ad4596b998d42a \
+    --hash=sha256:5abf311d4618b673efa30cacdac5ae2d462e49da58db9a5bf0d8bde16d9c16be \
+    --hash=sha256:6cc9afb89f28bd591b62f8f3c29736c81c322cffe88f9ab8eb1749377bbc3521 \
+    --hash=sha256:895c48cbe56ae994c2a1f19475ca4819aa4c6412af727a63a772e8ef2d87 \
+    --hash=sha256:98bfed21c3498857b3381efeb041d77e004a93b22261bf9690fe2b9fbb4c210f \
+    --hash=sha256:9da98bee0a548ecaaa924cc8cb94e49075d5e71511c62a1633a6962c7831a29b \
+    --hash=sha256:b157e2e8a0216d02df1d0451201fcb977baf0dcd223890abfbfbfd01e0b44630 \
+    --hash=sha256:c0d25e82cb6e5de9f1c028fcf069784be4165b083e79412371edce05010b68f3 \
+    --hash=sha256:cf950ebfe449a97617b91d75e09766509e21a389ce3f7b6ef15130ad8a95430a \
+    --hash=sha256:e9c19dc0a28109280f7d091ca7b78e25f3fc340fcfac92801829a21198fa20eb \
+    --hash=sha256:ec8269c02cc435893308dfd50f57f14fb1be3554e4e61c5bf49b97363b289775
     # via -r requirements.in
-mccabe==0.6.1
+mccabe==0.7.0 \
+    --hash=sha256:348e0240c33b60bbdf4e523192ef919f28cb2c3d7d5c7794f74009290f236325 \
+    --hash=sha256:6c2d30ab6be0e4a46919781807b4f0d834ebdd6c6e3dca0bda5a15f863427b6e
     # via flake8
-mypy==0.910
+mypy==0.991 \
+    --hash=sha256:0714258640194d75677e86c786e80ccf294972cc76885d3ebbb560f11db0003d \
+    --hash=sha256:0c8f3be99e8a8bd403caa8c03be619544bc2c77a7093685dcf308c6b109426c6 \
+    --hash=sha256:0cca5adf694af539aeaa6ac633a7afe9bbd760df9d31be55ab780b77ab5ae8bf \
+    --hash=sha256

[arrow-datafusion-python] 01/02: update requirements for 310

2023-01-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch update-310
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

commit 1a8114f50d065aaecf4b8ea8e2466b9573028041
Author: Jiayu Liu 
AuthorDate: Sun Jan 15 09:38:53 2023 +

update requirements for 310
---
 requirements-310.txt | 317 +--
 requirements.txt | 279 +
 2 files changed, 187 insertions(+), 409 deletions(-)

diff --git a/requirements-310.txt b/requirements-310.txt
index 332abdb..898747a 100644
--- a/requirements-310.txt
+++ b/requirements-310.txt
@@ -1,97 +1,97 @@
 #
-# This file is autogenerated by pip-compile with python 3.10
-# To update, run:
+# This file is autogenerated by pip-compile with Python 3.10
+# by the following command:
 #
-#    pip-compile --generate-hashes
+#    pip-compile --generate-hashes --output-file=requirements-310.txt
 #
-attrs==21.4.0 \
-    --hash=sha256:2d27e3784d7a565d36ab851fe94887c5eccd6a463168875832a1be79c82828b4 \
-    --hash=sha256:626ba8234211db98e869df76230a137c4c40a12d72445c45d5f5b716f076e2fd
+attrs==22.2.0 \
+    --hash=sha256:29e95c7f6778868dbd49170f98f8818f78f3dc5e0e37c0b1f474e3561b240836 \
+    --hash=sha256:c9227bfc2f01993c03f68db37d1d15c9690188323c067c641f1a35ca58185f99
     # via pytest
-black==22.3.0 \
-    --hash=sha256:06f9d8846f2340dfac80ceb20200ea5d1b3f181dd0556b47af4e8e0b24fa0a6b \
-    --hash=sha256:10dbe6e6d2988049b4655b2b739f98785a884d4d6b85bc35133a8fb9a2233176 \
-    --hash=sha256:2497f9c2386572e28921fa8bec7be3e51de6801f7459dffd6e62492531c47e09 \
-    --hash=sha256:30d78ba6bf080eeaf0b7b875d924b15cd46fec5fd044ddfbad38c8ea9171043a \
-    --hash=sha256:328efc0cc70ccb23429d6be184a15ce613f676bdfc85e5fe8ea2a9354b4e9015 \
-    --hash=sha256:35020b8886c022ced9282b51b5a875b6d1ab0c387b31a065b84db7c33085ca79 \
-    --hash=sha256:5795a0375eb87bfe902e80e0c8cfaedf8af4d49694d69161e5bd3206c18618bb \
-    --hash=sha256:5891ef8abc06576985de8fa88e95ab70641de6c1fca97e2a15820a9b69e51b20 \
-    --hash=sha256:637a4014c63fbf42a692d22b55d8ad6968a946b4a6ebc385c5505d9625b6a464 \
-    --hash=sha256:67c8301ec94e3bcc8906740fe071391bce40a862b7be0b86fb5382beefecd968 \
-    --hash=sha256:6d2fc92002d44746d3e7db7cf9313cf4452f43e9ea77a2c939defce3b10b5c82 \
-    --hash=sha256:6ee227b696ca60dd1c507be80a6bc849a5a6ab57ac7352aad1ffec9e8b805f21 \
-    --hash=sha256:863714200ada56cbc366dc9ae5291ceb936573155f8bf8e9de92aef51f3ad0f0 \
-    --hash=sha256:9b542ced1ec0ceeff5b37d69838106a6348e60db7b8fdd245294dc1d26136265 \
-    --hash=sha256:a6342964b43a99dbc72f72812bf88cad8f0217ae9acb47c0d4f141a6416d2d7b \
-    --hash=sha256:ad4efa5fad66b903b4a5f96d91461d90b9507a812b3c5de657d544215bb7877a \
-    --hash=sha256:bc58025940a896d7e5356952228b68f793cf5fcb342be703c3a2669a1488cb72 \
-    --hash=sha256:cc1e1de68c8e5444e8f94c3670bb48a2beef0e91dddfd4fcc29595ebd90bb9ce \
-    --hash=sha256:cee3e11161dde1b2a33a904b850b0899e0424cc331b7295f2a9698e79f9a69a0 \
-    --hash=sha256:e3556168e2e5c49629f7b0f377070240bd5511e45e25a4497bb0073d9dda776a \
-    --hash=sha256:e8477ec6bbfe0312c128e74644ac8a02ca06bcdb8982d4ee06f209be28cdf163 \
-    --hash=sha256:ee8f1f7228cce7dffc2b464f07ce769f478968bfb3dd1254a4c2eeed84928aad \
-    --hash=sha256:fd57160949179ec517d32ac2ac898b5f20d68ed1a9c977346efbac9c2f1e779d
+black==22.12.0 \
+    --hash=sha256:101c69b23df9b44247bd88e1d7e90154336ac4992502d4197bdac35dd7ee3320 \
+    --hash=sha256:159a46a4947f73387b4d83e87ea006dbb2337eab6c879620a3ba52699b1f4351 \
+    --hash=sha256:1f58cbe16dfe8c12b7434e50ff889fa479072096d79f0a7f25e4ab8e94cd8350 \
+    --hash=sha256:229351e5a18ca30f447bf724d007f890f97e13af070bb6ad4c0a441cd7596a2f \
+    --hash=sha256:436cc9167dd28040ad90d3b404aec22cedf24a6e4d7de221bec2730ec0c97bcf \
+    --hash=sha256:559c7a1ba9a006226f09e4916060982fd27334ae1998e7a38b3f33a37f7a2148 \
+    --hash=sha256:7412e75863aa5c5411886804678b7d083c7c28421210180d67dfd8cf1221e1f4 \
+    --hash=sha256:77d86c9f3db9b1bf6761244bc0b3572a546f5fe37917a044e02f3166d5aafa7d \
+    --hash=sha256:82d9fe8fee3401e02e79767016b4907820a7dc28d70d137eb397b92ef3cc5bfc \
+    --hash=sha256:9eedd20838bd5d75b80c9f5487dbcb06836a43833a37846cf1d8c1cc01cef59d \
+    --hash=sha256:c116eed0efb9ff870ded8b62fe9f28dd61ef6e9ddd28d83d7d264a38417dcee2 \
+    --hash=sha256:d30b212bffeb1e252b31dd269dfae69dd17e06d92b87ad26e23890f3efea366f
     # via -r requirements.in
 click==8.1.3 \
     --hash=sha256:7682dc8afb30297001674575ea00d1814d808d6a36af415a82bd481d37ba7b8e \
     --hash=sha256:bb4d8133cb15a609f44e8213d9b391b0809795062913b383c62be0ee95b1db48
     # via black
-flake8==4.0.1 \
-    --hash=sha256:479b1304f72536a55948cb40a32dce8bb0ffe3501e26eaf292c7e60eb5e0428d \
-    --hash=sha256:806e034dda44114815e23c16ef92f95c91e4c71100ff52813adf7132a6ad870d
+exceptiongroup==1.1.0 \
+    --hash=sha256

[arrow-datafusion-python] branch update-310 created (now afba137)

2023-01-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch update-310
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


  at afba137  update requirements with upgrade

This branch includes the following new commits:

 new 1a8114f  update requirements for 310
 new afba137  update requirements with upgrade

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[arrow-datafusion-python] branch master updated: build(deps): bump async-trait from 0.1.60 to 0.1.61 (#118)

2023-01-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


The following commit(s) were added to refs/heads/master by this push:
 new cb8afac  build(deps): bump async-trait from 0.1.60 to 0.1.61 (#118)
cb8afac is described below

commit cb8afac290cefb48e94b7d220bd03087e19cd93a
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
AuthorDate: Sun Jan 15 14:50:55 2023 +0800

build(deps): bump async-trait from 0.1.60 to 0.1.61 (#118)

Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.60 to 
0.1.61.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.60...0.1.61)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] 

Signed-off-by: dependabot[bot] 
Co-authored-by: dependabot[bot] 
<49699333+dependabot[bot]@users.noreply.github.com>
---
 Cargo.lock | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Cargo.lock b/Cargo.lock
index 99ac9d1..e4e14f9 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -268,9 +268,9 @@ dependencies = [
 
 [[package]]
 name = "async-trait"
-version = "0.1.60"
+version = "0.1.61"
 source = "registry+https://github.com/rust-lang/crates.io-index;
-checksum = "677d1d8ab452a3936018a687b20e6f7cf5363d713b732b8884001317b0e48aa3"
+checksum = "705339e0e4a9690e2908d2b3d049d85682cf19fbd5782494498fbf7003a6a282"
 dependencies = [
  "proc-macro2",
  "quote",



[arrow-datafusion-python] branch master updated (aa596ac -> b9b5a01)

2023-01-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from aa596ac  build(deps): bump bzip2 from 0.4.3 to 0.4.4 (#121)
 add b9b5a01  build(deps): bump mimalloc from 0.1.32 to 0.1.34 (#125)

No new revisions were added by this update.

Summary of changes:
 Cargo.lock | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)



[arrow-datafusion-python] branch master updated (940eec8 -> aa596ac)

2023-01-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from 940eec8  [Functions] - Add python function binding to `functions` (#73)
 add aa596ac  build(deps): bump bzip2 from 0.4.3 to 0.4.4 (#121)

No new revisions were added by this update.

Summary of changes:
 Cargo.lock | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[arrow-datafusion-python] branch master updated (2b6872b -> 545b88e)

2023-01-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from 2b6872b  build(deps): bump object_store from 0.5.2 to 0.5.3 (#126)
 add 545b88e  build(deps): bump tokio from 1.23.0 to 1.24.1 (#119)

No new revisions were added by this update.

Summary of changes:
 Cargo.lock | 4 ++--
 Cargo.toml | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)



[arrow-datafusion-python] branch master updated (b9b5a01 -> 2b6872b)

2023-01-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


from b9b5a01  build(deps): bump mimalloc from 0.1.32 to 0.1.34 (#125)
 add 2b6872b  build(deps): bump object_store from 0.5.2 to 0.5.3 (#126)

No new revisions were added by this update.

Summary of changes:
 Cargo.lock | 25 ++---
 Cargo.toml |  2 +-
 2 files changed, 15 insertions(+), 12 deletions(-)



[arrow-rs] branch master updated: fix clippy issues (#3398)

2022-12-27 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 1d0abfafe fix clippy issues (#3398)
1d0abfafe is described below

commit 1d0abfafe0da7c28b562fa0ba8c65a10b65a0821
Author: Jiayu Liu 
AuthorDate: Tue Dec 27 19:49:51 2022 +0800

fix clippy issues (#3398)
---
 arrow-integration-test/src/field.rs   | 2 +-
 arrow-integration-testing/src/bin/arrow-file-to-stream.rs | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arrow-integration-test/src/field.rs 
b/arrow-integration-test/src/field.rs
index 4bfbf8e99..dd0519157 100644
--- a/arrow-integration-test/src/field.rs
+++ b/arrow-integration-test/src/field.rs
@@ -253,7 +253,7 @@ pub fn field_from_json(json: &serde_json::Value) -> Result<Field> {
 };
 
 let mut field =
-        Field::new_dict(&name, data_type, nullable, dict_id, dict_is_ordered);
+        Field::new_dict(name, data_type, nullable, dict_id, dict_is_ordered);
 field.set_metadata(metadata);
 Ok(field)
 }
diff --git a/arrow-integration-testing/src/bin/arrow-file-to-stream.rs 
b/arrow-integration-testing/src/bin/arrow-file-to-stream.rs
index e939fe4f0..3e027faef 100644
--- a/arrow-integration-testing/src/bin/arrow-file-to-stream.rs
+++ b/arrow-integration-testing/src/bin/arrow-file-to-stream.rs
@@ -30,7 +30,7 @@ struct Args {
 
 fn main() -> Result<()> {
 let args = Args::parse();
-    let f = File::open(&args.file_name)?;
+    let f = File::open(args.file_name)?;
 let reader = BufReader::new(f);
 let mut reader = FileReader::try_new(reader, None)?;
 let schema = reader.schema();

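The two hunks above are textbook instances of clippy's `needless_borrow` class of fix: `&name` and `&args.file_name` add a reference level the callee already accepts by value or via `AsRef`. A minimal sketch of the pattern, assuming nothing beyond the standard library; this illustrates the lint, it is not code from the commit:

```rust
use std::fs::File;

fn main() -> std::io::Result<()> {
    let name = String::from("Cargo.toml");
    // Clippy would flag `File::open(&name)` as a needless borrow:
    // `File::open` takes `impl AsRef<Path>`, which `String` already
    // satisfies, so the extra `&` buys nothing.
    let _f = File::open(name)?;
    Ok(())
}
```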


[arrow-datafusion-python] branch master updated: update release readme tag (#86)

2022-11-28 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


The following commit(s) were added to refs/heads/master by this push:
 new 9f0e731  update release readme tag (#86)
9f0e731 is described below

commit 9f0e73196a6cd84efb332312ddd976876db3ae22
Author: Jiayu Liu 
AuthorDate: Tue Nov 29 09:55:12 2022 +0800

update release readme tag (#86)

use `bash` not `py` for scripting
---
 dev/release/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dev/release/README.md b/dev/release/README.md
index dd378f8..20c1562 100644
--- a/dev/release/README.md
+++ b/dev/release/README.md
@@ -181,7 +181,7 @@ Go to the Test PyPI page of Datafusion, and download
 [all published artifacts](https://test.pypi.org/project/datafusion/#files) 
under `dist-release/` directory. Then proceed
 uploading them using `twine`:
 
-```py
+```bash
 twine upload --repository pypi dist-release/*
 ```
 



[arrow-datafusion-python] 01/01: update release readme tag

2022-11-28 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch Jimexist-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

commit e160203a8863f71c6eaab763d3b3791cdbb1b431
Author: Jiayu Liu 
AuthorDate: Mon Nov 28 22:56:48 2022 +0800

update release readme tag

use `bash` not `py` for scripting
---
 dev/release/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dev/release/README.md b/dev/release/README.md
index dd378f8..20c1562 100644
--- a/dev/release/README.md
+++ b/dev/release/README.md
@@ -181,7 +181,7 @@ Go to the Test PyPI page of Datafusion, and download
 [all published artifacts](https://test.pypi.org/project/datafusion/#files) 
under `dist-release/` directory. Then proceed
 uploading them using `twine`:
 
-```py
+```bash
 twine upload --repository pypi dist-release/*
 ```
 



[arrow-datafusion-python] branch Jimexist-patch-1 created (now e160203)

2022-11-28 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch Jimexist-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git


  at e160203  update release readme tag

This branch includes the following new commits:

 new e160203  update release readme tag

The 1 revision listed above as "new" is entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[arrow-datafusion] branch master updated (010aded5d -> e34c6c33a)

2022-11-25 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


from 010aded5d Support to use Schedular in tpch benchmark (#4361)
 add e34c6c33a add support for xz file compression and `compression` 
feature (#3993)

No new revisions were added by this update.

Summary of changes:
 datafusion-cli/Cargo.lock  | 102 +
 datafusion/core/Cargo.toml |  10 +-
 datafusion/core/src/datasource/file_format/csv.rs  |   2 +-
 .../core/src/datasource/file_format/file_type.rs   |  88 +++---
 datafusion/core/src/datasource/file_format/json.rs |   4 +-
 .../core/src/physical_plan/file_format/csv.rs  |  19 ++--
 .../core/src/physical_plan/file_format/json.rs |  16 ++--
 datafusion/core/src/test/mod.rs|  16 
 datafusion/expr/src/logical_plan/plan.rs   |   2 +-
 datafusion/sql/src/parser.rs   |   5 +-
 10 files changed, 186 insertions(+), 78 deletions(-)

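For background on what the `compression` feature gates: xz support for file scans ultimately reduces to streaming decompression of the underlying reader. A hedged sketch using the `xz2` crate (the common Rust binding to liblzma); this shows only the decompression primitive, not DataFusion's own file-format API, and the input path is hypothetical:

```rust
use std::fs::File;
use std::io::{BufReader, Read};

use xz2::read::XzDecoder;

// Stream-decompress an .xz file into a String. XzDecoder wraps any
// `Read` and decompresses on the fly, which is the shape a scanner
// needs to read compressed inputs without materializing them first.
fn read_xz_to_string(path: &str) -> std::io::Result<String> {
    let file = BufReader::new(File::open(path)?);
    let mut decoder = XzDecoder::new(file);
    let mut contents = String::new();
    decoder.read_to_string(&mut contents)?;
    Ok(contents)
}

fn main() -> std::io::Result<()> {
    let text = read_xz_to_string("data.csv.xz")?; // hypothetical input
    println!("{} bytes decompressed", text.len());
    Ok(())
}
```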


[arrow-datafusion] branch add-support-for-xz updated (a9d0bd2a3 -> de24ecb27)

2022-11-24 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-support-for-xz
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


from a9d0bd2a3 add support for xz file compression
 add de24ecb27 fix Cargo.toml formatting

No new revisions were added by this update.

Summary of changes:
 datafusion/core/Cargo.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[arrow-datafusion] branch add-support-for-xz created (now a9d0bd2a3)

2022-11-24 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-support-for-xz
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


  at a9d0bd2a3 add support for xz file compression

No new revisions were added by this update.



[arrow-rs] branch add-bloom-filter-3 updated (85014cea8 -> 37e145d38)

2022-11-18 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from 85014cea8 Apply suggestions from code review
 add 0d6540dd0 remove underflow logic
 add 37e145d38 refactor write

No new revisions were added by this update.

Summary of changes:
 parquet/src/bloom_filter/mod.rs | 32 +---
 parquet/src/file/properties.rs  |  2 +-
 parquet/src/file/writer.rs  |  6 +-
 3 files changed, 23 insertions(+), 17 deletions(-)



[arrow-rs] branch add-bloom-filter-3 updated (8ed433799 -> 85014cea8)

2022-11-18 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


 discard 8ed433799 Apply suggestions from code review
 discard e102b87da fix clippy
 discard 8dcd75e80 add unit test
 discard b754a9ee1 fix doc
 discard 8ded33a6c incorporate ndv and fpp
 discard 414097535 remove default feature for twox
 discard a30bd097a fix clippy
 discard 93afb6c01 fix clippy
 discard 872473dc4 update row group vec
 discard 9b55ab6fb bloom filter part III
 add e55b95e8d Clippy parquet fixes (#3124)
 add 2a065bee3 Bump actions/labeler from 4.0.2 to 4.1.0 (#3129)
 add 5bce1044f Add COW conversion for Buffer and PrimitiveArray and 
unary_mut (#3115)
 add f4558aeb2 bloom filter part III
 add ea13d0aca update row group vec
 add 03edb7df7 fix clippy
 add 5fa74765f fix clippy
 add 3732e436c remove default feature for twox
 add 7f46a4b6e incorporate ndv and fpp
 add 27a404d3c fix doc
 add 35e56c135 add unit test
 add ec68e695e fix clippy
 add 85014cea8 Apply suggestions from code review

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (8ed433799)
\
 N -- N -- N   refs/heads/add-bloom-filter-3 (85014cea8)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .github/workflows/dev_pr.yml  |   2 +-
 arrow-array/src/array/mod.rs  |  12 ++
 arrow-array/src/array/primitive_array.rs  | 161 +-
 arrow-array/src/builder/boolean_buffer_builder.rs |   5 +
 arrow-array/src/builder/buffer_builder.rs |   9 ++
 arrow-array/src/builder/null_buffer_builder.rs|  25 +++-
 arrow-array/src/builder/primitive_builder.rs  |  24 
 arrow-buffer/src/buffer/immutable.rs  |  19 +++
 arrow-buffer/src/buffer/mutable.rs|  19 +++
 arrow-buffer/src/bytes.rs |   5 +
 arrow-json/src/reader.rs  |   2 -
 parquet/src/data_type.rs  |  24 
 parquet/src/encodings/decoding.rs |  15 +-
 parquet/src/record/api.rs |  11 +-
 14 files changed, 292 insertions(+), 41 deletions(-)



[arrow-rs] branch add-bloom-filter-3 updated (e102b87da -> 8ed433799)

2022-11-18 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from e102b87da fix clippy
 add 8ed433799 Apply suggestions from code review

No new revisions were added by this update.

Summary of changes:
 parquet/src/file/properties.rs | 2 +-
 parquet/src/file/reader.rs | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)



[arrow-rs] branch add-bloom-filter-3 updated (b754a9ee1 -> e102b87da)

2022-11-16 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from b754a9ee1 fix doc
 add 8dcd75e80 add unit test
 add e102b87da fix clippy

No new revisions were added by this update.

Summary of changes:
 parquet/src/bloom_filter/mod.rs | 43 +++--
 parquet/src/file/properties.rs  |  2 +-
 2 files changed, 38 insertions(+), 7 deletions(-)



[arrow-rs] branch add-bloom-filter-3 updated (8ded33a6c -> b754a9ee1)

2022-11-16 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from 8ded33a6c incorporate ndv and fpp
 add b754a9ee1 fix doc

No new revisions were added by this update.

Summary of changes:
 parquet/src/bloom_filter/mod.rs | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)



[arrow-rs] branch add-bloom-filter-3 updated (414097535 -> 8ded33a6c)

2022-11-16 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from 414097535 remove default feature for twox
 add 8ded33a6c incorporate ndv and fpp

No new revisions were added by this update.

Summary of changes:
 parquet/src/bloom_filter/mod.rs  | 48 
 parquet/src/column/writer/mod.rs | 16 --
 parquet/src/file/properties.rs   | 14 +++-
 3 files changed, 75 insertions(+), 3 deletions(-)



[arrow-rs] branch add-bloom-filter-3 updated (a30bd097a -> 414097535)

2022-11-16 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from a30bd097a fix clippy
 add 414097535 remove default feature for twox

No new revisions were added by this update.

Summary of changes:
 parquet/Cargo.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[arrow-rs] branch add-bloom-filter-3 updated (93afb6c01 -> a30bd097a)

2022-11-16 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from 93afb6c01 fix clippy
 add a30bd097a fix clippy

No new revisions were added by this update.

Summary of changes:
 parquet/src/bin/parquet-show-bloom-filter.rs | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)



[arrow-rs] branch add-bloom-filter-3 updated (872473dc4 -> 93afb6c01)

2022-11-16 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from 872473dc4 update row group vec
 add 93afb6c01 fix clippy

No new revisions were added by this update.

Summary of changes:
 parquet/src/file/properties.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[arrow-rs] branch add-bloom-filter-3 updated (9b55ab6fb -> 872473dc4)

2022-11-16 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from 9b55ab6fb bloom filter part III
 add 872473dc4 update row group vec

No new revisions were added by this update.

Summary of changes:
 parquet/src/file/writer.rs | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)



[arrow-rs] branch add-bloom-filter-3 updated (ec3b5d0bd -> 9b55ab6fb)

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


omit ec3b5d0bd eager reading
omit 7ee3aa250 add rustdoc
omit acd26ce64 get rid of mention of bloom feature
omit 09ee38ccb add feature flag
 add 9b55ab6fb bloom filter part III

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (ec3b5d0bd)
\
 N -- N -- N   refs/heads/add-bloom-filter-3 (9b55ab6fb)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 parquet/src/file/properties.rs| 6 +-
 parquet/src/file/serialized_reader.rs | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)



[arrow-rs] branch add-bloom-filter-3 updated (acd26ce64 -> ec3b5d0bd)

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from acd26ce64 get rid of mention of bloom feature
 add 7ee3aa250 add rustdoc
 add ec3b5d0bd eager reading

No new revisions were added by this update.

Summary of changes:
 parquet/src/bloom_filter/mod.rs   |  3 +++
 parquet/src/file/metadata.rs  |  2 +-
 parquet/src/file/properties.rs| 21 +
 parquet/src/file/reader.rs|  2 +-
 parquet/src/file/serialized_reader.rs | 27 +++
 5 files changed, 45 insertions(+), 10 deletions(-)



[arrow-rs] branch add-bloom-filter-3 updated (09ee38ccb -> acd26ce64)

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from 09ee38ccb add feature flag
 add acd26ce64 get rid of mention of bloom feature

No new revisions were added by this update.

Summary of changes:
 parquet/Cargo.toml| 10 --
 parquet/src/column/writer/mod.rs  |  5 -
 parquet/src/file/reader.rs|  2 --
 parquet/src/file/serialized_reader.rs |  2 --
 parquet/src/file/writer.rs|  9 -
 parquet/src/lib.rs|  1 -
 6 files changed, 4 insertions(+), 25 deletions(-)



[arrow-rs] branch add-bloom-filter-3 updated (52bf18a94 -> 09ee38ccb)

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


 discard 52bf18a94 add feature flag
 discard f54178ac3 parquet bloom filter part II: read sbbf bitset from row 
group reader, update API, and add cli demo (#3102)
 discard fb167f6bf Include field name in merge error message (#3113)
 discard ec4c040d4 Expose `SortingColumn` in parquet files (#3103)
 discard b45790b30 Parse Time32/Time64 from formatted string (#3101)
 add 371ec57e3 Expose `SortingColumn` in parquet files (#3103)
 add c99d2f333 Include field name in merge error message (#3113)
 add c95eb4c80 Parse Time32/Time64 from formatted string (#3101)
 add 73d66d837 parquet bloom filter part II: read sbbf bitset from row 
group reader, update API, and add cli demo (#3102)
 add 09ee38ccb add feature flag

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (52bf18a94)
\
 N -- N -- N   refs/heads/add-bloom-filter-3 (09ee38ccb)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 ...67-4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet.crc | Bin 16 -> 0 bytes
 bla.parquet/_SUCCESS  |   0
 ...d3a667-4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet | Bin 587 -> 0 bytes
 3 files changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 
bla.parquet/.part-0-e0d3a667-4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet.crc
 delete mode 100644 bla.parquet/_SUCCESS
 delete mode 100644 
bla.parquet/part-0-e0d3a667-4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet



[arrow-rs] branch add-bloom-filter-3 updated (76aa88f6f -> 52bf18a94)

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


 discard 76aa88f6f add feature flag
 discard 881fcb10f parquet bloom filter part II: read sbbf bitset from row 
group reader, update API, and add cli demo (#3102)
 discard 287d16d3a Include field name in merge error message (#3113)
 discard 3baf6eb17 Parse Time32/Time64 from formatted string (#3101)
omit 371ec57e3 Expose `SortingColumn` in parquet files (#3103)
 add b45790b30 Parse Time32/Time64 from formatted string (#3101)
 add ec4c040d4 Expose `SortingColumn` in parquet files (#3103)
 add fb167f6bf Include field name in merge error message (#3113)
 add f54178ac3 parquet bloom filter part II: read sbbf bitset from row 
group reader, update API, and add cli demo (#3102)
 add 52bf18a94 add feature flag

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (76aa88f6f)
\
 N -- N -- N   refs/heads/add-bloom-filter-3 (52bf18a94)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:



[arrow-rs] branch add-bloom-filter-3 updated (5d7624860 -> 76aa88f6f)

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


omit 5d7624860 encode writing
omit 9b8a0f515 write out to bloom filter
omit 63fa6434a add writer properties
omit 777b0dc6f add column setter
omit 415c6fbb6 update help
omit 86673694f remove unused trait
omit a9480ad55 rework api
omit bd2fb2fd5 refactor to test
omit e7a33b693 get rid of loop read
omit 3ec6e292c remove extern crate
omit f0041d363 parquet-show-bloom-filter with bloom feature required
omit 1bc73cd46 update read method
omit f8b7a2781 adjust byte size
omit fa3639cca fix clippy
omit 7a51342e8 remove unused
omit c66d7a00a add bin
omit 5f4deae63 add a binary to demo
omit efd89916a refactor
omit 88cea8052 fix reading with chunk reader
omit 2557f2c4a add api
omit d5458bbdc add feature flag
 add fc06c84f4 Implements more temporal kernels using time_fraction_dyn 
(#3107)
 add 19f372d82 cast: unsigned numeric type with decimal (#3106)
 add 81ce601be Update instructions for new crates (#3111)
 add b0b5d8b4f Add PrimitiveArray::unary_opt (#3110)
 add 5c2801d08 Add downcast_array (#2901) (#3117)
 add 7d41e1c19 Check overflow while casting between decimal types (#3076)
 add 8bb2917ee Remove Option from `Field::metadata` (#3091)
 add 371ec57e3 Expose `SortingColumn` in parquet files (#3103)
 add 3baf6eb17 Parse Time32/Time64 from formatted string (#3101)
 add 287d16d3a Include field name in merge error message (#3113)
 add 881fcb10f parquet bloom filter part II: read sbbf bitset from row 
group reader, update API, and add cli demo (#3102)
 add 76aa88f6f add feature flag

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (5d7624860)
\
 N -- N -- N   refs/heads/add-bloom-filter-3 (76aa88f6f)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 arrow-array/src/array/primitive_array.rs   |  66 +++
 arrow-array/src/builder/struct_builder.rs  |   2 +-
 arrow-array/src/cast.rs|  33 ++
 arrow-cast/src/cast.rs | 638 +
 arrow-cast/src/parse.rs| 420 +-
 arrow-csv/src/reader.rs|  35 ++
 arrow-integration-test/src/field.rs|  10 +-
 arrow-integration-test/src/lib.rs  | 171 +++---
 arrow-ipc/src/convert.rs   |  30 +-
 arrow-schema/src/datatype.rs   |   6 +-
 arrow-schema/src/field.rs  | 113 ++--
 arrow-schema/src/schema.rs |  62 +-
 arrow/src/compute/kernels/temporal.rs  | 305 +-
 ...-4f6e-a9b1-084447078e60-c000.snappy.parquet.crc | Bin 0 -> 16 bytes
 bla.parquet/_SUCCESS   |   0
 ...4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet | Bin 0 -> 587 bytes
 dev/release/README.md  |   2 +
 parquet/src/arrow/arrow_reader/mod.rs  |   2 +-
 parquet/src/arrow/schema/complex.rs|  10 +-
 parquet/src/file/metadata.rs   |  21 +-
 parquet/src/file/properties.rs |  16 +
 parquet/src/file/writer.rs |  61 ++
 22 files changed, 1399 insertions(+), 604 deletions(-)
 create mode 100644 
bla.parquet/.part-0-e0d3a667-4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet.crc
 create mode 100644 bla.parquet/_SUCCESS
 create mode 100644 
bla.parquet/part-0-e0d3a667-4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet



[arrow-rs] 02/04: add writer properties

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit 63fa6434aca28e9f646bf6840334fdf6fe0abedc
Author: Jiayu Liu 
AuthorDate: Tue Nov 15 21:23:02 2022 +0800

add writer properties
---
 parquet/src/file/properties.rs | 148 ++---
 1 file changed, 80 insertions(+), 68 deletions(-)

diff --git a/parquet/src/file/properties.rs b/parquet/src/file/properties.rs
index c0e789ca1..c62bfe0bc 100644
--- a/parquet/src/file/properties.rs
+++ b/parquet/src/file/properties.rs
@@ -64,6 +64,7 @@
 //! .build();
 //! ```
 
+use paste::paste;
 use std::{collections::HashMap, sync::Arc};
 
 use crate::basic::{Compression, Encoding};
@@ -81,6 +82,9 @@ const DEFAULT_STATISTICS_ENABLED: EnabledStatistics = EnabledStatistics::Page;
 const DEFAULT_MAX_STATISTICS_SIZE: usize = 4096;
 const DEFAULT_MAX_ROW_GROUP_SIZE: usize = 1024 * 1024;
 const DEFAULT_CREATED_BY: &str = env!("PARQUET_CREATED_BY");
+const DEFAULT_BLOOM_FILTER_ENABLED: bool = false;
+const DEFAULT_BLOOM_FILTER_MAX_BYTES: u32 = 1024 * 1024;
+const DEFAULT_BLOOM_FILTER_FPP: f64 = 0.01;
 
 /// Parquet writer version.
 ///
@@ -123,6 +127,26 @@ pub struct WriterProperties {
 column_properties: HashMap<ColumnPath, ColumnProperties>,
 }
 
+macro_rules! def_col_property_getter {
+    ($field:ident, $field_type:ty) => {
+        pub fn $field(&self, col: &ColumnPath) -> Option<$field_type> {
+            self.column_properties
+                .get(col)
+                .and_then(|c| c.$field())
+                .or_else(|| self.default_column_properties.$field())
+        }
+    };
+    ($field:ident, $field_type:ty, $default_val:expr) => {
+        pub fn $field(&self, col: &ColumnPath) -> $field_type {
+            self.column_properties
+                .get(col)
+                .and_then(|c| c.$field())
+                .or_else(|| self.default_column_properties.$field())
+                .unwrap_or($default_val)
+        }
+    };
+}
+
 impl WriterProperties {
 /// Returns builder for writer properties with default values.
 pub fn builder() -> WriterPropertiesBuilder {
@@ -249,14 +273,10 @@ impl WriterProperties {
 .unwrap_or(DEFAULT_MAX_STATISTICS_SIZE)
 }
 
-    /// Returns `true` if bloom filter is enabled for a column.
-    pub fn bloom_filter_enabled(&self, col: &ColumnPath) -> bool {
-        self.column_properties
-            .get(col)
-            .and_then(|c| c.bloom_filter_enabled())
-            .or_else(|| self.default_column_properties.bloom_filter_enabled())
-            .unwrap_or(false)
-    }
+    def_col_property_getter!(bloom_filter_enabled, bool, DEFAULT_BLOOM_FILTER_ENABLED);
+    def_col_property_getter!(bloom_filter_fpp, f64, DEFAULT_BLOOM_FILTER_FPP);
+    def_col_property_getter!(bloom_filter_ndv, u64);
+    def_col_property_getter!(bloom_filter_max_bytes, u32, DEFAULT_BLOOM_FILTER_MAX_BYTES);
 }
 
 /// Writer properties builder.
@@ -273,16 +293,40 @@ pub struct WriterPropertiesBuilder {
 column_properties: HashMap<ColumnPath, ColumnProperties>,
 }
 
-macro_rules! def_per_col_setter {
-    ($field:ident, $field_type:expr) => {
-        // The macro will expand into the contents of this block.
-        pub fn concat_idents!(set_, $field)(mut self, value: $field_type) -> Self {
-            self.$field = value;
-            self
+macro_rules! def_opt_field_setter {
+    ($field: ident, $type: ty) => {
+        paste! {
+            pub fn [<set_ $field>](&mut self, value: $type) -> &mut Self {
+                self.$field = Some(value);
+                self
+            }
+        }
+    };
+}
+
+macro_rules! def_opt_field_getter {
+    ($field: ident, $type: ty) => {
+        paste! {
+            #[doc = "Returns " $field " if set."]
+            pub fn $field(&self) -> Option<$type> {
+                self.$field
+            }
         }
     };
 }
 
+macro_rules! def_per_col_setter {
+    ($field:ident, $field_type:ty) => {
+        paste! {
+            #[doc = "Sets " $field " for a column. Takes precedence over globally defined settings."]
+            pub fn [<set_column_ $field>](mut self, col: ColumnPath, value: $field_type) -> Self {
+                self.get_mut_props(col).[<set_ $field>](value);
+                self
+            }
+        }
+    }
+}
+
 impl WriterPropertiesBuilder {
 /// Returns default state of the builder.
 fn with_defaults() -> Self {
@@ -325,8 +369,6 @@ impl WriterPropertiesBuilder {
 self
 }
 
-def_per_col_setter!(writer_version, WriterVersion);
-
 /// Sets best effort maximum size of a data page in bytes.
 ///
 /// Note: this is a best effort limit based on value of
@@ -498,16 +540,10 @@ impl WriterPropertiesBuilder {
 self
 }
 
-/// Sets bloom filter enabled for a column.
-/// Takes precedence over globally defined settings.
-pub fn set_column_bloom_filter_enabled(
-   

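To make the macros above concrete: each `def_col_property_getter!` invocation generates a getter that consults the per-column map first, then the builder-wide default column properties, then a hard-coded constant. A self-contained sketch of roughly what one expansion amounts to, using simplified stand-in types rather than the real parquet-rs definitions:

```rust
use std::collections::HashMap;

type ColumnPath = String; // stand-in; the real type is a path struct
const DEFAULT_BLOOM_FILTER_FPP: f64 = 0.01;

#[derive(Default)]
struct ColumnProperties {
    bloom_filter_fpp: Option<f64>,
}

struct WriterProperties {
    default_column_properties: ColumnProperties,
    column_properties: HashMap<ColumnPath, ColumnProperties>,
}

impl WriterProperties {
    // Roughly the expansion of
    // def_col_property_getter!(bloom_filter_fpp, f64, DEFAULT_BLOOM_FILTER_FPP):
    // per-column value, else builder default, else the constant.
    fn bloom_filter_fpp(&self, col: &ColumnPath) -> f64 {
        self.column_properties
            .get(col)
            .and_then(|c| c.bloom_filter_fpp)
            .or(self.default_column_properties.bloom_filter_fpp)
            .unwrap_or(DEFAULT_BLOOM_FILTER_FPP)
    }
}

fn main() {
    let mut per_column = HashMap::new();
    per_column.insert(
        "id".to_string(),
        ColumnProperties { bloom_filter_fpp: Some(0.05) },
    );
    let props = WriterProperties {
        default_column_properties: ColumnProperties::default(),
        column_properties: per_column,
    };
    assert_eq!(props.bloom_filter_fpp(&"id".to_string()), 0.05);
    assert_eq!(props.bloom_filter_fpp(&"name".to_string()), 0.01);
}
```
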
[arrow-rs] 01/04: add column setter

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit 777b0dc6f7d4a08af896772893071681c9d17b21
Author: Jiayu Liu 
AuthorDate: Tue Nov 15 20:53:32 2022 +0800

add column setter
---
 parquet/Cargo.toml |   1 +
 parquet/src/file/properties.rs | 102 +++--
 2 files changed, 89 insertions(+), 14 deletions(-)

diff --git a/parquet/Cargo.toml b/parquet/Cargo.toml
index fc7c8218a..72baaf338 100644
--- a/parquet/Cargo.toml
+++ b/parquet/Cargo.toml
@@ -58,6 +58,7 @@ futures = { version = "0.3", default-features = false, features = ["std"], optio
 tokio = { version = "1.0", optional = true, default-features = false, features = ["macros", "rt", "io-util"] }
 hashbrown = { version = "0.13", default-features = false }
 twox-hash = { version = "1.6", optional = true }
+paste = "1.0"
 
 [dev-dependencies]
 base64 = { version = "0.13", default-features = false, features = ["std"] }
diff --git a/parquet/src/file/properties.rs b/parquet/src/file/properties.rs
index cf821df21..c0e789ca1 100644
--- a/parquet/src/file/properties.rs
+++ b/parquet/src/file/properties.rs
@@ -248,6 +248,15 @@ impl WriterProperties {
 .or_else(|| self.default_column_properties.max_statistics_size())
 .unwrap_or(DEFAULT_MAX_STATISTICS_SIZE)
 }
+
+    /// Returns `true` if bloom filter is enabled for a column.
+    pub fn bloom_filter_enabled(&self, col: &ColumnPath) -> bool {
+        self.column_properties
+            .get(col)
+            .and_then(|c| c.bloom_filter_enabled())
+            .or_else(|| self.default_column_properties.bloom_filter_enabled())
+            .unwrap_or(false)
+    }
 }
 
 /// Writer properties builder.
@@ -264,6 +273,16 @@ pub struct WriterPropertiesBuilder {
 column_properties: HashMap<ColumnPath, ColumnProperties>,
 }
 
+macro_rules! def_per_col_setter {
+    ($field:ident, $field_type:expr) => {
+        // The macro will expand into the contents of this block.
+        pub fn concat_idents!(set_, $field)(mut self, value: $field_type) -> Self {
+            self.$field = value;
+            self
+        }
+    };
+}
+
 impl WriterPropertiesBuilder {
 /// Returns default state of the builder.
 fn with_defaults() -> Self {
@@ -276,7 +295,7 @@ impl WriterPropertiesBuilder {
 writer_version: DEFAULT_WRITER_VERSION,
 created_by: DEFAULT_CREATED_BY.to_string(),
 key_value_metadata: None,
-default_column_properties: ColumnProperties::new(),
+default_column_properties: Default::default(),
 column_properties: HashMap::new(),
 }
 }
@@ -306,6 +325,8 @@ impl WriterPropertiesBuilder {
 self
 }
 
+def_per_col_setter!(writer_version, WriterVersion);
+
 /// Sets best effort maximum size of a data page in bytes.
 ///
 /// Note: this is a best effort limit based on value of
@@ -423,7 +444,7 @@ impl WriterPropertiesBuilder {
 fn get_mut_props(&mut self, col: ColumnPath) -> &mut ColumnProperties {
 self.column_properties
 .entry(col)
-.or_insert_with(ColumnProperties::new)
+.or_insert_with(Default::default)
 }
 
 /// Sets encoding for a column.
@@ -476,6 +497,17 @@ impl WriterPropertiesBuilder {
 self.get_mut_props(col).set_max_statistics_size(value);
 self
 }
+
+/// Sets bloom filter enabled for a column.
+/// Takes precedence over globally defined settings.
+pub fn set_column_bloom_filter_enabled(
+mut self,
+col: ColumnPath,
+value: bool,
+) -> Self {
+self.get_mut_props(col).set_bloom_filter_enabled(value);
+self
+}
 }
 
 /// Controls the level of statistics to be computed by the writer
@@ -499,27 +531,24 @@ impl Default for EnabledStatistics {
 ///
 /// If a field is `None`, it means that no specific value has been set for this column,
 /// so some subsequent or default value must be used.
-#[derive(Debug, Clone, PartialEq)]
+#[derive(Debug, Clone, Default, PartialEq)]
 struct ColumnProperties {
     encoding: Option<Encoding>,
     codec: Option<Compression>,
     dictionary_enabled: Option<bool>,
     statistics_enabled: Option<EnabledStatistics>,
     max_statistics_size: Option<usize>,
+    /// bloom filter enabled
+    bloom_filter_enabled: Option<bool>,
+    /// bloom filter expected number of distinct values
+    bloom_filter_ndv: Option<u64>,
+    /// bloom filter false positive probability
+    bloom_filter_fpp: Option<f64>,
+    /// bloom filter max number of bytes
+    bloom_filter_max_bytes: Option<u32>,
 }
 
 impl ColumnProperties {
-/// Initialise column properties with default values.
-fn new() -> Self {
-Self {
-encoding: None,
-codec: None,
-dictionary_enabled: None,
- 

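The commit above introduces the precedence rule the rest of the series relies on: a per-column setting wins over the builder-wide default, which wins over a built-in fallback. A self-contained sketch of that chained builder shape, with simplified stand-ins for the parquet-rs types:

```rust
use std::collections::HashMap;

// Stand-in for WriterPropertiesBuilder; columns are keyed by a plain
// String here instead of the real ColumnPath type.
#[derive(Default)]
struct Builder {
    default_bloom_filter_enabled: Option<bool>,
    column_bloom_filter_enabled: HashMap<String, bool>,
}

impl Builder {
    fn set_bloom_filter_enabled(mut self, value: bool) -> Self {
        self.default_bloom_filter_enabled = Some(value);
        self
    }

    // Per-column setting; takes precedence over the global default,
    // mirroring set_column_bloom_filter_enabled in the diff above.
    fn set_column_bloom_filter_enabled(mut self, col: &str, value: bool) -> Self {
        self.column_bloom_filter_enabled.insert(col.to_string(), value);
        self
    }

    fn bloom_filter_enabled(&self, col: &str) -> bool {
        self.column_bloom_filter_enabled
            .get(col)
            .copied()
            .or(self.default_bloom_filter_enabled)
            .unwrap_or(false)
    }
}

fn main() {
    let b = Builder::default()
        .set_bloom_filter_enabled(true)
        .set_column_bloom_filter_enabled("id", false);
    assert!(b.bloom_filter_enabled("name")); // global default applies
    assert!(!b.bloom_filter_enabled("id")); // per-column override wins
}
```
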
[arrow-rs] 03/04: write out to bloom filter

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit 9b8a0f51517b3235ccd57461f439a400dbbee4c1
Author: Jiayu Liu 
AuthorDate: Tue Nov 15 21:47:56 2022 +0800

write out to bloom filter
---
 parquet/src/bloom_filter/mod.rs  |  1 +
 parquet/src/column/writer/mod.rs | 15 ++
 parquet/src/file/writer.rs   | 45 ++--
 3 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/parquet/src/bloom_filter/mod.rs b/parquet/src/bloom_filter/mod.rs
index 4944a93f8..d0bee8a5f 100644
--- a/parquet/src/bloom_filter/mod.rs
+++ b/parquet/src/bloom_filter/mod.rs
@@ -80,6 +80,7 @@ fn block_check(block: &Block, hash: u32) -> bool {
 }
 
 /// A split block Bloom filter
+#[derive(Debug, Clone)]
 pub struct Sbbf(Vec<Block>);
 
 const SBBF_HEADER_SIZE_ESTIMATE: usize = 20;
diff --git a/parquet/src/column/writer/mod.rs b/parquet/src/column/writer/mod.rs
index 3cdf04f54..f8e79d792 100644
--- a/parquet/src/column/writer/mod.rs
+++ b/parquet/src/column/writer/mod.rs
@@ -16,6 +16,9 @@
 // under the License.
 
 //! Contains column writer API.
+
+#[cfg(feature = "bloom")]
+use crate::bloom_filter::Sbbf;
 use crate::format::{ColumnIndex, OffsetIndex};
 use std::collections::{BTreeSet, VecDeque};
 
@@ -154,6 +157,9 @@ pub struct ColumnCloseResult {
 pub rows_written: u64,
 /// Metadata for this column chunk
 pub metadata: ColumnChunkMetaData,
+/// Optional bloom filter for this column
+#[cfg(feature = "bloom")]
+pub bloom_filter: Option<Sbbf>,
 /// Optional column index, for filtering
 pub column_index: Option<ColumnIndex>,
 /// Optional offset index, identifying page locations
@@ -209,6 +215,10 @@ pub struct GenericColumnWriter<'a, E: ColumnValueEncoder> {
 rep_levels_sink: Vec<i16>,
 data_pages: VecDeque<CompressedPage>,
 
+// bloom filter
+#[cfg(feature = "bloom")]
+bloom_filter: Option<Sbbf>,
+
 // column index and offset index
 column_index_builder: ColumnIndexBuilder,
 offset_index_builder: OffsetIndexBuilder,
@@ -260,6 +270,9 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
 num_column_nulls: 0,
 column_distinct_count: None,
 },
+// TODO!
+#[cfg(feature = "bloom")]
+bloom_filter: None,
 column_index_builder: ColumnIndexBuilder::new(),
 offset_index_builder: OffsetIndexBuilder::new(),
 encodings,
@@ -458,6 +471,8 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
 Ok(ColumnCloseResult {
 bytes_written: self.column_metrics.total_bytes_written,
 rows_written: self.column_metrics.total_rows_written,
+#[cfg(feature = "bloom")]
+bloom_filter: self.bloom_filter,
 metadata,
 column_index,
 offset_index,
diff --git a/parquet/src/file/writer.rs b/parquet/src/file/writer.rs
index 2efaf7caf..90c9b6bfc 100644
--- a/parquet/src/file/writer.rs
+++ b/parquet/src/file/writer.rs
@@ -18,10 +18,11 @@
 //! Contains file writer API, and provides methods to write row groups and 
columns by
 //! using row group writers and column writers respectively.
 
-use std::{io::Write, sync::Arc};
-
+#[cfg(feature = "bloom")]
+use crate::bloom_filter::Sbbf;
 use crate::format as parquet;
 use crate::format::{ColumnIndex, OffsetIndex, RowGroup};
+use std::{io::Write, sync::Arc};
 use thrift::protocol::{TCompactOutputProtocol, TOutputProtocol, TSerializable};
 
 use crate::basic::PageType;
@@ -116,6 +117,8 @@ pub struct SerializedFileWriter<W: Write> {
 descr: SchemaDescPtr,
 props: WriterPropertiesPtr,
 row_groups: Vec<RowGroupMetaDataPtr>,
+#[cfg(feature = "bloom")]
+bloom_filters: Vec<Vec<Option<Sbbf>>>,
 column_indexes: Vec<Vec<Option<ColumnIndex>>>,
 offset_indexes: Vec<Vec<Option<OffsetIndex>>>,
 row_group_index: usize,
@@ -132,6 +135,8 @@ impl<W: Write> SerializedFileWriter<W> {
 descr: Arc::new(SchemaDescriptor::new(schema)),
 props: properties,
 row_groups: vec![],
+#[cfg(feature = "bloom")]
+bloom_filters: vec![],
 column_indexes: Vec::new(),
 offset_indexes: Vec::new(),
 row_group_index: 0,
@@ -212,6 +217,32 @@ impl<W: Write> SerializedFileWriter<W> {
 Ok(())
 }
 
+#[cfg(feature = "bloom")]
+/// Serialize all the bloom filter to the file
+fn write_bloom_filters( self, row_groups:  [RowGroup]) -> 
Result<()> {
+// iter row group
+// iter each column
+// write bloom filter to the file
+for (row_group_idx, row_group) in row_groups.iter_mut().enumerate() {
+for (column_idx, column_metadata) in 
row_group.columns.iter_mut().enumerate()
+{
+match _filters[row_group_idx][column_idx]

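The `write_bloom_filters` hunk above walks a row-group-major, column-minor structure: one optional filter per (row group, column) pair, populated as each column writer closes. A minimal sketch of that indexing shape with a stand-in `Sbbf` type; the offset bookkeeping and thrift serialization of the real writer are omitted:

```rust
// Stand-in for the split-block bloom filter type.
struct Sbbf;

// One entry per (row group, column); None when no filter was built.
fn count_filters(filters: &[Vec<Option<Sbbf>>]) -> usize {
    let mut written = 0;
    for (rg_idx, row_group) in filters.iter().enumerate() {
        for (col_idx, maybe_filter) in row_group.iter().enumerate() {
            if maybe_filter.is_some() {
                // the real writer serializes the filter here and records
                // its start offset in the column chunk metadata
                println!("filter at row group {rg_idx}, column {col_idx}");
                written += 1;
            }
        }
    }
    written
}

fn main() {
    let filters = vec![vec![Some(Sbbf), None], vec![None, Some(Sbbf)]];
    assert_eq!(count_filters(&filters), 2);
}
```
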
[arrow-rs] 04/04: encode writing

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit 5d7624860d3c8aed11b4ed04b0d35ccbcc1802f2
Author: Jiayu Liu 
AuthorDate: Tue Nov 15 23:01:26 2022 +0800

encode writing
---
 parquet/src/bloom_filter/mod.rs | 26 ++
 parquet/src/file/writer.rs  | 16 +++-
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/parquet/src/bloom_filter/mod.rs b/parquet/src/bloom_filter/mod.rs
index d0bee8a5f..0122a3a76 100644
--- a/parquet/src/bloom_filter/mod.rs
+++ b/parquet/src/bloom_filter/mod.rs
@@ -24,9 +24,11 @@ use crate::file::metadata::ColumnChunkMetaData;
 use crate::file::reader::ChunkReader;
 use crate::format::{
 BloomFilterAlgorithm, BloomFilterCompression, BloomFilterHash, 
BloomFilterHeader,
+SplitBlockAlgorithm, Uncompressed, XxHash,
 };
 use bytes::{Buf, Bytes};
 use std::hash::Hasher;
+use std::io::Write;
 use std::sync::Arc;
 use thrift::protocol::{TCompactInputProtocol, TSerializable};
 use twox_hash::XxHash64;
@@ -129,6 +131,30 @@ impl Sbbf {
 Self(data)
 }
 
+    pub fn write_bitset<W: Write>(&self, mut writer: W) -> Result<(), ParquetError> {
+        for block in &self.0 {
+            for word in block {
+                writer.write_all(&word.to_le_bytes()).map_err(|e| {
+                    ParquetError::General(format!(
+                        "Could not write bloom filter bit set: {}",
+                        e
+                    ))
+                })?;
+            }
+        }
+        Ok(())
+    }
+
+    pub fn header(&self) -> BloomFilterHeader {
+        BloomFilterHeader {
+            // 8 i32 per block, 4 bytes per i32
+            num_bytes: self.0.len() as i32 * 4 * 8,
+            algorithm: BloomFilterAlgorithm::BLOCK(SplitBlockAlgorithm {}),
+            hash: BloomFilterHash::XXHASH(XxHash {}),
+            compression: BloomFilterCompression::UNCOMPRESSED(Uncompressed {}),
+        }
+    }
+
 pub fn read_from_column_chunk(
 column_metadata: &ColumnChunkMetaData,
 reader: Arc<dyn ChunkReader>,
diff --git a/parquet/src/file/writer.rs b/parquet/src/file/writer.rs
index 90c9b6bfc..bf6ec93fa 100644
--- a/parquet/src/file/writer.rs
+++ b/parquet/src/file/writer.rs
@@ -230,11 +230,16 @@ impl<W: Write> SerializedFileWriter<W> {
                 Some(bloom_filter) => {
                     let start_offset = self.buf.bytes_written();
                     let mut protocol = TCompactOutputProtocol::new(&mut self.buf);
-                    bloom_filter.write_to_out_protocol(&mut protocol)?;
+                    let header = bloom_filter.header();
+                    header.write_to_out_protocol(&mut protocol)?;
                     protocol.flush()?;
-                    let end_offset = self.buf.bytes_written();
+                    bloom_filter.write_bitset(&mut self.buf)?;
                     // set offset and index for bloom filter
-                    column_metadata.bloom_filter_offset = Some(start_offset as i64);
+                    column_metadata
+                        .meta_data
+                        .as_mut()
+                        .expect("can't have bloom filter without column metadata")
+                        .bloom_filter_offset = Some(start_offset as i64);
                 }
                 None => {}
             }
@@ -424,10 +429,10 @@ impl<'a, W: Write> SerializedRowGroupWriter<'a, W> {
 // Update row group writer metrics
 *total_bytes_written += r.bytes_written;
 column_chunks.push(r.metadata);
-column_indexes.push(r.column_index);
-offset_indexes.push(r.offset_index);
 #[cfg(feature = "bloom")]
 bloom_filters.push(r.bloom_filter);
+column_indexes.push(r.column_index);
+offset_indexes.push(r.offset_index);
 
 if let Some(rows) = *total_rows_written {
 if rows != r.rows_written {
@@ -663,6 +668,7 @@ impl<'a, W: Write> PageWriter for SerializedPageWriter<'a, W> {
 
 Ok(spec)
 }
+
     fn write_metadata(&mut self, metadata: &ColumnChunkMetaData) -> Result<()> {
         let mut protocol = TCompactOutputProtocol::new(&mut self.sink);
 metadata


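A note on the arithmetic in `header()` and `write_bitset` above: each block is eight 32-bit words, so a filter of n blocks serializes to n * 8 * 4 bytes, written word by word in little-endian order directly after the thrift header. A self-contained sketch of just the bitset encoding, assuming `[u32; 8]` blocks:

```rust
use std::io::Write;

type Block = [u32; 8]; // eight 32-bit words per split-block filter block

fn write_bitset<W: Write>(blocks: &[Block], mut writer: W) -> std::io::Result<()> {
    for block in blocks {
        for word in block {
            // every word is written as 4 little-endian bytes
            writer.write_all(&word.to_le_bytes())?;
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let blocks = vec![[0u32; 8]; 4];
    let mut out = Vec::new();
    write_bitset(&blocks, &mut out)?;
    // matches num_bytes = len * 4 * 8 from header()
    assert_eq!(out.len(), 4 * 8 * 4);
    Ok(())
}
```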

[arrow-rs] branch add-bloom-filter-3 created (now 5d7624860)

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-3
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


  at 5d7624860 encode writing

This branch includes the following new commits:

 new 777b0dc6f add column setter
 new 63fa6434a add writer properties
 new 9b8a0f515 write out to bloom filter
 new 5d7624860 encode writing

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[arrow-rs] branch add-bloom-filter-2 updated (86673694f -> 415c6fbb6)

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from 86673694f remove unused trait
 add 415c6fbb6 update help

No new revisions were added by this update.

Summary of changes:
 parquet/src/bin/parquet-show-bloom-filter.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[arrow-rs] branch add-bloom-filter-2 updated (a9480ad55 -> 86673694f)

2022-11-15 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from a9480ad55 rework api
 add 86673694f remove unused trait

No new revisions were added by this update.

Summary of changes:
 parquet/src/bin/parquet-show-bloom-filter.rs | 1 -
 1 file changed, 1 deletion(-)



[arrow-rs] branch add-bloom-filter-2 updated (bd2fb2fd5 -> a9480ad55)

2022-11-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from bd2fb2fd5 refactor to test
 add a9480ad55 rework api

No new revisions were added by this update.

Summary of changes:
 parquet/src/bin/parquet-show-bloom-filter.rs |  4 ++--
 parquet/src/bloom_filter/mod.rs  | 25 ++---
 2 files changed, 20 insertions(+), 9 deletions(-)



[arrow-rs] branch add-bloom-filter-2 updated (e7a33b693 -> bd2fb2fd5)

2022-11-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from e7a33b693 get rid of loop read
 add bd2fb2fd5 refactor to test

No new revisions were added by this update.

Summary of changes:
 parquet/src/bloom_filter/mod.rs | 58 -
 1 file changed, 52 insertions(+), 6 deletions(-)



[arrow-rs] branch add-bloom-filter-2 updated (f0041d363 -> e7a33b693)

2022-11-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from f0041d363 parquet-show-bloom-filter with bloom feature required
 add 3ec6e292c remove extern crate
 add e7a33b693 get rid of loop read

No new revisions were added by this update.

Summary of changes:
 parquet/src/bin/parquet-read.rs  |  2 --
 parquet/src/bin/parquet-rowcount.rs  |  1 -
 parquet/src/bin/parquet-schema.rs|  1 -
 parquet/src/bin/parquet-show-bloom-filter.rs |  1 -
 parquet/src/bloom_filter/mod.rs  | 36 +++-
 5 files changed, 14 insertions(+), 27 deletions(-)



[arrow-rs] branch add-bloom-filter-2 updated: parquet-show-bloom-filter with bloom feature required

2022-11-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/add-bloom-filter-2 by this 
push:
 new f0041d363 parquet-show-bloom-filter with bloom feature required
f0041d363 is described below

commit f0041d363a20dff1bb65f566f9c958de2f733775
Author: Jiayu Liu 
AuthorDate: Mon Nov 14 22:03:50 2022 +0800

parquet-show-bloom-filter with bloom feature required
---
 parquet/Cargo.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/parquet/Cargo.toml b/parquet/Cargo.toml
index 50fdac5f6..fc7c8218a 100644
--- a/parquet/Cargo.toml
+++ b/parquet/Cargo.toml
@@ -115,7 +115,7 @@ required-features = ["arrow", "cli"]
 
 [[bin]]
 name = "parquet-show-bloom-filter"
-required-features = ["cli"]
+required-features = ["cli", "bloom"]
 
 [[bench]]
 name = "arrow_writer"



[arrow-rs] branch add-bloom-filter-2 updated (f8b7a2781 -> 1bc73cd46)

2022-11-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from f8b7a2781 adjust byte size
 add 1bc73cd46 update read method

No new revisions were added by this update.

Summary of changes:
 parquet/src/bloom_filter/mod.rs | 23 +--
 1 file changed, 5 insertions(+), 18 deletions(-)



[arrow-rs] branch add-bloom-filter-2 updated (fa3639cca -> f8b7a2781)

2022-11-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from fa3639cca fix clippy
 add f8b7a2781 adjust byte size

No new revisions were added by this update.

Summary of changes:
 parquet/src/bin/parquet-show-bloom-filter.rs |  2 +-
 parquet/src/bloom_filter/mod.rs  | 10 --
 2 files changed, 9 insertions(+), 3 deletions(-)



[arrow-rs] branch add-bloom-filter-2 updated (7a51342e8 -> fa3639cca)

2022-11-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from 7a51342e8 remove unused
 add fa3639cca fix clippy

No new revisions were added by this update.

Summary of changes:
 .github/workflows/arrow.yml  | 2 ++
 parquet/src/bin/parquet-show-bloom-filter.rs | 3 +--
 2 files changed, 3 insertions(+), 2 deletions(-)



[arrow-rs] branch add-bloom-filter-2 updated (c66d7a00a -> 7a51342e8)

2022-11-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from c66d7a00a add bin
 add 7a51342e8 remove unused

No new revisions were added by this update.

Summary of changes:
 parquet/src/bloom_filter/mod.rs | 1 -
 1 file changed, 1 deletion(-)



[arrow-rs] branch add-bloom-filter-2 updated (74a191c9a -> c66d7a00a)

2022-11-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


 discard 74a191c9a add bin
 discard e8273d0f4 add a binary to demo
 discard 2562f9770 refactor
 discard c685f0c2c fix reading with chunk reader
 discard d3d407b29 add api
 discard 5e200d981 add feature flag
 add 20d81f578 Add FixedSizeBinaryArray::try_from_sparse_iter_with_size 
(#3054)
 add 46da60642 Cleanup temporal _internal functions (#3099)
 add 430eb84d0 Improve schema mismatch error message (#3098)
 add 0900be278 Upgrade to thrift 0.17 and fix issues (#3104)
 add d5458bbdc add feature flag
 add 2557f2c4a add api
 add 88cea8052 fix reading with chunk reader
 add efd89916a refactor
 add 5f4deae63 add a binary to demo
 add c66d7a00a add bin

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (74a191c9a)
\
 N -- N -- N   refs/heads/add-bloom-filter-2 (c66d7a00a)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 arrow-array/src/array/fixed_size_binary_array.rs | 121 +-
 arrow-schema/src/field.rs|  37 +-
 arrow-select/src/take.rs |   7 +-
 arrow/src/array/ffi.rs   |   3 +-
 arrow/src/compute/kernels/comparison.rs  |  41 ++
 arrow/src/compute/kernels/sort.rs|  15 +-
 arrow/src/compute/kernels/substring.rs   |   3 +-
 arrow/src/compute/kernels/temporal.rs| 152 ++-
 arrow/src/ffi.rs |   8 +-
 arrow/src/row/dictionary.rs  |   2 +-
 arrow/src/util/bench_util.rs |  25 +-
 arrow/tests/array_transform.rs   |   9 +-
 parquet/Cargo.toml   |   2 +-
 parquet/src/arrow/async_reader.rs|   2 +-
 parquet/src/bloom_filter/mod.rs  |   3 +-
 parquet/src/file/footer.rs   |   2 +-
 parquet/src/file/page_index/index_reader.rs  |   2 +-
 parquet/src/file/serialized_reader.rs|   4 +-
 parquet/src/file/writer.rs   |   2 +-
 parquet/src/format.rs| 494 +++
 20 files changed, 579 insertions(+), 355 deletions(-)



[arrow-rs] branch add-bloom-filter-2 updated (2562f9770 -> 74a191c9a)

2022-11-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from 2562f9770 refactor
 add e8273d0f4 add a binary to demo
 add 74a191c9a add bin

No new revisions were added by this update.

Summary of changes:
 parquet/Cargo.toml   |   4 +
 parquet/src/bin/parquet-show-bloom-filter.rs | 113 +++
 2 files changed, 117 insertions(+)
 create mode 100644 parquet/src/bin/parquet-show-bloom-filter.rs



[arrow-rs] branch master updated: Upgrade to thrift 0.17 and fix issues (#3104)

2022-11-14 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 0900be278 Upgrade to thrift 0.17 and fix issues (#3104)
0900be278 is described below

commit 0900be27859974b8717185d65422c36d7e735b4e
Author: Jiayu Liu 
AuthorDate: Mon Nov 14 16:59:32 2022 +0800

Upgrade to thrift 0.17 and fix issues (#3104)

* test with thrift 0.17 and fix issues

* rebase

* remove databend prefix

* fix async reader

* fix doc err

* fix more doc items
---
 arrow/src/row/dictionary.rs |   2 +-
 parquet/Cargo.toml  |   2 +-
 parquet/src/arrow/async_reader.rs   |   2 +-
 parquet/src/bloom_filter/mod.rs |   2 +-
 parquet/src/file/footer.rs  |   2 +-
 parquet/src/file/page_index/index_reader.rs |   2 +-
 parquet/src/file/serialized_reader.rs   |   2 +-
 parquet/src/file/writer.rs  |   2 +-
 parquet/src/format.rs   | 494 ++--
 9 files changed, 330 insertions(+), 180 deletions(-)

diff --git a/arrow/src/row/dictionary.rs b/arrow/src/row/dictionary.rs
index d8426ad0c..82169a37d 100644
--- a/arrow/src/row/dictionary.rs
+++ b/arrow/src/row/dictionary.rs
@@ -260,7 +260,7 @@ unsafe fn decode_fixed(
 .add_buffer(buffer.into());
 
 // SAFETY: Buffers correct length
-unsafe { builder.build_unchecked() }
+builder.build_unchecked()
 }
 
 /// Decodes a `PrimitiveArray` from dictionary values
diff --git a/parquet/Cargo.toml b/parquet/Cargo.toml
index dda0518f9..a5d43bf54 100644
--- a/parquet/Cargo.toml
+++ b/parquet/Cargo.toml
@@ -41,7 +41,7 @@ arrow-ipc = { version = "27.0.0", path = "../arrow-ipc", 
default-features = fals
 
 ahash = { version = "0.8", default-features = false, features = 
["compile-time-rng"] }
 bytes = { version = "1.1", default-features = false, features = ["std"] }
-thrift = { version = "0.16", default-features = false }
+thrift = { version = "0.17", default-features = false }
 snap = { version = "1.0", default-features = false, optional = true }
 brotli = { version = "3.3", default-features = false, features = ["std"], 
optional = true }
 flate2 = { version = "1.0", default-features = false, features = 
["rust_backend"], optional = true }
diff --git a/parquet/src/arrow/async_reader.rs 
b/parquet/src/arrow/async_reader.rs
index d52fa0406..e182cccbc 100644
--- a/parquet/src/arrow/async_reader.rs
+++ b/parquet/src/arrow/async_reader.rs
@@ -89,7 +89,7 @@ use bytes::{Buf, Bytes};
 use futures::future::{BoxFuture, FutureExt};
 use futures::ready;
 use futures::stream::Stream;
-use thrift::protocol::TCompactInputProtocol;
+use thrift::protocol::{TCompactInputProtocol, TSerializable};
 
 use tokio::io::{AsyncRead, AsyncReadExt, AsyncSeek, AsyncSeekExt};
 
diff --git a/parquet/src/bloom_filter/mod.rs b/parquet/src/bloom_filter/mod.rs
index 770fb53e8..adfd87307 100644
--- a/parquet/src/bloom_filter/mod.rs
+++ b/parquet/src/bloom_filter/mod.rs
@@ -25,7 +25,7 @@ use crate::format::{
 };
 use std::hash::Hasher;
 use std::io::{Read, Seek, SeekFrom};
-use thrift::protocol::TCompactInputProtocol;
+use thrift::protocol::{TCompactInputProtocol, TSerializable};
 use twox_hash::XxHash64;
 
 /// Salt as defined in the 
[spec](https://github.com/apache/parquet-format/blob/master/BloomFilter.md#technical-approach)
diff --git a/parquet/src/file/footer.rs b/parquet/src/file/footer.rs
index e8a114db7..27c07b78d 100644
--- a/parquet/src/file/footer.rs
+++ b/parquet/src/file/footer.rs
@@ -18,7 +18,7 @@
 use std::{io::Read, sync::Arc};
 
 use crate::format::{ColumnOrder as TColumnOrder, FileMetaData as 
TFileMetaData};
-use thrift::protocol::TCompactInputProtocol;
+use thrift::protocol::{TCompactInputProtocol, TSerializable};
 
 use crate::basic::ColumnOrder;
 
diff --git a/parquet/src/file/page_index/index_reader.rs 
b/parquet/src/file/page_index/index_reader.rs
index 99877a921..af23c0bd9 100644
--- a/parquet/src/file/page_index/index_reader.rs
+++ b/parquet/src/file/page_index/index_reader.rs
@@ -23,7 +23,7 @@ use crate::file::page_index::index::{BooleanIndex, 
ByteArrayIndex, Index, Native
 use crate::file::reader::ChunkReader;
 use crate::format::{ColumnIndex, OffsetIndex, PageLocation};
 use std::io::{Cursor, Read};
-use thrift::protocol::TCompactInputProtocol;
+use thrift::protocol::{TCompactInputProtocol, TSerializable};
 
 /// Read on row group's all columns indexes and change into  [`Index`]
 /// If not the format not available return an empty vector.
diff --git a/parquet/src/file/serialized_reader.rs 
b/parquet/src/file/serialized_reader.rs
index a400d4dab..ebe87aca6 100644
--- a/parquet/src/file/serialized_
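
The recurring import change in this upgrade, adding TSerializable next to TCompactInputProtocol, exists because thrift 0.17 moved the generated read_from_in_protocol/write_to_out_protocol methods onto a TSerializable trait, and trait methods only resolve when the trait is in scope. A self-contained sketch of that mechanism, using stand-in types rather than the real thrift crate:

    // Stand-ins to illustrate the thrift 0.17 change; this is not the real
    // `thrift` crate API surface, only the trait-in-scope mechanism.
    mod protocol {
        pub struct TCompactInputProtocol; // stand-in for the real protocol type

        pub trait TSerializable: Sized {
            fn read_from_in_protocol(prot: &mut TCompactInputProtocol) -> Result<Self, String>;
        }
    }

    mod format {
        use crate::protocol::{TCompactInputProtocol, TSerializable};

        #[derive(Debug)]
        pub struct FileMetaData {
            pub num_rows: i64,
        }

        impl TSerializable for FileMetaData {
            fn read_from_in_protocol(_prot: &mut TCompactInputProtocol) -> Result<Self, String> {
                Ok(FileMetaData { num_rows: 0 }) // decoding elided in this sketch
            }
        }
    }

    // Remove TSerializable from this import and the call below stops resolving,
    // which is why the upgrade adds it next to TCompactInputProtocol everywhere.
    use protocol::{TCompactInputProtocol, TSerializable};

    fn main() {
        let mut prot = TCompactInputProtocol;
        let meta = format::FileMetaData::read_from_in_protocol(&mut prot).expect("sketch never fails");
        println!("{meta:?}");
    }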

[arrow-rs] branch test-thrift-017 updated: fix more doc items

2022-11-13 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch test-thrift-017
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/test-thrift-017 by this push:
 new 71560ff51 fix more doc items
71560ff51 is described below

commit 71560ff517a738571254949faf14ac68c6b02547
Author: Jiayu Liu 
AuthorDate: Mon Nov 14 15:29:01 2022 +0800

fix more doc items
---
 parquet/src/format.rs | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/parquet/src/format.rs b/parquet/src/format.rs
index 2e57fa4f3..0851b2287 100644
--- a/parquet/src/format.rs
+++ b/parquet/src/format.rs
@@ -4587,7 +4587,7 @@ impl TSerializable for OffsetIndex {
 //
 
 /// Description for ColumnIndex.
-/// Each <array-field>\[i\] refers to the page at OffsetIndex.page_locations\[i\]
+/// Each `<array-field>`\[i\] refers to the page at OffsetIndex.page_locations\[i\]
 #[derive(Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
 pub struct ColumnIndex {
   /// A list of Boolean values to determine the validity of the corresponding
@@ -4605,7 +4605,7 @@ pub struct ColumnIndex {
   /// that list entries are populated before using them by inspecting null_pages.
   pub min_values: Vec<Vec<u8>>,
   pub max_values: Vec<Vec<u8>>,
-  /// Stores whether both min_values and max_values are orderd and if so, in
+  /// Stores whether both min_values and max_values are ordered and if so, in
   /// which direction. This allows readers to perform binary searches in both
   /// lists. Readers cannot assume that max_values\[i\] <= min_values\[i+1\], even
   /// if the lists are ordered.
@@ -5049,7 +5049,7 @@ pub struct FileMetaData {
   /// Optional key/value metadata *
   pub key_value_metadata: Option<Vec<KeyValue>>,
   /// String for application that wrote this file.  This should be in the format
-  /// <Application> version <App Version> (build <App Build Hash>).
+  /// `<Application>` version `<App Version>` (build `<App Build Hash>`).
   /// e.g. impala version 1.0 (build 6cf94d29b2b7115df4de2c06e2ab4326d721eb55)
   ///
   pub created_by: Option<String>,
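
The ColumnIndex fields documented above are what page-level pruning consumes: null_pages gates whether the min_values/max_values entries are meaningful, and boundary_order tells a reader when binary search is safe. A simplified linear-pruning sketch with stand-in types (not the generated thrift structs):

    // Simplified stand-in for the generated ColumnIndex struct.
    struct ColumnIndexLite {
        null_pages: Vec<bool>,
        min_values: Vec<Vec<u8>>,
        max_values: Vec<Vec<u8>>,
    }

    /// Return indexes of pages whose [min, max] range may contain `needle`.
    fn candidate_pages(index: &ColumnIndexLite, needle: &[u8]) -> Vec<usize> {
        (0..index.null_pages.len())
            .filter(|&i| {
                // Only consult min/max when the page is not all-null: the doc
                // above says those list entries are unpopulated for null pages.
                !index.null_pages[i]
                    && index.min_values[i].as_slice() <= needle
                    && needle <= index.max_values[i].as_slice()
            })
            .collect()
    }

    fn main() {
        let index = ColumnIndexLite {
            null_pages: vec![false, true, false],
            min_values: vec![b"apple".to_vec(), vec![], b"melon".to_vec()],
            max_values: vec![b"grape".to_vec(), vec![], b"zebra".to_vec()],
        };
        assert_eq!(candidate_pages(&index, b"banana"), vec![0]);
        println!("pages to read: {:?}", candidate_pages(&index, b"peach"));
    }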



[arrow-rs] branch test-thrift-017 updated: fix doc err

2022-11-13 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch test-thrift-017
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/test-thrift-017 by this push:
 new 5ecc0d0c8 fix doc err
5ecc0d0c8 is described below

commit 5ecc0d0c87f283c46fb54c935244e6a57dce434d
Author: Jiayu Liu 
AuthorDate: Mon Nov 14 14:27:18 2022 +0800

fix doc err
---
 parquet/src/format.rs | 100 +-
 1 file changed, 50 insertions(+), 50 deletions(-)

diff --git a/parquet/src/format.rs b/parquet/src/format.rs
index 3d38dd531..2e57fa4f3 100644
--- a/parquet/src/format.rs
+++ b/parquet/src/format.rs
@@ -99,7 +99,7 @@ impl From<&Type> for i32 {
 
 /// DEPRECATED: Common types used by frameworks(e.g. hive, pig) using parquet.
 /// ConvertedType is superseded by LogicalType.  This enum should not be 
extended.
-/// 
+///
 /// See LogicalTypes.md for conversion between ConvertedType and LogicalType.
 #[derive(Copy, Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
 pub struct ConvertedType(pub i32);
@@ -117,12 +117,12 @@ impl ConvertedType {
   /// an enum is converted into a binary field
   pub const ENUM: ConvertedType = ConvertedType(4);
   /// A decimal value.
-  /// 
+  ///
   /// This may be used to annotate binary or fixed primitive types. The
   /// underlying byte array stores the unscaled value encoded as two's
   /// complement using big-endian byte order (the most significant byte is the
   /// zeroth element). The value of the decimal is the value * 10^{-scale}.
-  /// 
+  ///
   /// This must be accompanied by a (maximum) precision and a scale in the
   /// SchemaElement. The precision specifies the number of digits in the 
decimal
   /// and the scale stores the location of the decimal point. For example 1.23
@@ -130,62 +130,62 @@ impl ConvertedType {
   /// 2 digits over).
   pub const DECIMAL: ConvertedType = ConvertedType(5);
   /// A Date
-  /// 
+  ///
   /// Stored as days since Unix epoch, encoded as the INT32 physical type.
-  /// 
+  ///
   pub const DATE: ConvertedType = ConvertedType(6);
   /// A time
-  /// 
+  ///
   /// The total number of milliseconds since midnight.  The value is stored
   /// as an INT32 physical type.
   pub const TIME_MILLIS: ConvertedType = ConvertedType(7);
   /// A time.
-  /// 
+  ///
   /// The total number of microseconds since midnight.  The value is stored as
   /// an INT64 physical type.
   pub const TIME_MICROS: ConvertedType = ConvertedType(8);
   /// A date/time combination
-  /// 
+  ///
   /// Date and time recorded as milliseconds since the Unix epoch.  Recorded as
   /// a physical type of INT64.
   pub const TIMESTAMP_MILLIS: ConvertedType = ConvertedType(9);
   /// A date/time combination
-  /// 
+  ///
   /// Date and time recorded as microseconds since the Unix epoch.  The value 
is
   /// stored as an INT64 physical type.
   pub const TIMESTAMP_MICROS: ConvertedType = ConvertedType(10);
   /// An unsigned integer value.
-  /// 
+  ///
   /// The number describes the maximum number of meaningful data bits in
   /// the stored value. 8, 16 and 32 bit values are stored using the
   /// INT32 physical type.  64 bit values are stored using the INT64
   /// physical type.
-  /// 
+  ///
   pub const UINT_8: ConvertedType = ConvertedType(11);
   pub const UINT_16: ConvertedType = ConvertedType(12);
   pub const UINT_32: ConvertedType = ConvertedType(13);
   pub const UINT_64: ConvertedType = ConvertedType(14);
   /// A signed integer value.
-  /// 
+  ///
   /// The number describes the maximum number of meaningful data bits in
   /// the stored value. 8, 16 and 32 bit values are stored using the
   /// INT32 physical type.  64 bit values are stored using the INT64
   /// physical type.
-  /// 
+  ///
   pub const INT_8: ConvertedType = ConvertedType(15);
   pub const INT_16: ConvertedType = ConvertedType(16);
   pub const INT_32: ConvertedType = ConvertedType(17);
   pub const INT_64: ConvertedType = ConvertedType(18);
   /// An embedded JSON document
-  /// 
+  ///
   /// A JSON document embedded within a single UTF8 column.
   pub const JSON: ConvertedType = ConvertedType(19);
   /// An embedded BSON document
-  /// 
+  ///
   /// A BSON document embedded within a single BINARY column.
   pub const BSON: ConvertedType = ConvertedType(20);
   /// An interval of time
-  /// 
+  ///
   /// This type annotates data stored as a FIXED_LEN_BYTE_ARRAY of length 12
   /// This data is composed of three separate little endian unsigned
   /// integers.  Each stores a component of a duration of time.  The first
@@ -443,11 +443,11 @@ impl From<&Encoding> for i32 {
 }
 
 /// Supported compression algorithms.
-/// 
+///
 /// Codecs added in format version X.Y can be read by readers based on X.Y and 
later.
 /// Codec support may vary between readers based on the format version and
 /// libraries available at runtime.
-/// 
+///
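
The DECIMAL converted type documented in this diff stores the unscaled value as big-endian two's complement bytes, with the logical value being unscaled * 10^(-scale). A small sketch of that decoding, assuming the unscaled value fits in an i128:

    // Decode a Parquet DECIMAL byte array per the convention above.
    // Assumes the value fits in 16 bytes (i128); longer inputs would panic.
    fn decode_decimal(bytes: &[u8], scale: u32) -> f64 {
        // Sign-extend into 16 bytes so i128::from_be_bytes sees two's complement.
        let fill = if bytes.first().map_or(false, |b| b & 0x80 != 0) { 0xFF } else { 0x00 };
        let mut buf = [fill; 16];
        buf[16 - bytes.len()..].copy_from_slice(bytes);
        let unscaled = i128::from_be_bytes(buf);
        unscaled as f64 / 10f64.powi(scale as i32)
    }

    fn main() {
        // 1.23 stored with precision 3, scale 2: unscaled value 123.
        assert_eq!(decode_decimal(&[123], 2), 1.23);
        // -1 with scale 0: a single 0xFF byte is -1 in two's complement.
        assert_eq!(decode_decimal(&[0xFF], 0), -1.0);
        println!("decimal decoding ok");
    }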

[arrow-rs] branch test-thrift-017 updated: fix async reader

2022-11-13 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch test-thrift-017
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/test-thrift-017 by this push:
 new 2acdbd185 fix async reader
2acdbd185 is described below

commit 2acdbd18573c3421d839ff60b05a33b770ed7d3c
Author: Jiayu Liu 
AuthorDate: Mon Nov 14 11:12:51 2022 +0800

fix async reader
---
 parquet/src/arrow/async_reader.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/parquet/src/arrow/async_reader.rs 
b/parquet/src/arrow/async_reader.rs
index d52fa0406..e182cccbc 100644
--- a/parquet/src/arrow/async_reader.rs
+++ b/parquet/src/arrow/async_reader.rs
@@ -89,7 +89,7 @@ use bytes::{Buf, Bytes};
 use futures::future::{BoxFuture, FutureExt};
 use futures::ready;
 use futures::stream::Stream;
-use thrift::protocol::TCompactInputProtocol;
+use thrift::protocol::{TCompactInputProtocol, TSerializable};
 
 use tokio::io::{AsyncRead, AsyncReadExt, AsyncSeek, AsyncSeekExt};
 



[arrow-rs] branch test-thrift-017 created (now 0a4dca99e)

2022-11-13 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch test-thrift-017
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


  at 0a4dca99e remove databend prefix

No new revisions were added by this update.



[arrow-rs] branch add-bloom-filter-2 updated (c685f0c2c -> 2562f9770)

2022-11-13 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from c685f0c2c fix reading with chunk reader
 add 2562f9770 refactor

No new revisions were added by this update.

Summary of changes:
 parquet/src/bloom_filter/mod.rs | 63 +++--
 1 file changed, 36 insertions(+), 27 deletions(-)



[arrow-rs] branch add-bloom-filter-2 updated (d3d407b29 -> c685f0c2c)

2022-11-13 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from d3d407b29 add api
 add c685f0c2c fix reading with chunk reader

No new revisions were added by this update.

Summary of changes:
 parquet/src/bloom_filter/mod.rs   | 59 +--
 parquet/src/file/serialized_reader.rs |  5 +--
 2 files changed, 46 insertions(+), 18 deletions(-)



[arrow-rs] 02/02: add api

2022-11-13 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit d3d407b293091bd71c04f865b0c7c896ac52d452
Author: Jiayu Liu 
AuthorDate: Sun Nov 13 13:24:10 2022 +

add api
---
 parquet/src/file/reader.rs|  6 ++
 parquet/src/file/serialized_reader.rs | 15 +++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/parquet/src/file/reader.rs b/parquet/src/file/reader.rs
index 70ff37a41..325944c21 100644
--- a/parquet/src/file/reader.rs
+++ b/parquet/src/file/reader.rs
@@ -21,6 +21,8 @@
 use bytes::Bytes;
 use std::{boxed::Box, io::Read, sync::Arc};
 
+#[cfg(feature = "bloom")]
+use crate::bloom_filter::Sbbf;
 use crate::column::page::PageIterator;
 use crate::column::{page::PageReader, reader::ColumnReader};
 use crate::errors::{ParquetError, Result};
@@ -143,6 +145,10 @@ pub trait RowGroupReader: Send + Sync {
        Ok(col_reader)
    }
 
+    #[cfg(feature = "bloom")]
+    /// Get bloom filter for the `i`th column chunk, if present.
+    fn get_column_bloom_filter(&self, i: usize) -> Result<Option<Sbbf>>;
+
    /// Get iterator of `Row`s from this row group.
    ///
    /// Projected schema can be a subset of or equal to the file schema, when it is None,
diff --git a/parquet/src/file/serialized_reader.rs b/parquet/src/file/serialized_reader.rs
index a400d4dab..8cefe1c5e 100644
--- a/parquet/src/file/serialized_reader.rs
+++ b/parquet/src/file/serialized_reader.rs
@@ -22,11 +22,9 @@ use std::collections::VecDeque;
 use std::io::Cursor;
 use std::{convert::TryFrom, fs::File, io::Read, path::Path, sync::Arc};
 
-use crate::format::{PageHeader, PageLocation, PageType};
-use bytes::{Buf, Bytes};
-use thrift::protocol::TCompactInputProtocol;
-
 use crate::basic::{Encoding, Type};
+#[cfg(feature = "bloom")]
+use crate::bloom_filter::Sbbf;
 use crate::column::page::{Page, PageMetadata, PageReader};
 use crate::compression::{create_codec, Codec};
 use crate::errors::{ParquetError, Result};
@@ -38,10 +36,13 @@ use crate::file::{
    reader::*,
    statistics,
 };
+use crate::format::{PageHeader, PageLocation, PageType};
 use crate::record::reader::RowIter;
 use crate::record::Row;
 use crate::schema::types::Type as SchemaType;
 use crate::util::{io::TryClone, memory::ByteBufferPtr};
+use bytes::{Buf, Bytes};
+use thrift::protocol::TCompactInputProtocol;
 // export `SliceableCursor` and `FileSource` publically so clients can
 // re-use the logic in their own ParquetFileWriter wrappers
 pub use crate::util::io::FileSource;
@@ -387,6 +388,12 @@ impl<'a, R: 'static + ChunkReader> RowGroupReader for SerializedRowGroupReader<'
        )?))
    }
 
+    #[cfg(feature = "bloom")]
+    /// get bloom filter for the ith column
+    fn get_column_bloom_filter(&self, i: usize) -> Result<Option<Sbbf>> {
+        todo!()
+    }
+
    fn get_row_iter(&self, projection: Option<SchemaType>) -> Result<RowIter> {
        RowIter::from_row_group(projection, self)
    }
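
A sketch of how the new trait method might be consumed once the todo!() body above is filled in; the file name is hypothetical, and the `bloom` feature must be enabled for the method to exist:

    use std::fs::File;

    use parquet::file::reader::{FileReader, RowGroupReader};
    use parquet::file::serialized_reader::SerializedFileReader;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let file = File::open("data.parquet")?; // hypothetical input file
        let reader = SerializedFileReader::new(file)?;
        let row_group = reader.get_row_group(0)?;

        // The method added in this diff: per-column bloom filter access.
        match row_group.get_column_bloom_filter(0)? {
            Some(_sbbf) => println!("column 0 has a bloom filter; probe it before scanning pages"),
            None => println!("column 0 has no bloom filter"),
        }
        Ok(())
    }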



[arrow-rs] branch add-bloom-filter-2 created (now d3d407b29)

2022-11-13 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


  at d3d407b29 add api

This branch includes the following new commits:

 new 5e200d981 add feature flag
 new d3d407b29 add api

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[arrow-rs] 01/02: add feature flag

2022-11-13 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch add-bloom-filter-2
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit 5e200d9819669175f3ae2a3a3de384541fec9056
Author: Jiayu Liu 
AuthorDate: Sun Nov 13 13:13:05 2022 +

add feature flag
---
 .github/workflows/arrow.yml | 2 --
 parquet/README.md   | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/.github/workflows/arrow.yml b/.github/workflows/arrow.yml
index 2e1c64ebe..3e62ed775 100644
--- a/.github/workflows/arrow.yml
+++ b/.github/workflows/arrow.yml
@@ -39,7 +39,6 @@ on:
   - .github/**
 
 jobs:
-
   # test the crate
   linux-test:
 name: Test
@@ -134,7 +133,6 @@ jobs:
   - name: Check compilation --features simd --all-targets
 run: cargo check -p arrow --features simd --all-targets
 
-
   # test the arrow crate builds against wasm32 in nightly rust
   wasm32-build:
 name: Build wasm32
diff --git a/parquet/README.md b/parquet/README.md
index d904fc64e..c9245b082 100644
--- a/parquet/README.md
+++ b/parquet/README.md
@@ -41,6 +41,7 @@ However, for historical reasons, this crate uses versions 
with major numbers gre
 The `parquet` crate provides the following features which may be enabled in 
your `Cargo.toml`:
 
 - `arrow` (default) - support for reading / writing 
[`arrow`](https://crates.io/crates/arrow) arrays to / from parquet
+- `bloom` (default) - support for [split block bloom 
filter](https://github.com/apache/parquet-format/blob/master/BloomFilter.md) 
for reading from / writing to parquet
 - `async` - support `async` APIs for reading parquet
 - `json` - support for reading / writing `json` data to / from parquet
 - `brotli` (default) - support for parquet using `brotli` compression



[arrow-rs] branch master updated: add bloom filter implementation based on split block (sbbf) spec (#3057)

2022-11-13 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new b7af85cb8 add bloom filter implementation based on split block (sbbf) 
spec (#3057)
b7af85cb8 is described below

commit b7af85cb8dfe6887bb3fd43d1d76f659473b6927
Author: Jiayu Liu 
AuthorDate: Sun Nov 13 21:07:11 2022 +0800

add bloom filter implementation based on split block (sbbf) spec (#3057)

* add bloom filter implementation based on split block spec

* format and also revist index method

* bloom filter reader

* create new function to facilitate fixture test

* fix clippy

* Update parquet/src/bloom_filter/mod.rs

Co-authored-by: Andrew Lamb 

* Update parquet/src/bloom_filter/mod.rs

Co-authored-by: Andrew Lamb 

* Update parquet/src/bloom_filter/mod.rs

Co-authored-by: Andrew Lamb 

* Update parquet/src/bloom_filter/mod.rs

Co-authored-by: Andrew Lamb 

* Update parquet/src/bloom_filter/mod.rs

* Update parquet/src/bloom_filter/mod.rs

Co-authored-by: Liang-Chi Hsieh 

* fix clippy

Co-authored-by: Andrew Lamb 
Co-authored-by: Liang-Chi Hsieh 
---
 parquet/Cargo.toml  |   5 +-
 parquet/src/bloom_filter/mod.rs | 217 
 parquet/src/lib.rs  |   2 +
 3 files changed, 223 insertions(+), 1 deletion(-)

diff --git a/parquet/Cargo.toml b/parquet/Cargo.toml
index b400b01a7..dda0518f9 100644
--- a/parquet/Cargo.toml
+++ b/parquet/Cargo.toml
@@ -57,6 +57,7 @@ seq-macro = { version = "0.3", default-features = false }
 futures = { version = "0.3", default-features = false, features = ["std"], 
optional = true }
 tokio = { version = "1.0", optional = true, default-features = false, features 
= ["macros", "rt", "io-util"] }
 hashbrown = { version = "0.13", default-features = false }
+twox-hash = { version = "1.6", optional = true }
 
 [dev-dependencies]
 base64 = { version = "0.13", default-features = false, features = ["std"] }
@@ -76,7 +77,7 @@ rand = { version = "0.8", default-features = false, features 
= ["std", "std_rng"
 all-features = true
 
 [features]
-default = ["arrow", "snap", "brotli", "flate2", "lz4", "zstd", "base64"]
+default = ["arrow", "bloom", "snap", "brotli", "flate2", "lz4", "zstd", 
"base64"]
 # Enable arrow reader/writer APIs
 arrow = ["base64", "arrow-array", "arrow-buffer", "arrow-cast", "arrow-data", 
"arrow-schema", "arrow-select", "arrow-ipc"]
 # Enable CLI tools
@@ -89,6 +90,8 @@ test_common = ["arrow/test_utils"]
 experimental = []
 # Enable async APIs
 async = ["futures", "tokio"]
+# Bloomfilter
+bloom = ["twox-hash"]
 
 [[test]]
 name = "arrow_writer_layout"
diff --git a/parquet/src/bloom_filter/mod.rs b/parquet/src/bloom_filter/mod.rs
new file mode 100644
index 0..770fb53e8
--- /dev/null
+++ b/parquet/src/bloom_filter/mod.rs
@@ -0,0 +1,217 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Bloom filter implementation specific to Parquet, as described
+//! in the 
[spec](https://github.com/apache/parquet-format/blob/master/BloomFilter.md)
+
+use crate::errors::ParquetError;
+use crate::file::metadata::ColumnChunkMetaData;
+use crate::format::{
+BloomFilterAlgorithm, BloomFilterCompression, BloomFilterHash, 
BloomFilterHeader,
+};
+use std::hash::Hasher;
+use std::io::{Read, Seek, SeekFrom};
+use thrift::protocol::TCompactInputProtocol;
+use twox_hash::XxHash64;
+
+/// Salt as defined in the 
[spec](https://github.com/apache/parquet-format/blob/master/BloomFilter.md#technical-approach)
+const SALT: [u32; 8] = [
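
The module is truncated here at the SALT constant. For orientation, the split-block scheme it implements can be sketched compactly: a block is eight 32-bit words, and each salt selects one bit of one word. The SALT values below are the ones published in the parquet-format BloomFilter spec; the rest is an illustrative reimplementation, not this module's exact code.

    // Salts from the parquet-format split-block bloom filter (SBBF) spec.
    const SALT: [u32; 8] = [
        0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
        0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
    ];

    type Block = [u32; 8]; // one 256-bit block

    /// Derive the eight-bit mask for a 32-bit hash fragment: multiply each
    /// salt in, and keep the top 5 bits as the bit index within that word.
    fn block_mask(x: u32) -> Block {
        let mut mask = [0u32; 8];
        for i in 0..8 {
            mask[i] = 1 << (x.wrapping_mul(SALT[i]) >> 27);
        }
        mask
    }

    fn block_insert(block: &mut Block, x: u32) {
        for (w, m) in block.iter_mut().zip(block_mask(x)) {
            *w |= m;
        }
    }

    fn block_check(block: &Block, x: u32) -> bool {
        block.iter().zip(block_mask(x)).all(|(w, m)| w & m == m)
    }

    fn main() {
        let mut block = [0u32; 8];
        block_insert(&mut block, 0xdead_beef);
        assert!(block_check(&block, 0xdead_beef)); // inserted values are always found
        // A different fragment is very likely (but not guaranteed) to be absent.
        println!("absent? {}", !block_check(&block, 0x1234_5678));
    }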

[arrow-rs] branch add-bloom-filter updated (2f0e8bbaf -> c9208e79f)

2022-11-13 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from 2f0e8bbaf Update parquet/src/bloom_filter/mod.rs
 add f9e34f6f9 fix clippy
 add 522625814 Make RowSelection::intersection a member function (#3084)
 add 01396822e Remove unused range module (#3085)
 add 02a3f5cd2 Move CSV test data (#3044) (#3051)
 add 561f63a23 Improved UX of  creating `TimestampNanosecondArray` with 
timezones (#3088)
 add 94565bca9 Update version to 27.0.0 and add changelog (#3089)
 add ccc44170a Fix clippy by avoiding deprecated functions in chrono (#3096)
 add aaf030f79 Fix prettyprint for Interval second fractions (#3093)
 add c7210ce2b Minor: Add diagrams and documentation to row format (#3094)
 add 3084ee258 Use ArrowNativeTypeOp instead of total_cmp directly (#3087)
 add c9208e79f Merge branch 'master' into add-bloom-filter

No new revisions were added by this update.

Summary of changes:
 CHANGELOG-old.md   |  95 +
 CHANGELOG.md   | 179 
 arrow-array/Cargo.toml |   8 +-
 arrow-array/src/array/primitive_array.rs   |  24 +-
 arrow-array/src/delta.rs   | 207 ++---
 arrow-array/src/temporal_conversions.rs|   2 +-
 arrow-array/src/timezone.rs|  34 +-
 arrow-array/src/types.rs   |   8 +-
 arrow-buffer/Cargo.toml|   2 +-
 arrow-cast/Cargo.toml  |  12 +-
 arrow-cast/src/cast.rs |  16 +-
 arrow-cast/src/display.rs  |   4 +-
 arrow-cast/src/parse.rs|  16 +-
 arrow-csv/Cargo.toml   |  12 +-
 arrow-csv/src/reader.rs| 449 ++-
 {arrow => arrow-csv}/test/data/decimal_test.csv|   0
 {arrow => arrow-csv}/test/data/null_test.csv   |   0
 {arrow => arrow-csv}/test/data/uk_cities.csv   |   0
 .../test/data/uk_cities_with_headers.csv   |   0
 {arrow => arrow-csv}/test/data/various_types.csv   |   0
 .../test/data/various_types_invalid.csv|   0
 arrow-data/Cargo.toml  |   6 +-
 arrow-flight/Cargo.toml|  10 +-
 arrow-flight/README.md |   2 +-
 arrow-integration-test/Cargo.toml  |   6 +-
 arrow-integration-testing/Cargo.toml   |   2 +-
 arrow-ipc/Cargo.toml   |  12 +-
 arrow-json/Cargo.toml  |  12 +-
 arrow-pyarrow-integration-testing/Cargo.toml   |   4 +-
 arrow-schema/Cargo.toml|   2 +-
 arrow-select/Cargo.toml|  10 +-
 arrow/Cargo.toml   |  22 +-
 arrow/README.md|   2 +-
 arrow/benches/cast_kernels.rs  |   6 +-
 arrow/examples/read_csv.rs |   5 +-
 arrow/examples/read_csv_infer_schema.rs|   2 +-
 arrow/src/compute/kernels/arithmetic.rs|  24 +-
 arrow/src/compute/kernels/comparison.rs| 112 ++---
 arrow/src/row/mod.rs   | 191 +++--
 arrow/src/util/pretty.rs   |  84 
 arrow/tests/csv.rs | 422 --
 dev/release/README.md  |   2 +-
 dev/release/rat_exclude_files.txt  |   1 +
 dev/release/update_change_log.sh   |   4 +-
 parquet/Cargo.toml |  20 +-
 parquet/src/arrow/arrow_reader/mod.rs  |   2 +-
 parquet/src/arrow/arrow_reader/selection.rs|  42 +-
 parquet/src/bloom_filter/mod.rs|   2 +-
 parquet/src/file/page_index/mod.rs |   3 -
 parquet/src/file/page_index/range.rs   | 475 -
 parquet/src/record/api.rs  |  21 +-
 parquet_derive/Cargo.toml  |   4 +-
 parquet_derive/README.md   |   4 +-
 parquet_derive_test/Cargo.toml |   6 +-
 54 files changed, 1285 insertions(+), 1305 deletions(-)
 rename {arrow => arrow-csv}/test/data/decimal_test.csv (100%)
 rename {arrow => arrow-csv}/test/data/null_test.csv (100%)
 rename {arrow => arrow-csv}/test/data/uk_cities.csv (100%)
 rename {arrow => arrow-csv}/test/data/uk_cities_with_headers.csv (100%)
 rename {arrow => arrow-csv}/test/data/various_types.csv (100%)
 rename {arrow => arrow-csv}/test/data/various_types_invalid.csv (100%)
 delete mode 100644 parquet/src/file/page_index/range.rs



[arrow-rs] branch add-bloom-filter updated (b08f97c0d -> 2f0e8bbaf)

2022-11-12 Thread jiayuliu
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-bloom-filter
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from b08f97c0d Update parquet/src/bloom_filter/mod.rs
 add 2f0e8bbaf Update parquet/src/bloom_filter/mod.rs

No new revisions were added by this update.

Summary of changes:
 parquet/src/bloom_filter/mod.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


