[arrow-datafusion-python] branch upgrade-to-support-311 updated (270bb89 -> b734332)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

    from 270bb89  upgrade
     add 501acff  Allow for multiple input files per table instead of a single file (#519)
     add c6a7af5  Add support for window function bindings (#521)
     add d7fcea2  small clippy fix (#524)
     add fc3c24b  Prepare 32.0.0 Release (#525)
     add aaaeeb1  First pass at getting architectured builds working (#350)
     add da6c183  Remove libprotobuf dep (#527)
     add b734332  Merge branch 'main' into upgrade-to-support-311

No new revisions were added by this update.

Summary of changes:
 .github/workflows/conda.yml            |  92 ---
 CHANGELOG.md                           |  38 -
 Cargo.lock                             | 229 -
 Cargo.toml                             |   3 +-
 docs/build.sh => conda/recipes/bld.bat |  14 +-
 conda/recipes/build.sh                 |  84 ++
 conda/recipes/meta.yaml                |  31 +++-
 datafusion/__init__.py                 |   4 +-
 datafusion/input/location.py           |  10 +-
 datafusion/tests/test_input.py         |   2 +-
 pyproject.toml                         |   1 +
 src/common/data_type.rs                |  18 ++
 src/common/schema.rs                   |  16 +-
 src/expr.rs                            |  13 ++
 src/expr/window.rs                     | 294 +
 src/functions.rs                       |   4 +-
 src/lib.rs                             |  15 +-
 src/sql/logical.rs                     |   8 +-
 src/window_frame.rs                    | 110
 19 files changed, 700 insertions(+), 286 deletions(-)
 copy docs/build.sh => conda/recipes/bld.bat (72%)
 mode change 100755 => 100644
 create mode 100644 conda/recipes/build.sh
 create mode 100644 src/expr/window.rs
 delete mode 100644 src/window_frame.rs
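The "Summary of changes" block above is git's diffstat: one line per file, a `|`, the total number of changed lines, and a `+`/`-` bar. A minimal sketch of parsing such lines (the helper name is hypothetical, not part of any email tooling):

```python
import re

def parse_diffstat_line(line: str):
    """Parse one diffstat line like ' src/expr.rs | 13 ++' into
    (path, changed_line_count). Returns None for non-diffstat lines."""
    m = re.match(r"^\s*(\S.*?)\s*\|\s*(\d+)", line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

stat = parse_diffstat_line(" src/expr/window.rs | 294 +")
# -> ("src/expr/window.rs", 294)
```

Rename entries such as `docs/build.sh => conda/recipes/bld.bat` parse the same way; the whole `old => new` span lands in the path group.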
[arrow-datafusion-python] branch upgrade-to-support-311 updated (91e8f3e -> 270bb89)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

    from 91e8f3e  format with black
     add 270bb89  upgrade

No new revisions were added by this update.

Summary of changes:
 requirements.txt | 308 ---
 1 file changed, 155 insertions(+), 153 deletions(-)
[arrow-datafusion-python] branch upgrade-to-support-311 updated (1a18099 -> 91e8f3e)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

    from 1a18099  upgrade to support python 3.11
     add 91e8f3e  format with black

No new revisions were added by this update.

Summary of changes:
 benchmarks/db-benchmark/groupby-datafusion.py | 16 ++
 benchmarks/db-benchmark/join-datafusion.py    | 24 ++--
 benchmarks/tpch/tpch.py                       |  4 +-
 datafusion/input/base.py                      |  8 +--
 datafusion/input/location.py                  |  4 +-
 datafusion/tests/test_aggregation.py          | 34 +++
 datafusion/tests/test_context.py              |  8 +--
 datafusion/tests/test_dataframe.py            |  8 +--
 datafusion/tests/test_functions.py            | 81 ++-
 datafusion/tests/test_input.py                |  4 +-
 datafusion/tests/test_substrait.py            |  4 +-
 dev/release/check-rat-report.py               |  4 +-
 dev/release/generate-changelog.py             |  8 +--
 examples/sql-on-polars.py                     |  4 +-
 examples/sql-using-python-udaf.py             |  8 +--
 examples/substrait.py                         |  8 +--
 16 files changed, 52 insertions(+), 175 deletions(-)
[arrow-datafusion-python] branch upgrade-to-support-311 updated (61cfee5 -> 1a18099)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

    omit 61cfee5  upgrade to support python 3.11
     add 3711c73  Add Python script for generating changelog (#383)
     add 82b4a95  Update for DataFusion 25.0.0 (#386)
     add d912db5  Add Expr::Case when_then_else support to rex_call_operands function (#388)
     add 5664a1e  Introduce BaseSessionContext abstract class (#390)
     add 931cabc  CRUD Schema support for `BaseSessionContext` (#392)
     add 51158bd  CRUD Table support for `BaseSessionContext` (#394)
     add 1174969  Prepare for 26.0.0 release (#410)
     add c0be61b  LogicalPlan.to_variant() make public (#412)
     add 3f81513  Prepare 27.0.0 release (#423)
     add 58bdbd8  File based input utils (#433)
     add 5793db3  Upgrade to 28.0.0-rc1 (#434)
     add 93f8063  Introduces utility for obtaining SqlTable information from a file like location (#398)
     add 309fc48  feat: expose offset in python API (#437)
     add ffd1541  Use DataFusion 28 (#439)
     add 92ca34b  Build Linux aarch64 wheel (#443)
     add 1fde8e4  feat: add case function (#447) (#448)
     add e34d203  enhancement(docs): Add user guide (#432) (#445)
     add 37c91f4  docs: include pre-commit hooks section in contributor guide (#455)
     add e1b3740  feat: add compression options (#456)
     add 0b22c97  Upgrade to DF 28.0.0-rc1 (#457)
     add 217ede8  feat: add register_json (#458)
     add 499f045  feat: add basic compression configuration to write_parquet (#459)
     add 9c643bf  feat: add example of reading parquet from s3 (#460)
     add e24dc75  feat: add register_avro and read_table (#461)
     add bc62aaf  feat: add missing scalar math functions (#465)
     add 944b1c9  build(deps): bump arduino/setup-protoc from 1 to 2 (#452)
     add b4d383b  Revert "build(deps): bump arduino/setup-protoc from 1 to 2 (#452)" (#474)
     add 0d7c19e  Minor: fix wrongly copied function description (#497)
     add af4f758  Upgrade to Datafusion 31.0.0 (#491)
     add beabf26  Add `isnan` and `iszero` (#495)
     add a47712e  Update CHANGELOG and run cargo update (#500)
     add 41d65d1  Improve release process documentation (#505)
     add 106786a  add Binary String Functions (#494)
     add c574d68  build(deps): bump mimalloc from 0.1.38 to 0.1.39 (#502)
     add 31241f8  build(deps): bump syn from 2.0.32 to 2.0.35 (#503)
     add 9ef0a57  build(deps): bump syn from 2.0.35 to 2.0.37 (#506)
     add 8e430ab  Use latest DataFusion (#511)
     add 4c7b14c  add bit_and,bit_or,bit_xor,bool_add,bool_or (#496)
     add 804d0eb  Use DataFusion 32 (#515)
     add a91188c  add first_value last_value (#498)
     add 484ed11  build(deps): bump regex-syntax from 0.7.5 to 0.8.1 (#517)
     add c4675b7  build(deps): bump pyo3-build-config from 0.19.2 to 0.20.0 (#516)
     add 5ec45dd  add regr_* functions (#499)
     add 399fa75  feat: expose PyWindowFrame (#509)
     add c2768d8  Add random missing bindings (#522)
     add 59140f2  build(deps): bump rustix from 0.38.18 to 0.38.19 (#523)
     add 1a18099  upgrade to support python 3.11

This update added new revisions after undoing existing revisions.  That is
to say, some revisions that were in the old version of the branch are not in
the new version.  This situation occurs when a user --force pushes a change
and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (61cfee5)
            \
             N -- N -- N   refs/heads/upgrade-to-support-311 (1a18099)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions from the
common base, B.

Any revisions marked "omit" are not gone; other references still refer to
them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build.yml                     |   30 +
 .gitignore                                      |    8 +-
 CHANGELOG.md                                    |   93 +-
 Cargo.lock                                      | 2247 +++-
 Cargo.toml                                      |   24 +-
 README.md                                       |    6 +-
 datafusion/__init__.py                          |    2 +
 datafusion/context.py                           |  142 ++
 datafusion/cudf.py                              |   54 +-
 .../rust_fmt.sh => datafusion/input/__init__.py |    9 +-
 datafusion/{tests/conftest.py => input/base.py} |   49 +-
 datafusion/input/location.py                    |   87 +
 datafusion/pandas.py                            |   44 +-
 datafusion/polars.py                            |   36 +-
 datafusion/tests/test_aggregation.py            |   70
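The ASCII diagram in the force-push notice above describes two lines of history sharing a common base B: the old tip (61cfee5) reaches the O commits, the new tip (1a18099) reaches the N commits, and only the N commits get follow-up emails. A minimal sketch of that set computation on a toy parent map (all commit names here are hypothetical, not the real hashes):

```python
def ancestors(commit, parents):
    """All commits reachable from `commit`, inclusive, in a child->parents map."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        stack.extend(parents.get(c, []))
    return seen

# Toy history: B is the common base, O1/O2 the old branch line, N1/N2 the new one.
parents = {"B": [], "O1": ["B"], "O2": ["O1"], "N1": ["B"], "N2": ["N1"]}
old_tip, new_tip = "O2", "N2"

# Commits reachable from the new tip but not the old one -- the "N revisions
# from the common base, B" (what `git log old_tip..new_tip` would list).
new_only = ancestors(new_tip, parents) - ancestors(old_tip, parents)
# -> {"N1", "N2"}
```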
[arrow-datafusion] branch Jimexist-patch-1 created (now d8ce32ee1d)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch Jimexist-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

      at d8ce32ee1d Create codeql.yml

This branch includes the following new commits:

     new d8ce32ee1d Create codeql.yml

The 1 revisions listed above as "new" are entirely new to this repository and
will be described in separate emails.  The revisions listed as "add" were
already present in the repository and have only been added to this reference.
[arrow-datafusion] 01/01: Create codeql.yml
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch Jimexist-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

commit d8ce32ee1d7274460c3a0e3866d5816272bb24f5
Author: Jiayu Liu
AuthorDate: Sat Sep 16 22:40:10 2023 +0800

    Create codeql.yml
---
 .github/workflows/codeql.yml | 82
 1 file changed, 82 insertions(+)

diff --git a/.github/workflows/codeql.yml b/.github/workflows/codeql.yml
new file mode 100644
index 00..31bc0a5810
--- /dev/null
+++ b/.github/workflows/codeql.yml
@@ -0,0 +1,82 @@
+# For most projects, this workflow file will not need changing; you simply need
+# to commit it to your repository.
+#
+# You may wish to alter this file to override the set of languages analyzed,
+# or to provide custom queries or build logic.
+#
+# NOTE
+# We have attempted to detect the languages in your repository. Please check
+# the `language` matrix defined below to confirm you have the correct set of
+# supported CodeQL languages.
+#
+name: "CodeQL"
+
+on:
+  push:
+    branches: [ "main" ]
+  pull_request:
+    # The branches below must be a subset of the branches above
+    branches: [ "main" ]
+  schedule:
+    - cron: '19 10 * * 6'
+
+jobs:
+  analyze:
+    name: Analyze
+    # Runner size impacts CodeQL analysis time. To learn more, please see:
+    #   - https://gh.io/recommended-hardware-resources-for-running-codeql
+    #   - https://gh.io/supported-runners-and-hardware-resources
+    #   - https://gh.io/using-larger-runners
+    # Consider using larger runners for possible analysis time improvements.
+    runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
+    timeout-minutes: ${{ (matrix.language == 'swift' && 120) || 360 }}
+    permissions:
+      actions: read
+      contents: read
+      security-events: write
+
+    strategy:
+      fail-fast: false
+      matrix:
+        language: [ 'python' ]
+        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby', 'swift' ]
+        # Use only 'java' to analyze code written in Java, Kotlin or both
+        # Use only 'javascript' to analyze code written in JavaScript, TypeScript or both
+        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support
+
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v3
+
+    # Initializes the CodeQL tools for scanning.
+    - name: Initialize CodeQL
+      uses: github/codeql-action/init@v2
+      with:
+        languages: ${{ matrix.language }}
+        # If you wish to specify custom queries, you can do so here or in a config file.
+        # By default, queries listed here will override any specified in a config file.
+        # Prefix the list here with "+" to use these queries and those in the config file.
+
+        # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
+        # queries: security-extended,security-and-quality
+
+
+    # Autobuild attempts to build any compiled languages (C/C++, C#, Go, Java, or Swift).
+    # If this step fails, then you should remove it and run the build manually (see below)
+    - name: Autobuild
+      uses: github/codeql-action/autobuild@v2
+
+    # ℹ️ Command-line programs to run using the OS shell.
+    # See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
+
+    # If the Autobuild fails above, remove it and uncomment the following three lines.
+    # modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.
+
+    # - run: |
+    #     echo "Run, Build Application using script"
+    #     ./location_of_script_within_repo/buildscript.sh
+
+    - name: Perform CodeQL Analysis
+      uses: github/codeql-action/analyze@v2
+      with:
+        category: "/language:${{matrix.language}}"
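The workflow's `schedule` trigger uses the five-field cron expression `'19 10 * * 6'` (minute, hour, day-of-month, month, day-of-week), i.e. 10:19 UTC on Saturdays. A small sketch that splits such an expression into named fields (the helper name is hypothetical):

```python
def parse_cron(expr: str) -> dict:
    """Split a 5-field cron expression into its named fields."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields")
    names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    return dict(zip(names, fields))

schedule = parse_cron("19 10 * * 6")
# schedule["day_of_week"] == "6", i.e. Saturday (0 and 7 both mean Sunday)
```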
[arrow-datafusion] branch add-greatest-least updated (8f6812626c -> 15bc806f0f)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

 discard 8f6812626c  fix coerce tests
 discard 6a4a04dbb0  coerce rules
 discard f8996c29fe  add more unit tests
 discard 3ff0dd47c5  fix issue
 discard 74fb1fa8ad  [built-in function] add greatest and least
     add 5ddcbc42c1  Resolve contradictory requirements by conversion of ordering sensitive aggregators (#6482)
     add 815413c4a4  fix: ignore panics if racing against catalog/schema changes (#6536)
     add 15bc806f0f  [built-in function] add greatest and least

This update added new revisions after undoing existing revisions.  That is
to say, some revisions that were in the old version of the branch are not in
the new version.  This situation occurs when a user --force pushes a change
and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (8f6812626c)
            \
             N -- N -- N   refs/heads/add-greatest-least (15bc806f0f)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions from the
common base, B.

Any revisions marked "omit" are not gone; other references still refer to
them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 datafusion/core/src/catalog/information_schema.rs  |  71 +++--
 datafusion/core/src/execution/context.rs           |   5 +-
 .../core/src/physical_plan/aggregates/mod.rs       | 338 +++--
 .../tests/sqllogictests/test_files/aggregate.slt   |   4 +-
 .../tests/sqllogictests/test_files/explain.slt     |   2 +-
 .../tests/sqllogictests/test_files/groupby.slt     | 205 +
 .../physical-expr/src/aggregate/first_last.rs      |  14 +-
 datafusion/physical-expr/src/aggregate/mod.rs      |  11 +
 datafusion/physical-expr/src/lib.rs                |   6 +-
 datafusion/physical-expr/src/sort_expr.rs          |   5 +-
 datafusion/physical-expr/src/type_coercion.rs      |   5 +-
 datafusion/physical-expr/src/utils.rs              |  38 +++
 datafusion/physical-expr/src/window/aggregate.rs   |   6 +-
 datafusion/physical-expr/src/window/built_in.rs    |   4 +-
 .../physical-expr/src/window/sliding_aggregate.rs  |   6 +-
 datafusion/physical-expr/src/window/window_expr.rs |  13 -
 16 files changed, 575 insertions(+), 158 deletions(-)
[arrow-datafusion] branch add-greatest-least updated (6a4a04dbb0 -> 8f6812626c)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

    from 6a4a04dbb0  coerce rules
     add 8f6812626c  fix coerce tests

No new revisions were added by this update.

Summary of changes:
 datafusion/physical-expr/src/type_coercion.rs | 8
 1 file changed, 4 insertions(+), 4 deletions(-)
[arrow-datafusion] branch add-greatest-least updated (f8996c29fe -> 6a4a04dbb0)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

    from f8996c29fe  add more unit tests
     add 6a4a04dbb0  coerce rules

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 52 +-
 datafusion/physical-expr/src/type_coercion.rs  |  2 +-
 2 files changed, 36 insertions(+), 18 deletions(-)
[arrow-datafusion] branch add-greatest-least updated (3ff0dd47c5 -> f8996c29fe)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

    from 3ff0dd47c5  fix issue
     add f8996c29fe  add more unit tests

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)
[arrow-datafusion] branch add-greatest-least updated (74fb1fa8ad -> 3ff0dd47c5)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

    from 74fb1fa8ad  [built-in function] add greatest and least
     add 3ff0dd47c5  fix issue

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[arrow-datafusion] branch add-greatest-least updated (1b4e6c862d -> 74fb1fa8ad)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

 discard 1b4e6c862d  fi
 discard 8a300c6d67  [built-in function] add greatest and least
     add 74fb1fa8ad  [built-in function] add greatest and least

This update added new revisions after undoing existing revisions.  That is
to say, some revisions that were in the old version of the branch are not in
the new version.  This situation occurs when a user --force pushes a change
and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (1b4e6c862d)
            \
             N -- N -- N   refs/heads/add-greatest-least (74fb1fa8ad)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions from the
common base, B.

Any revisions marked "omit" are not gone; other references still refer to
them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 3 +++
 1 file changed, 3 insertions(+)
[arrow-datafusion] branch add-greatest-least updated (8a300c6d67 -> 1b4e6c862d)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

    from 8a300c6d67  [built-in function] add greatest and least
     add 1b4e6c862d  fi

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 45 --
 1 file changed, 36 insertions(+), 9 deletions(-)
[arrow-datafusion] branch add-greatest-least updated (011d02afe0 -> 8a300c6d67)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

 discard 011d02afe0  Update datafusion/expr/src/type_coercion/functions.rs
 discard 30b575a4df  [built-in function] add greatest and least
     add 8a300c6d67  [built-in function] add greatest and least

This update added new revisions after undoing existing revisions.  That is
to say, some revisions that were in the old version of the branch are not in
the new version.  This situation occurs when a user --force pushes a change
and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (011d02afe0)
            \
             N -- N -- N   refs/heads/add-greatest-least (8a300c6d67)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions from the
common base, B.

Any revisions marked "omit" are not gone; other references still refer to
them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 10 ++
 1 file changed, 10 insertions(+)
[arrow-datafusion] branch add-greatest-least updated (30b575a4df -> 011d02afe0)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

    from 30b575a4df  [built-in function] add greatest and least
     add 011d02afe0  Update datafusion/expr/src/type_coercion/functions.rs

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[arrow-datafusion] branch add-greatest-least updated (85bda832ed -> 30b575a4df)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

 discard 85bda832ed  fix unit test
 discard e6de9eb51f  simplify code
 discard fe107096f6  variadic equal
 discard 538e085f42  [built-in function] add greatest and least
     add 30b575a4df  [built-in function] add greatest and least

This update added new revisions after undoing existing revisions.  That is
to say, some revisions that were in the old version of the branch are not in
the new version.  This situation occurs when a user --force pushes a change
and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (85bda832ed)
            \
             N -- N -- N   refs/heads/add-greatest-least (30b575a4df)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions from the
common base, B.

Any revisions marked "omit" are not gone; other references still refer to
them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/type_coercion/functions.rs | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)
[arrow-datafusion] branch add-greatest-least updated (e6de9eb51f -> 85bda832ed)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

    from e6de9eb51f  simplify code
     add 85bda832ed  fix unit test

No new revisions were added by this update.

Summary of changes:
 datafusion/physical-expr/src/type_coercion.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
[arrow-datafusion] branch add-greatest-least updated (538e085f42 -> e6de9eb51f)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

    from 538e085f42  [built-in function] add greatest and least
     add fe107096f6  variadic equal
     add e6de9eb51f  simplify code

No new revisions were added by this update.

Summary of changes:
 Cargo.toml                                       |  2 +-
 datafusion/expr/src/function.rs                  |  9 ---
 datafusion/expr/src/function_err.rs              |  2 +-
 datafusion/expr/src/signature.rs                 | 11
 datafusion/expr/src/type_coercion/functions.rs   | 19 +-
 datafusion/physical-expr/Cargo.toml              |  2 +-
 .../physical-expr/src/comparison_expressions.rs  | 30 ++
 7 files changed, 47 insertions(+), 28 deletions(-)
[arrow-datafusion] branch add-greatest-least updated (4a2c33829e -> 538e085f42)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

 discard 4a2c33829e  fix unit test
 discard a567b85f86  fix clippy
 discard b6f9f35b96  [built-in function] add greatest and least
     add 3466522779  Minor: Clean up `use`s to point at real crates (#6515)
     add 21a14a1af3  Standardize RUST_LOG configuration test setup (#6506)
     add e6af36a540  Fix new clippy lint (#6535)
     add d9d06a4433  feat: datafusion-cli support executes sql with escaped characters (#6498)
     add 5ec14e1757  Minor: Add EXCEPT/EXCLUDE to SQL guide (#6512)
     add 859251b4a2  fix: error instead of panic when date_bin interval is 0 (#6522)
     add d450dc1dca  Add link to Python Bindings (#6532)
     add 9d22054a1f  feat: fix docs (#6534)
     add 538e085f42  [built-in function] add greatest and least

This update added new revisions after undoing existing revisions.  That is
to say, some revisions that were in the old version of the branch are not in
the new version.  This situation occurs when a user --force pushes a change
and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (4a2c33829e)
            \
             N -- N -- N   refs/heads/add-greatest-least (538e085f42)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions from the
common base, B.

Any revisions marked "omit" are not gone; other references still refer to
them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 Cargo.toml                                         |   2 +
 README.md                                          |   2 +-
 datafusion-cli/Cargo.lock                          |  12 +-
 datafusion-cli/Cargo.toml                          |   2 +-
 datafusion-cli/src/exec.rs                         |  26 ++--
 datafusion-cli/src/helper.rs                       | 160 ++---
 datafusion/core/src/catalog/information_schema.rs  |   2 +-
 datafusion/core/src/catalog/listing_schema.rs      |  16 ++-
 datafusion/core/src/execution/mod.rs               |   8 +-
 .../aggregates/bounded_aggregate_stream.rs         |  12 +-
 .../core/src/physical_plan/aggregates/mod.rs       |   7 +-
 .../src/physical_plan/aggregates/no_grouping.rs    |   4 +-
 .../core/src/physical_plan/aggregates/row_hash.rs  |   9 +-
 datafusion/core/src/physical_plan/analyze.rs       |   2 +-
 .../core/src/physical_plan/coalesce_batches.rs     |   2 +-
 .../core/src/physical_plan/coalesce_partitions.rs  |   2 +-
 datafusion/core/src/physical_plan/common.rs        |   2 +-
 datafusion/core/src/physical_plan/empty.rs         |   2 +-
 datafusion/core/src/physical_plan/explain.rs       |   2 +-
 .../core/src/physical_plan/file_format/avro.rs     |   2 +-
 .../core/src/physical_plan/file_format/csv.rs      |   2 +-
 .../core/src/physical_plan/file_format/json.rs     |   2 +-
 .../core/src/physical_plan/file_format/mod.rs      |   2 +-
 .../core/src/physical_plan/file_format/parquet.rs  |   2 +-
 datafusion/core/src/physical_plan/filter.rs        |   2 +-
 datafusion/core/src/physical_plan/insert.rs        |   2 +-
 .../core/src/physical_plan/joins/cross_join.rs     |   6 +-
 .../src/physical_plan/joins/nested_loop_join.rs    |   4 +-
 .../src/physical_plan/joins/sort_merge_join.rs     |   6 +-
 .../src/physical_plan/joins/symmetric_hash_join.rs |   2 +-
 datafusion/core/src/physical_plan/limit.rs         |   2 +-
 datafusion/core/src/physical_plan/memory.rs        |   2 +-
 datafusion/core/src/physical_plan/mod.rs           |   2 +-
 datafusion/core/src/physical_plan/planner.rs       |   6 +-
 datafusion/core/src/physical_plan/projection.rs    |   2 +-
 .../core/src/physical_plan/repartition/mod.rs      |   4 +-
 datafusion/core/src/physical_plan/sorts/sort.rs    |  12 +-
 .../physical_plan/sorts/sort_preserving_merge.rs   |   2 +-
 datafusion/core/src/physical_plan/streaming.rs     |   2 +-
 datafusion/core/src/physical_plan/union.rs         |   2 +-
 datafusion/core/src/physical_plan/unnest.rs        |   2 +-
 datafusion/core/src/physical_plan/values.rs        |   2 +-
 .../windows/bounded_window_agg_exec.rs             |   2 +-
 .../src/physical_plan/windows/window_agg_exec.rs   |   4 +-
 datafusion/core/tests/memory_limit.rs              |   1 +
 datafusion/core/tests/parquet/filter_pushdown.rs   |   7 -
 datafusion/core/tests/parquet/mod.rs               |   7 +
 datafusion/core/tests/sql/expr.rs                  |  27
 datafusion/core/tests/sql/subqueries.rs            |   6 -
 datafusion/core/tests/sql_integration.rs           |   7 +
 .../tests/sqllogictests/test_files/timestamps.slt  |  21 ++-
 datafusion/execution/src/lib.rs                    |   2 +
 datafusion/expr/src/conditional_expressions.rs     |   2 +-
 datafusion/ex
[arrow-rs] 01/01: add min and max kernel
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch add-min-max-kernel
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit 0ee26ec52d2831168404f492c3ac20e04044fa63
Author: Jiayu Liu
AuthorDate: Sat Jun 3 09:22:34 2023 +0800

    add min and max kernel
---
 arrow-ord/src/lib.rs     |  1 +
 arrow-ord/src/min_max.rs | 48
 2 files changed, 49 insertions(+)

diff --git a/arrow-ord/src/lib.rs b/arrow-ord/src/lib.rs
index 62338c022..e1eec2c3c 100644
--- a/arrow-ord/src/lib.rs
+++ b/arrow-ord/src/lib.rs
@@ -44,6 +44,7 @@
 //!
 pub mod comparison;
+pub mod min_max;
 pub mod ord;
 pub mod partition;
 pub mod sort;
diff --git a/arrow-ord/src/min_max.rs b/arrow-ord/src/min_max.rs
new file mode 100644
index 0..1a34e9ddf
--- /dev/null
+++ b/arrow-ord/src/min_max.rs
@@ -0,0 +1,48 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Functions to get min and max across arrays and scalars
+
+use arrow_array::{Array, ArrayRef};
+use arrow_schema::ArrowError;
+
+/// Perform min operation on two dynamic [`Array`]s.
+///
+/// The comparison is valid only when the two arrays are of the same type.
+pub fn min_dyn(left: &dyn Array, right: &dyn Array) -> Result<ArrayRef, ArrowError> {
+    unimplemented!()
+}
+
+/// Perform max operation on two dynamic [`Array`]s.
+///
+/// The comparison is valid only when the two arrays are of the same type.
+pub fn max_dyn(left: &dyn Array, right: &dyn Array) -> Result<ArrayRef, ArrowError> {
+    unimplemented!()
+}
+
+/// Perform min operation on a dynamic [`Array`] and a scalar value.
+pub fn min_dyn_scalar<T>(left: &dyn Array, right: T) -> Result<ArrayRef, ArrowError>
+where
+    T: num::ToPrimitive + std::fmt::Debug,
+{
+    unimplemented!()
+}
+
+/// Perform max operation on a dynamic [`Array`] and a scalar value.
+pub fn max_dyn_scalar<T>(left: &dyn Array, right: T) -> Result<ArrayRef, ArrowError>
+where
+    T: num::ToPrimitive + std::fmt::Debug,
+{
+    unimplemented!()
+}
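The stubbed `min_dyn`/`max_dyn` kernels above compare two arrays element-wise. A Python sketch of the intended semantics as described by the doc comments (my reading, not the arrow-rs implementation), treating `None` as a null that propagates to the output:

```python
def min_elementwise(left, right):
    """Element-wise min of two equal-length sequences; None (null) propagates."""
    if len(left) != len(right):
        raise ValueError("arrays must have the same length")
    return [None if a is None or b is None else min(a, b)
            for a, b in zip(left, right)]

def max_elementwise(left, right):
    """Element-wise max; same null-propagation rule as above."""
    if len(left) != len(right):
        raise ValueError("arrays must have the same length")
    return [None if a is None or b is None else max(a, b)
            for a, b in zip(left, right)]

out = min_elementwise([1, 5, None], [2, 3, 4])
# -> [1, 3, None]
```

The scalar variants (`min_dyn_scalar`/`max_dyn_scalar`) would behave the same way with the scalar broadcast against every array element.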
[arrow-rs] branch add-min-max-kernel created (now 0ee26ec52)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-min-max-kernel
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

      at 0ee26ec52 add min and max kernel

This branch includes the following new commits:

     new 0ee26ec52 add min and max kernel

The 1 revisions listed above as "new" are entirely new to this repository and
will be described in separate emails.  The revisions listed as "add" were
already present in the repository and have only been added to this reference.
[arrow-datafusion] branch add-greatest-least updated (a567b85f86 -> 4a2c33829e)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

    from a567b85f86  fix clippy
     add 4a2c33829e  fix unit test

No new revisions were added by this update.

Summary of changes:
 datafusion/core/tests/sql/expr.rs | 2 ++
 1 file changed, 2 insertions(+)
[arrow-datafusion] branch add-greatest-least updated (b6f9f35b96 -> a567b85f86)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

    from b6f9f35b96  [built-in function] add greatest and least
     add a567b85f86  fix clippy

No new revisions were added by this update.

Summary of changes:
 datafusion/physical-expr/src/comparison_expressions.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
[arrow-datafusion] branch add-greatest-least updated (5214393d1f -> b6f9f35b96)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

    omit 5214393d1f  [built-in function] add greatest and least
     add b6f9f35b96  [built-in function] add greatest and least

This update added new revisions after undoing existing revisions.  That is
to say, some revisions that were in the old version of the branch are not in
the new version.  This situation occurs when a user --force pushes a change
and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (5214393d1f)
            \
             N -- N -- N   refs/heads/add-greatest-least (b6f9f35b96)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions from the
common base, B.

Any revisions marked "omit" are not gone; other references still refer to
them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 datafusion/expr/src/function.rs                  |     9 +-
 .../physical-expr/src/comparison_expressions.rs  |   175 +-
 datafusion/proto/proto/proto_descriptor.bin      |   Bin 85877 -> 0 bytes
 datafusion/proto/src/datafusion.rs               |  2820 ---
 datafusion/proto/src/datafusion.serde.rs         | 22781 ---
 datafusion/proto/src/logical_plan/from_proto.rs  |     6 +-
 6 files changed, 109 insertions(+), 25682 deletions(-)
 delete mode 100644 datafusion/proto/proto/proto_descriptor.bin
 delete mode 100644 datafusion/proto/src/datafusion.rs
 delete mode 100644 datafusion/proto/src/datafusion.serde.rs
[arrow-datafusion] 01/01: [built-in function] add greatest and least
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

commit 5214393d1fbf80fc7438fada6ff6563b949aeaf4
Author: Jiayu Liu
AuthorDate: Fri Jun 2 15:52:38 2023 +0800

    [built-in function] add greatest and least
---
 datafusion/core/tests/sql/expr.rs                |   6 +
 datafusion/expr/src/built_in_function.rs         |  10 ++
 datafusion/expr/src/comparison_expressions.rs    |  35 ++
 datafusion/expr/src/expr_fn.rs                   |  24 +++-
 datafusion/expr/src/function.rs                  |  15 ++-
 datafusion/expr/src/lib.rs                       |   1 +
 .../physical-expr/src/comparison_expressions.rs  | 133 +
 datafusion/physical-expr/src/functions.rs        |   5 +-
 datafusion/physical-expr/src/lib.rs              |   1 +
 datafusion/proto/proto/datafusion.proto          |   2 +
 datafusion/proto/proto/proto_descriptor.bin      | Bin 0 -> 85877 bytes
 .../src/{generated/prost.rs => datafusion.rs}    |   6 +
 .../{generated/pbjson.rs => datafusion.serde.rs} |   6 +
 datafusion/proto/src/generated/pbjson.rs         |   6 +
 datafusion/proto/src/generated/prost.rs          |   6 +
 datafusion/proto/src/logical_plan/from_proto.rs  |  17 ++-
 datafusion/proto/src/logical_plan/to_proto.rs    |   2 +
 docs/source/user-guide/sql/sql_status.md         |   3 +
 18 files changed, 273 insertions(+), 5 deletions(-)

diff --git a/datafusion/core/tests/sql/expr.rs b/datafusion/core/tests/sql/expr.rs
index 6783670545..c432f62572 100644
--- a/datafusion/core/tests/sql/expr.rs
+++ b/datafusion/core/tests/sql/expr.rs
@@ -200,6 +200,12 @@ async fn binary_bitwise_shift() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn test_comparison_func_expressions() -> Result<()> {
+    test_expression!("greatest(1,2,3)", "3");
+    test_expression!("least(1,2,3)", "1");
+}
+
 #[tokio::test]
 async fn test_interval_expressions() -> Result<()> {
     // day nano intervals

diff --git a/datafusion/expr/src/built_in_function.rs b/datafusion/expr/src/built_in_function.rs
index 3911939b4c..d4ca93ba24 100644
--- a/datafusion/expr/src/built_in_function.rs
+++ b/datafusion/expr/src/built_in_function.rs
@@ -205,6 +205,10 @@ pub enum BuiltinScalarFunction {
     Struct,
     /// arrow_typeof
     ArrowTypeof,
+    /// greatest
+    Greatest,
+    /// least
+    Least,
 }
 
 lazy_static! {
@@ -328,6 +332,8 @@ impl BuiltinScalarFunction {
             BuiltinScalarFunction::Struct => Volatility::Immutable,
             BuiltinScalarFunction::FromUnixtime => Volatility::Immutable,
             BuiltinScalarFunction::ArrowTypeof => Volatility::Immutable,
+            BuiltinScalarFunction::Greatest => Volatility::Immutable,
+            BuiltinScalarFunction::Least => Volatility::Immutable,
 
             // Stable builtin functions
             BuiltinScalarFunction::Now => Volatility::Stable,
@@ -414,6 +420,10 @@ fn aliases(func: ) -> &'static [&'static str] {
         BuiltinScalarFunction::Upper => &["upper"],
         BuiltinScalarFunction::Uuid => &["uuid"],
 
+        // comparison functions
+        BuiltinScalarFunction::Greatest => &["greatest"],
+        BuiltinScalarFunction::Least => &["least"],
+
         // regex functions
         BuiltinScalarFunction::RegexpMatch => &["regexp_match"],
         BuiltinScalarFunction::RegexpReplace => &["regexp_replace"],

diff --git a/datafusion/expr/src/comparison_expressions.rs b/datafusion/expr/src/comparison_expressions.rs
new file mode 100644
index 00..c7f13f04f0
--- /dev/null
+++ b/datafusion/expr/src/comparison_expressions.rs
@@ -0,0 +1,35 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::datatypes::DataType;
+
+/// Currently supported types by the comparison function.
+pub static SUPPORTED_COMPARISON_TYPES: &[DataType] = &[
+    DataType::Boolean,
+    DataT
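The commit above registers `greatest` and `least` as built-in scalar functions; its test expects `greatest(1,2,3)` to evaluate to `3` and `least(1,2,3)` to `1`. A rough Python sketch of the per-row semantics, with `None` standing in for SQL NULL (the skip-NULLs behaviour shown here is an assumption for illustration, not taken from the patch):

```python
from typing import Optional

def greatest(*args: Optional[int]) -> Optional[int]:
    """Row-wise maximum of the arguments; None models SQL NULL.
    Assumption: NULL arguments are skipped, and all-NULL yields NULL."""
    present = [a for a in args if a is not None]
    return max(present) if present else None

def least(*args: Optional[int]) -> Optional[int]:
    """Row-wise minimum, with the same NULL convention as greatest()."""
    present = [a for a in args if a is not None]
    return min(present) if present else None
```

In the patch itself the same pairwise comparison is implemented over Arrow arrays in `datafusion/physical-expr/src/comparison_expressions.rs`, restricted to the types listed in `SUPPORTED_COMPARISON_TYPES`.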
[arrow-datafusion] branch add-greatest-least created (now 5214393d1f)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch add-greatest-least
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

      at 5214393d1f [built-in function] add greatest and least

This branch includes the following new commits:

     new 5214393d1f [built-in function] add greatest and least

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
[arrow-datafusion-python] branch upgrade-to-support-311 updated (fcbc976 -> 61cfee5)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

 discard fcbc976 python -m pip install --require-hashes --no-deps -r requirements.txt
 discard 9a57d38 update pip before install
 discard 242f25d remove unused 311
 discard 4ab1763 remove empty line
 discard 36c4eb7 upgrade to support python 3.11
     add 61cfee5 upgrade to support python 3.11

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (fcbc976)
            \
             N -- N -- N   refs/heads/upgrade-to-support-311 (61cfee5)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .github/workflows/test.yaml          |  2 +
 datafusion/__init__.py               |  4 +-
 datafusion/cudf.py                   |  4 +-
 datafusion/pandas.py                 |  4 +-
 datafusion/polars.py                 | 12 ++
 datafusion/tests/generic.py          | 12 ++
 datafusion/tests/test_aggregation.py | 32 ---
 datafusion/tests/test_config.py      |  5 +--
 datafusion/tests/test_context.py     |  4 +-
 datafusion/tests/test_dataframe.py   | 36 +
 datafusion/tests/test_functions.py   | 77
 datafusion/tests/test_sql.py         | 28 -
 datafusion/tests/test_substrait.py   | 12 ++
 pyproject.toml                       |  6 +++
 requirements.in                      |  1 +
 15 files changed, 64 insertions(+), 175 deletions(-)
[arrow-datafusion-python] branch upgrade-to-support-311 updated (9a57d38 -> fcbc976)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

    from 9a57d38 update pip before install
     add fcbc976 python -m pip install --require-hashes --no-deps -r requirements.txt

No new revisions were added by this update.

Summary of changes:
 .github/workflows/docs.yaml             | 6 +++---
 .github/workflows/test.yaml             | 4 ++--
 README.md                               | 2 +-
 dev/release/verify-release-candidate.sh | 2 +-
 docs/source/index.rst                   | 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)
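The `--require-hashes --no-deps` invocation adopted above pairs with the `pip-compile --generate-hashes` lockfiles seen elsewhere in this thread: every pinned archive must match a recorded hash, and pip is barred from resolving unpinned transitive dependencies on its own. A small sanity-check sketch (flag names only; the lockfile contents are not recreated here):

```python
import subprocess
import sys

# Reference workflow, as used in the commits above:
#   pip-compile --generate-hashes --output-file=requirements.txt requirements.in
#   python -m pip install --require-hashes --no-deps -r requirements.txt

def pip_supports_hash_checking() -> bool:
    # Confirm the local pip actually exposes both flags before relying on them.
    help_text = subprocess.run(
        [sys.executable, "-m", "pip", "install", "--help"],
        check=True, capture_output=True, text=True,
    ).stdout
    return "--require-hashes" in help_text and "--no-deps" in help_text
```

With hash-checking on, a single dependency whose published wheel changes will fail the install loudly instead of silently pulling different code into CI, which is the point of regenerating the lockfile in these commits.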
[arrow-datafusion-python] branch upgrade-to-support-311 updated (36c4eb7 -> 9a57d38)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

    from 36c4eb7 upgrade to support python 3.11
     add 4ab1763 remove empty line
     add 242f25d remove unused 311
     add 9a57d38 update pip before install

No new revisions were added by this update.

Summary of changes:
 .github/workflows/test.yaml        |   2 +
 datafusion/tests/test_dataframe.py |   1 -
 requirements-311.txt               | 199 -
 3 files changed, 2 insertions(+), 200 deletions(-)
 delete mode 100644 requirements-311.txt
[arrow-datafusion-python] 01/01: upgrade to support python 3.11
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

commit 36c4eb71b6fce5273a225a955395d5f2d164b595
Author: Jiayu Liu
AuthorDate: Wed May 10 21:31:59 2023 +0800

    upgrade to support python 3.11
---
 .github/workflows/build.yml             |   4 +-
 .github/workflows/conda.yml             |   2 +-
 .github/workflows/dev.yml               |   2 +-
 .github/workflows/docs.yaml             |   4 +-
 .github/workflows/test.yaml             |  24 ++-
 README.md                               |   6 +-
 dev/release/verify-release-candidate.sh |   2 +-
 docs/README.md                          |   2 +-
 docs/source/index.rst                   |  12 +-
 pyproject.toml                          |   3 +-
 requirements-310.txt                    | 249
 requirements.txt                        | 199 +
 12 files changed, 229 insertions(+), 280 deletions(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index fe06b9c..50f2f15 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -47,7 +47,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.10"]
+        python-version: ["3.11"]
         os: [macos-latest, windows-latest]
     steps:
       - uses: actions/checkout@v3
@@ -106,7 +106,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.10"]
+        python-version: ["3.11"]
     steps:
       - uses: actions/checkout@v3

diff --git a/.github/workflows/conda.yml b/.github/workflows/conda.yml
index 9853230..6fcc1f5 100644
--- a/.github/workflows/conda.yml
+++ b/.github/workflows/conda.yml
@@ -24,7 +24,7 @@ jobs:
         with:
           miniforge-variant: Mambaforge
           use-mamba: true
-          python-version: "3.10"
+          python-version: "3.11"
           channel-priority: strict
       - name: Install dependencies
         run: |

diff --git a/.github/workflows/dev.yml b/.github/workflows/dev.yml
index 05cf8ce..6457ff4 100644
--- a/.github/workflows/dev.yml
+++ b/.github/workflows/dev.yml
@@ -29,6 +29,6 @@ jobs:
       - name: Setup Python
         uses: actions/setup-python@v4
         with:
-          python-version: "3.10"
+          python-version: "3.11"
       - name: Audit licenses
         run: ./dev/release/run-rat.sh .

diff --git a/.github/workflows/docs.yaml b/.github/workflows/docs.yaml
index d9e7ad4..fb422ce 100644
--- a/.github/workflows/docs.yaml
+++ b/.github/workflows/docs.yaml
@@ -35,7 +35,7 @@ jobs:
       - name: Setup Python
         uses: actions/setup-python@v4
         with:
-          python-version: "3.10"
+          python-version: "3.11"

       - name: Install Protoc
         uses: arduino/setup-protoc@v1
@@ -48,7 +48,7 @@ jobs:
           set -x
           python3 -m venv venv
           source venv/bin/activate
-          pip install -r requirements-310.txt
+          pip install -r requirements.txt
           pip install -r docs/requirements.txt
       - name: Build Datafusion
         run: |

diff --git a/.github/workflows/test.yaml b/.github/workflows/test.yaml
index f672c81..8dd2b6a 100644
--- a/.github/workflows/test.yaml
+++ b/.github/workflows/test.yaml
@@ -33,15 +33,13 @@ jobs:
       fail-fast: false
       matrix:
         python-version:
+          - "3.7"
+          - "3.8"
+          - "3.9"
           - "3.10"
+          - "3.11"
         toolchain:
           - "stable"
-          # we are not that much eager in walking on the edge yet
-          # - nightly
-        # build stable for only 3.7
-        include:
-          - python-version: "3.7"
-            toolchain: "stable"

     steps:
       - uses: actions/checkout@v3
@@ -55,7 +53,7 @@ jobs:
       - name: Install Protoc
         uses: arduino/setup-protoc@v1
         with:
-          version: '3.x'
+          version: "3.x"
           repo-token: ${{ secrets.GITHUB_TOKEN }}

       - name: Setup Python
@@ -71,24 +69,24 @@ jobs:

       - name: Check Formatting
         uses: actions-rs/cargo@v1
-        if: ${{ matrix.python-version == '3.10' && matrix.toolchain == 'stable' }}
+        if: ${{ matrix.python-version == '3.11' && matrix.toolchain == 'stable' }}
         with:
           command: fmt
           args: -- --check

       - name: Run Clippy
         uses: actions-rs/cargo@v1
-        if: ${{ matrix.python-version == '3.10' && matrix.toolchain == 'stable' }}
+        if: ${{ matrix.python-version == '3.11' && matrix.toolchain == 'stable' }}
         with:
           command: clippy
           args: --all-targets --all-features
[arrow-datafusion-python] branch upgrade-to-support-311 created (now 36c4eb7)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch upgrade-to-support-311
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

      at 36c4eb7 upgrade to support python 3.11

This branch includes the following new commits:

     new 36c4eb7 upgrade to support python 3.11

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
[arrow-datafusion-python] branch update-maturin deleted (was 4c40868)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch update-maturin
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

     was 4c40868 migrate maturin meta

The revisions that were on this branch are still contained in other
references; therefore, this change does not discard any commits from
the repository.
[arrow-datafusion-python] branch main updated (5984bc7 -> 21ad90f)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

    from 5984bc7 build(deps): bump mimalloc from 0.1.36 to 0.1.37 (#361)
     add 21ad90f build(deps): bump regex-syntax from 0.6.29 to 0.7.1 (#334)

No new revisions were added by this update.

Summary of changes:
 Cargo.lock | 2 +-
 Cargo.toml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
[arrow-datafusion-python] branch main updated (228b6e5 -> 5984bc7)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

    from 228b6e5 build(deps): bump uuid from 1.3.1 to 1.3.2 (#359)
     add 5984bc7 build(deps): bump mimalloc from 0.1.36 to 0.1.37 (#361)

No new revisions were added by this update.

Summary of changes:
 Cargo.lock | 8
 1 file changed, 4 insertions(+), 4 deletions(-)
[arrow-datafusion-python] branch main updated (9c75d03 -> 228b6e5)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

    from 9c75d03 Prepare 24.0.0 Release (#376)
     add 228b6e5 build(deps): bump uuid from 1.3.1 to 1.3.2 (#359)

No new revisions were added by this update.

Summary of changes:
 Cargo.lock | 4 ++--
 Cargo.toml | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
[arrow-datafusion-python] branch update-maturin updated (d1a87a7 -> 4c40868)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch update-maturin
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

    from d1a87a7 upgrade maturin to 0.15.1
     add 4c40868 migrate maturin meta

No new revisions were added by this update.

Summary of changes:
 Cargo.toml     | 3 ---
 pyproject.toml | 1 +
 2 files changed, 1 insertion(+), 3 deletions(-)
[arrow-datafusion-python] 01/01: upgrade maturin to 0.15.1
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch update-maturin in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git commit d1a87a7da57f05055f4eb03cd268b81cf57c7f20 Author: Jiayu Liu AuthorDate: Wed May 10 08:43:32 2023 +0800 upgrade maturin to 0.15.1 --- .github/workflows/build.yml| 8 +- conda/environments/datafusion-dev.yaml | 48 ++--- conda/recipes/meta.yaml| 6 +- docs/README.md | 15 +- pyproject.toml | 2 +- requirements-310.txt | 340 ++--- requirements.in| 4 +- requirements.txt | 284 --- 8 files changed, 231 insertions(+), 476 deletions(-) diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index c667dab..fe06b9c 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -64,7 +64,7 @@ jobs: run: python -m pip install --upgrade pip - name: Install maturin -run: pip install maturin==0.14.2 +run: pip install maturin==0.15.1 - run: rm LICENSE.txt - name: Download LICENSE.txt @@ -76,7 +76,7 @@ jobs: - name: Install Protoc uses: arduino/setup-protoc@v1 with: - version: '3.x' + version: "3.x" repo-token: ${{ secrets.GITHUB_TOKEN }} - name: Build Python package @@ -125,7 +125,7 @@ jobs: run: python -m pip install --upgrade pip - name: Install maturin -run: pip install maturin==0.14.2 +run: pip install maturin==0.15.1 - run: rm LICENSE.txt - name: Download LICENSE.txt @@ -137,7 +137,7 @@ jobs: - name: Install Protoc uses: arduino/setup-protoc@v1 with: - version: '3.x' + version: "3.x" repo-token: ${{ secrets.GITHUB_TOKEN }} - name: Build Python package diff --git a/conda/environments/datafusion-dev.yaml b/conda/environments/datafusion-dev.yaml index d9405e4..ceab504 100644 --- a/conda/environments/datafusion-dev.yaml +++ b/conda/environments/datafusion-dev.yaml @@ -16,29 +16,29 @@ # under the License. 
channels: -- conda-forge + - conda-forge dependencies: -- black -- flake8 -- isort -- maturin -- mypy -- numpy -- pyarrow -- pytest -- toml -- importlib_metadata -- python>=3.10 -# Packages useful for building distributions and releasing -- mamba -- conda-build -- anaconda-client -# Packages for documentation building -- sphinx -- pydata-sphinx-theme==0.8.0 -- myst-parser -- jinja2 -# GPU packages -- cudf -- cudatoolkit=11.8 + - black + - flake8 + - isort + - maturin>=0.15 + - mypy + - numpy + - pyarrow>=11.0.0 + - pytest + - toml + - importlib_metadata + - python>=3.10 + # Packages useful for building distributions and releasing + - mamba + - conda-build + - anaconda-client + # Packages for documentation building + - sphinx + - pydata-sphinx-theme==0.8.0 + - myst-parser + - jinja2 + # GPU packages + - cudf + - cudatoolkit=11.8 name: datafusion-dev diff --git a/conda/recipes/meta.yaml b/conda/recipes/meta.yaml index 48e95eb..e2bb8be 100644 --- a/conda/recipes/meta.yaml +++ b/conda/recipes/meta.yaml @@ -35,12 +35,12 @@ build: requirements: host: -- python >=3.6 -- maturin >=0.14,<0.15 +- python >=3.7 +- maturin >=0.15,<0.16 - libprotobuf =3 - pip run: -- python >=3.6 +- python >=3.7 - pyarrow >=11.0.0 test: diff --git a/docs/README.md b/docs/README.md index 04f46a9..8527858 100644 --- a/docs/README.md +++ b/docs/README.md @@ -19,17 +19,18 @@ # DataFusion Documentation -This folder contains the source content of the [python api](./source/api). +This folder contains the source content of the [Python API](./source/api). This is published to https://arrow.apache.org/datafusion-python/ by a GitHub action when changes are merged to the main branch. ## Dependencies It's recommended to install build dependencies and build the documentation -inside a Python virtualenv. +inside a Python `venv`. 
-- Python -- `pip3 install -r requirements.txt` +```bash +python -m pip install -r requirements-310.txt +``` ## Build & Preview @@ -57,8 +58,6 @@ version of the docs, follow these steps: 2. Clone the arrow-site repo 3. Checkout to the `asf-site` branch (NOT `master`) 4. Copy build artifacts into `arrow-site` repo's `datafusion` folder with a command such as - -- `cp -rT ./build/html/ ../../arrow-site/datafusion/` (doesn't work on mac) -- `rsync -avzr ./build/html/ ../../arrow-site/datafusion/` - + - `cp -rT ./build/html/ ../../arrow-site/datafusion/` (doesn't work on mac) + - `rsync -avzr ./build/html/ ../../arrow-site/datafusion/` 5. Commit changes in `arrow-site` and send a PR. diff --git
[arrow-datafusion-python] branch update-maturin created (now d1a87a7)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch update-maturin
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

      at d1a87a7 upgrade maturin to 0.15.1

This branch includes the following new commits:

     new d1a87a7 upgrade maturin to 0.15.1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
[arrow-datafusion-python] branch update-310 updated (afba137 -> 2a21299)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch update-310
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

    from afba137 update requirements with upgrade
     add 2a21299 apply black update

No new revisions were added by this update.

Summary of changes:
 datafusion/__init__.py             |  4 +--
 datafusion/tests/generic.py        | 12 ++-
 datafusion/tests/test_dataframe.py | 28
 datafusion/tests/test_functions.py | 65 --
 datafusion/tests/test_sql.py       | 24 --
 dev/release/check-rat-report.py    |  4 +--
 6 files changed, 32 insertions(+), 105 deletions(-)
[arrow-datafusion-python] 02/02: update requirements with upgrade
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch update-310 in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git commit afba137fcca0297329cb48853f74132a95222bf7 Author: Jiayu Liu AuthorDate: Sun Jan 15 09:39:31 2023 + update requirements with upgrade --- requirements.txt | 207 +-- 1 file changed, 172 insertions(+), 35 deletions(-) diff --git a/requirements.txt b/requirements.txt index 805f521..46c6dbf 100644 --- a/requirements.txt +++ b/requirements.txt @@ -2,66 +2,203 @@ # This file is autogenerated by pip-compile with Python 3.10 # by the following command: # -#pip-compile +#pip-compile --generate-hashes --output-file=requirements.txt # -attrs==21.2.0 +attrs==22.2.0 \ + --hash=sha256:29e95c7f6778868dbd49170f98f8818f78f3dc5e0e37c0b1f474e3561b240836 \ + --hash=sha256:c9227bfc2f01993c03f68db37d1d15c9690188323c067c641f1a35ca58185f99 # via pytest -black==21.9b0 +black==22.12.0 \ + --hash=sha256:101c69b23df9b44247bd88e1d7e90154336ac4992502d4197bdac35dd7ee3320 \ + --hash=sha256:159a46a4947f73387b4d83e87ea006dbb2337eab6c879620a3ba52699b1f4351 \ + --hash=sha256:1f58cbe16dfe8c12b7434e50ff889fa479072096d79f0a7f25e4ab8e94cd8350 \ + --hash=sha256:229351e5a18ca30f447bf724d007f890f97e13af070bb6ad4c0a441cd7596a2f \ + --hash=sha256:436cc9167dd28040ad90d3b404aec22cedf24a6e4d7de221bec2730ec0c97bcf \ + --hash=sha256:559c7a1ba9a006226f09e4916060982fd27334ae1998e7a38b3f33a37f7a2148 \ + --hash=sha256:7412e75863aa5c5411886804678b7d083c7c28421210180d67dfd8cf1221e1f4 \ + --hash=sha256:77d86c9f3db9b1bf6761244bc0b3572a546f5fe37917a044e02f3166d5aafa7d \ + --hash=sha256:82d9fe8fee3401e02e79767016b4907820a7dc28d70d137eb397b92ef3cc5bfc \ + --hash=sha256:9eedd20838bd5d75b80c9f5487dbcb06836a43833a37846cf1d8c1cc01cef59d \ + --hash=sha256:c116eed0efb9ff870ded8b62fe9f28dd61ef6e9ddd28d83d7d264a38417dcee2 \ + --hash=sha256:d30b212bffeb1e252b31dd269dfae69dd17e06d92b87ad26e23890f3efea366f # via -r requirements.in 
-click==8.0.3 +click==8.1.3 \ + --hash=sha256:7682dc8afb30297001674575ea00d1814d808d6a36af415a82bd481d37ba7b8e \ + --hash=sha256:bb4d8133cb15a609f44e8213d9b391b0809795062913b383c62be0ee95b1db48 # via black -flake8==4.0.1 +exceptiongroup==1.1.0 \ + --hash=sha256:327cbda3da756e2de031a3107b81ab7b3770a602c4d16ca618298c526f4bec1e \ + --hash=sha256:bcb67d800a4497e1b404c2dd44fca47d3b7a5e5433dbab67f96c1a685cdfdf23 +# via pytest +flake8==6.0.0 \ + --hash=sha256:3833794e27ff64ea4e9cf5d410082a8b97ff1a06c16aa3d2027339cd0f1195c7 \ + --hash=sha256:c61007e76655af75e6785a931f452915b371dc48f56efd765247c8fe68f2b181 # via -r requirements.in -iniconfig==1.1.1 +iniconfig==2.0.0 \ + --hash=sha256:2d91e135bf72d31a410b17c16da610a82cb55f6b0477d1a902134b24a455b8b3 \ + --hash=sha256:b6a85871a79d2e3b22d2d1b94ac2824226a63c6b741c88f7ae975f18b6778374 # via pytest -isort==5.9.3 +isort==5.11.4 \ + --hash=sha256:6db30c5ded9815d813932c04c2f85a360bcdd35fed496f4d8f35495ef0a261b6 \ + --hash=sha256:c033fd0edb91000a7f09527fe5c75321878f98322a77ddcc81adbd83724afb7b # via -r requirements.in -maturin==0.14.2 +maturin==0.14.10 \ + --hash=sha256:11b8550ceba5b81465a18d06f0d3a4cfc1cd6cbf68eda117c253bbf3324b1264 \ + --hash=sha256:2f097a63f3bed20a7da56fc7ce4d44ef8376ee9870604da16b685f2d02c87c79 \ + --hash=sha256:4946ad7545ba5fc0ad08bc98bc8e9f6ffabb6ded71db9ed282ad4596b998d42a \ + --hash=sha256:5abf311d4618b673efa30cacdac5ae2d462e49da58db9a5bf0d8bde16d9c16be \ + --hash=sha256:6cc9afb89f28bd591b62f8f3c29736c81c322cffe88f9ab8eb1749377bbc3521 \ + --hash=sha256:895c48cbe56ae994c2a1f19475ca4819aa4c6412af727a63a772e8ef2d87 \ + --hash=sha256:98bfed21c3498857b3381efeb041d77e004a93b22261bf9690fe2b9fbb4c210f \ + --hash=sha256:9da98bee0a548ecaaa924cc8cb94e49075d5e71511c62a1633a6962c7831a29b \ + --hash=sha256:b157e2e8a0216d02df1d0451201fcb977baf0dcd223890abfbfbfd01e0b44630 \ + --hash=sha256:c0d25e82cb6e5de9f1c028fcf069784be4165b083e79412371edce05010b68f3 \ + 
--hash=sha256:cf950ebfe449a97617b91d75e09766509e21a389ce3f7b6ef15130ad8a95430a \ + --hash=sha256:e9c19dc0a28109280f7d091ca7b78e25f3fc340fcfac92801829a21198fa20eb \ + --hash=sha256:ec8269c02cc435893308dfd50f57f14fb1be3554e4e61c5bf49b97363b289775 # via -r requirements.in -mccabe==0.6.1 +mccabe==0.7.0 \ + --hash=sha256:348e0240c33b60bbdf4e523192ef919f28cb2c3d7d5c7794f74009290f236325 \ + --hash=sha256:6c2d30ab6be0e4a46919781807b4f0d834ebdd6c6e3dca0bda5a15f863427b6e # via flake8 -mypy==0.910 +mypy==0.991 \ + --hash=sha256:0714258640194d75677e86c786e80ccf294972cc76885d3ebbb560f11db0003d \ + --hash=sha256:0c8f3be99e8a8bd403caa8c03be619544bc2c77a7093685dcf308c6b109426c6 \ + --hash=sha256:0cca5adf694af539aeaa6ac633a7afe9bbd760df9d31be55ab780b77ab5ae8bf \ + --hash=sha256
[arrow-datafusion-python] 01/02: update requirements for 310
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch update-310 in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git commit 1a8114f50d065aaecf4b8ea8e2466b9573028041 Author: Jiayu Liu AuthorDate: Sun Jan 15 09:38:53 2023 + update requirements for 310 --- requirements-310.txt | 317 +-- requirements.txt | 279 + 2 files changed, 187 insertions(+), 409 deletions(-) diff --git a/requirements-310.txt b/requirements-310.txt index 332abdb..898747a 100644 --- a/requirements-310.txt +++ b/requirements-310.txt @@ -1,97 +1,97 @@ # -# This file is autogenerated by pip-compile with python 3.10 -# To update, run: +# This file is autogenerated by pip-compile with Python 3.10 +# by the following command: # -#pip-compile --generate-hashes +#pip-compile --generate-hashes --output-file=requirements-310.txt # -attrs==21.4.0 \ - --hash=sha256:2d27e3784d7a565d36ab851fe94887c5eccd6a463168875832a1be79c82828b4 \ - --hash=sha256:626ba8234211db98e869df76230a137c4c40a12d72445c45d5f5b716f076e2fd +attrs==22.2.0 \ + --hash=sha256:29e95c7f6778868dbd49170f98f8818f78f3dc5e0e37c0b1f474e3561b240836 \ + --hash=sha256:c9227bfc2f01993c03f68db37d1d15c9690188323c067c641f1a35ca58185f99 # via pytest -black==22.3.0 \ - --hash=sha256:06f9d8846f2340dfac80ceb20200ea5d1b3f181dd0556b47af4e8e0b24fa0a6b \ - --hash=sha256:10dbe6e6d2988049b4655b2b739f98785a884d4d6b85bc35133a8fb9a2233176 \ - --hash=sha256:2497f9c2386572e28921fa8bec7be3e51de6801f7459dffd6e62492531c47e09 \ - --hash=sha256:30d78ba6bf080eeaf0b7b875d924b15cd46fec5fd044ddfbad38c8ea9171043a \ - --hash=sha256:328efc0cc70ccb23429d6be184a15ce613f676bdfc85e5fe8ea2a9354b4e9015 \ - --hash=sha256:35020b8886c022ced9282b51b5a875b6d1ab0c387b31a065b84db7c33085ca79 \ - --hash=sha256:5795a0375eb87bfe902e80e0c8cfaedf8af4d49694d69161e5bd3206c18618bb \ - --hash=sha256:5891ef8abc06576985de8fa88e95ab70641de6c1fca97e2a15820a9b69e51b20 \ - 
--hash=sha256:637a4014c63fbf42a692d22b55d8ad6968a946b4a6ebc385c5505d9625b6a464 \ - --hash=sha256:67c8301ec94e3bcc8906740fe071391bce40a862b7be0b86fb5382beefecd968 \ - --hash=sha256:6d2fc92002d44746d3e7db7cf9313cf4452f43e9ea77a2c939defce3b10b5c82 \ - --hash=sha256:6ee227b696ca60dd1c507be80a6bc849a5a6ab57ac7352aad1ffec9e8b805f21 \ - --hash=sha256:863714200ada56cbc366dc9ae5291ceb936573155f8bf8e9de92aef51f3ad0f0 \ - --hash=sha256:9b542ced1ec0ceeff5b37d69838106a6348e60db7b8fdd245294dc1d26136265 \ - --hash=sha256:a6342964b43a99dbc72f72812bf88cad8f0217ae9acb47c0d4f141a6416d2d7b \ - --hash=sha256:ad4efa5fad66b903b4a5f96d91461d90b9507a812b3c5de657d544215bb7877a \ - --hash=sha256:bc58025940a896d7e5356952228b68f793cf5fcb342be703c3a2669a1488cb72 \ - --hash=sha256:cc1e1de68c8e5444e8f94c3670bb48a2beef0e91dddfd4fcc29595ebd90bb9ce \ - --hash=sha256:cee3e11161dde1b2a33a904b850b0899e0424cc331b7295f2a9698e79f9a69a0 \ - --hash=sha256:e3556168e2e5c49629f7b0f377070240bd5511e45e25a4497bb0073d9dda776a \ - --hash=sha256:e8477ec6bbfe0312c128e74644ac8a02ca06bcdb8982d4ee06f209be28cdf163 \ - --hash=sha256:ee8f1f7228cce7dffc2b464f07ce769f478968bfb3dd1254a4c2eeed84928aad \ - --hash=sha256:fd57160949179ec517d32ac2ac898b5f20d68ed1a9c977346efbac9c2f1e779d +black==22.12.0 \ + --hash=sha256:101c69b23df9b44247bd88e1d7e90154336ac4992502d4197bdac35dd7ee3320 \ + --hash=sha256:159a46a4947f73387b4d83e87ea006dbb2337eab6c879620a3ba52699b1f4351 \ + --hash=sha256:1f58cbe16dfe8c12b7434e50ff889fa479072096d79f0a7f25e4ab8e94cd8350 \ + --hash=sha256:229351e5a18ca30f447bf724d007f890f97e13af070bb6ad4c0a441cd7596a2f \ + --hash=sha256:436cc9167dd28040ad90d3b404aec22cedf24a6e4d7de221bec2730ec0c97bcf \ + --hash=sha256:559c7a1ba9a006226f09e4916060982fd27334ae1998e7a38b3f33a37f7a2148 \ + --hash=sha256:7412e75863aa5c5411886804678b7d083c7c28421210180d67dfd8cf1221e1f4 \ + --hash=sha256:77d86c9f3db9b1bf6761244bc0b3572a546f5fe37917a044e02f3166d5aafa7d \ + 
--hash=sha256:82d9fe8fee3401e02e79767016b4907820a7dc28d70d137eb397b92ef3cc5bfc \ + --hash=sha256:9eedd20838bd5d75b80c9f5487dbcb06836a43833a37846cf1d8c1cc01cef59d \ + --hash=sha256:c116eed0efb9ff870ded8b62fe9f28dd61ef6e9ddd28d83d7d264a38417dcee2 \ + --hash=sha256:d30b212bffeb1e252b31dd269dfae69dd17e06d92b87ad26e23890f3efea366f # via -r requirements.in click==8.1.3 \ --hash=sha256:7682dc8afb30297001674575ea00d1814d808d6a36af415a82bd481d37ba7b8e \ --hash=sha256:bb4d8133cb15a609f44e8213d9b391b0809795062913b383c62be0ee95b1db48 # via black -flake8==4.0.1 \ - --hash=sha256:479b1304f72536a55948cb40a32dce8bb0ffe3501e26eaf292c7e60eb5e0428d \ - --hash=sha256:806e034dda44114815e23c16ef92f95c91e4c71100ff52813adf7132a6ad870d +exceptiongroup==1.1.0 \ + --hash=sha256
[arrow-datafusion-python] branch update-310 created (now afba137)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch update-310
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

      at afba137 update requirements with upgrade

This branch includes the following new commits:

     new 1a8114f update requirements for 310
     new afba137 update requirements with upgrade

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
[arrow-datafusion-python] branch master updated: build(deps): bump async-trait from 0.1.60 to 0.1.61 (#118)
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

The following commit(s) were added to refs/heads/master by this push:
     new cb8afac build(deps): bump async-trait from 0.1.60 to 0.1.61 (#118)
cb8afac is described below

commit cb8afac290cefb48e94b7d220bd03087e19cd93a
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
AuthorDate: Sun Jan 15 14:50:55 2023 +0800

    build(deps): bump async-trait from 0.1.60 to 0.1.61 (#118)

    Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.60 to 0.1.61.
    - [Release notes](https://github.com/dtolnay/async-trait/releases)
    - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.60...0.1.61)

    ---
    updated-dependencies:
    - dependency-name: async-trait
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot]
    Signed-off-by: dependabot[bot]
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
 Cargo.lock | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Cargo.lock b/Cargo.lock
index 99ac9d1..e4e14f9 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -268,9 +268,9 @@ dependencies = [

 [[package]]
 name = "async-trait"
-version = "0.1.60"
+version = "0.1.61"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "677d1d8ab452a3936018a687b20e6f7cf5363d713b732b8884001317b0e48aa3"
+checksum = "705339e0e4a9690e2908d2b3d049d85682cf19fbd5782494498fbf7003a6a282"
 dependencies = [
  "proc-macro2",
  "quote",
[arrow-datafusion-python] branch master updated (aa596ac -> b9b5a01)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git from aa596ac build(deps): bump bzip2 from 0.4.3 to 0.4.4 (#121) add b9b5a01 build(deps): bump mimalloc from 0.1.32 to 0.1.34 (#125) No new revisions were added by this update. Summary of changes: Cargo.lock | 8 1 file changed, 4 insertions(+), 4 deletions(-)
[arrow-datafusion-python] branch master updated (940eec8 -> aa596ac)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git from 940eec8 [Functions] - Add python function binding to `functions` (#73) add aa596ac build(deps): bump bzip2 from 0.4.3 to 0.4.4 (#121) No new revisions were added by this update. Summary of changes: Cargo.lock | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
[arrow-datafusion-python] branch master updated (2b6872b -> 545b88e)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git from 2b6872b build(deps): bump object_store from 0.5.2 to 0.5.3 (#126) add 545b88e build(deps): bump tokio from 1.23.0 to 1.24.1 (#119) No new revisions were added by this update. Summary of changes: Cargo.lock | 4 ++-- Cargo.toml | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
[arrow-datafusion-python] branch master updated (b9b5a01 -> 2b6872b)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git from b9b5a01 build(deps): bump mimalloc from 0.1.32 to 0.1.34 (#125) add 2b6872b build(deps): bump object_store from 0.5.2 to 0.5.3 (#126) No new revisions were added by this update. Summary of changes: Cargo.lock | 25 ++--- Cargo.toml | 2 +- 2 files changed, 15 insertions(+), 12 deletions(-)
[arrow-rs] branch master updated: fix clippy issues (#3398)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow-rs.git The following commit(s) were added to refs/heads/master by this push: new 1d0abfafe fix clippy issues (#3398) 1d0abfafe is described below commit 1d0abfafe0da7c28b562fa0ba8c65a10b65a0821 Author: Jiayu Liu AuthorDate: Tue Dec 27 19:49:51 2022 +0800 fix clippy issues (#3398) --- arrow-integration-test/src/field.rs | 2 +- arrow-integration-testing/src/bin/arrow-file-to-stream.rs | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arrow-integration-test/src/field.rs b/arrow-integration-test/src/field.rs index 4bfbf8e99..dd0519157 100644 --- a/arrow-integration-test/src/field.rs +++ b/arrow-integration-test/src/field.rs @@ -253,7 +253,7 @@ pub fn field_from_json(json: &serde_json::Value) -> Result<Field> { }; let mut field = -Field::new_dict(&name, data_type, nullable, dict_id, dict_is_ordered); +Field::new_dict(name, data_type, nullable, dict_id, dict_is_ordered); field.set_metadata(metadata); Ok(field) } diff --git a/arrow-integration-testing/src/bin/arrow-file-to-stream.rs b/arrow-integration-testing/src/bin/arrow-file-to-stream.rs index e939fe4f0..3e027faef 100644 --- a/arrow-integration-testing/src/bin/arrow-file-to-stream.rs +++ b/arrow-integration-testing/src/bin/arrow-file-to-stream.rs @@ -30,7 +30,7 @@ struct Args { fn main() -> Result<()> { let args = Args::parse(); -let f = File::open(&args.file_name)?; +let f = File::open(args.file_name)?; let reader = BufReader::new(f); let mut reader = FileReader::try_new(reader, None)?; let schema = reader.schema();
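Both hunks in the commit above are instances of clippy's needless-borrow lints: the callee (`Field::new_dict`, `File::open`) accepts the argument through a generic parameter, so the extra `&` adds nothing. A minimal sketch of the pattern, with an illustrative function that is not from arrow-rs:

```rust
// `shout` takes any type viewable as &str, so passing `&name` borrows a value
// the parameter could already accept directly — clippy flags the redundant `&`.
fn shout(s: impl AsRef<str>) -> String {
    s.as_ref().to_uppercase()
}

fn main() {
    let name = String::from("arrow");
    let with_borrow = shout(&name); // compiles, but clippy suggests dropping the `&`
    let without = shout(name);      // the form the fix above adopts
    assert_eq!(with_borrow, without);
}
```

The fix is purely cosmetic: both forms compile to the same call, which is why the diff touches only the two `&` characters.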
[arrow-datafusion-python] branch master updated: update release readme tag (#86)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git The following commit(s) were added to refs/heads/master by this push: new 9f0e731 update release readme tag (#86) 9f0e731 is described below commit 9f0e73196a6cd84efb332312ddd976876db3ae22 Author: Jiayu Liu AuthorDate: Tue Nov 29 09:55:12 2022 +0800 update release readme tag (#86) use `bash` not `py` for scripting --- dev/release/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dev/release/README.md b/dev/release/README.md index dd378f8..20c1562 100644 --- a/dev/release/README.md +++ b/dev/release/README.md @@ -181,7 +181,7 @@ Go to the Test PyPI page of Datafusion, and download [all published artifacts](https://test.pypi.org/project/datafusion/#files) under `dist-release/` directory. Then proceed uploading them using `twine`: -```py +```bash twine upload --repository pypi dist-release/* ```
[arrow-datafusion-python] 01/01: update release readme tag
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch Jimexist-patch-1 in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git commit e160203a8863f71c6eaab763d3b3791cdbb1b431 Author: Jiayu Liu AuthorDate: Mon Nov 28 22:56:48 2022 +0800 update release readme tag use `bash` not `py` for scripting --- dev/release/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dev/release/README.md b/dev/release/README.md index dd378f8..20c1562 100644 --- a/dev/release/README.md +++ b/dev/release/README.md @@ -181,7 +181,7 @@ Go to the Test PyPI page of Datafusion, and download [all published artifacts](https://test.pypi.org/project/datafusion/#files) under `dist-release/` directory. Then proceed uploading them using `twine`: -```py +```bash twine upload --repository pypi dist-release/* ```
[arrow-datafusion-python] branch Jimexist-patch-1 created (now e160203)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch Jimexist-patch-1 in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git at e160203 update release readme tag This branch includes the following new commits: new e160203 update release readme tag The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[arrow-datafusion] branch master updated (010aded5d -> e34c6c33a)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git from 010aded5d Support to use Schedular in tpch benchmark (#4361) add e34c6c33a add support for xz file compression and `compression` feature (#3993) No new revisions were added by this update. Summary of changes: datafusion-cli/Cargo.lock | 102 + datafusion/core/Cargo.toml | 10 +- datafusion/core/src/datasource/file_format/csv.rs | 2 +- .../core/src/datasource/file_format/file_type.rs | 88 +++--- datafusion/core/src/datasource/file_format/json.rs | 4 +- .../core/src/physical_plan/file_format/csv.rs | 19 ++-- .../core/src/physical_plan/file_format/json.rs | 16 ++-- datafusion/core/src/test/mod.rs| 16 datafusion/expr/src/logical_plan/plan.rs | 2 +- datafusion/sql/src/parser.rs | 5 +- 10 files changed, 186 insertions(+), 78 deletions(-)
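The commit above extends DataFusion's file-format handling with an xz variant behind a `compression` feature. A rough std-only sketch of the extension-based detection such a feature needs; the enum and function names here are assumptions for illustration, not DataFusion's actual API:

```rust
#[derive(Debug, PartialEq)]
enum FileCompressionType {
    Gzip,
    Bzip2,
    Xz,
    Uncompressed,
}

// Pick a compression variant from the file extension; a real reader would then
// wrap the underlying byte stream in the matching decoder.
fn from_extension(path: &str) -> FileCompressionType {
    match path.rsplit('.').next() {
        Some("gz") => FileCompressionType::Gzip,
        Some("bz2") => FileCompressionType::Bzip2,
        Some("xz") => FileCompressionType::Xz,
        _ => FileCompressionType::Uncompressed,
    }
}
```

Gating the decoders behind a cargo feature, as the commit does, keeps the heavyweight compression crates out of builds that only read uncompressed files.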
[arrow-datafusion] branch add-support-for-xz updated (a9d0bd2a3 -> de24ecb27)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-support-for-xz in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git from a9d0bd2a3 add support for xz file compression add de24ecb27 fix Cargo.toml formatting No new revisions were added by this update. Summary of changes: datafusion/core/Cargo.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[arrow-datafusion] branch add-support-for-xz created (now a9d0bd2a3)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-support-for-xz in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git at a9d0bd2a3 add support for xz file compression No new revisions were added by this update.
[arrow-rs] branch add-bloom-filter-3 updated (85014cea8 -> 37e145d38)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from 85014cea8 Apply suggestions from code review add 0d6540dd0 remove underflow logic add 37e145d38 refactor write No new revisions were added by this update. Summary of changes: parquet/src/bloom_filter/mod.rs | 32 +--- parquet/src/file/properties.rs | 2 +- parquet/src/file/writer.rs | 6 +- 3 files changed, 23 insertions(+), 17 deletions(-)
[arrow-rs] branch add-bloom-filter-3 updated (8ed433799 -> 85014cea8)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git discard 8ed433799 Apply suggestions from code review discard e102b87da fix clippy discard 8dcd75e80 add unit test discard b754a9ee1 fix doc discard 8ded33a6c incorporate ndv and fpp discard 414097535 remove default feature for twox discard a30bd097a fix clippy discard 93afb6c01 fix clippy discard 872473dc4 update row group vec discard 9b55ab6fb bloom filter part III add e55b95e8d Clippy parquet fixes (#3124) add 2a065bee3 Bump actions/labeler from 4.0.2 to 4.1.0 (#3129) add 5bce1044f Add COW conversion for Buffer and PrimitiveArray and unary_mut (#3115) add f4558aeb2 bloom filter part III add ea13d0aca update row group vec add 03edb7df7 fix clippy add 5fa74765f fix clippy add 3732e436c remove default feature for twox add 7f46a4b6e incorporate ndv and fpp add 27a404d3c fix doc add 35e56c135 add unit test add ec68e695e fix clippy add 85014cea8 Apply suggestions from code review This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (8ed433799) \ N -- N -- N refs/heads/add-bloom-filter-3 (85014cea8) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. 
Summary of changes: .github/workflows/dev_pr.yml | 2 +- arrow-array/src/array/mod.rs | 12 ++ arrow-array/src/array/primitive_array.rs | 161 +- arrow-array/src/builder/boolean_buffer_builder.rs | 5 + arrow-array/src/builder/buffer_builder.rs | 9 ++ arrow-array/src/builder/null_buffer_builder.rs| 25 +++- arrow-array/src/builder/primitive_builder.rs | 24 arrow-buffer/src/buffer/immutable.rs | 19 +++ arrow-buffer/src/buffer/mutable.rs| 19 +++ arrow-buffer/src/bytes.rs | 5 + arrow-json/src/reader.rs | 2 - parquet/src/data_type.rs | 24 parquet/src/encodings/decoding.rs | 15 +- parquet/src/record/api.rs | 11 +- 14 files changed, 292 insertions(+), 41 deletions(-)
[arrow-rs] branch add-bloom-filter-3 updated (e102b87da -> 8ed433799)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from e102b87da fix clippy add 8ed433799 Apply suggestions from code review No new revisions were added by this update. Summary of changes: parquet/src/file/properties.rs | 2 +- parquet/src/file/reader.rs | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-)
[arrow-rs] branch add-bloom-filter-3 updated (b754a9ee1 -> e102b87da)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from b754a9ee1 fix doc add 8dcd75e80 add unit test add e102b87da fix clippy No new revisions were added by this update. Summary of changes: parquet/src/bloom_filter/mod.rs | 43 +++-- parquet/src/file/properties.rs | 2 +- 2 files changed, 38 insertions(+), 7 deletions(-)
[arrow-rs] branch add-bloom-filter-3 updated (8ded33a6c -> b754a9ee1)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from 8ded33a6c incorporate ndv and fpp add b754a9ee1 fix doc No new revisions were added by this update. Summary of changes: parquet/src/bloom_filter/mod.rs | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-)
[arrow-rs] branch add-bloom-filter-3 updated (414097535 -> 8ded33a6c)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from 414097535 remove default feature for twox add 8ded33a6c incorporate ndv and fpp No new revisions were added by this update. Summary of changes: parquet/src/bloom_filter/mod.rs | 48 parquet/src/column/writer/mod.rs | 16 -- parquet/src/file/properties.rs | 14 +++- 3 files changed, 75 insertions(+), 3 deletions(-)
[arrow-rs] branch add-bloom-filter-3 updated (a30bd097a -> 414097535)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from a30bd097a fix clippy add 414097535 remove default feature for twox No new revisions were added by this update. Summary of changes: parquet/Cargo.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[arrow-rs] branch add-bloom-filter-3 updated (93afb6c01 -> a30bd097a)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from 93afb6c01 fix clippy add a30bd097a fix clippy No new revisions were added by this update. Summary of changes: parquet/src/bin/parquet-show-bloom-filter.rs | 5 + 1 file changed, 1 insertion(+), 4 deletions(-)
[arrow-rs] branch add-bloom-filter-3 updated (872473dc4 -> 93afb6c01)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from 872473dc4 update row group vec add 93afb6c01 fix clippy No new revisions were added by this update. Summary of changes: parquet/src/file/properties.rs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
[arrow-rs] branch add-bloom-filter-3 updated (9b55ab6fb -> 872473dc4)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from 9b55ab6fb bloom filter part III add 872473dc4 update row group vec No new revisions were added by this update. Summary of changes: parquet/src/file/writer.rs | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-)
[arrow-rs] branch add-bloom-filter-3 updated (ec3b5d0bd -> 9b55ab6fb)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git omit ec3b5d0bd eager reading omit 7ee3aa250 add rustdoc omit acd26ce64 get rid of mention of bloom feature omit 09ee38ccb add feature flag add 9b55ab6fb bloom filter part III This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (ec3b5d0bd) \ N -- N -- N refs/heads/add-bloom-filter-3 (9b55ab6fb) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. Summary of changes: parquet/src/file/properties.rs| 6 +- parquet/src/file/serialized_reader.rs | 2 +- 2 files changed, 6 insertions(+), 2 deletions(-)
[arrow-rs] branch add-bloom-filter-3 updated (acd26ce64 -> ec3b5d0bd)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from acd26ce64 get rid of mention of bloom feature add 7ee3aa250 add rustdoc add ec3b5d0bd eager reading No new revisions were added by this update. Summary of changes: parquet/src/bloom_filter/mod.rs | 3 +++ parquet/src/file/metadata.rs | 2 +- parquet/src/file/properties.rs| 21 + parquet/src/file/reader.rs| 2 +- parquet/src/file/serialized_reader.rs | 27 +++ 5 files changed, 45 insertions(+), 10 deletions(-)
[arrow-rs] branch add-bloom-filter-3 updated (09ee38ccb -> acd26ce64)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from 09ee38ccb add feature flag add acd26ce64 get rid of mention of bloom feature No new revisions were added by this update. Summary of changes: parquet/Cargo.toml| 10 -- parquet/src/column/writer/mod.rs | 5 - parquet/src/file/reader.rs| 2 -- parquet/src/file/serialized_reader.rs | 2 -- parquet/src/file/writer.rs| 9 - parquet/src/lib.rs| 1 - 6 files changed, 4 insertions(+), 25 deletions(-)
[arrow-rs] branch add-bloom-filter-3 updated (52bf18a94 -> 09ee38ccb)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git discard 52bf18a94 add feature flag discard f54178ac3 parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo (#3102) discard fb167f6bf Include field name in merge error message (#3113) discard ec4c040d4 Expose `SortingColumn` in parquet files (#3103) discard b45790b30 Parse Time32/Time64 from formatted string (#3101) add 371ec57e3 Expose `SortingColumn` in parquet files (#3103) add c99d2f333 Include field name in merge error message (#3113) add c95eb4c80 Parse Time32/Time64 from formatted string (#3101) add 73d66d837 parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo (#3102) add 09ee38ccb add feature flag This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (52bf18a94) \ N -- N -- N refs/heads/add-bloom-filter-3 (09ee38ccb) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. 
Summary of changes: ...67-4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet.crc | Bin 16 -> 0 bytes bla.parquet/_SUCCESS | 0 ...d3a667-4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet | Bin 587 -> 0 bytes 3 files changed, 0 insertions(+), 0 deletions(-) delete mode 100644 bla.parquet/.part-0-e0d3a667-4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet.crc delete mode 100644 bla.parquet/_SUCCESS delete mode 100644 bla.parquet/part-0-e0d3a667-4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet
[arrow-rs] branch add-bloom-filter-3 updated (76aa88f6f -> 52bf18a94)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git discard 76aa88f6f add feature flag discard 881fcb10f parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo (#3102) discard 287d16d3a Include field name in merge error message (#3113) discard 3baf6eb17 Parse Time32/Time64 from formatted string (#3101) omit 371ec57e3 Expose `SortingColumn` in parquet files (#3103) add b45790b30 Parse Time32/Time64 from formatted string (#3101) add ec4c040d4 Expose `SortingColumn` in parquet files (#3103) add fb167f6bf Include field name in merge error message (#3113) add f54178ac3 parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo (#3102) add 52bf18a94 add feature flag This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (76aa88f6f) \ N -- N -- N refs/heads/add-bloom-filter-3 (52bf18a94) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. Summary of changes:
[arrow-rs] branch add-bloom-filter-3 updated (5d7624860 -> 76aa88f6f)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git omit 5d7624860 encode writing omit 9b8a0f515 write out to bloom filter omit 63fa6434a add writer properties omit 777b0dc6f add column setter omit 415c6fbb6 update help omit 86673694f remove unused trait omit a9480ad55 rework api omit bd2fb2fd5 refactor to test omit e7a33b693 get rid of loop read omit 3ec6e292c remove extern crate omit f0041d363 parquet-show-bloom-filter with bloom feature required omit 1bc73cd46 update read method omit f8b7a2781 adjust byte size omit fa3639cca fix clippy omit 7a51342e8 remove unused omit c66d7a00a add bin omit 5f4deae63 add a binary to demo omit efd89916a refactor omit 88cea8052 fix reading with chunk reader omit 2557f2c4a add api omit d5458bbdc add feature flag add fc06c84f4 Implements more temporal kernels using time_fraction_dyn (#3107) add 19f372d82 cast: unsigned numeric type with decimal (#3106) add 81ce601be Update instructions for new crates (#3111) add b0b5d8b4f Add PrimitiveArray::unary_opt (#3110) add 5c2801d08 Add downcast_array (#2901) (#3117) add 7d41e1c19 Check overflow while casting between decimal types (#3076) add 8bb2917ee Remove Option from `Field::metadata` (#3091) add 371ec57e3 Expose `SortingColumn` in parquet files (#3103) add 3baf6eb17 Parse Time32/Time64 from formatted string (#3101) add 287d16d3a Include field name in merge error message (#3113) add 881fcb10f parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo (#3102) add 76aa88f6f add feature flag This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. 
This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (5d7624860) \ N -- N -- N refs/heads/add-bloom-filter-3 (76aa88f6f) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. Summary of changes: arrow-array/src/array/primitive_array.rs | 66 +++ arrow-array/src/builder/struct_builder.rs | 2 +- arrow-array/src/cast.rs| 33 ++ arrow-cast/src/cast.rs | 638 + arrow-cast/src/parse.rs| 420 +- arrow-csv/src/reader.rs| 35 ++ arrow-integration-test/src/field.rs| 10 +- arrow-integration-test/src/lib.rs | 171 +++--- arrow-ipc/src/convert.rs | 30 +- arrow-schema/src/datatype.rs | 6 +- arrow-schema/src/field.rs | 113 ++-- arrow-schema/src/schema.rs | 62 +- arrow/src/compute/kernels/temporal.rs | 305 +- ...-4f6e-a9b1-084447078e60-c000.snappy.parquet.crc | Bin 0 -> 16 bytes bla.parquet/_SUCCESS | 0 ...4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet | Bin 0 -> 587 bytes dev/release/README.md | 2 + parquet/src/arrow/arrow_reader/mod.rs | 2 +- parquet/src/arrow/schema/complex.rs| 10 +- parquet/src/file/metadata.rs | 21 +- parquet/src/file/properties.rs | 16 + parquet/src/file/writer.rs | 61 ++ 22 files changed, 1399 insertions(+), 604 deletions(-) create mode 100644 bla.parquet/.part-0-e0d3a667-4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet.crc create mode 100644 bla.parquet/_SUCCESS create mode 100644 bla.parquet/part-0-e0d3a667-4c2e-4f6e-a9b1-084447078e60-c000.snappy.parquet
[arrow-rs] 02/04: add writer properties
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git commit 63fa6434aca28e9f646bf6840334fdf6fe0abedc Author: Jiayu Liu AuthorDate: Tue Nov 15 21:23:02 2022 +0800 add writer properties --- parquet/src/file/properties.rs | 148 ++--- 1 file changed, 80 insertions(+), 68 deletions(-) diff --git a/parquet/src/file/properties.rs b/parquet/src/file/properties.rs index c0e789ca1..c62bfe0bc 100644 --- a/parquet/src/file/properties.rs +++ b/parquet/src/file/properties.rs @@ -64,6 +64,7 @@ //! .build(); //! ``` +use paste::paste; use std::{collections::HashMap, sync::Arc}; use crate::basic::{Compression, Encoding}; @@ -81,6 +82,9 @@ const DEFAULT_STATISTICS_ENABLED: EnabledStatistics = EnabledStatistics::Page; const DEFAULT_MAX_STATISTICS_SIZE: usize = 4096; const DEFAULT_MAX_ROW_GROUP_SIZE: usize = 1024 * 1024; const DEFAULT_CREATED_BY: &str = env!("PARQUET_CREATED_BY"); +const DEFAULT_BLOOM_FILTER_ENABLED: bool = false; +const DEFAULT_BLOOM_FILTER_MAX_BYTES: u32 = 1024 * 1024; +const DEFAULT_BLOOM_FILTER_FPP: f64 = 0.01; /// Parquet writer version. /// @@ -123,6 +127,26 @@ pub struct WriterProperties { column_properties: HashMap<ColumnPath, ColumnProperties>, } +macro_rules! def_col_property_getter { +($field:ident, $field_type:ty) => { +pub fn $field(&self, col: &ColumnPath) -> Option<$field_type> { +self.column_properties +.get(col) +.and_then(|c| c.$field()) +.or_else(|| self.default_column_properties.$field()) +} +}; +($field:ident, $field_type:ty, $default_val:expr) => { +pub fn $field(&self, col: &ColumnPath) -> $field_type { +self.column_properties +.get(col) +.and_then(|c| c.$field()) +.or_else(|| self.default_column_properties.$field()) +.unwrap_or($default_val) +} +}; +} + impl WriterProperties { /// Returns builder for writer properties with default values. 
pub fn builder() -> WriterPropertiesBuilder { @@ -249,14 +273,10 @@ impl WriterProperties { .unwrap_or(DEFAULT_MAX_STATISTICS_SIZE) } -/// Returns `true` if bloom filter is enabled for a column. -pub fn bloom_filter_enabled(&self, col: &ColumnPath) -> bool { -self.column_properties -.get(col) -.and_then(|c| c.bloom_filter_enabled()) -.or_else(|| self.default_column_properties.bloom_filter_enabled()) -.unwrap_or(false) -} +def_col_property_getter!(bloom_filter_enabled, bool, DEFAULT_BLOOM_FILTER_ENABLED); +def_col_property_getter!(bloom_filter_fpp, f64, DEFAULT_BLOOM_FILTER_FPP); +def_col_property_getter!(bloom_filter_ndv, u64); +def_col_property_getter!(bloom_filter_max_bytes, u32, DEFAULT_BLOOM_FILTER_MAX_BYTES); } /// Writer properties builder. @@ -273,16 +293,40 @@ pub struct WriterPropertiesBuilder { column_properties: HashMap<ColumnPath, ColumnProperties>, } -macro_rules! def_per_col_setter { -($field:ident, $field_type:expr) => { -// The macro will expand into the contents of this block. -pub fn concat_idents!(set_, $field)(mut self, value: $field_type) -> Self { -self.$field = value; -self +macro_rules! def_opt_field_setter { +($field: ident, $type: ty) => { +paste! { +pub fn [<set_ $field>](&mut self, value: $type) -> &mut Self { +self.$field = Some(value); +self +} +} +}; +} + +macro_rules! def_opt_field_getter { +($field: ident, $type: ty) => { +paste! { +#[doc = "Returns " $field " if set."] +pub fn $field(&self) -> Option<$type> { +self.$field +} } }; } +macro_rules! def_per_col_setter { +($field:ident, $field_type:ty) => { +paste! { +#[doc = "Sets " $field " for a column. Takes precedence over globally defined settings."] +pub fn [<set_column_ $field>](mut self, col: ColumnPath, value: $field_type) -> Self { +self.get_mut_props(col).[<set_ $field>](value); +self +} +} +} +} + impl WriterPropertiesBuilder { /// Returns default state of the builder. fn with_defaults() -> Self { @@ -325,8 +369,6 @@ impl WriterPropertiesBuilder { self } -def_per_col_setter!(writer_version, WriterVersion); - /// Sets best effort maximum size of a data page in bytes. 
/// /// Note: this is a best effort limit based on value of @@ -498,16 +540,10 @@ impl WriterPropertiesBuilder { self } -/// Sets bloom filter enabled for a column. -/// Takes precedence over globally defined settings. -pub fn set_column_bloom_filter_enabled( -
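Every getter that `def_col_property_getter!` generates in the diff above expands to the same three-step lookup: per-column override, then the builder-wide default, then a hard-coded constant. Expanded by hand for one property as a sketch; the names are simplified and `String` stands in for parquet's `ColumnPath`:

```rust
use std::collections::HashMap;

const DEFAULT_BLOOM_FILTER_FPP: f64 = 0.01;

#[derive(Default)]
struct ColumnProps {
    bloom_filter_fpp: Option<f64>,
}

struct WriterProps {
    default_column: ColumnProps,
    columns: HashMap<String, ColumnProps>,
}

impl WriterProps {
    // Per-column value -> global default -> built-in constant, the chain each
    // macro-generated getter encodes.
    fn bloom_filter_fpp(&self, col: &str) -> f64 {
        self.columns
            .get(col)
            .and_then(|c| c.bloom_filter_fpp)
            .or(self.default_column.bloom_filter_fpp)
            .unwrap_or(DEFAULT_BLOOM_FILTER_FPP)
    }
}
```

Pulling this chain into a macro is what lets the commit replace four near-identical hand-written getters with one `def_col_property_getter!` line each.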
[arrow-rs] 01/04: add column setter
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git commit 777b0dc6f7d4a08af896772893071681c9d17b21 Author: Jiayu Liu AuthorDate: Tue Nov 15 20:53:32 2022 +0800 add column setter --- parquet/Cargo.toml | 1 + parquet/src/file/properties.rs | 102 +++-- 2 files changed, 89 insertions(+), 14 deletions(-) diff --git a/parquet/Cargo.toml b/parquet/Cargo.toml index fc7c8218a..72baaf338 100644 --- a/parquet/Cargo.toml +++ b/parquet/Cargo.toml @@ -58,6 +58,7 @@ futures = { version = "0.3", default-features = false, features = ["std"], optio tokio = { version = "1.0", optional = true, default-features = false, features = ["macros", "rt", "io-util"] } hashbrown = { version = "0.13", default-features = false } twox-hash = { version = "1.6", optional = true } +paste = "1.0" [dev-dependencies] base64 = { version = "0.13", default-features = false, features = ["std"] } diff --git a/parquet/src/file/properties.rs b/parquet/src/file/properties.rs index cf821df21..c0e789ca1 100644 --- a/parquet/src/file/properties.rs +++ b/parquet/src/file/properties.rs @@ -248,6 +248,15 @@ impl WriterProperties { .or_else(|| self.default_column_properties.max_statistics_size()) .unwrap_or(DEFAULT_MAX_STATISTICS_SIZE) } + +/// Returns `true` if bloom filter is enabled for a column. +pub fn bloom_filter_enabled(&self, col: &ColumnPath) -> bool { +self.column_properties +.get(col) +.and_then(|c| c.bloom_filter_enabled()) +.or_else(|| self.default_column_properties.bloom_filter_enabled()) +.unwrap_or(false) +} } /// Writer properties builder. @@ -264,6 +273,16 @@ pub struct WriterPropertiesBuilder { column_properties: HashMap<ColumnPath, ColumnProperties>, } +macro_rules! def_per_col_setter { +($field:ident, $field_type:expr) => { +// The macro will expand into the contents of this block. 
+pub fn concat_idents!(set_, $field)(mut self, value: $field_type) -> Self { +self.$field = value; +self +} +}; +} + impl WriterPropertiesBuilder { /// Returns default state of the builder. fn with_defaults() -> Self { @@ -276,7 +295,7 @@ impl WriterPropertiesBuilder { writer_version: DEFAULT_WRITER_VERSION, created_by: DEFAULT_CREATED_BY.to_string(), key_value_metadata: None, -default_column_properties: ColumnProperties::new(), +default_column_properties: Default::default(), column_properties: HashMap::new(), } } @@ -306,6 +325,8 @@ impl WriterPropertiesBuilder { self } +def_per_col_setter!(writer_version, WriterVersion); + /// Sets best effort maximum size of a data page in bytes. /// /// Note: this is a best effort limit based on value of @@ -423,7 +444,7 @@ impl WriterPropertiesBuilder { fn get_mut_props(&mut self, col: ColumnPath) -> &mut ColumnProperties { self.column_properties .entry(col) -.or_insert_with(ColumnProperties::new) +.or_insert_with(Default::default) } /// Sets encoding for a column. @@ -476,6 +497,17 @@ impl WriterPropertiesBuilder { self.get_mut_props(col).set_max_statistics_size(value); self } + +/// Sets bloom filter enabled for a column. +/// Takes precedence over globally defined settings. +pub fn set_column_bloom_filter_enabled( +mut self, +col: ColumnPath, +value: bool, +) -> Self { +self.get_mut_props(col).set_bloom_filter_enabled(value); +self +} } /// Controls the level of statistics to be computed by the writer @@ -499,27 +531,24 @@ impl Default for EnabledStatistics { /// /// If a field is `None`, it means that no specific value has been set for this column, /// so some subsequent or default value must be used. 
-#[derive(Debug, Clone, PartialEq)] +#[derive(Debug, Clone, Default, PartialEq)] struct ColumnProperties { encoding: Option, codec: Option, dictionary_enabled: Option, statistics_enabled: Option, max_statistics_size: Option, +/// bloom filter enabled +bloom_filter_enabled: Option, +/// bloom filter expected number of distinct values +bloom_filter_ndv: Option, +/// bloom filter false positive probability +bloom_filter_fpp: Option, +/// bloom filter max number of bytes +bloom_filter_max_bytes: Option, } impl ColumnProperties { -/// Initialise column properties with default values. -fn new() -> Self { -Self { -encoding: None, -codec: None, -dictionary_enabled: None, -
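The getter added in this commit resolves a per-column setting in three steps: the column-specific override, then the writer-wide default column properties, then a hard-coded constant. A self-contained sketch of that fallback chain follows; the `WriterProps`/`ColumnProps` names and the string-keyed map are simplifications for illustration, not the parquet crate's actual types:

```rust
use std::collections::HashMap;

/// Per-column overrides; `None` means "not set for this column".
#[derive(Default)]
struct ColumnProps {
    bloom_filter_enabled: Option<bool>,
}

/// Crate-level fallback when neither the column nor the default sets a value.
const DEFAULT_BLOOM_FILTER_ENABLED: bool = false;

struct WriterProps {
    default_column: ColumnProps,
    columns: HashMap<String, ColumnProps>,
}

impl WriterProps {
    /// Column-specific value, else the writer-wide default, else the constant.
    fn bloom_filter_enabled(&self, col: &str) -> bool {
        self.columns
            .get(col)
            .and_then(|c| c.bloom_filter_enabled)
            .or(self.default_column.bloom_filter_enabled)
            .unwrap_or(DEFAULT_BLOOM_FILTER_ENABLED)
    }
}

fn main() {
    let mut columns = HashMap::new();
    columns.insert(
        "id".to_string(),
        ColumnProps { bloom_filter_enabled: Some(true) },
    );
    let props = WriterProps { default_column: ColumnProps::default(), columns };
    assert!(props.bloom_filter_enabled("id")); // column override wins
    assert!(!props.bloom_filter_enabled("name")); // falls back to the constant
}
```

The same `Option` chain (`and_then` / `or_else` / `unwrap_or`) is what the commit uses for every per-column writer property, which is why the diff replaces `ColumnProperties::new()` with `Default::default()`: an all-`None` struct is the natural "nothing overridden" state.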
[arrow-rs] 03/04: write out to bloom filter
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit 9b8a0f51517b3235ccd57461f439a400dbbee4c1
Author: Jiayu Liu
AuthorDate: Tue Nov 15 21:47:56 2022 +0800

    write out to bloom filter
---
 parquet/src/bloom_filter/mod.rs  |  1 +
 parquet/src/column/writer/mod.rs | 15 ++
 parquet/src/file/writer.rs       | 45 ++--
 3 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/parquet/src/bloom_filter/mod.rs b/parquet/src/bloom_filter/mod.rs
index 4944a93f8..d0bee8a5f 100644
--- a/parquet/src/bloom_filter/mod.rs
+++ b/parquet/src/bloom_filter/mod.rs
@@ -80,6 +80,7 @@ fn block_check(block: &Block, hash: u32) -> bool {
 }

 /// A split block Bloom filter
+#[derive(Debug, Clone)]
 pub struct Sbbf(Vec<Block>);

 const SBBF_HEADER_SIZE_ESTIMATE: usize = 20;

diff --git a/parquet/src/column/writer/mod.rs b/parquet/src/column/writer/mod.rs
index 3cdf04f54..f8e79d792 100644
--- a/parquet/src/column/writer/mod.rs
+++ b/parquet/src/column/writer/mod.rs
@@ -16,6 +16,9 @@
 // under the License.

 //! Contains column writer API.
+
+#[cfg(feature = "bloom")]
+use crate::bloom_filter::Sbbf;
 use crate::format::{ColumnIndex, OffsetIndex};

 use std::collections::{BTreeSet, VecDeque};
@@ -154,6 +157,9 @@ pub struct ColumnCloseResult {
     pub rows_written: u64,
     /// Metadata for this column chunk
     pub metadata: ColumnChunkMetaData,
+    /// Optional bloom filter for this column
+    #[cfg(feature = "bloom")]
+    pub bloom_filter: Option<Sbbf>,
     /// Optional column index, for filtering
     pub column_index: Option<ColumnIndex>,
     /// Optional offset index, identifying page locations
@@ -209,6 +215,10 @@ pub struct GenericColumnWriter<'a, E: ColumnValueEncoder> {
     rep_levels_sink: Vec<i16>,
     data_pages: VecDeque<CompressedPage>,

+    // bloom filter
+    #[cfg(feature = "bloom")]
+    bloom_filter: Option<Sbbf>,
+
     // column index and offset index
     column_index_builder: ColumnIndexBuilder,
     offset_index_builder: OffsetIndexBuilder,
@@ -260,6 +270,9 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
                 num_column_nulls: 0,
                 column_distinct_count: None,
             },
+            // TODO!
+            #[cfg(feature = "bloom")]
+            bloom_filter: None,
             column_index_builder: ColumnIndexBuilder::new(),
             offset_index_builder: OffsetIndexBuilder::new(),
             encodings,
@@ -458,6 +471,8 @@
         Ok(ColumnCloseResult {
             bytes_written: self.column_metrics.total_bytes_written,
             rows_written: self.column_metrics.total_rows_written,
+            #[cfg(feature = "bloom")]
+            bloom_filter: self.bloom_filter,
             metadata,
             column_index,
             offset_index,

diff --git a/parquet/src/file/writer.rs b/parquet/src/file/writer.rs
index 2efaf7caf..90c9b6bfc 100644
--- a/parquet/src/file/writer.rs
+++ b/parquet/src/file/writer.rs
@@ -18,10 +18,11 @@
 //! Contains file writer API, and provides methods to write row groups and columns by
 //! using row group writers and column writers respectively.

-use std::{io::Write, sync::Arc};
-
+#[cfg(feature = "bloom")]
+use crate::bloom_filter::Sbbf;
 use crate::format as parquet;
 use crate::format::{ColumnIndex, OffsetIndex, RowGroup};
+use std::{io::Write, sync::Arc};
 use thrift::protocol::{TCompactOutputProtocol, TOutputProtocol, TSerializable};

 use crate::basic::PageType;
@@ -116,6 +117,8 @@ pub struct SerializedFileWriter<W: Write> {
     descr: SchemaDescPtr,
     props: WriterPropertiesPtr,
     row_groups: Vec<RowGroup>,
+    #[cfg(feature = "bloom")]
+    bloom_filters: Vec<Vec<Option<Sbbf>>>,
     column_indexes: Vec<Vec<Option<ColumnIndex>>>,
     offset_indexes: Vec<Vec<Option<OffsetIndex>>>,
     row_group_index: usize,
@@ -132,6 +135,8 @@ impl<W: Write> SerializedFileWriter<W> {
             descr: Arc::new(SchemaDescriptor::new(schema)),
             props: properties,
             row_groups: vec![],
+            #[cfg(feature = "bloom")]
+            bloom_filters: vec![],
             column_indexes: Vec::new(),
             offset_indexes: Vec::new(),
             row_group_index: 0,
@@ -212,6 +217,32 @@
         Ok(())
     }

+    #[cfg(feature = "bloom")]
+    /// Serialize all the bloom filters to the file
+    fn write_bloom_filters(&mut self, row_groups: &mut [RowGroup]) -> Result<()> {
+        // iter row group
+        // iter each column
+        // write bloom filter to the file
+        for (row_group_idx, row_group) in row_groups.iter_mut().enumerate() {
+            for (column_idx, column_metadata) in row_group.columns.iter_mut().enumerate()
+            {
+                match &self.bloom_filters[row_group_idx][column_idx]
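The `Sbbf` type this commit threads through `ColumnCloseResult` is a split-block Bloom filter: each 256-bit block is eight `u32` words, and a hash selects one bit per word via eight salt constants from the parquet BloomFilter specification. The block-level logic can be sketched as below; this is a simplified, self-contained illustration of the `block_insert`/`block_check` idea, not the crate's exact implementation:

```rust
/// One split-block bloom filter block: 8 x 32 bits = 256 bits.
type Block = [u32; 8];

/// Per-word salt constants from the parquet BloomFilter spec.
const SALT: [u32; 8] = [
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
];

/// Derive one bit position per word from the hash: the top 5 bits of
/// (hash * salt) pick a bit index in 0..32.
fn block_mask(hash: u32) -> Block {
    let mut mask = [0u32; 8];
    for i in 0..8 {
        mask[i] = 1 << (hash.wrapping_mul(SALT[i]) >> 27);
    }
    mask
}

/// Set the 8 selected bits in the block.
fn block_insert(block: &mut Block, hash: u32) {
    let mask = block_mask(hash);
    for i in 0..8 {
        block[i] |= mask[i];
    }
}

/// A hash "may be present" only if all 8 selected bits are set.
fn block_check(block: &Block, hash: u32) -> bool {
    let mask = block_mask(hash);
    (0..8).all(|i| block[i] & mask[i] != 0)
}

fn main() {
    let mut block = [0u32; 8];
    block_insert(&mut block, 0xdeadbeef);
    // An inserted hash is always found.
    assert!(block_check(&block, 0xdeadbeef));
    // A different hash passes only if all eight salted bits collide,
    // which is astronomically unlikely after a single insert.
    assert!(!block_check(&block, 0x12345678));
}
```

Because a lookup touches exactly one 32-byte block, the structure is cache-friendly: one block fits in half a cache line, which is the design motivation for the split-block variant.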
[arrow-rs] 04/04: encode writing
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit 5d7624860d3c8aed11b4ed04b0d35ccbcc1802f2
Author: Jiayu Liu
AuthorDate: Tue Nov 15 23:01:26 2022 +0800

    encode writing
---
 parquet/src/bloom_filter/mod.rs | 26 ++
 parquet/src/file/writer.rs      | 16 +++-
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/parquet/src/bloom_filter/mod.rs b/parquet/src/bloom_filter/mod.rs
index d0bee8a5f..0122a3a76 100644
--- a/parquet/src/bloom_filter/mod.rs
+++ b/parquet/src/bloom_filter/mod.rs
@@ -24,9 +24,11 @@
 use crate::file::metadata::ColumnChunkMetaData;
 use crate::file::reader::ChunkReader;
 use crate::format::{
     BloomFilterAlgorithm, BloomFilterCompression, BloomFilterHash, BloomFilterHeader,
+    SplitBlockAlgorithm, Uncompressed, XxHash,
 };
 use bytes::{Buf, Bytes};
 use std::hash::Hasher;
+use std::io::Write;
 use std::sync::Arc;
 use thrift::protocol::{TCompactInputProtocol, TSerializable};
 use twox_hash::XxHash64;
@@ -129,6 +131,30 @@ impl Sbbf {
         Self(data)
     }

+    pub fn write_bitset<W: Write>(&self, mut writer: W) -> Result<(), ParquetError> {
+        for block in &self.0 {
+            for word in block {
+                writer.write_all(&word.to_le_bytes()).map_err(|e| {
+                    ParquetError::General(format!(
+                        "Could not write bloom filter bit set: {}",
+                        e
+                    ))
+                })?;
+            }
+        }
+        Ok(())
+    }
+
+    pub fn header(&self) -> BloomFilterHeader {
+        BloomFilterHeader {
+            // 8 i32 per block, 4 bytes per i32
+            num_bytes: self.0.len() as i32 * 4 * 8,
+            algorithm: BloomFilterAlgorithm::BLOCK(SplitBlockAlgorithm {}),
+            hash: BloomFilterHash::XXHASH(XxHash {}),
+            compression: BloomFilterCompression::UNCOMPRESSED(Uncompressed {}),
+        }
+    }
+
     pub fn read_from_column_chunk<R: ChunkReader>(
         column_metadata: &ColumnChunkMetaData,
         reader: Arc<R>,

diff --git a/parquet/src/file/writer.rs b/parquet/src/file/writer.rs
index 90c9b6bfc..bf6ec93fa 100644
--- a/parquet/src/file/writer.rs
+++ b/parquet/src/file/writer.rs
@@ -230,11 +230,16 @@
             Some(bloom_filter) => {
                 let start_offset = self.buf.bytes_written();
                 let mut protocol = TCompactOutputProtocol::new(&mut self.buf);
-                bloom_filter.write_to_out_protocol(&mut protocol)?;
+                let header = bloom_filter.header();
+                header.write_to_out_protocol(&mut protocol)?;
                 protocol.flush()?;
-                let end_offset = self.buf.bytes_written();
+                bloom_filter.write_bitset(&mut self.buf)?;
                 // set offset and index for bloom filter
-                column_metadata.bloom_filter_offset = Some(start_offset as i64);
+                column_metadata
+                    .meta_data
+                    .as_mut()
+                    .expect("can't have bloom filter without column metadata")
+                    .bloom_filter_offset = Some(start_offset as i64);
             }
             None => {}
         }
@@ -424,10 +429,10 @@
         // Update row group writer metrics
         *total_bytes_written += r.bytes_written;
         column_chunks.push(r.metadata);
-        column_indexes.push(r.column_index);
-        offset_indexes.push(r.offset_index);
         #[cfg(feature = "bloom")]
         bloom_filters.push(r.bloom_filter);
+        column_indexes.push(r.column_index);
+        offset_indexes.push(r.offset_index);

         if let Some(rows) = *total_rows_written {
             if rows != r.rows_written {
@@ -663,6 +668,7 @@ impl<'a, W: Write> PageWriter for SerializedPageWriter<'a, W> {
         Ok(spec)
     }

+
     fn write_metadata(&mut self, metadata: &ColumnChunkMetaData) -> Result<()> {
         let mut protocol = TCompactOutputProtocol::new(&mut self.sink);
         metadata
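The on-disk layout this commit produces is: thrift `BloomFilterHeader` first (its `num_bytes` counts 8 four-byte words per block), then the raw bit set word by word in little-endian order, with `bloom_filter_offset` pointing at the start of the header. The size arithmetic and bitset serialization can be sketched as below; this `Sbbf` stand-in is self-contained and omits the thrift header encoding itself:

```rust
use std::io::Write;

/// One 256-bit block: 8 x u32 words.
type Block = [u32; 8];

/// Simplified stand-in for the crate's Sbbf: just a vector of blocks.
struct Sbbf(Vec<Block>);

impl Sbbf {
    /// The header's num_bytes field: 8 words per block, 4 bytes per word.
    fn num_bytes(&self) -> i32 {
        self.0.len() as i32 * 4 * 8
    }

    /// Serialize the bit set little-endian, word by word, right after the header.
    fn write_bitset<W: Write>(&self, mut writer: W) -> std::io::Result<()> {
        for block in &self.0 {
            for word in block {
                writer.write_all(&word.to_le_bytes())?;
            }
        }
        Ok(())
    }
}

fn main() {
    // Two blocks -> 2 * 32 = 64 bytes of bit set.
    let filter = Sbbf(vec![[0x01020304u32; 8]; 2]);
    assert_eq!(filter.num_bytes(), 64);

    let mut buf = Vec::new();
    filter.write_bitset(&mut buf).unwrap();
    assert_eq!(buf.len(), 64);
    // Least-significant byte first: 0x01020304 -> 04 03 02 01.
    assert_eq!(&buf[..4], &[0x04, 0x03, 0x02, 0x01]);
}
```

Note how the commit's diff also fixes the offset bookkeeping: `start_offset` is captured before the header is written, so a reader can seek to `bloom_filter_offset`, decode the header, and then read exactly `num_bytes` of bit set.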
[arrow-rs] branch add-bloom-filter-3 created (now 5d7624860)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-3 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git at 5d7624860 encode writing This branch includes the following new commits: new 777b0dc6f add column setter new 63fa6434a add writer properties new 9b8a0f515 write out to bloom filter new 5d7624860 encode writing The 4 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[arrow-rs] branch add-bloom-filter-2 updated (86673694f -> 415c6fbb6)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from 86673694f remove unused trait add 415c6fbb6 update help No new revisions were added by this update. Summary of changes: parquet/src/bin/parquet-show-bloom-filter.rs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
[arrow-rs] branch add-bloom-filter-2 updated (a9480ad55 -> 86673694f)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from a9480ad55 rework api add 86673694f remove unused trait No new revisions were added by this update. Summary of changes: parquet/src/bin/parquet-show-bloom-filter.rs | 1 - 1 file changed, 1 deletion(-)
[arrow-rs] branch add-bloom-filter-2 updated (bd2fb2fd5 -> a9480ad55)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from bd2fb2fd5 refactor to test add a9480ad55 rework api No new revisions were added by this update. Summary of changes: parquet/src/bin/parquet-show-bloom-filter.rs | 4 ++-- parquet/src/bloom_filter/mod.rs | 25 ++--- 2 files changed, 20 insertions(+), 9 deletions(-)
[arrow-rs] branch add-bloom-filter-2 updated (e7a33b693 -> bd2fb2fd5)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from e7a33b693 get rid of loop read add bd2fb2fd5 refactor to test No new revisions were added by this update. Summary of changes: parquet/src/bloom_filter/mod.rs | 58 - 1 file changed, 52 insertions(+), 6 deletions(-)
[arrow-rs] branch add-bloom-filter-2 updated (f0041d363 -> e7a33b693)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from f0041d363 parquet-show-bloom-filter with bloom feature required add 3ec6e292c remove extern crate add e7a33b693 get rid of loop read No new revisions were added by this update. Summary of changes: parquet/src/bin/parquet-read.rs | 2 -- parquet/src/bin/parquet-rowcount.rs | 1 - parquet/src/bin/parquet-schema.rs| 1 - parquet/src/bin/parquet-show-bloom-filter.rs | 1 - parquet/src/bloom_filter/mod.rs | 36 +++- 5 files changed, 14 insertions(+), 27 deletions(-)
[arrow-rs] branch add-bloom-filter-2 updated: parquet-show-bloom-filter with bloom feature required
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git The following commit(s) were added to refs/heads/add-bloom-filter-2 by this push: new f0041d363 parquet-show-bloom-filter with bloom feature required f0041d363 is described below commit f0041d363a20dff1bb65f566f9c958de2f733775 Author: Jiayu Liu AuthorDate: Mon Nov 14 22:03:50 2022 +0800 parquet-show-bloom-filter with bloom feature required --- parquet/Cargo.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/parquet/Cargo.toml b/parquet/Cargo.toml index 50fdac5f6..fc7c8218a 100644 --- a/parquet/Cargo.toml +++ b/parquet/Cargo.toml @@ -115,7 +115,7 @@ required-features = ["arrow", "cli"] [[bin]] name = "parquet-show-bloom-filter" -required-features = ["cli"] +required-features = ["cli", "bloom"] [[bench]] name = "arrow_writer"
[arrow-rs] branch add-bloom-filter-2 updated (f8b7a2781 -> 1bc73cd46)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from f8b7a2781 adjust byte size add 1bc73cd46 update read method No new revisions were added by this update. Summary of changes: parquet/src/bloom_filter/mod.rs | 23 +-- 1 file changed, 5 insertions(+), 18 deletions(-)
[arrow-rs] branch add-bloom-filter-2 updated (fa3639cca -> f8b7a2781)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from fa3639cca fix clippy add f8b7a2781 adjust byte size No new revisions were added by this update. Summary of changes: parquet/src/bin/parquet-show-bloom-filter.rs | 2 +- parquet/src/bloom_filter/mod.rs | 10 -- 2 files changed, 9 insertions(+), 3 deletions(-)
[arrow-rs] branch add-bloom-filter-2 updated (7a51342e8 -> fa3639cca)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from 7a51342e8 remove unused add fa3639cca fix clippy No new revisions were added by this update. Summary of changes: .github/workflows/arrow.yml | 2 ++ parquet/src/bin/parquet-show-bloom-filter.rs | 3 +-- 2 files changed, 3 insertions(+), 2 deletions(-)
[arrow-rs] branch add-bloom-filter-2 updated (c66d7a00a -> 7a51342e8)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from c66d7a00a add bin add 7a51342e8 remove unused No new revisions were added by this update. Summary of changes: parquet/src/bloom_filter/mod.rs | 1 - 1 file changed, 1 deletion(-)
[arrow-rs] branch add-bloom-filter-2 updated (74a191c9a -> c66d7a00a)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

 discard 74a191c9a add bin
 discard e8273d0f4 add a binary to demo
 discard 2562f9770 refactor
 discard c685f0c2c fix reading with chunk reader
 discard d3d407b29 add api
 discard 5e200d981 add feature flag
     add 20d81f578 Add FixedSizeBinaryArray::try_from_sparse_iter_with_size (#3054)
     add 46da60642 Cleanup temporal _internal functions (#3099)
     add 430eb84d0 Improve schema mismatch error message (#3098)
     add 0900be278 Upgrade to thrift 0.17 and fix issues (#3104)
     add d5458bbdc add feature flag
     add 2557f2c4a add api
     add 88cea8052 fix reading with chunk reader
     add efd89916a refactor
     add 5f4deae63 add a binary to demo
     add c66d7a00a add bin

This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O (74a191c9a)
            \
             N -- N -- N refs/heads/add-bloom-filter-2 (c66d7a00a)

You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update.
Summary of changes: arrow-array/src/array/fixed_size_binary_array.rs | 121 +- arrow-schema/src/field.rs| 37 +- arrow-select/src/take.rs | 7 +- arrow/src/array/ffi.rs | 3 +- arrow/src/compute/kernels/comparison.rs | 41 ++ arrow/src/compute/kernels/sort.rs| 15 +- arrow/src/compute/kernels/substring.rs | 3 +- arrow/src/compute/kernels/temporal.rs| 152 ++- arrow/src/ffi.rs | 8 +- arrow/src/row/dictionary.rs | 2 +- arrow/src/util/bench_util.rs | 25 +- arrow/tests/array_transform.rs | 9 +- parquet/Cargo.toml | 2 +- parquet/src/arrow/async_reader.rs| 2 +- parquet/src/bloom_filter/mod.rs | 3 +- parquet/src/file/footer.rs | 2 +- parquet/src/file/page_index/index_reader.rs | 2 +- parquet/src/file/serialized_reader.rs| 4 +- parquet/src/file/writer.rs | 2 +- parquet/src/format.rs| 494 +++ 20 files changed, 579 insertions(+), 355 deletions(-)
[arrow-rs] branch add-bloom-filter-2 updated (2562f9770 -> 74a191c9a)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from 2562f9770 refactor add e8273d0f4 add a binary to demo add 74a191c9a add bin No new revisions were added by this update. Summary of changes: parquet/Cargo.toml | 4 + parquet/src/bin/parquet-show-bloom-filter.rs | 113 +++ 2 files changed, 117 insertions(+) create mode 100644 parquet/src/bin/parquet-show-bloom-filter.rs
[arrow-rs] branch master updated: Upgrade to thrift 0.17 and fix issues (#3104)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow-rs.git The following commit(s) were added to refs/heads/master by this push: new 0900be278 Upgrade to thrift 0.17 and fix issues (#3104) 0900be278 is described below commit 0900be27859974b8717185d65422c36d7e735b4e Author: Jiayu Liu AuthorDate: Mon Nov 14 16:59:32 2022 +0800 Upgrade to thrift 0.17 and fix issues (#3104) * test with thrift 0.17 and fix issues * rebase * remove databend prefix * fix async reader * fix doc err * fix more doc items --- arrow/src/row/dictionary.rs | 2 +- parquet/Cargo.toml | 2 +- parquet/src/arrow/async_reader.rs | 2 +- parquet/src/bloom_filter/mod.rs | 2 +- parquet/src/file/footer.rs | 2 +- parquet/src/file/page_index/index_reader.rs | 2 +- parquet/src/file/serialized_reader.rs | 2 +- parquet/src/file/writer.rs | 2 +- parquet/src/format.rs | 494 ++-- 9 files changed, 330 insertions(+), 180 deletions(-) diff --git a/arrow/src/row/dictionary.rs b/arrow/src/row/dictionary.rs index d8426ad0c..82169a37d 100644 --- a/arrow/src/row/dictionary.rs +++ b/arrow/src/row/dictionary.rs @@ -260,7 +260,7 @@ unsafe fn decode_fixed( .add_buffer(buffer.into()); // SAFETY: Buffers correct length -unsafe { builder.build_unchecked() } +builder.build_unchecked() } /// Decodes a `PrimitiveArray` from dictionary values diff --git a/parquet/Cargo.toml b/parquet/Cargo.toml index dda0518f9..a5d43bf54 100644 --- a/parquet/Cargo.toml +++ b/parquet/Cargo.toml @@ -41,7 +41,7 @@ arrow-ipc = { version = "27.0.0", path = "../arrow-ipc", default-features = fals ahash = { version = "0.8", default-features = false, features = ["compile-time-rng"] } bytes = { version = "1.1", default-features = false, features = ["std"] } -thrift = { version = "0.16", default-features = false } +thrift = { version = "0.17", default-features = false } snap = { version = "1.0", default-features = false, optional = true } brotli = 
{ version = "3.3", default-features = false, features = ["std"], optional = true } flate2 = { version = "1.0", default-features = false, features = ["rust_backend"], optional = true } diff --git a/parquet/src/arrow/async_reader.rs b/parquet/src/arrow/async_reader.rs index d52fa0406..e182cccbc 100644 --- a/parquet/src/arrow/async_reader.rs +++ b/parquet/src/arrow/async_reader.rs @@ -89,7 +89,7 @@ use bytes::{Buf, Bytes}; use futures::future::{BoxFuture, FutureExt}; use futures::ready; use futures::stream::Stream; -use thrift::protocol::TCompactInputProtocol; +use thrift::protocol::{TCompactInputProtocol, TSerializable}; use tokio::io::{AsyncRead, AsyncReadExt, AsyncSeek, AsyncSeekExt}; diff --git a/parquet/src/bloom_filter/mod.rs b/parquet/src/bloom_filter/mod.rs index 770fb53e8..adfd87307 100644 --- a/parquet/src/bloom_filter/mod.rs +++ b/parquet/src/bloom_filter/mod.rs @@ -25,7 +25,7 @@ use crate::format::{ }; use std::hash::Hasher; use std::io::{Read, Seek, SeekFrom}; -use thrift::protocol::TCompactInputProtocol; +use thrift::protocol::{TCompactInputProtocol, TSerializable}; use twox_hash::XxHash64; /// Salt as defined in the [spec](https://github.com/apache/parquet-format/blob/master/BloomFilter.md#technical-approach) diff --git a/parquet/src/file/footer.rs b/parquet/src/file/footer.rs index e8a114db7..27c07b78d 100644 --- a/parquet/src/file/footer.rs +++ b/parquet/src/file/footer.rs @@ -18,7 +18,7 @@ use std::{io::Read, sync::Arc}; use crate::format::{ColumnOrder as TColumnOrder, FileMetaData as TFileMetaData}; -use thrift::protocol::TCompactInputProtocol; +use thrift::protocol::{TCompactInputProtocol, TSerializable}; use crate::basic::ColumnOrder; diff --git a/parquet/src/file/page_index/index_reader.rs b/parquet/src/file/page_index/index_reader.rs index 99877a921..af23c0bd9 100644 --- a/parquet/src/file/page_index/index_reader.rs +++ b/parquet/src/file/page_index/index_reader.rs @@ -23,7 +23,7 @@ use crate::file::page_index::index::{BooleanIndex, 
ByteArrayIndex, Index, Native use crate::file::reader::ChunkReader; use crate::format::{ColumnIndex, OffsetIndex, PageLocation}; use std::io::{Cursor, Read}; -use thrift::protocol::TCompactInputProtocol; +use thrift::protocol::{TCompactInputProtocol, TSerializable}; /// Read on row group's all columns indexes and change into [`Index`] /// If not the format not available return an empty vector. diff --git a/parquet/src/file/serialized_reader.rs b/parquet/src/file/serialized_reader.rs index a400d4dab..ebe87aca6 100644 --- a/parquet/src/file/serialized_
[arrow-rs] branch test-thrift-017 updated: fix more doc items
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch test-thrift-017 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git The following commit(s) were added to refs/heads/test-thrift-017 by this push: new 71560ff51 fix more doc items 71560ff51 is described below commit 71560ff517a738571254949faf14ac68c6b02547 Author: Jiayu Liu AuthorDate: Mon Nov 14 15:29:01 2022 +0800 fix more doc items --- parquet/src/format.rs | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/parquet/src/format.rs b/parquet/src/format.rs index 2e57fa4f3..0851b2287 100644 --- a/parquet/src/format.rs +++ b/parquet/src/format.rs @@ -4587,7 +4587,7 @@ impl TSerializable for OffsetIndex { // /// Description for ColumnIndex. -/// Each \[i\] refers to the page at OffsetIndex.page_locations\[i\] +/// Each ``\[i\] refers to the page at OffsetIndex.page_locations\[i\] #[derive(Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)] pub struct ColumnIndex { /// A list of Boolean values to determine the validity of the corresponding @@ -4605,7 +4605,7 @@ pub struct ColumnIndex { /// that list entries are populated before using them by inspecting null_pages. pub min_values: Vec>, pub max_values: Vec>, - /// Stores whether both min_values and max_values are orderd and if so, in + /// Stores whether both min_values and max_values are ordered and if so, in /// which direction. This allows readers to perform binary searches in both /// lists. Readers cannot assume that max_values\[i\] <= min_values\[i+1\], even /// if the lists are ordered. @@ -5049,7 +5049,7 @@ pub struct FileMetaData { /// Optional key/value metadata * pub key_value_metadata: Option>, /// String for application that wrote this file. This should be in the format - /// version (build ). + /// `` version `` (build ``). /// e.g. impala version 1.0 (build 6cf94d29b2b7115df4de2c06e2ab4326d721eb55) /// pub created_by: Option,
[arrow-rs] branch test-thrift-017 updated: fix doc err
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch test-thrift-017 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git The following commit(s) were added to refs/heads/test-thrift-017 by this push: new 5ecc0d0c8 fix doc err 5ecc0d0c8 is described below commit 5ecc0d0c87f283c46fb54c935244e6a57dce434d Author: Jiayu Liu AuthorDate: Mon Nov 14 14:27:18 2022 +0800 fix doc err --- parquet/src/format.rs | 100 +- 1 file changed, 50 insertions(+), 50 deletions(-) diff --git a/parquet/src/format.rs b/parquet/src/format.rs index 3d38dd531..2e57fa4f3 100644 --- a/parquet/src/format.rs +++ b/parquet/src/format.rs @@ -99,7 +99,7 @@ impl From<> for i32 { /// DEPRECATED: Common types used by frameworks(e.g. hive, pig) using parquet. /// ConvertedType is superseded by LogicalType. This enum should not be extended. -/// +/// /// See LogicalTypes.md for conversion between ConvertedType and LogicalType. #[derive(Copy, Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)] pub struct ConvertedType(pub i32); @@ -117,12 +117,12 @@ impl ConvertedType { /// an enum is converted into a binary field pub const ENUM: ConvertedType = ConvertedType(4); /// A decimal value. - /// + /// /// This may be used to annotate binary or fixed primitive types. The /// underlying byte array stores the unscaled value encoded as two's /// complement using big-endian byte order (the most significant byte is the /// zeroth element). The value of the decimal is the value * 10^{-scale}. - /// + /// /// This must be accompanied by a (maximum) precision and a scale in the /// SchemaElement. The precision specifies the number of digits in the decimal /// and the scale stores the location of the decimal point. For example 1.23 @@ -130,62 +130,62 @@ impl ConvertedType { /// 2 digits over). pub const DECIMAL: ConvertedType = ConvertedType(5); /// A Date - /// + /// /// Stored as days since Unix epoch, encoded as the INT32 physical type. 
- /// + /// pub const DATE: ConvertedType = ConvertedType(6); /// A time - /// + /// /// The total number of milliseconds since midnight. The value is stored /// as an INT32 physical type. pub const TIME_MILLIS: ConvertedType = ConvertedType(7); /// A time. - /// + /// /// The total number of microseconds since midnight. The value is stored as /// an INT64 physical type. pub const TIME_MICROS: ConvertedType = ConvertedType(8); /// A date/time combination - /// + /// /// Date and time recorded as milliseconds since the Unix epoch. Recorded as /// a physical type of INT64. pub const TIMESTAMP_MILLIS: ConvertedType = ConvertedType(9); /// A date/time combination - /// + /// /// Date and time recorded as microseconds since the Unix epoch. The value is /// stored as an INT64 physical type. pub const TIMESTAMP_MICROS: ConvertedType = ConvertedType(10); /// An unsigned integer value. - /// + /// /// The number describes the maximum number of meaningful data bits in /// the stored value. 8, 16 and 32 bit values are stored using the /// INT32 physical type. 64 bit values are stored using the INT64 /// physical type. - /// + /// pub const UINT_8: ConvertedType = ConvertedType(11); pub const UINT_16: ConvertedType = ConvertedType(12); pub const UINT_32: ConvertedType = ConvertedType(13); pub const UINT_64: ConvertedType = ConvertedType(14); /// A signed integer value. - /// + /// /// The number describes the maximum number of meaningful data bits in /// the stored value. 8, 16 and 32 bit values are stored using the /// INT32 physical type. 64 bit values are stored using the INT64 /// physical type. - /// + /// pub const INT_8: ConvertedType = ConvertedType(15); pub const INT_16: ConvertedType = ConvertedType(16); pub const INT_32: ConvertedType = ConvertedType(17); pub const INT_64: ConvertedType = ConvertedType(18); /// An embedded JSON document - /// + /// /// A JSON document embedded within a single UTF8 column. 
pub const JSON: ConvertedType = ConvertedType(19); /// An embedded BSON document - /// + /// /// A BSON document embedded within a single BINARY column. pub const BSON: ConvertedType = ConvertedType(20); /// An interval of time - /// + /// /// This type annotates data stored as a FIXED_LEN_BYTE_ARRAY of length 12 /// This data is composed of three separate little endian unsigned /// integers. Each stores a component of a duration of time. The first @@ -443,11 +443,11 @@ impl From<> for i32 { } /// Supported compression algorithms. -/// +/// /// Codecs added in format version X.Y can be read by readers based on X.Y and later. /// Codec support may vary between readers based on the format version and /// libraries available at runtime. -/// +
[arrow-rs] branch test-thrift-017 updated: fix async reader
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch test-thrift-017 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git The following commit(s) were added to refs/heads/test-thrift-017 by this push: new 2acdbd185 fix async reader 2acdbd185 is described below commit 2acdbd18573c3421d839ff60b05a33b770ed7d3c Author: Jiayu Liu AuthorDate: Mon Nov 14 11:12:51 2022 +0800 fix async reader --- parquet/src/arrow/async_reader.rs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/parquet/src/arrow/async_reader.rs b/parquet/src/arrow/async_reader.rs index d52fa0406..e182cccbc 100644 --- a/parquet/src/arrow/async_reader.rs +++ b/parquet/src/arrow/async_reader.rs @@ -89,7 +89,7 @@ use bytes::{Buf, Bytes}; use futures::future::{BoxFuture, FutureExt}; use futures::ready; use futures::stream::Stream; -use thrift::protocol::TCompactInputProtocol; +use thrift::protocol::{TCompactInputProtocol, TSerializable}; use tokio::io::{AsyncRead, AsyncReadExt, AsyncSeek, AsyncSeekExt};
[arrow-rs] branch test-thrift-017 created (now 0a4dca99e)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch test-thrift-017 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git at 0a4dca99e remove databend prefix No new revisions were added by this update.
[arrow-rs] branch add-bloom-filter-2 updated (c685f0c2c -> 2562f9770)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from c685f0c2c fix reading with chunk reader add 2562f9770 refactor No new revisions were added by this update. Summary of changes: parquet/src/bloom_filter/mod.rs | 63 +++-- 1 file changed, 36 insertions(+), 27 deletions(-)
[arrow-rs] branch add-bloom-filter-2 updated (d3d407b29 -> c685f0c2c)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from d3d407b29 add api add c685f0c2c fix reading with chunk reader No new revisions were added by this update. Summary of changes: parquet/src/bloom_filter/mod.rs | 59 +-- parquet/src/file/serialized_reader.rs | 5 +-- 2 files changed, 46 insertions(+), 18 deletions(-)
[arrow-rs] 02/02: add api
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git commit d3d407b293091bd71c04f865b0c7c896ac52d452 Author: Jiayu Liu AuthorDate: Sun Nov 13 13:24:10 2022 + add api --- parquet/src/file/reader.rs| 6 ++ parquet/src/file/serialized_reader.rs | 15 +++ 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/parquet/src/file/reader.rs b/parquet/src/file/reader.rs index 70ff37a41..325944c21 100644 --- a/parquet/src/file/reader.rs +++ b/parquet/src/file/reader.rs @@ -21,6 +21,8 @@ use bytes::Bytes; use std::{boxed::Box, io::Read, sync::Arc}; +#[cfg(feature = "bloom")] +use crate::bloom_filter::Sbbf; use crate::column::page::PageIterator; use crate::column::{page::PageReader, reader::ColumnReader}; use crate::errors::{ParquetError, Result}; @@ -143,6 +145,10 @@ pub trait RowGroupReader: Send + Sync { Ok(col_reader) } +#[cfg(feature = "bloom")] +/// Get bloom filter for the `i`th column chunk, if present. +fn get_column_bloom_filter(, i: usize) -> Result>; + /// Get iterator of `Row`s from this row group. 
/// /// Projected schema can be a subset of or equal to the file schema, when it is None, diff --git a/parquet/src/file/serialized_reader.rs b/parquet/src/file/serialized_reader.rs index a400d4dab..8cefe1c5e 100644 --- a/parquet/src/file/serialized_reader.rs +++ b/parquet/src/file/serialized_reader.rs @@ -22,11 +22,9 @@ use std::collections::VecDeque; use std::io::Cursor; use std::{convert::TryFrom, fs::File, io::Read, path::Path, sync::Arc}; -use crate::format::{PageHeader, PageLocation, PageType}; -use bytes::{Buf, Bytes}; -use thrift::protocol::TCompactInputProtocol; - use crate::basic::{Encoding, Type}; +#[cfg(feature = "bloom")] +use crate::bloom_filter::Sbbf; use crate::column::page::{Page, PageMetadata, PageReader}; use crate::compression::{create_codec, Codec}; use crate::errors::{ParquetError, Result}; @@ -38,10 +36,13 @@ use crate::file::{ reader::*, statistics, }; +use crate::format::{PageHeader, PageLocation, PageType}; use crate::record::reader::RowIter; use crate::record::Row; use crate::schema::types::Type as SchemaType; use crate::util::{io::TryClone, memory::ByteBufferPtr}; +use bytes::{Buf, Bytes}; +use thrift::protocol::TCompactInputProtocol; // export `SliceableCursor` and `FileSource` publically so clients can // re-use the logic in their own ParquetFileWriter wrappers pub use crate::util::io::FileSource; @@ -387,6 +388,12 @@ impl<'a, R: 'static + ChunkReader> RowGroupReader for SerializedRowGroupReader<'a, R> { )?)) } +#[cfg(feature = "bloom")] +/// get bloom filter for the ith column +fn get_column_bloom_filter(&self, i: usize) -> Result<Option<Sbbf>> { +todo!() +} + fn get_row_iter(&self, projection: Option<SchemaType>) -> Result<RowIter> { RowIter::from_row_group(projection, self) }
[arrow-rs] branch add-bloom-filter-2 created (now d3d407b29)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git at d3d407b29 add api This branch includes the following new commits: new 5e200d981 add feature flag new d3d407b29 add api The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[arrow-rs] 01/02: add feature flag
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch add-bloom-filter-2 in repository https://gitbox.apache.org/repos/asf/arrow-rs.git commit 5e200d9819669175f3ae2a3a3de384541fec9056 Author: Jiayu Liu AuthorDate: Sun Nov 13 13:13:05 2022 + add feature flag --- .github/workflows/arrow.yml | 2 -- parquet/README.md | 1 + 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/.github/workflows/arrow.yml b/.github/workflows/arrow.yml index 2e1c64ebe..3e62ed775 100644 --- a/.github/workflows/arrow.yml +++ b/.github/workflows/arrow.yml @@ -39,7 +39,6 @@ on: - .github/** jobs: - # test the crate linux-test: name: Test @@ -134,7 +133,6 @@ jobs: - name: Check compilation --features simd --all-targets run: cargo check -p arrow --features simd --all-targets - # test the arrow crate builds against wasm32 in nightly rust wasm32-build: name: Build wasm32 diff --git a/parquet/README.md b/parquet/README.md index d904fc64e..c9245b082 100644 --- a/parquet/README.md +++ b/parquet/README.md @@ -41,6 +41,7 @@ However, for historical reasons, this crate uses versions with major numbers gre The `parquet` crate provides the following features which may be enabled in your `Cargo.toml`: - `arrow` (default) - support for reading / writing [`arrow`](https://crates.io/crates/arrow) arrays to / from parquet +- `bloom` (default) - support for [split block bloom filter](https://github.com/apache/parquet-format/blob/master/BloomFilter.md) for reading from / writing to parquet - `async` - support `async` APIs for reading parquet - `json` - support for reading / writing `json` data to / from parquet - `brotli` (default) - support for parquet using `brotli` compression
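The README entry above documents `bloom` as a default feature, so downstream crates pick it up without opting in. A hypothetical downstream `Cargo.toml` (version number illustrative) showing both the default path and an opt-out:

```toml
# Relying on defaults, which include `bloom` once this lands:
[dependencies]
parquet = "27"

# Or, for a leaner build, disable defaults and re-add only what is needed:
# parquet = { version = "27", default-features = false, features = ["arrow", "snap"] }
```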
[arrow-rs] branch master updated: add bloom filter implementation based on split block (sbbf) spec (#3057)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow-rs.git The following commit(s) were added to refs/heads/master by this push: new b7af85cb8 add bloom filter implementation based on split block (sbbf) spec (#3057) b7af85cb8 is described below commit b7af85cb8dfe6887bb3fd43d1d76f659473b6927 Author: Jiayu Liu AuthorDate: Sun Nov 13 21:07:11 2022 +0800 add bloom filter implementation based on split block (sbbf) spec (#3057) * add bloom filter implementation based on split block spec * format and also revist index method * bloom filter reader * create new function to facilitate fixture test * fix clippy * Update parquet/src/bloom_filter/mod.rs Co-authored-by: Andrew Lamb * Update parquet/src/bloom_filter/mod.rs Co-authored-by: Andrew Lamb * Update parquet/src/bloom_filter/mod.rs Co-authored-by: Andrew Lamb * Update parquet/src/bloom_filter/mod.rs Co-authored-by: Andrew Lamb * Update parquet/src/bloom_filter/mod.rs * Update parquet/src/bloom_filter/mod.rs Co-authored-by: Liang-Chi Hsieh * fix clippy Co-authored-by: Andrew Lamb Co-authored-by: Liang-Chi Hsieh --- parquet/Cargo.toml | 5 +- parquet/src/bloom_filter/mod.rs | 217 parquet/src/lib.rs | 2 + 3 files changed, 223 insertions(+), 1 deletion(-) diff --git a/parquet/Cargo.toml b/parquet/Cargo.toml index b400b01a7..dda0518f9 100644 --- a/parquet/Cargo.toml +++ b/parquet/Cargo.toml @@ -57,6 +57,7 @@ seq-macro = { version = "0.3", default-features = false } futures = { version = "0.3", default-features = false, features = ["std"], optional = true } tokio = { version = "1.0", optional = true, default-features = false, features = ["macros", "rt", "io-util"] } hashbrown = { version = "0.13", default-features = false } +twox-hash = { version = "1.6", optional = true } [dev-dependencies] base64 = { version = "0.13", default-features = false, features = ["std"] } @@ -76,7 +77,7 @@ rand = { version = "0.8", 
default-features = false, features = ["std", "std_rng"] } all-features = true [features] -default = ["arrow", "snap", "brotli", "flate2", "lz4", "zstd", "base64"] +default = ["arrow", "bloom", "snap", "brotli", "flate2", "lz4", "zstd", "base64"] # Enable arrow reader/writer APIs arrow = ["base64", "arrow-array", "arrow-buffer", "arrow-cast", "arrow-data", "arrow-schema", "arrow-select", "arrow-ipc"] # Enable CLI tools @@ -89,6 +90,8 @@ test_common = ["arrow/test_utils"] experimental = [] # Enable async APIs async = ["futures", "tokio"] +# Bloomfilter +bloom = ["twox-hash"] [[test]] name = "arrow_writer_layout" diff --git a/parquet/src/bloom_filter/mod.rs b/parquet/src/bloom_filter/mod.rs new file mode 100644 index 0..770fb53e8 --- /dev/null +++ b/parquet/src/bloom_filter/mod.rs @@ -0,0 +1,217 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! Bloom filter implementation specific to Parquet, as described +//! 
in the [spec](https://github.com/apache/parquet-format/blob/master/BloomFilter.md) + +use crate::errors::ParquetError; +use crate::file::metadata::ColumnChunkMetaData; +use crate::format::{ +BloomFilterAlgorithm, BloomFilterCompression, BloomFilterHash, BloomFilterHeader, +}; +use std::hash::Hasher; +use std::io::{Read, Seek, SeekFrom}; +use thrift::protocol::TCompactInputProtocol; +use twox_hash::XxHash64; + +/// Salt as defined in the [spec](https://github.com/apache/parquet-format/blob/master/BloomFilter.md#technical-approach) +const SALT: [u32; 8] = [
[arrow-rs] branch add-bloom-filter updated (2f0e8bbaf -> c9208e79f)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from 2f0e8bbaf Update parquet/src/bloom_filter/mod.rs add f9e34f6f9 fix clippy add 522625814 Make RowSelection::intersection a member function (#3084) add 01396822e Remove unused range module (#3085) add 02a3f5cd2 Move CSV test data (#3044) (#3051) add 561f63a23 Improved UX of creating `TimestampNanosecondArray` with timezones (#3088) add 94565bca9 Update version to 27.0.0 and add changelog (#3089) add ccc44170a Fix clippy by avoiding deprecated functions in chrono (#3096) add aaf030f79 Fix prettyprint for Interval second fractions (#3093) add c7210ce2b Minor: Add diagrams and documentation to row format (#3094) add 3084ee258 Use ArrowNativeTypeOp instead of total_cmp directly (#3087) add c9208e79f Merge branch 'master' into add-bloom-filter No new revisions were added by this update. Summary of changes: CHANGELOG-old.md | 95 + CHANGELOG.md | 179 arrow-array/Cargo.toml | 8 +- arrow-array/src/array/primitive_array.rs | 24 +- arrow-array/src/delta.rs | 207 ++--- arrow-array/src/temporal_conversions.rs| 2 +- arrow-array/src/timezone.rs| 34 +- arrow-array/src/types.rs | 8 +- arrow-buffer/Cargo.toml| 2 +- arrow-cast/Cargo.toml | 12 +- arrow-cast/src/cast.rs | 16 +- arrow-cast/src/display.rs | 4 +- arrow-cast/src/parse.rs| 16 +- arrow-csv/Cargo.toml | 12 +- arrow-csv/src/reader.rs| 449 ++- {arrow => arrow-csv}/test/data/decimal_test.csv| 0 {arrow => arrow-csv}/test/data/null_test.csv | 0 {arrow => arrow-csv}/test/data/uk_cities.csv | 0 .../test/data/uk_cities_with_headers.csv | 0 {arrow => arrow-csv}/test/data/various_types.csv | 0 .../test/data/various_types_invalid.csv| 0 arrow-data/Cargo.toml | 6 +- arrow-flight/Cargo.toml| 10 +- arrow-flight/README.md | 2 +- arrow-integration-test/Cargo.toml | 6 +- arrow-integration-testing/Cargo.toml | 2 +- arrow-ipc/Cargo.toml | 12 +- 
arrow-json/Cargo.toml | 12 +- arrow-pyarrow-integration-testing/Cargo.toml | 4 +- arrow-schema/Cargo.toml| 2 +- arrow-select/Cargo.toml| 10 +- arrow/Cargo.toml | 22 +- arrow/README.md| 2 +- arrow/benches/cast_kernels.rs | 6 +- arrow/examples/read_csv.rs | 5 +- arrow/examples/read_csv_infer_schema.rs| 2 +- arrow/src/compute/kernels/arithmetic.rs| 24 +- arrow/src/compute/kernels/comparison.rs| 112 ++--- arrow/src/row/mod.rs | 191 +++-- arrow/src/util/pretty.rs | 84 arrow/tests/csv.rs | 422 -- dev/release/README.md | 2 +- dev/release/rat_exclude_files.txt | 1 + dev/release/update_change_log.sh | 4 +- parquet/Cargo.toml | 20 +- parquet/src/arrow/arrow_reader/mod.rs | 2 +- parquet/src/arrow/arrow_reader/selection.rs| 42 +- parquet/src/bloom_filter/mod.rs| 2 +- parquet/src/file/page_index/mod.rs | 3 - parquet/src/file/page_index/range.rs | 475 - parquet/src/record/api.rs | 21 +- parquet_derive/Cargo.toml | 4 +- parquet_derive/README.md | 4 +- parquet_derive_test/Cargo.toml | 6 +- 54 files changed, 1285 insertions(+), 1305 deletions(-) rename {arrow => arrow-csv}/test/data/decimal_test.csv (100%) rename {arrow => arrow-csv}/test/data/null_test.csv (100%) rename {arrow => arrow-csv}/test/data/uk_cities.csv (100%) rename {arrow => arrow-csv}/test/data/uk_cities_with_headers.csv (100%) rename {arrow => arrow-csv}/test/data/various_types.csv (100%) rename {arrow => arrow-csv}/test/data/various_types_invalid.csv (100%) delete mode 100644 parquet/src/file/page_index/range.rs
[arrow-rs] branch add-bloom-filter updated (b08f97c0d -> 2f0e8bbaf)
This is an automated email from the ASF dual-hosted git repository. jiayuliu pushed a change to branch add-bloom-filter in repository https://gitbox.apache.org/repos/asf/arrow-rs.git from b08f97c0d Update parquet/src/bloom_filter/mod.rs add 2f0e8bbaf Update parquet/src/bloom_filter/mod.rs No new revisions were added by this update. Summary of changes: parquet/src/bloom_filter/mod.rs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)