This is an automated email from the ASF dual-hosted git repository. alamb pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/asf-site by this push: new 201ebf0 add datafusion roadmap (#154) 201ebf0 is described below commit 201ebf0d7238c89c3749ee228bc8583008678970 Author: QP Hou <q...@scribd.com> AuthorDate: Wed Oct 20 03:36:09 2021 -0700 add datafusion roadmap (#154) --- datafusion/_modules/index.html | 7 + datafusion/_sources/cli/index.rst.txt | 6 +- datafusion/_sources/community/communication.md.txt | 17 ++ datafusion/_sources/index.rst.txt | 1 + datafusion/_sources/specification/roadmap.md.txt | 99 ++++++++ .../_sources/user-guide/example-usage.md.txt | 4 - datafusion/_sources/user-guide/library.md.txt | 5 +- datafusion/cli/index.html | 16 +- datafusion/community/communication.html | 24 ++ datafusion/genindex.html | 32 ++- datafusion/index.html | 8 + datafusion/objects.inv | Bin 1632 -> 1694 bytes datafusion/py-modindex.html | 5 + datafusion/python/api/dataframe.html | 10 + datafusion/python/api/execution_context.html | 10 + datafusion/python/api/expression.html | 10 + datafusion/python/api/functions.html | 10 + .../python/generated/datafusion.DataFrame.html | 10 + .../generated/datafusion.ExecutionContext.html | 10 + .../python/generated/datafusion.Expression.html | 10 + .../python/generated/datafusion.functions.html | 44 ++++ datafusion/search.html | 5 + datafusion/searchindex.js | 2 +- .../roadmap.html} | 250 ++++++++++++++------- datafusion/user-guide/example-usage.html | 14 +- datafusion/user-guide/library.html | 15 +- 26 files changed, 525 insertions(+), 99 deletions(-) diff --git a/datafusion/_modules/index.html b/datafusion/_modules/index.html index 7bcb0b0..0233d14 100644 --- a/datafusion/_modules/index.html +++ b/datafusion/_modules/index.html @@ -319,6 +319,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="../specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="../specification/invariants.html"> DataFusion’s 
Invariants </a> @@ -392,6 +397,8 @@ <h1>All modules for which code is available</h1> <ul><li><a href="builtins.html">builtins</a></li> +<li><a href="datafusion/functions.html">datafusion.functions</a></li> +<li><a href="functions.html">functions</a></li> </ul> </div> diff --git a/datafusion/_sources/cli/index.rst.txt b/datafusion/_sources/cli/index.rst.txt index 93ae173..2b91430 100644 --- a/datafusion/_sources/cli/index.rst.txt +++ b/datafusion/_sources/cli/index.rst.txt @@ -53,7 +53,7 @@ Usage .. code-block:: bash - DataFusion 5.0.0-SNAPSHOT + DataFusion 5.1.0-SNAPSHOT DataFusion is an in-memory query engine that uses Apache Arrow as the memory model. It supports executing SQL queries against CSV and Parquet files as well as querying directly against in-memory data. @@ -68,8 +68,10 @@ Usage OPTIONS: -c, --batch-size <batch-size> The batch size of each query, or use DataFusion default -p, --data-path <data-path> Path to your data, default to current directory - -f, --file <file> Execute commands from file, then exit + -f, --file <file>... Execute commands from file(s), then exit --format <format> Output format [default: table] [possible values: csv, tsv, table, json, ndjson] + --host <host> Ballista scheduler host + --port <port> Ballista scheduler port Type `exit` or `quit` to exit the CLI. diff --git a/datafusion/_sources/community/communication.md.txt b/datafusion/_sources/community/communication.md.txt index bbf07a1..7d8e58a 100644 --- a/datafusion/_sources/community/communication.md.txt +++ b/datafusion/_sources/community/communication.md.txt @@ -52,6 +52,23 @@ server ([invite link](https://discord.gg/Qw5gKqHxUM)) in case you are not able to join the Slack workspace. If you need an invite to the Slack workspace, you can also ask for one in our Discord server. 
+### Sync up Zoom calls + +We have biweekly sync calls every other Thursday at 16:00 UTC +(starting September 30, 2021) on Zoom [Meeting Link](https://influxdata.zoom.us/j/94666921249) + +The [agenda](https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit) +is available if you would like to add a topic for discussion or see what is planned. + +The goals of these calls are: + +1. Help "put a face to the name" of some of the other contributors we are working with +2. Discuss / synchronize on the goals and major initiatives from different stakeholders to identify areas where more alignment is needed + +No decisions are made on the call and anything of substance will be discussed on this mailing list or in GitHub issues / Google Docs. + +We will send a summary of all sync ups to the d...@arrow.apache.org mailing list. + ## Contributing Our source code is hosted on diff --git a/datafusion/_sources/index.rst.txt b/datafusion/_sources/index.rst.txt index 6956d0b..bf6b250 100644 --- a/datafusion/_sources/index.rst.txt +++ b/datafusion/_sources/index.rst.txt @@ -52,6 +52,7 @@ Table of content :maxdepth: 1 :caption: Specification + specification/roadmap specification/invariants specification/output-field-name-semantic diff --git a/datafusion/_sources/specification/roadmap.md.txt b/datafusion/_sources/specification/roadmap.md.txt new file mode 100644 index 0000000..520815b --- /dev/null +++ b/datafusion/_sources/specification/roadmap.md.txt @@ -0,0 +1,99 @@ +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. 
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Roadmap + +This document describes high level goals of the DataFusion and +Ballista development community. It is not meant to restrict +possibilities, but rather help newcomers understand the broader +context of where the community is headed, and inspire +additional contributions. + +DataFusion and Ballista are part of the [Apache +Arrow](https://arrow.apache.org/) project and governed by the Apache +Software Foundation governance model. These projects are entirely +driven by volunteers, and we welcome contributions for items not on +this roadmap. However, before submitting a large PR, we strongly +suggest you start a conversation using a GitHub issue or the +d...@arrow.apache.org mailing list to make review efficient and avoid +surprises. + +# DataFusion + +DataFusion's goal is to become the embedded query engine of choice +for new analytic applications, by leveraging the unique features of +[Rust](https://www.rust-lang.org/) and [Apache Arrow](https://arrow.apache.org/) +to provide: + +1. Best-in-class single node query performance +2. A Declarative SQL query interface compatible with PostgreSQL +3. A Dataframe API, similar to those offered by Pandas and Spark +4. A Procedural API for programmatically creating and running execution plans +5. 
High performance, data race free, ergonomic extensibility points at every layer + +## Additional SQL Language Features + +- Complete support list on [status](https://github.com/apache/arrow-datafusion/blob/master/README.md#status) +- Timestamp Arithmetic [#194](https://github.com/apache/arrow-datafusion/issues/194) +- SQL Parser extension point [#533](https://github.com/apache/arrow-datafusion/issues/533) +- Support for nested structures (fields, lists, structs) [#119](https://github.com/apache/arrow-datafusion/issues/119) +- Remaining Set Operators (`INTERSECT` / `EXCEPT`) [#1082](https://github.com/apache/arrow-datafusion/issues/1082) +- Run all queries from the TPCH benchmark (see [milestone](https://github.com/apache/arrow-datafusion/milestone/2) for more details) + +## Query Optimizer + +- Additional constant folding / partial evaluation [#1070](https://github.com/apache/arrow-datafusion/issues/1070) +- More sophisticated cost based optimizer for join ordering +- Implement advanced query optimization framework (Tokomak) #440 + +## Datasources + +- Better support for reading data from remote filesystems (e.g. S3) without caching it locally [#907](https://github.com/apache/arrow-datafusion/issues/907) [#1060](https://github.com/apache/arrow-datafusion/issues/1060) +- Support for partitioned datasources [#1139](https://github.com/apache/arrow-datafusion/issues/1139) and make the integration of other table formats (Delta, Iceberg...) simpler +- Improve performance of file format datasources (parallelize file listings, async Arrow readers, file chunk prefetching capability...) 
+ +## Runtime / Infrastructure + +- Migrate to some sort of arrow2 based implementation (see [milestone](https://github.com/apache/arrow-datafusion/milestone/3) for more details) +- Add DataFusion to h2oai/db-benchmark [#147](https://github.com/apache/arrow-datafusion/issues/147) +- Improve build time [#348](https://github.com/apache/arrow-datafusion/issues/348) + +## Resource Management + +- Finer grain control and limit of runtime memory [#587](https://github.com/apache/arrow-datafusion/issues/587) and CPU usage [#54](https://github.com/apache/arrow-datafusion/issues/64) + +## Python Interface + +TBD + +## DataFusion CLI (`datafusion-cli`) + +Note: There are some additional thoughts on a datafusion-cli vision on [#1096](https://github.com/apache/arrow-datafusion/issues/1096#issuecomment-939418770). + +- Better abstraction between REPL parsing and queries so that commands are separated and handled correctly +- Connect to the `Statistics` subsystem and have the cli print out more stats for query debugging, etc. +- Improved error handling for interactive use and shell scripting usage +- publishing to apt, brew, and possibly NuGet registry so that people can use it more easily +- adopt a shorter name, like dfcli? 
+ +## Ballista + +# Vision + +TBD diff --git a/datafusion/_sources/user-guide/example-usage.md.txt b/datafusion/_sources/user-guide/example-usage.md.txt index 4280079..c09e1e8 100644 --- a/datafusion/_sources/user-guide/example-usage.md.txt +++ b/datafusion/_sources/user-guide/example-usage.md.txt @@ -23,8 +23,6 @@ Run a SQL query against data stored in a CSV: ```rust use datafusion::prelude::*; -use arrow::util::pretty::print_batches; -use arrow::record_batch::RecordBatch; #[tokio::main] async fn main() -> datafusion::error::Result<()> { @@ -45,8 +43,6 @@ Use the DataFrame API to process data stored in a CSV: ```rust use datafusion::prelude::*; -use arrow::util::pretty::print_batches; -use arrow::record_batch::RecordBatch; #[tokio::main] async fn main() -> datafusion::error::Result<()> { diff --git a/datafusion/_sources/user-guide/library.md.txt b/datafusion/_sources/user-guide/library.md.txt index bfaf741..f4c5083 100644 --- a/datafusion/_sources/user-guide/library.md.txt +++ b/datafusion/_sources/user-guide/library.md.txt @@ -38,9 +38,8 @@ worth noting that using the settings in the `[profile.release]` section will sig ```toml [dependencies] datafusion = { version = "5.0" , features = ["simd"]} -tokio = { version = "^1.0", features = ["macros", "rt", "rt-multi-thread"] } -snmalloc-rs = {version = "0.2", features= ["cache-friendly"]} -num_cpus = "1.0" +tokio = { version = "^1.0", features = ["rt-multi-thread"] } +snmalloc-rs = "0.2" [profile.release] lto = true diff --git a/datafusion/cli/index.html b/datafusion/cli/index.html index 07aed1b..0715064 100644 --- a/datafusion/cli/index.html +++ b/datafusion/cli/index.html @@ -321,6 +321,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="../specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="../specification/invariants.html"> DataFusion’s Invariants </a> @@ -364,6 +369,11 @@ Issue tracker </a> </li> + <li 
class="toctree-l1"> + <a class="reference external" href="https://github.com/apache/arrow-datafusion/blob/master/CODE_OF_CONDUCT.md"> + Code of conduct + </a> + </li> </ul> @@ -464,7 +474,7 @@ docker run -it -v <span class="k">$(</span>your_data_location<span class="k">)</ </div> <div class="section" id="usage"> <h2>Usage<a class="headerlink" href="#usage" title="Permalink to this headline">¶</a></h2> -<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>DataFusion <span class="m">5</span>.0.0-SNAPSHOT +<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>DataFusion <span class="m">5</span>.1.0-SNAPSHOT DataFusion is an <span class="k">in</span>-memory query engine that uses Apache Arrow as the memory model. It supports executing SQL queries against CSV and Parquet files as well as querying directly against <span class="k">in</span>-memory data. @@ -479,8 +489,10 @@ FLAGS: OPTIONS: -c, --batch-size <batch-size> The batch size of each query, or use DataFusion default -p, --data-path <data-path> Path to your data, default to current directory - -f, --file <file> Execute commands from file, <span class="k">then</span> <span class="nb">exit</span> + -f, --file <file>... 
Execute commands from file<span class="o">(</span>s<span class="o">)</span>, <span class="k">then</span> <span class="nb">exit</span> --format <format> Output format <span class="o">[</span>default: table<span class="o">]</span> <span class="o">[</span>possible values: csv, tsv, table, json, ndjson<span class="o">]</span> + --host <host> Ballista scheduler host + --port <port> Ballista scheduler port </pre></div> </div> <p>Type <cite>exit</cite> or <cite>quit</cite> to exit the CLI.</p> diff --git a/datafusion/community/communication.html b/datafusion/community/communication.html index 5b23a71..f06b978 100644 --- a/datafusion/community/communication.html +++ b/datafusion/community/communication.html @@ -320,6 +320,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="../specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="../specification/invariants.html"> DataFusion’s Invariants </a> @@ -404,6 +409,11 @@ Slack and Discord </a> </li> + <li class="toc-h3 nav-item toc-entry"> + <a class="reference internal nav-link" href="#sync-up-zoom-calls"> + Sync up Zoom calls + </a> + </li> </ul> </li> <li class="toc-h2 nav-item toc-entry"> @@ -488,6 +498,20 @@ server (<a class="reference external" href="https://discord.gg/Qw5gKqHxUM">invit to join the Slack workspace. 
If you need an invite to the Slack workspace, you can also ask for one in our Discord server.</p> </div> +<div class="section" id="sync-up-zoom-calls"> +<h3>Sync up Zoom calls<a class="headerlink" href="#sync-up-zoom-calls" title="Permalink to this headline">¶</a></h3> +<p>We have biweekly sync calls every other Thursday at 16:00 UTC +(starting September 30, 2021) on Zoom <a class="reference external" href="https://influxdata.zoom.us/j/94666921249">Meeting Link</a></p> +<p>The <a class="reference external" href="https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit">agenda</a> +is available if you would like to add a topic for discussion or see what is planned.</p> +<p>The goals of these calls are:</p> +<ol class="simple"> +<li><p>Help “put a face to the name” of some of the other contributors we are working with</p></li> +<li><p>Discuss / synchronize on the goals and major initiatives from different stakeholders to identify areas where more alignment is needed</p></li> +</ol> +<p>No decisions are made on the call and anything of substance will be discussed on this mailing list or in GitHub issues / Google Docs.</p> +<p>We will send a summary of all sync ups to the dev@arrow.apache.org mailing list.</p> +</div> </div> <div class="section" id="contributing"> <h2>Contributing<a class="headerlink" href="#contributing" title="Permalink to this headline">¶</a></h2> diff --git a/datafusion/genindex.html b/datafusion/genindex.html index e1f44b7..fe789e5 100644 --- a/datafusion/genindex.html +++ b/datafusion/genindex.html @@ -320,6 +320,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="specification/invariants.html"> DataFusion’s Invariants </a> @@ -412,6 +417,7 @@ | <a href="#S"><strong>S</strong></a> | <a href="#T"><strong>T</strong></a> | <a 
href="#U"><strong>U</strong></a> + | 
<a href="#V"><strong>V</strong></a> </div> <h2 id="_">_</h2> @@ -439,6 +445,8 @@ </li> <li><a href="python/generated/datafusion.Expression.html#datafusion.Expression.alias">alias() (datafusion.Expression method)</a> </li> + <li><a href="python/generated/datafusion.functions.html#datafusion.functions.approx_distinct">approx_distinct() (in module datafusion.functions)</a> +</li> </ul></td> <td style="width: 33%; vertical-align: top;"><ul> <li><a href="python/generated/datafusion.functions.html#datafusion.functions.array">array() (in module datafusion.functions)</a> @@ -503,6 +511,8 @@ <td style="width: 33%; vertical-align: top;"><ul> <li><a href="python/generated/datafusion.functions.html#module-datafusion.functions">datafusion.functions (module)</a> </li> + <li><a href="python/generated/datafusion.functions.html#datafusion.functions.digest">digest() (in module datafusion.functions)</a> +</li> </ul></td> </tr></table> @@ -535,10 +545,12 @@ <h2 id="I">I</h2> <table style="width: 100%" class="indextable genindextable"><tr> <td style="width: 33%; vertical-align: top;"><ul> - <li><a href="python/generated/datafusion.functions.html#datafusion.functions.in_list">in_list() (in module datafusion.functions)</a> + <li><a href="python/generated/datafusion.functions.html#datafusion.functions.Volatility.immutable">immutable() (datafusion.functions.Volatility static method)</a> </li> </ul></td> <td style="width: 33%; vertical-align: top;"><ul> + <li><a href="python/generated/datafusion.functions.html#datafusion.functions.in_list">in_list() (in module datafusion.functions)</a> +</li> <li><a href="python/generated/datafusion.functions.html#datafusion.functions.initcap">initcap() (in module datafusion.functions)</a> </li> </ul></td> @@ -661,20 +673,22 @@ </li> <li><a href="python/generated/datafusion.functions.html#datafusion.functions.sin">sin() (in module datafusion.functions)</a> </li> - </ul></td> - <td style="width: 33%; vertical-align: top;"><ul> <li><a 
href="python/generated/datafusion.DataFrame.html#datafusion.DataFrame.sort">sort() (datafusion.DataFrame method)</a> <ul> <li><a href="python/generated/datafusion.Expression.html#datafusion.Expression.sort">(datafusion.Expression method)</a> </li> </ul></li> + </ul></td> + <td style="width: 33%; vertical-align: top;"><ul> <li><a href="python/generated/datafusion.functions.html#datafusion.functions.split_part">split_part() (in module datafusion.functions)</a> </li> <li><a href="python/generated/datafusion.ExecutionContext.html#datafusion.ExecutionContext.sql">sql() (datafusion.ExecutionContext method)</a> </li> <li><a href="python/generated/datafusion.functions.html#datafusion.functions.sqrt">sqrt() (in module datafusion.functions)</a> </li> + <li><a href="python/generated/datafusion.functions.html#datafusion.functions.Volatility.stable">stable() (datafusion.functions.Volatility static method)</a> +</li> <li><a href="python/generated/datafusion.functions.html#datafusion.functions.starts_with">starts_with() (in module datafusion.functions)</a> </li> <li><a href="python/generated/datafusion.functions.html#datafusion.functions.strpos">strpos() (in module datafusion.functions)</a> @@ -720,6 +734,18 @@ </ul></td> </tr></table> +<h2 id="V">V</h2> +<table style="width: 100%" class="indextable genindextable"><tr> + <td style="width: 33%; vertical-align: top;"><ul> + <li><a href="python/generated/datafusion.functions.html#datafusion.functions.Volatility.volatile">volatile() (datafusion.functions.Volatility static method)</a> +</li> + </ul></td> + <td style="width: 33%; vertical-align: top;"><ul> + <li><a href="python/generated/datafusion.functions.html#datafusion.functions.Volatility">Volatility (class in datafusion.functions)</a> +</li> + </ul></td> +</tr></table> + </div> diff --git a/datafusion/index.html b/datafusion/index.html index 87c1527..8a69e67 100644 --- a/datafusion/index.html +++ b/datafusion/index.html @@ -320,6 +320,11 @@ </p> <ul class="nav bd-sidenav"> <li 
class="toctree-l1"> + <a class="reference internal" href="specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="specification/invariants.html"> DataFusion’s Invariants </a> @@ -451,6 +456,9 @@ <div class="toctree-wrapper compound" id="toc-specs"> <p class="caption"><span class="caption-text">Specification</span><a class="headerlink" href="#toc-specs" title="Permalink to this toctree">¶</a></p> <ul> +<li class="toctree-l1"><a class="reference internal" href="specification/roadmap.html">Roadmap</a></li> +<li class="toctree-l1"><a class="reference internal" href="specification/roadmap.html#datafusion">DataFusion</a></li> +<li class="toctree-l1"><a class="reference internal" href="specification/roadmap.html#vision">Vision</a></li> <li class="toctree-l1"><a class="reference internal" href="specification/invariants.html">DataFusion’s Invariants</a></li> <li class="toctree-l1"><a class="reference internal" href="specification/output-field-name-semantic.html">Datafusion output field name semantic</a></li> </ul> diff --git a/datafusion/objects.inv b/datafusion/objects.inv index 75f7edc..630ef29 100644 Binary files a/datafusion/objects.inv and b/datafusion/objects.inv differ diff --git a/datafusion/py-modindex.html b/datafusion/py-modindex.html index d5bd232..ebc4985 100644 --- a/datafusion/py-modindex.html +++ b/datafusion/py-modindex.html @@ -322,6 +322,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="specification/invariants.html"> DataFusion’s Invariants </a> diff --git a/datafusion/python/api/dataframe.html b/datafusion/python/api/dataframe.html index 651ddaf..9965b9a 100644 --- a/datafusion/python/api/dataframe.html +++ b/datafusion/python/api/dataframe.html @@ -273,6 +273,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a 
class="reference internal" href="../../specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="../../specification/invariants.html"> DataFusion’s Invariants </a> @@ -316,6 +321,11 @@ Issue tracker </a> </li> + <li class="toctree-l1"> + <a class="reference external" href="https://github.com/apache/arrow-datafusion/blob/master/CODE_OF_CONDUCT.md"> + Code of conduct + </a> + </li> </ul> diff --git a/datafusion/python/api/execution_context.html b/datafusion/python/api/execution_context.html index 95f538c..b0580c2 100644 --- a/datafusion/python/api/execution_context.html +++ b/datafusion/python/api/execution_context.html @@ -273,6 +273,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="../../specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="../../specification/invariants.html"> DataFusion’s Invariants </a> @@ -316,6 +321,11 @@ Issue tracker </a> </li> + <li class="toctree-l1"> + <a class="reference external" href="https://github.com/apache/arrow-datafusion/blob/master/CODE_OF_CONDUCT.md"> + Code of conduct + </a> + </li> </ul> diff --git a/datafusion/python/api/expression.html b/datafusion/python/api/expression.html index c1ef4b0..1e9ab9a 100644 --- a/datafusion/python/api/expression.html +++ b/datafusion/python/api/expression.html @@ -273,6 +273,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="../../specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="../../specification/invariants.html"> DataFusion’s Invariants </a> @@ -316,6 +321,11 @@ Issue tracker </a> </li> + <li class="toctree-l1"> + <a class="reference external" href="https://github.com/apache/arrow-datafusion/blob/master/CODE_OF_CONDUCT.md"> + Code of conduct + </a> + </li> </ul> diff --git 
a/datafusion/python/api/functions.html b/datafusion/python/api/functions.html index 342d708..f771b80 100644 --- a/datafusion/python/api/functions.html +++ b/datafusion/python/api/functions.html @@ -273,6 +273,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="../../specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="../../specification/invariants.html"> DataFusion’s Invariants </a> @@ -316,6 +321,11 @@ Issue tracker </a> </li> + <li class="toctree-l1"> + <a class="reference external" href="https://github.com/apache/arrow-datafusion/blob/master/CODE_OF_CONDUCT.md"> + Code of conduct + </a> + </li> </ul> diff --git a/datafusion/python/generated/datafusion.DataFrame.html b/datafusion/python/generated/datafusion.DataFrame.html index bd238c2..e03283a 100644 --- a/datafusion/python/generated/datafusion.DataFrame.html +++ b/datafusion/python/generated/datafusion.DataFrame.html @@ -273,6 +273,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="../../specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="../../specification/invariants.html"> DataFusion’s Invariants </a> @@ -316,6 +321,11 @@ Issue tracker </a> </li> + <li class="toctree-l1"> + <a class="reference external" href="https://github.com/apache/arrow-datafusion/blob/master/CODE_OF_CONDUCT.md"> + Code of conduct + </a> + </li> </ul> diff --git a/datafusion/python/generated/datafusion.ExecutionContext.html b/datafusion/python/generated/datafusion.ExecutionContext.html index 547bdb4..0b4078c 100644 --- a/datafusion/python/generated/datafusion.ExecutionContext.html +++ b/datafusion/python/generated/datafusion.ExecutionContext.html @@ -273,6 +273,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="../../specification/roadmap.html"> + Roadmap + </a> + 
</li> + <li class="toctree-l1"> <a class="reference internal" href="../../specification/invariants.html"> DataFusion’s Invariants </a> @@ -316,6 +321,11 @@ Issue tracker </a> </li> + <li class="toctree-l1"> + <a class="reference external" href="https://github.com/apache/arrow-datafusion/blob/master/CODE_OF_CONDUCT.md"> + Code of conduct + </a> + </li> </ul> diff --git a/datafusion/python/generated/datafusion.Expression.html b/datafusion/python/generated/datafusion.Expression.html index b2cb1db..1809823 100644 --- a/datafusion/python/generated/datafusion.Expression.html +++ b/datafusion/python/generated/datafusion.Expression.html @@ -273,6 +273,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="../../specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="../../specification/invariants.html"> DataFusion’s Invariants </a> @@ -316,6 +321,11 @@ Issue tracker </a> </li> + <li class="toctree-l1"> + <a class="reference external" href="https://github.com/apache/arrow-datafusion/blob/master/CODE_OF_CONDUCT.md"> + Code of conduct + </a> + </li> </ul> diff --git a/datafusion/python/generated/datafusion.functions.html b/datafusion/python/generated/datafusion.functions.html index b9db979..6229760 100644 --- a/datafusion/python/generated/datafusion.functions.html +++ b/datafusion/python/generated/datafusion.functions.html @@ -273,6 +273,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="../../specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="../../specification/invariants.html"> DataFusion’s Invariants </a> @@ -316,6 +321,11 @@ Issue tracker </a> </li> + <li class="toctree-l1"> + <a class="reference external" href="https://github.com/apache/arrow-datafusion/blob/master/CODE_OF_CONDUCT.md"> + Code of conduct + </a> + </li> </ul> @@ -560,6 +570,27 @@ </tr> 
</tbody> </table> +<dl class="class"> +<dt id="datafusion.functions.Volatility"> +<em class="property">class </em><code class="sig-prename descclassname">datafusion.functions.</code><code class="sig-name descname">Volatility</code><a class="headerlink" href="#datafusion.functions.Volatility" title="Permalink to this definition">¶</a></dt> +<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p> +<dl class="method"> +<dt id="datafusion.functions.Volatility.immutable"> +<em class="property">static </em><code class="sig-name descname">immutable</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#datafusion.functions.Volatility.immutable" title="Permalink to this definition">¶</a></dt> +<dd></dd></dl> + +<dl class="method"> +<dt id="datafusion.functions.Volatility.stable"> +<em class="property">static </em><code class="sig-name descname">stable</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#datafusion.functions.Volatility.stable" title="Permalink to this definition">¶</a></dt> +<dd></dd></dl> + +<dl class="method"> +<dt id="datafusion.functions.Volatility.volatile"> +<em class="property">static </em><code class="sig-name descname">volatile</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#datafusion.functions.Volatility.volatile" title="Permalink to this definition">¶</a></dt> +<dd></dd></dl> + +</dd></dl> + <dl class="function"> <dt id="datafusion.functions.abs"> <code class="sig-prename descclassname">datafusion.functions.</code><code class="sig-name descname">abs</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#datafusion.functions.abs" title="Permalink to this definition">¶</a></dt> @@ -571,6 +602,12 @@ <dd></dd></dl> <dl class="function"> +<dt id="datafusion.functions.approx_distinct"> +<code 
class="sig-prename descclassname">datafusion.functions.</code><code class="sig-name descname">approx_distinct</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#datafusion.functions.approx_distinct" title="Permalink to this definition">¶</a></dt> +<dd><p>This function is not documented yet</p> +</dd></dl> + +<dl class="function"> <dt id="datafusion.functions.array"> <code class="sig-prename descclassname">datafusion.functions.</code><code class="sig-name descname">array</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#datafusion.functions.array" title="Permalink to this definition">¶</a></dt> <dd></dd></dl> @@ -737,6 +774,13 @@ NULL arguments are ignored.</p> </dd></dl> <dl class="function"> +<dt id="datafusion.functions.digest"> +<code class="sig-prename descclassname">datafusion.functions.</code><code class="sig-name descname">digest</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#datafusion.functions.digest" title="Permalink to this definition">¶</a></dt> +<dd><p>Computes a binary hash of the given data. type is the algorithm to use. 
+Standard algorithms are md5, sha224, sha256, sha384, sha512, blake2s, blake2b, and blake3.</p> +</dd></dl> + +<dl class="function"> <dt id="datafusion.functions.min"> <code class="sig-prename descclassname">datafusion.functions.</code><code class="sig-name descname">min</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#datafusion.functions.min" title="Permalink to this definition">¶</a></dt> <dd><p>This function is not documented yet</p> diff --git a/datafusion/search.html b/datafusion/search.html index 497f011..ff49897 100644 --- a/datafusion/search.html +++ b/datafusion/search.html @@ -324,6 +324,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="specification/invariants.html"> DataFusion’s Invariants </a> diff --git a/datafusion/searchindex.js b/datafusion/searchindex.js index 6ed70da..58a3f9c 100644 --- a/datafusion/searchindex.js +++ b/datafusion/searchindex.js @@ -1 +1 @@ -Search.setIndex({docnames:["cli/index","community/communication","index","python/api","python/api/dataframe","python/api/execution_context","python/api/expression","python/api/functions","python/generated/datafusion.DataFrame","python/generated/datafusion.ExecutionContext","python/generated/datafusion.Expression","python/generated/datafusion.functions","python/index","specification/invariants","specification/output-field-name-semantic","user-guide/cli","user-guide/distributed/clients/ind [...] 
\ No newline at end of file +Search.setIndex({docnames:["cli/index","community/communication","index","python/api","python/api/dataframe","python/api/execution_context","python/api/expression","python/api/functions","python/generated/datafusion.DataFrame","python/generated/datafusion.ExecutionContext","python/generated/datafusion.Expression","python/generated/datafusion.functions","python/index","specification/invariants","specification/output-field-name-semantic","specification/roadmap","user-guide/cli","user-guide [...] \ No newline at end of file diff --git a/datafusion/community/communication.html b/datafusion/specification/roadmap.html similarity index 56% copy from datafusion/community/communication.html copy to datafusion/specification/roadmap.html index 5b23a71..a8178ef 100644 --- a/datafusion/community/communication.html +++ b/datafusion/specification/roadmap.html @@ -4,7 +4,7 @@ <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8" /> - <title>Communication — Arrow Datafusion documentation</title> + <title>Roadmap — Arrow Datafusion documentation</title> <link href="../_static/css/theme.css" rel="stylesheet" /> <link href="../_static/css/index.c5995385ac14fb8791e8eb36b4908be2.css" rel="stylesheet" /> @@ -34,7 +34,8 @@ <script src="../_static/language_data.js"></script> <link rel="index" title="Index" href="../genindex.html" /> <link rel="search" title="Search" href="../search.html" /> - <link rel="prev" title="Datafusion output field name semantic" href="../specification/output-field-name-semantic.html" /> + <link rel="next" title="DataFusion’s Invariants" href="invariants.html" /> + <link rel="prev" title="Frequently Asked Questions" href="../user-guide/faq.html" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <meta name="docsearch:language" content="en" /> @@ -318,14 +319,19 @@ Specification </span> </p> -<ul class="nav bd-sidenav"> +<ul class="current nav bd-sidenav"> + <li class="toctree-l1 current active"> 
+ <a class="current reference internal" href="#"> + Roadmap + </a> + </li> <li class="toctree-l1"> - <a class="reference internal" href="../specification/invariants.html"> + <a class="reference internal" href="invariants.html"> DataFusion’s Invariants </a> </li> <li class="toctree-l1"> - <a class="reference internal" href="../specification/output-field-name-semantic.html"> + <a class="reference internal" href="output-field-name-semantic.html"> Datafusion output field name semantic </a> </li> @@ -352,9 +358,9 @@ Community </span> </p> -<ul class="current nav bd-sidenav"> - <li class="toctree-l1 current active"> - <a class="current reference internal" href="#"> +<ul class="nav bd-sidenav"> + <li class="toctree-l1"> + <a class="reference internal" href="../community/communication.html"> Communication </a> </li> @@ -389,26 +395,67 @@ <nav id="bd-toc-nav"> <ul class="visible nav section-nav flex-column"> - <li class="toc-h2 nav-item toc-entry"> - <a class="reference internal nav-link" href="#questions"> - Questions? 
+ <li class="toc-h1 nav-item toc-entry"> + <a class="reference internal nav-link" href="#"> + Roadmap + </a> + </li> + <li class="toc-h1 nav-item toc-entry"> + <a class="reference internal nav-link" href="#datafusion"> + DataFusion </a> - <ul class="nav section-nav flex-column"> - <li class="toc-h3 nav-item toc-entry"> - <a class="reference internal nav-link" href="#mailing-list"> - Mailing list + <ul class="visible nav section-nav flex-column"> + <li class="toc-h2 nav-item toc-entry"> + <a class="reference internal nav-link" href="#additional-sql-language-features"> + Additional SQL Language Features + </a> + </li> + <li class="toc-h2 nav-item toc-entry"> + <a class="reference internal nav-link" href="#query-optimizer"> + Query Optimizer + </a> + </li> + <li class="toc-h2 nav-item toc-entry"> + <a class="reference internal nav-link" href="#datasources"> + Datasources + </a> + </li> + <li class="toc-h2 nav-item toc-entry"> + <a class="reference internal nav-link" href="#runtime-infrastructure"> + Runtime / Infrastructure </a> </li> - <li class="toc-h3 nav-item toc-entry"> - <a class="reference internal nav-link" href="#slack-and-discord"> - Slack and Discord + <li class="toc-h2 nav-item toc-entry"> + <a class="reference internal nav-link" href="#resource-management"> + Resource Management + </a> + </li> + <li class="toc-h2 nav-item toc-entry"> + <a class="reference internal nav-link" href="#python-interface"> + Python Interface + </a> + </li> + <li class="toc-h2 nav-item toc-entry"> + <a class="reference internal nav-link" href="#datafusion-cli-datafusion-cli"> + DataFusion CLI ( + <code class="docutils literal notranslate"> + <span class="pre"> + datafusion-cli + </span> + </code> + ) + </a> + </li> + <li class="toc-h2 nav-item toc-entry"> + <a class="reference internal nav-link" href="#ballista"> + Ballista </a> </li> </ul> </li> - <li class="toc-h2 nav-item toc-entry"> - <a class="reference internal nav-link" href="#contributing"> - Contributing + <li 
class="toc-h1 nav-item toc-entry"> + <a class="reference internal nav-link" href="#vision"> + Vision </a> </li> </ul> @@ -420,7 +467,7 @@ <div class="tocsection editthispage"> - <a href="https://github.com/apache/arrow-datafusion/edit/master/docs/source/community/communication.md"> + <a href="https://github.com/apache/arrow-datafusion/edit/master/docs/source/specification/roadmap.md"> <i class="fas fa-pencil-alt"></i> Edit this page </a> </div> @@ -439,68 +486,116 @@ <div> - <!--- - Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at + <!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. 
--> -<div class="section" id="communication"> -<h1>Communication<a class="headerlink" href="#communication" title="Permalink to this headline">¶</a></h1> -<p>We welcome participation from everyone and encourage you to join us, ask -questions, and get involved.</p> -<p>All participation in the Apache Arrow DataFusion project is governed by the -Apache Software Foundation’s <a class="reference external" href="https://www.apache.org/foundation/policies/conduct.html">code of -conduct</a>.</p> -<div class="section" id="questions"> -<h2>Questions?<a class="headerlink" href="#questions" title="Permalink to this headline">¶</a></h2> -<div class="section" id="mailing-list"> -<h3>Mailing list<a class="headerlink" href="#mailing-list" title="Permalink to this headline">¶</a></h3> -<p>We use arrow.apache.org’s <code class="docutils literal notranslate"><span class="pre">dev@</span></code> mailing list for project management, release -coorindation and design discussions -(<a class="reference external" href="mailto:dev-subscribe%40arrow.apache.org">subscribe</a>, -<a class="reference external" href="mailto:dev-unsubscribe%40arrow.apache.org">unsubscribe</a>, -<a class="reference external" href="https://lists.apache.org/list.html?dev@arrow.apache.org">archives</a>).</p> -<p>When emailing the dev list, please make sure to prefix the subject line with a -<code class="docutils literal notranslate"><span class="pre">[DataFusion]</span></code> tag, e.g. 
<code class="docutils literal notranslate"><span class="pre">"[DataFusion]</span> <span class="pre">New</span> <span class="pre">API</span> <span class="pre">for</span> <span class="pre">remote</span> <span class="pre">data</span> <span class="pre">sources"</span></code>, so -that the appropriate people in the Apache Arrow community notice the message.</p> +<div class="section" id="roadmap"> +<h1>Roadmap<a class="headerlink" href="#roadmap" title="Permalink to this headline">¶</a></h1> +<p>This document describes high level goals of the DataFusion and +Ballista development community. It is not meant to restrict +possibilities, but rather help newcomers understand the broader +context of where the community is headed, and inspire +additional contributions.</p> +<p>DataFusion and Ballista are part of the <a class="reference external" href="https://arrow.apache.org/">Apache +Arrow</a> project and governed by the Apache +Software Foundation governance model. These projects are entirely +driven by volunteers, and we welcome contributions for items not on +this roadmap. 
However, before submitting a large PR, we strongly +suggest you start a conversation using a GitHub issue or the +dev@arrow.apache.org mailing list to make review efficient and avoid +surprises.</p> +</div> +<div class="section" id="datafusion"> +<h1>DataFusion<a class="headerlink" href="#datafusion" title="Permalink to this headline">¶</a></h1> +<p>DataFusion’s goal is to become the embedded query engine of choice +for new analytic applications, by leveraging the unique features of +<a class="reference external" href="https://www.rust-lang.org/">Rust</a> and <a class="reference external" href="https://arrow.apache.org/">Apache Arrow</a> +to provide:</p> +<ol class="simple"> +<li><p>Best-in-class single node query performance</p></li> +<li><p>A Declarative SQL query interface compatible with PostgreSQL</p></li> +<li><p>A Dataframe API, similar to those offered by Pandas and Spark</p></li> +<li><p>A Procedural API for programmatically creating and running execution plans</p></li> +<li><p>High performance, data race free, ergonomic extensibility points at every layer</p></li> +</ol> +<div class="section" id="additional-sql-language-features"> +<h2>Additional SQL Language Features<a class="headerlink" href="#additional-sql-language-features" title="Permalink to this headline">¶</a></h2> +<ul class="simple"> +<li><p>Complete support list on <a class="reference external" href="https://github.com/apache/arrow-datafusion/blob/master/README.md#status">status</a></p></li> +<li><p>Timestamp Arithmetic <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/194">#194</a></p></li> +<li><p>SQL Parser extension point <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/533">#533</a></p></li> +<li><p>Support for nested structures (fields, lists, structs) <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/119">#119</a></p></li> +<li><p>Remaining Set Operators (<code class="docutils
literal notranslate"><span class="pre">INTERSECT</span></code> / <code class="docutils literal notranslate"><span class="pre">EXCEPT</span></code>) <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/1082">#1082</a></p></li> +<li><p>Run all queries from the TPCH benchmark (see <a class="reference external" href="https://github.com/apache/arrow-datafusion/milestone/2">milestone</a> for more details)</p></li> +</ul> +</div> +<div class="section" id="query-optimizer"> +<h2>Query Optimizer<a class="headerlink" href="#query-optimizer" title="Permalink to this headline">¶</a></h2> +<ul class="simple"> +<li><p>Additional constant folding / partial evaluation <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/1070">#1070</a></p></li> +<li><p>More sophisticated cost based optimizer for join ordering</p></li> +<li><p>Implement advanced query optimization framework (Tokomak) #440</p></li> +</ul> +</div> +<div class="section" id="datasources"> +<h2>Datasources<a class="headerlink" href="#datasources" title="Permalink to this headline">¶</a></h2> +<ul class="simple"> +<li><p>Better support for reading data from remote filesystems (e.g. 
S3) without caching it locally <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/907">#907</a> <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/1060">#1060</a></p></li> +<li><p>Support for partitioned datasources <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/1139">#1139</a> and make the integration of other table formats (Delta, Iceberg…) simpler</p></li> +<li><p>Improve performance of file format datasources (parallelize file listings, async Arrow readers, file chunk prefetching capability…)</p></li> +</ul> +</div> +<div class="section" id="runtime-infrastructure"> +<h2>Runtime / Infrastructure<a class="headerlink" href="#runtime-infrastructure" title="Permalink to this headline">¶</a></h2> +<ul class="simple"> +<li><p>Migrate to some sort of arrow2 based implementation (see <a class="reference external" href="https://github.com/apache/arrow-datafusion/milestone/3">milestone</a> for more details)</p></li> +<li><p>Add DataFusion to h2oai/db-benchmark <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/147">#147</a></p></li> +<li><p>Improve build time <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/348">#348</a></p></li> +</ul> </div> -<div class="section" id="slack-and-discord"> -<h3>Slack and Discord<a class="headerlink" href="#slack-and-discord" title="Permalink to this headline">¶</a></h3> -<p>We use the official <a class="reference external" href="https://s.apache.org/slack-invite">ASF</a> Slack workspace -for informal discussions and coordination. This is a great place to meet other -contributors and get guidance on where to contribute. 
Join us in the -<code class="docutils literal notranslate"><span class="pre">#arrow-rust</span></code> channel.</p> -<p>We also have a backup Arrow Rust Discord -server (<a class="reference external" href="https://discord.gg/Qw5gKqHxUM">invite link</a>) in case you are not able -to join the Slack workspace. If you need an invite to the Slack workspace, you -can also ask for one in our Discord server.</p> +<div class="section" id="resource-management"> +<h2>Resource Management<a class="headerlink" href="#resource-management" title="Permalink to this headline">¶</a></h2> +<ul class="simple"> +<li><p>Finer grain control and limit of runtime memory <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/587">#587</a> and CPU usage <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/64">#54</a></p></li> +</ul> +</div> +<div class="section" id="python-interface"> +<h2>Python Interface<a class="headerlink" href="#python-interface" title="Permalink to this headline">¶</a></h2> +<p>TBD</p> +</div> +<div class="section" id="datafusion-cli-datafusion-cli"> +<h2>DataFusion CLI (<code class="docutils literal notranslate"><span class="pre">datafusion-cli</span></code>)<a class="headerlink" href="#datafusion-cli-datafusion-cli" title="Permalink to this headline">¶</a></h2> +<p>Note: There are some additional thoughts on a datafusion-cli vision on <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/1096#issuecomment-939418770">#1096</a>.</p> +<ul class="simple"> +<li><p>Better abstraction between REPL parsing and queries so that commands are separated and handled correctly</p></li> +<li><p>Connect to the <code class="docutils literal notranslate"><span class="pre">Statistics</span></code> subsystem and have the cli print out more stats for query debugging, etc.</p></li> +<li><p>Improved error handling for interactive use and shell scripting usage</p></li> +<li><p>publishing to apt, 
brew, and possibly a NuGet registry so that people can use it more easily</p></li> +<li><p>adopt a shorter name, like dfcli?</p></li> +</ul> </div> +<div class="section" id="ballista"> +<h2>Ballista<a class="headerlink" href="#ballista" title="Permalink to this headline">¶</a></h2> </div> -<div class="section" id="contributing"> -<h2>Contributing<a class="headerlink" href="#contributing" title="Permalink to this headline">¶</a></h2> -<p>Our source code is hosted on -<a class="reference external" href="https://github.com/apache/arrow-datafusion">GitHub</a>. For developers new to -the project, we have curated a -<a class="reference external" href="https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22">good-first-issue</a> -list to help you get started.</p> -<p>We use GitHub issues for maintaining a queue of development work and as the -public record. We often use Google docs, Github issues and pull requests for -quick and small design discussions. 
For major design change proposals, please -make sure to send them to the dev list for more visibility.</p> </div> +<div class="section" id="vision"> +<h1>Vision<a class="headerlink" href="#vision" title="Permalink to this headline">¶</a></h1> +<p>TBD</p> </div> @@ -509,7 +604,8 @@ make sure to send them to the dev list for more visibility.</p> <div class='prev-next-bottom'> - <a class='left-prev' id="prev-link" href="../specification/output-field-name-semantic.html" title="previous page">Datafusion output field name semantic</a> + <a class='left-prev' id="prev-link" href="../user-guide/faq.html" title="previous page">Frequently Asked Questions</a> + <a class='right-next' id="next-link" href="invariants.html" title="next page">DataFusion’s Invariants</a> </div> diff --git a/datafusion/user-guide/example-usage.html b/datafusion/user-guide/example-usage.html index 9a29bb5..34237d6 100644 --- a/datafusion/user-guide/example-usage.html +++ b/datafusion/user-guide/example-usage.html @@ -321,6 +321,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="../specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="../specification/invariants.html"> DataFusion’s Invariants </a> @@ -364,6 +369,11 @@ Issue tracker </a> </li> + <li class="toctree-l1"> + <a class="reference external" href="https://github.com/apache/arrow-datafusion/blob/master/CODE_OF_CONDUCT.md"> + Code of conduct + </a> + </li> </ul> @@ -430,8 +440,6 @@ <h1>Example Usage<a class="headerlink" href="#example-usage" title="Permalink to this headline">¶</a></h1> <p>Run a SQL query against data stored in a CSV:</p> <div class="highlight-rust notranslate"><div class="highlight"><pre><span></span><span class="k">use</span><span class="w"> </span><span class="n">datafusion</span>::<span class="n">prelude</span>::<span class="o">*</span><span class="p">;</span><span class="w"></span> -<span class="k">use</span><span 
class="w"> </span><span class="n">arrow</span>::<span class="n">util</span>::<span class="n">pretty</span>::<span class="n">print_batches</span><span class="p">;</span><span class="w"></span> -<span class="k">use</span><span class="w"> </span><span class="n">arrow</span>::<span class="n">record_batch</span>::<span class="n">RecordBatch</span><span class="p">;</span><span class="w"></span> <span class="cp">#[tokio::main]</span><span class="w"></span> <span class="k">async</span><span class="w"> </span><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span>-> <span class="nc">datafusion</span>::<span class="n">error</span>::<span class="nb">Result</span><span class="o"><</span><span class="p">()</span><span class="o">></span><span class="w"> </span><span class="p">{</span><span class="w"></span> @@ -450,8 +458,6 @@ </div> <p>Use the DataFrame API to process data stored in a CSV:</p> <div class="highlight-rust notranslate"><div class="highlight"><pre><span></span><span class="k">use</span><span class="w"> </span><span class="n">datafusion</span>::<span class="n">prelude</span>::<span class="o">*</span><span class="p">;</span><span class="w"></span> -<span class="k">use</span><span class="w"> </span><span class="n">arrow</span>::<span class="n">util</span>::<span class="n">pretty</span>::<span class="n">print_batches</span><span class="p">;</span><span class="w"></span> -<span class="k">use</span><span class="w"> </span><span class="n">arrow</span>::<span class="n">record_batch</span>::<span class="n">RecordBatch</span><span class="p">;</span><span class="w"></span> <span class="cp">#[tokio::main]</span><span class="w"></span> <span class="k">async</span><span class="w"> </span><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span>-> <span class="nc">datafusion</span>::<span class="n">error</span>::<span class="nb">Result</span><span class="o"><</span><span 
class="p">()</span><span class="o">></span><span class="w"> </span><span class="p">{</span><span class="w"></span> diff --git a/datafusion/user-guide/library.html b/datafusion/user-guide/library.html index cc49c3c..abc081c 100644 --- a/datafusion/user-guide/library.html +++ b/datafusion/user-guide/library.html @@ -321,6 +321,11 @@ </p> <ul class="nav bd-sidenav"> <li class="toctree-l1"> + <a class="reference internal" href="../specification/roadmap.html"> + Roadmap + </a> + </li> + <li class="toctree-l1"> <a class="reference internal" href="../specification/invariants.html"> DataFusion’s Invariants </a> @@ -364,6 +369,11 @@ Issue tracker </a> </li> + <li class="toctree-l1"> + <a class="reference external" href="https://github.com/apache/arrow-datafusion/blob/master/CODE_OF_CONDUCT.md"> + Code of conduct + </a> + </li> </ul> @@ -458,9 +468,8 @@ worth noting that using the settings in the <code class="docutils literal notranslate"><span class="pre">[profile.release]</span></code> section will significantly increase the build time.</p> <div class="highlight-toml notranslate"><div class="highlight"><pre><span></span><span class="k">[dependencies]</span> <span class="n">datafusion</span> <span class="o">=</span> <span class="p">{</span> <span class="n">version</span> <span class="o">=</span> <span class="s">"5.0"</span> <span class="p">,</span> <span class="n">features</span> <span class="o">=</span> <span class="p">[</span><span class="s">"simd"</span><span class="p">]}</span> -<span class="n">tokio</span> <span class="o">=</span> <span class="p">{</span> <span class="n">version</span> <span class="o">=</span> <span class="s">"^1.0"</span><span class="p">,</span> <span class="n">features</span> <span class="o">=</span> <span class="p">[</span><span class="s">"macros"</span><span class="p">,</span> <span class="s">"rt"</span><span class="p">,</span> <span class="s">"rt-multi-thread"</span><span class="p">]</span> <span cla [...] 
-<span class="n">snmalloc-rs</span> <span class="o">=</span> <span class="p">{</span><span class="n">version</span> <span class="o">=</span> <span class="s">"0.2"</span><span class="p">,</span> <span class="n">features</span><span class="o">=</span> <span class="p">[</span><span class="s">"cache-friendly"</span><span class="p">]}</span> -<span class="n">num_cpus</span> <span class="o">=</span> <span class="s">"1.0"</span> +<span class="n">tokio</span> <span class="o">=</span> <span class="p">{</span> <span class="n">version</span> <span class="o">=</span> <span class="s">"^1.0"</span><span class="p">,</span> <span class="n">features</span> <span class="o">=</span> <span class="p">[</span><span class="s">"rt-multi-thread"</span><span class="p">]</span> <span class="p">}</span> +<span class="n">snmalloc-rs</span> <span class="o">=</span> <span class="s">"0.2"</span> <span class="k">[profile.release]</span> <span class="n">lto</span> <span class="o">=</span> <span class="kc">true</span>