[jira] [Created] (ARROW-8563) Minor change to make newBuilder public
Amol Umbarkar created ARROW-8563: Summary: Minor change to make newBuilder public Key: ARROW-8563 URL: https://issues.apache.org/jira/browse/ARROW-8563 Project: Apache Arrow Issue Type: Improvement Components: Go Reporter: Amol Umbarkar This minor change makes newBuilder() public to reduce verbosity for downstream users. To give an example, I am working on parquet read/write into an Arrow record batch, where the Parquet data types are mapped to Arrow data types. My repo: [https://github.com/mindhash/arrow-parquet-go] In such cases, it would be nice for the builder API (newBuilder) to be generic: accept a data type and return the corresponding array builder. I am looking at a similar situation for the JSON reader. I think this change will make the builder API much easier for upstream as well as internal packages. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] Add "trivial" RecordBatch body compression to Arrow IPC protocol
My vote: +1 Best, Liya Fan On Thu, Apr 23, 2020 at 8:24 AM Wes McKinney wrote: > Hello, > > I have proposed adding a simple RecordBatch IPC message body > compression scheme (using either LZ4 or ZSTD) to the Arrow IPC > protocol in GitHub PR [1] as discussed on the mailing list [2]. This > is distinct from separate discussions about adding in-memory encodings > (like RLE-encoding) to the Arrow columnar format. > > This change is not forward compatible so it will not be safe to send > compressed messages to old libraries, but since we are still pre-1.0.0 > the consensus is that this is acceptable. We may separately consider > increasing the metadata version for 1.0.0 to require clients to > upgrade. > > Please vote whether to accept the addition. The vote will be open for > at least 72 hours. > > [ ] +1 Accept this addition to the IPC protocol > [ ] +0 > [ ] -1 Do not accept the changes because... > > Here is my vote: +1 > > Thanks, > Wes > > [1]: https://github.com/apache/arrow/pull/6707 > [2]: > https://lists.apache.org/thread.html/r58c9d23ad159644fca590d8f841df80d180b11bfb72f949d601d764b%40%3Cdev.arrow.apache.org%3E >
[VOTE] Add "trivial" RecordBatch body compression to Arrow IPC protocol
Hello,

I have proposed adding a simple RecordBatch IPC message body compression scheme (using either LZ4 or ZSTD) to the Arrow IPC protocol in GitHub PR [1] as discussed on the mailing list [2]. This is distinct from separate discussions about adding in-memory encodings (like RLE-encoding) to the Arrow columnar format.

This change is not forward compatible, so it will not be safe to send compressed messages to old libraries, but since we are still pre-1.0.0 the consensus is that this is acceptable. We may separately consider increasing the metadata version for 1.0.0 to require clients to upgrade.

Please vote whether to accept the addition. The vote will be open for at least 72 hours.

[ ] +1 Accept this addition to the IPC protocol
[ ] +0
[ ] -1 Do not accept the changes because...

Here is my vote: +1

Thanks,
Wes

[1]: https://github.com/apache/arrow/pull/6707
[2]: https://lists.apache.org/thread.html/r58c9d23ad159644fca590d8f841df80d180b11bfb72f949d601d764b%40%3Cdev.arrow.apache.org%3E
[jira] [Created] (ARROW-8562) [C++] IO: Parameterize I/O coalescing using S3 storage metrics
Mayur Srivastava created ARROW-8562: --- Summary: [C++] IO: Parameterize I/O coalescing using S3 storage metrics Key: ARROW-8562 URL: https://issues.apache.org/jira/browse/ARROW-8562 Project: Apache Arrow Issue Type: Improvement Reporter: Mayur Srivastava

Related to https://issues.apache.org/jira/browse/ARROW-7995

The adaptive I/O coalescing algorithm uses two parameters:
1. max_io_gap: maximum I/O gap/hole size, in bytes
2. ideal_request_size: ideal I/O request size, in bytes

These parameters can be derived from S3 metrics as described below. In an S3-compatible storage system, there are two main metrics:
1. Seek time, or time-to-first-byte (TTFB), in seconds: the call setup latency of a new S3 request
2. Transfer bandwidth (BW) for data, in bytes/sec

1. Computing max_io_gap:

max_io_gap = TTFB * BW

This is also called the bandwidth-delay product (BDP). Two byte ranges that have a gap between them can still be mapped to the same read if the gap is less than the bandwidth-delay product [TTFB * TransferBandwidth], i.e. if the time-to-first-byte (the call setup latency of a new S3 request) is expected to be greater than the cost of just reading and discarding the extra bytes on an existing HTTP request.

2. Computing ideal_request_size:

We want high bandwidth utilization per S3 connection, i.e. to transfer large amounts of data to amortize the seek overhead. But we also want to leverage parallelism by slicing very large I/O chunks. We define two more config parameters with suggested default values to control the slice size and seek, balancing the two effects with the goal of maximizing net data load performance.

BW_util (ideal bandwidth utilization): the fraction of per-connection bandwidth that should be utilized to maximize net data load. A good default value is 90%, or 0.9.

MAX_IDEAL_REQUEST_SIZE: the maximum single request size (in bytes) to maximize net data load. A good default value is 64 MiB.
The amount of data that needs to be transferred in a single S3 get_object request to achieve effective bandwidth eff_BW = BW_util * BW is given by:

eff_BW = ideal_request_size / (TTFB + ideal_request_size / BW)

Substituting TTFB = max_io_gap / BW and eff_BW = BW_util * BW, we get the following result:

ideal_request_size = max_io_gap * BW_util / (1 - BW_util)

Applying the MAX_IDEAL_REQUEST_SIZE cap, we get:

ideal_request_size = min(MAX_IDEAL_REQUEST_SIZE, max_io_gap * BW_util / (1 - BW_util))

The proposal is to create a named constructor in io::CacheOptions (PR: [https://github.com/apache/arrow/pull/6744] created by [~lidavidm]) to compute max_io_gap and ideal_request_size from TTFB and BW, which will then be passed to the reader to configure the I/O coalescing.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
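The derivation above can be sketched numerically. The following is a plain-Python illustration of the formulas in this proposal (the function name and defaults are illustrative only, not the actual io::CacheOptions API):

```python
def coalescing_params(ttfb_s, bw_bytes_per_s, bw_util=0.9,
                      max_ideal_request_size=64 * 1024 * 1024):
    """Derive I/O coalescing parameters from S3 storage metrics.

    ttfb_s: time-to-first-byte (seek latency) of a new request, in seconds.
    bw_bytes_per_s: per-connection transfer bandwidth, in bytes/sec.
    """
    # max_io_gap is the bandwidth-delay product: below this gap size it is
    # cheaper to read-and-discard than to open a new request.
    max_io_gap = ttfb_s * bw_bytes_per_s
    # ideal_request_size amortizes TTFB so that effective bandwidth reaches
    # bw_util * BW, capped by MAX_IDEAL_REQUEST_SIZE to preserve parallelism.
    ideal_request_size = min(max_ideal_request_size,
                             max_io_gap * bw_util / (1 - bw_util))
    return int(max_io_gap), int(ideal_request_size)

# Example: 100 ms seek latency, 100 MiB/s per-connection bandwidth.
gap, req = coalescing_params(0.1, 100 * 1024 * 1024)
```

With these inputs the gap threshold comes out to 10 MiB, and the uncapped ideal request size (90 MiB at 0.9 utilization) is clamped to the 64 MiB maximum.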
[jira] [Created] (ARROW-8561) [C++][Gandiva] Stop using deprecated google::protobuf::MessageLite::ByteSize()
Kouhei Sutou created ARROW-8561: --- Summary: [C++][Gandiva] Stop using deprecated google::protobuf::MessageLite::ByteSize() Key: ARROW-8561 URL: https://issues.apache.org/jira/browse/ARROW-8561 Project: Apache Arrow Issue Type: Improvement Components: C++ - Gandiva Reporter: Kouhei Sutou Assignee: Kouhei Sutou It has been deprecated since Protobuf 3.4.0. https://github.com/protocolbuffers/protobuf/blob/v3.4.0/CHANGES.txt#L58-L59 {quote} * ByteSize() and SpaceUsed() are deprecated. Use ByteSizeLong() and SpaceUsedLong() instead. {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8560) [Rust] Docs for MutableBuffer resize are incorrect
Paddy Horan created ARROW-8560: -- Summary: [Rust] Docs for MutableBuffer resize are incorrect Key: ARROW-8560 URL: https://issues.apache.org/jira/browse/ARROW-8560 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Paddy Horan Assignee: Paddy Horan -- This message was sent by Atlassian Jira (v8.3.4#803005)
[Discuss] [Rust] Common Trait(s) for iterating over RecordBatch's
Hi All, I just opened ARROW-8559 [1] to consolidate the traits for Record Batch iterators. I feel this needs to be done prior to 1.0, as we need to be clear about what external crates should implement to integrate with the Arrow ecosystem. This might be disruptive, though, so I wanted to bring it to the attention of the mailing list. Paddy [1] - https://issues.apache.org/jira/browse/ARROW-8559
[jira] [Created] (ARROW-8559) [Rust] Consolidate Record Batch iterator traits in main arrow crate
Paddy Horan created ARROW-8559: -- Summary: [Rust] Consolidate Record Batch iterator traits in main arrow crate Key: ARROW-8559 URL: https://issues.apache.org/jira/browse/ARROW-8559 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Paddy Horan Assignee: Paddy Horan We have the `BatchIterator` trait in DataFusion and the `RecordBatchReader` trait in the main arrow crate. They differ in that `BatchIterator` is Send + Sync. They should both be in the arrow crate and be named `BatchIterator` and `SendableBatchIterator`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8558) [Rust] GitHub Actions missing rustfmt
Paddy Horan created ARROW-8558: -- Summary: [Rust] GitHub Actions missing rustfmt Key: ARROW-8558 URL: https://issues.apache.org/jira/browse/ARROW-8558 Project: Apache Arrow Issue Type: New Feature Components: CI, Rust Reporter: Paddy Horan Assignee: Neville Dipale -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] Release Apache Arrow 0.17.0 - RC0
arrow.apache.org. They looked like 30 minutes before you updated it :/ On Wed, Apr 22, 2020 at 11:14 AM Krisztián Szűcs wrote: > Which website? > > On Wed, Apr 22, 2020 at 7:31 PM Neal Richardson > wrote: > > > > On the post-release tasks, CRAN has accepted the 0.17 release. Homebrew > > hasn't yet accepted because on initial review, they didn't believe that > > we'd done the release because the website hadn't been updated yet. > > > > Neal > > > > On Wed, Apr 22, 2020 at 6:17 AM Wes McKinney > wrote: > > > > > FTR it seems that the compiler error on VS 2017 on Windows is showing > > > up elsewhere > > > > > > > > > > https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=147598=logs=e5cdccbf-4751-5a24-7406-185c9d30d021=1a956d00-e7ee-5aca-9899-9cb1f0a4d4a8=1045 > > > > > > If the -DCMAKE_UNITY_BUILD=ON workaround doesn't solve it we may have > > > to make a 0.17.1 release > > > > > > On Tue, Apr 21, 2020 at 9:45 AM Wes McKinney > wrote: > > > > > > > > It looks like the rebase-PR step didn't work correctly per Micah's > > > > comment (didn't work on my PR for ARROW-2714 either). Might want to > > > > look into why not > > > > > > > > On Tue, Apr 21, 2020 at 6:23 AM Krisztián Szűcs > > > > wrote: > > > > > > > > > > On Tue, Apr 21, 2020 at 4:28 AM Andy Grove > > > wrote: > > > > > > > > > > > > Well, I got trhe crates published, but there's a nasty workaround > > > for users > > > > > > that want to use these crates as a dependency and it means there > is > > > no real > > > > > > dependency management on the Flight protocol version. I think the > > > answer is > > > > > > that we need to publish the Flight.proto as part of the > arrow-flight > > > crate > > > > > > and make sure that version is used in the custom build script. > I'll > > > look at > > > > > > this again tomorrow and try and come up with a solution for the > next > > > > > > release. > > > > > Thanks for handling it Andy! 
> > > > > > > > > > It occasionally happens that typically dependency problems come of > > > with the > > > > > crates during the release. Can we automatize the testing of it? > > > > > > > > > > > > Here's the JIRA to track this specific issue. > > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-8536 > > > > > I set it to critical for the next version. > > > > > > > > > > > > On Mon, Apr 20, 2020 at 7:49 PM Andy Grove < > andygrov...@gmail.com> > > > wrote: > > > > > > > > > > > > > I've run into issues publishing the Rust crates and I don't > think > > > I can > > > > > > > resolve this tonight. I am documenting the issue in > > > > > > > https://issues.apache.org/jira/browse/ARROW-8535 > > > > > > > > > > > > > > > > > > > > > On Mon, Apr 20, 2020 at 5:02 PM Krisztián Szűcs < > > > szucs.kriszt...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > >> Created a PR with updated docs. > > > > > > >> > > > > > > >> Conda post release task is left, it's a bit strange that the > > > conda-forge > > > > > > >> autotick bot has not created the version bump PRs yet. I'm > > > updating > > > > > > >> them manually tomorrow. > > > > > > >> > > > > > > >> 1. [x] rebase > > > > > > >> 2. [x] upload source > > > > > > >> 3. [x] upload binaries > > > > > > >> 4. [x] update website > > > > > > >> 5. [x] upload ruby gems > > > > > > >> 6. [x] upload js packages > > > > > > >> 8. [x] upload C# packages > > > > > > >> 9. [Andy] upload rust crates > > > > > > >> 10. [ ] update conda recipes > > > > > > >> 11. [x] upload wheels to pypi > > > > > > >> 12. [Neal] update homebrew packages > > > > > > >> 13. [x] update maven artifacts > > > > > > >> 14. [kou] update msys2 > > > > > > >> 15. [Neal] update R packages > > > > > > >> 16. [x] update docs > > > > > > >> > > > > > > >> I'm going to announce 0.17 once the site PRs get merged. 
> > > > > > >> > > > > > > > >> > > > > > > > >> > Thanks, > > > > > > >> > -- > > > > > > >> > kou > > > > > > >> > > > > > > > >> > In > > u+1f3...@mail.gmail.com> > > > > > > >> > "Re: [VOTE] Release Apache Arrow 0.17.0 - RC0" on Mon, 20 > Apr > > > 2020 > > > > > > >> 23:20:48 +0200, > > > > > > >> > Krisztián Szűcs wrote: > > > > > > >> > > > > > > > >> > > On Mon, Apr 20, 2020 at 11:17 PM Andy Grove < > > > andygrov...@gmail.com> > > > > > > >> wrote: > > > > > > >> > >> > > > > > > >> > >> Ok, I can look into this after work today (in about 3 > hours). > > > > > > >> > > Great, thanks! > > > > > > >> > > > > > > > > >> > > The current status is (`x` means done): > > > > > > >> > > > > > > > > >> > > 1. [x] rebase > > > > > > >> > > 2. [x] upload source > > > > > > >> > > 3. [x] upload binaries > > > > > > >> > > 4. [x] update website > > > > > > >> > > 5. [x] upload ruby gems > > > > > > >> > > 6. [x] upload js packages > > > > > > >> > > 8. [ ] upload C# crates > > > > > > >> > > 9. [Andy] upload rust crates > > > > > > >> > > 10. [ ] update conda recipes > > > > > > >> > > 11. [x] upload wheels to pypi > > > > > > >> > > 12. [Neal]
Re: [VOTE] Release Apache Arrow 0.17.0 - RC0
Which website? On Wed, Apr 22, 2020 at 7:31 PM Neal Richardson wrote: > > On the post-release tasks, CRAN has accepted the 0.17 release. Homebrew > hasn't yet accepted because on initial review, they didn't believe that > we'd done the release because the website hadn't been updated yet. > > Neal > > On Wed, Apr 22, 2020 at 6:17 AM Wes McKinney wrote: > > > FTR it seems that the compiler error on VS 2017 on Windows is showing > > up elsewhere > > > > > > https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=147598=logs=e5cdccbf-4751-5a24-7406-185c9d30d021=1a956d00-e7ee-5aca-9899-9cb1f0a4d4a8=1045 > > > > If the -DCMAKE_UNITY_BUILD=ON workaround doesn't solve it we may have > > to make a 0.17.1 release > > > > On Tue, Apr 21, 2020 at 9:45 AM Wes McKinney wrote: > > > > > > It looks like the rebase-PR step didn't work correctly per Micah's > > > comment (didn't work on my PR for ARROW-2714 either). Might want to > > > look into why not > > > > > > On Tue, Apr 21, 2020 at 6:23 AM Krisztián Szűcs > > > wrote: > > > > > > > > On Tue, Apr 21, 2020 at 4:28 AM Andy Grove > > wrote: > > > > > > > > > > Well, I got trhe crates published, but there's a nasty workaround > > for users > > > > > that want to use these crates as a dependency and it means there is > > no real > > > > > dependency management on the Flight protocol version. I think the > > answer is > > > > > that we need to publish the Flight.proto as part of the arrow-flight > > crate > > > > > and make sure that version is used in the custom build script. I'll > > look at > > > > > this again tomorrow and try and come up with a solution for the next > > > > > release. > > > > Thanks for handling it Andy! > > > > > > > > It occasionally happens that typically dependency problems come of > > with the > > > > crates during the release. Can we automatize the testing of it? > > > > > > > > > > Here's the JIRA to track this specific issue. 
> > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-8536 > > > > I set it to critical for the next version. > > > > > > > > > > On Mon, Apr 20, 2020 at 7:49 PM Andy Grove > > wrote: > > > > > > > > > > > I've run into issues publishing the Rust crates and I don't think > > I can > > > > > > resolve this tonight. I am documenting the issue in > > > > > > https://issues.apache.org/jira/browse/ARROW-8535 > > > > > > > > > > > > > > > > > > On Mon, Apr 20, 2020 at 5:02 PM Krisztián Szűcs < > > szucs.kriszt...@gmail.com> > > > > > > wrote: > > > > > > > > > > > >> Created a PR with updated docs. > > > > > >> > > > > > >> Conda post release task is left, it's a bit strange that the > > conda-forge > > > > > >> autotick bot has not created the version bump PRs yet. I'm > > updating > > > > > >> them manually tomorrow. > > > > > >> > > > > > >> 1. [x] rebase > > > > > >> 2. [x] upload source > > > > > >> 3. [x] upload binaries > > > > > >> 4. [x] update website > > > > > >> 5. [x] upload ruby gems > > > > > >> 6. [x] upload js packages > > > > > >> 8. [x] upload C# packages > > > > > >> 9. [Andy] upload rust crates > > > > > >> 10. [ ] update conda recipes > > > > > >> 11. [x] upload wheels to pypi > > > > > >> 12. [Neal] update homebrew packages > > > > > >> 13. [x] update maven artifacts > > > > > >> 14. [kou] update msys2 > > > > > >> 15. [Neal] update R packages > > > > > >> 16. [x] update docs > > > > > >> > > > > > >> I'm going to announce 0.17 once the site PRs get merged. 
> > > > > >> > > > > > > >> > > > > > > >> > Thanks, > > > > > >> > -- > > > > > >> > kou > > > > > >> > > > > > > >> > In > u+1f3...@mail.gmail.com> > > > > > >> > "Re: [VOTE] Release Apache Arrow 0.17.0 - RC0" on Mon, 20 Apr > > 2020 > > > > > >> 23:20:48 +0200, > > > > > >> > Krisztián Szűcs wrote: > > > > > >> > > > > > > >> > > On Mon, Apr 20, 2020 at 11:17 PM Andy Grove < > > andygrov...@gmail.com> > > > > > >> wrote: > > > > > >> > >> > > > > > >> > >> Ok, I can look into this after work today (in about 3 hours). > > > > > >> > > Great, thanks! > > > > > >> > > > > > > > >> > > The current status is (`x` means done): > > > > > >> > > > > > > > >> > > 1. [x] rebase > > > > > >> > > 2. [x] upload source > > > > > >> > > 3. [x] upload binaries > > > > > >> > > 4. [x] update website > > > > > >> > > 5. [x] upload ruby gems > > > > > >> > > 6. [x] upload js packages > > > > > >> > > 8. [ ] upload C# crates > > > > > >> > > 9. [Andy] upload rust crates > > > > > >> > > 10. [ ] update conda recipes > > > > > >> > > 11. [x] upload wheels to pypi > > > > > >> > > 12. [Neal] update homebrew packages > > > > > >> > > 13. [x] update maven artifacts > > > > > >> > > 14. [ ] update msys2 > > > > > >> > > 15. [Neal] update R packages > > > > > >> > > 16. [Krisztian] update docs > > > > > >> > >> > > > > > >> > >> On Mon, Apr 20, 2020, 2:47 PM Krisztián Szűcs < > > > > > >> szucs.kriszt...@gmail.com> > > > > > >> > >> wrote: > > > > > >> > >> > > > > > >> > >> > Thanks Andy! I tried to upload the rust
[jira] [Created] (ARROW-8557) from pyarrow import parquet fails with AttributeError: type object 'pyarrow._parquet.Statistics' has no attribute '__reduce_cython__'
Haluk Tokgozoglu created ARROW-8557: --- Summary: from pyarrow import parquet fails with AttributeError: type object 'pyarrow._parquet.Statistics' has no attribute '__reduce_cython__' Key: ARROW-8557 URL: https://issues.apache.org/jira/browse/ARROW-8557 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.17.0, 0.16.0, 0.15.1 Environment: Python 3.8.4, GCC 4.8.4, Debian 8 Reporter: Haluk Tokgozoglu

I have tried versions 0.15.1, 0.16.0, and 0.17.0; same error on all. I've seen in other issues that co-installations of tensorflow and numpy might be causing issues. I have tensorflow==1.14.0 and numpy==1.16.4 (and many other libraries, but I've read that those two tend to cause issues).

{code:python}
from pyarrow import parquet

~/python/lib/python3.6/site-packages/pyarrow/parquet.py in
     32 import pyarrow as pa
     33 import pyarrow.lib as lib
---> 34 import pyarrow._parquet as _parquet
     35
     36 from pyarrow._parquet import (ParquetReader, Statistics,  # noqa

~/python/lib/python3.6/site-packages/pyarrow/_parquet.pyx in init pyarrow._parquet()

AttributeError: type object 'pyarrow._parquet.Statistics' has no attribute '__reduce_cython__'
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] Release Apache Arrow 0.17.0 - RC0
On the post-release tasks, CRAN has accepted the 0.17 release. Homebrew hasn't yet accepted because on initial review, they didn't believe that we'd done the release because the website hadn't been updated yet. Neal On Wed, Apr 22, 2020 at 6:17 AM Wes McKinney wrote: > FTR it seems that the compiler error on VS 2017 on Windows is showing > up elsewhere > > > https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=147598=logs=e5cdccbf-4751-5a24-7406-185c9d30d021=1a956d00-e7ee-5aca-9899-9cb1f0a4d4a8=1045 > > If the -DCMAKE_UNITY_BUILD=ON workaround doesn't solve it we may have > to make a 0.17.1 release > > On Tue, Apr 21, 2020 at 9:45 AM Wes McKinney wrote: > > > > It looks like the rebase-PR step didn't work correctly per Micah's > > comment (didn't work on my PR for ARROW-2714 either). Might want to > > look into why not > > > > On Tue, Apr 21, 2020 at 6:23 AM Krisztián Szűcs > > wrote: > > > > > > On Tue, Apr 21, 2020 at 4:28 AM Andy Grove > wrote: > > > > > > > > Well, I got trhe crates published, but there's a nasty workaround > for users > > > > that want to use these crates as a dependency and it means there is > no real > > > > dependency management on the Flight protocol version. I think the > answer is > > > > that we need to publish the Flight.proto as part of the arrow-flight > crate > > > > and make sure that version is used in the custom build script. I'll > look at > > > > this again tomorrow and try and come up with a solution for the next > > > > release. > > > Thanks for handling it Andy! > > > > > > It occasionally happens that typically dependency problems come of > with the > > > crates during the release. Can we automatize the testing of it? > > > > > > > > Here's the JIRA to track this specific issue. > > > > > > > > https://issues.apache.org/jira/browse/ARROW-8536 > > > I set it to critical for the next version. 
> > > > > > > > On Mon, Apr 20, 2020 at 7:49 PM Andy Grove > wrote: > > > > > > > > > I've run into issues publishing the Rust crates and I don't think > I can > > > > > resolve this tonight. I am documenting the issue in > > > > > https://issues.apache.org/jira/browse/ARROW-8535 > > > > > > > > > > > > > > > On Mon, Apr 20, 2020 at 5:02 PM Krisztián Szűcs < > szucs.kriszt...@gmail.com> > > > > > wrote: > > > > > > > > > >> Created a PR with updated docs. > > > > >> > > > > >> Conda post release task is left, it's a bit strange that the > conda-forge > > > > >> autotick bot has not created the version bump PRs yet. I'm > updating > > > > >> them manually tomorrow. > > > > >> > > > > >> 1. [x] rebase > > > > >> 2. [x] upload source > > > > >> 3. [x] upload binaries > > > > >> 4. [x] update website > > > > >> 5. [x] upload ruby gems > > > > >> 6. [x] upload js packages > > > > >> 8. [x] upload C# packages > > > > >> 9. [Andy] upload rust crates > > > > >> 10. [ ] update conda recipes > > > > >> 11. [x] upload wheels to pypi > > > > >> 12. [Neal] update homebrew packages > > > > >> 13. [x] update maven artifacts > > > > >> 14. [kou] update msys2 > > > > >> 15. [Neal] update R packages > > > > >> 16. [x] update docs > > > > >> > > > > >> I'm going to announce 0.17 once the site PRs get merged. > > > > >> > > > > > >> > > > > > >> > Thanks, > > > > >> > -- > > > > >> > kou > > > > >> > > > > > >> > In u+1f3...@mail.gmail.com> > > > > >> > "Re: [VOTE] Release Apache Arrow 0.17.0 - RC0" on Mon, 20 Apr > 2020 > > > > >> 23:20:48 +0200, > > > > >> > Krisztián Szűcs wrote: > > > > >> > > > > > >> > > On Mon, Apr 20, 2020 at 11:17 PM Andy Grove < > andygrov...@gmail.com> > > > > >> wrote: > > > > >> > >> > > > > >> > >> Ok, I can look into this after work today (in about 3 hours). > > > > >> > > Great, thanks! > > > > >> > > > > > > >> > > The current status is (`x` means done): > > > > >> > > > > > > >> > > 1. [x] rebase > > > > >> > > 2. [x] upload source > > > > >> > > 3. 
[x] upload binaries > > > > >> > > 4. [x] update website > > > > >> > > 5. [x] upload ruby gems > > > > >> > > 6. [x] upload js packages > > > > >> > > 8. [ ] upload C# crates > > > > >> > > 9. [Andy] upload rust crates > > > > >> > > 10. [ ] update conda recipes > > > > >> > > 11. [x] upload wheels to pypi > > > > >> > > 12. [Neal] update homebrew packages > > > > >> > > 13. [x] update maven artifacts > > > > >> > > 14. [ ] update msys2 > > > > >> > > 15. [Neal] update R packages > > > > >> > > 16. [Krisztian] update docs > > > > >> > >> > > > > >> > >> On Mon, Apr 20, 2020, 2:47 PM Krisztián Szűcs < > > > > >> szucs.kriszt...@gmail.com> > > > > >> > >> wrote: > > > > >> > >> > > > > >> > >> > Thanks Andy! I tried to upload the rust packages but > arrow-flight, > > > > >> > >> > but a version pin is missing from the package tree: > > > > >> > >> > > > > > >> > >> > error: all dependencies must have a version specified when > > > > >> publishing. > > > > >> > >> > dependency `arrow-flight` does not specify a version > > > > >> > >> > > > > > >> > >> > Please upload
[jira] [Created] (ARROW-8556) [R] Installation fails with `LIBARROW_MINIMAL=false`
Karl Dunkle Werner created ARROW-8556: - Summary: [R] Installation fails with `LIBARROW_MINIMAL=false` Key: ARROW-8556 URL: https://issues.apache.org/jira/browse/ARROW-8556 Project: Apache Arrow Issue Type: Bug Components: R Affects Versions: 0.17.0 Environment: Ubuntu 19.10 R 3.6.1 Reporter: Karl Dunkle Werner

I would like to install the `arrow` R package on my Ubuntu 19.10 system. Prebuilt binaries are unavailable, and I want to enable compression, so I set the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks like the package is able to compile, but it can't be loaded. I'm able to install correctly if I don't set the {{LIBARROW_MINIMAL}} variable. Here's the error I get:

{code}
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so':
  ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: ZSTD_initCStream
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘~/.R/3.6/arrow’
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8555) [FlightRPC][Java] Implement Flight DoExchange for Java
David Li created ARROW-8555: --- Summary: [FlightRPC][Java] Implement Flight DoExchange for Java Key: ARROW-8555 URL: https://issues.apache.org/jira/browse/ARROW-8555 Project: Apache Arrow Issue Type: New Feature Components: FlightRPC Reporter: David Li Assignee: David Li As described in the mailing list vote. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: 0.17 release blog post: help needed
No, nothing major that needs to be copied over On Tue, Apr 21, 2020 at 5:15 PM Neal Richardson wrote: > > Hope you weren't editing the google doc while I was moving it to > https://github.com/apache/arrow-site/pull/55. If so, would you mind copying > over any relevant changes? > > Neal > > > On Tue, Apr 21, 2020 at 1:53 PM Wes McKinney wrote: > > > I did a few tweaks and cleanups. There are still a number of TODO > > items in this document. It would be good to finish (or remove) these > > so this can be published tomorrow or Thursday > > > > On Mon, Apr 20, 2020 at 7:47 AM Fan Liya wrote: > > > > > > I have added some Java items. > > > > > > Best, > > > Liya Fan > > > > > > On Mon, Apr 20, 2020 at 10:49 AM Kenta Murata wrote: > > > > > > > I've edited Ruby and C GLib parts. > > > > Kou and Shiro will check them later. > > > > > > > > 2020年4月20日(月) 11:09 Wes McKinney : > > > > > > > > > > I made a pass through the changelog and added a bunch of TODOs > > related > > > > > to C++. In general, as a reminder, in these blog posts since the > > > > > releases are growing large we should try to present as compact a high > > > > > level summary as possible to convey some of the highlights of our > > > > > labors (so likely not needed to write out any JIRA numbers, people > > can > > > > > look at the changelog for that). I'll spend some more time on the > > blog > > > > > post after others have had a chance to take a pass through > > > > > > > > > > On Sat, Apr 18, 2020 at 12:13 PM Neal Richardson > > > > > wrote: > > > > > > > > > > > > Hi all, > > > > > > Since it looks like we're close to releasing 0.17, we need to fill > > in > > > > the > > > > > > details for our blog post announcement. I've started a document > > here: > > > > > > > > > > > > https://docs.google.com/document/d/16UKZtvL49o8nCDN8JU3Ut6y76Y9d8-4qXv5vFv7aNvs/edit#heading=h.kqqacbm2lpv8 > > > > > > > > > > > > Please fill in the details for the parts of the project you're > > close > > > > to. 
> > > > > > I'll handle wrapping this up in the usual boilerplate when we're > > done. > > > > > > > > > > > > Thanks, > > > > > > Neal > > > > > > > > > > > > > > > > -- > > > > Regards, > > > > Kenta Murata > > > > > >
[jira] [Created] (ARROW-8554) [C++][Benchmark] Fix building error "cannot bind lvalue"
Jiajia Li created ARROW-8554: Summary: [C++][Benchmark] Fix building error "cannot bind lvalue" Key: ARROW-8554 URL: https://issues.apache.org/jira/browse/ARROW-8554 Project: Apache Arrow Issue Type: Bug Components: Benchmarking Reporter: Jiajia Li

When running the commands:

```
cmake -DARROW_BUILD_BENCHMARKS=ON ..
make
```

the build fails with the following error:

```
bit_util_benchmark.cc:96:10: error: cannot bind ‘std::unique_ptr’ lvalue to ‘std::unique_ptr&&’
   return buffer;
```

-- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] Release Apache Arrow 0.17.0 - RC0
FTR it seems that the compiler error on VS 2017 on Windows is showing up elsewhere https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=147598=logs=e5cdccbf-4751-5a24-7406-185c9d30d021=1a956d00-e7ee-5aca-9899-9cb1f0a4d4a8=1045 If the -DCMAKE_UNITY_BUILD=ON workaround doesn't solve it we may have to make a 0.17.1 release On Tue, Apr 21, 2020 at 9:45 AM Wes McKinney wrote: > > It looks like the rebase-PR step didn't work correctly per Micah's > comment (didn't work on my PR for ARROW-2714 either). Might want to > look into why not > > On Tue, Apr 21, 2020 at 6:23 AM Krisztián Szűcs > wrote: > > > > On Tue, Apr 21, 2020 at 4:28 AM Andy Grove wrote: > > > > > > Well, I got trhe crates published, but there's a nasty workaround for > > > users > > > that want to use these crates as a dependency and it means there is no > > > real > > > dependency management on the Flight protocol version. I think the answer > > > is > > > that we need to publish the Flight.proto as part of the arrow-flight crate > > > and make sure that version is used in the custom build script. I'll look > > > at > > > this again tomorrow and try and come up with a solution for the next > > > release. > > Thanks for handling it Andy! > > > > It occasionally happens that typically dependency problems come of with the > > crates during the release. Can we automatize the testing of it? > > > > > > Here's the JIRA to track this specific issue. > > > > > > https://issues.apache.org/jira/browse/ARROW-8536 > > I set it to critical for the next version. > > > > > > On Mon, Apr 20, 2020 at 7:49 PM Andy Grove wrote: > > > > > > > I've run into issues publishing the Rust crates and I don't think I can > > > > resolve this tonight. I am documenting the issue in > > > > https://issues.apache.org/jira/browse/ARROW-8535 > > > > > > > > > > > > On Mon, Apr 20, 2020 at 5:02 PM Krisztián Szűcs > > > > > > > > wrote: > > > > > > > >> Created a PR with updated docs. 
> > > >> > > > >> Conda post release task is left, it's a bit strange that the > > > >> conda-forge > > > >> autotick bot has not created the version bump PRs yet. I'm updating > > > >> them manually tomorrow. > > > >> > > > >> 1. [x] rebase > > > >> 2. [x] upload source > > > >> 3. [x] upload binaries > > > >> 4. [x] update website > > > >> 5. [x] upload ruby gems > > > >> 6. [x] upload js packages > > > >> 8. [x] upload C# packages > > > >> 9. [Andy] upload rust crates > > > >> 10. [ ] update conda recipes > > > >> 11. [x] upload wheels to pypi > > > >> 12. [Neal] update homebrew packages > > > >> 13. [x] update maven artifacts > > > >> 14. [kou] update msys2 > > > >> 15. [Neal] update R packages > > > >> 16. [x] update docs > > > >> > > > >> I'm going to announce 0.17 once the site PRs get merged. > > > >> > > > > >> > > > > >> > Thanks, > > > >> > -- > > > >> > kou > > > >> > > > > >> > In > > > >> > > > > >> > "Re: [VOTE] Release Apache Arrow 0.17.0 - RC0" on Mon, 20 Apr 2020 > > > >> 23:20:48 +0200, > > > >> > Krisztián Szűcs wrote: > > > >> > > > > >> > > On Mon, Apr 20, 2020 at 11:17 PM Andy Grove > > > >> wrote: > > > >> > >> > > > >> > >> Ok, I can look into this after work today (in about 3 hours). > > > >> > > Great, thanks! > > > >> > > > > > >> > > The current status is (`x` means done): > > > >> > > > > > >> > > 1. [x] rebase > > > >> > > 2. [x] upload source > > > >> > > 3. [x] upload binaries > > > >> > > 4. [x] update website > > > >> > > 5. [x] upload ruby gems > > > >> > > 6. [x] upload js packages > > > >> > > 8. [ ] upload C# crates > > > >> > > 9. [Andy] upload rust crates > > > >> > > 10. [ ] update conda recipes > > > >> > > 11. [x] upload wheels to pypi > > > >> > > 12. [Neal] update homebrew packages > > > >> > > 13. [x] update maven artifacts > > > >> > > 14. [ ] update msys2 > > > >> > > 15. [Neal] update R packages > > > >> > > 16. 
[Krisztian] update docs > > > >> > >> > > > >> > >> On Mon, Apr 20, 2020, 2:47 PM Krisztián Szűcs < > > > >> szucs.kriszt...@gmail.com> > > > >> > >> wrote: > > > >> > >> > > > >> > >> > Thanks Andy! I tried to upload the rust packages, but for > > > >> > >> > arrow-flight > > > >> > >> > a version pin is missing from the package tree: > > > >> > >> > > > > >> > >> > error: all dependencies must have a version specified when > > > >> publishing. > > > >> > >> > dependency `arrow-flight` does not specify a version > > > >> > >> > > > > >> > >> > Please upload the packages! > > > >> > >> > > > > >> > >> > Also added Uwe and Kou to the package owners. > > > >> > >> > > > > >> > >> > On Mon, Apr 20, 2020 at 10:24 PM Andy Grove > > > >> > >> > > > > >> wrote: > > > >> > >> > > > > > >> > >> > > You should have an invite for the arrow-flight crate. Please > > > >> check > > > >> > >> > > https://crates.io/me/pending-invites > > > >> > >> > > > > > >> > >> > > On Mon, Apr 20, 2020 at 2:10 PM Krisztián Szűcs < > > > >> > >> > szucs.kriszt...@gmail.com> > > > >> > >> > > wrote: > > > >> > >>
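The cargo error quoted in this thread ("dependency `arrow-flight` does not specify a version") arises whenever a crate declared as a path-only dependency is published. A minimal sketch of the conventional fix — the crate name is real, but the version number is only illustrative:

```toml
# Hypothetical excerpt from a dependent crate's Cargo.toml.
# `cargo publish` rejects path-only dependencies; adding an explicit
# `version` lets cargo strip the path and resolve the crates.io release
# when publishing, while local workspace builds keep using the path.
[dependencies]
arrow-flight = { path = "../arrow-flight", version = "0.17.0" }
```

With both keys present, developers build against the in-repo sources and downstream users of the published crate get a pinned release, which addresses the "no real dependency management on the Flight protocol version" concern raised above.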
Re: [C++] Revamping approach to Arrow compute kernel development
On Wed, Apr 22, 2020 at 12:41 AM Micah Kornfield wrote: > > Hi Wes, > I haven't had time to read the doc, but wanted to ask some questions on > points raised on the thread. > > * For efficiency, kernels used for array-expr evaluation should write > > into preallocated memory as their default mode. This enables the > > interpreter to avoid temporary memory allocations and improve CPU > > cache utilization. Almost none of our kernels are implemented this way > > currently. > > Did something change? I was pretty sure I submitted a patch a while ago for > boolean kernels that separated out memory allocation from computation, > which should allow for writing to the same memory. Is this a concern with > the public Function APIs or the Kernel APIs themselves, or a lower-level > implementation concern? Yes, you did in the internal implementation [1]. The concern is the public API and the general approach to implementing new kernels. I'm working on this right now (it's a large project so it will take me a little while to produce something to be reviewed) so bear with me =) [1]: https://github.com/apache/arrow/commit/4910fbf4fda05b864daaba820db08291e4afdcb6#diff-561ea05d36150eb15842f452e3f07c76 > * Sorting is generally handled by different data processing nodes from > > Projections, Aggregations / Hash Aggregations, Filters, and Joins. > > Projections and Filters use expressions; they do not sort. > > Would sorting the list-column elements per row be an array-expr? Yes, as that's an element-wise function. When I said sorting I was referring to ORDER BY. The functions we have that do sorting do so in the context of a single array [2]. A query engine must be able to sort a (potentially very large) stream of record batches. One approach is for the Sort operator to exhaust its child input, accumulating all of the record batches in memory (spilling to disk as needed) and then sorting and emitting record batches from the sorted records/tuples. See e.g. 
Impala's sorting code [3] [4] [2]: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/sort_to_indices.h#L34 [3]: https://github.com/apache/impala/blob/master/be/src/runtime/sorter.h [4]: https://github.com/apache/impala/blob/master/be/src/exec/sort-node.h > > On Tue, Apr 21, 2020 at 5:35 AM Wes McKinney wrote: > > > On Tue, Apr 21, 2020 at 7:32 AM Antoine Pitrou wrote: > > > > > > > > > Le 21/04/2020 à 13:53, Wes McKinney a écrit : > > > >> > > > >> That said, in the SortToIndices case, this wouldn't be a problem, > > since > > > >> only the second pass writes to the output. > > > > > > > > This kernel is not valid for normal array-exprs (see the spreadsheet I > > > > linked), such as what you can write in SQL > > > > > > > > Kernels like SortToIndices are a different type of function (in other > > > > words, "not a SQL function") and so if we choose to allow such a > > > > "non-SQL-like" functions in the expression evaluator then different > > > > logic must be used. > > > > > > Hmm, I think that maybe I'm misunderstanding at which level we're > > > talking here. SortToIndices() may not be a "SQL function", but it looks > > > like an important basic block for a query engine (since, after all, > > > sorting results is an often used feature in SQL and other languages). > > > So it should be usable *inside* the expression engine, even though it's > > > not part of the exposed vocabulary, no? > > > > No, not as part of "expressions" as they are defined in the context of > > SQL engines. > > > > Sorting is generally handled by different data processing nodes from > > Projections, Aggregations / Hash Aggregations, Filters, and Joins. > > Projections and Filters use expressions, they do not sort. > > > > > Regards > > > > > > Antoine. > >
Re: [C++] Big-endian support
hi Kazuaki On Wed, Apr 22, 2020 at 12:41 AM Kazuaki Ishizaki wrote: > > Thank you for your comments. I see that the developers would assist with > other parts, too. > > For developing OSS on big-endian, here are resources for an environment and > CI. They would be helpful for code review, too. > A trial zLinux VM for OSS development is available. Once we create a VM > with RHEL or SLES, it is available for up to 120 days. The procedure to create > a VM is available at > https://github.com/linuxone-community-cloud/technical-resources/blob/master/deploy-virtual-server.md > . > Regarding CI, TravisCI on zLinux is available. The article is available at > https://blog.travis-ci.com/2019-11-12-multi-cpu-architecture-ibm-power-ibm-z This is good to know. I think we will need you or one of your colleagues to contribute to the setup and maintenance of this in the project's CI infrastructure. > > Kazuaki Ishizaki, > > > > From: Wes McKinney > To: dev > Date: 2020/04/21 21:11 > Subject: [EXTERNAL] Re: [C++] Big-endian support > > > > I will add that I think big-endian support would be valuable so that > the library can be used everywhere, including more exotic mainframe > type systems like IBM Z. > > That said, the code review burden to other C++ developers is likely to > become significant, so a solo developer with access to big-endian > hardware submitting pull requests could be problematic since no one > else with close knowledge of the codebase has a need to support > big-endian. That said, if big-endian developers would assist with > other parts of the C++ project as a sort of "quid-pro-quo" to balance > the time spent on code review relating to big-endian that would be > helpful. > > On Mon, Apr 20, 2020 at 12:38 PM Antoine Pitrou > wrote: > > > > > > Hello, > > > > Recently some issues have been opened for big-endian support (i.e. > > support for big-endian *hosts*), and a couple patches submitted, thanks > > to Kazuaki Ishizaki. 
See e.g.: > > > > > https://issues.apache.org/jira/browse/ARROW-8457 > > > > https://issues.apache.org/jira/browse/ARROW-8467 > > > > https://issues.apache.org/jira/browse/ARROW-8486 > > > > https://issues.apache.org/jira/browse/ARROW-8506 > > > > https://issues.apache.org/jira/browse/PARQUET-1845 > > > > > Achieving big-endian support across the C++ Arrow and Parquet > > codebases is likely to be a very significant effort, potentially > > requiring cooperation between multiple developers. An additional > > problem is that, without any Continuous Integration set up, it will be > > impossible to ensure progress and be notified of regressions. > > > > If other people are seriously interested in the desired outcome, they > > should probably team up with Kazuaki Ishizaki and discuss a practical > > plan to avoid drowning in the difficulties. > > > > Regards > > > > Antoine. > > > >
Re: [C++] Big-endian support
On Wed, Apr 22, 2020 at 12:05 AM Micah Kornfield wrote: > > > > > That said, if big-endian developers would assist with > > other parts of the C++ project as a sort of "quid-pro-quo" to balance > > the time spent on code review relating to big-endian that would be > > helpful. > > I think setting up CI would need to be included in > this, otherwise even with in-depth reviews, I think it will be easy to > forget about big endian architectures. > > An additional > > problem is that, without any Continuous Integration set up, it will be > > impossible to ensure progress and be notified of regressions. > > This might be hijacking the thread, but I think we might have similar > issues for AVX-512-specific code? Yes, this is true, but AVX-512-capable machines are significantly less exotic (I develop on one -- i9-9960X -- for example) > Thanks, > Micah > > > On Tue, Apr 21, 2020 at 5:10 AM Wes McKinney wrote: > > > I will add that I think big-endian support would be valuable so that > > the library can be used everywhere, including more exotic mainframe > > type systems like IBM Z. > > > > That said, the code review burden to other C++ developers is likely to > > become significant, so a solo developer with access to big-endian > > hardware submitting pull requests could be problematic since no one > > else with close knowledge of the codebase has a need to support > > big-endian. That said, if big-endian developers would assist with > > other parts of the C++ project as a sort of "quid-pro-quo" to balance > > the time spent on code review relating to big-endian that would be > > helpful. > > > > On Mon, Apr 20, 2020 at 12:38 PM Antoine Pitrou > > wrote: > > > > > > > > > Hello, > > > > > > Recently some issues have been opened for big-endian support (i.e. > > > support for big-endian *hosts*), and a couple patches submitted, thanks > > > to Kazuaki Ishizaki. 
See e.g.: > > > > > > https://issues.apache.org/jira/browse/ARROW-8457 > > > https://issues.apache.org/jira/browse/ARROW-8467 > > > https://issues.apache.org/jira/browse/ARROW-8486 > > > https://issues.apache.org/jira/browse/ARROW-8506 > > > https://issues.apache.org/jira/browse/PARQUET-1845 > > > > > > Achieving big-endian support across the C++ Arrow and Parquet > > > codebases is likely to be a very significant effort, potentially > > > requiring cooperation between multiple developers. An additional > > > problem is that, without any Continuous Integration set up, it will be > > > impossible to ensure progress and be notified of regressions. > > > > > > If other people are seriously interested in the desired outcome, they > > > should probably team up with Kazuaki Ishizaki and discuss a practical > > > plan to avoid drowning in the difficulties. > > > > > > Regards > > > > > > Antoine. > >
Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release
hi Micah, I'm not saying that I think the work definitely will not be completed, but rather that we should put a date on the calendar as the target date for 1.0.0 and stick to it. If the work gets done, that's great. 10 to 12 weeks from now would mean releasing 1.0.0 either the week of June 29 or July 6. That is about 1 year after we discussed and adopted our SemVer policy [1] > I would propose that if there isn't an implementation in any language we > might drop it as part of the specification. The main feature that I think > meets this criterion is the Dictionary of Dictionary columns (Is this > supported in C++)? I don't have a strong view on this, but IIUC this is implemented in JavaScript and probably not far off in C++. - Wes [1]: https://lists.apache.org/thread.html/2a630234214e590eb184c24bbf9dac4a8d8f7677d85a75fa49d70ba8%40%3Cdev.arrow.apache.org%3E On Wed, Apr 22, 2020 at 12:26 AM Micah Kornfield wrote: > > Hi Wes, > I think we might be closer than we think on the Java side to having the > functionality listed (I've added comments inline at the end with the > features you listed in the original e-mail). > > My biggest concern is I don't think there is a clear path forward for > Sparse Unions. Getting compatibility for Sparse unions would require more > invasive/breaking changes to the Java code base. [1] is the last thread on > the issue. I sadly have not had time to get back to this, nor will I > probably have time before the next release. > > I would propose that if there isn't an implementation in any language we > might drop it as part of the specification. The main feature that I think > meets this criterion is the Dictionary of Dictionary columns (Is this > supported in C++)? > > Thanks, > Micah > > > * custom_metadata fields > > Not sure about this one. > > > * Extension Types > > There is an implementation already in Java, probably needs more work for > integration testing. 
> > * Large (64-bit offset) variable size types > > There is an open PR for string/binary types. LargeList is of more > questionable value until Java supports vectors/arrays with more than 2^32 > elements. > > * Delta and Replacement Dictionaries > > There is an implementation already in Java, probably needs more work > specifically for integration testing. > > > * Unions > > There is an implementation for dense unions (likely needs more work for > integration testing). > > On Tue, Apr 21, 2020 at 11:26 AM Neal Richardson < > neal.p.richard...@gmail.com> wrote: > > > I'm all for making our next release be 1.0. Everything is about tradeoffs, > > and while I too would like to see a complete Java implementation, I think > > the costs of further delaying 1.0 outweigh the benefits of holding it > > indefinitely in hopes that there will be enough availability of Java > > developers to finish integration testing. > > > > Neal > > > > On Tue, Apr 21, 2020 at 10:55 AM Wes McKinney wrote: > > > > > hi Bryan -- with the way that things are going, if we were to block > > > the 1.0.0 release on completing the Java work, it could be a very long > > > time to wait (long time = more than 6 months from now). I don't think > > > that's acceptable. The Versioning document was formally adopted last > > > August and so a year will have soon elapsed since we previously said > > > we wanted to have everything integration tested. > > > > > > With what I'm proposing the primary things that would not be tested > > > (if no progress in Java): > > > > > > * custom_metadata fields > > > * Extension Types > > > * Large (64-bit offset) variable size types > > > * Delta and Replacement Dictionaries > > > * Unions > > > > > > These do not seem like huge sacrifices, or at least not ones that > > > compromise the stability of the columnar format. Of course, if some of > > > them are completed in the next 10-12 weeks, then that's great. 
> > > > > > - Wes > > > > > > On Tue, Apr 21, 2020 at 12:12 PM Bryan Cutler wrote: > > > > > > > > I really would like to see a 1.0.0 release with complete > > implementations > > > > for C++ and Java. From my experience, that interoperability has been a > > > > major selling point for the project. That being said, my time for > > > > contributions has been pretty limited lately and I know that Java has > > > been > > > > lagging, so if the rest of the community would like to push forward > > with > > > a > > > > reduced scope, that is okay with me. I'll still continue to do what I > > can > > > > on Java to fill in the gaps. > > > > > > > > Bryan > > > > > > > > On Tue, Apr 21, 2020 at 8:47 AM Wes McKinney > > > wrote: > > > > > > > > > Hi all -- are there some opinions about this? > > > > > > > > > > Thanks > > > > > > > > > > On Thu, Apr 16, 2020 at 5:30 PM Wes McKinney > > > wrote: > > > > > > > > > > > > hi folks, > > > > > > > > > > > > Previously we had discussed a plan for making a 1.0.0 release based > > > on > > > > > > completeness of columnar format integration tests and making > > > > > >
[jira] [Created] (ARROW-8553) [C++] Reimplement BitmapAnd using Bitmap::VisitWords
Antoine Pitrou created ARROW-8553: - Summary: [C++] Reimplement BitmapAnd using Bitmap::VisitWords Key: ARROW-8553 URL: https://issues.apache.org/jira/browse/ARROW-8553 Project: Apache Arrow Issue Type: Improvement Components: C++ Affects Versions: 0.17.0 Reporter: Antoine Pitrou Currently, {{BitmapAnd}} uses a bit-by-bit loop for unaligned inputs. Using {{Bitmap::VisitWords}} instead would probably yield a manyfold performance increase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[NIGHTLY] Arrow Build Report for Job nightly-2020-04-22-0
Arrow Build Report for Job nightly-2020-04-22-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0 Failed Tasks: - test-conda-python-3.7-turbodbc-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-circle-test-conda-python-3.7-turbodbc-latest Succeeded Tasks: - centos-6-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-github-centos-6-amd64 - centos-7-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-github-centos-7-amd64 - centos-8-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-github-centos-8-amd64 - conda-linux-gcc-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-azure-conda-linux-gcc-py36 - conda-linux-gcc-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-azure-conda-linux-gcc-py37 - conda-linux-gcc-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-azure-conda-linux-gcc-py38 - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-azure-conda-osx-clang-py36 - conda-osx-clang-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-azure-conda-osx-clang-py37 - conda-osx-clang-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-azure-conda-osx-clang-py38 - conda-win-vs2015-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-azure-conda-win-vs2015-py36 - conda-win-vs2015-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-azure-conda-win-vs2015-py37 - conda-win-vs2015-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-azure-conda-win-vs2015-py38 - debian-buster-amd64: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-github-debian-buster-amd64 - debian-stretch-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-github-debian-stretch-amd64 - gandiva-jar-osx: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-travis-gandiva-jar-osx - gandiva-jar-xenial: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-travis-gandiva-jar-xenial - homebrew-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-travis-homebrew-cpp - homebrew-r-autobrew: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-travis-homebrew-r-autobrew - test-conda-cpp-valgrind: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-github-test-conda-cpp-valgrind - test-conda-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-github-test-conda-cpp - test-conda-python-3.6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-azure-test-conda-python-3.6 - test-conda-python-3.7-dask-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-circle-test-conda-python-3.7-dask-latest - test-conda-python-3.7-hdfs-2.9.2: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-circle-test-conda-python-3.7-hdfs-2.9.2 - test-conda-python-3.7-kartothek-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-circle-test-conda-python-3.7-kartothek-latest - test-conda-python-3.7-kartothek-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-circle-test-conda-python-3.7-kartothek-master - test-conda-python-3.7-pandas-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-circle-test-conda-python-3.7-pandas-latest - 
test-conda-python-3.7-pandas-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-circle-test-conda-python-3.7-pandas-master - test-conda-python-3.7-spark-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-circle-test-conda-python-3.7-spark-master - test-conda-python-3.7-turbodbc-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-circle-test-conda-python-3.7-turbodbc-master - test-conda-python-3.7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-azure-test-conda-python-3.7 - test-conda-python-3.8-dask-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-circle-test-conda-python-3.8-dask-master - test-conda-python-3.8-jpype: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-22-0-circle-test-conda-python-3.8-jpype -