[RESULT][VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 35.0.0 RC1
On Thu, Jan 25, 2024 at 8:33 AM Andy Grove wrote:

> The vote passes with three binding +1 votes. Thanks, everyone.
>
> The release is available at
> https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-35.0.0/
Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 35.0.0 RC1
The vote passes with three binding +1 votes. Thanks, everyone.

The release is available at
https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-35.0.0/

On Sun, Jan 21, 2024 at 12:38 PM L. C. Hsieh wrote:

> +1 (binding)
>
> Agreed with Andrew. This looks like a test-only issue.
> I think we should address the Expr PartialOrd further
> (https://github.com/apache/arrow-datafusion/issues/8932), but it
> should not block the release.
>
> Thanks Andy.
>
> On Sun, Jan 21, 2024 at 3:13 AM Andrew Lamb wrote:
> >
> > +1 (binding)
> >
> > I verified it on Mac (M3).
> >
> > I got the same error in test_partial_ord and I agree it looks very much
> > the same as https://github.com/apache/arrow-datafusion/pull/8908 -- a
> > test-only issue that should not block the release
> >
> > Thanks Andy
> >
> > On Sat, Jan 20, 2024 at 10:43 AM Andy Grove wrote:
> > >
> > > Hi,
> > >
> > > I would like to propose a release of Apache Arrow DataFusion
> > > Implementation, version 35.0.0.
> > >
> > > This release candidate is based on commit:
> > > e58446bbe9ebe3f5a2aae1abd3c17a694070b0d1 [1]
> > > The proposed release tarball and signatures are hosted at [2].
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit tests,
> > > and vote on the release. The vote will be open for at least 72 hours.
> > >
> > > Only votes from PMC members are binding, but all members of the
> > > community are encouraged to test the release and vote with
> > > "(non-binding)".
> > >
> > > The standard verification procedure is documented at
> > > https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> > >
> > > [ ] +1 Release this as Apache Arrow DataFusion 35.0.0
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow DataFusion 35.0.0 because...
> > >
> > > Here is my vote:
> > >
> > > +1
> > >
> > > [1]: https://github.com/apache/arrow-datafusion/tree/e58446bbe9ebe3f5a2aae1abd3c17a694070b0d1
> > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-35.0.0-rc1
> > > [3]: https://github.com/apache/arrow-datafusion/blob/e58446bbe9ebe3f5a2aae1abd3c17a694070b0d1/CHANGELOG.md
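The "verify checksums" step in the vote email can be illustrated with a minimal Python sketch. It assumes the candidate tarball ships with a companion SHA-256 file whose first whitespace-separated token is the hex digest; the file names and the helper names `sha256_of` and `checksum_matches` are hypothetical, and the project's documented verification procedure remains the authoritative one. Signature (GPG) verification is not covered here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path) -> bool:
    """Compare an artifact's digest with the one recorded in a checksum file.

    Assumption: the checksum file starts with the hex digest, optionally
    followed by whitespace and the file name.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha256_of(artifact) == expected
```

A call such as `checksum_matches(Path("release.tar.gz"), Path("release.tar.gz.sha256"))` (placeholder names) returns `True` only when the downloaded bytes match the published digest.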
Re: [Python][Rust] Is Arrow Rust supposed to support S3-compatible storage with non-public certificates?
On a related note, version 0.9.0 switched to using the system CAs by default [1], so if you've added your private CA chain there it should work.

[1]: https://github.com/apache/arrow-rs/pull/5056
Re: [Python][Rust] Is Arrow Rust supposed to support S3-compatible storage with non-public certificates?
The ticket for supporting self-signed certificates can be found here [1].

If you switch the TLS backend to OpenSSL it may respect the SSL_CERT_FILE environment variable, but I'm not very familiar with the particulars of that library. This would require customising the Rust build, however, which may not be possible if calling from Python.

Kind Regards,

Raphael

[1]: https://github.com/apache/arrow-rs/issues/5034
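For comparison with the Rust TLS stack discussed above, here is a small diagnostic sketch using only Python's standard `ssl` module, which is OpenSSL-based. It reports which environment variable that OpenSSL build consults for a CA bundle and whether it is currently set. The helper name `describe_ca_configuration` is hypothetical, and this says nothing about how polars or arrow-rs resolve certificates; it only helps confirm that the variable is set as expected in the calling process.

```python
import os
import ssl


def describe_ca_configuration() -> dict:
    """Report the CA-bundle settings seen by Python's OpenSSL-based ssl module."""
    paths = ssl.get_default_verify_paths()
    env_name = paths.openssl_cafile_env  # typically "SSL_CERT_FILE"
    return {
        "env_var": env_name,                    # variable name OpenSSL checks
        "env_value": os.environ.get(env_name),  # None when the variable is unset
        "effective_cafile": paths.cafile,       # None when no such file exists
    }


if __name__ == "__main__":
    for key, value in describe_ca_configuration().items():
        print(f"{key}: {value}")
```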
Re: [IPC] Delta Dictionary Flag Clarification for Multi-Batch IPC
Hello,

My own answers:

1) isDelta should be true only when a delta is being transmitted (to be appended to the existing dictionary with the same id); it should be false when a full dictionary is being transmitted (to replace the existing dictionary with the same id, if any)
2) yes, it could
3) yes
4) there's no reason it can't be valid

Regards

Antoine.

On 25/01/2024 at 07:25, Micah Kornfield wrote:

Hi Chris,

My interpretations:

1) I'm not sure it is clearly defined, but my impression is the first dictionary is never a delta dictionary (option 1)
2) I don't think they are prevented from switching state (which I suppose is more complicated?) but hopefully not by much?
3) Dictionaries are reused across batches unless replaced.
4) I'm not sure I understand this question. Dictionary should be passed independently of indexes?

Thanks,
Micah

On Fri, Jan 19, 2024 at 1:55 PM Chris Larsen wrote:

Hi folks,

I'm working on multi-batch dictionary with delta support in Java [1] and would like some clarifications. Given the "isDelta" flag in the dictionary message [2], when should this be set to "true"?

1) If we have a dictionary with an ID of 1 that we want to delta encode and it is used across multiple batches, should the initial batch have `isDelta=false` then subsequent batches have `isDelta=true`? E.g.

batch 1, dict 1, isDelta=false, dictVector=[a, b, c], indexVector=[0, 1, 1, 2]
batch 2, dict 1, isDelta=true, dictVector=[d], indexVector=[2, 3, 0, 1]
batch 3, dict 1, isDelta=true, dictVector=[e], indexVector=[0, 4]

Or should the flag be true for the entire IPC flow? E.g.

batch 1, dict 1, isDelta=true, dictVector=[a, b, c], indexVector=[0, 1, 1, 2]
batch 2, dict 1, isDelta=true, dictVector=[d], indexVector=[2, 3, 0, 1]
batch 3, dict 1, isDelta=true, dictVector=[e], indexVector=[0, 4, 3]

Either works for me.

2) Could (in stream, not file IPCs) a single dictionary ever switch state across batches from delta to replacement mode or vice-versa? E.g.

batch 1, dict 1, isDelta = true, dictVector=[a, b, c], indexVector=[0, 1, 1, 2]
batch 2, dict 1, isDelta = true, dictVector=[d], indexVector=[2, 3, 0, 1]
batch 3, dict 1, isDelta = false, dictVector=[c, a, d], indexVector=[0, 1, 2]

I'd like to keep the protocol and API simple and assume switching is not allowed. This would mean the 2nd example above would be canonical.

3) Are replacement dictionaries required to be serialized for every batch or is a dictionary re-used across batches until a replacement is received? The CPP IPC API has 'unify_dictionaries' [3] that mentions "a column with a dictionary type must have the same dictionary in each record batch". I assume (and prefer) the latter, that replacements are serialized once and re-used. E.g.

batch 1, dict 1, isDelta = false, dictVector=[a, b, c], indexVector=[0, 1, 1, 2]
batch 2, dict 1, isDelta = false, dictVector=[], indexVector=[2, 1, 0, 1] // use previous dictionary
batch 3, dict 1, isDelta = false, dictVector=[c, a, d], indexVector=[0, 1, 2] // replacement

And I assume that 'unify_dictionaries' simply concatenates all dictionaries into a single vector serialized in the first batch (haven't looked at the code yet).

4) Is it valid for a delta dictionary to have an update in a subsequent batch even though the update is not used in that batch? A silly example would be:

batch 1, dict 1, isDelta = true, dictVector=[a, b, c], indexVector=[0, 1, 1, 2]
batch 2, dict 1, isDelta = true, dictVector=[d], indexVector=[null, null, null, null]
batch 3, dict 1, isDelta = true, dictVector=[], indexVector=[0, 3, 2]

Thanks for your help!

[1] https://github.com/apache/arrow/pull/38423
[2] https://github.com/apache/arrow/blob/main/format/Message.fbs#L134
[3] https://arrow.apache.org/docs/cpp/api/ipc.html#_CPPv4N5arrow3ipc15IpcWriteOptions18unify_dictionariesE

--
Chris Larsen
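The semantics described in Antoine's answer (a delta is appended to the existing dictionary with the same id, a non-delta replaces it, and the dictionary is reused until then) can be modelled with a minimal pure-Python sketch. This is an illustration of reader-side state only, not the Arrow Java or C++ API; the function names are hypothetical, and treating a delta for an unknown id as an error is an assumption of this sketch rather than something stated in the thread.

```python
from typing import Dict, List, Optional, Sequence


def apply_dictionary_batch(
    dictionaries: Dict[int, List[str]],
    dict_id: int,
    values: Sequence[str],
    is_delta: bool,
) -> None:
    """Update the reader's dictionary state for one dictionary message."""
    if is_delta:
        # Assumption of this sketch: a delta needs an existing dictionary.
        if dict_id not in dictionaries:
            raise ValueError(f"delta for unknown dictionary id {dict_id}")
        dictionaries[dict_id] = dictionaries[dict_id] + list(values)
    else:
        # A full dictionary replaces any previous one with the same id.
        dictionaries[dict_id] = list(values)


def decode(
    dictionaries: Dict[int, List[str]],
    dict_id: int,
    indices: Sequence[Optional[int]],
) -> List[Optional[str]]:
    """Resolve indices against the current dictionary; nulls stay null."""
    current = dictionaries[dict_id]
    return [None if i is None else current[i] for i in indices]


state: Dict[int, List[str]] = {}
apply_dictionary_batch(state, 1, ["a", "b", "c"], is_delta=False)
batch1 = decode(state, 1, [0, 1, 1, 2])   # ['a', 'b', 'b', 'c']
apply_dictionary_batch(state, 1, ["d"], is_delta=True)
batch2 = decode(state, 1, [2, 3, 0, 1])   # ['c', 'd', 'a', 'b']
apply_dictionary_batch(state, 1, ["c", "a", "d"], is_delta=False)
batch3 = decode(state, 1, [0, 1, 2])      # ['c', 'a', 'd']
```

The last step mirrors the delta-to-replacement switch from question 2: after the replacement, index 0 resolves to "c" rather than "a".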
[Python][Rust] Is Arrow Rust supposed to support S3-compatible storage with non-public certificates?
Since my question remained unanswered on the user list, I dare to ask again on the dev list:

While experimenting with polars [1] (which is based on arrow-rs) I found that it's not possible to read a single file from our on-prem S3-compatible storage.

Any attempts result in SSL error messages:

error trying to connect: invalid peer certificate: UnknownIssuer

Such SSL errors are well-known to us and usually get fixed by setting the environment variable SSL_CERT_FILE (or something similar) pointing to our company's certstore.

polars seems to ignore that env var.

Now it's unclear to me whether this is an issue of polars or arrow-rs (or anything else).

For more details see [2].

[1] https://pola.rs/
[2] https://github.com/pola-rs/polars/issues/13741