Re: [ANNOUNCE] New Arrow PMC chair: Wes McKinney

2020-10-26 Thread Bryan Cutler
Congrats Wes, well deserved!

On Sun, Oct 25, 2020, 10:17 PM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Thanks a lot Jacques for taking the flag until now, and congratulations,
> Wes!
>
> On Sun, Oct 25, 2020 at 2:58 PM Wes McKinney  wrote:
>
> > Thanks all!
> >
> > On Sun, Oct 25, 2020 at 6:29 AM Krisztián Szűcs
> >  wrote:
> > >
> > > Congrats Wes!
> > >
> > > On Sun, Oct 25, 2020 at 2:40 AM David Li 
> wrote:
> > > >
> > > > Congratulations Wes!
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On 10/24/20, Li Jin  wrote:
> > > > > Congrats Wes!
> > > > >
> > > > > On Sat, Oct 24, 2020 at 10:05 AM Ying Zhou 
> > wrote:
> > > > >
> > > > >> Congratulations Wes! :)
> > > > >>
> > > > >> Ying
> > > > >>
> > > > >> > On Oct 23, 2020, at 7:35 PM, Jacques Nadeau  >
> > wrote:
> > > > >> >
> > > > >> > I am pleased to announce that we have a new PMC chair and VP as
> > per our
> > > > >> > newly started tradition of rotating the chair once a year. I
> have
> > > > >> resigned
> > > > >> > and Wes was duly elected by the PMC and approved unanimously by
> > the
> > > > >> board.
> > > > >> >
> > > > >> > Please join me in congratulating Wes!
> > > > >> >
> > > > >> > Jacques
> > > > >>
> > > > >>
> > > > >
> >
>


Re: mutual TLS peer_identity in arrow flight

2020-10-26 Thread Radu Teodorescu
Thank you folks,
A PR says more than a thousand words :) 
https://github.com/apache/arrow/pull/8537 


Certainly a much less ambitious change than what is proposed in the auth
redesign (which I am looking forward to, btw).
To James’ concern: the auth information is only passed on from the SSL layer if 
there is client authentication (i.e. mTLS) and there is no authentication 
handler defined, so it will have no impact on any use case where one uses an 
authentication handler.
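
The rule described above can be sketched in a few lines. This is only an illustration in plain Python under stated assumptions, not the code in the PR and not the Flight API: `TlsAuthContext`, `resolve_peer_identity`, and the sample identities are hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TlsAuthContext:
    """Hypothetical stand-in for the transport-level (gRPC) auth context."""
    peer_authenticated: bool
    peer_identity: str = ""


def resolve_peer_identity(
    tls_ctx: TlsAuthContext,
    auth_handler: Optional[Callable[[], str]] = None,
) -> str:
    """Pick the identity to expose as peer_identity.

    An auth handler, when defined, always wins. The TLS identity is only a
    fallback, so setups that already use a handler see no change.
    """
    if auth_handler is not None:
        return auth_handler()
    if tls_ctx.peer_authenticated:
        return tls_ctx.peer_identity
    return ""  # no handler and no client certificate: identity stays empty


mtls = TlsAuthContext(peer_authenticated=True, peer_identity="CN=client-a")
plain = TlsAuthContext(peer_authenticated=False)

print(resolve_peer_identity(mtls))
print(resolve_peer_identity(mtls, auth_handler=lambda: "token-user"))
print(repr(resolve_peer_identity(plain)))
```

The second call shows the case James raised: with a handler present, the mTLS identity is ignored, exactly as before the change.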

Let me know if you want me to make any further changes.
Radu

> On Oct 26, 2020, at 8:22 PM, James Duong  wrote:
> 
> The authentication redesign goes further down the path of getting the peer
> identity from the authentication information.
> I would say that getting the peer context through mTLS is valid, but we
> shouldn't change the behavior of
> existing implementations of FlightProducers that get this from the auth
> handler. In other words, it should be
> an option to get peer identity this way that you can opt into.
> 
> On Mon, Oct 26, 2020 at 4:41 PM David Li  wrote:
> 
>> Hey Radu,
>> 
>> That sounds fine to me, presumably if someone layers an authentication
>> handler on top of mTLS, they don't want the mTLS identity anymore.
>> 
>> Also note there's another auth redesign ongoing, though I don't think
>> that conflicts with this, but maybe the authors there might think
>> about how/if mTLS fits their design.
>> 
>> https://lists.apache.org/thread.html/r485888f4f818e8e4722dc6c53491fb4c68ee7ac16d1c769612e61d21%40%3Cdev.arrow.apache.org%3E
>> 
>> Best,
>> David
>> 
>> On 10/26/20, Radu Teodorescu  wrote:
>>> Hi,
>>> I have a follow up question/feature proposal in the context of mutual TLS
>>> (introduced by https://issues.apache.org/jira/browse/ARROW-8742):
>>> In the context of mutual TLS the client is authenticated at TLS level and
>>> the client identity is available in the grpc context’s authentication
>>> context but that information is not propagated to the peer_identity in
>> the
>>> arrow flight context.
>>> This is because Flight has its own authentication mechanism and the TLS
>>> client authentication was added afterwards without connecting the two.
>>> 
>>> I suggest the following change to remedy the above (and I am happy to
>>> deliver it myself):
>>> 
>>> In the case where the client is authenticated by the gRPC/TLS layer, I
>>> can have the flight_context.peer_identity default to the PeerIdentity as
>>> stored in the grpc auth_context.
>>> Pros: it’s a 4 line change and it would work out of the box for both
>> python
>>> and C++ with no public interface changes and no relevant observed
>> behavior
>>> for existing code (except for peer_identity context field being properly
>>> populated instead of empty).
>>> Cons: If there is a flight Authentication Handler, the lower level
>> identity
>>> would be ignored (but that is the case in the current implementation
>>> already).
>>> 
>>> I can send out a PR unless there is another solution in the works
>> 
> 
> 
> -- 
> 
> *James Duong*
> Lead Software Developer
> Bit Quill Technologies Inc.
> Direct: +1.604.562.6082 | jam...@bitquilltech.com
> https://www.bitquilltech.com
> 
> This email message is for the sole use of the intended recipient(s) and may
> contain confidential and privileged information.  Any unauthorized review,
> use, disclosure, or distribution is prohibited.  If you are not the
> intended recipient, please contact the sender by reply email and destroy
> all copies of the original message.  Thank you.



Re: mutual TLS peer_identity in arrow flight

2020-10-26 Thread James Duong
The authentication redesign goes further down the path of getting the peer
identity from the authentication information.
I would say that getting the peer context through mTLS is valid, but we
shouldn't change the behavior of
existing implementations of FlightProducers that get this from the auth
handler. In other words, it should be
an option to get peer identity this way that you can opt into.

On Mon, Oct 26, 2020 at 4:41 PM David Li  wrote:

> Hey Radu,
>
> That sounds fine to me, presumably if someone layers an authentication
> handler on top of mTLS, they don't want the mTLS identity anymore.
>
> Also note there's another auth redesign ongoing, though I don't think
> that conflicts with this, but maybe the authors there might think
> about how/if mTLS fits their design.
>
> https://lists.apache.org/thread.html/r485888f4f818e8e4722dc6c53491fb4c68ee7ac16d1c769612e61d21%40%3Cdev.arrow.apache.org%3E
>
> Best,
> David
>
> On 10/26/20, Radu Teodorescu  wrote:
> > Hi,
> > I have a follow up question/feature proposal in the context of mutual TLS
> > (introduced by https://issues.apache.org/jira/browse/ARROW-8742):
> > In the context of mutual TLS the client is authenticated at TLS level and
> > the client identity is available in the grpc context’s authentication
> > context but that information is not propagated to the peer_identity in
> the
> > arrow flight context.
> > This is because Flight has its own authentication mechanism and the TLS
> > client authentication was added afterwards without connecting the two.
> >
> > I suggest the following change to remedy the above (and I am happy to
> > deliver it myself):
> >
> > In the case where the client is authenticated by the gRPC/TLS layer, I
> > can have the flight_context.peer_identity default to the PeerIdentity as
> > stored in the grpc auth_context.
> > Pros: it’s a 4 line change and it would work out of the box for both
> python
> > and C++ with no public interface changes and no relevant observed
> behavior
> > for existing code (except for peer_identity context field being properly
> > populated instead of empty).
> > Cons: If there is a flight Authentication Handler, the lower level
> identity
> > would be ignored (but that is the case in the current implementation
> > already).
> >
> > I can send out a PR unless there is another solution in the works
>


-- 

*James Duong*
Lead Software Developer
Bit Quill Technologies Inc.
Direct: +1.604.562.6082 | jam...@bitquilltech.com
https://www.bitquilltech.com



Re: mutual TLS peer_identity in arrow flight

2020-10-26 Thread David Li
Hey Radu,

That sounds fine to me, presumably if someone layers an authentication
handler on top of mTLS, they don't want the mTLS identity anymore.

Also note there's another auth redesign ongoing, though I don't think
that conflicts with this, but maybe the authors there might think
about how/if mTLS fits their design.
https://lists.apache.org/thread.html/r485888f4f818e8e4722dc6c53491fb4c68ee7ac16d1c769612e61d21%40%3Cdev.arrow.apache.org%3E

Best,
David

On 10/26/20, Radu Teodorescu  wrote:
> Hi,
> I have a follow up question/feature proposal in the context of mutual TLS
> (introduced by https://issues.apache.org/jira/browse/ARROW-8742):
> In the context of mutual TLS the client is authenticated at TLS level and
> the client identity is available in the grpc context’s authentication
> context but that information is not propagated to the peer_identity in the
> arrow flight context.
> This is because Flight has its own authentication mechanism and the TLS
> client authentication was added afterwards without connecting the two.
>
> I suggest the following change to remedy the above (and I am happy to
> deliver it myself):
>
> In the case where the client is authenticated by the gRPC/TLS layer, I can
> have the flight_context.peer_identity default to the PeerIdentity as stored
> in the grpc auth_context.
> Pros: it’s a 4 line change and it would work out of the box for both python
> and C++ with no public interface changes and no relevant observed behavior
> for existing code (except for peer_identity context field being properly
> populated instead of empty).
> Cons: If there is a flight Authentication Handler, the lower level identity
> would be ignored (but that is the case in the current implementation
> already).
>
> I can send out a PR unless there is another solution in the works


mutual TLS peer_identity in arrow flight

2020-10-26 Thread Radu Teodorescu
Hi,
I have a follow up question/feature proposal in the context of mutual TLS 
(introduced by https://issues.apache.org/jira/browse/ARROW-8742):
In the context of mutual TLS the client is authenticated at TLS level and the 
client identity is available in the grpc context’s authentication context but 
that information is not propagated to the peer_identity in the arrow flight 
context.
This is because Flight has its own authentication mechanism and the TLS client 
authentication was added afterwards without connecting the two.

I suggest the following change to remedy the above (and I am happy to deliver
it myself):

In the case where the client is authenticated by the gRPC/TLS layer, I can have 
the flight_context.peer_identity default to the PeerIdentity as stored in the 
grpc auth_context. 
Pros: it’s a 4 line change and it would work out of the box for both python and 
C++ with no public interface changes and no relevant observed behavior for 
existing code (except for peer_identity context field being properly populated 
instead of empty).
Cons: If there is a flight Authentication Handler, the lower level identity 
would be ignored (but that is the case in the current implementation already).

I can send out a PR unless there is another solution in the works

[c++] Futures API review & help understanding benchmark result

2020-10-26 Thread Weston Pace
Hi all,

I've completed the initial composable futures API and iterator work.
The CSV reader portion is still WIP.

First, I'm interested in getting any feedback on the futures API, in
particular Future.Then in future.h (and the type-erased
Composable.Compose).  The actual implementation can probably be
cleaned up with regard to DRY (the 10 specializations of the Continue
function), which I plan to do at the end.

This approach is a little different than my earlier prototype.  In the
prototype it would always submit continuations on the thread pool as
new tasks.  Instead I've changed it so continuations will run
synchronously when the future is marked complete.  If there is a
desire to move the continuation into a thread pool task it can be done
with Executor.Transfer.  As an example usage this is done in
AsyncForEachHelper so that the applied for-each function is not run on
the reader thread.
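
To make the difference concrete, here is a toy model of the two behaviors. It is a sketch in plain Python, not the C++ code on the branch: `SimpleFuture`, `then`, and `transfer` are hypothetical names that only mimic the idea of Future.Then and Executor.Transfer, and error propagation is left out entirely.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SimpleFuture:
    """Toy future: callbacks run synchronously on the thread that completes it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = False
        self._value = None
        self._callbacks = []

    def mark_finished(self, value):
        with self._lock:
            self._done = True
            self._value = value
            callbacks, self._callbacks = self._callbacks, []
        for cb in callbacks:  # run outside the lock, on the completing thread
            cb(value)

    def add_callback(self, cb):
        with self._lock:
            if not self._done:
                self._callbacks.append(cb)
                return
            value = self._value
        cb(value)  # already finished: run immediately on the calling thread

    def then(self, fn):
        """Return a future for fn(result); fn runs where this future completes."""
        nxt = SimpleFuture()
        self.add_callback(lambda v: nxt.mark_finished(fn(v)))
        return nxt

    def transfer(self, executor):
        """Return a future that is completed from a task on `executor`."""
        nxt = SimpleFuture()
        self.add_callback(lambda v: executor.submit(nxt.mark_finished, v))
        return nxt


source = SimpleFuture()
main_thread = threading.current_thread().name
seen = {}
finished = threading.Event()

# Plain continuation: runs on whichever thread calls mark_finished.
source.then(lambda v: seen.setdefault("sync", threading.current_thread().name))


def record_pool(v):
    seen["pool"] = threading.current_thread().name
    finished.set()


with ThreadPoolExecutor(max_workers=1, thread_name_prefix="pool") as pool:
    # Transferred continuation: runs on a pool thread instead.
    source.transfer(pool).then(record_pool)
    source.mark_finished(42)
    finished.wait(timeout=5)

print(seen)
```

The first continuation records the name of the completing thread, while the transferred one records a pool thread, which is the property AsyncForEachHelper relies on to keep work off the reader thread.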

Second, and perhaps what I'm more interested in, I've switched
ThreadedTaskGroup to using futures (e.g. using Compose to add a
callback that calls OneTaskDone instead of making a wrapper lambda
function).  In theory this should be more or less the exact same work
as the previous task group implementation.  However, I am seeing
noticeable overhead in arrow-thread-pool-benchmark for small tasks.
The benchmark runs on my system at ~950k items/s for no task group,
~890k items/s with the old task group implementation, and ~450k
items/s with the futures based implementation.  The change is isolated
to one method in task_group.cc so if you replace the method at line
102 with the commented out version at line 127 the original
performance returns.  I've verified that the task is not getting
copied.  There are a few extra moves and function calls, and futures
have to be created and copied around, so it is possible that is the
cause, but I'm curious whether a second pair of eyes could see some
other cause for the degradation that I am missing.  I'll also see if I
can get gprof running later in hopes that it can provide some insight.
However, I probably won't spend too much more time on it before
finishing up the CSV reader work and checking the performance of the
CSV reader.
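
For a sense of scale, the throughput figures above can be turned into an average cost per item. The snippet below is plain arithmetic on the three quoted numbers; it assumes the rates are aggregate items per second, so the result is an average per-item cost rather than a measured latency, and the dictionary keys are just labels chosen for the example.

```python
# Throughput figures quoted above, in items per second.
throughput = {
    "no task group": 950_000,
    "old task group": 890_000,
    "futures-based task group": 450_000,
}


def micros_per_item(items_per_second):
    """Average time spent per item, in microseconds."""
    return 1e6 / items_per_second


cost = {name: micros_per_item(rate) for name, rate in throughput.items()}
for name, us in cost.items():
    print(f"{name}: {us:.2f} us/item")

# Extra average cost per task of the futures-based version over the old one.
extra = cost["futures-based task group"] - cost["old task group"]
print(f"extra cost per task with futures: {extra:.2f} us")
```

On these numbers the futures-based version costs roughly 1.1 microseconds more per task than the old task group, which only matters when the tasks themselves are very small.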

If I can't figure out the cause of the performance I can always allow
task group to keep the implementation it has for Append(task) while
using future for Append(future).  I suspect that the CSV reader tasks
are long enough tasks that the overhead won't be an issue.

Code: 
https://github.com/apache/arrow/compare/master...westonpace:feature/arrow-10183?expand=1

-Weston


[Rust] Merging the Parquet Arrow Branch

2020-10-26 Thread Neville Dipale
Hi Arrow devs,
We've been working on a Parquet writer on a separate branch, mainly to
expedite merging PRs in case there weren't enough reviewers. I wasn't
comfortable merging the work by 2.0, so I opted not to get it merged for
the release.

There are people who have expressed an interest in using the WIP writer,
and at the same time, we're starting to diverge to the point where rebases
are becoming a lot of work for me.

What process should I follow to get the changes merged in?
The branch has 9 commits + 1 which should be merged in the next 2 days.
I'd like to preserve the commit history if possible, instead of squashing
them.

Thanks
Neville


[NIGHTLY] Arrow Build Report for Job nightly-2020-10-26-0

2020-10-26 Thread Crossbow


Arrow Build Report for Job nightly-2020-10-26-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0

Failed Tasks:
- conda-win-vs2017-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-win-vs2017-py37
- gandiva-jar-osx:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-travis-homebrew-cpp
- nuget:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-github-nuget
- test-conda-python-3.7-spark-branch-3.0:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-github-test-conda-python-3.7-spark-branch-3.0
- test-conda-python-3.8-jpype:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-github-test-conda-python-3.8-jpype
- test-conda-python-3.8-spark-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-github-test-conda-python-3.8-spark-master
- test-conda-r-4.0:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-github-test-conda-r-4.0
- wheel-osx-high-sierra-cp35m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-travis-wheel-osx-high-sierra-cp35m

Succeeded Tasks:
- centos-6-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-github-centos-6-amd64
- centos-7-aarch64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-travis-centos-7-aarch64
- centos-7-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-github-centos-7-amd64
- centos-8-aarch64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-travis-centos-8-aarch64
- centos-8-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-github-centos-8-amd64
- conda-clean:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-clean
- conda-linux-gcc-py36-aarch64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-drone-conda-linux-gcc-py36-aarch64
- conda-linux-gcc-py36-cpu:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-linux-gcc-py36-cpu
- conda-linux-gcc-py36-cuda:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-linux-gcc-py36-cuda
- conda-linux-gcc-py37-aarch64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-drone-conda-linux-gcc-py37-aarch64
- conda-linux-gcc-py37-cpu:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-linux-gcc-py37-cpu
- conda-linux-gcc-py37-cuda:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-linux-gcc-py37-cuda
- conda-linux-gcc-py38-aarch64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-drone-conda-linux-gcc-py38-aarch64
- conda-linux-gcc-py38-cpu:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-linux-gcc-py38-cpu
- conda-linux-gcc-py38-cuda:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-linux-gcc-py38-cuda
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-osx-clang-py38
- conda-win-vs2017-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-win-vs2017-py36
- conda-win-vs2017-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-azure-conda-win-vs2017-py38
- debian-buster-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-github-debian-buster-amd64
- debian-buster-arm64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-travis-debian-buster-arm64
- debian-stretch-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-github-debian-stretch-amd64
- debian-stretch-arm64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-26-0-travis-debian-stretch-arm64
- example-cpp-minimal-build-static-system-dependency:
  URL: 
https://github.com/ursa-labs/