Pcap2Arrow - Packet capture and data conversion tool to Apache Arrow on the fly

2021-02-15 Thread Kohei KaiGai
Hello,

Let me share my recent works below:
https://github.com/heterodb/pg-strom/wiki/804:-Pcap2Arrow

This standalone command-line tool captures network packets from network
interface devices, converts them into the Apache Arrow data format
according to a pre-defined schema for each supported protocol
(TCP, UDP, and ICMP, over IPv4 and IPv6), and then writes out the
destination files.
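
For illustration, a minimal sketch of what one such per-protocol schema could
look like with the Arrow C++ API (the column names and types below are
hypothetical, not necessarily Pcap2Arrow's actual schema):

    #include <arrow/api.h>
    #include <memory>

    // Hypothetical schema for captured TCP-over-IPv4 packets.
    std::shared_ptr<arrow::Schema> MakeTcpV4Schema() {
      return arrow::schema({
          arrow::field("timestamp", arrow::timestamp(arrow::TimeUnit::MICRO)),
          arrow::field("src_addr", arrow::uint32()),
          arrow::field("dst_addr", arrow::uint32()),
          arrow::field("src_port", arrow::uint16()),
          arrow::field("dst_port", arrow::uint16()),
          arrow::field("payload", arrow::binary()),
      });
    }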

It internally uses PF_RING [*1] to support fast network interface cards
(> 10Gb) and to minimize packet loss by utilizing multiple CPU cores.
Although I confirmed that Pcap2Arrow can write out the captured network
packets at a rate of more than 50Gb/s, my test cases use artificial and
biased traffic patterns. If you can test the software in your environment,
that would help improve it.
[*1] https://www.ntop.org/products/packet-capture/pf_ring/

As you may know, network traffic data tends to grow very large, so it is
not easy to import it into database systems for analytics. Once we convert
it into Apache Arrow, we don't need to import the captured data again;
we just map the files prior to analytics.
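
As a rough sketch of the "just map the files" step with the Arrow C++ API,
assuming the output is a standard Arrow IPC file (path handling and error
handling kept minimal):

    #include <arrow/api.h>
    #include <arrow/io/file.h>
    #include <arrow/ipc/reader.h>
    #include <iostream>
    #include <string>

    arrow::Status InspectCapture(const std::string& path) {
      // Memory-map the captured file instead of importing it anywhere.
      ARROW_ASSIGN_OR_RAISE(auto mapped,
          arrow::io::MemoryMappedFile::Open(path, arrow::io::FileMode::READ));
      ARROW_ASSIGN_OR_RAISE(auto reader,
          arrow::ipc::RecordBatchFileReader::Open(mapped));
      std::cout << "record batches: " << reader->num_record_batches() << std::endl;
      return arrow::Status::OK();
    }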

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


Re: [C++] Conventions around C++ shared_ptr in the code base?

2021-02-15 Thread Micah Kornfield
(Apologies if this is a double send)

I'll open a PR soon to update the dev guide.

Given this standard, there are a few accessor methods that I think we should
either convert or supplement with a new accessor that does the correct thing
with respect to return type.  Given how core these methods are, I think the
latter might be the better approach (but I don't feel too strongly if others
have a good rationale one way or another):
1. Array::Data() [1] - Looking at some CPU profiles, it seems that most of
the time spent in Validate is due to shared_ptr construction/destruction.
In auditing the code, this method appears to be the only accessor returning
copies.

2. RecordBatch::Column* [2] - These are more questionable: since they are
virtual methods, it is not clear whether dynamic record batches were the
intention behind this design, so it might not be worth it.  Anecdotally,
I've known people who wrote naive iteration code using these methods where
shared_ptr construction/destruction contributed 10% overhead.

Thanks,
Micah



[1]
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_base.h#L163
[2]
https://github.com/apache/arrow/blob/master/cpp/src/arrow/record_batch.h#L98
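
To make the return-type distinction concrete, here is a minimal, hypothetical
sketch (stand-in types, not the actual Arrow declarations) contrasting a
by-value accessor with a const-ref one:

    #include <memory>
    #include <vector>

    struct ArrayData {  // stand-in for arrow::ArrayData
      std::vector<int64_t> buffers;
    };

    class ArrayLike {  // stand-in for arrow::Array
     public:
      // By-value accessor: every call copies the shared_ptr, i.e. one atomic
      // refcount increment on return and one decrement when the copy dies.
      std::shared_ptr<ArrayData> data_by_value() const { return data_; }

      // Const-ref accessor: no refcount traffic; the caller copies only when
      // it actually needs shared ownership.
      const std::shared_ptr<ArrayData>& data_by_ref() const { return data_; }

     private:
      std::shared_ptr<ArrayData> data_ = std::make_shared<ArrayData>();
    };

    void HotLoop(const ArrayLike& array) {
      for (int i = 0; i < 1000000; ++i) {
        // Binding to a const reference avoids the per-iteration refcount cost
        // that naive iteration pays with a by-value accessor.
        const std::shared_ptr<ArrayData>& data = array.data_by_ref();
        (void)data->buffers.size();
      }
    }

In validation- or iteration-heavy paths, it is exactly the atomic refcount
traffic from the by-value form that shows up in the profiles mentioned above.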

On Mon, Feb 8, 2021 at 10:09 AM Wes McKinney  wrote:

> Agreed. We should probably document this in the C++ developer docs.
>
> On Mon, Feb 8, 2021 at 12:04 PM Antoine Pitrou  wrote:
> >
> >
> > Hi Micah,
> >
> > That's roughly my mental model as well.
> >
> > However, for 4) I would say that returning a const ref to shared_ptr is
> > preferable, because the caller will often need the ownership (especially
> > with Array, ArrayData, DataType, etc.).
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 08/02/2021 at 18:02, Micah Kornfield wrote:
> > > I'm not sure how consistent we are with how shared_ptr is used as a
> > > parameter to methods and as a return type.  In reviewing and writing
> > > code I've been using these guidelines for myself, and I was wondering
> > > if they align with others:
> > >
> > > 1.  If a copy of a shared_ptr is not intended to be made by the method,
> > > then use a const ref to the underlying type, i.e. void Foo(const Array&
> > > array) is preferable to void Foo(const shared_ptr<Array>& array) [1].
> > >
> > > 2.  If a copy is always going to be made, pass by value, i.e. void
> > > Foo(std::shared_ptr<Array> array), and std::move within the method.  The
> > > last time I researched this, passing by value allowed eliminating the
> > > shared_ptr overhead if the caller can also std::move() the parameter.
> > >
> > > 3.  If a copy might be made, pass the shared_ptr by const reference,
> > > i.e. void Foo(const shared_ptr<Array>& array).  The exception to this is
> > > if the contents of the shared_ptr can effectively be copied cheaply, as
> > > is the case with Array via ArrayData, in which case #1 applies.
> > >
> > > 4.  For accessor methods, prefer returning by const ref, or by const ref
> > > to the underlying type when appropriate, i.e. const std::shared_ptr<Foo>&
> > > foo() or const Array& Foo().
> > >
> > > 5.  For factory-like methods, return a copy, i.e. std::shared_ptr<Foo>
> > > MakeFoo();
> > >
> > > Is this other people's mental model?  I'd like to update our style guide
> > > so we can hopefully drive consistency over time.
> > >
> > > Thanks,
> > > Micah
> > >
> > > [1] Array is somewhat of a special case because one can have essentially
> > > the same shared_ptr copy semantics by copying the underlying ArrayData
> > > object.
> > >
>
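
For reference, a minimal, hypothetical sketch of parameter conventions 1-3
quoted above (illustrative types only, not Arrow code):

    #include <memory>
    #include <utility>

    struct Array {  // stand-in for arrow::Array
      int64_t length = 0;
    };

    // 1. The method makes no copy of the shared_ptr: take a const ref to the
    //    underlying type.
    int64_t Length(const Array& array) { return array.length; }

    class Consumer {
     public:
      // 2. A copy is always made: take the shared_ptr by value and std::move
      //    it, so a caller that also moves pays no refcount bump at all.
      void SetArray(std::shared_ptr<Array> array) { array_ = std::move(array); }

      // 3. A copy might be made: take the shared_ptr by const ref and copy
      //    only on the branch that actually keeps it.
      void MaybeKeep(const std::shared_ptr<Array>& array) {
        if (array && array->length > 0) array_ = array;
      }

     private:
      std::shared_ptr<Array> array_;
    };

On the caller side, convention 2 lets consumer.SetArray(std::move(my_array))
hand over ownership without any refcount traffic.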


Re: Documenting the dataset/compute/expression APIs

2021-02-15 Thread Micah Kornfield
Hi Aldrin,
I would guess there aren't too many JIRA items beyond what you linked for
documentation of the compute API.

-Micah

On Fri, Feb 12, 2021 at 2:45 PM Aldrin  wrote:

> Hello!
>
> I am interested in exploring the compute and expression APIs for pushdown
> filters, and I expect some of the use cases to overlap with Gandiva and with
> efforts towards Flight SQL. I feel like the design and API documentation for
> these is sparse, or I am simply bad at finding it, and I wanted to help
> consolidate related documentation as I work on (or with) them.
>
> I have searched JIRA and found the following related issues:
>
>- [ARROW-9392] Document more of the compute layer
>
>- [ARROW-8894] C++ array kernels... buildout (umbrella)
>
>
>
> And here is a related design doc:
>
>
> https://docs.google.com/document/d/1LFk3WRfWGQbJ9uitWwucjiJsZMqLh8lC1vAUOscLtj8
>
>
> There are a few others under the above umbrella, but nothing that seemed
> totally germane. For reference, here's the JIRA search I used, so if you
> can improve it to make finding this sort of thing easier, it would be
> appreciated:
>
> project = ARROW AND component = "C++" AND component = Documentation AND
> (text ~ "dataset" OR text ~ "expression" OR description ~ "dataset" OR
> description ~ "expression")
>
> Specifically, I'm interested in C++ rather than python (though, I suppose
> pyarrow documentation can help with the C++ documentation?).
>
> I wanted to ping here in case anyone has materials to gather, and also in
> case anyone knows of materials I've missed.
>
> Thanks!
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>


Re: Threading Improvements Proposal

2021-02-15 Thread Micah Kornfield
I took a pass through this; thank you for a good discussion of the
alternatives.  One thing that I don't quite understand about this proposal is
its scope.  Is the intention that most APIs will eventually work with
Futures instead of raw return values (i.e. returning a Table or RecordBatch
directly will never be a thing, but instead you get references to
Future<Table>)?

Thanks,
Micah
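
For illustration, a hypothetical sketch of the two API shapes the question
contrasts (the class and method names are invented; only arrow::Result,
arrow::Table, and arrow::Future are real Arrow types):

    #include <memory>
    #include <arrow/api.h>          // arrow::Result, arrow::Table
    #include <arrow/util/future.h>  // arrow::Future

    // Hypothetical reader interface, sketching the scope question: does every
    // blocking entry point eventually get an async sibling returning a Future?
    class HypotheticalReader {
     public:
      virtual ~HypotheticalReader() = default;

      // Synchronous shape: blocks the calling thread until the Table is ready.
      virtual arrow::Result<std::shared_ptr<arrow::Table>> Read() = 0;

      // Asynchronous shape: returns immediately; the Table arrives later via
      // the Future, which callers can wait on or chain continuations from.
      virtual arrow::Future<std::shared_ptr<arrow::Table>> ReadAsync() = 0;
    };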

On Mon, Feb 15, 2021 at 2:15 PM Wes McKinney  wrote:

> hi Weston,
>
> Thanks for putting this comprehensive and informative document together.
>
> There are several layers of problems to consider, just thinking out loud:
>
> * I hypothesize that the bottom of the stack is a thread pool with a
> queue-per-thread that implements work stealing. Some code paths might
> use this low-level task API directly, for example a workload putting
> all of its tasks into one particular queue and letting the other
> threads take work if they are idle.
>
> * I've brought this up in the past, but if we are comfortable with
> more threads than CPU cores, we may allow for the base level thread
> pool to be expanded dynamically. The tradeoff here is coarse
> granularity context switching between tasks only at time of task
> completion vs. the OS context-switching mid-task between threads. For
> example, if there is a code path which wishes to guarantee that a
> thread is being put to work right away to execute its tasks, even if
> all of the other queues are full of other tasks, then this could
> partially address the task prioritization problem discussed in the
> document. If there is a notion of a "task producer" or a "workload"
> and then the number of task producers exceeds the size of the thread
> pool, then an additional thread + dedicated task queue for that thread
> could be created to handle tasks submitted by the producer. Maybe this
> is a bad idea (I'm not an expert in this domain after all); let me
> know if it doesn't make sense.
>
> * I agree that we should encourage as much code as possible to use the
> asynchronous model — per above, if there is a mechanism for async task
> producers to coexist alongside with code that manually manages the
> execution order of tasks generated by its task graph (thinking of
> query engine code here a la Quickstep), then that might be good.
>
> Lots to do here, but I'm excited to see things evolve and to see the
> project become faster and more scalable on systems with a lot of cores
> that do a lot of mixed IO/CPU work!
>
> - Wes
>
> On Tue, Feb 2, 2021 at 9:02 PM Weston Pace  wrote:
> >
> > This is a follow up to a discussion from last September [3].  I've
> > been investigating Arrow's use of threading and I/O and I believe
> > there are some improvements that could be made.  Arrow is currently
> > supporting two threading options (single thread and "per-core" thread
> > pool).  Both of these approaches are hindered if blocking I/O is
> > performed on a CPU worker thread.
> >
> > It is somewhat alleviated by using background threads for I/O (in the
> > readahead iterator) but this implementation is not complete and does
> > not allow for nested parallelism.  I would like to convert Arrow's I/O
> > operations to an asynchronous model (expanding on the existing futures
> > API).  I have already converted the CSV reader in this fashion [2] as
> > a proof of concept.
> >
> > I have written a more detailed proposal here [1].  Please feel free to
> > suggest improvements or alternate approaches.  Also, please let me
> > know if I missed any goals or considerations I should keep in mind.
> >
> > Also, hello, this email is a bit of an introduction.  I have
> > previously made one or two small comments/changes but I am hoping to
> > be more involved going forwards.  I've mostly worked on proprietary
> > test and measurement software but have recently joined Ursa Computing
> > which will allow me more time to work on Arrow.
> >
> > Thanks,
> >
> > Weston Pace
> >
> > [1]
> https://docs.google.com/document/d/1tO2WwYL-G2cB_MCPqYguKjKkRT7mZ8C2Gc9ONvspfgo/edit?usp=sharing
> > [2] https://github.com/apache/arrow/pull/9095
> > [3]
> https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3CCAJPUwMDmU3rFt6Upyis%3DyXB%3DECkmrjdncgR9xj%3DDFapJt9FfUg%40mail.gmail.com%3E
>


RE: [C++] adopting an SIMD library - xsimd / GPU optimization

2021-02-15 Thread Joe Duarte
Hi all -- I looked at some of the SIMD libraries listed by Yibo Cai earlier in
the thread, and you might want to take a closer look at nsimd. It looks very
polished and has CUDA support -- the only one I noticed that took account of GPUs.

On that note, in what ways is Arrow optimized for GPU compute? I'm new to Arrow and I
noticed this bit on the homepage: "...organized for efficient analytic 
operations on modern hardware like CPUs and GPUs." Does that mean there's 
actual code targeting GPUs, e.g. CUDA, OpenCL, or C++ AMP (Microsoft)? Or is it 
more of a thoughtful pre-emptive GPU-readiness, so to speak, in the format's 
design?

Getting back to the SIMD library decision, my humble feedback is that you might 
want to approach it with a bit more evaluative attention. The number of GitHub 
stars and contributors seemed to be the major or driving considerations in the 
parts of the thread that I saw. GitHub stars wouldn't make my top-3 criteria, 
and might not make my list at all. I'm not even sure what that metric signifies 
-- general interest or something? (For the unfamiliar, it's not a star rating 
like for movies, but just a count.) It seems there's a lot more to look at than 
star count or contributor count, for example *performance*. SIMD libraries are 
definitely not equal on performance. Bugginess too -- I wish there were easier 
ways -- maybe automated -- to evaluate projects and libraries on code quality.
And I assume there are Arrow project-specific criteria that would matter too,
which would be completely orthogonal to the number of stars on GitHub.

Nsimd looks polished, and that might be because it's from a company 
specializing in high-performance computing (https://agenium-scale.com). I hadn't
heard of them, but it looks good. One thing that confuses me is that most of
the nsimd code is under "include/nsimd/modules/fixed_point". There's no mention 
of floating point, and there's hardly any code outside of that tree, and I'm 
not sure why fixed point would be the focus. They don't seem to talk about it, 
or I missed it. Not sure if this will matter for Arrow. Their CUDA support 
stands out, but I couldn't find much code. Their Arm SVE support also stands 
out, but it's not clear that SVE actually exists in the wild. It's Arm's 
Scalable Vector Extension, which allows SIMD code to be written once and 
automatically adapted to different vector lengths as needed depending on the 
CPU. Arm's SIMD is typically 128 bits wide, and with SVE, 256- and 512-bit widths
become trivial, but I don't know of any implementations. Do Amazon's new
Graviton2 chips support it? I hadn't heard that, nor of any support from Cavium
or Marvell or whomever in the Arm server space. SVE is very new.
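
For readers unfamiliar with SVE, a small vector-length-agnostic loop is the
usual illustration (a generic sketch using the Arm C Language Extensions
intrinsics, unrelated to nsimd's or Arrow's code):

    #include <arm_sve.h>  // requires an SVE-capable compiler and target
    #include <cstdint>

    // Adds two float arrays. The same binary adapts to 128-, 256-, or 512-bit
    // vectors: svcntw() reports how many 32-bit lanes the hardware has, and the
    // predicate from svwhilelt_b32 masks off the tail elements past n.
    void AddFloats(const float* a, const float* b, float* out, int64_t n) {
      for (int64_t i = 0; i < n; i += svcntw()) {
        svbool_t active = svwhilelt_b32(i, n);
        svfloat32_t va = svld1(active, a + i);
        svfloat32_t vb = svld1(active, b + i);
        svst1(active, out + i, svadd_x(active, va, vb));
      }
    }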

For code quality checking, you could throw a library up onto Coverity Scan. 
It's free for open-source projects. It would be useful for Arrow too if you're 
not using it already. Automated static analysis, and they support C and C++ 
code, among others.

Anyway, those are my thoughts for now.

Cheers,

Joe Duarte

-Original Message-
From: Antoine Pitrou  
Sent: Saturday, February 13, 2021 2:49 AM
To: dev@arrow.apache.org
Subject: Re: [C++] adopting an SIMD library - xsimd

On Fri, 12 Feb 2021 20:47:21 -0800
Micah Kornfield  wrote:
> That is unfortunate; like I said, if the consensus is xsimd, let's move
> forward with that.

I would say it's a soft consensus for now, and I would welcome more viewpoints 
on the matter.

Regards

Antoine.


Re: Threading Improvements Proposal

2021-02-15 Thread Wes McKinney
hi Weston,

Thanks for putting this comprehensive and informative document together.

There are several layers of problems to consider, just thinking out loud:

* I hypothesize that the bottom of the stack is a thread pool with a
queue-per-thread that implements work stealing. Some code paths might
use this low-level task API directly, for example a workload putting
all of its tasks into one particular queue and letting the other
threads take work if they are idle.

* I've brought this up in the past, but if we are comfortable with
more threads than CPU cores, we may allow for the base level thread
pool to be expanded dynamically. The tradeoff here is coarse
granularity context switching between tasks only at time of task
completion vs. the OS context-switching mid-task between threads. For
example, if there is a code path which wishes to guarantee that a
thread is being put to work right away to execute its tasks, even if
all of the other queues are full of other tasks, then this could
partially address the task prioritization problem discussed in the
document. If there is a notion of a "task producer" or a "workload"
and then the number of task producers exceeds the size of the thread
pool, then an additional thread + dedicated task queue for that thread
could be created to handle tasks submitted by the producer. Maybe this
is a bad idea (I'm not an expert in this domain after all); let me
know if it doesn't make sense.

* I agree that we should encourage as much code as possible to use the
asynchronous model — per above, if there is a mechanism for async task
producers to coexist alongside with code that manually manages the
execution order of tasks generated by its task graph (thinking of
query engine code here a la Quickstep), then that might be good.
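
As a very rough sketch of the queue-per-thread, work-stealing shape described
in the first point (deliberately simplified -- a single lock guards all queues,
where a real pool would use per-queue locks or lock-free deques; this is not
Arrow's actual ThreadPool):

    #include <condition_variable>
    #include <cstddef>
    #include <deque>
    #include <functional>
    #include <mutex>
    #include <thread>
    #include <vector>

    class WorkStealingPool {
     public:
      explicit WorkStealingPool(size_t num_threads)
          : queues_(num_threads), done_(false) {
        for (size_t i = 0; i < num_threads; ++i) {
          workers_.emplace_back([this, i] { Run(i); });
        }
      }

      ~WorkStealingPool() {
        {
          std::lock_guard<std::mutex> lock(mu_);
          done_ = true;
        }
        cv_.notify_all();
        for (auto& worker : workers_) worker.join();
      }

      // Each producer can target its own queue; idle workers steal from others.
      void Submit(size_t queue, std::function<void()> task) {
        {
          std::lock_guard<std::mutex> lock(mu_);
          queues_[queue % queues_.size()].push_back(std::move(task));
        }
        cv_.notify_all();
      }

     private:
      void Run(size_t self) {
        for (;;) {
          std::function<void()> task;
          {
            std::unique_lock<std::mutex> lock(mu_);
            cv_.wait(lock, [&] { return done_ || HaveWork(); });
            if (done_ && !HaveWork()) return;
            // Prefer our own queue; otherwise steal from the first non-empty one.
            size_t victim = self;
            if (queues_[self].empty()) {
              for (size_t i = 0; i < queues_.size(); ++i) {
                if (!queues_[i].empty()) { victim = i; break; }
              }
            }
            task = std::move(queues_[victim].front());
            queues_[victim].pop_front();
          }
          task();  // run the task outside the lock
        }
      }

      bool HaveWork() const {
        for (const auto& q : queues_) {
          if (!q.empty()) return true;
        }
        return false;
      }

      std::vector<std::deque<std::function<void()>>> queues_;
      std::vector<std::thread> workers_;
      std::mutex mu_;
      std::condition_variable cv_;
      bool done_;
    };

Dynamic expansion as in the second point would then amount to growing queues_
and workers_ when the number of producers exceeds the pool size, rather than
fixing it at construction.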

Lots to do here, but I'm excited to see things evolve and to see the
project become faster and more scalable on systems with a lot of cores
that do a lot of mixed IO/CPU work!

- Wes

On Tue, Feb 2, 2021 at 9:02 PM Weston Pace  wrote:
>
> This is a follow up to a discussion from last September [3].  I've
> been investigating Arrow's use of threading and I/O and I believe
> there are some improvements that could be made.  Arrow is currently
> supporting two threading options (single thread and "per-core" thread
> pool).  Both of these approaches are hindered if blocking I/O is
> performed on a CPU worker thread.
>
> It is somewhat alleviated by using background threads for I/O (in the
> readahead iterator) but this implementation is not complete and does
> not allow for nested parallelism.  I would like to convert Arrow's I/O
> operations to an asynchronous model (expanding on the existing futures
> API).  I have already converted the CSV reader in this fashion [2] as
> a proof of concept.
>
> I have written a more detailed proposal here [1].  Please feel free to
> suggest improvements or alternate approaches.  Also, please let me
> know if I missed any goals or considerations I should keep in mind.
>
> Also, hello, this email is a bit of an introduction.  I have
> previously made one or two small comments/changes but I am hoping to
> be more involved going forwards.  I've mostly worked on proprietary
> test and measurement software but have recently joined Ursa Computing
> which will allow me more time to work on Arrow.
>
> Thanks,
>
> Weston Pace
>
> [1] 
> https://docs.google.com/document/d/1tO2WwYL-G2cB_MCPqYguKjKkRT7mZ8C2Gc9ONvspfgo/edit?usp=sharing
> [2] https://github.com/apache/arrow/pull/9095
> [3] 
> https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3CCAJPUwMDmU3rFt6Upyis%3DyXB%3DECkmrjdncgR9xj%3DDFapJt9FfUg%40mail.gmail.com%3E


Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-15 Thread Micah Kornfield
Sorry, I realized I had a typo in my email.  We should definitely namespace
dangerous APIs appropriately.

On Monday, February 15, 2021, Itamar Turner-Trauring  wrote:

>
>
> On Fri, Feb 12, 2021, at 11:52 PM, Micah Kornfield wrote:
> > 2.  I'm open to exposing the lower level encryption libraries in python
> > (without appropriate namespacing/communication).  It seems at least for
> > reading, there is potentially less harm (I'll caveat that with I'm not a
> > security expert).  Are both the low level read and write implementations
> > necessary?  (it probably makes sense to have a few smaller PRs for
> exposing
> > this functionality anyways).
>
> Starting with decryption sounds like a good plan! I asked my potential
> users how they feel about starting with that.


Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-15 Thread Itamar Turner-Trauring


On Fri, Feb 12, 2021, at 11:52 PM, Micah Kornfield wrote:
> 2.  I'm open to exposing the lower level encryption libraries in python
> (without appropriate namespacing/communication).  It seems at least for
> reading, there is potentially less harm (I'll caveat that with I'm not a
> security expert).  Are both the low level read and write implementations
> necessary?  (it probably makes sense to have a few smaller PRs for exposing
> this functionality anyways).

Starting with decryption sounds like a good plan! I asked my potential users 
how they feel about starting with that.

Re: [Rust] [DataFusion] Topic for next Rust Sync Call

2021-02-15 Thread Andrew Lamb
> Also, unrelated, is there a schedule for the sync calls? Will try and
carve out some free time for the next one :)

It is every other Wednesday at noon EST. Here is the original announcement
with more details:
https://lists.apache.org/thread.html/raa72e1a8a3ad5dbb8366e9609a041eccca87f85545c3bc3d85170cfc%40%3Cdev.arrow.apache.org%3E


On Sun, Feb 14, 2021 at 8:29 AM Ruan Pearce-Authers  wrote:

> I'd be interested in helping spec this out, it's especially tricky atm to
> track down issues when integrating DataFusion into the same binary as other
> medium/large dependencies.
>
> I recently hit a really specific issue where DataFusion depends on Parquet,
> which supports various compression algorithms, including Brotli, and actix-web
> also depends on a slightly different Rust implementation of Brotli. Both of
> these Brotli libs package the same underlying C lib separately, resulting
> in multiply-defined symbols when compiling with MSVC (and maybe on other
> platforms? I didn't test in CI in the end).
>
> I've got a quick interim hack [1] in place for my use case, which doesn't
> really use Parquet, so it's not pressing, but it would be awesome to sort
> this out properly upstream.
>
> I guess the only major tradeoff of having a comprehensive feature setup is
> that it could make testing slightly harder, in terms of making sure no-one
> breaks the build for specific feature combinations; this can always be
> mitigated with more CI though (yay, unlimited Actions minutes for public
> repos).
>
> Also, unrelated, is there a schedule for the sync calls? Will try and
> carve out some free time for the next one :)
>
> [1]
> https://github.com/reservoirdb/arrow/commit/e63e157927a552ecf1a6f63ec401f0b6157b5468
>
> -Original Message-
> From: Andrew Lamb 
> Sent: 14 February 2021 11:14
> To: dev 
> Subject: [Rust] [DataFusion] Topic for next Rust Sync Call
>
> I would like to add the following item to the agenda for the next
> Rust sync call:
>
> Dependencies
>
> Background: As the dependency stack gets larger, it will be harder to use
> DataFusion as an embedded query engine, and compile / dev times will
> increase.
>
> As we expand the supported functions of DataFusion this problem is likely
> to get worse. For example
> https://github.com/apache/arrow/pull/9243#discussion_r575716759 and
> https://github.com/apache/arrow/pull/9139
>
> Proposal: Add Rust "features" to the datafusion crate and make many of the
> new dependencies optional (so that we would have features like regex, unicode,
> and hash, which would only pull in the dependencies / provide those functions
> if the features were enabled). This approach has worked well for Arrow
> (which has only chrono and num as required dependencies).
>


[NIGHTLY] Arrow Build Report for Job nightly-2021-02-15-0

2021-02-15 Thread Crossbow


Arrow Build Report for Job nightly-2021-02-15-0

All tasks: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0

Failed Tasks:
- conda-linux-gcc-py36-cpu-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-linux-gcc-py36-cpu-r36
- conda-linux-gcc-py37-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-drone-conda-linux-gcc-py37-aarch64
- conda-linux-gcc-py37-cpu-r40:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-linux-gcc-py37-cpu-r40
- conda-linux-gcc-py37-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-linux-gcc-py37-cuda
- conda-linux-gcc-py38-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-drone-conda-linux-gcc-py38-aarch64
- conda-linux-gcc-py38-cpu:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-linux-gcc-py38-cpu
- conda-linux-gcc-py38-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-linux-gcc-py38-cuda
- conda-linux-gcc-py39-cpu:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-linux-gcc-py39-cpu
- conda-osx-clang-py36-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-osx-clang-py36-r36
- conda-osx-clang-py37-r40:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-osx-clang-py37-r40
- conda-osx-clang-py38:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-osx-clang-py38
- conda-osx-clang-py39:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-osx-clang-py39
- conda-win-vs2017-py36-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-win-vs2017-py36-r36
- conda-win-vs2017-py37-r40:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-win-vs2017-py37-r40
- conda-win-vs2017-py38:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-win-vs2017-py38
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-github-test-conda-cpp-valgrind
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-github-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-3.2:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-github-test-conda-python-3.7-hdfs-3.2
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-github-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-github-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.8-jpype:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-github-test-conda-python-3.8-jpype
- test-ubuntu-18.04-docs:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-test-ubuntu-18.04-docs
- test-ubuntu-18.04-r-sanitizer:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-test-ubuntu-18.04-r-sanitizer
- test-ubuntu-ruby:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-test-ubuntu-ruby

Succeeded Tasks:
- centos-7-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-github-centos-7-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-github-centos-8-amd64
- conda-clean:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-clean
- conda-linux-gcc-py36-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-drone-conda-linux-gcc-py36-aarch64
- conda-linux-gcc-py36-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-linux-gcc-py36-cuda
- conda-linux-gcc-py39-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-drone-conda-linux-gcc-py39-aarch64
- conda-linux-gcc-py39-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-azure-conda-linux-gcc-py39-cuda
- debian-buster-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-15-0-github-debian-buster-amd64
-