Dear all,
Currently, the master build is failing occasionally.
After investigation, we found that it was caused by a cyclic dependency
during class loading.
We have provided a patch for it [1]. Please take a look.
Best,
Liya Fan
[1] https://github.com/apache/arrow/pull/7628
I added JIRAs for incorporating this into implementations.
On Thu, Jul 2, 2020 at 6:25 AM Wes McKinney wrote:
> Forwarding with [RESULT] subject line
>
> On Wed, Jul 1, 2020 at 1:24 AM Micah Kornfield
> wrote:
> >
> > The vote carries with 4 binding +1 votes and 0 non-binding +1. I will
> > merge the change and open some JIRAs about reading/writing the new field
> > from reference implementations (hopefully tomorrow).
I have seen this failure multiple times. However, it has not been addressed yet.
https://travis-ci.community/t/s390x-no-space-left-on-device/8953
It is fine with me until we see more stable results.
Regards,
Kazuaki Ishizaki
From: Wes McKinney
To: dev
Date: 2020/07/03 05:32
Subject:
The vote carries with 6 binding +1 votes and 2 non-binding +1
On Tue, Jun 30, 2020 at 4:03 PM Sutou Kouhei wrote:
>
> +1 (binding)
>
> In
> "[VOTE] Increment MetadataVersion in Schema.fbs from V4 to V5 for 1.0.0
> release" on Mon, 29 Jun 2020 16:42:45 -0500,
> Wes McKinney wrote:
>
> > Hi,
The vote carries with 6 binding +1 and 1 non-binding +1. Thanks all
On Tue, Jun 30, 2020 at 10:07 AM Francois Saint-Jacques
wrote:
>
> +1 (binding)
>
> On Tue, Jun 30, 2020 at 10:55 AM Neal Richardson
> wrote:
> >
> > +1 (binding)
> >
> > On Tue, Jun 30, 2020 at 2:52 AM Antoine Pitrou wrote:
>
Thanks!
> You should be able to store variable-length vectors in Parquet. Think of
> strings simply as arrays of bytes, which are variable length. You
> would want to make sure you don’t use DICTIONARY_ENCODING in that case.
>
Interesting. We'll look at that.
> No, I'm not aware of any
The vote carries with 3 binding +1 votes, 2 non-binding +1, and 1 +0
Thanks all for voting. I will update the Format PR and plan to merge
the C++ PR soon thereafter
On Tue, Jun 30, 2020 at 4:00 PM Sutou Kouhei wrote:
>
> +1 (binding)
>
> In
> "[VOTE] Removing validity bitmap from Arrow union
Joaquin,
> Do you know whether there is any activity on supporting partial
> read/writes in Arrow or fastparquet?
I’m not entirely sure about the status of partial read/writes in Arrow’s
Parquet implementation, but https://github.com/xitongsys/parquet-go, for
example, has this capability.
> Even then, t
Just looking at https://travis-ci.org/github/apache/arrow/builds, the
failure rate on master (which should be green > 95% of the time) is
really high. I'm going to open a patch adding it to allow_failures until
we see this become less flaky.
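For reference, the kind of change described might look like this in .travis.yml; the arch entries below are assumptions about which jobs were flaky, not the actual Arrow CI matrix:

```yaml
# Sketch: mark flaky architecture builds as allowed to fail so they
# don't turn master red. Job selectors here are illustrative only.
jobs:
  allow_failures:
    - arch: s390x
    - arch: arm64
```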
On Thu, Jul 2, 2020 at 8:39 AM Antoine Pitrou wrote:
>
>
> I
hi folks,
I hope you and your families are all well.
We're heading into a holiday weekend here in the US -- I would guess
given the state of the backlog and nightly builds that the earliest we
could contemplate making the release will be the week of July 13. That
should give enough time next week
I can confirm what Uwe said, manylinux doesn't cause issues.
Here I've built a C++ Python extension (using the Arrow C++ library) inside
a manylinux2010 docker image:
https://github.com/vaexio/vaex-arrow-ext/runs/831763024?check_suite_focus=true
It's built with the manylinux1 and manylinux2010 pyarrow wheels.
Well, it depends on how important speed is, but LZ4 has extremely fast
decompression, even compared to Snappy:
https://github.com/lz4/lz4#benchmarks
Regards
Antoine.
On 2020-07-02 at 19:47, Christian Hudon wrote:
> At least for us, the advantages of Parquet are speed and interoperability
> in
At least for us, the advantages of Parquet are speed and interoperability
in the context of longer-term data storage, so I would tend to say
"reasonably conservative".
On Wed, Jul 1, 2020, at 09:32, Antoine Pitrou
wrote:
>
> I don't have a sense of how conservative Parquet users generally
Very interesting. This is something that I would potentially also be
interested in, so if there were some code available out there, I could
potentially contribute to it, or at least use it. At the least, I'd love
something that allows Arrow to work with both larger and very small record
batches (a few rows).
Since publishing artifacts to NPM is somewhat independent from the
Apache source release, if you aren't ready to push to NPM then the
release manager can just not push the artifacts
Note that the plan hasn't been to go from 1.0.0 to 1.1.0; rather, that
almost every Apache release (aside from patch
We build pyarrow in the docker image because otherwise auditwheel complains
about pyarrow, which causes our wheels to fail auditwheel and not get the
manylinux tag. But assuming we build pyarrow in the docker image, our
resulting manylinux wheels are then compatible with the pyarrow manylinux
I did try the approach of not linking against pyarrow but leaving out the
symbols, and just ensuring pyarrow is imported before the vaex extension. This
works out-of-the-box on macOS but fails on Linux, as symbols have a scope
there. Adding the following lines to load Arrow into the global scope made it
work.
Hello Tim,
thanks for the hint. I see that you build Arrow yourselves in the
Dockerfile. Could it be that, in the end, you statically link the Arrow
libraries? As there are no wheels on PyPI, I couldn't verify whether that
assumption is true.
Best
Uwe
On Thu, Jul 2, 2020, at 4:53 PM, Tim Pain
The virtual table sounds a lot like regular-table:
https://github.com/jpmorganchase/regular-table
Used in perspective:
https://perspective.finos.org/
We use Arrow C++ compiled with WebAssembly and some front-end grid and chart
plugins; perspective can run in a client-server fashion and only se
On Wed, Jul 1, 2020 at 9:52 AM Joris Van den Bossche
wrote:
>
> I am personally fine with removing the compute dunder methods again (i.e.
> Array.__richcmp__), if that resolves the ambiguity. Although they *are*
> convenient IMO, even for developers (question might also come up if we want
> to add
We spent a ton of time on this for perspective; the end result is a mostly
compatible set of wheels for most platforms. I believe we skipped py2, but
nobody cares about those anyway. We link against libarrow and libarrow_python
on Linux; on Windows we vendor them all into our library. Feel free t
I think the intention so far has been to support precision between 0
and 38 and scale <= precision. 128-bit integers max out at 38 digits,
I think that's the rationale for the limit. See e.g. the Impala docs
(which also use 128-bit decimals) [1].
[1]: https://impala.apache.org/docs/build/html/topics/imp
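A quick check of the 38-digit rationale in plain Python:

```python
# The largest signed 128-bit integer has 39 digits, but not every
# 39-digit number fits in it, so 38 is the largest precision at which
# all values are representable.
max128 = 2**127 - 1
assert len(str(max128)) == 39
assert 10**38 - 1 < max128   # every 38-digit number fits
assert 10**39 - 1 > max128   # some 39-digit numbers do not
```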
I had so much fun with the wheels in the past that I'm now a happy member of
conda-forge core instead :D
The good thing first:
* The C++ ABI didn't change between the manylinux versions; it is the old one
in all cases, so you can mix & match manylinux versions.
The sad things:
* The manylinuxX standa
Hi folks,
We are reaching out to better understand the performance of ArrowJS when it
comes to viewing large amounts of data (> 1M records) in the browser’s DOM.
Our backend (https://github.com/tenzir/vast) spits out record batches,
which we are accumulating in the frontend with a RecordBatchReader.
Hi Nick, all,
Thanks! I updated the blog post to specify the requirements better.
First, we plan to store the datasets in S3 (on min.io). I agree this works
nicely with Parquet.
Do you know whether there is any activity on supporting partial read/writes
in Arrow or fastparquet? That would change th
In my experience, both the s390x and ARM builds are flaky on Travis CI,
for reasons which seem unrelated to Arrow. The infrastructure seems a
bit unreliable.
Regards
Antoine.
On 2020-07-02 at 15:15, Wes McKinney wrote:
> I would be interested to know the empirical reliability of the s390x
Forwarding with [RESULT] subject line
On Wed, Jul 1, 2020 at 1:24 AM Micah Kornfield wrote:
>
> The vote carries with 4 binding +1 votes and 0 non-binding +1. I will
> merge the change and open some JIRAs about reading/writing the new field
> from reference implementations (hopefully tomorrow).
>
I would be interested to know the empirical reliability of the s390x
Travis CI build, but my guess is that it is flaking at least 20% of
the time, maybe more than that. If that's the case, then I think it
should be added back to allow_failures and at best we can look at it
periodically to make sure
On Thu, Jul 2, 2020 at 3:32 AM Maarten Breddels
wrote:
>
> Hi,
>
> in the process of adding Arrow support in Vaex (natively, not converting to
> Numpy as we did before), one of our biggest pain points is (surprisingly)
> the name mismatch between NumPy's .tolist() and Arrow's .to_pylist().
> Especially in code that deals with both types of arrays, this is a bit of
Ok, thanks!
I'm setting up a repo with an example here, using pybind11:
https://github.com/vaexio/vaex-arrow-ext
and I'll just try all possible combinations and report back.
cheers,
Maarten Breddels
Software engineer / consultant / data scientist
Python / C++ / Javascript / Jupyter
www.maartenb
Also no concrete answer, but one such example is turbodbc, I think.
But it seems they only have conda binary packages and don't
distribute wheels
(https://turbodbc.readthedocs.io/en/latest/pages/getting_started.html),
so it's not that relevant as a comparison (they also need to build against an
odbc d
Arrow Build Report for Job nightly-2020-07-02-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-02-0
Failed Tasks:
- test-conda-cpp-valgrind:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-02-0-github-test-conda-cpp-valgrind
Hi Maarten,
On 2020-07-02 at 10:53, Maarten Breddels wrote:
>
> Also, I see pyarrow distributes manylinux1/2010/2014 wheels. Would a vaex
> extension distributed as a 2010 wheel, and built with the pyarrow 2010
> wheel, work in an environment where someone installed a pyarrow 2014
> wheel, or
Hi,
again, in the process of adopting Arrow in Vaex, we need to have some
legacy C++ code in Vaex itself, and we might want to add some new functions
in C++ that might not be suitable for core Apache Arrow, or that we need to
ship ourselves due to time constraints.
I am a bit worried about the C++ ABI
Hi,
in the process of adding Arrow support in Vaex (natively, not converting to
Numpy as we did before), one of our biggest pain points is (surprisingly)
the name mismatch between NumPy's .tolist() and Arrow's .to_pylist().
Especially in code that deals with both types of arrays, this is a bit of