On Mon, Jul 6, 2020 at 11:08 AM Antoine Pitrou wrote:
>
>
> On 06/07/2020 at 17:57, Steve Kim wrote:
> > The Parquet format specification is ambiguous about the exact details of
> > LZ4 compression. However, the *de facto* reference implementation in Java
> > (parquet-mr) uses the Hadoop LZ4
I would also be interested in having a reusable serialized format for
filter- and projection-like expressions. I think going as far as full
logical query plans suitable for building a SQL engine is probably too
ambitious, but we could start small with the use case from
the JNI Datasets PR as
This is something that I am also interested in.
In my personal project that uses Arrow, my current approach is to use
protobuf to represent expressions (as well as logical and physical query
plans). I used the Gandiva protobuf definition as a starting point.
Protobuf works for going between
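To make the idea concrete, here is a minimal sketch of serializing a filter expression tree so it can cross a language boundary such as JNI. This is a hypothetical, hand-rolled JSON scheme for illustration only; the approach discussed in the thread uses protobuf (starting from the Gandiva definition), and none of these field names are taken from it:

```python
import json
from dataclasses import dataclass, field
from typing import List

@dataclass
class Expr:
    """One node of a hypothetical expression tree (illustrative only)."""
    op: str                       # e.g. "column", "literal", "greater_than"
    value: object = None          # column name or literal value, if any
    children: List["Expr"] = field(default_factory=list)

    def to_dict(self):
        return {"op": self.op, "value": self.value,
                "children": [c.to_dict() for c in self.children]}

def serialize(expr: Expr) -> str:
    # Flatten the tree to JSON; a real implementation would emit protobuf.
    return json.dumps(expr.to_dict())

def deserialize(s: str) -> Expr:
    def build(d):
        return Expr(d["op"], d["value"], [build(c) for c in d["children"]])
    return build(json.loads(s))

# Filter equivalent to: column "a" > 5
flt = Expr("greater_than",
           children=[Expr("column", "a"), Expr("literal", 5)])
roundtripped = deserialize(serialize(flt))
```

The point is only that a language-neutral byte representation of the tree lets the Java side hand a filter to the C++ dataset implementation without sharing in-memory objects.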
I have been following the discussion on a pull request (
https://github.com/apache/arrow/pull/7030) by Hongze Zhang to use the
high-level dataset API via JNI.
An obstacle that was encountered in this PR is that there is not a good way
to pass a filter expression via JNI. Expressions have a
> Would that keep compatibility with existing files produced by Parquet C++?
Changing the LZ4 implementation to be compatible with parquet-mr/Hadoop
would break compatibility with any existing files that were written by
Parquet C++ using LZ4 compression. I believe that it is not possible to
On 06/07/2020 at 17:57, Steve Kim wrote:
> The Parquet format specification is ambiguous about the exact details of
> LZ4 compression. However, the *de facto* reference implementation in Java
> (parquet-mr) uses the Hadoop LZ4 codec.
>
> I think that it is important for Parquet C++ to have
Could you clarify what you mean by "without external libraries"? Do you
mean without using pyarrow and the arrow R package?
Neal
On Mon, Jul 6, 2020 at 1:40 AM Fan Liya wrote:
> Hi Teng,
>
> Arrow provides two formats for IPC between different languages: streaming
> and file.
> This article
The Parquet format specification is ambiguous about the exact details of
LZ4 compression. However, the *de facto* reference implementation in Java
(parquet-mr) uses the Hadoop LZ4 codec.
I think that it is important for Parquet C++ to have compatibility and
feature parity with parquet-mr when
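For reference, my understanding of the Hadoop LZ4 codec's framing (which parquet-mr inherits) is that each chunk is prefixed with two 4-byte big-endian integers: the uncompressed length, then the length of the raw LZ4 block that follows. A standard-library-only Python sketch of that framing, treating the raw block bytes as opaque:

```python
import struct

def wrap_hadoop_lz4(uncompressed_len: int, lz4_block: bytes) -> bytes:
    """Prefix a raw LZ4 block with the two big-endian u32 lengths
    (uncompressed size, then compressed size) that the Hadoop codec
    writes, per my reading of its behavior."""
    return struct.pack(">II", uncompressed_len, len(lz4_block)) + lz4_block

def unwrap_hadoop_lz4(buf: bytes):
    """Split a Hadoop-framed chunk back into (uncompressed_len, raw_block)."""
    uncompressed_len, compressed_len = struct.unpack(">II", buf[:8])
    return uncompressed_len, buf[8:8 + compressed_len]
```

The incompatibility discussed above comes from one side writing these length prefixes while the other expects a bare LZ4 block, so the first eight bytes are misread as compressed data (or vice versa).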
Thanks Rok and Antoine,
I couldn't see what the issue could have been, so the SO link was
very helpful and informative.
I'll try it out, and submit a PR if I get it right.
On Mon, 6 Jul 2020 at 14:30, Antoine Pitrou wrote:
>
> Yes, that's certainly the case.
> Changing:
> values =
Yes, that's certainly the case.
Changing:
values = np.random.randint(lower, upper, size=size)
to:
values = np.random.randint(lower, upper, size=size, dtype=np.int64)
would hopefully fix the issue. Neville, could you try it out?
Thank you
Antoine.
On 06/07/2020 at 14:16, Rok Mihevc wrote:
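The proposed fix can be checked directly: without an explicit dtype, `np.random.randint` defaults to the platform C long, which is int32 on Windows, so an upper bound above 2**31 - 1 raises the "low is out of bounds for int32" error from the gist. Passing `dtype=np.int64` sidesteps the platform default (the bound values here are illustrative):

```python
import numpy as np

# On Windows the default dtype is int32 (the platform C long), so an
# upper bound above 2**31 - 1 would raise
# "ValueError: low is out of bounds for int32" without dtype=np.int64.
lower, upper, size = 0, 2**40, 16
values = np.random.randint(lower, upper, size=size, dtype=np.int64)
```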
NumPy on Windows has a different default integer bit width than on Linux.
Perhaps this is causing the issue? (see:
https://stackoverflow.com/questions/36278590/numpy-array-dtype-is-coming-as-int32-by-default-in-a-windows-10-64-bit-machine
)
Rok
On Mon, Jul 6, 2020 at 12:57 PM Neville Dipale
wrote:
> Hi
Hi Arrow devs,
I'm trying to run archery integration tests on Windows 10 (Python 3.7.7;
conda 4.8.3), but I'm getting an error *ValueError: low is out of bounds
for int32* (https://gist.github.com/nevi-me/4946eabb2dc111e10b98c074b45b73b1
).
Has someone else encountered this problem before?
Arrow Build Report for Job nightly-2020-07-06-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-06-0
Failed Tasks:
- homebrew-cpp:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-06-0-travis-homebrew-cpp
-
Hi Teng,
Arrow provides two formats for IPC between different languages: streaming
and file.
This article gives a tutorial for Java:
https://arrow.apache.org/docs/java/ipc.html
For other languages, it may be helpful to read the test cases.
Best,
Liya Fan
On Sun, Jul 5, 2020 at 4:24 PM Teng