Thanks for referencing this, Antoine. The concepts and principles seem to be
pretty concrete so I
may take some time to read it in detail.
BTW I noticed that, judging by the current discussion in ticket ARROW-7272[1], it's
still unclear whether
this one or ipc flatbuffers could be a better approach for J
Hi Aaron,
The schema is immutable, add_metadata returns a new schema object which
includes the metadata.
So I think this does what you want:
schema = schema.add_metadata(meta)
If not, hopefully the experts will chime in.
Cheers,
Maarten.
> On Nov 28, 2019, at 12:41 AM, Aaron Chu wrote:
>
> De
Dear all,
I need your help regarding the pyarrow.table.schema.
I tried to create a schema and use the with_metadata/add_metadata functions to
add the metadata (a python dict) to the schema. However, nothing showed up
when I ran 'schema.metadata'. I can't get the metadata added to the schema.
This is
Hi Francois,
Thanks for the proposal and your effort.
I made a simple JNI poc before for RecordBatch/VectorSchemaRoot interaction
between Java and C++[1][2].
This may help a little.
Thanks,
Ji Liu
[1] https://github.com/tianchen92/jni-poc-java
[2] https://github.com/tianchen92/jni-poc-cpp
hi,
There have been a number of discussions over the years about on-disk
pre-allocation strategies. No volunteers have implemented anything,
though. Developing an HDF5 integration library with pre-allocation and
buffer management utilities seems like a reasonable growth area for
the project. The f
Thanks for the feedback.
I do think if we had explicitly embraced gRPC from the beginning,
there are a lot of places where things could be made more ergonomic,
including with the metadata fields. But it would also have locked us
out of potential future transports.
On another note: I hesitate to p
Hello Hongze,
The C++ implementation of dataset, notably the Dataset, DataSource,
DataSourceDiscovery, and Scanner classes, is not ready/designed for
distributed computing. They don't serialize and they reference by
pointer all around, thus I highly doubt that you can implement parts
in Java, and some
Attendees:
- Micah Kornfield, Google
- Praveen Kumar, Dremio
- Todd Hendricks
- François Saint-Jacques, RStudio/Ursa Labs
Subject
- Bazel. Micah wants feedback on the PR. This first is aimed at
developer productivity, notably shorter link time and sandboxed build.
As a first PoC, parts of the python
Francois Saint-Jacques created ARROW-7272:
Summary: [C++][Java] JNI bridge between RecordBatch and
VectorSchemaRoot
Key: ARROW-7272
URL: https://issues.apache.org/jira/browse/ARROW-7272
Proje
On Tue, Nov 26, 2019 at 9:40 AM Maarten Breddels
wrote:
>
> Op di 26 nov. 2019 om 15:02 schreef Wes McKinney :
>
> > hi Maarten
> >
> > I opened https://issues.apache.org/jira/browse/ARROW-7245 in part based
> > on this.
> >
> > I think that normalizing to a common type (which would require castin
>
> I don't get how this is a cycle. It only means Bazel is too limited to
> distinguish between a header dependency and a C++ module?
Agreed, this isn't a true cycle, but bazel is opinionated about this (i.e.
forces workarounds). In the example I highlighted it might have been
cleaner to take
Fair enough. I'm okay with the bytes approach and the proposal looks good
to me.
On Fri, Nov 8, 2019 at 11:37 AM David Li wrote:
> I've updated the proposal.
>
> On the subject of Protobuf Any vs bytes, and how to handle
> errors/metadata, I still think using bytes is preferable:
> - It doesn't
https://meet.google.com/vtm-teks-phx
I'm unable to join on account of the Thanksgiving holiday, but others
are welcome to discuss and share call notes after
Le 27/11/2019 à 06:16, Micah Kornfield a écrit :
>
>> Can you give an example of circular dependency? Can this be solved by
>> having more "type_fwd.h" headers for forward declarations of opaque types?
>
> I think the type_fwd.h might contribute to the problem. The solution would
> be more gr
The Flight compilation errors occurring in the Conda builds
are caused by a recent protobuf conda-forge update and
should be fixed by https://github.com/apache/arrow/pull/5917
On Wed, Nov 27, 2019 at 2:01 PM Crossbow wrote:
>
> Arrow Build Report for Job nightly-2019-11-27-0
>
> All tasks:
> http
Krisztian Szucs created ARROW-7271:
Summary: [C++][Flight] Use the single parameter version of
SetTotalBytesLimit
Key: ARROW-7271
URL: https://issues.apache.org/jira/browse/ARROW-7271
Project: Apach
Arrow Build Report for Job nightly-2019-11-27-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-27-0
Failed Tasks:
- homebrew-cpp:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-27-0-travis-homebrew-cpp
- test-conda-cpp:
U
To set up bridges between Java and C++, the C data interface
specification may help:
https://github.com/apache/arrow/pull/5442
There's an implementation for C++ here, and it also includes a Python-R
bridge able to share Arrow data between two different runtimes (i.e.
PyArrow and R-Arrow were com
Hi Micah,
Regarding our use cases, we'd use the API on Parquet files with some pushed
filters and projectors, and we'd extend the C++ Datasets code to provide
necessary support for our own data formats.
> If JNI is seen as too cumbersome, another possible avenue to pursue is
> writing a gRPC
Jiajia Li created ARROW-7269:
Summary: [C++] Fix arrow::parquet compiler warning
Key: ARROW-7269
URL: https://issues.apache.org/jira/browse/ARROW-7269
Project: Apache Arrow
Issue Type: Improvemen
Sebastien Binet created ARROW-7270:
Summary: [Go] preserve CSV reading behaviour, improve memory usage
Key: ARROW-7270
URL: https://issues.apache.org/jira/browse/ARROW-7270
Project: Apache Arrow
Hi Hongze,
I have a strong preference for not porting non-trivial logic from one
language to another, especially if the main goal is performance. I think
this will replicate bugs and cause confusion if inconsistencies occur. It
is also a non-trivial amount of work to develop, review, set up CI, et