p with porting (some of) the C++ Datasets API to Java, although I don't have to
touch the higher-level DataSource-related stuff in my first development
iteration. So Francois, would you suggest filing a JIRA for "Implement
file-based Datasets scan in Java" or something? I believe the fu
On 28/11/2019 at 07:26, Hongze Zhang wrote:
> Thanks for referencing this, Antoine. The concepts and principles seem to be
> pretty concrete so I
> may take some time to read it in detail.
>
> BTW I noticed that from the current discussion in ticket ARROW-7272[1] it's
> not clear whether
. One
> > > could
> > > then create a facade on top of that for Java. For data reads, I can see
> > > either building a Flight server or directly use the JNI readers.
> >
> > Thanks for your suggestion but I'm not entirely getting it. Does this mean
> &
--
From:Francois Saint-Jacques
Send Time: 2019-11-28 (Thursday) 05:08
To:dev
Subject:Re: Datasets and Java
Hello Hongze,
The C++ implementation of dataset, notably Dataset, DataSource,
DataSourceDiscovery, and Scanner classes
s to deal with the
> metadata/data exchange problem between Java and C++ Datasets? If yes, then in
> some cases, doesn't it easily introduce bigger problems about life cycle and
> resource management of the processes? Please correct me if I misunderstood
> your point.
>
>
start some individual gRPC/Flight server process to deal with the metadata/data
exchange problem between Java and C++ Datasets? If yes, then in some cases,
doesn't it easily introduce bigger problems about life cycle and resource
management of the processes? Please correct me if I misunderstoo
e DataSource
> discovery system? Or just bridge the C++ Arrow Parquet and ORC readers (as
> Micah said, orc-jni is
> already there) and reimplement everything needed by datasets in Java? This
> might not be that easy to
> decide, but based on my currently limited perspective I would
readers (as Micah said, orc-jni is already there) and reimplement everything
needed by datasets in Java? This might not be that easy to decide, but based on
my currently limited perspective I would prefer to get started from the ScanTask
layer, so that we could leverage some valuable work
s the fix of ARROW-6952[1]. And as I currently work on
> Java/Scala projects like Spark, I am now investigating a way to call some
> of the datasets APIs in Java so that I could gain performance improvements
> from native dataset filters/projectors. Meanwhile, I am also interested in
> the
of the datasets APIs in Java
so that I could gain performance improvements from native dataset
filters/projectors. Meanwhile, I am also interested in the ability to scan
different data sources provided by the dataset API.
Regarding using datasets in Java, my initial idea is to port (by writing
Java