Re: Datasets and Java

2019-11-28 Thread Hongze Zhang
p with porting (some of) the C++ Datasets API to Java, although I don't have to touch higher level DataSource-related stuffs at my first development iteration. So Francois, would you suggest to file a JIRA for "Implement file-based Datasets scan in Java" or something? I believe the fu

Re: Datasets and Java

2019-11-28 Thread Antoine Pitrou
On 28/11/2019 at 07:26, Hongze Zhang wrote: > Thanks for referencing this, Antoine. The concepts and principles seem to be pretty concrete so I may take some time to read it in detail. > BTW I noticed that by the current discussion in ticket ARROW-7272[1] it's unlikely clear whether

Re: Datasets and Java

2019-11-27 Thread Hongze Zhang
> > > . One could then create a facade on top of that for Java. For data reads, I can see either building a Flight server or directly use the JNI readers. > > Thanks for your suggestion but I'm not entirely getting it. Does this mean

Re: Datasets and Java

2019-11-27 Thread Ji Liu
From: Francois Saint-Jacques Send Time: 2019-11-28 (Thursday) 05:08 To: dev Subject: Re: Datasets and Java Hello Hongze, The C++ implementation of dataset, notably Dataset, DataSource, DataSourceDiscovery, and Scanner classes

Re: Datasets and Java

2019-11-27 Thread Francois Saint-Jacques
> l gRPC/Flight server process to deal with the metadata/data exchange problem between Java and C++ Datasets? If yes, then in some cases, doesn't it easily introduce bigger problems about life cycle and resource management of the processes? Please correct me if I misunderstood

Re: Datasets and Java

2019-11-27 Thread Antoine Pitrou
> s to deal with the metadata/data exchange problem between Java and C++ Datasets? If yes, then in some cases, doesn't it easily introduce bigger problems about life cycle and resource management of the processes? Please correct me if I misunderstood your point.

Re: Datasets and Java

2019-11-27 Thread Hongze Zhang
start some individual gRPC/Flight server process to deal with the metadata/data exchange problem between Java and C++ Datasets? If yes, then in some cases, doesn't it easily introduce bigger problems about life cycle and resource management of the processes? Please correct me if I misunderstoo

Re: Datasets and Java

2019-11-27 Thread Micah Kornfield
> e DataSource discovery system? Or just bridge the C++ arrow Parquet, Orc readers (as Micah said, orc-jni is already there) and reimplement everything needed by datasets in Java? This might be not that easy to decide but currently based on my limited perspective I would

Re: Datasets and Java

2019-11-26 Thread Hongze Zhang
readers (as Micah said, orc-jni is already there) and reimplement everything needed by datasets in Java? This might be not that easy to decide but currently based on my limited perspective I would prefer to get started from the ScanTask layer as a result we could leverage some valuable work

Re: Datasets and Java

2019-11-26 Thread Micah Kornfield
> s the fix of ARROW-6952[1]. And as I currently work on Java/Scala projects like Spark, I am now investigating a way to call some of the datasets APIs in Java so that I could gain performance improvement from native dataset filters/projectors. Meantime I am also interested in the

Re: Datasets and Java

2019-11-26 Thread Wes McKinney
> d as I currently work on Java/Scala projects like Spark, I am now investigating a way to call some of the datasets APIs in Java so that I could gain performance improvement from native dataset filters/projectors. Meantime I am also interested in the ability of scanning

Datasets and Java

2019-11-26 Thread Hongze Zhang
of the datasets APIs in Java so that I could gain performance improvement from native dataset filters/projectors. Meantime I am also interested in the ability of scanning different data sources provided by dataset API. Regarding using datasets in Java, my initial idea is to port (by writing Java