I don't think HBase directly integrates with Parquet either. If you look
at the HBase documentation, the only reference to Parquet is related to
Spark DataFrame compatibility:
> HBase Dataframe is a standard Spark Dataframe, and is able to
> interact with any other data sources such as Hive, Orc, Parquet, JSON, etc.
There are a lot of projects that integrate between the two (indeed,
GeoMesa will ingest and export both Parquet and Arrow through HBase and
Accumulo), but it's not a native component.
Thanks,
Emilio
On 2/18/21 8:15 AM, Roberts, Geoffry [USA] wrote:
They are saying that HBase uses Apache Parquet, which, as I gather, is
compatible with Arrow. I am just now spinning up on all this, so bear
with me. As I understand it, Arrow is an in-memory format and Parquet
is a file format.
I have a code base that is built around Accumulo. My code does a lot
in memory already. I like what Arrow has to offer from a polyglot
standpoint, but my data sets are, well, they're what people call
"big data" hence Accumulo. If HBase can handle the Arrow/Parquet
structure, why not Accumulo?
Good to be talking
------------------------------------------------------------------------
*From:* Emilio Lahr-Vivaz <elahrvi...@ccri.com>
*Sent:* Wednesday, February 17, 2021 4:09 PM
*To:* user@accumulo.apache.org <user@accumulo.apache.org>
*Subject:* Re: [External] Re: Accumulo and Arrow
I believe that was theoretical; I don't think there has been any
actual integration at this point. But I'd be happy to be proven wrong :)
Thanks,
Emilio
On 2/17/21 12:17 PM, Roberts, Geoffry [USA] wrote:
This is where I saw a reference to HBase
<https://blog.cloudera.com/introducing-apache-arrow-a-fast-interoperable-in-memory-columnar-data-structure-standard/>.
------------------------------------------------------------------------
*From:* Emilio Lahr-Vivaz <elahrvi...@ccri.com>
*Sent:* Wednesday, February 17, 2021 11:04 AM
*To:* user@accumulo.apache.org <user@accumulo.apache.org>
*Subject:* [External] Re: Accumulo and Arrow
Hello,
Do you have a link to describe the integration between HBase and
Arrow? I didn't find anything except some theoretical discussions. My
understanding is that Arrow is meant for in-memory representations,
and there is no plan to, e.g., replace HFiles or RFiles with Arrow
files in HBase/Accumulo.
I'm interested in the intersection of the two, though. I'm a
committer on GeoMesa, and we provide a way to export Arrow files out
of both Accumulo and HBase using custom iterators/coprocessors.
GeoMesa is focused on spatial data though, so it may not fit with
your use case.
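
For anyone new to the idea, here is a minimal sketch of what such an export does conceptually: sorted key/value entries, as a scan would return them, are pivoted into one list per column, which is the shape a columnar format like Arrow expects. This is not GeoMesa's code and does not use the Arrow library. The entry layout, the column names, and the `pivot_to_columns` helper are all assumptions made for illustration.

```python
# Hypothetical scan output: entries sorted by row, shaped like
# (row id, column qualifier, value). Real Accumulo keys also carry a
# column family, visibility, and timestamp, which are omitted here.
entries = [
    ("row1", "name", "alpha"),
    ("row1", "size", 10),
    ("row2", "name", "beta"),
    ("row3", "name", "gamma"),
    ("row3", "size", 30),
]


def pivot_to_columns(entries, columns):
    """Pivot row-sorted key/value entries into one list per column.

    Cells that a row does not contain are filled with None, so every
    column list ends up with the same length.
    """
    row_ids = []
    batch = {name: [] for name in columns}
    current_row = None
    for row, qualifier, value in entries:
        if row != current_row:
            # First entry of a new row: open a slot in every column.
            row_ids.append(row)
            for name in columns:
                batch[name].append(None)
            current_row = row
        if qualifier in batch:
            # Fill the slot for this row; unknown qualifiers are skipped.
            batch[qualifier][-1] = value
    return row_ids, batch


row_ids, batch = pivot_to_columns(entries, ["name", "size"])
print(row_ids)
print(batch)
```

A real export would build the column vectors inside the iterator or coprocessor and hand them to an Arrow writer; the sketch only covers the pivot step.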
Thanks,
Emilio
On 2/17/21 8:13 AM, Roberts, Geoffry [USA] wrote:
All,
I have been looking into Apache Arrow. I see that it supports a
connection to HBase. I Googled but found nothing with respect to
Accumulo. Is there, or is there planned, support for Arrow/Accumulo?
Thanks