---
*From:* Emilio Lahr-Vivaz
*Sent:* Wednesday, February 17, 2021 4:09 PM
*To:* user@accumulo.apache.org
*Subject:* Re: [External] Re: Accumulo and Arrow
I believe that was a theoretical - I don't think there has been any
actual integration at this point. But I'd be happy to be proven wrong :)
Th
ing-apache-arrow-a-fast-interoperable-in-memory-columnar-data-structure-standard/>.
*From:* Emilio Lahr-Vivaz
*Sent:* Wednesday, February 17, 2021 11:04 AM
*To:* user@accumulo.apache.org
*Subject:* [External] Re: Ac
Hello,
Do you have a link to describe the integration between HBase and Arrow?
I didn't find anything except some theoretical discussions. My
understanding is that Arrow is meant for in-memory representations, and
there is no plan to, e.g., replace HFiles or RFiles with Arrow files in
You should be able to use a conditional writer to support 'put if
absent':
https://accumulo.apache.org/docs/2.x/getting-started/clients#conditionalwriter
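To illustrate what 'put if absent' means here, below is a small in-memory model in Java. It is not Accumulo client code: `PutIfAbsentModel` and `writeIfAbsent` are hypothetical names made up for this sketch. In Accumulo itself the equivalent would go through `ConditionalWriter` with a `ConditionalMutation` carrying a `Condition` that has no value set (meaning the column must be absent), and the write result carries a status such as ACCEPTED or REJECTED, which the enum below imitates.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a table; not part of the Accumulo API.
final class PutIfAbsentModel {
    // Mirrors two of the outcomes a conditional write can report.
    enum Status { ACCEPTED, REJECTED }

    private final Map<String, String> cells = new HashMap<>();

    // Build a single map key from row and column, separated by a NUL character.
    private static String key(String row, String column) {
        return row + '\0' + column;
    }

    // Writes the value only if the (row, column) cell has no entry yet.
    Status writeIfAbsent(String row, String column, String value) {
        return cells.putIfAbsent(key(row, column), value) == null
                ? Status.ACCEPTED
                : Status.REJECTED;
    }

    // Returns the stored value, or null if the cell was never written.
    String get(String row, String column) {
        return cells.get(key(row, column));
    }

    public static void main(String[] args) {
        PutIfAbsentModel table = new PutIfAbsentModel();
        System.out.println(table.writeIfAbsent("row1", "cf:cq", "first"));
        System.out.println(table.writeIfAbsent("row1", "cf:cq", "second"));
        System.out.println(table.get("row1", "cf:cq"));
    }
}
```

The second write is rejected and the first value stays in place, so the same cell is never written twice.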
Generally you would not want to repeatedly write the same key/value, as
you will have to scan every single versioned entry when you want to
Hello,
GeoMesa is a library (providing a GeoTools data store backed by
Accumulo, among other things), so there isn't a single entry point. We
could try to wrap every method call, but that would likely be complex
(the GeoTools API has a lot of surface area).
Would it make sense to create a
Hi Suresh, David,
I think I speculated that it *might* be a bug in GeoMesa, but I haven't
seen any evidence of that yet. David, do you still see those warnings
after cleaning up your close methods? In normal operation, I don't see
those, but possibly the access patterns you're using are
Another thing to consider is how many tablet servers the mutations are
being sent to - if they're all going to a single split, that's going to
reduce your throughput a lot.
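One rough way to check this before ingesting is to bucket a sample of your row IDs by the table's split points. The Java sketch below does that with plain collections; `SplitDistribution` and `countPerTablet` are hypothetical names, the split points and row IDs are made-up sample data, and rows are compared as Java strings rather than as raw bytes, which is only an approximation. In a real deployment the split points could be read with `tableOperations().listSplits(tableName)`.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper; not part of the Accumulo API.
final class SplitDistribution {
    // Label for the last tablet, which holds every row after the final split.
    static final String DEFAULT_TABLET = "<default>";

    // Counts how many of the given rows fall into each tablet,
    // keyed by the tablet's end row (its split point).
    static Map<String, Integer> countPerTablet(TreeSet<String> splits, List<String> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String row : rows) {
            // A tablet's end row is inclusive, so a row belongs to the first split >= row.
            String endRow = splits.ceiling(row);
            counts.merge(endRow == null ? DEFAULT_TABLET : endRow, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample data: three split points and four sequential IDs.
        TreeSet<String> splits = new TreeSet<>(List.of("g", "n", "t"));
        List<String> rows = List.of("id_001", "id_002", "id_003", "id_004");
        System.out.println(countPerTablet(splits, rows));
    }
}
```

If nearly all of the sampled rows land in one bucket, as they do with the sequential IDs in this sample, the writes are probably all being sent to a single tablet.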
On 07/15/2016 02:33 PM, dlmar...@comcast.net wrote:
The batch writer has several knobs (latency time, memory buffer, etc.)
It sounds like each of your ranges is an ID, e.g. a single row. I've
found that scanning lots of non-sequential single-row ranges is pretty
slow in Accumulo. Your best approach is probably to create an index
table on whatever you are originally trying to query (assuming those
ids came