Re: Multi-dimensional Range Queries Help

2021-05-04 Thread Nick Dimiduk
Hi Kevin,

Did you get an answer to your question, maybe over on hbase-user?

As it seems you're aware, HBase is built on a single index -- the rowkey.
You may be able to implement something like MySQL's composite indexing on
HBase if the algorithm can be mapped to a 1-dimensional linear index. You
would have to implement this yourself as HBase doesn't offer this out of
the box. Such an encoding would be an interesting contribution to HBase; it
could sit alongside our other data encoding "types" in
`org.apache.hadoop.hbase.types`.
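
To make the "map it to a 1-dimensional index" idea concrete, here is a minimal
sketch of one well-known option, a Z-order (Morton) encoding that interleaves
the bits of the id and the timestamp into a single rowkey. This is not
something HBase ships; the function names are hypothetical, and it assumes
both values are unsigned and fit in 32 bits (e.g. epoch seconds, not
milliseconds).

```python
def interleave_bits(x: int, y: int, bits: int = 32) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order code.

    Bit i of x lands at position 2*i + 1 and bit i of y at position 2*i,
    so the code is monotone in each input: growing x or y never shrinks it.
    """
    limit = 1 << bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError(f"both values must be in [0, 2**{bits})")
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z


def zorder_rowkey(record_id: int, timestamp: int) -> bytes:
    """Encode (id, timestamp) as an 8-byte rowkey (hypothetical layout).

    Big-endian bytes are used so that byte-wise lexicographic order, which
    is how HBase sorts rowkeys, matches the numeric order of the code.
    """
    return interleave_bits(record_id, timestamp).to_bytes(8, "big")


def zorder_scan_bounds(id_lo: int, id_hi: int, ts_lo: int, ts_hi: int):
    """Return (start, stop) rowkeys, both inclusive, covering the rectangle.

    Every point with id_lo <= id <= id_hi and ts_lo <= ts <= ts_hi encodes
    to a key inside this range. The range is a superset: it also contains
    keys outside the rectangle, so rows must still be filtered afterwards.
    """
    return zorder_rowkey(id_lo, ts_lo), zorder_rowkey(id_hi, ts_hi)
```

With keys like these, a query on both ranges becomes a scan between the two
bounds plus a filter on the decoded id and timestamp. A single range can
still over-scan quite a bit for some rectangles; splitting the rectangle into
several smaller scan ranges is the usual refinement, which is left out here.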

As for why your filtered queries are slow, you're the best person to start
answering that question. Is your data local to the region server that's
hosting it, or do you have multiple network hops and service
serialize/deserialize steps in your hot path? Is your index optimized for
your query (sounds like maybe not, based on the first question)? Have you
seen the Profiling Servlet [0]? You can start by setting that up, isolating
the workload, and collecting some FlameGraphs to analyze.

Thanks,
Nick

[0]: https://hbase.apache.org/book.html#profiler

On Mon, Apr 12, 2021 at 10:26 AM Kevin Wright wrote:

> Hi!
>
> Our application requires fast read queries that specify two ranges. One
> range on timestamps, and another on ids. We are currently using Apache
> HBase as our db, but we’re unsure how to optimally design the row keys /
> schemas. Currently, scanning over row key (the ids) with filter on
> timeranges is taking more time than what we expect. A normal query would
> probably have say 200 rows that match the id range, and about 10 rows that
> match both ranges, and we have currently on the order of 10s of millions of
> rows.
>
> We’re wondering if there’s something we can do to increase throughput with
> HBase (e.g., is there something like composite indexing like in MySQL?).
> Not sure if this is the best place to ask this, but if anyone could point
> us to the right direction, that would be great!
>
> Thank you!
>


Multi-dimensional Range Queries Help

2021-04-12 Thread Kevin Wright
Hi!

Our application requires fast read queries that specify two ranges. One
range on timestamps, and another on ids. We are currently using Apache
HBase as our db, but we’re unsure how to optimally design the row keys /
schemas. Currently, scanning over row key (the ids) with filter on
timeranges is taking more time than what we expect. A normal query would
probably have say 200 rows that match the id range, and about 10 rows that
match both ranges, and we have currently on the order of 10s of millions of
rows.

We’re wondering if there’s something we can do to increase throughput with
HBase (e.g., is there something like composite indexing like in MySQL?).
Not sure if this is the best place to ask this, but if anyone could point
us to the right direction, that would be great!

Thank you!

