I'll check when I'm on site tomorrow, but our (much smaller) local cluster
is using the default hbase.hregion.max.filesize of 10 GB for HDP.
hbase.hregion.majorcompaction is set to 7 days, so I'm sure it would have
run by now.
What would be the best filesize limit? Cloudera suggests having 20-200
It's a bit peculiar that you've got it pre-split to 10 salt buckets, but
are seeing 400+ partitions. It sounds like HBase is splitting the regions on
you, possibly due to the 'hbase.hregion.max.filesize' setting. You should
be able to check the HBase Master UI and see the table details to see how
many
Jonathan,
I do check the queries using EXPLAIN, but it doesn't work the same in
Spark. In Spark, I can only see a very generic plan and it only tells me if
certain filters are pushed down to Phoenix or not. Query hints are ignored,
since they're first translated by the Spark or Hive query
Run an EXPLAIN on your query to confirm that it's doing a full scan and not a
skip scan.
I typically use an IN () clause instead of OR, especially with compound keys. I
have also had to hint queries to use a skip scan, e.g. /*+ SKIP_SCAN */.
Phoenix seems to do a very good job not reading data
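
As a rough illustration of the IN () plus hint approach described above, here is a minimal Python sketch that assembles a parameterized query string. The table name IMAGES and the key columns CAMERA_ID and IMAGE_ID are made up for the example, and the helper names are hypothetical, not part of any Phoenix client library. Prefixing the result with EXPLAIN should let you check which kind of scan Phoenix picks.

```python
def build_in_query(table, key_columns, key_count, skip_scan_hint=True):
    """Build a parameterized SELECT that filters a (possibly compound) key with IN.

    table, key_columns and the helper itself are hypothetical; only the
    IN () clause and the /*+ SKIP_SCAN */ hint come from the thread.
    """
    if not key_columns or key_count < 1:
        raise ValueError("need at least one key column and one key")

    if len(key_columns) == 1:
        # Single-column key: ID IN (?, ?, ...)
        lhs = key_columns[0]
        one_key = "?"
    else:
        # Compound key: (A, B) IN ((?, ?), (?, ?), ...)
        lhs = "(" + ", ".join(key_columns) + ")"
        one_key = "(" + ", ".join("?" for _ in key_columns) + ")"

    in_list = ", ".join(one_key for _ in range(key_count))
    hint = "/*+ SKIP_SCAN */ " if skip_scan_hint else ""
    return f"SELECT {hint}* FROM {table} WHERE {lhs} IN ({in_list})"


def explain(query):
    """Prefix a query with EXPLAIN so the plan can be inspected."""
    return "EXPLAIN " + query


if __name__ == "__main__":
    q = build_in_query("IMAGES", ["CAMERA_ID", "IMAGE_ID"], key_count=2)
    print(explain(q))
```

The placeholders keep the key values out of the SQL text, so the same string can be reused with bound parameters in whatever client you use.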
Thanks for the quick reply, Josh!
For our demo cluster, we have 5 nodes, so the table was already set to 10
salt buckets. I know you can increase the salt buckets after the table is
created, but how do you change the split points? The repartition in Spark
seemed to be extremely inefficient, so we
Hi Mark,
At present, the Spark partitions are basically equivalent to the number of
regions in the underlying HBase table. This is typically something you can
control yourself, either using pre-splitting or salting (
https://phoenix.apache.org/faq.html#Are_there_any_tips_for_optimizing_Phoenix).
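
To make the two options concrete, here is a minimal Python sketch that builds a Phoenix CREATE TABLE statement with either SALT_BUCKETS or explicit SPLIT ON points. The table name, columns, and split points are hypothetical, and the helper is only a string builder for illustration. The sketch treats salting and explicit split points as alternatives, which is a simplifying assumption on my part.

```python
def build_create_table(table, columns_sql, salt_buckets=None, split_points=None):
    """Build a Phoenix CREATE TABLE statement, salted or pre-split.

    All names passed in are illustrative; this helper is not a Phoenix API.
    """
    if salt_buckets is not None and split_points:
        raise ValueError("choose salting or explicit split points, not both")

    ddl = f"CREATE TABLE IF NOT EXISTS {table} ({columns_sql})"

    if salt_buckets is not None:
        # Phoenix documents SALT_BUCKETS as a value between 1 and 256.
        if not 1 <= salt_buckets <= 256:
            raise ValueError("salt_buckets must be between 1 and 256")
        ddl += f" SALT_BUCKETS = {salt_buckets}"
    elif split_points:
        # Quote each split point as a SQL string literal.
        quoted = ", ".join("'" + p.replace("'", "''") + "'" for p in split_points)
        ddl += f" SPLIT ON ({quoted})"

    return ddl


if __name__ == "__main__":
    cols = "ID VARCHAR PRIMARY KEY, DATA VARBINARY"
    print(build_create_table("IMAGES", cols, salt_buckets=10))
    print(build_create_table("IMAGES", cols, split_points=["g", "n", "t"]))
```

Either way, the initial number of regions (and so the initial number of Spark partitions) is fixed at table creation; HBase may still split regions later as they grow.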
Our use case is to analyze images using Spark. The images are typically
~1MB each, so in order to prevent the small files problem in HDFS, we went
with HBase and Phoenix. For 20+ million images and metadata, this has been
working pretty well so far. Since this is pretty new to us, we didn't
create