[
https://issues.apache.org/jira/browse/YARN-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630472#comment-15630472
]
Vrushali C commented on YARN-5336:
----------------------------------
Some other interesting points to keep in mind:
As per https://hbase.apache.org/book.html#table_schema_rules_of_thumb, we
should aim to have cells no larger than 10 MB, or 50 MB if we use MOB.
Otherwise, we should consider storing the cell data in HDFS and storing a
pointer to the data in HBase.
The relevant rules of thumb from that page:
- Aim to have regions sized between 10 and 50 GB.
- Aim to have cells no larger than 10 MB, or 50 MB if you use MOB. Otherwise,
  consider storing your cell data in HDFS and store a pointer to the data in
  HBase.
- A typical schema has between 1 and 3 column families per table. HBase tables
  should not be designed to mimic RDBMS tables.
- Around 50-100 regions is a good number for a table with 1 or 2 column
  families. Remember that a region is a contiguous segment of a column family.
- Keep your column family names as short as possible. The column family names
  are stored for every value (ignoring prefix encoding). They should not be
  self-documenting and descriptive like in a typical RDBMS.
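
To make the cell-size rule concrete for this JIRA, here is a minimal sketch of a default-and-configurable limit check. It is not the timeline service's actual writer code: the class name CellSizeGuard, its methods, and the choice to count key plus value bytes are all assumptions made for illustration. Only the 10 MB default comes from the rule of thumb quoted above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical guard; not part of the YARN timeline service or HBase APIs. */
final class CellSizeGuard {
    /** Assumed default, taken from the 10 MB rule of thumb quoted above. */
    static final long DEFAULT_MAX_CELL_BYTES = 10L * 1024 * 1024;

    private final long maxCellBytes;

    /** The limit is configurable; callers pass their own value in bytes. */
    CellSizeGuard(long maxCellBytes) {
        if (maxCellBytes <= 0) {
            throw new IllegalArgumentException("maxCellBytes must be positive");
        }
        this.maxCellBytes = maxCellBytes;
    }

    static CellSizeGuard withDefaultLimit() {
        return new CellSizeGuard(DEFAULT_MAX_CELL_BYTES);
    }

    /** Returns true if the key and value together fit within the limit. */
    boolean accepts(byte[] key, byte[] value) {
        long size = (long) key.length + value.length;
        return size <= maxCellBytes;
    }

    public static void main(String[] args) {
        CellSizeGuard guard = new CellSizeGuard(1024); // 1 KB limit for the demo
        byte[] key = "config.key".getBytes(StandardCharsets.UTF_8);
        System.out.println(guard.accepts(key, new byte[100]));  // prints true
        System.out.println(guard.accepts(key, new byte[4096])); // prints false
    }
}
```

What the writer does with a rejected key-value (drop it, truncate it, or log and skip) is a separate decision that this sketch leaves open.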
About medium-sized objects, or MOBs (https://hbase.apache.org/book.html#hbase_mob):
While HBase can technically handle binary objects with cells that are larger
than 100 KB in size, HBase’s normal read and write paths are optimized for
values smaller than 100 KB in size. When HBase deals with large numbers of
objects over this threshold, referred to here as medium objects, or MOBs,
performance is degraded due to write amplification caused by splits and
compactions. When using MOBs, ideally your objects will be between 100 KB and
10 MB. HBase FIX_VERSION_NUMBER adds support for better managing large numbers
of MOBs while maintaining performance, consistency, and low operational
overhead. MOB support is provided by the work done in HBASE-11339. To take
advantage of MOB, you need to use HFile version 3. Optionally, configure the
MOB file reader’s cache settings for each RegionServer (see Configuring the MOB
Cache), then configure specific columns to hold MOB data. Client code does not
need to change to take advantage of HBase MOB support. The feature is
transparent to the client.
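
The size bands mentioned above (under 100 KB, 100 KB to 10 MB, and larger) can be sketched as a simple classifier. This is illustrative only: the class ValueTiering, the tier names, and the treatment of a value of exactly 100 KB as NORMAL are assumptions, not something HBase or the timeline service exposes.

```java
/** Hypothetical helper; tier names and boundary handling are illustrative. */
final class ValueTiering {
    enum Tier { NORMAL, MOB, EXTERNAL }

    static final long MOB_THRESHOLD_BYTES = 100L * 1024;  // 100 KB
    static final long MOB_MAX_BYTES = 10L * 1024 * 1024;  // 10 MB

    /** Maps a value size in bytes to the storage tier suggested by the text. */
    static Tier classify(long valueBytes) {
        if (valueBytes < 0) {
            throw new IllegalArgumentException("valueBytes must not be negative");
        }
        if (valueBytes <= MOB_THRESHOLD_BYTES) {
            return Tier.NORMAL;   // regular HBase read and write path
        }
        if (valueBytes <= MOB_MAX_BYTES) {
            return Tier.MOB;      // candidate for a MOB-enabled column
        }
        return Tier.EXTERNAL;     // store in HDFS, keep a pointer in HBase
    }

    public static void main(String[] args) {
        System.out.println(classify(2_048));            // prints NORMAL
        System.out.println(classify(512L * 1024));      // prints MOB
        System.out.println(classify(64L * 1024 * 1024)); // prints EXTERNAL
    }
}
```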
> Put in some limit for accepting key-values in hbase writer
> ----------------------------------------------------------
>
> Key: YARN-5336
> URL: https://issues.apache.org/jira/browse/YARN-5336
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Vrushali C
> Assignee: Vrushali C
> Labels: YARN-5355
>
> As recommended by [~jrottinghuis], we need to add a limit (default and
> configurable) on the key-values accepted for writing to the backend.