Hi Serkan,

This is my personal opinion and some might not share it ;)

I tried the deep-versions approach on one project and ran into issues with
some of the calls (pagination over versions, for example). So if both options
(deep versions and wide columns) are otherwise equivalent for you, I would
say go with the wide columns.
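
To make the wide-column option concrete, here is a rough sketch of the cyclic
column naming from your option (a), in plain Python with no HBase client
involved. The function names, the 100-slot constant, and the zero-padded
qualifier format are my own assumptions, not anything HBase prescribes. I
zero-pad the slot number because HBase sorts column qualifiers
lexicographically as bytes, so "urls_hour10" would otherwise sort before
"urls_hour2".

```python
SLOTS = 100  # assumption: keep the last 100 hours, as in the example


def qualifier_for_hour(hour: int, slots: int = SLOTS) -> str:
    """Map a 1-based hour counter onto a reusable (cyclic) column name.

    Hour 101 lands in the same column as hour 1, hour 102 as hour 2, etc.
    """
    slot = (hour - 1) % slots + 1
    # Zero-pad so byte-wise lexicographic order matches numeric slot order.
    return f"urls_hour{slot:03d}"


def page(qualifiers, offset: int, limit: int):
    """Return one page of qualifiers in lexicographic order.

    This only models offset/limit paging over the columns of one row;
    it does not talk to HBase.
    """
    return sorted(qualifiers)[offset:offset + limit]


if __name__ == "__main__":
    # Write 250 hours of data; only 100 distinct columns are ever used.
    columns = {qualifier_for_hour(h) for h in range(1, 251)}
    print(len(columns))
    print(page(columns, 0, 3))
```

One caveat with this layout: once the counter wraps around, slot order is no
longer time order, so you still need the cell timestamp (or the hour stored
in the value) to tell which column is the most recent.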

Also, why not go with a tall table instead of a wide one?
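
By tall table I mean one row per hour instead of one column (or version) per
hour. A minimal sketch of how such a row key could be built, again in plain
Python: the "#" separator, the "user42" id, and the reversed-timestamp trick
are assumptions for illustration, and it assumes ids have no "#" in them.

```python
MAX_LONG = 2**63 - 1  # largest signed 64-bit value, 19 decimal digits


def tall_row_key(user_id: str, hour_epoch_ms: int) -> bytes:
    """Build a row key of the form <user_id>#<reversed timestamp>.

    Subtracting from MAX_LONG makes newer hours produce smaller numbers,
    so with byte-wise sorted row keys the newest hour comes first.
    """
    reversed_ts = MAX_LONG - hour_epoch_ms
    # Fixed width (19 digits) keeps lexicographic and numeric order aligned.
    return f"{user_id}#{reversed_ts:019d}".encode("utf-8")


if __name__ == "__main__":
    keys = [tall_row_key("user42", ts) for ts in (1000, 2000, 3000)]
    for key in sorted(keys):  # sorted the way row keys are ordered
        print(key)
```

With this layout, reading the last n hours becomes a prefix scan with a row
limit, and expiring old hours could be left to a TTL on the column family
rather than to max versions.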

JMS

On Sat, Mar 30, 2019 at 01:14, Serkan Uzunbaz <uzun...@gmail.com> wrote:

> Hi all,
> I have a question regarding the difference between storing a set of data
> as:
> *a) n columns with 1 version each*
> *b) 1 column with n versions*
>
> Since the storage unit in hbase is a cell (rowkey, column family, column
> qualifier, timestamp), is there a difference between the above two storage
> options in terms of read/write performance, compaction/GC time, etc?
>
> I know it is not recommended to use a high number of versions if you do not
> really need them. However, if those n versions of data are really needed
> for reading, will it cause any problem to store the data in a single
> column with n versions? Also, even if max versions is set to 1 for a column
> (option a), new values are still stored as a new cell and the old cell is
> deleted at compaction time. So I also feel that, compaction-wise, the two
> options are identical.
> I wonder if there is anything that makes one option superior to the other.
>
> *Example*: To clarify more, say the data to be stored is set of urls
> visited in certain time ranges and we want to keep the last 100 hours of
> url sets:
>
> *a) store each hour as column name with one url set in it (column names
> will be used in cyclic manner (data for hour 101 will be written into
> column 1))*
> column_qualifier: value
> ---------------------------
> urls_hour1: <abc.com, xyz.com, ...>
> urls_hour2: <urls>
> urls_hour3: <urls>
> ...
> urls_hour100: <urls>
>
>
> *b) store in a single column with 100 versions (one for each hour) (max
> versions for column will be 100 and hbase will do the auto-compaction for
> old versions)*
> column_qualifier: value @ timestamp
> ---------------------------
> urls: <abc.com, xyz.com, ...> @ ts_hour1, <urls> @ ts_hour2, <urls> @
> ts_hour3, .... , <urls> @ ts_hour100
>
> Thanks,
> -Serkan
>
