On Wed, Jun 26, 2019 at 1:08 PM Vitaliy Semochkin <vitaliy...@gmail.com> wrote:
> Hi,
>
> I have an analytical report that would be very easy to build
> if I could store thousands of cells in one row, each cell storing about
> 2kb of information.
> I don't need those rows to be stored in any cache, because they will
> be used only occasionally for analytical reports in Flink.
>
> The question is, what is the biggest size of a row hbase can handle?
> Should I store 2kb cells as MOBs, or is the regular format ok?
>
> There are old articles that say that large rows, i.e. rows whose total
> size is larger than 10mb, can affect hbase performance;
> is this statement still valid for the modern hbase versions?
> What is the largest row size hbase can handle these days without
> performance issues?
> Is it possible to read a row so that its whole content is not read
> into memory (e.g. I would like to read the row's content cell by cell)?

See https://hbase.apache.org/2.0/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAllowPartialResults-boolean-

It speaks to your question. See the 'See Also:' on this method too. It only works for Scan; it doesn't work if you Get a row (you could Scan just one row if you need the above partial result).

HBase has no 'streaming' API that would return a Cell at a time, so big rows are a problem if you don't use the above partial results. The big row is materialized server-side in memory and then again client-side.

10MB is a conservative upper bound. 2kb Cells should work nicely, even a few thousand of them, especially if you can use partial results.

S

> Best Regards
> Vitaliy
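For the archives, a minimal sketch of the single-row partial-results Scan described above, against the HBase 2.x client API. The table name "report", the row key "wide-row", and the 2MB per-RPC cap are placeholders, not anything from the thread; it also assumes a reachable cluster configured via hbase-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PartialRowScan {
    public static void main(String[] args) throws Exception {
        byte[] row = Bytes.toBytes("wide-row");   // hypothetical row key
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("report"))) { // hypothetical table
            Scan scan = new Scan()
                .withStartRow(row)
                .withStopRow(row, true)            // inclusive stop: scan only this one row
                .setAllowPartialResults(true)      // allow a Result to carry part of the row
                .setMaxResultSize(2L * 1024 * 1024); // ~2MB per RPC instead of the whole row
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result partial : scanner) {
                    // each Result holds only a chunk of the row's cells;
                    // mayHaveMoreCellsInRow() says whether more chunks follow
                    for (Cell cell : partial.rawCells()) {
                        process(cell);
                    }
                }
            }
        }
    }

    private static void process(Cell cell) {
        // consume one ~2kb cell at a time
    }
}
```

With setAllowPartialResults(true), neither the server nor the client has to materialize the full multi-MB row at once; without it, a scan still returns the entire row in a single Result.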