How large is your max file size (hbase.hregion.max.filesize)? How large are 
your regions? How much memory are you allocating to your region servers? 
How many rows are large enough to trigger the OOM error? 

The key is trying to figure out how to help you without a schema change, 
even a slight one. 
(The slight change would be this: append (Long.MAX_VALUE - timestamp) to the 
row key and cap the number of column qualifiers per row. Once you hit N, you 
write to a new row with a new timestamp. When you want to insert, you just 
fetch the first row key in a small range scan and count the current number 
of column qualifiers. The difficult part is that you will have to merge the 
result sets manually on read, and if two rows have the same column 
qualifier, the one in the latest row wins.) 

That would solve your too-fat-row problem, if you can change schemas. 
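
Roughly what I mean, as a minimal sketch against the 0.98-era client API. 
The bucket size N, the HTable handle, and the family name are all made up 
for illustration; pick an N that keeps a single row comfortably inside your 
region server heap:

import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class BucketedRows {
    // Illustrative values only.
    static final int N = 10000;
    static final byte[] FAMILY = Bytes.toBytes("InEdge");

    // Row key = vertexId + (Long.MAX_VALUE - timestamp), so the newest
    // bucket for a vertex sorts first within that vertex's key range.
    static byte[] bucketKey(byte[] vertexId, long ts) {
        return Bytes.add(vertexId, Bytes.toBytes(Long.MAX_VALUE - ts));
    }

    // Write path: fetch the first (newest) bucket with a small range scan,
    // count its qualifiers, and roll to a fresh bucket once it holds N.
    static void putEdge(HTable table, byte[] vertexId,
                        byte[] qualifier, byte[] value) throws Exception {
        Scan scan = new Scan(vertexId);
        scan.setFilter(new PrefixFilter(vertexId));
        scan.setCaching(1);
        ResultScanner scanner = table.getScanner(scan);
        Result newest = scanner.next();
        scanner.close();

        byte[] row = (newest == null || newest.size() >= N)
                ? bucketKey(vertexId, System.currentTimeMillis()) // new bucket
                : newest.getRow();                                // current one
        Put put = new Put(row);
        put.add(FAMILY, qualifier, value); // Put.add() is the 0.98-era call
        table.put(put);
    }

    // Read path: scan every bucket for the vertex and keep the first value
    // seen per qualifier. Because the newest bucket sorts first, "first one
    // seen" is exactly "latest row wins".
    static Map<ByteBuffer, byte[]> getEdges(HTable table, byte[] vertexId)
            throws Exception {
        Map<ByteBuffer, byte[]> merged = new HashMap<ByteBuffer, byte[]>();
        Scan scan = new Scan(vertexId);
        scan.setFilter(new PrefixFilter(vertexId));
        ResultScanner scanner = table.getScanner(scan);
        for (Result r : scanner) {
            for (Cell c : r.rawCells()) {
                ByteBuffer q = ByteBuffer.wrap(CellUtil.cloneQualifier(c));
                if (!merged.containsKey(q)) {
                    merged.put(q, CellUtil.cloneValue(c));
                }
            }
        }
        scanner.close();
        return merged;
    }
}

The reversed timestamp is what makes "latest row wins" cheap: the merge on 
read is just "first one seen wins" during a single forward scan. 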


> On May 11, 2015, at 11:04 AM, Webb, Ryan L. <[email protected]> wrote:
> 
> We already filter on the column family, but the resulting Result is still 
> too large.
> 
> Basically we have a super vertex problem:
> 
>     RowID      ColF     ColQ     Value
>     VertexID   InEdge   EdgeID   VertexID
> 
> We are working with an existing codebase, so a schema re-write would be 
> painful; we were hoping there was a simple solution we just hadn't found.
> 
> A Cell input format would let us look at the table as an edge list instead 
> of the vertex list that the Result gives us. 
> 
> We are starting to look into a migration to a different schema because of 
> all the other issues a super vertex causes.
> 
> Ryan Webb
> 
> -----Original Message-----
> From: Shahab Yunus [mailto:[email protected]] 
> Sent: Monday, May 11, 2015 11:51 AM
> To: [email protected]
> Subject: Re: Mapping Over Cells
> 
> You can specify the column family or column to read when you create the Scan 
> object. Have you tried that? Does it make sense? Or have I misunderstood 
> your problem?
> 
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addColumn(byte[],%20byte[])
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addFamily(byte[])
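> 
> For example, a minimal sketch (the family and qualifier names here are 
> made up, so substitute your own):
> 
>     import org.apache.hadoop.hbase.client.Scan;
>     import org.apache.hadoop.hbase.util.Bytes;
> 
>     Scan scan = new Scan();
>     scan.addFamily(Bytes.toBytes("InEdge"));   // read one whole family
>     // or narrow the scan to a single column instead:
>     // scan.addColumn(Bytes.toBytes("InEdge"), Bytes.toBytes("edgeId"));
> 
> You then hand that Scan to TableMapReduceUtil.initTableMapperJob(...) when 
> you set up the table mapper.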
> 
> Regards,
> Shahab
> 
> On Mon, May 11, 2015 at 11:45 AM, Webb, Ryan L. <[email protected]>
> wrote:
> 
>> Hello,
>> 
>> We have a table in HBase with very large rows, and the job goes OOM when 
>> the table mapper attempts to read an entire row into a Result.
>> 
>> As a solution, we would like to be able to map over each Cell in the 
>> table, which is what we do in the map anyway.
>> Is this possible, like the default behavior in Accumulo?
>> 
>> We looked at the settings on Scan and didn't really see anything, and the 
>> source code of Result looks like it wraps an array of Cells, so the data 
>> has already been loaded at that point.
>> We are using HBase 0.98.1 and the Hadoop 2 APIs.
>> 
>> Thanks
>> Ryan Webb
>> 
>> PS - Sorry if this is a duplicate; I sent the first one before 
>> subscribing, so I don't know what the policy is with that.
>> 

The opinions expressed here are mine; while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com