We are already filtering on the column family, but the resulting Result is still
too large.

Basically we have a super-vertex problem. The schema is:

RowID       ColF     ColQ     Value
VertexID    InEdge   EdgeID   VertexID

We are working with an existing codebase, so a schema rewrite would be painful,
and we were hoping there was a simple solution we just haven't found.

A Cell input format would let us look at the table as an edge list instead of
the vertex list that the Result gives us.
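For context, the per-cell processing we already do inside the mapper looks
roughly like this (a sketch only; the class name and the empty output types are
illustrative, and the loop still requires the whole row to fit in memory since
Result materializes every Cell up front):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

public class EdgeCellMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // The entire row is already loaded into 'value' at this point --
        // this loop just treats each Cell as one edge of the vertex.
        for (Cell cell : value.rawCells()) {
            // process one edge (Cell) at a time
        }
    }
}
```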

We are starting to look into a migration to a different schema because of all
of the other issues a super vertex causes.

Ryan Webb

-----Original Message-----
From: Shahab Yunus [mailto:[email protected]] 
Sent: Monday, May 11, 2015 11:51 AM
To: [email protected]
Subject: Re: Mapping Over Cells

You can specify the column family or column to read when you create the Scan
object. Have you tried that? Does it make sense? Or have I misunderstood your
problem?

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addColumn(byte[],%20byte[])
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addFamily(byte[])
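A minimal sketch of what that looks like (the family and qualifier names here
are placeholders; pass the resulting Scan to TableMapReduceUtil when setting up
the job):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanSetup {
    public static Scan restrictedScan() {
        Scan scan = new Scan();
        // Read only one column family...
        scan.addFamily(Bytes.toBytes("InEdge"));
        // ...or narrow further to a single column qualifier:
        // scan.addColumn(Bytes.toBytes("InEdge"), Bytes.toBytes("someQualifier"));
        return scan;
    }
}
```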

Regards,
Shahab

On Mon, May 11, 2015 at 11:45 AM, Webb, Ryan L. <[email protected]>
wrote:

> Hello,
>
> We have a table in HBase that has very large rows and it goes OOM when 
> the table mapper attempts to read the entire row into a result.
>
> We would like to be able to map over each Cell in the table as a 
> solution; it is what we are doing in the map anyway.
> Is this possible, like the default behavior in Accumulo?
>
> We looked at the settings on Scan and didn't really see anything, and 
> the source code of Result looks like it wraps an array of Cells, so the 
> data is already loaded at that point.
> We are using HBase 0.98.1 and the Hadoop 2 APIs.
>
> Thanks
> Ryan Webb
>
> PS - Sorry if this is a duplicate; I sent the first one before 
> subscribing, so I don't know what the policy is with that.
>