We use filtering on the column family, but the resulting Result is still too large.
Basically we have a super vertex problem. Our schema looks like:

    RowID       ColF     ColQ     Value
    VertexID    InEdge   EdgeID   VertexID

We are working with an existing codebase, so a schema rewrite would be painful; we were hoping there was a simple solution we just haven't found. A Cell input format would let us look at the table as an edge list instead of the vertex list that the Result gives us. We are starting to look into a migration to a different schema because of all of the other issues a super vertex gives.

Ryan Webb

-----Original Message-----
From: Shahab Yunus [mailto:[email protected]]
Sent: Monday, May 11, 2015 11:51 AM
To: [email protected]
Subject: Re: Mapping Over Cells

You can specify the column family or column to read when you create the Scan object. Have you tried that? Does it make sense? Or have I misunderstood your problem?

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addColumn(byte[],%20byte[])
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addFamily(byte[])

Regards,
Shahab

On Mon, May 11, 2015 at 11:45 AM, Webb, Ryan L. <[email protected]> wrote:
> Hello,
>
> We have a table in HBase that has very large rows, and it goes OOM when
> the table mapper attempts to read an entire row into a Result.
>
> We would like to be able to map over each Cell in the table as a
> solution, and it is what we are doing in the map anyway.
> Is this possible? Like the default behavior for Accumulo?
>
> We looked at the settings on Scan and didn't really see anything, and
> the source code of Result looks like it wraps an array of Cells, so the
> data is already loaded at that point.
> We are using HBase 0.98.1 and the Hadoop 2 APIs.
>
> Thanks,
> Ryan Webb
>
> PS - Sorry if this is a duplicate; I sent the first one before
> subscribing, so I don't know what the policy is with that.
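[Editor's note: for readers hitting the same OOM, HBase 0.98 also has Scan.setBatch(int), which caps the number of cells returned per Result; a TableMapper then sees a wide row as several partial Results rather than one huge one (the mapper must tolerate the same row key appearing more than once). The chunking setBatch performs can be sketched in plain Java as follows; `BatchSketch`, `toBatches`, and the cell strings are illustrative, not HBase API:]

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Mimic Scan.setBatch(n): split one wide row's cells into
    // Results of at most `batch` cells each, so no single Result
    // has to hold the entire super vertex in memory at once.
    static List<List<String>> toBatches(List<String> cells, int batch) {
        List<List<String>> results = new ArrayList<>();
        for (int i = 0; i < cells.size(); i += batch) {
            results.add(new ArrayList<>(
                cells.subList(i, Math.min(i + batch, cells.size()))));
        }
        return results;
    }

    public static void main(String[] args) {
        // A hypothetical super vertex with 10 edge cells, batched 3 at a time:
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < 10; i++) cells.add("edge-" + i);
        List<List<String>> results = toBatches(cells, 3);
        System.out.println(results.size());        // 4 Results instead of 1
        System.out.println(results.get(3).size()); // the last Result holds 1 cell
    }
}
```

With a batch size of 1, each map() call would see a single cell, which approximates Accumulo's per-entry iteration without a schema rewrite.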
