Thank you,
TextVertexInputFormat has a getEdges() method, but EdgeInputFormat has no
equivalent (since it is not vertex-based), so it does not support returning
multiple edges per record. Normally a row holds only one edge, but
in my case (Nutch2) we have multiple edges per row:
key: URL
value: ol:URL2, ol:URL3, ol:URL4, ...
That is, each row holds multiple outlinks.
Is there a way to overcome this?
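One way around the one-edge-per-record limit might be to buffer a row's outlinks inside the reader and hand them out one at a time. The sketch below is plain Java with no Giraph or HBase dependency. BufferedRowEdgeReader and getCurrentTargetId() are hypothetical names, and the nextEdge()/getCurrentSourceId() shape only imitates the style of an edge reader, so the real Giraph signatures should be checked before adapting it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for an edge reader over multi-edge rows.
 * Each row is (source URL -> list of outlink URLs); the reader
 * emits exactly one edge per call to nextEdge().
 */
class BufferedRowEdgeReader {
    private final Iterator<Map.Entry<String, List<String>>> rows;
    // Outlinks of the current row that have not been emitted yet.
    private final Deque<String> pendingTargets = new ArrayDeque<>();
    private String currentSource;
    private String currentTarget;

    BufferedRowEdgeReader(Map<String, List<String>> table) {
        this.rows = table.entrySet().iterator();
    }

    /** Advances to the next edge; reads a new row only when the buffer is empty. */
    boolean nextEdge() {
        while (pendingTargets.isEmpty()) {
            if (!rows.hasNext()) {
                return false; // no rows left, no buffered edges left
            }
            Map.Entry<String, List<String>> row = rows.next();
            currentSource = row.getKey();
            pendingTargets.addAll(row.getValue()); // rows without outlinks are skipped
        }
        currentTarget = pendingTargets.poll();
        return true;
    }

    String getCurrentSourceId() {
        return currentSource;
    }

    String getCurrentTargetId() {
        return currentTarget;
    }
}
```

For a row with key URL and outlinks URL2, URL3, URL4, three successive calls to nextEdge() would yield the edges URL to URL2, URL to URL3, and URL to URL4 before the reader moves on to the next row.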
On 07/19/2013 01:03 AM, Avery Ching wrote:
I don't think it will be hard to implement. Just start with
HBaseVertexInputFormat and have it extend EdgeInputFormat. You can
look at TableEdgeInputFormat for an example. It sounds like a good
contribution to Giraph.
On 7/18/13 1:57 PM, Puneet Jain wrote:
I also need this feature. It would be really helpful.
On Thu, Jul 18, 2013 at 10:49 AM, Ahme Emre Aladağ wrote:
Hi,
Question: Will there be an HBaseEdgeInputFormat class, or is there a
restriction in HBase that prevents us from implementing it?
HBaseVertexInputFormat is fine for vertex-centric reading, i.e.
each row in HBase corresponds to one Vertex. But it does not
allow me to create duplicate vertices with the same ID.
My case is different: many rows in HBase can correspond to one
vertex, with each row holding a set of that vertex's edges.
Example:
a1 - x y z
a2 - t p
a3 - k
will be
vertex "a" with edges to x y z t p k
My intuition is that an HBaseEdgeInputFormat would solve this case,
but it doesn't exist yet.
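The merge in the example above can be sketched in plain Java as follows. RowMerger, mergeRows and vertexIdOf are hypothetical names, and the rule that maps row keys a1, a2, a3 to vertex a (drop trailing digits) is an assumption made only for illustration; the real key scheme would depend on the table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that folds several rows into one vertex's edge list. */
class RowMerger {

    /** Assumed key convention: vertex ID = row key with trailing digits removed. */
    static String vertexIdOf(String rowKey) {
        return rowKey.replaceAll("\\d+$", "");
    }

    /**
     * Groups edge targets by vertex ID, preserving row order, so that
     * rows sharing a vertex ID contribute to a single edge list.
     */
    static Map<String, List<String>> mergeRows(Map<String, List<String>> rows) {
        Map<String, List<String>> edgesByVertex = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            edgesByVertex
                .computeIfAbsent(vertexIdOf(row.getKey()), id -> new ArrayList<>())
                .addAll(row.getValue());
        }
        return edgesByVertex;
    }
}
```

With rows a1 = [x, y, z], a2 = [t, p] and a3 = [k] inserted in that order, mergeRows returns a single entry for vertex a with targets x, y, z, t, p, k. Reading the same rows as individual edges would reach the same result without a merge step, since the edges would then be grouped by source ID.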