Thank you,

TextVertexInputFormat has a getEdges() method, but EdgeInputFormat has no equivalent (since it does not represent a vertex), and it does not support returning multiple edges per record. Normally a row would hold only one edge, but in my case (Nutch2) there are multiple edges per row:

key: URL
value: ol:URL2, ol:URL3, ol:URL4, ...

That is, each row holds multiple outlinks.

Is there a way to overcome this?
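
One possible workaround, as a rough sketch only: buffer the outlinks of the current row and hand them out one edge at a time, so that a reader which is asked for "the next edge" can still serve rows that hold many edges. The code below is plain Java with no Giraph or HBase dependency. The class name RowEdgeCursor, its methods, and the "ol:" parsing are assumptions made for illustration and are not Giraph's actual API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical cursor that turns rows holding many outlinks into single edges. */
class RowEdgeCursor {
    private final Iterator<Map.Entry<String, String>> rows;
    // Targets of the current row that have not been handed out yet.
    private final Deque<String> pendingTargets = new ArrayDeque<>();
    private String currentSource;
    private String currentTarget;

    /** Assumed input: row key -> comma-separated "ol:<target>" columns. */
    RowEdgeCursor(Map<String, String> table) {
        this.rows = table.entrySet().iterator();
    }

    /** Advances to the next edge; reads a new row only when the buffer is empty. */
    boolean nextEdge() {
        while (pendingTargets.isEmpty()) {
            if (!rows.hasNext()) {
                return false; // no rows left, so no edges left
            }
            Map.Entry<String, String> row = rows.next();
            currentSource = row.getKey();
            for (String column : row.getValue().split(",")) {
                String trimmed = column.trim();
                // Keep only outlink columns with a non-empty target.
                if (trimmed.startsWith("ol:") && trimmed.length() > 3) {
                    pendingTargets.add(trimmed.substring(3));
                }
            }
        }
        currentTarget = pendingTargets.poll();
        return true;
    }

    String getCurrentSource() {
        return currentSource;
    }

    String getCurrentTarget() {
        return currentTarget;
    }
}
```

Calling nextEdge() in a loop over the row above would yield the edges URL -> URL2, URL -> URL3 and URL -> URL4 in turn. Rows without any outlink are skipped.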



On 07/19/2013 01:03 AM, Avery Ching wrote:
I don't think it will be hard to implement. Just start with the HBaseVertexInputFormat and have it extend EdgeInputFormat. You can look at TableEdgeInputFormat for an example. It sounds like a good contribution to Giraph.

On 7/18/13 1:57 PM, Puneet Jain wrote:
I also need this feature. It would be really helpful.


On Thu, Jul 18, 2013 at 10:49 AM, Ahme Emre Aladağ <[email protected]> wrote:

    Hi,

    Question: Will there be an HBaseEdgeInputFormat class, or is there
    a restriction in HBase that prevents us from implementing it?

    HBaseVertexInputFormat is fine for vertex-centric reading, i.e.
    each row in HBase corresponds to one Vertex. But it does not
    allow me to create duplicate vertices with the same ID.
    Now I have the case "many rows in HBase can correspond to one
    Vertex, each row representing a set of edges."

    Example:
    a1 - x y z
    a2 - t p
    a3 - k

    will be

    vertex "a" with edges to x y z t p k

    My intuition is that if an HBaseEdgeInputFormat existed, I could
    solve this case. But it doesn't exist yet.
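
The merge described in the example above can be sketched in plain Java, without any Giraph or HBase dependency. It assumes that the vertex ID is the row key with its trailing digits stripped ("a1" becomes "a"); that rule, the class name RowMerger and both method names are hypothetical and only serve to illustrate how edge-centric reading lets several rows feed one vertex.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that folds several rows into one adjacency list per vertex. */
class RowMerger {
    /** Assumption: the vertex ID is the row key without its trailing digits. */
    static String vertexIdOf(String rowKey) {
        return rowKey.replaceAll("\\d+$", "");
    }

    /** Groups the edge targets of all rows by the vertex ID derived from each row key. */
    static Map<String, List<String>> mergeRows(Map<String, List<String>> rows) {
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            String source = vertexIdOf(row.getKey());
            // Append this row's targets to whatever earlier rows contributed.
            adjacency.computeIfAbsent(source, k -> new ArrayList<>())
                     .addAll(row.getValue());
        }
        return adjacency;
    }
}
```

With the three rows a1, a2 and a3 from the example, mergeRows returns a single entry for "a" whose targets are x, y, z, t, p, k in row order.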
