Re: Differences with Edge and Vertex Input Format

Alessandro Presta Sun, 20 Jan 2013 08:43:17 -0800

Hi Peter,

Good questions.

1) If you only specify an EdgeInputFormat, vertex values will be initialized to 
their type's default value. You can also specify a VertexValueInputFormat, 
which is just a more convenient API around VertexInputFormat to read vertex 
values.

2) They will be created when they receive their first message, unless you 
override VertexResolver with some other behavior.

3) In general, vertex input is more efficient: as you said, less data needs to 
be shuffled around the network, and it's also a more compact representation. 
However, if your original dataset is a list of edges, the additional step of 
grouping them by source vertex beforehand might be more expensive than letting 
Giraph do it (depending on your infrastructure).

4) We don't have a way to enforce which worker reads which splits, so I think 
in general you can expect most of the data to be shuffled across workers, even 
if the input is pre-partitioned.
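
To make the grouping step from point 3 concrete, here is a minimal sketch in 
plain Java of turning an edge list into one adjacency list per source vertex. 
It is only an illustration, not Giraph code: the class name EdgeGrouping, the 
method groupBySource, and the use of long vertex ids are all assumptions made 
for the example, and a real dataset would need a distributed job rather than 
an in-memory map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Giraph: groups an in-memory edge list
// by source vertex, which is the preprocessing a vertex-based input needs.
public class EdgeGrouping {

    /** Groups (source, target) pairs into one adjacency list per source vertex. */
    public static Map<Long, List<Long>> groupBySource(long[][] edges) {
        // TreeMap keeps the sources sorted so the output order is deterministic.
        Map<Long, List<Long>> adjacency = new TreeMap<>();
        for (long[] edge : edges) {
            adjacency.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        long[][] edges = { {1, 2}, {1, 3}, {2, 3} };
        // Emit one line per source vertex: the source id followed by its targets.
        for (Map.Entry<Long, List<Long>> entry : groupBySource(edges).entrySet()) {
            StringBuilder line = new StringBuilder(String.valueOf(entry.getKey()));
            for (long target : entry.getValue()) {
                line.append(' ').append(target);
            }
            System.out.println(line);
        }
    }
}
```

Note that vertex 3 in this sample has only in-edges, so it never appears as a 
source and gets no adjacency line of its own. That is the situation covered in 
point 2.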

Alessandro

On Jan 20, 2013, at 6:12 AM, "Peter Morgan" <[email protected]> wrote:

> I'm interested in hearing about the differences in loading using the edge and 
> vertex inputs. In particular, I have a few questions:
> 
> 1) How can vertex state be set using edge input format?
> 2) How are vertices with only in-edges initialised using edge input format?
> 3) Is either vertex or edge input more efficient for loading? I guess less 
> needs to be shuffled around the network using vertex input?
> 4) If your adjacency list for vertex input is pre-partitioned, does this 
> decrease loading time as again vertices don't need to be shuffled across the 
> network?
> 
> Thanks in advance for any help.
> Peter
