Hi Peter,

Good questions.

1) If you only specify an EdgeInputFormat, vertex values will be initialized to their type's default value. You can also specify a VertexValueInputFormat, which is just a more convenient API around VertexInputFormat for reading vertex values.

2) They will be created when they receive their first message, unless you override VertexResolver with some other behavior.

3) In general vertex input is more efficient, both for the reason you gave and because it is a more compact representation. However, if your original dataset is a list of edges, the additional step of grouping them by source vertex beforehand might be more expensive than doing that in Giraph (depending on your infrastructure).

4) We don't have a way to enforce which worker reads which splits, so I think in general you can expect most of the data to be shuffled across workers.

Alessandro

On Jan 20, 2013, at 6:12 AM, "Peter Morgan" <[email protected]> wrote:

> I'm interested in hearing about the differences in loading using the edge and
> vertex inputs. In particular, I have a few questions:
>
> 1) How can vertex state be set using edge input format?
> 2) How are vertices with only in-edges initialised using edge input format?
> 3) Is either vertex or edge input more efficient for loading? I guess less
> needs to be shuffled around the network using vertex input?
> 4) If your adjacency list for vertex input is pre-partitioned, does this
> decrease loading time as again vertices don't need to be shuffled across the
> network?
>
> Thanks in advance for any help.
> Peter
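
To make the grouping step from answer 3 concrete, here is a minimal sketch in plain Python. It does not use the Giraph API; the function names and the sample edge list are hypothetical, chosen only for illustration. It turns a list of edges into per-source adjacency lists, and also shows which vertices have only in-edges and therefore get no record of their own in the grouped form (the case raised in question 2).

```python
from collections import defaultdict


def group_edges_by_source(edges):
    """Group (source, target) pairs into {source: [targets]}.

    Hypothetical helper: this is the preprocessing step that a
    vertex-based (adjacency list) input would require.
    """
    adjacency = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)
    return dict(adjacency)


def vertices_with_only_in_edges(edges):
    """Return the vertices that appear as a target but never as a source."""
    sources = {source for source, _ in edges}
    targets = {target for _, target in edges}
    return targets - sources


# Illustrative edge list (made-up data).
edges = [(1, 2), (1, 3), (2, 3)]

adjacency = group_edges_by_source(edges)
only_in = vertices_with_only_in_edges(edges)
```

On a large dataset this grouping becomes a shuffle or sort by source vertex, which is the extra cost answer 3 refers to. Vertex 3 in the sample has no outgoing edges, so it does not appear as a key in `adjacency`.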
