Github user rnowling commented on the pull request:
https://github.com/apache/spark/pull/2168#issuecomment-53727924
Thanks, @ankurdave. I'd also like to recommend @srowen as a reviewer, since
he has made a few contributions to GraphGenerators in the past.
I've made the following changes based on your feedback:
* Added documentation for the logNormalGraph parameters
* Added an optional seed parameter to SynthBenchmark; when no seed is given,
a random one is generated (more below)
I also found and fixed the following bugs:
* Fixed a bug in generateRandomEdges where the number of edges produced was
the total number of vertices rather than the out-degree; it now produces the
correct number of edges.
* Updated the SynthBenchmark docs to correct the nverts parameter name
* Fixed failure to call count() before printing result of connected
components in SynthBenchmark
I also started unit tests for GraphGenerators. They only cover the
logNormalGraph, generateRandomEdges, and sampleLogNormal functions for now, but
it's a start.
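
To illustrate the generateRandomEdges fix and the kind of check the new tests
make, here is a minimal standalone sketch. It is not the actual GraphX code:
`SketchEdge`, `randomEdgesSketch`, and their parameters are hypothetical names
used only for this example.

```scala
import scala.util.Random

object EdgeCountSketch {
  // Hypothetical stand-in for an edge; not the GraphX Edge class.
  case class SketchEdge(src: Int, dst: Int)

  // Hypothetical simplified generator: emits exactly `numEdges` edges
  // (the out-degree of `src`), with targets drawn from [0, maxVertexId).
  def randomEdgesSketch(src: Int, numEdges: Int, maxVertexId: Int,
      seed: Long): Array[SketchEdge] = {
    val rand = new Random(seed)
    // The bug described above corresponds to sizing this array by
    // maxVertexId instead of numEdges.
    Array.fill(numEdges)(SketchEdge(src, rand.nextInt(maxVertexId)))
  }

  def main(args: Array[String]): Unit = {
    val edges = randomEdgesSketch(src = 5, numEdges = 10, maxVertexId = 100,
      seed = 42L)
    // The edge count must equal the requested out-degree, not the vertex count.
    assert(edges.length == 10)
    assert(edges.forall(e => e.src == 5 && e.dst >= 0 && e.dst < 100))
  }
}
```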
I understand the concern about seeding and benchmarks. However, I also
think the use of optional parameters for seeds can be subtle and problematic.
Most random functions (RNGs, etc.) generate a random seed when one is not
given, so users expect that behavior. I'm often tripped up when the default is
a fixed seed. Since users may use logNormalGraph for statistical testing and
other work, I would strongly prefer that the default be a random seed.
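
As a rough sketch of the behavior I'm arguing for (the names below are
hypothetical and this is not the signature used in the PR), an optional seed
that falls back to a randomly seeded generator could look like this:

```scala
import scala.util.Random

object SeedDefaultSketch {
  // Hypothetical helper: `seed` is optional. When it is omitted, a freshly
  // seeded generator is used, so repeated calls give different samples.
  def sampleValues(n: Int, seed: Option[Long] = None): Vector[Double] = {
    val rand = seed match {
      case Some(s) => new Random(s) // reproducible when a seed is supplied
      case None    => new Random()  // random seed by default
    }
    Vector.fill(n)(rand.nextGaussian())
  }

  def main(args: Array[String]): Unit = {
    // An explicit seed gives a repeatable sequence, which is what a
    // benchmark that needs determinism would pass.
    assert(sampleValues(3, Some(7L)) == sampleValues(3, Some(7L)))
    // No seed: values vary from run to run.
    println(sampleValues(3))
  }
}
```

Callers that need reproducibility opt in by passing a seed, while everyone
else gets the behavior they expect from most RNG APIs.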
As for SynthBenchmark, I followed the same principle of least surprise.
When no seed is given, one is randomly generated. If you feel the seed should
be hard-coded in the benchmark by default, however, I'd be happy to change that.
At this stage, I don't want to introduce backwards-incompatible API changes
unless we know the new API will be there for some time. When a new API is
introduced, the seed parameter should be required (rather than optional) to
remove the ambiguity and subtlety of the issue.