Re: Implementing and optimizing a simple graph metric

2013-09-27 Thread Joseph Rushton Wakeling

On Thursday, 26 September 2013 at 22:03:12 UTC, bearophile wrote:

Joseph Rushton Wakeling:


T is qualified via isFloatingPoint :-)


I know, but that qualification could change in future 
evolutions of your code. Strong type safety means that if you 
change a type in your code, with a localized change (like 
removing isFloatingPoint at the top of your function), the whole 
code that depends on that type (like the body of this function 
of yours) will keep working as safely as before :-)


OK, I accept your argument :-)

As things stand I'm inclined to leave the code using "new" for 
now. I'll see if I can work out anything about why 
minimallyInitializedArray might be problematic -- at a guess, 
perhaps it accidentally gets in the way of some potential 
optimizations?
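
For anyone who wants to reproduce this, here is a minimal sketch 
of the kind of timing comparison I've been running (the array 
length and repeat count below are made-up placeholders, not my 
actual benchmark parameters):

import std.array : minimallyInitializedArray, uninitializedArray;
import std.datetime : StopWatch;
import std.stdio : writefln;

void main()
{
    enum n = 1_000;       // placeholder array length
    enum reps = 100_000;  // placeholder repetition count
    StopWatch sw;

    sw.start();
    foreach (_; 0 .. reps) { auto a = new double[n]; }
    sw.stop();
    writefln("new: %s ms", sw.peek.msecs);

    sw.reset();
    sw.start();
    foreach (_; 0 .. reps) { auto a = uninitializedArray!(double[])(n); }
    sw.stop();
    writefln("uninitializedArray: %s ms", sw.peek.msecs);

    sw.reset();
    sw.start();
    foreach (_; 0 .. reps) { auto a = minimallyInitializedArray!(double[])(n); }
    sw.stop();
    writefln("minimallyInitializedArray: %s ms", sw.peek.msecs);
}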


Re: Implementing and optimizing a simple graph metric

2013-09-26 Thread bearophile

Joseph Rushton Wakeling:


T is qualified via isFloatingPoint :-)


I know, but that qualification could change in future 
evolutions of your code. Strong type safety means that if you 
change a type in your code, with a localized change (like 
removing isFloatingPoint at the top of your function), the whole 
code that depends on that type (like the body of this function 
of yours) will keep working as safely as before :-)
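
To make that concrete, here is a sketch with a hypothetical 
pointer-bearing struct: if the isFloatingPoint constraint were 
dropped, an instantiation like the one below would fill the 
array with garbage bits that the GC may scan as if they were 
pointers.

import std.array : uninitializedArray;

struct Node { Node* next; }   // hypothetical type with an indirection

void fillBuffer(T)(size_t n)  // constraint "if (isFloatingPoint!T)" removed
{
    // With T = Node, buf holds uninitialized garbage that the GC can
    // mistake for pointers; minimallyInitializedArray would zero it.
    auto buf = uninitializedArray!(T[])(n);
}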


Bye,
bearophile


Re: Implementing and optimizing a simple graph metric

2013-09-26 Thread Joseph Rushton Wakeling

On Thursday, 26 September 2013 at 21:29:42 UTC, bearophile wrote:
You also have arrays of T. Someday T could be something with 
indirections :-) So minimallyInitializedArray is safer 
regarding future changes in your code.


T is qualified via isFloatingPoint :-)


Re: Implementing and optimizing a simple graph metric

2013-09-26 Thread bearophile

Joseph Rushton Wakeling:


So it's bizarre that there is a performance difference here.


Bizarre findings deserve more experiments and studies :-)


Incidentally, this was why I felt safe using uninitializedArray 
for arrays of e.g. long, size_t, etc., because the only 
difference at the end of the function could be whether the 
values contained in the array have been set or not.  As I do 
ensure that any array entry used also has its value set, it 
should be OK.


You also have arrays of T. Someday T could be something with 
indirections :-) So minimallyInitializedArray is safer regarding 
future changes in your code.


Bye,
bearophile


Re: Implementing and optimizing a simple graph metric

2013-09-26 Thread Joseph Rushton Wakeling

On Thursday, 26 September 2013 at 20:56:39 UTC, bearophile wrote:

Joseph Rushton Wakeling:

I have not found this -- using minimallyInitializedArray for 
the arrays of built-in types is slower than if I use 
uninitializedArray.


Then minimallyInitializedArray should be improved :-)


It's odd, because the two actually use the same underlying 
function, with minimallyInitializedArray potentially triggering 
this section:


else static if (minimallyInitialized && hasIndirections!E)
{
    ret[] = E.init;
}

... but if E is (say) an int, or a size_t, or any other built-in 
type, this should not be triggered -- right?
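
For what it's worth, hasIndirections can be checked directly, and 
for built-in scalars that branch should indeed be compiled out:

import std.traits : hasIndirections;

static assert(!hasIndirections!int);      // branch not taken
static assert(!hasIndirections!size_t);
static assert(!hasIndirections!double);
static assert( hasIndirections!(int[]));  // a slice contains a pointer
static assert( hasIndirections!(int*));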


So it's bizarre that there is a performance difference here.

Incidentally, this was why I felt safe using uninitializedArray 
for arrays of e.g. long, size_t, etc., because the only 
difference at the end of the function could be whether the values 
contained in the array have been set or not.  As I do ensure that 
any array entry used also has its value set, it should be OK.


Re: Implementing and optimizing a simple graph metric

2013-09-26 Thread bearophile

Joseph Rushton Wakeling:

I have not found this -- using minimallyInitializedArray for 
the arrays of built-in types is slower than if I use 
uninitializedArray.


Then minimallyInitializedArray should be improved :-)


You seem to be suggesting that using uninitializedArray could 
cause a general slowdown,


I am saying that in general uninitializedArray is less safe.

Bye,
bearophile


Re: Implementing and optimizing a simple graph metric

2013-09-26 Thread Joseph Rushton Wakeling

On Tuesday, 24 September 2013 at 22:14:30 UTC, bearophile wrote:
minimallyInitializedArray is not stupid: if the specified type 
has no indirections, it's equivalent to using 
uninitializedArray, but it's safer if you later change the 
type. So in general it's not a good idea to use 
uninitializedArray unless you have special needs. The two 
functions are not equivalent: one of them is for normal 
performance tuning, and the other is for special usages.


I have not found this -- using minimallyInitializedArray for the 
arrays of built-in types is slower than if I use 
uninitializedArray.


These arrays have their values initialized immediately -- this is 
the actual code from inside the betweenness centrality function:


size_t[] stack = uninitializedArray!(size_t[])(g.vertexCount);
T[] sigma = uninitializedArray!(T[])(g.vertexCount);
T[] delta = uninitializedArray!(T[])(g.vertexCount);
long[] d = uninitializedArray!(long[])(g.vertexCount);
auto q = VertexQueue(g.vertexCount);
Appender!(size_t[])[] p = minimallyInitializedArray!(Appender!(size_t[])[])(g.vertexCount);

sigma[] = to!T(0);
delta[] = to!T(0);
d[] = -1L;

... so I don't see the safety problem here.  uninitializedArray 
is used only for arrays of built-in types, 
minimallyInitializedArray is used otherwise.


On the other hand, if inside the VertexQueue implementation, I 
replace the "new" declaration with an uninitializedArray call, 
the code gets slower by a noticeable amount.


Very odd -- any ideas why that might be?


See above: use uninitializedArray only in special situations 
and when you know what you are doing. Here you do not know what 
you are doing, so use minimallyInitializedArray.


Whether it's uninitializedArray or minimallyInitializedArray, 
using either inside the VertexQueue creates a significant 
slowdown.


With types that contain indirections, uninitializedArray leaves 
random garbage that the GC can mistake for pointers and aliases, 
which could increase the work done by the GC and cause 
(temporary, if you initialize all your data) memory leaks. Try 
to totally disable the GC and time the two versions of the code, 
with and without uninitializedArray. If the GC is the cause of 
the speed difference and you disable it, you will see no 
performance difference any more.


You seem to be suggesting that using uninitializedArray could 
cause a general slowdown, but in general it results in a speedup 
compared to minimallyInitializedArray.


In the one case where it causes a slowdown, 
minimallyInitializedArray does too, and by a similar amount.


Re: Implementing and optimizing a simple graph metric

2013-09-24 Thread bearophile

Joseph Rushton Wakeling:

As an experiment I tried something along these lines -- using 
uninitializedArray for most of the arrays here and 
minimallyInitializedArray for p.


minimallyInitializedArray is not stupid: if the specified type 
has no indirections, it's equivalent to using uninitializedArray, 
but it's safer if you later change the type. So in general it's 
not a good idea to use uninitializedArray unless you have 
special needs. The two functions are not equivalent: one of them 
is for normal performance tuning, and the other is for special 
usages.



On the other hand, if inside the VertexQueue implementation, I 
replace the "new" declaration with an uninitializedArray call, 
the code gets slower by a noticeable amount.


Very odd -- any ideas why that might be?


See above: use uninitializedArray only in special situations and 
when you know what you are doing. Here you do not know what you 
are doing, so use minimallyInitializedArray.


With types that contain indirections, uninitializedArray leaves 
random garbage that the GC can mistake for pointers and aliases, 
which could increase the work done by the GC and cause 
(temporary, if you initialize all your data) memory leaks. Try 
to totally disable the GC and time the two versions of the code, 
with and without uninitializedArray. If the GC is the cause of 
the speed difference and you disable it, you will see no 
performance difference any more.
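
A sketch of that experiment (GC.disable and GC.enable live in 
core.memory):

import core.memory : GC;

GC.disable();            // no collections will run from here on
scope(exit) GC.enable();

// ... time the loop with uninitializedArray, then again with
// minimallyInitializedArray; if the gap vanishes with the GC
// disabled, the GC was the cause of the difference.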


Bye,
bearophile


Re: Implementing and optimizing a simple graph metric

2013-09-24 Thread Joseph Rushton Wakeling

On Wednesday, 18 September 2013 at 13:39:29 UTC, bearophile wrote:
auto centrality = minimallyInitializedArray!(typeof(return))(g.vertexCount);
centrality[] = T0;
auto stack = new size_t[g.vertexCount];
auto sigma = minimallyInitializedArray!T(g.vertexCount);
sigma[] = T0;
auto delta = minimallyInitializedArray!T(g.vertexCount);
delta[] = T0;
auto d = minimallyInitializedArray!long(g.vertexCount);
d[] = -1;
auto q = VertexQueue(g.vertexCount);
auto p = new Appender!(size_t[])[g.vertexCount];


As an experiment I tried something along these lines -- using 
uninitializedArray for most of the arrays here and 
minimallyInitializedArray for p.


It seems to result in a small speed improvement, although I'm not 
certain about that -- certainly not a speed loss.


On the other hand, if inside the VertexQueue implementation, I 
replace the "new" declaration with an uninitializedArray call, 
the code gets slower by a noticeable amount.


Very odd -- any ideas why that might be?


Re: Implementing and optimizing a simple graph metric

2013-09-19 Thread Joseph Rushton Wakeling

On Wednesday, 18 September 2013 at 17:13:28 UTC, bearophile wrote:
How many times or how often do you need to call betweenness()? 
If it's called only a few times, or once in a while, then using 
the GC is good enough. But if you have to call it many millions 
of times, all those GC array allocations could cost some time, 
even if they are small arrays. In this case allocating them on 
the stack (or not allocating them at all, using externally 
allocated buffers) helps.


I think millions of times is excessive. In my test code I have an 
example with a 50-node network calculating betweenness centrality 
10_000 times, which takes under 1.5 seconds. So obviously if you 
scale up to millions, small improvements could make a difference, 
but it's not going to be on such a scale that it's worth 
prioritizing, at least for now. I think even 10_000 calculations 
is excessive for any practical case I can think of.


You're right to call for the option of passing in a buffer, 
though, and I've enabled that. In the current test code it 
doesn't seem to improve anything, but that may simply be a matter 
of scale.


Re: Implementing and optimizing a simple graph metric

2013-09-18 Thread Joseph Rushton Wakeling

On Wednesday, 18 September 2013 at 13:39:29 UTC, bearophile wrote:

- Try to optionally accept the buffers from outside.


Does this look good to you?

//////////////////////////////////////////////

auto ref betweenness(T = double, Graph)(ref Graph g, bool[] ignore = null)
    if (isFloatingPoint!T && isGraph!Graph)
{
    T[] centrality = new T[g.vertexCount];
    return betweenness!(T, Graph)(g, centrality, ignore);
}

auto ref betweenness(T = double, Graph)(ref Graph g, ref T[] centrality, bool[] ignore = null)
{
    centrality.length = g.vertexCount;
    centrality[] = to!T(0);
    // ... the rest as before
}

//////////////////////////////////////////////
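
Calling code could then reuse one buffer across many calls, 
something like this (a hypothetical loop over a range of graphs; 
the buffer is grown as needed and otherwise recycled):

double[] buffer;         // reused: allocated once, then recycled
foreach (ref g; graphs)  // hypothetical range of graphs
{
    auto centrality = betweenness(g, buffer);
    // ... use centrality ...
}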


Re: Implementing and optimizing a simple graph metric

2013-09-18 Thread bearophile

Joseph Rushton Wakeling:

in the case of this code I think that stack vs. heap probably 
isn't that important.  If the data is small enough for stack 
allocation, the calculation will be quick anyway.


How many times or how often do you need to call betweenness()? If 
it's called only a few times, or once in a while, then using the 
GC is good enough. But if you have to call it many millions of 
times, all those GC array allocations could cost some time, even 
if they are small arrays. In this case allocating them on the 
stack (or not allocating them at all, using externally allocated 
buffers) helps.


Bye,
bearophile


Re: Implementing and optimizing a simple graph metric

2013-09-18 Thread Joseph Rushton Wakeling
On Wednesday, 18 September 2013 at 15:17:25 UTC, Joseph Rushton 
Wakeling wrote:
I think I did give std.container.Array a trial when trying to 
speed up the code, and I don't remember it making any 
difference (if anything it may have slowed things down).  But 
I'll give it a second look and report back.


I tried rewriting the variable declarations:

Array!T centrality;
centrality.length = g.vertexCount;
centrality[] = to!T(0);
Array!size_t stack;
stack.length = g.vertexCount;
Array!T sigma;
sigma.length = g.vertexCount;
Array!T delta;
delta.length = g.vertexCount;
Array!long d;
d.length = g.vertexCount;
auto q = VertexQueue(g.vertexCount);
Appender!(size_t[])[] p = new Appender!(size_t[])[g.vertexCount];


It results in a moderate slowdown of the code, at a guess because 
it increases the total number of mallocs/deallocs.


I note that there seems to be no way to initialize 
std.container.Array as having a certain length ... ?
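
The nearest workaround I can see, assuming the range constructor 
behaves as documented, is to build the Array from 
std.range.repeat, so that length and contents are set in one step 
(using the names from the code above):

import std.container : Array;
import std.range : repeat;

auto sigma = Array!double(repeat(0.0, g.vertexCount));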


I tried instead using std.array.uninitializedArray to initialize 
the arrays in the betweenness() function -- it was a bit 
difficult to judge here: with a relatively small number of calls 
to betweenness() it seems to have resulted in a speedup, but with 
very many calls, it seems to have been slower.


Re: Implementing and optimizing a simple graph metric

2013-09-18 Thread Joseph Rushton Wakeling

On Wednesday, 18 September 2013 at 15:22:51 UTC, bearophile wrote:

Joseph Rushton Wakeling:

I haven't yet tried alloca or other manual memory management 
-- I felt a bit resistant to this as I'd prefer to keep the 
code simple and readable -- but I'll give that a go too just 
to see how it goes.


I'd like some stack-allocated variable-length array in 
D+Phobos, as in C++14. It's also a basis to build several other 
stack-allocated data structures.


Yes, that'd be useful, although in the case of this code I think 
that stack vs. heap probably isn't that important.  If the data 
is small enough for stack allocation, the calculation will be 
quick anyway.


However, my impression now is that having or not having it, or 
any other method (e.g. enum T0, T1, etc.) makes absolutely no 
difference, and that I might as well keep it simple and write 0 
for integral types, 0.0 for floating-point.


Right. And this could be right even if you want T=Rational.


Actually, on re-testing this just now, I'm returning to my 
original view.  I find that if you put in raw floating-point 
numbers like 0.0, 1.0 etc. the resulting code can be slower in 
the case where T = float.  Putting to!T or using enums as you 
suggest appears to make no difference.


This is a guess rather than confirmed by looking at assembly, but 
I'm presuming that to!T(0) and other conversions of compile-time 
constants are evaluated at compile time, so you don't in practice 
carry the cost you'd normally get from using to().
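
One way to guarantee that, rather than presume it, is to force 
the conversion through an enum, since enum initializers must be 
computable at compile time (a fragment reusing T and sigma from 
the code above):

enum T T0 = to!T(0);  // forced CTFE: conversion cost is compile-time only
enum T T1 = to!T(1);
sigma[] = T0;         // at run time this is just a plain array fill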


Re: Implementing and optimizing a simple graph metric

2013-09-18 Thread bearophile

Joseph Rushton Wakeling:

I haven't yet tried alloca or other manual memory management -- 
I felt a bit resistant to this as I'd prefer to keep the code 
simple and readable -- but I'll give that a go too just to see 
how it goes.


I'd like some stack-allocated variable-length array in D+Phobos, 
as in C++14. It's also a basis to build several other 
stack-allocated data structures.



However, my impression now is that having or not having it, or 
any other method (e.g. enum T0, T1, etc.) makes absolutely no 
difference, and that I might as well keep it simple and write 0 
for integral types, 0.0 for floating-point.


Right. And this could be right even if you want T=Rational.

Bye,
bearophile


Re: Implementing and optimizing a simple graph metric

2013-09-18 Thread Joseph Rushton Wakeling

On Wednesday, 18 September 2013 at 13:39:29 UTC, bearophile wrote:
Just for a test, try to allocate all those arrays in a 
different way:

- First try a std.container.Array using the LDC2 compiler;
- Another thing to try is to allocate them on the stack using 
core.stdc.stdlib.alloca:

auto p = cast(T*)alloca(T.sizeof * g.vertexCount);
if (p is null) throw new OutOfMemoryError();  // from core.exception
auto centrality = p[0 .. g.vertexCount];

This probably can't be used for very large graphs, so you have 
to add even more logic to allocate the arrays on the C heap if 
they are too large. This could be useful if you call 
betweenness() many, many times.

- Try to optionally accept the buffers from outside.


Nice suggestions, I'll try those out! :-)

I think I did give std.container.Array a trial when trying to 
speed up the code, and I don't remember it making any difference 
(if anything it may have slowed things down).  But I'll give it a 
second look and report back.


I haven't yet tried alloca or other manual memory management -- I 
felt a bit resistant to this as I'd prefer to keep the code 
simple and readable -- but I'll give that a go too just to see 
how it goes.


It's interesting that you pick up on my use of to!T(0) etc.  I 
had some concern about whether that would affect speed at all.  
At the time of originally writing the code, my impression was 
that not having it actually made things worse, and that the 
compiler was smart enough to carry out the conversion at 
compile time.


However, my impression now is that having or not having it, or 
any other method (e.g. enum T0, T1, etc.) makes absolutely no 
difference, and that I might as well keep it simple and write 0 
for integral types, 0.0 for floating-point.


Re: Implementing and optimizing a simple graph metric

2013-09-18 Thread bearophile

Joseph Rushton Wakeling:

http://braingam.es/2013/09/betweenness-centrality-in-dgraph/


Some small improvements:


T[] betweenness(Graph, T = double)(ref Graph g, bool[] ignore = null)
    if (isGraph!Graph && isFloatingPoint!T)
{
    enum T T0 = 0;
    enum T T1 = 1;
    auto centrality = minimallyInitializedArray!(typeof(return))(g.vertexCount);
    centrality[] = T0;
    auto stack = new size_t[g.vertexCount];
    auto sigma = minimallyInitializedArray!T(g.vertexCount);
    sigma[] = T0;
    auto delta = minimallyInitializedArray!T(g.vertexCount);
    delta[] = T0;
    auto d = minimallyInitializedArray!long(g.vertexCount);
    d[] = -1;
    auto q = VertexQueue(g.vertexCount);
    auto p = new Appender!(size_t[])[g.vertexCount];

    foreach (immutable s; 0 .. g.vertexCount)
    {
        if (ignore && ignore[s])
        {
            continue;
        }

        size_t stackLength = 0;
        assert(q.empty);
        sigma[s] = T1;
        d[s] = 0;
        q.push(s);

        while (!q.empty)
        {
            immutable size_t v = q.front;
            q.pop;
            stack[stackLength] = v;
            ++stackLength;
            foreach (immutable w; g.neighboursOut(v))
            {
                if (ignore && ignore[w])
                {
                    continue;
                }

                if (d[w] < 0)
                {
                    q.push(w);
                    d[w] = d[v] + 1;
                    assert(sigma[w] == T0);
                    sigma[w] = sigma[v];
                    p[w].clear;
                    p[w].put(v);
                }
                else if (d[w] > (d[v] + 1))
                {
                    /* w has already been encountered, but we've
                       found a shorter path to the source.  This
                       is probably only relevant to the weighted
                       case, but let's keep it here in order to
                       be ready for that update. */
                    d[w] = d[v] + 1L;
                    sigma[w] = sigma[v];
                    p[w].clear;
                    p[w].put(v);
                }
                else if (d[w] == (d[v] + 1))
                {
                    sigma[w] += sigma[v];
                    p[w].put(v);
                }
            }
        }

        while (stackLength > 0)
        {
            --stackLength;
            immutable w = stack[stackLength];
            foreach (immutable v; p[w].data)
            {
                delta[v] += ((sigma[v] / sigma[w]) * (T1 + delta[w]));
            }
            if (w != s)
            {
                centrality[w] += delta[w];
            }
            sigma[w] = T0;
            delta[w] = T0;
            d[w] = -1;
        }
    }

    static if (!g.directed)
    {
        centrality[] /= 2;
    }

    return centrality;
}


Just for a test, try to allocate all those arrays in a different 
way:

- First try a std.container.Array using the LDC2 compiler;
- Another thing to try is to allocate them on the stack using 
core.stdc.stdlib.alloca:

auto p = cast(T*)alloca(T.sizeof * g.vertexCount);
if (p is null) throw new OutOfMemoryError();  // from core.exception
auto centrality = p[0 .. g.vertexCount];

This probably can't be used for very large graphs, so you have to 
add even more logic to allocate the arrays on the C heap if they 
are too large; a sketch of that idea follows below. This could be 
useful if you call betweenness() many, many times.

- Try to optionally accept the buffers from outside.
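
A sketch of the stack-with-heap-fallback idea (the cutoff value 
is arbitrary, and note that alloca must be called directly in the 
frame of the function that uses the memory, so this fragment 
belongs inside betweenness() itself):

import core.exception : OutOfMemoryError;
import core.stdc.stdlib : alloca, free, malloc;

enum stackLimit = 16 * 1024;  // arbitrary cutoff, in bytes
immutable nBytes = T.sizeof * g.vertexCount;
immutable onHeap = nBytes > stackLimit;
auto ptr = cast(T*)(onHeap ? malloc(nBytes) : alloca(nBytes));
if (ptr is null)
    throw new OutOfMemoryError();
scope(exit) if (onHeap) free(ptr);  // alloca memory frees itself on return
auto centrality = ptr[0 .. g.vertexCount];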

Bye,
bearophile