Cost of assoc array?

2014-05-14 Thread Chris via Digitalmars-d-learn

I have code that uses the following:

string[][size_t] myArray;

1. myArray = [0:[t, o, m], 1:[s, m, i, th]];

However, I've found out that I never need an assoc array and a 
linear array would be just fine, as in


2. myArray = [[t, o, m], [s, m, i, th]];

Is there any huge difference as regards performance and memory 
footprint between the two? Or is 2. basically 1. under the hood?


Re: Cost of assoc array?

2014-05-14 Thread bearophile via Digitalmars-d-learn

Chris:

Is there any huge difference as regards performance and memory 
footprint between the two? Or is 2. basically 1. under the hood?


An associative array is a rather more complex data structure, so 
if you don't need it, use something simpler. There is difference 
in both the amount of memory used and performance (in many cases 
such difference doesn't matter). In D there are also differences 
in the way they are initialized from a null or fat null.


Bye,
bearophile


Re: Cost of assoc array?

2014-05-14 Thread Chris via Digitalmars-d-learn

On Wednesday, 14 May 2014 at 10:20:51 UTC, bearophile wrote:

Chris:

Is there any huge difference as regards performance and memory 
footprint between the two? Or is 2. basically 1. under the 
hood?


An associative array is a rather more complex data structure, 
so if you don't need it, use something simpler. There is 
difference in both the amount of memory used and performance 
(in many cases such difference doesn't matter). In D there are 
also differences in the way they are initialized from a null or 
fat null.


Bye,
bearophile


Thanks. Do you mean the difference is negligible in many cases? 
I'm not sure, because changing it would be a breaking change in 
the old code, meaning I would have to change various methods.


Re: Cost of assoc array?

2014-05-14 Thread dennis luehring via Digitalmars-d-learn

Am 14.05.2014 12:33, schrieb Chris:

On Wednesday, 14 May 2014 at 10:20:51 UTC, bearophile wrote:

Chris:


Is there any huge difference as regards performance and memory
footprint between the two? Or is 2. basically 1. under the
hood?


An associative array is a rather more complex data structure,
so if you don't need it, use something simpler. There is
difference in both the amount of memory used and performance
(in many cases such difference doesn't matter). In D there are
also differences in the way they are initialized from a null or
fat null.

Bye,
bearophile


Thanks. Do you mean the difference is negligible in many cases?
I'm not sure, because changing it would be a breaking change in
the old code, meaning I would have to change various methods.


a simple array would be faster because no access is generate for your 
key - just plain access, just don't use assoc arrays if you don't need 
key based access


read more manuals about hashmaps and stuff and how to do benchmarking - 
helps alot in the future





Re: Cost of assoc array?

2014-05-14 Thread bearophile via Digitalmars-d-learn

Chris:


Do you mean the difference is negligible in many cases?


Yes, but you have to profile the code (or reason about it well, 
with knowledge of the data structures and their usage patterns) 
if you want to know what your case is.


Bye,
bearophile


Re: Cost of assoc array?

2014-05-14 Thread John Colvin via Digitalmars-d-learn

On Wednesday, 14 May 2014 at 09:38:10 UTC, Chris wrote:

I have code that uses the following:

string[][size_t] myArray;

1. myArray = [0:[t, o, m], 1:[s, m, i, th]];

However, I've found out that I never need an assoc array and a 
linear array would be just fine, as in


2. myArray = [[t, o, m], [s, m, i, th]];

Is there any huge difference as regards performance and memory 
footprint between the two? Or is 2. basically 1. under the hood?


If you don't need the features of associative arrays, don't use 
them.


Normal arrays are much simpler, faster and (due to some 
outstanding problems with associative arrays in D) less 
bug-prone. Associative arrays, by definition, require a lot more 
work behind the scenes for both reading and writing.


Re: Cost of assoc array?

2014-05-14 Thread Chris via Digitalmars-d-learn

On Wednesday, 14 May 2014 at 11:13:10 UTC, John Colvin wrote:

On Wednesday, 14 May 2014 at 09:38:10 UTC, Chris wrote:

I have code that uses the following:

string[][size_t] myArray;

1. myArray = [0:[t, o, m], 1:[s, m, i, th]];

However, I've found out that I never need an assoc array and a 
linear array would be just fine, as in


2. myArray = [[t, o, m], [s, m, i, th]];

Is there any huge difference as regards performance and memory 
footprint between the two? Or is 2. basically 1. under the 
hood?


If you don't need the features of associative arrays, don't use 
them.


Normal arrays are much simpler, faster and (due to some 
outstanding problems with associative arrays in D) less 
bug-prone. Associative arrays, by definition, require a lot 
more work behind the scenes for both reading and writing.


The question is, if they are _much_ faster. With this type 
(string[][size_t]) I haven't encountered any bugs yet. On the 
other hand, it introduces some rather stilted logic sometimes, as 
in


foreach (size_t i; 0..myArray.length) {
  // do something with myArray[i];
}

because it's not sorted. This, or I sort it first. Anyway, 
there's always an overhead associated with associative arrays. 
I'll have to see how big this breaking change would be, and 
decide, if it's worth it.


Profiling is not really feasible, because for this to work 
properly, I would have to introduce the change first to be able 
to compare both. Nothing worse than carefully changing things 
only to find out, it doesn't really speed up things.


Re: Cost of assoc array?

2014-05-14 Thread dennis luehring via Digitalmars-d-learn

Am 14.05.2014 15:20, schrieb Chris:

Profiling is not really feasible, because for this to work
properly, I would have to introduce the change first to be able
to compare both. Nothing worse than carefully changing things
only to find out, it doesn't really speed up things.


why not using an alias for easier switch between the versions?

alias string[][size_t] my_array_type

or

alias string[][] my_array_type

an do an searchreplace of string[][size_t] with my_array_type thats 
it - still too hard :)








Re: Cost of assoc array?

2014-05-14 Thread John Colvin via Digitalmars-d-learn

On Wednesday, 14 May 2014 at 13:20:40 UTC, Chris wrote:

On Wednesday, 14 May 2014 at 11:13:10 UTC, John Colvin wrote:

On Wednesday, 14 May 2014 at 09:38:10 UTC, Chris wrote:

I have code that uses the following:

string[][size_t] myArray;

1. myArray = [0:[t, o, m], 1:[s, m, i, th]];

However, I've found out that I never need an assoc array and 
a linear array would be just fine, as in


2. myArray = [[t, o, m], [s, m, i, th]];

Is there any huge difference as regards performance and 
memory footprint between the two? Or is 2. basically 1. under 
the hood?


If you don't need the features of associative arrays, don't 
use them.


Normal arrays are much simpler, faster and (due to some 
outstanding problems with associative arrays in D) less 
bug-prone. Associative arrays, by definition, require a lot 
more work behind the scenes for both reading and writing.


The question is, if they are _much_ faster. With this type 
(string[][size_t]) I haven't encountered any bugs yet. On the 
other hand, it introduces some rather stilted logic sometimes, 
as in


foreach (size_t i; 0..myArray.length) {
  // do something with myArray[i];
}

because it's not sorted. This, or I sort it first. Anyway, 
there's always an overhead associated with associative arrays. 
I'll have to see how big this breaking change would be, and 
decide, if it's worth it.


Profiling is not really feasible, because for this to work 
properly, I would have to introduce the change first to be able 
to compare both. Nothing worse than carefully changing things 
only to find out, it doesn't really speed up things.


Yes, they are much faster. Normal array indexing is equivalent to 
*(myArray.ptr + index) plus an optional bounds check, whereas 
associative array indexing is a much, much larger job.


Why were you using associative arrays in the first place? Unless 
your keys are somehow sparse* or of a non-integer type there 
isn't any reason to.



* How I see that constraint in that context:

(maxKey - minKey) / nElements  1 + epsilon
where epsilon is the maximum proportional wasted space you could 
afford in a normal array (emptyElements / usedElements). Bear in 
mind the memory overhead of associative arrays is itself non-zero.


Also, while normal arrays tend to be more cache friendly than 
associative arrays, this isn't true for very sparse arrays.


Re: Cost of assoc array?

2014-05-14 Thread Chris via Digitalmars-d-learn

On Wednesday, 14 May 2014 at 13:31:53 UTC, dennis luehring wrote:

Am 14.05.2014 15:20, schrieb Chris:

Profiling is not really feasible, because for this to work
properly, I would have to introduce the change first to be able
to compare both. Nothing worse than carefully changing things
only to find out, it doesn't really speed up things.


why not using an alias for easier switch between the versions?

alias string[][size_t] my_array_type

or

alias string[][] my_array_type

an do an searchreplace of string[][size_t] with 
my_array_type thats it - still too hard :)


Talking about not seeing the forest for the trees ... I'll try 
that. :)


Re: Cost of assoc array?

2014-05-14 Thread bearophile via Digitalmars-d-learn

Chris:


foreach (size_t i; 0..myArray.length) {
  // do something with myArray[i];
}


There are various better ways to use a foreach on an array:

foreach (immutable x; myArray) {

foreach (ref const x; myArray) {

foreach (ref x; myArray) {

Bye,
bearophile


Re: Cost of assoc array?

2014-05-14 Thread John Colvin via Digitalmars-d-learn

On Wednesday, 14 May 2014 at 13:49:22 UTC, bearophile wrote:

Chris:


foreach (size_t i; 0..myArray.length) {
 // do something with myArray[i];
}


There are various better ways to use a foreach on an array:

foreach (immutable x; myArray) {

foreach (ref const x; myArray) {

foreach (ref x; myArray) {

Bye,
bearophile


His code was for an associative array.


Re: Cost of assoc array?

2014-05-14 Thread Chris via Digitalmars-d-learn

On Wednesday, 14 May 2014 at 13:44:40 UTC, John Colvin wrote:


Yes, they are much faster. Normal array indexing is equivalent 
to *(myArray.ptr + index) plus an optional bounds check, 
whereas associative array indexing is a much, much larger job.


Why were you using associative arrays in the first place? 
Unless your keys are somehow sparse* or of a non-integer type 
there isn't any reason to.


It is very old code (dmd 2.052 times). When I set out to write 
it, I was a) new to the language and b) I could not yet tell 
whether or not I would need it (I think it also had to do with 
the way D was back then, but I don't remember my exact 
reasoning). It has always been in the back of my mind to change 
it into a normal array. It's basically a code corpse.




* How I see that constraint in that context:

(maxKey - minKey) / nElements  1 + epsilon
where epsilon is the maximum proportional wasted space you 
could afford in a normal array (emptyElements / usedElements). 
Bear in mind the memory overhead of associative arrays is 
itself non-zero.


Also, while normal arrays tend to be more cache friendly than 
associative arrays, this isn't true for very sparse arrays.