Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Ivan Kazmenko

Namespace wrote:
A small but nice intentioned advice of me is this article: 
http://dlang.org/dstyle.html
"The names of user-defined types should be PascalCased, which 
is the same as camelCased except that the first letter is 
uppercase."


Thanks for pointing that out.  Indeed, PascalCase seems
convenient to use with D structs.  On the other side, I still
have aesthetic considerations against camelCase and prefer
underscores_between_words for variables in the programs I write
for personal use.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Ivan Kazmenko

Steven Schveighoffer wrote:

Again, the issue here is D has restrictions on ref parameters.  
There are no such restrictions on pass-by-value parameters.  So 
by default, the predicate tries by-value.


I am starting to understand the difficulty with ref being too 
restrictive.  Some kind of "const ref" in the language would have 
helped indeed.  But it would probably cause some allocation 
difficulties for actual constants... or worse?


In this case, RedBlackTree is always assured of having lvalues 
for the predicate, so I can probably make the default use 
pass-by-ref.  If you could file a bug, that would be most 
helpful.


Done! http://d.puremagic.com/issues/show_bug.cgi?id=9513


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Namespace
As I said elsewhere, the issue is D not having a good solution 
for accepting both rvalues and lvalues by ref.  Jonathan 
mentioned auto ref, but it's not exactly implemented correctly.


In C++, it just takes all parameters as const T&, and that 
works for both lvalues and rvalues.


The extra 200k copies are from the implementation taking all 
parameters by value.  If we didn't do that, sans a working auto 
ref, things like tree.insert(element(5)) would fail to compile 
(cannot pass an rvalue by reference)


-Steve


And Jonathan said also that 'auto ref' is not the solution for 
non-template functions.
Probably 'auto ref' will never work for non-template functions. 
So a general solution must be found first.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Steven Schveighoffer

On Thu, 14 Feb 2013 15:33:19 -0500, Ivan Kazmenko  wrote:

... I get only exactly 3 * LIMIT postblit constructor calls, which is  
300,000 instead of 11,389,556.  While I still want to find where the  
excess 200,000 calls originate from, this is definitely asymptotically  
better than before


As I said elsewhere, the issue is D not having a good solution for  
accepting both rvalues and lvalues by ref.  Jonathan mentioned auto ref,  
but it's not exactly implemented correctly.


In C++, it just takes all parameters as const T&, and that works for both  
lvalues and rvalues.


The extra 200k copies are from the implementation taking all parameters by  
value.  If we didn't do that, sans a working auto ref, things like  
tree.insert(element(5)) would fail to compile (cannot pass an rvalue by  
reference)


-Steve


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Steven Schveighoffer

On Thu, 14 Feb 2013 15:33:19 -0500, Ivan Kazmenko  wrote:


Rob T wrote:
When I look at the std.container source code, it seems that the payload  
element is passed by value multiple times unnecessarily, so to minimize  
copy construction you'll have to implement element as a class and  
implement a dup function for it.


Thank you all for assistance, I've finally pinned it down!  It's the  
default magic "a < b" that devours the most copies of my precious  
structs!  Once I change the declaration:


-
alias RedBlackTree !(element) container;
-

to

-
bool lessfun (ref element a, ref element b)
{
return a < b;
}

alias RedBlackTree !(element, lessfun) container;
-

... I get only exactly 3 * LIMIT postblit constructor calls, which is  
300,000 instead of 11,389,556.  While I still want to find where the  
excess 200,000 calls originate from, this is definitely asymptotically  
better than before: O (n) versus O (n log n) calls.  And it is now less  
of a bottleneck in my original program.


This raises the question to the default behavior of  
std.functional.binaryFun.  Sure, for integers, class references and  
small types, the current way is the best.  On the other hand, for large  
structs and maybe some other complex objects that obey value semantics,  
it seems it would be beneficial to, given a string like "a < b", produce  
a function taking arguments by reference and not by value.  Maybe there  
is some simple criterion (size of type greater than size_t - seems bad?  
type is a struct - better?) which can be used with static if to generate  
"(ref a, ref b)" instead of "(a, b)" version by default.  Of course, all  
that could only work if there is a more detailed hierarchy of binary  
functions, at least a binaryFunConst taking const arguments.


It is great that you found this!

Again, the issue here is D has restrictions on ref parameters.  There are  
no such restrictions on pass-by-value parameters.  So by default, the  
predicate tries by-value.


In this case, RedBlackTree is always assured of having lvalues for the  
predicate, so I can probably make the default use pass-by-ref.  If you  
could file a bug, that would be most helpful.


http://d.puremagic.com/issues/enter_bug.cgi?product=D

-Steve


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread monarch_dodra

On Thursday, 14 February 2013 at 20:44:13 UTC, bearophile wrote:

Ivan Kazmenko:


Thank you all for assistance, I've finally pinned it down!


This is great. The first step to solve a problem is to find it. 
The second step is to formalize it. So I suggest you to create 
a small benchmark that shows what you mean (it should not 
include RedBlackTree) and put it in Bugzilla, with some timings.


Bye,
bearophile



I've already begun investigating, and am now benching my solution.

But filing issues never hurt anyone.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Jonathan M Davis
On Thursday, February 14, 2013 14:31:36 Steven Schveighoffer wrote:
> On Thu, 14 Feb 2013 13:45:30 -0500, Rob T  wrote:
> > When I look at the std.container source code, it seems that the payload
> > element is passed by value multiple times unnecessarily, so to minimize
> > copy construction you'll have to implement element as a class and
> > implement a dup function for it.
> > 
> > I expect performance will increase substantially even with the extra
> > heap allocations.
> > 
> > Alternatively, you can implement your own version of RedBlackTree with
> > pass by ref optimizations for insertion of struct elements.
> 
> If it was pass by ref, then rbt.insert(5) would not work. And in fact, I
> wouldn't want things passed by ref if the element is int. I have to
> admit, I did not consider expensive postblits when I designed it. Almost
> all my testing is with integer types.

And Walter and Andrei both seem to think that having expensive postlbits is a 
design mistake. The range stuff sort of tries to support it with the move* 
primitives, but Andrei's been wanting to argue for just assuming that 
postblits are cheap. And Walter was actually arguing at one point for making 
it illegal (probably by getting rid of postblit all together - I don't 
remember the details). Plenty of the rest of us don't agree, but to some 
extent, it is true that having an expensive postblit is asking for it. On a 
related note, we still need a solution for dealing with const postblits 
(probably be introducing copy constructors - IIRC Andrei was considering 
phasing out postblits in favor of copy constructors, which would be 
unnecessary, but we do need a way to deal with deep copying const structs 
which hold reference types which need to be deep copied.

> Unfortunately, D is in this state where taking a ref parameter means
> strictly lvalue. So passing rvalues will not work. D does not have the
> notion that C++ has where const type& can accept both rvalue and lvalue.
> We desperately need something similar.

auto ref is the current solution for templated functions, but we do need a 
more general solution. DIP25 got created in part due to discussions on that, 
but it probably needs to be fully sorted out before we can fully sort out the 
const& solution.

- Jonathan M Davis


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Namespace

On Thursday, 14 February 2013 at 20:20:59 UTC, Rob T wrote:

On Thursday, 14 February 2013 at 19:56:33 UTC, Namespace wrote:


There are discussions for such thing for almost a year - but 
nothing has changed so far.


I think its finally on the priority radar, but there's still 
other things in more dire need to be resolved first, like 
@property and lack of shared library support, and probably 
something else too ...


--rt


Yes, not to mention
  - A package system for Phobos
  - Half Float Datatypes
  - JSON
  - bug fixes and merging of new features in D1 (although no 
further development of D1 should be done since December 31)

and probably something else too.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread bearophile

Ivan Kazmenko:


Thank you all for assistance, I've finally pinned it down!


This is great. The first step to solve a problem is to find it. 
The second step is to formalize it. So I suggest you to create a 
small benchmark that shows what you mean (it should not include 
RedBlackTree) and put it in Bugzilla, with some timings.


Bye,
bearophile


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Ivan Kazmenko

Rob T wrote:
When I look at the std.container source code, it seems that the 
payload element is passed by value multiple times 
unnecessarily, so to minimize copy construction you'll have to 
implement element as a class and implement a dup function for 
it.


Thank you all for assistance, I've finally pinned it down!  It's 
the default magic "a < b" that devours the most copies of my 
precious structs!  Once I change the declaration:


-
alias RedBlackTree !(element) container;
-

to

-
bool lessfun (ref element a, ref element b)
{
return a < b;
}

alias RedBlackTree !(element, lessfun) container;
-

... I get only exactly 3 * LIMIT postblit constructor calls, 
which is 300,000 instead of 11,389,556.  While I still want to 
find where the excess 200,000 calls originate from, this is 
definitely asymptotically better than before: O (n) versus O (n 
log n) calls.  And it is now less of a bottleneck in my original 
program.


This raises the question to the default behavior of 
std.functional.binaryFun.  Sure, for integers, class references 
and small types, the current way is the best.  On the other hand, 
for large structs and maybe some other complex objects that obey 
value semantics, it seems it would be beneficial to, given a 
string like "a < b", produce a function taking arguments by 
reference and not by value.  Maybe there is some simple criterion 
(size of type greater than size_t - seems bad? type is a struct - 
better?) which can be used with static if to generate "(ref a, 
ref b)" instead of "(a, b)" version by default.  Of course, all 
that could only work if there is a more detailed hierarchy of 
binary functions, at least a binaryFunConst taking const 
arguments.


-
Ivan Kazmenko.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Rob T

On Thursday, 14 February 2013 at 19:56:33 UTC, Namespace wrote:


There are discussions for such thing for almost a year - but 
nothing has changed so far.


I think its finally on the priority radar, but there's still 
other things in more dire need to be resolved first, like 
@property and lack of shared library support, and probably 
something else too ...


--rt


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Namespace
On Thursday, 14 February 2013 at 19:31:36 UTC, Steven 
Schveighoffer wrote:
On Thu, 14 Feb 2013 13:45:30 -0500, Rob T  
wrote:


When I look at the std.container source code, it seems that 
the payload element is passed by value multiple times 
unnecessarily, so to minimize copy construction you'll have to 
implement element as a class and implement a dup function for 
it.


I expect performance will increase substantially even with the 
extra heap allocations.


Alternatively, you can implement your own version of 
RedBlackTree with pass by ref optimizations for insertion of 
struct elements.


If it was pass by ref, then rbt.insert(5) would not work.  And 
in fact, I wouldn't want things passed by ref if the element is 
int.  I have to admit, I did not consider expensive postblits 
when I designed it.  Almost all my testing is with integer 
types.


Unfortunately, D is in this state where taking a ref parameter 
means strictly lvalue.  So passing rvalues will not work.  D 
does not have the notion that C++ has where const type& can 
accept both rvalue and lvalue.  We desperately need something 
similar.


-Steve


There are discussions for such thing for almost a year - but 
nothing has changed so far.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Rob T
On Thursday, 14 February 2013 at 19:31:36 UTC, Steven 
Schveighoffer wrote:
On Thu, 14 Feb 2013 13:45:30 -0500, Rob T  
wrote:


When I look at the std.container source code, it seems that 
the payload element is passed by value multiple times 
unnecessarily, so to minimize copy construction you'll have to 
implement element as a class and implement a dup function for 
it.


I expect performance will increase substantially even with the 
extra heap allocations.


Alternatively, you can implement your own version of 
RedBlackTree with pass by ref optimizations for insertion of 
struct elements.


If it was pass by ref, then rbt.insert(5) would not work.  And 
in fact, I wouldn't want things passed by ref if the element is 
int.  I have to admit, I did not consider expensive postblits 
when I designed it.  Almost all my testing is with integer 
types.


You can make it work by implementing two insert functions, one 
with pass by ref and one with pass by value. The pass by value 
implementation simply calls the pass by ref version, and 
everything from there is pass by ref where it makes sense as an 
optimization.


I understand the implementation difficulties, because we have 
multiple situations to consider and optimize for, such as simple 
POD which can often pass by value efficiently, but then we have 
complex structs that do not pass by value efficiently and should 
be passed by ref instead, but there's no way to detect this 
easily (I suppose you could try looking at the size and adjust 
the pass-by method somehow, but ...), and finally if you accept a 
class then you have different needs because a class is a pointer 
not a POD value, again I suppose you can detect if a class is 
being passed and adjust the implementation but that introduces 
more complexity ...


In some ways this is where C++ got it right from a consistency 
POV, where a class is no different from a struct, so whatever 
works for a class also works for a struct. In D, structs and 
classes operate in fundamentally different ways making 
implementing generics more difficult to do.


Unfortunately, D is in this state where taking a ref parameter 
means strictly lvalue.  So passing rvalues will not work.  D 
does not have the notion that C++ has where const type& can 
accept both rvalue and lvalue.  We desperately need something 
similar.


-Steve


Yes, I agree.

--rt


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Steven Schveighoffer

On Thu, 14 Feb 2013 13:45:30 -0500, Rob T  wrote:

When I look at the std.container source code, it seems that the payload  
element is passed by value multiple times unnecessarily, so to minimize  
copy construction you'll have to implement element as a class and  
implement a dup function for it.


I expect performance will increase substantially even with the extra  
heap allocations.


Alternatively, you can implement your own version of RedBlackTree with  
pass by ref optimizations for insertion of struct elements.


If it was pass by ref, then rbt.insert(5) would not work.  And in fact, I  
wouldn't want things passed by ref if the element is int.  I have to  
admit, I did not consider expensive postblits when I designed it.  Almost  
all my testing is with integer types.


Unfortunately, D is in this state where taking a ref parameter means  
strictly lvalue.  So passing rvalues will not work.  D does not have the  
notion that C++ has where const type& can accept both rvalue and lvalue.   
We desperately need something similar.


-Steve


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Rob T
When I look at the std.container source code, it seems that the 
payload element is passed by value multiple times unnecessarily, 
so to minimize copy construction you'll have to implement element 
as a class and implement a dup function for it.


I expect performance will increase substantially even with the 
extra heap allocations.


Alternatively, you can implement your own version of RedBlackTree 
with pass by ref optimizations for insertion of struct elements.


--rt


Re: std.regex: bug or something special?

2013-02-14 Thread Dmitry Olshansky

14-Feb-2013 22:24, Lubos Pintes пишет:

Hi,
I am reading std.regex. I found the following code on line 671:
 else if('A' <= current && current <= 'Z')
 val = val * 16 + current - 'A' + 10;

Hex digits are parsed there. So unles this is something special, this is
a bug. Because for example what 'J' would mean in this context?


A bug. Especially considering the full source:

if('0' <= current && current <= '9')
val = val * 16 + current - '0';
else if('a' <= current && current <= 'f')
val = val * 16 + current -'a' + 10;
else if('A' <= current && current <= 'Z')
val = val * 16 + current - 'A' + 10;

I'd say file it or better yet submit a pull with a unittest that 
presently  does something fishy. For instance "\u00JJ" would parse and 
match m-hm 'J'-'A' * 16 + 'J' - 'A' where it shouldn't parse in the 
first place.


--
Dmitry Olshansky


Re: How to read fastly files ( I/O operation)

2013-02-14 Thread bioinfornatics


Mr.Bio, what usage cases you'll be interested in, other than 
those counters?


some idea such as letter counting:
rename identifier
trimming sequence from quality value to cutoff
convert to a binary format
convert to fasta + sff
merge close sequence to one concenus
create a brujin graph
more idea later


Re: std.regex: bug or something special?

2013-02-14 Thread H. S. Teoh
On Thu, Feb 14, 2013 at 07:24:32PM +0100, Lubos Pintes wrote:
> Hi,
> I am reading std.regex. I found the following code on line 671:
> else if('A' <= current && current <= 'Z')
> val = val * 16 + current - 'A' + 10;
> 
> Hex digits are parsed there. So unles this is something special,
> this is a bug. Because for example what 'J' would mean in this
> context?

Looks like a bug. The last clause of the if-condition should check <=
'F', not <= 'Z'.

Please file a bug at http://d.puremagic.com/issues .


T

-- 
GEEK = Gatherer of Extremely Enlightening Knowledge


std.regex: bug or something special?

2013-02-14 Thread Lubos Pintes

Hi,
I am reading std.regex. I found the following code on line 671:
else if('A' <= current && current <= 'Z')
val = val * 16 + current - 'A' + 10;

Hex digits are parsed there. So unles this is something special, this is 
a bug. Because for example what 'J' would mean in this context?


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Ivan Kazmenko

Dmitry Olshansky wrote:

D2 (DMD 2.059, -O): 11,389,556


I'd add :

-release -inline

or it may not inline away temporary copies.


Thank you for the suggestion.  However, the result after "-O 
-release -inline" is still 11,389,556.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Jonathan M Davis
On Thursday, February 14, 2013 13:27:30 monarch_dodra wrote:
> Actually, (and IMO, this is a very big problem), these structures
> are *always* null by default. There is no easy way to "default
> initialize" structs in D :(

You mean there's no way to default-construct them. They're always default-
initialized to their init value if you don't explicitly initialize them.

- Jonathan M Davis


Re: Strange that D is printing large values as zero. Any mistake in my code?

2013-02-14 Thread bearophile

Sparsh Mittal:


const long DIM = 1024*1024*1024*1024*4;


Vote, if you want:
http://d.puremagic.com/issues/show_bug.cgi?id=4835

Go language doesn't have such bugs. Likewise D should not have 
such bugs.


Bye,
bearophile


Re: Strange that D is printing large values as zero. Any mistake in my code?

2013-02-14 Thread Joseph Rushton Wakeling

On 02/14/2013 04:49 PM, Adam D. Ruppe wrote:

const long DIM = 1024*1024*1024*1024*4;


Those are all int (32 bit) literals so it is prolly wrapping around the
intermediates before it actually assigns to DIM.

If you used 1024L it should be better, by making the right hand side 64 bit too.


Oh, snap. :-P

First time I ran into this kind of issue was when I was calculating a power of 2 
in C++ by using:


size_t p = 1 << m;

m was calculated to ensure that 2 ^^ m was the largest power of 2 that could (i) 
be multiplied by 2 without integer wraparound and (ii) fell within the range of 
the uniform random number generator.  And all was fine until I upgraded my 
system to 64-bit, and suddenly I was getting a nonsense value of p.


It shouldn't have changed, because the RNG used a 32-bit unsigned integer as its 
internal data type.  After much head-scratching I cottoned on to the fact that 
as it was, 1 << m was a regular integer that was wrapping around to a negative 
value before being converted to size_t (which in practice meant: it came out as 
the maximum of size_t less the negative value).  With 32-bit size_t this was the 
correct value.  With 64-bit size_t it meant a value of p far larger than it 
should have been.


Replace the power-of-2 statement with

size_t p = 1UL << m;

... and everything was cured.


Re: Strange that D is printing large values as zero. Any mistake in my code?

2013-02-14 Thread monarch_dodra
On Thursday, 14 February 2013 at 15:44:25 UTC, Sparsh Mittal 
wrote:

Here is the program:


import std.stdio;

const long DIM = 1024*1024*1024*1024*4;
void main()
{
writeln(" DIM  is ", DIM);
writeln(" Value ", 1024*1024*1024*1024*4);
writeln(" Max ", long.max);

}

I compiled it:
gdc -frelease -O3 temp.d -o t1 ; ./t1
 DIM  is 0
 Value 0
 Max 9223372036854775807

Can you please tell, why it is taking DIM as zero? If I reduce 
DIM, it works fine. It is strange.


It's not because "DIM" is a long, that "1024*1024*1024*1024*4" is 
a long. Case in point, it is calculated as an int, and becomes 0.


Try it with:
"1UL * 1024*1024*1024*1024*4"
or
"1024UL*1024*1024*1024*4"
or
"cast(ulong)1024*1024*1024*1024*4"
To force the actual calculation into long mode, and you'll get 
the desired result.


Or, just "2UL^^42", depending on just what you want "DIM" to 
represent.


Re: Strange that D is printing large values as zero. Any mistake in my code?

2013-02-14 Thread Sparsh Mittal
On Thursday, 14 February 2013 at 15:51:45 UTC, Joseph Rushton 
Wakeling wrote:

On 02/14/2013 04:44 PM, Sparsh Mittal wrote:
Can you please tell, why it is taking DIM as zero? If I reduce 
DIM, it works

fine. It is strange.


1024 is an int value.  Write 1024L instead to ensure that the 
calculation is performed using long.


Thanks a lot for your help.



Re: Strange that D is printing large values as zero. Any mistake in my code?

2013-02-14 Thread Joseph Rushton Wakeling

On 02/14/2013 04:44 PM, Sparsh Mittal wrote:

Can you please tell, why it is taking DIM as zero? If I reduce DIM, it works
fine. It is strange.


1024 is an int value.  Write 1024L instead to ensure that the calculation is 
performed using long.




Re: Strange that D is printing large values as zero. Any mistake in my code?

2013-02-14 Thread Adam D. Ruppe

const long DIM = 1024*1024*1024*1024*4;


Those are all int (32 bit) literals so it is prolly wrapping 
around the intermediates before it actually assigns to DIM.


If you used 1024L it should be better, by making the right hand 
side 64 bit too.


It is similar to

float x = 1 / 2; // x == 0 because the right hand side is done as 
ints


float x = 1.0 / 2; // now x == 0.5


Strange that D is printing large values as zero. Any mistake in my code?

2013-02-14 Thread Sparsh Mittal

Here is the program:


import std.stdio;

const long DIM = 1024*1024*1024*1024*4;
void main()
{
writeln(" DIM  is ", DIM);
writeln(" Value ", 1024*1024*1024*1024*4);
writeln(" Max ", long.max);

}

I compiled it:
gdc -frelease -O3 temp.d -o t1 ; ./t1
 DIM  is 0
 Value 0
 Max 9223372036854775807

Can you please tell, why it is taking DIM as zero? If I reduce 
DIM, it works fine. It is strange.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Ivan Kazmenko



If I use (as you do)

this(ref this) {

I get 0 as output. (dmd 2.062 beta1)
If I remove the 'ref' I get 11389556.


Ouch, sorry!  That's a copy & paste bug I introduced when posting 
here.


Actually, I use this (this) of course.  I tried this (ref this) 
at some point, it does indeed compile surprisingly, but it's not 
what gets called on copying (and thus is of no use).


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Steven Schveighoffer
On Thu, 14 Feb 2013 09:47:31 -0500, Namespace   
wrote:



If I use (as you do)

this(ref this) {

I get 0 as output. (dmd 2.062 beta1)
If I remove the 'ref' I get 11389556.


this(ref this) is not a postblit.  That it would even compile is a bug.

-Steve


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Namespace

If I use (as you do)

this(ref this) {

I get 0 as output. (dmd 2.062 beta1)
If I remove the 'ref' I get 11389556.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Dmitry Olshansky

14-Feb-2013 03:22, Ivan Kazmenko пишет:

Hi!

I'm learning to use D collections properly, and I'm looking for a sorted
data structure with logarithmic access time (i.e., a binary search tree
will do, but a hash table would not help). As far as I can see,
std.container.RedBlackTree is exactly what I need. However, I am not
sure if I use it as intended as its performance seems inferior to a C++
STL solution (e.g., std::set).

To be more specific, right now I wonder what is the best (or intended)
way to store an object in the RedBlackTree: should it be a class
reference, or a struct (passed by value), or something quirkier like an
integer pointing into an array or a simple pointer. The rest of my
program suggested to use structs, but the whole thing turned out to be
rather slow, and the profiler told me that these structs are being
copied around much more than I anticipated.

And so I wrote a minimalistic test program to check the number of copy
(postblit) constructor calls. Here is the D version:


[snip]



And the results are:
D2 (DMD 2.059, -O): 11,389,556


I'd add :

-release -inline

or it may not inline away temporary copies.



--
Dmitry Olshansky


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-14 Thread Sparsh Mittal

Thanks a lot for your reply.


Re: Static class question

2013-02-14 Thread anonymous

On Thursday, 14 February 2013 at 12:49:01 UTC, Lubos Pintes wrote:

Hi,
I saw one ListBox class implementation.
As we all know, ListBox class collects and displays strings. 
Author decided that he encapsulate the Object as item, and uses 
its toString() method when string is needed for display.
For cases when string is added, he defined this small class, 
internal to ListBox:

[code]
private static class StringItem
{
private string _str;

public this(string s)
{
this._str = s;
}

public override string toString()
{
return this._str;
}
}
[/code]

After I read about attributes on dlang, I think the static 
could be removed from above definition. Is this true?

Thank


Without 'static', StringItem objects would be tied to ListBox
objects. 'static' saves StringItem objects some bytes, and
ensures that StringItem is independent of ListBox.
See http://dlang.org/class.html#nested


Static class question

2013-02-14 Thread Lubos Pintes

Hi,
I saw one ListBox class implementation.
As we all know, ListBox class collects and displays strings. Author 
decided that he encapsulate the Object as item, and uses its toString() 
method when string is needed for display.
For cases when string is added, he defined this small class, internal to 
ListBox:

[code]
private static class StringItem
{
private string _str;

public this(string s)
{
this._str = s;
}

public override string toString()
{
return this._str;
}
}
[/code]

After I read about attributes on dlang, I think the static could be 
removed from above definition. Is this true?

Thank


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread monarch_dodra

On Thursday, 14 February 2013 at 10:58:19 UTC, Namespace wrote:

struct S
{
   static struct Payload
   {
   //Tons of data here
   }
   Payload* _p;

   //fonctions
}

Ref counted is implemented that way. most of the containers 
are also implemented that way. associative arrays are also 
implemented that way under the hood.


But you have to allocate '_p' again on the heap.


Well, yeah, that's the point. I'm not saying it's a best fit for 
everything. You are basically trading construction costs for copy 
costs.


I see no advantage over classes, except that these structures 
are just not null by default.


Actually, (and IMO, this is a very big problem), these structures 
are *always* null by default. There is no easy way to "default 
initialize" structs in D :(



Is that the advantage?


The advantage is deterministic RAII. With classes you only get 
non-deterministic RAII.


For example: File. File will close the underlying FILE* when the 
last File is destroyed. But no later.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Ivan Kazmenko

First, thank you all for the replies!

Jonathan M Davis wrote:
Also, it could make a big difference if you use gdc or ldc 
rather than dmd.


Thank you for the suggestion, I'll check that.  However, I don't 
expect the GCC optimizer to reduce the number of calls in this 
specific test, since it did not kick in in the more obvious case 
as I mentioned.


Right now, I am more concerned with the performance of the 
front-end (RedBlackTree): if it currently can't be tuned to 
perform fast enough, I'll just know that I should implement my 
own version of a binary search tree container for my needs.


Rob T wrote:

You can check if disabling the GC just before the insert process
improves the performance. You may see 3x performance 
improvement.

Disabling is safe provided you re-enable, this can be done
reliably with scope(exit) or something similar.


Hmm, it indeed performed faster, though not 3x in my setup.  
However, the number of copy constructor calls stayed the same.


Steven Schveighoffer wrote:

I find the number of postblit calls excessive too.
I will have to look into why that happens.
I can say that once an element is allocated, it is not moved or 
copied.


Could it be easily enforced in the library?  Something like this: 
when accessing data only for internal use in the data structure, 
use an interface that disables assignment?  Can such cases be 
distinguished from the ones when the library user actively wants 
to copy data?


Steven Schveighoffer wrote:
I will note that std.container.RedBlackTree is a port of 
dcollections'

RBTree implementation.


Thank you for pointing that out.  Perhaps I'll have to try the 
test with that implementation, too.


monarch_dodra wrote:

Keep in mind that C++ and D have very different philosophies
regarding copy construction.
...
The conclusion is that the comparison is not fair: D's pass by
value is not *very* different from C++'s pass by ref (in amount
of data copied).


Well, perhaps I didn't make myself clear enough.  The comparison 
I posted is not intended to be a fair comparison of languages in 
the general case!  It is just my use case stripped to a minimal 
example that still shows the number of calls correctly.  In the 
actual use case, I have something like


-
struct element
{
long value;
int [] one;
int [] two;

this (this)
{
one = one.dup;
two = two.dup;
}
}
-

Sure I could think of some other way to represent the data I need 
as an object, this one just seems the most intuitive.


monarch_dodra wrote:

If you *do* want a fair-er comparison, then I'd suggest you
implement a ".dup" on your object, and see how many times THAT
gets called. Guess what: 0.


Right, but it's the copy constructor that gets called when I copy 
a struct, and in these cases, I actually want it to copy the 
arrays as well.  Otherwise, the element object may end up in a 
bad state.


Do you imply that I should implement dup for every struct I 
create and then leave the copy constructor empty?  As the library 
makes use of the copy constructor and not dup, I fear it would be 
an incorrect design, since the library will make broken copies of 
my struct and pass them around.


monarch_dodra wrote:

I'm not saying D's approach is perfect, just that D's library is
optimized for D-like types.


and Rob T wrote:

I agree. There are cases where structs make a lot of sense,
usually when they are very simple simple and contain no pointers
or references, otherwise structs should be avoided in favor of
classes to avoid doing copy/move constructors and to avoid
concerns over performance optimizations. With classes, only
certain points in your code require that a duplicate copy be 
made

of the class instance, the majority of code need only to pass
around a reference which is the default behavior - easy and 
fast!


So, you suggest passing things by reference once they grow beyond 
several bytes in size?  Wouldn't that mean having to constantly 
pay attention to some counter-intuitive code when I actually want 
my objects to behave like values, not references (e.g., matrices 
or other complex math objects)?  In my example case with arrays 
one and two, I do want my whole object to obey value semantics.


Once again, thank you all for attention.

-
Ivan Kazmenko.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Namespace

struct S
{
static struct Payload
{
//Tons of data here
}
Payload* _p;

//fonctions
}

Ref counted is implemented that way. most of the containers are 
also implemented that way. associative arrays are also 
implemented that way under the hood.


But you have to allocate '_p' again on the heap.
I see no advantage over classes, except that these structures are 
just not null by default.

Is that the advantage?


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread monarch_dodra

On Thursday, 14 February 2013 at 10:44:22 UTC, Namespace wrote:
Another good balance are stack based struct pointer wrappers 
to implementation : You can pass them by value, but they carry 
a complex payload.


I'm not sure what that is.
Can you give a small example?


struct S
{
static struct Payload
{
//Tons of data here
}
Payload* _p;

//fonctions
}

Ref counted is implemented that way. most of the containers are 
also implemented that way. associative arrays are also 
implemented that way under the hood.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Namespace
Another good balance are stack based struct pointer wrappers to 
implementation : You can pass them by value, but they carry a 
complex payload.


I'm not sure what that is.
Can you give a small example?


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread monarch_dodra

On Thursday, 14 February 2013 at 10:23:18 UTC, Namespace wrote:
I agree. There are cases where structs make a lot of sense, 
usually when they are very simple simple and contain no 
pointers or references, otherwise structs should be avoided in 
favor of classes to avoid doing copy/move constructors and to 
avoid concerns over performance optimizations. With classes, 
only certain points in your code require that a duplicate copy 
be made of the class instance, the majority of code need only 
to pass around a reference which is the default behavior - 
easy and fast!


--rt


It sounds like Java philosophy: Objects are always better.
Or have I misunderstood?
In any case, a intensive use of classes / objects, instead of 
structs would be also an enormous heap effort.

I usually try to use as few classes as possible.


It's a matter of balance. If you start having really complex 
objects (and big, eg > 100 bytes), then classes tend to scale 
better.


If having a class is *really* too much overhead, but your objects 
start getting too big to pass around by value, you can just new 
them on the heap, and you'll get the "best" (or worst?) of both 
worlds.


Another good balance are stack based struct pointer wrappers to 
implementation : You can pass them by value, but they carry a 
complex payload. The advantage to doing this over a naked array 
is the static type. The *dis*-advantage is that D has no standard 
default initialization scheme (classes do though). Most things in 
phobos use this scheme.


The point I (we?) are trying to get across is that *usually* (not 
a hard rule) copying things in D is *expected* to be trivial and 
cheap. If this is not the case, then the tools you'll interface 
with will not work optimally.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Namespace
I agree. There are cases where structs make a lot of sense, 
usually when they are very simple simple and contain no 
pointers or references, otherwise structs should be avoided in 
favor of classes to avoid doing copy/move constructors and to 
avoid concerns over performance optimizations. With classes, 
only certain points in your code require that a duplicate copy 
be made of the class instance, the majority of code need only 
to pass around a reference which is the default behavior - easy 
and fast!


--rt


It sounds like Java philosophy: Objects are always better.
Or have I misunderstood?
In any case, a intensive use of classes / objects, instead of 
structs would be also an enormous heap effort.

I usually try to use as few classes as possible.


Re: std.container.RedBlackTree versus C++ std::set

2013-02-14 Thread Rob T
On Thursday, 14 February 2013 at 06:56:38 UTC, monarch_dodra 
wrote:
On Wednesday, 13 February 2013 at 23:22:03 UTC, Ivan Kazmenko 
wrote:

Hi!
-
Ivan Kazmenko.


Keep in mind that C++ and D have very different philosophies 
regarding copy construction.


C++ has "strong ownership", so for example, whenever you copy a 
string/vector, or pass it by value, the whole damn thing has to 
be completely duplicated. It's so expansive that C++ has gone 
out of its way to use pass-by-reference semantics, as well 
(now) RValue references.


D, on the other hand (being GC), has a "shallow ownership" 
philosopy: Basically, when you make a copy, nothing happens, 
appart from the binary copy of the object. You want to pass a 
string by value? that's 8 bytes of data. Period. The *most* 
complex CC I have *ever* seen in D merely increments a 
reference counter. That's it. 90% (99%?) of the time, there 
isn't even a postblit implemented. Because of this, D has 
embraced pass by value.


Not to mention, if you are using classes, those are already 
pointer types. Why pass them by ref?




The conclusion is that the comparison is not fair: D's pass by 
value is not *very* different from C++'s pass by ref (in amount 
of data copied).


If you *do* want a fair-er comparison, then I'd suggest you 
implement a ".dup" on your object, and see how many times THAT 
gets called. Guess what: 0.


I'm not saying D's approach is perfect, just that D's library 
is optimized for D-like types.


Just the same, a lot the stl is not properlly optimized for 
little POD types, as they get passed around by ref, when their 
actual size is the same as the pointer referencing them...


I agree. There are cases where structs make a lot of sense, 
usually when they are very simple simple and contain no pointers 
or references, otherwise structs should be avoided in favor of 
classes to avoid doing copy/move constructors and to avoid 
concerns over performance optimizations. With classes, only 
certain points in your code require that a duplicate copy be made 
of the class instance, the majority of code need only to pass 
around a reference which is the default behavior - easy and fast!


--rt


Re: Mixin template function

2013-02-14 Thread cal
On Thursday, 14 February 2013 at 07:40:58 UTC, Jacob Carlborg 
wrote:

This is by design. Foo and A have different overload sets. Try:

alias Foo.foo foo;

http://dlang.org/template-mixin.html

Search for: "Mixin Scope" and pay attention to:

"Alias declarations can be used to overload together functions 
declared in different mixins".


Ahh this is what I was missing, many thanks.