Re: toUTFz and WinAPI GetTextExtentPoint32W

2011-09-21 Thread Dmitry Olshansky

On 21.09.2011 4:04, Timon Gehr wrote:

On 09/21/2011 01:57 AM, Christophe wrote:

Jonathan M Davis wrote in message (digitalmars.D.learn:29637):

On Tuesday, September 20, 2011 14:43 Andrej Mitrovic wrote:

On 9/20/11, Jonathan M Davis <jmdavisp...@gmx.com> wrote:

Or std.range.walkLength. I don't really know why we have std.utf.count.
It just calls walkLength anyway. I suspect that it's a function that
predates walkLength and was made to use walkLength after walkLength was
introduced. But it's kind of pointless now.

- Jonathan M Davis


I don't think having better-named aliases is a bad thing. Although now
I'm seeing it's not just an alias but a function.




std.utf.count has one advantage: someone looking for the function will
find it. The programmer might not look in std.range to find a function
about UTF strings, and even if he did, the walkLength documentation does
not say that it works with (narrow) strings the way it does. To know you
can use walkLength, you must know:
- that popFront works differently on strings,
- that hasLength is not true for strings,
- what walkLength is.

So yes, experienced programmers don't need std.utf.count, but newbies
do.

Last point: walkLength is not optimized for strings.
std.utf.count should be.

This short implementation of count was 3 to 8 times faster than
walkLength in a simple benchmark:

size_t myCount(string text)
{
    size_t n = text.length;
    for (uint i=0; i<text.length; ++i)
    {
        auto s = text[i]>>6;
        n -= (s>>1) - ((s+1)>>2);
    }
    return n;
}

(compiled with gdc on 64 bits; the sample text was the introduction of
the French Wikipedia UTF-8 article down to the table of contents -
http://fr.wikipedia.org/wiki/UTF-8 ).

The reason is that the loop can be unrolled by the compiler.


Very good point, you might want to file an enhancement request. It would
make the functionality different enough to prevent count from being
removed: walkLength throws on an invalid UTF sequence.


Actually, I don't buy it. I guess the reason it's faster is that it 
doesn't check if the codepoint is valid. In fact you can easily get 
ridiculous overflowed negative lengths. Maybe we can put it here as an
unsafe but fast version though.
Also check std.utf.stride to see if you can improve on it; it's the
beast behind narrow string popFront.


--
Dmitry Olshansky


Re: Issuing compile-time warnings with line numbers?

2011-09-21 Thread Andrej Mitrovic
On 9/21/11, Jacob Carlborg <d...@me.com> wrote:
 Have a look at: http://d-programming-language.org/templates-revisited.html

Right, but as I've said conv.to works at compile-time so that's
unnecessary. Maybe adding a note there about this would be nice, so
people don't spend time reimplementing common compile-time conversions
that conv.to can already do.


Re: toUTFz and WinAPI GetTextExtentPoint32W

2011-09-21 Thread Timon Gehr

On 09/21/2011 02:15 AM, Christophe wrote:

Timon Gehr wrote in message (digitalmars.D.learn:29641):

Last point: walkLength is not optimized for strings.
std.utf.count should be.

This short implementation of count was 3 to 8 times faster than
walkLength in a simple benchmark:

size_t myCount(string text)
{
    size_t n = text.length;
    for (uint i=0; i<text.length; ++i)
    {
        auto s = text[i]>>6;
        n -= (s>>1) - ((s+1)>>2);
    }
    return n;
}

(compiled with gdc on 64 bits; the sample text was the introduction of
the French Wikipedia UTF-8 article down to the table of contents -
http://fr.wikipedia.org/wiki/UTF-8 ).

The reason is that the loop can be unrolled by the compiler.


Very good point, you might want to file an enhancement request. It would
make the functionality different enough to prevent count from being
removed: walkLength throws on an invalid UTF sequence.


I would be glad to do so, but I am quite new here, so I don't know how
to. A little pointer could help.



http://d.puremagic.com/issues/

You can tick 'Severity: enhancement request'. Probably it would be best 
if it throws if the final result is larger than text.length though.





Re: toUTFz and WinAPI GetTextExtentPoint32W

2011-09-21 Thread Timon Gehr

On 09/21/2011 12:37 PM, Dmitry Olshansky wrote:

On 21.09.2011 4:04, Timon Gehr wrote:

On 09/21/2011 01:57 AM, Christophe wrote:

Jonathan M Davis wrote in message (digitalmars.D.learn:29637):

On Tuesday, September 20, 2011 14:43 Andrej Mitrovic wrote:

On 9/20/11, Jonathan M Davis <jmdavisp...@gmx.com> wrote:

Or std.range.walkLength. I don't really know why we have std.utf.count.
It just calls walkLength anyway. I suspect that it's a function that
predates walkLength and was made to use walkLength after walkLength was
introduced. But it's kind of pointless now.

- Jonathan M Davis


I don't think having better-named aliases is a bad thing. Although now
I'm seeing it's not just an alias but a function.




std.utf.count has one advantage: someone looking for the function will
find it. The programmer might not look in std.range to find a function
about UTF strings, and even if he did, the walkLength documentation does
not say that it works with (narrow) strings the way it does. To know you
can use walkLength, you must know:
- that popFront works differently on strings,
- that hasLength is not true for strings,
- what walkLength is.

So yes, experienced programmers don't need std.utf.count, but newbies
do.

Last point: walkLength is not optimized for strings.
std.utf.count should be.

This short implementation of count was 3 to 8 times faster than
walkLength in a simple benchmark:

size_t myCount(string text)
{
    size_t n = text.length;
    for (uint i=0; i<text.length; ++i)
    {
        auto s = text[i]>>6;
        n -= (s>>1) - ((s+1)>>2);
    }
    return n;
}

(compiled with gdc on 64 bits; the sample text was the introduction of
the French Wikipedia UTF-8 article down to the table of contents -
http://fr.wikipedia.org/wiki/UTF-8 ).

The reason is that the loop can be unrolled by the compiler.


Very good point, you might want to file an enhancement request. It would
make the functionality different enough to prevent count from being
removed: walkLength throws on an invalid UTF sequence.


Actually, I don't buy it. I guess the reason it's faster is that it
doesn't check if the codepoint is valid. In fact you can easily get
ridiculous overflowed negative lengths.


Most of these could be caught by a final check. I think having the 
option of a version that is so much faster would be nice. Chances are 
pretty high that code actually manipulating the string will throw 
eventually if it is invalid.


Maybe we can put it here as an unsafe but fast version though.
Also check std.utf.stride to see if you can improve on it; it's the
beast behind narrow string popFront.





Re: Issuing compile-time warnings with line numbers?

2011-09-21 Thread Jacob Carlborg

On 2011-09-21 13:59, Andrej Mitrovic wrote:

On 9/21/11, Jacob Carlborg <d...@me.com> wrote:

Have a look at: http://d-programming-language.org/templates-revisited.html


Right, but as I've said conv.to works at compile-time so that's
unnecessary. Maybe adding a note there about this would be nice, so
people don't spend time reimplementing common compile-time conversions
that conv.to can already do.


Oh, I missed one of your posts, sorry.

--
/Jacob Carlborg


Re: toUTFz and WinAPI GetTextExtentPoint32W

2011-09-21 Thread Christophe
 Actually, I don't buy it. I guess the reason it's faster is that it 
 doesn't check if the codepoint is valid.

Why should it? The documentation of std.utf.count says the string must
be validly encoded, not that it will enforce that it is.
Checking that a string is valid every time you use it would be very
expensive.

Actually, std.range.walkLength does not check that the sequence is valid.
See this test:

import std.range;
import std.stdio;

void main()
{
  string text = "aléluyah";
  char[] text2 = text.dup;
  text2[3] = 'a';             // overwrites the second byte of 'é'
  writeln(walkLength(text2)); // outputs: 8
  writeln(text2); // outputs: al\303aluyah
}

There is probably a way to check that a UTF sequence is valid with an
unrollable loop.

 In fact you can easily get ridiculous overflowed negative lengths. 
 Maybe we can put it here as unsafe and fast version though.

Unless I am mistaken, the minimum length myCount can return is 0 even 
if the string is invalid.

 Also check std.utf.stride to see if you can get it better, it's the 
 beast behind narrow string popFront.

stride does not do much checking. It can even return 5 or 6, which is
not possible for a valid UTF-8 string!

The equivalent of myCount for stride would be:

size_t myStride(char c)
{
    // optional:
    // if ( (((c>>7)+1)>>1) - (((c>>6)+1)>>2) + (((c>>3)+1)>>5))
    //     throw new UtfException("Not the start of the UTF-8 sequence");
    return 1 + (((c>>6)+1)>>2) + (((c>>5)+1)>>3) + (((c>>4)+1)>>4);
}

I compared that to:

size_t utfLikeStride(char c)
{
  // optional:
  // immutable result = UTF8stride[c];
  // if (result == 0xFF)
  //     throw new UtfException("Not the start of the UTF-8 sequence");
  // return result;
  return UTF8stride[c];
}

One table lookup is replaced by some byte arithmetic in myStride.

I also took only one char as input, since stride only looks at the i-th
character. Actually, if the signature of stride is kept as
uint stride(char[] s, int i), I did not find any change with -O3.

Average times for a lot of calls:
(compiled with gcc, tested with -O3 and a homogeneous distribution of
valid characters from '\x00'..'\x7F' and '\xC2'..'\xF4')

myStride no throws:      1112ms.
utfLikeStride no throws: 1433ms.
utfLikeStride throws:    1868ms. (the current implementation).
myStride throws:         8269ms.

Removing throws from utfLikeStride makes it about 25% faster.
Removing throws from myStride makes it about 7 times faster.

With -O0, myStride is less than 10% slower than utfLikeStride (no throws).

In conclusion, the fastest implementation is myStride without throws,
and it beats the current implementation by about 40%. Changing
std.utf.stride may be desirable. As I said earlier, the throws do
not enforce the validity of the string. Really checking the validity of
the string would cost much more, which may not be desirable, so why
bother checking at all? A more serious benchmark could justify changing
std.utf.stride. The improvement could be even better in real situations,
because the lookup table of utfLikeStride may not always be at hand -
this really depends on what the compiler does.

In any case, this may not improve walkLength by more than a few
percent.

-- 
Christophe

now I'll go back to my real work...
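
The shift arithmetic in myStride can be checked against a plain comparison chain. The following is only a sketch: arithmeticStride repeats the expression from the post, while referenceStride and stridesAgree are hypothetical helpers introduced here for the comparison and are not part of std.utf. It covers only the lead-byte ranges used in the benchmark.

```d
// Branch-free stride for a UTF-8 lead byte, using the arithmetic from the post.
size_t arithmeticStride(char c)
{
    // Each term is 1 only when the top 2, 3 or 4 bits of c are all set.
    return 1 + (((c >> 6) + 1) >> 2)
             + (((c >> 5) + 1) >> 3)
             + (((c >> 4) + 1) >> 4);
}

// Hypothetical reference, meaningful only for valid lead bytes.
size_t referenceStride(char c)
{
    if (c < 0x80) return 1;   // 0xxxxxxx
    if (c < 0xE0) return 2;   // 110xxxxx
    if (c < 0xF0) return 3;   // 1110xxxx
    return 4;                 // 11110xxx
}

// True when both functions agree on every lead byte in
// 0x00..0x7F and 0xC2..0xF4 (the ranges used in the benchmark).
bool stridesAgree()
{
    foreach (uint b; 0 .. 0x100)
    {
        immutable bool validLead = b <= 0x7F || (b >= 0xC2 && b <= 0xF4);
        if (validLead
            && arithmeticStride(cast(char) b) != referenceStride(cast(char) b))
            return false;
    }
    return true;
}
```

Neither function rejects continuation bytes or bytes above 0xF4, so this says nothing about validation, only about the stride values for valid input.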


Heap function calls

2011-09-21 Thread deadalnix
D has a wonderful feature named delegates. A delegate can access local
data, which would be dangerous if that data were on the stack. From what
I understand, when a delegate can access the local data of a function,
that data is placed on the heap instead of the stack, resulting in a
slower function call but safe delegate behaviour.


I'm wondering what's going on under the hood when such a function is
called. Are the parameters passed to the function on the stack and then
copied to the heap? In that case, the data is copied twice.
Will a postblit constructor be called twice? Or is the function
tagged as a « heap function » so that only a pointer is passed in the
function call?


Secondly, how are things like scope(exit) handled in such a case?
When the context is collected by the GC? When the function ends its
execution? The try {} finally {} analogy suggests the second one, but
this is definitely not an exit of the scope, the scope still being
accessible through the delegate.


Those are examples, but more generally, my question isn't about those
examples. It is about what is really going on. Let's say, what would be
the C translation of such a function call, or something similar.


Thanks in advance,

deadalnix


Re: toUTFz and WinAPI GetTextExtentPoint32W

2011-09-21 Thread zeljkog

On 21.09.2011 01:57, Christophe wrote:


size_t myCount(string text)
{
    size_t n = text.length;
    for (uint i=0; i<text.length; ++i)
    {
        auto s = text[i]>>6;
        n -= (s>>1) - ((s+1)>>2);
    }
    return n;
}



Here is a more readable version that is also a bit faster with dmd on Windows:

size_t utfCount(string text)
{
    size_t n = 0;
    for (uint i=0; i<text.length; ++i)
        n += ((text[i]>>6)^0b10)? 1: 0;
    return n;
}
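
Both versions above rely on the same idea: in valid UTF-8, every code point has exactly one byte that is not a continuation byte (10xxxxxx). A minimal sketch of that idea, assuming valid input; countNonContinuation is a hypothetical name used only for this illustration, and std.range.walkLength is used only as a cross-check:

```d
import std.range : walkLength;

// Counts code points by counting every byte that is not a UTF-8
// continuation byte (10xxxxxx). Assumes `text` is valid UTF-8.
size_t countNonContinuation(string text)
{
    size_t n = 0;
    foreach (char c; text)                   // iterates code units, no decoding
        n += ((c & 0xC0) != 0x80) ? 1 : 0;   // skip 10xxxxxx bytes
    return n;
}
```

For a valid string such as "aléluyah" this should agree with walkLength, which decodes each code point as it walks the string.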


Re: toUTFz and WinAPI GetTextExtentPoint32W

2011-09-21 Thread Dmitry Olshansky

On 21.09.2011 18:47, Christophe wrote:

Actually, I don't buy it. I guess the reason it's faster is that it
doesn't check if the codepoint is valid.


Why should it? The documentation of std.utf.count says the string must
be validly encoded, not that it will enforce that it is.
Checking that a string is valid every time you use it would be very
expensive.

Actually, std.range.walkLength does not check that the sequence is valid.
See this test:

import std.range;
import std.stdio;

void main()
{
   string text = "aléluyah";
   char[] text2 = text.dup;
   text2[3] = 'a';             // overwrites the second byte of 'é'
   writeln(walkLength(text2)); // outputs: 8
   writeln(text2); // outputs: al\303aluyah
}


Ouch, the checking is apparently very loose.



There is probably a way to check that a UTF sequence is valid with an
unrollable loop.


In fact you can easily get ridiculous overflowed negative lengths.
Maybe we can put it here as unsafe and fast version though.


Unless I am mistaken, the minimum length myCount can return is 0 even
if the string is invalid.


Yeah, a brain malfunction on my part.




Also check std.utf.stride to see if you can get it better, it's the
beast behind narrow string popFront.


stride does not do much checking. It can even return 5 or 6, which is
not possible for a valid UTF-8 string!

The equivalent of myCount for stride would be:

size_t myStride(char c)
{
    // optional:
    // if ( (((c>>7)+1)>>1) - (((c>>6)+1)>>2) + (((c>>3)+1)>>5))
    //     throw new UtfException("Not the start of the UTF-8 sequence");
    return 1 + (((c>>6)+1)>>2) + (((c>>5)+1)>>3) + (((c>>4)+1)>>4);
}

I compared that to:

size_t utfLikeStride(char c)
{
   // optional:
   // immutable result = UTF8stride[c];
   // if (result == 0xFF)
   //     throw new UtfException("Not the start of the UTF-8 sequence");
   // return result;
   return UTF8stride[c];
}

One table lookup is replaced by some byte arithmetic in myStride.

I also took only one char as input, since stride only looks at the i-th
character. Actually, if the signature of stride is kept as
uint stride(char[] s, int i), I did not find any change with -O3.

Average times for a lot of calls:
(compiled with gcc, tested with -O3 and a homogeneous distribution of
valid characters from '\x00'..'\x7F' and '\xC2'..'\xF4')

myStride no throws:      1112ms.
utfLikeStride no throws: 1433ms.
utfLikeStride throws:    1868ms. (the current implementation).
myStride throws:         8269ms.

I wonder what impact, if any, changing 0xFF to 0x00 in the
implementation of utfLikeStride would have. It should amount to cmp vs.
test; not sure if it matters much.



Removing throws from utfLikeStride makes it about 25% faster.
Removing throws from myStride makes it about 7 times faster.

With -O0, myStride is less than 10% slower than utfLikeStride (no throws).

In conclusion, the fastest implementation is myStride without throws,
and it beats the current implementation by about 40%. Changing
std.utf.stride may be desirable. As I said earlier, the throws do
not enforce the validity of the string. Really checking the validity of
the string would cost much more, which may not be desirable, so why
bother checking at all?


The truth is I'd checked this in the past (though I used some bsr black
magic), and if I kept the check in place the end result was always slower
than the current one. But since the check is not very accurate anyway,
maybe it can be replaced. It's problematic if some code happens to depend
on it (given the doc, it should not).


 A more serious benchmark could justify changing
 std.utf.stride. The improvement could be even better in real situations,
 because the lookup table of utfLikeStride may not always be at hand -
 this really depends on what the compiler does.

Yes and no. I think it would be hard to find an app that bottlenecks on
traversing UTF; on decoding, maybe. Generally, if you make a lot of calls
to stride, it's in cache; if not, it doesn't matter much(?). Though I'd
prefer the non-tabulated version.



In any case, this may not improve walkLength by more than a few
percent.



Then specializing walkLength to do your unrollable version seems like a
good idea.


--
Dmitry Olshansky


Re: toUTFz and WinAPI GetTextExtentPoint32W

2011-09-21 Thread zeljkog

On 21.09.2011 19:12, Christophe Travert wrote:

Nice. It is better with gdc on 64-bit Linux too. I wanted to avoid
conditional expressions like ?: but it's actually slightly faster that
way.


It is not compiled as a conditional jump.


Re: Heap function calls

2011-09-21 Thread Simen Kjaeraas

On Wed, 21 Sep 2011 18:32:49 +0200, deadalnix <deadal...@gmail.com> wrote:

D has a wonderful feature named delegates. A delegate can access local
data, which would be dangerous if that data were on the stack. From what
I understand, when a delegate can access the local data of a function,
that data is placed on the heap instead of the stack, resulting in a
slower function call but safe delegate behaviour.


I'm wondering what's going on under the hood when such a function is
called. Are the parameters passed to the function on the stack and then
copied to the heap? In that case, the data is copied twice.
Will a postblit constructor be called twice? Or is the function
tagged as a « heap function » so that only a pointer is passed in the
function call?


It's the latter. A delegate is simply a function pointer/context pointer
pair, and the exact same thing is used for pointers to member functions
as for lexical closures.


Secondly, how are things like scope(exit) handled in such a case?
When the context is collected by the GC? When the function ends its
execution? The try {} finally {} analogy suggests the second one, but
this is definitely not an exit of the scope, the scope still being
accessible through the delegate.


scope(exit) foo();
// stuff

is simply rewritten as

try {
    // stuff
}
finally {
    foo();
}

Hence, again the latter is the case. In this case:

string delegate() foo() {
    string s = "initialized";
    scope( exit ) s = "destroyed";
    auto ret = (){return s;};
    return ret;
}

void bar() {
    assert(foo()() == "destroyed");
}

The assert passes.


Those are examples, but more generally, my question isn't about those
examples. It is about what is really going on. Let's say, what would be
the C translation of such a function call, or something similar.


void foo() {
    int x = 5;
    auto dg = () {x = 4;};
    dg();
}

is roughly equivalent to:

#include <stdlib.h>

struct foo_dg_1_context {
    int x;
};

struct foo_dg_1_delegate {
    void (*funcptr)(struct foo_dg_1_context*);
    void* ptr;
};

void foo_dg_1(struct foo_dg_1_context* ctx) {
    ctx->x = 4;
}

void foo(void) {
    struct foo_dg_1_delegate dg;
    /* never freed here; in D the GC owns the context */
    struct foo_dg_1_context* ctx =
        (struct foo_dg_1_context*)malloc(sizeof(struct foo_dg_1_context));

    dg.funcptr = foo_dg_1;
    dg.ptr = ctx;
    ctx->x = 5;
    dg.funcptr(dg.ptr);
}

--
  Simen


Re: const-immutable array argument?

2011-09-21 Thread bearophile
Daniel Murphy:

 It's a bug, the compiler shouldn't be inserting a cast when the value 
 implicitly converts.  It's also a bug when the compiler tries to optimise 
 away the variable to a literal when passing by reference, I've got a patch 
 for this I haven't written up yet. 

http://d.puremagic.com/issues/show_bug.cgi?id=6708

Thank you for all the answers.

Bye,
bearophile


Re: Heap function calls

2011-09-21 Thread deadalnix

Great answer! Thank you very much, it answered almost everything!

But what about, in the example you gave me (which is great, by the way),
if foo has parameters? Are those parameters passed on the stack by copy
to the function and then copied to the heap (resulting in two copies)?


On 21/09/2011 19:56, Simen Kjaeraas wrote:

void foo() {
    int x = 5;
    auto dg = () {x = 4;};
    dg();
}

is roughly equivalent to:

#include <stdlib.h>

struct foo_dg_1_context {
    int x;
};

struct foo_dg_1_delegate {
    void (*funcptr)(struct foo_dg_1_context*);
    void* ptr;
};

void foo_dg_1(struct foo_dg_1_context* ctx) {
    ctx->x = 4;
}

void foo(void) {
    struct foo_dg_1_delegate dg;
    /* never freed here; in D the GC owns the context */
    struct foo_dg_1_context* ctx =
        (struct foo_dg_1_context*)malloc(sizeof(struct foo_dg_1_context));
    dg.funcptr = foo_dg_1;
    dg.ptr = ctx;
    ctx->x = 5;
    dg.funcptr(dg.ptr);
}
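
The question about parameters can at least be observed from the outside. The sketch below uses a hypothetical function makeCounter, which is not from the thread. It only shows that a by-value parameter captured by a delegate stays alive and private to that delegate after the function returns. It does not show how many copies the compiler makes or whether a postblit runs more than once.

```d
// Hypothetical example: `start` is a by-value parameter captured by the
// returned delegate, so it has to outlive the call to makeCounter.
int delegate() makeCounter(int start)
{
    // The delegate increments and returns the captured parameter.
    return delegate int() { return ++start; };
}
```

Each call to makeCounter yields a delegate with its own copy of start, so two counters created separately do not affect each other.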





Conditional Compilation with Version

2011-09-21 Thread alex
Hi Y'all!! Just as a note, I am new to the newsgroup, but slightly less
new to D =)


Back on topic:

I am unable to get multiple version specifications to work (from the 
website)


something like:

version (foo) {
   version = bar;
   version = baz;
}
version (bar) {
   ... codes 'n' stuff
}
version (baz) {
   ... more codez
}

Every time, I get an error which says the version statement wants
(statement), not '=', unlike what the language article on the website
describes.


Is this simply a deprecated feature, or am I doing something wrong?

pssst... I use DMD 2.055 on Linux x86 (Ubuntu, don't be a hater)
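
One possible explanation, which the post itself does not confirm: D accepts `version = identifier;` only at module scope, so the same lines written inside a function body are parsed as a version statement and rejected. A minimal sketch under that assumption, reusing the identifiers foo, bar and baz from the post; the constants hasBar and hasBaz and the function bothEnabled are hypothetical and exist only to make the result visible:

```d
// `foo` would normally come from the command line (dmd -version=foo);
// it is set in the source here only to keep the sketch self-contained.
version = foo;

version (foo)
{
    // Allowed: these assignments are still at module scope.
    version = bar;
    version = baz;
}

version (bar) enum hasBar = true;
else          enum hasBar = false;

version (baz) enum hasBaz = true;
else          enum hasBaz = false;

bool bothEnabled()
{
    // A `version = bar;` line placed here, inside a function body,
    // would not compile.
    return hasBar && hasBaz;
}
```

If the original code already had the assignments at module scope, the cause lies elsewhere and this sketch does not apply.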


Re: How to read output of a script

2011-09-21 Thread alex

On 09/20/2011 10:51 PM, Jonathan M Davis wrote:

On Wednesday, September 21, 2011 04:40:34 Cheng Wei wrote:

Thanks a lot.
Weird. It is not in the library reference at
http://www.d-programming-language.org/, but it is in the library
reference at digitalmars.com. I thought the former was the official web
site now. It seems it still is not synced well.


That's easily answered. It looks like the documentation is on the Windows
version of the function instead of in a version block specifically for the
documentation. Obviously, that needs to be fixed. I know that when Walter
generates the docs for the digitalmars site and the zip file, he uses Windows,
and my guess is that Andrei uses Linux for d-programming-language.org.
Regardless, d-programming-language.org is the official site now.

- Jonathan M Davis

One for linux =)


I can't build dsfml2 or derelict.sfml with dsss

2011-09-21 Thread Cuauhtémoc Ledesma
First of all, sorry for my English.

I've tried to build any binding of SFML on a 32-bit machine with Arch Linux.
My problem with dsfml2 is similar to this:
http://www.digitalmars.com/d/archives/digitalmars/D/learn/Buliding_DSFML2_64-bit_Linux_25694.html.
After installing mingw32-pthreads (I don't know if it is the correct
library), the problem persists and I don't know which library to link. But
this only happens when I try to compile an individual file with dmd, not
with dsss build.

After trying to build derelict (using the command dsss net install
derelict) to get derelict.sfml, I figured that the problem may be dsss,
because every time I invoke it the output is as if I hadn't written anything
after the dsss command, which is not the case. I know that there is a
derelict2 package in yaourt, but I get some errors when I try to install it.

This is insane; I can't get any binding of this particular library because
I can't even get the tools to work properly (I also tried on a Mac, but I
got some different errors). So if anyone can help me solve any of these
problems, I will be extremely grateful.

Thanks.