Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread John Colvin via Digitalmars-d-learn

On Tuesday, 23 December 2014 at 07:26:27 UTC, Daniel Kozak wrote:


That's very different to my results.

I see no important difference between ldc and dmd when using 
std.math, but when using core.stdc.math ldc halves its time 
where dmd only manages to get to ~80%


What CPU do you have? On my Intel Core i3 I have a similar 
experience to Iov Gherman, but on my AMD FX4200 I have the same 
results as you. It seems std.math.log is not good for my AMD CPU :)


Intel Core i5-4278U


Re: std.file.readText() extra Line Feed character

2014-12-23 Thread Ali Çehreli via Digitalmars-d-learn

On 12/19/2014 02:22 AM, Colin wrote:

 On Thursday, 18 December 2014 at 22:29:30 UTC, Ali Çehreli wrote:

 happy with Emacs :p

 Does emacs do this aswell? :)

Emacs can and does do everything: :)


http://www.gnu.org/software/emacs/manual/html_node/emacs/Customize-Save.html

Ali



Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Iov Gherman via Digitalmars-d-learn

That's very different to my results.

I see no important difference between ldc and dmd when using 
std.math, but when using core.stdc.math ldc halves its time 
where dmd only manages to get to ~80%


I checked again today and the results are interesting: on my PC I 
don't see any difference between std.math and core.stdc.math with 
ldc. Here are the results with all compilers.


- with std.math:
dmd: 4 secs, 878 ms
ldc: 5 secs, 650 ms
gdc: 9 secs, 161 ms

- with core.stdc.math:
dmd: 5 secs, 991 ms
ldc: 5 secs, 572 ms
gdc: 7 secs, 957 ms


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread John Colvin via Digitalmars-d-learn

On Tuesday, 23 December 2014 at 10:20:04 UTC, Iov Gherman wrote:

That's very different to my results.

I see no important difference between ldc and dmd when using 
std.math, but when using core.stdc.math ldc halves its time 
where dmd only manages to get to ~80%


I checked again today and the results are interesting, on my pc 
I don't see any difference between std.math and core.stdc.math 
with ldc. Here are the results with all compilers.


- with std.math:
dmd: 4 secs, 878 ms
ldc: 5 secs, 650 ms
gdc: 9 secs, 161 ms

- with core.stdc.math:
dmd: 5 secs, 991 ms
ldc: 5 secs, 572 ms
gdc: 7 secs, 957 ms


These multi-threaded benchmarks can be very sensitive to their 
environment, you should try running it with nice -20 and do 
multiple passes to get a vague idea of the variability in the 
result. Also, it's important to minimise the number of other 
running processes.
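
A minimal sketch of what multiple passes might look like, using the 
same Clock.currTime approach as the code in this thread (SIZE and 
the number of passes are only illustrative; priority is controlled by 
launching the resulting binary under nice):

import std.datetime, std.math, std.stdio;

enum SIZE = 1_000_000;
enum PASSES = 5;

void main()
{
    Duration[] times;
    foreach (pass; 0 .. PASSES)
    {
        auto t1 = Clock.currTime();
        auto logs = new double[SIZE];
        foreach (i; 0 .. SIZE)
            logs[i] = log(i + 1.0);
        auto t2 = Clock.currTime();
        times ~= t2 - t1;
    }

    // print every pass so the run-to-run variability is visible
    foreach (pass, t; times)
        writeln("pass ", pass, ": ", t);
}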


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Iov Gherman via Digitalmars-d-learn


These multi-threaded benchmarks can be very sensitive to their 
environment, you should try running it with nice -20 and do 
multiple passes to get a vague idea of the variability in the 
result. Also, it's important to minimise the number of other 
running processes.


I did not use the nice parameter but I always ran them multiple 
times and chose the average time. My system has very few running 
processes, a minimalist ArchLinux with Xfce4, so I don't think the 
running processes are affecting my tests in any way.


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Daniel Kozak via Digitalmars-d-learn

On Tuesday, 23 December 2014 at 10:39:13 UTC, Iov Gherman wrote:


These multi-threaded benchmarks can be very sensitive to their 
environment, you should try running it with nice -20 and do 
multiple passes to get a vague idea of the variability in the 
result. Also, it's important to minimise the number of other 
running processes.


I did not use the nice parameter but I always ran them multiple 
times and choose the average time. My system has very few 
running processes, minimalist ArchLinux with Xfce4 so I don't 
think the running processes are affecting in any way my tests.


And what about the single-threaded version?

Btw., one reason why DMD is faster is that it uses the fyl2x x87 
instruction.


here is a version for the other compilers:

import std.math, std.stdio, std.datetime;

enum SIZE = 100_000_000;

version(GNU)
{
    real mylog(double x) pure nothrow
    {
        real result;
        double y = LN2;
        asm
        {
            "fldl   %2\n"
            "fldl   %1\n"
            "fyl2x"
            : "=t" (result) : "m" (x), "m" (y);
        }
        return result;
    }
}
else
{
    real mylog(double x) pure nothrow
    {
        return yl2x(x, LN2);
    }
}

void main() {

    auto t1 = Clock.currTime();
    auto logs = new double[SIZE];

    foreach (i; 0 .. SIZE)
    {
        logs[i] = mylog(i + 1.0);
    }

    auto t2 = Clock.currTime();

    writeln("time: ", (t2 - t1));
}

But it is faster only on Intel CPUs; on one of my AMD machines it 
is slower than core.stdc.math's log.
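
For reference, these are the flag sets quoted elsewhere in this 
thread, applied to the snippet above (the source file name is just 
a placeholder):

dmd  -O -release -inline -noboundscheck mylog_bench.d
ldc2 -O3 -release -mcpu=native -singleobj -inline -boundscheck=off mylog_bench.d
gdc  -O3 -frelease -march=native -finline -fno-bounds-check mylog_bench.d -o mylog_bench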


Re: ini library in OSX

2014-12-23 Thread Robert burner Schadek via Digitalmars-d-learn

as you probably know, ini files don't support section arrays.
If you know all the items at compile time, you could create structs 
for all of them, but that is probably not what you're looking for.


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Daniel Kozak via Digitalmars-d-learn

On Tuesday, 23 December 2014 at 10:20:04 UTC, Iov Gherman wrote:

That's very different to my results.

I see no important difference between ldc and dmd when using 
std.math, but when using core.stdc.math ldc halves its time 
where dmd only manages to get to ~80%


I checked again today and the results are interesting, on my pc 
I don't see any difference between std.math and core.stdc.math 
with ldc. Here are the results with all compilers.


- with std.math:
dmd: 4 secs, 878 ms
ldc: 5 secs, 650 ms
gdc: 9 secs, 161 ms

- with core.stdc.math:
dmd: 5 secs, 991 ms
ldc: 5 secs, 572 ms
gdc: 7 secs, 957 ms


Btw., I just noticed a small issue with D vs. Java: you start 
measuring in D before the allocation, but in the case of Java 
after the allocation.


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread John Colvin via Digitalmars-d-learn

On Monday, 22 December 2014 at 17:16:49 UTC, Iov Gherman wrote:

On Monday, 22 December 2014 at 17:16:05 UTC, bachmeier wrote:

On Monday, 22 December 2014 at 17:05:19 UTC, Iov Gherman wrote:

Hi Guys,

First of all, thank you all for responding so quickly, it is so 
nice to see D having such an active community.


As I said in my first post, I used no other parameters to dmd 
when compiling because I don't know too much about dmd 
compilation flags. I can't wait to try the flags Daniel 
suggested with dmd (-O -release -inline -noboundscheck) and 
the other two compilers (ldc2 and gdc). Thank you guys for 
your suggestions.


Meanwhile, I created a git repository on github and I put 
there all my code. If you find any errors please let me know. 
Because I am keeping the results in a big array, the programs 
take approximately 8 GB of RAM. If you don't have enough RAM 
feel free to decrease the size of the array. For java code 
you will also need to change 'compile-run.bsh' and use the 
right memory parameters.



Thank you all for helping,
Iov


Link to your repo?


Sorry, forgot about it:
https://github.com/ghermaniov/benchmarks


For POSIX-style threads, a per-thread workload of 200 calls to 
log seems rather small. It would be interesting to see a graph of 
execution time as a function of workgroup size.


Traditionally one would use a workgroup size of (nElements / 
nCores) or similar, in order to get all the cores working but 
also minimise pressure on the scheduler, inter-thread 
communication and so on.
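
A rough sketch of that, reusing the benchmark from this thread and 
std.parallelism's totalCPUs to get one chunk per core (the array 
size is the one used elsewhere in the thread; shrink it if RAM is 
tight):

import std.parallelism, std.math, std.stdio, std.datetime;

void main()
{
    auto logs = new double[1_000_000_000];

    // one work unit per core instead of 200-element slices
    auto workUnitSize = logs.length / totalCPUs;

    auto t1 = Clock.currTime();
    foreach (i, ref elem; taskPool.parallel(logs, workUnitSize))
        elem = log(i + 1.0);
    auto t2 = Clock.currTime();

    writeln("time: ", t2 - t1);
}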


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Iov Gherman via Digitalmars-d-learn

And what about single threaded version?


Just ran the single-threaded examples after I moved the time 
start before the array allocation, thanks for that, good catch. 
Still better results in Java:


- java:
21 secs, 612 ms

- with std.math:
dmd: 23 secs, 994 ms
ldc: 31 secs, 668 ms
gdc: 52 secs, 576 ms

- with core.stdc.math:
dmd: 30 secs, 724 ms
ldc: 30 secs, 988 ms
gdc: time: 25 secs, 970 ms


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Iov Gherman via Digitalmars-d-learn


Btw. I just noticed small issue with D vs. java, you start 
messure in D before allocation, but in case of Java after 
allocation


Here is the Java result for parallel processing after moving the 
start time to the first line in main. Still the best result:


4 secs, 50 ms average


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Iov Gherman via Digitalmars-d-learn

Forgot to mention that I pushed my changes to github.


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread via Digitalmars-d-learn

On Tuesday, 23 December 2014 at 12:26:28 UTC, Iov Gherman wrote:

And what about single threaded version?


Just ran the single thread examples after I moved time start 
before array allocation, thanks for that, good catch. Still 
better results in Java:


- java:
21 secs, 612 ms

- with std.math:
dmd: 23 secs, 994 ms
ldc: 31 secs, 668 ms
gdc: 52 secs, 576 ms

- with core.stdc.math:
dmd: 30 secs, 724 ms
ldc: 30 secs, 988 ms
gdc: time: 25 secs, 970 ms


Note that log is done in software on x86 with different levels of 
precision and with different ability to handle corner cases. It 
is therefore a very bad benchmark tool.
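
One quick way to see both effects is to compare the real-precision 
std.math overload with the double-precision C routine on the same 
inputs; a small sketch (the renamed imports are only there to keep 
both versions visible side by side):

import std.stdio;
import std.math : dlog = log;        // D overload, real precision
import core.stdc.math : clog = log;  // C log, double precision

void main()
{
    // precision: the two routines carry a different number of significant digits
    writefln("%.20g", dlog(10.0));
    writefln("%.20g", clog(10.0));

    // corner cases both still have to handle
    writeln(dlog(0.0), " ", clog(0.0));   // -inf
    writeln(dlog(-1.0), " ", clog(-1.0)); // nan
}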


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Daniel Kozak via Digitalmars-d-learn

On Tuesday, 23 December 2014 at 12:31:47 UTC, Iov Gherman wrote:


Btw. I just noticed small issue with D vs. java, you start 
messure in D before allocation, but in case of Java after 
allocation


Here is the java result for parallel processing after moving 
the start time as the first line in main. Still best result:


4 secs, 50 ms average


Java:

Exec time: 6 secs, 421 ms

LDC (-O3 -release -mcpu=native -singleobj -inline 
-boundscheck=off)


time: 5 secs, 321 ms, 877 μs, and 2 hnsecs

GDC(-O3 -frelease -march=native -finline -fno-bounds-check)

time: 5 secs, 237 ms, 453 μs, and 7 hnsecs

DMD(-O -release -inline -noboundscheck)
time: 5 secs, 107 ms, 931 μs, and 3 hnsecs

So all D compilers beat Java in my case, but I have made some 
changes to the D version:

import std.parallelism, std.math, std.stdio, std.datetime;
import core.memory;

enum XMS = 3UL * 1024 * 1024 * 1024; // 3 GB (UL so the constant doesn't overflow int)

version(GNU)
{
    real mylog(double x) pure nothrow
    {
        double result;
        double y = LN2;
        asm
        {
            "fldl   %2\n"
            "fldl   %1\n"
            "fyl2x\n"
            : "=t" (result) : "m" (x), "m" (y);
        }

        return result;
    }
}
else
{
    real mylog(double x) pure nothrow
    {
        return yl2x(x, LN2);
    }
}

void main() {

    GC.reserve(XMS);
    auto t1 = Clock.currTime();

    auto logs = new double[1_000_000_000];
    foreach(i, ref elem; taskPool.parallel(logs, 200)) {
        elem = mylog(i + 1.0);
    }

    auto t2 = Clock.currTime();
    writeln("time: ", (t2 - t1));
}




Is D's GC.calloc and C's memset played the same role?

2014-12-23 Thread FrankLike via Digitalmars-d-learn

Today I met a question: how to get all process names.

--C++ CODE-
#include "stdafx.h"
#include <windows.h>
#include <stdio.h>   // C standard I/O
#include <tlhelp32.h>

int _tmain(int argc, _TCHAR* argv[])
{
    HANDLE hProcessSnap=CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS,0);

    if(hProcessSnap==INVALID_HANDLE_VALUE)
    {
        _tprintf(_T("CreateToolhelp32Snapshot error!\n"));
        return -1;
    }

    PROCESSENTRY32 pe32;
    pe32.dwSize = sizeof(PROCESSENTRY32);

    BOOL bMore=Process32First(hProcessSnap,&pe32);
    int i=0;

    _tprintf(_T("PID\t thread nums \t name \n"));

    while(bMore)
    {
        bMore=Process32Next(hProcessSnap,&pe32);
        _tprintf(_T("%u\t"),pe32.th32ProcessID);
        _tprintf(_T("%u\t"),pe32.cntThreads);
        _tprintf(_T("%s\n"),pe32.szExeFile);

        i++;
    }

    CloseHandle(hProcessSnap);
    _tprintf(_T("Count:%d\n"),i);

    return 0;
}
D code--
import std.stdio;
import std.string;
import core.sys.windows.windows;
import core.memory;
import win32.tlhelp32;

void main()
{
    HANDLE hProcessSnap=CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS,0);

    if(hProcessSnap is null)
    {
        writeln("CreateToolhelp32Snapshot error!\n");
        return;
    }

    PROCESSENTRY32* pe32 = cast(PROCESSENTRY32*)GC.calloc(PROCESSENTRY32.sizeof);

    pe32.dwSize = PROCESSENTRY32.sizeof;

    bool bMore=cast(bool)Process32First(hProcessSnap,pe32);
    int i=0;

    writeln("PID\t thread nums\t name \n");

    while(bMore)
    {
        bMore=cast(bool)Process32Next(hProcessSnap,pe32);
        string s = cast(string)pe32.szExeFile;
        auto a = s.indexOf('\0');
        if(a >= 0)
            writeln("\t",pe32.th32ProcessID,"\t",pe32.cntThreads,"\t",s[0..a]);

        i++;
    }

    CloseHandle(hProcessSnap);
    writeln(format("count:%d",i));

    return;
}
---end--
you will find the difference:

D:   PROCESSENTRY32* pe32 = cast(PROCESSENTRY32*)GC.calloc(PROCESSENTRY32.sizeof);

C++: PROCESSENTRY32 pe32;

Does GC.calloc mean the same thing as memset?!


Re: Is D's GC.calloc and C's memset played the same role?

2014-12-23 Thread Daniel Kozak via Digitalmars-d-learn
FrankLike via Digitalmars-d-learn wrote on Tue, 23. 12. 2014 at 15:37 +:
 Today I met a question: how to get all process names.
 
 --C++ CODE-
 #include "stdafx.h"
 #include <windows.h>
 #include <stdio.h>   // C standard I/O
 #include <tlhelp32.h>
 
 int _tmain(int argc, _TCHAR* argv[])
 {
     HANDLE hProcessSnap=CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS,0);
 
     if(hProcessSnap==INVALID_HANDLE_VALUE)
     {
         _tprintf(_T("CreateToolhelp32Snapshot error!\n"));
         return -1;
     }
 
     PROCESSENTRY32 pe32;
     pe32.dwSize = sizeof(PROCESSENTRY32);
 
     BOOL bMore=Process32First(hProcessSnap,&pe32);
     int i=0;
 
     _tprintf(_T("PID\t thread nums \t name \n"));
 
     while(bMore)
     {
         bMore=Process32Next(hProcessSnap,&pe32);
         _tprintf(_T("%u\t"),pe32.th32ProcessID);
         _tprintf(_T("%u\t"),pe32.cntThreads);
         _tprintf(_T("%s\n"),pe32.szExeFile);
 
         i++;
     }
 
     CloseHandle(hProcessSnap);
     _tprintf(_T("Count:%d\n"),i);
 
     return 0;
 }
 D code--
 import std.stdio;
 import std.string;
 import core.sys.windows.windows;
 import core.memory;
 import win32.tlhelp32;
 
 void main()
 {
     HANDLE hProcessSnap=CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS,0);
 
     if(hProcessSnap is null)
     {
         writeln("CreateToolhelp32Snapshot error!\n");
         return;
     }
 
     PROCESSENTRY32* pe32 = cast(PROCESSENTRY32*)GC.calloc(PROCESSENTRY32.sizeof);
 
     pe32.dwSize = PROCESSENTRY32.sizeof;
 
     bool bMore=cast(bool)Process32First(hProcessSnap,pe32);
     int i=0;
 
     writeln("PID\t thread nums\t name \n");
 
     while(bMore)
     {
         bMore=cast(bool)Process32Next(hProcessSnap,pe32);
         string s = cast(string)pe32.szExeFile;
         auto a = s.indexOf('\0');
         if(a >= 0)
             writeln("\t",pe32.th32ProcessID,"\t",pe32.cntThreads,"\t",s[0..a]);
 
         i++;
     }
 
     CloseHandle(hProcessSnap);
     writeln(format("count:%d",i));
 
     return;
 }
 ---end--
 you will find the difference:
 
 D:   PROCESSENTRY32* pe32 = cast(PROCESSENTRY32*)GC.calloc(PROCESSENTRY32.sizeof);
 
 C++: PROCESSENTRY32 pe32;
 
 Does GC.calloc mean the same thing as memset?!
calloc means allocate cleared memory: the same as malloc, but with all
bits cleared to zero.
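
A minimal sketch of that equivalence (the struct S is just a
placeholder type):

import core.memory : GC;
import core.stdc.string : memset;

struct S { int a; double b; }

void main()
{
    // GC.calloc: the memory arrives already zeroed
    auto p = cast(S*) GC.calloc(S.sizeof);

    // roughly the same done by hand: allocate, then clear
    auto q = cast(S*) GC.malloc(S.sizeof);
    memset(q, 0, S.sizeof);

    assert(p.a == 0 && q.a == 0);
}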



Re: Is D's GC.calloc and C's memset played the same role?

2014-12-23 Thread ketmar via Digitalmars-d-learn
On Tue, 23 Dec 2014 15:37:12 +
FrankLike via Digitalmars-d-learn digitalmars-d-learn@puremagic.com
wrote:

 you will find the different:
   D: PROCESSENTRY32* pe32 = 
 cast(PROCESSENTRY32*)GC.calloc(PROCESSENTRY32.sizeof);
 
 C++:PROCESSENTRY32 pe32;
 
 GC.calloc means: memset ?!

do you see that shining star there? here it is, right at the end:
`PROCESSENTRY32*`. and do you see that same star in the C sample?

jokes aside, it's dead simple: the C code uses a stack-allocated struct
(`PROCESSENTRY32`, without an indirection) and the D code uses a
heap-allocated struct (`PROCESSENTRY32*`, with an indirection).

hence the C code uses `memset()`, yet the D code uses `GC.calloc()`.
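
in other words, the D version could mirror the C++ one by putting the
struct on the stack, in which case no GC.calloc is needed at all; a
sketch, assuming the same win32.tlhelp32 bindings as the original post:

import core.sys.windows.windows;
import win32.tlhelp32;

void listProcesses()
{
    PROCESSENTRY32 pe32;                 // stack-allocated, like the C++ version
    pe32.dwSize = PROCESSENTRY32.sizeof;

    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    auto more = Process32First(snap, &pe32);   // pass the address, as the C++ code does
    // ... walk the snapshot with Process32Next(snap, &pe32) ...
    CloseHandle(snap);
}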




Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread via Digitalmars-d-learn
I'm getting faster execution on Java than dmd, gdc beats it 
though.


...although, what this topic really provides is a reason for me 
to get more RAM for my next laptop. How much do you people run 
with? I had to scale the Java version down to 300 million to 
avoid dying with 4 GB of memory.


Storing arrays as Variant types.

2014-12-23 Thread Winter M. via Digitalmars-d-learn
I've run into a problem while trying to coerce array values from 
a variant; specifically,


char[] a = aVariant.coerce!(char[]); // This works just fine.

byte[] b = bVariant.coerce!(byte[]); // This causes a static 
assertion to fail.


I'm not really sure why a byte[] would be an unsupported type, 
since memory-wise the reference should take up as much space as 
for the char[] (as I understand it).
Perhaps I'm missing something, but I'm lost as to why this is the 
case.


Thanks.


Re: Storing arrays as Variant types.

2014-12-23 Thread Ali Çehreli via Digitalmars-d-learn

Minimal code for convenience to others:

import std.variant;

void main()
{
Variant aVariant;
Variant bVariant;

char[] a = aVariant.coerce!(char[]);
byte[] b = bVariant.coerce!(byte[]);
}

On 12/23/2014 02:57 PM, Winter M. wrote:

 I've run into a problem while trying to coerce array values from a
 variant; specifically,

 char[] a = aVariant.coerce!(char[]); // This works just fine.

 byte[] b = bVariant.coerce!(byte[]); // This causes a static assertion
 to fail.

 I'm not really sure why a byte[] would be an unsupported type, since
 memory-wise the reference should take up as much space as for the char[]
 (as I understand it).
 Perhaps I'm missing something, but I'm lost as to why this is the case.

The difference is that char[] passes the isSomeString test (as it is a 
string) but byte[] does not:



https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L877
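
The check is easy to reproduce in isolation; a tiny sketch:

import std.traits : isSomeString;

static assert( isSomeString!(char[])); // coerce treats char[] as a string
static assert(!isSomeString!(byte[])); // byte[] is not, so there is no coerce path for it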

Ali



Re: Storing arrays as Variant types.

2014-12-23 Thread ketmar via Digitalmars-d-learn
On Tue, 23 Dec 2014 22:57:07 +
Winter M. via Digitalmars-d-learn digitalmars-d-learn@puremagic.com
wrote:

 I've run into a problem while trying to coerce array values from 
 a variant; specifically,
 
 char[] a = aVariant.coerce!(char[]); // This works just fine.
 
 byte[] b = bVariant.coerce!(byte[]); // This causes a static 
 assertion to fail.
 
 I'm not really sure why a byte[] would be an unsupported type, 
 since memory-wise the reference should take up as much space as 
 for the char[] (as I understand it).
 Perhaps I'm missing something, but I'm lost as to why this is the 
 case.

heh. this is due to how `.coerce!` is written. it doesn't really check
for arrays; what it checks for is:

1. static if (isNumeric!T || isBoolean!T)
2. static if (is(T : Object))
3. static if (isSomeString!(T))

see the gotcha? ;-) both types you requested are not numeric, not
boolean and not objects. but `char[]` satisfies `isSomeString!`, and
`byte[]` doesn't.

i'm not sure that coercing is designed to work this way; it seems that
`isSomeString!` is just a hack for coercing to strings.

i.e. with `char[]` the variant tries to build some textual representation
of its value, and with `byte[]` the variant simply doesn't know what to do.

maybe we should allow coercing to `byte[]` and `ubyte[]` with a
defined meaning: get the raw binary representation of the variant's contents.
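
for what it's worth, if the Variant actually holds a byte[], plain
.get retrieves it without going through coerce at all; a small sketch:

import std.variant;

void main()
{
    byte[] data = [1, 2, 3];
    Variant v = data;

    // exact-type retrieval: no coercion involved, so no static assert fires
    byte[] b = v.get!(byte[]);
    assert(b == data);
}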




Re: Is D's GC.calloc and C's memset played the same role?

2014-12-23 Thread FrankLike via Digitalmars-d-learn
On Tuesday, 23 December 2014 at 20:22:12 UTC, ketmar via 
Digitalmars-d-learn wrote:

On Tue, 23 Dec 2014 15:37:12 +
FrankLike via Digitalmars-d-learn 
digitalmars-d-learn@puremagic.com

wrote:


you will find the different:
  D: PROCESSENTRY32* pe32 = 
cast(PROCESSENTRY32*)GC.calloc(PROCESSENTRY32.sizeof);


C++:PROCESSENTRY32 pe32;

GC.calloc means: memset ?!


do you see that shining star there? here it is, right in the 
end:

`PROCESSENTRY32*`. and do you see that same star in C sample?


Yes, if you don't do it like that, it will not work.


jokes aside, it's dead simple: C code using stack-allocated


Not a joke. It works fine, you can run it.
It's not C, it's C++.

struct
(`PROCESSENTRY32` without an inderection) and D code using
heap-allocated struct (`PROCESSENTRY32*` with indirection).

hence C code using `memset()`, yet D code using `GC.calloc()`.




Re: Is D's GC.calloc and C's memset played the same role?

2014-12-23 Thread ketmar via Digitalmars-d-learn
On Wed, 24 Dec 2014 00:24:44 +
FrankLike via Digitalmars-d-learn digitalmars-d-learn@puremagic.com
wrote:

 On Tuesday, 23 December 2014 at 20:22:12 UTC, ketmar via 
 Digitalmars-d-learn wrote:
  On Tue, 23 Dec 2014 15:37:12 +
  FrankLike via Digitalmars-d-learn 
  digitalmars-d-learn@puremagic.com
  wrote:
 
  you will find the different:
D: PROCESSENTRY32* pe32 = 
  cast(PROCESSENTRY32*)GC.calloc(PROCESSENTRY32.sizeof);
  
  C++:PROCESSENTRY32 pe32;
  
  GC.calloc means: memset ?!
 
  do you see that shining star there? here it is, right in the 
  end:
  `PROCESSENTRY32*`. and do you see that same star in C sample?
 
 Yes,if you not do like it,it  will  not  work.
 
  jokes aside, it's dead simple: C code using stack-allocated
 
 Not joke.it works fine,you can  run  it.
 Not  C,it's  C++.
  struct
  (`PROCESSENTRY32` without an inderection) and D code using
  heap-allocated struct (`PROCESSENTRY32*` with indirection).
 
  hence C code using `memset()`, yet D code using `GC.calloc()`.

you did quote the relevant part. let me repeat it:

the C code uses a stack-allocated struct (`PROCESSENTRY32`, without
an indirection) and the D code uses a heap-allocated struct
(`PROCESSENTRY32*`, with an indirection).

hence the C code uses `memset()`, yet the D code uses `GC.calloc()`.

i.e. the D code uses a *pointer* *to* *struct*, so you must allocate it
manually.




Getting DAllegro 5 to work in Windows

2014-12-23 Thread Joel via Digitalmars-d-learn
I can't get implib.exe (http://ftp.digitalmars.com/bup.zip) to 
produce .lib files from dlls (https://www.allegro.cc/files/). I 
think it works for other people.


Thanks for any help.