On 28.08.2012 01:53, Sean Kelly wrote:
On Aug 24, 2012, at 1:16 PM, David d...@dav1d.de wrote:
That's not the problem. The problem has nothing to do with the tessellation,
since the *rendering* is also 1000% slower (when all data is already processed).
Is the alignment different between
David:
The arrays are 100% identical (I dumped a Vertex()-array and a
raw float-array, they were 100% identical).
I hope some people are realizing how much time is being wasted in
this thread. Taking a look at the asm is my suggestion still. If
someone is rusty in asm, it's time to brush
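A minimal sketch of the "dump both arrays and compare" check described above. The field names follow the `Vertex` struct posted in this thread; `asFloats` and `sameBytes` are hypothetical helpers, not code from the original engine.

```d
// Same ten float members as the struct posted in the thread.
struct Vertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

/// Reinterprets a Vertex slice as the raw floats it occupies in memory.
const(float)[] asFloats(const(Vertex)[] vertices)
{
    return cast(const(float)[]) vertices;
}

/// True when both buffers hold exactly the same bytes.
/// Comparing bytes rather than floats avoids NaN != NaN surprises.
bool sameBytes(const(Vertex)[] vertices, const(float)[] raw)
{
    return cast(const(ubyte)[]) vertices == cast(const(ubyte)[]) raw;
}

unittest
{
    auto verts = [Vertex(1, 2, 3, 0, 1, 0, 0.5f, 0.5f, 0.25f, 0.75f)];
    float[] raw = [1, 2, 3, 0, 1, 0, 0.5f, 0.5f, 0.25f, 0.75f];
    assert(asFloats(verts).length == 10);
    assert(sameBytes(verts, raw));
}
```

Compiling with `dmd -unittest -main` runs the check. Identical bytes on the CPU side only rule out a data difference; they say nothing about how the data is uploaded or drawn.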
On 28.08.2012 17:41, bearophile wrote:
David:
The arrays are 100% identical (I dumped a Vertex()-array and a raw
float-array, they were 100% identical).
I hope some people are realizing how much time is being wasted in this
thread. Taking a look at the asm is my suggestion still. If
David:
I generally tend to ignore dmd bugs and just work around them, I
don't have the time to track down every stupid bug from a ~8k
codebase.
I understand you don't care much anymore for the discussed
problem, and I know that localizing D/DMD bugs requires some time
and work.
But I'd like you to not ignore all the bugs you find, and instead
minimize some of them and submit them to Bugzilla. Despite thousands of
open bugs and about a hundred open patches, many bugs do get fixed at
every release. If you submit bugs, D/DMD will improve, in your future
you will find
On 08/28/2012 06:35 PM, David wrote:
On 28.08.2012 17:41, bearophile wrote:
David:
The arrays are 100% identical (I dumped a Vertex()-array and a raw
float-array, they were 100% identical).
I hope some people are realizing how much time is being wasted in this
thread. Taking a look at the
Use this to create a minimal test case with minimal user interaction:
https://github.com/CyberShadow/DustMite
Doesn't help if dmd doesn't crash, or?
On 08/29/2012 01:26 AM, David wrote:
Use this to create a minimal test case with minimal user interaction:
https://github.com/CyberShadow/DustMite
Doesn't help if dmd doesn't crash, or?
It doesn't help a lot if compilation succeeds, but you stated that you
generally tend to ignore dmd bugs.
On Wed, 29 Aug 2012, Timon Gehr wrote:
On 08/29/2012 01:26 AM, David wrote:
Use this to create a minimal test case with minimal user interaction:
https://github.com/CyberShadow/DustMite
Doesn't help if dmd doesn't crash, or?
It doesn't help a lot if compilation succeeds, but you
On Aug 24, 2012, at 1:16 PM, David d...@dav1d.de wrote:
That's not the problem. The problem has nothing to do with the tessellation,
since the *rendering* is also 1000% slower (when all data is already
processed).
Is the alignment different between one and the other? I wouldn't think so
On 24.07.2012 20:38, David wrote:
I am writing a game engine, well I was using a float[] array to store my
vertices, this worked well, but I have to send more and more uv
coordinates (and other information) which needn't be stored as `float`'s
so I moved from a float-Array to a Vertex Array:
Check the disassembly view of this line:
buffer[elements++] = Vertex(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);
If you are using an old version of dmd it will allocate a block of
memory which has the size of Vertex, then it will fill the data into
that block of memory, and then memcpy it to
On 26.07.2012 21:18, David wrote:
Hm. Do you ever do pointer arithmetic on Vertex*? Are the size and
offsets correct (like in Vertex vs float)?
No, yes. I really have no idea why this happens, I saved the contents of
my buffers and compared them with the buffers of the `float[]` version
Ok, interesting thing.
I switched my buffer from Vertex* to void* and I cast every Vertex I get
to void[] and add it to the buffer (slice → memcopy) and everything
works fine now. I can live with that (once the basic functions are
implemented it's not even a pain to use), but still, I wonder
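The workaround described above (view each Vertex as `void[]` and copy it into an untyped buffer) could look roughly like this. `RawBuffer`, `create`, `put` and `release` are hypothetical names chosen for illustration; only `malloc`, `free` and `memcpy` are real library calls, and this is a sketch rather than the poster's actual code.

```d
import core.stdc.stdlib : malloc, free;
import core.stdc.string : memcpy;

struct Vertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

/// Hypothetical untyped vertex buffer backed by malloc.
struct RawBuffer
{
    void* data;      // start of the allocation
    size_t used;     // bytes written so far
    size_t capacity; // bytes allocated

    static RawBuffer create(size_t bytes)
    {
        auto p = malloc(bytes);
        assert(p !is null, "malloc failed");
        return RawBuffer(p, 0, bytes);
    }

    /// Views the vertex as a byte slice and memcpy's it to the end of the buffer.
    void put(Vertex v)
    {
        const(void)[] bytes = (cast(const(void)*) &v)[0 .. Vertex.sizeof];
        assert(used + bytes.length <= capacity, "buffer full");
        memcpy(cast(ubyte*) data + used, bytes.ptr, bytes.length);
        used += bytes.length;
    }

    void release()
    {
        free(data);
        data = null;
        used = capacity = 0;
    }
}

unittest
{
    auto buf = RawBuffer.create(2 * Vertex.sizeof);
    scope (exit) buf.release();
    buf.put(Vertex(1, 2, 3, 0, 1, 0, 0.5f, 0.5f, 0.25f, 0.75f));
    assert(buf.used == Vertex.sizeof);
    assert((cast(float*) buf.data)[2] == 3);
}
```

The bytes that end up in the buffer are the same as with a typed `Vertex*` store, so if this version is fast and the typed one is slow, the difference would have to come from the code the compiler generates for the store, which is what the disassembly comparison suggested elsewhere in the thread would show.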
On 26-Jul-12 14:14, David wrote:
Ok, interesting thing.
I switched my buffer from Vertex* to void* and I cast every Vertex I get
to void[] and add it to the buffer (slice → memcopy) and everything
works fine now. I can live with that (once the basic functions are
implemented it's not even a
Hm. Do you ever do pointer arithmetic on Vertex*? Are the size and
offsets correct (like in Vertex vs float)?
No, yes. I really have no idea why this happens, I saved the contents of
my buffers and compared them with the buffers of the `float[]` version
(thanks to `git checkout`) and they
On 25.07.2012 01:10, Era Scarecrow wrote:
Removing the `align(1)` changes nothing, not 1ms slower or faster,
unfortunately.
[quote]
[code]
Vertex[] data;
foreach(i; 0..6) {
data ~= Vertex(positions[i][0], positions[i][1], positions[i][2],
[/code]
[/quote]
Try using reserve? The new
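A short sketch of the `reserve` suggestion: reserving capacity before the loop lets the `~=` appends reuse one allocation instead of growing the array step by step. `buildQuad` is a hypothetical function, and the zeros passed for the normal and UV fields are placeholders, since the quoted call is cut off in the snippet above.

```d
struct Vertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

/// Builds one Vertex per position, reserving the storage up front.
Vertex[] buildQuad(const float[3][] positions)
{
    Vertex[] data;
    data.reserve(positions.length); // one allocation instead of several regrowths
    foreach (p; positions)
    {
        // Placeholder zeros for the fields not visible in the quoted snippet.
        data ~= Vertex(p[0], p[1], p[2], 0, 0, 0, 0, 0, 0, 0);
    }
    return data;
}

unittest
{
    float[3] a = [0f, 0f, 0f];
    float[3] b = [1f, 1f, 0f];
    float[3][] positions = [a, b, a, b, a, b];
    auto quad = buildQuad(positions);
    assert(quad.length == 6);
    assert(quad.capacity >= 6);
    assert(quad[1].x == 1);
}
```

This only affects how the array is filled on the CPU; it would not explain time spent later in the draw or swap calls.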
I'll try a different compiler, too.
It's the same issue with ldc
Have you checked your default compiler/linker args?
On Wed, 25/07/2012 at 15.23 +0200, David wrote:
I'll try a different compiler, too.
It's the same issue with ldc
On 25.07.2012 15:44, Andrea Fontana wrote:
Have you checked your default compiler/linker args?
On Wed, 25/07/2012 at 15.23 +0200, David wrote:
I'll try a different compiler, too.
It's the same issue with ldc
They didn't change (of course I changed the args which are
Ok here we go:
perf.data: http://dav1d.de/perf.data
and a fancy image (showing the results of perf): http://dav1d.de/output.png
I hope anyone knows where the time is spent.
Most time spent:
+ 53,14% bralad [unknown] [k] 0xc01e5d2b
I had a performance problem with std.xml some months ago. It took me a
long time to find out that there was a default linker param (in gdc dmd
under linux) that slowed down the whole thing.
So maybe it's not a code-related issue, I mean :)
On Wed, 25/07/2012 at 15.53 +0200, David
On 25-Jul-12 17:54, David wrote:
Ok here we go:
perf.data: http://dav1d.de/perf.data
and a fancy image (showing the results of perf): http://dav1d.de/output.png
I hope anyone knows where the time is spent.
Most time spent:
+ 53,14% bralad [unknown] [k] 0xc01e5d2b
Would
On 25.07.2012 16:23, Dmitry Olshansky wrote:
On 25-Jul-12 17:54, David wrote:
Ok here we go:
perf.data: http://dav1d.de/perf.data
and a fancy image (showing the results of perf):
http://dav1d.de/output.png
I hope anyone knows where the time is spent.
Most time spent:
+ 53,14% bralad
On 25-Jul-12 19:32, David wrote:
On 25.07.2012 16:23, Dmitry Olshansky wrote:
On 25-Jul-12 17:54, David wrote:
Ok here we go:
perf.data: http://dav1d.de/perf.data
and a fancy image (showing the results of perf):
http://dav1d.de/output.png
I hope anyone knows where the time is spent.
Most
It looks like a syscall/opengl issue. You somehow managed to hit a dark
corner of the GL driver. It's either a fallback to software (partial) or
some extra translation layer.
I once had a cool table that showed which GL calls are direct to
hardware and which are not for various nvidia cards.
Now
On 26-Jul-12 00:52, David wrote:
It looks like a syscall/opengl issue. You somehow managed to hit a dark
corner of the GL driver. It's either a fallback to software (partial) or
some extra translation layer.
I once had a cool table that showed which GL calls are direct to
hardware and which are not
On 25.07.2012 23:03, Dmitry Olshansky wrote:
On 26-Jul-12 00:52, David wrote:
It looks like a syscall/opengl issue. You somehow managed to hit a dark
corner of the GL driver. It's either a fallback to software (partial) or
some extra translation layer.
I once had a cool table that showed which GL
David:
Well the interesting question is, why is it slower? I checked it
twice, the data passed to the GPU is 100% the same, no
difference, the only difference is the stored format on the CPU
(and that's just a matter of casting).
It's not easy to answer similar general questions. Why don't
On 26.07.2012 00:12, Ali Çehreli wrote:
On 07/24/2012 11:38 AM, David wrote:
Well this change decreases my performance by 1000%.
Random guess: CPU cache misses?
Ali
You're the 2nd one mentioning this, any ideas how to check this?
It's not easy to answer similar general questions. Why don't you list
the assembly of the two versions and compare?
My assembly is pretty rusty and actually, I have no idea what to look for.
On 07/25/2012 03:26 PM, David wrote:
On 26.07.2012 00:12, Ali Çehreli wrote:
On 07/24/2012 11:38 AM, David wrote:
Well this change decreases my performance by 1000%.
Random guess: CPU cache misses?
Ali
You're the 2nd one mentioning this, any ideas how to check this?
I have no
On 26.07.2012 00:37, Ali Çehreli wrote:
On 07/25/2012 03:26 PM, David wrote:
On 26.07.2012 00:12, Ali Çehreli wrote:
On 07/24/2012 11:38 AM, David wrote:
Well this change decreases my performance by 1000%.
Random guess: CPU cache misses?
Ali
You're the 2nd one mentioning this, any
I am writing a game engine, well I was using a float[] array to store my
vertices, this worked well, but I have to send more and more uv
coordinates (and other information) which needn't be stored as `float`'s
so I moved from a float-Array to a Vertex Array:
David:
align(1) struct Vertex {
float x;
float y;
float z;
float nx;
float ny;
float nz;
float u_terrain;
float v_terrain;
float u_biome;
float v_biome;
}
Everything is still a float, so it's easier. Nothing wrong with
that or? Well this change
On 24.07.2012 20:57, bearophile wrote:
David:
Everything is still a float, so it's easier. Nothing wrong with that
or? Well this change decreases my performance by 1000%.
Aligning floats to 1 byte doesn't seem a good idea. Try to remove the
align(1).
Bye,
bearophile
This makes no
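A quick way to check what `align(1)` does to this particular struct is to let the compiler report the layout. The sketch below assumes the struct exactly as posted; with only `float` members there is no padding to remove, so the size and member offsets should be the same with or without the attribute. `isFloatAligned` is a hypothetical helper for testing whether a buffer pointer sits on a 4-byte boundary.

```d
align(1) struct Vertex
{
    float x;
    float y;
    float z;
    float nx;
    float ny;
    float nz;
    float u_terrain;
    float v_terrain;
    float u_biome;
    float v_biome;
}

// Ten tightly packed floats: 40 bytes, each member on a multiple of 4.
static assert(Vertex.sizeof == 10 * float.sizeof);
static assert(Vertex.x.offsetof == 0);
static assert(Vertex.nx.offsetof == 12);
static assert(Vertex.u_terrain.offsetof == 24);
static assert(Vertex.v_biome.offsetof == 36);

/// True when the pointer sits on a float (4-byte) boundary.
bool isFloatAligned(const(void)* p)
{
    return (cast(size_t) p) % float.alignof == 0;
}

unittest
{
    float f = 0;
    assert(isFloatAligned(&f));
    assert(!isFloatAligned(cast(void*) 3));
}
```

If the static asserts pass, the in-struct layout matches a plain run of ten floats, which is consistent with the later report in this thread that removing `align(1)` changed nothing.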
On Tue, Jul 24, 2012 at 08:57:08PM +0200, bearophile wrote:
David:
align(1) struct Vertex {
float x;
float y;
float z;
float nx;
float ny;
float nz;
float u_terrain;
float v_terrain;
float u_biome;
float v_biome;
}
Everything is still a
On Tue, Jul 24, 2012 at 09:08:10PM +0200, David wrote:
On 24.07.2012 20:57, bearophile wrote:
David:
Everything is still a float, so it's easier. Nothing wrong with that
or? Well this change decreases my performance by 1000%.
Aligning floats to 1 byte doesn't seem a good idea. Try to
On 24/07/2012 20:08, David wrote:
On 24.07.2012 20:57, bearophile wrote:
David:
Everything is still a float, so it's easier. Nothing wrong with that
or? Well this change decreases my performance by 1000%.
Aligning floats to 1 byte doesn't seem a good idea. Try to remove the
align(1).
Bye,
Could be that your structs are getting default initialised so you will
be getting a constructor called for every instance of a Vertex.
This will be a lot slower than a float array.
Try void initialising your Vertex arrays.
http://dlang.org/declaration.html
See the bit Void Initializations near
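A small sketch of the void-initialisation idea. In D a default-initialised `float` is NaN, so a freshly declared `Vertex` has all ten members written; `= void` and `std.array.uninitializedArray` skip that write. `makeScratch` is a hypothetical helper name.

```d
import std.array : uninitializedArray;
import std.math : isNaN;

struct Vertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

/// Allocates n vertices without writing the default initialiser into them.
Vertex[] makeScratch(size_t n)
{
    return uninitializedArray!(Vertex[])(n);
}

unittest
{
    Vertex a;          // default-initialised: every float member is NaN
    assert(isNaN(a.x));

    Vertex b = void;   // contents are undefined until assigned
    b.x = 1;
    assert(b.x == 1);

    auto scratch = makeScratch(16);
    assert(scratch.length == 16);
}
```

Skipping initialisation only saves the initial write; every element still has to be filled before the data is handed to the GPU.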
I agree. I don't know how the CPU handles misaligned floats, but from
what I understand, it will do two loads to fetch the two word-aligned
parts of the float, and then assemble it together. This may be what's
causing the slowdown.
T
Removing the `align(1)` changes nothing, not 1ms slower or
Hmm. Could this be a GC-related issue?
Actually this could be. They are stored inside a Vertex* array which is
allocated with `malloc`, maybe the GC scans all of
the created vertex structs? Could this be?
On 24.07.2012 21:46, David wrote:
Hmm. Could this be a GC-related issue?
Actually this could be. They are stored inside a Vertex* array which is
allocated with `malloc`, maybe the GC scans all of
the created vertex structs? Could this be?
import core.memory;
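One way to test the GC hypothesis, sketched under the assumption that the vertex storage comes from C `malloc` as described above. Memory obtained from `malloc` is not scanned by the D garbage collector unless it is registered with `GC.addRange`, and `GC.disable` switches automatic collections off entirely. `allocVertices` and `withoutGC` are hypothetical helpers, not the code that followed the `import` in the original message.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

struct Vertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

/// Allocates room for `count` vertices outside the GC heap.
/// The block is not scanned unless it is passed to GC.addRange.
Vertex* allocVertices(size_t count)
{
    auto p = cast(Vertex*) malloc(count * Vertex.sizeof);
    assert(p !is null, "malloc failed");
    return p;
}

/// Runs `work` with automatic collections switched off, then restores them.
void withoutGC(scope void delegate() work)
{
    GC.disable();
    scope (exit) GC.enable();
    work();
}

unittest
{
    auto p = allocVertices(4);
    scope (exit) free(p);
    p[3].x = 2.5f;
    assert(p[3].x == 2.5f);

    bool ran;
    withoutGC({ ran = true; });
    assert(ran);
}
```

If the frame time is unchanged when the render loop runs inside `withoutGC`, the collector is unlikely to be the cause.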
On Tue, Jul 24, 2012 at 10:53:05PM +0200, David wrote:
On 24.07.2012 21:46, David wrote:
Hmm. Could this be a GC-related issue?
Actually this could be. They are stored inside a Vertex* array which is
allocated with `malloc`, maybe the GC scans all of
the created vertex
On Tue, 24 Jul 2012 22:53:05 +0200, David d...@dav1d.de wrote:
On 24.07.2012 21:46, David wrote:
Hmm. Could this be a GC-related issue?
Actually this could be. They are stored inside a Vertex* array which is
allocated with `malloc`, maybe the GC scans all of
the created
This is strange. You said that you profiled the program and the extra
time spent is not in user code? Where is it spent then?
This is a damn good question. I tried to debug it manually with
writefln's, it showed that glfwSwapBuffers needed the time (which, I
looked it up, is just a wrapper
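Instead of scattering `writefln` calls, the per-call timing can be wrapped in a small helper. This sketch uses `StopWatch` from `std.datetime.stopwatch` (its location in current Phobos; in 2012 it lived in `std.datetime`). `timeIt` is a hypothetical helper, and the delegate in the test is a stand-in workload, not a GL call.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

/// Runs `work` once and returns the wall-clock time it took.
Duration timeIt(scope void delegate() work)
{
    auto sw = StopWatch(AutoStart.yes);
    work();
    return sw.peek();
}

unittest
{
    int sum;
    auto elapsed = timeIt({
        foreach (i; 0 .. 1000)
            sum += i;
    });
    assert(sum == 499_500);
    assert(elapsed >= Duration.zero);
}
```

When timing OpenGL code, keep in mind that drivers typically queue commands and execute them later, so the cost of earlier draw calls can surface in the buffer swap. A long swap therefore does not necessarily mean the swap itself is the slow part.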
On Wednesday, July 25, 2012 00:12:19 David wrote:
This is strange. You said that you profiled the program and the extra
time spent is not in user code? Where is it spent then?
This is a damn good question. I tried to debug it manually with
writefln's, it showed that glfwSwapBuffers needed
On Tuesday, 24 July 2012 at 19:42:34 UTC, David wrote:
I agree. I don't know how the CPU handles misaligned floats,
but from
what I understand, it will do two loads to fetch the two
word-aligned
parts of the float, and then assemble it together. This may be
what's
causing the slowdown.
T