Re: compiling ispc: curses vs. ncurses

2016-07-01 Thread Dmitry Babokin
Why not install ncurses using the package manager? Make sure you've also installed ncurses-devel. Dmitry. On Fri, Jul 1, 2016 at 1:46 PM, dkoerner wrote: > Hi, > I am trying to compile ispc 1.9 using llvm 3.8.0. It seems ispc requires > the curses library which I don't have on my system. I only c

Re: ispc under windows x64: invalid bitcode signature

2016-07-01 Thread Dmitry Babokin
I think building LLVM for the x86 target should solve the problem. The resulting ispc will be able to build for both x86 and x86-64. I guess the error occurs when ispc is trying to link x86 libraries while the LLVM TargetMachine defaults to x86-64, hence the error. Though I'm not completely sure. Let me know i

Re: compiling ispc on ubuntu

2016-07-08 Thread Dmitry Babokin
Hi, Alloy.py build should work, but not sure about other options. Just to make sure - have you checked that you correctly added clang built by alloy.py to your path? I.e, "which clang" points to newly built clang. Dmitry. On Fri, Jul 8, 2016 at 7:29 AM, Steve Heistand wrote: > Hi folks, > Im r

Re: compiling ispc on ubuntu

2016-07-08 Thread Dmitry Babokin
nd the latest ispc. will see how that goes. > > s > > > On 07/08/2016 09:49 AM, Dmitry Babokin wrote: > > Hi, > > > > Alloy.py build should work, but not sure about other options. Just to > make sure - have you checked that you correctly added clang built by >

Re: compiling ispc on ubuntu

2016-07-08 Thread Dmitry Babokin
> Your LLVM_HOME:/usr/local/src/ispc > Your ISPC_HOME:/usr/local/src/ispc > Warning: you don't have ISPC in your ISPC_HOME > Warning: You have no SDE_HOME > > > On 07/08/2016 10:00 AM, Dmitry Babokin wrote: > > That's weird. Can you run check_env.py and copy its outpu

ISPC 1.9.1 is released

2016-07-08 Thread Dmitry Babokin
Download page: http://ispc.github.io/downloads.html === v1.9.1 === (8 July 2016) An ISPC update with new native AVX512 target for future Xeon CPUs and improvements for debugging, including new switch --dwarf-version to support debugging on old systems. The release is based on patched LLVM 3.8.

Re: compiling ispc on ubuntu

2016-07-09 Thread Dmitry Babokin
t to update what make in the ispc > directory returns when > > llvm_home is defined not appropriately. > > > thanks > > > s > > > <https://github.com/ispc/ispc.git> > > On 07/08/2016 10:38 AM, Dmitry Babokin wrote: > > Everything looks perfectly correct.

Re: compiling ispc on ubuntu

2016-09-04 Thread Dmitry Babokin
39 PM, Aristid Breitkreuz wrote: > Do I understand it correctly that it's the GCC 5 ABI Tag issue? > > If so, using Clang 3.9 should also work, right? > > Also, is the SKX patch for LLVM 3.8 still applicable to 3.9? > > > Cheers, > > Aristid > > > On Saturday,

Re: Surprising code being generated by ARM NEON backend

2016-09-06 Thread Dmitry Babokin
Niall, Thanks for sharing your story, it's really rewarding to hear that our tool works so well for you! You've mentioned that ISPC generated code is 5-10% faster than hand-written intrinsics. Were you talking about ARM only or x86 as well? Also, I'm curious, what typical speed up are you observ

Re: Surprising code being generated by ARM NEON backend

2016-09-08 Thread Dmitry Babokin
You should have really parallelization friendly code to get close to theoretical scaling on all vector units. For parallelization approaches, intrinsics are obviously not good enough, as they are not suggesting performance portability and I think there's quite broad consensus about it in the indus

Re: Surprising code being generated by ARM NEON backend

2016-09-12 Thread Dmitry Babokin
"same" operations, I think it's not a hard problem and it should not be restricted by the language. It should be up to compiler to decide what level of control flow divergence the hardware can handle. On Fri, Sep 9, 2016 at 1:11 PM, Niall Douglas wrote: > On Thursday, September 8,

Re: getting an unpleasant lack of symmetry for a cross product with avx2 target only.

2016-09-12 Thread Dmitry Babokin
If I understand the problem correctly, on AVX2 ISPC has generated FMA (a*b+c) instructions, which led to the problem that you have. So the code is still numerically correct, but you don't have *exactly* the same result for cross(v0,v1) and -cross(v1,v0). The "problem" comes from the fact that in FMA
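The asymmetry described in this message can be reproduced outside ISPC. What follows is a minimal sketch, not the code ISPC actually emits: it assumes one plausible contraction pattern (the first product fused, the second rounded on its own), emulates FMA with exact rationals, and uses Python doubles rather than ISPC floats. The helper names (`fma`, `cross_plain`, `cross_fma`, `neg`) and the input vectors are illustrative only.

```python
from fractions import Fraction

def fma(a, b, c):
    """Emulate a fused multiply-add: a*b + c computed exactly, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def cross_plain(u, v):
    # Every product is rounded before the subtraction.
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def cross_fma(u, v):
    # Assumed contraction: second product rounded, first product fused.
    return (fma(u[1], v[2], -(u[2] * v[1])),
            fma(u[2], v[0], -(u[0] * v[2])),
            fma(u[0], v[1], -(u[1] * v[0])))

def neg(w):
    return tuple(-x for x in w)

# Parallel vectors: the exact cross product is zero.
v0 = (1.0, 0.1, 0.3)
v1 = (2.0, 0.2, 0.6)

plain_ab, plain_ba = cross_plain(v0, v1), neg(cross_plain(v1, v0))
fused_ab, fused_ba = cross_fma(v0, v1), neg(cross_fma(v1, v0))

print("plain:", plain_ab, plain_ba)
print("fused:", fused_ab, fused_ba)
```

With separate rounding the two products cancel exactly. With the fused form, the rounding error of one product survives, and which product that is depends on the argument order, so the two results differ by a tiny amount with opposite sign.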

Re: getting an unpleasant lack of symmetry for a cross product with avx2 target only.

2016-09-25 Thread Dmitry Babokin
ion in >> regards to --addressing=64: >> >> Is there a way (like an intrinsic) to tell a specific read to use 64 bit >> calculation while allowing others within the kernel to remain 32 bit. >> >> Thank you. >> >> Morten. >> >> >> >> >

Re: how to serialize in ispc into a uniform list execution

2016-09-25 Thread Dmitry Babokin
I guess you need "foreach_unique": http://ispc.github.io/ispc.html#iteration-over-unique-elements-foreach-unique On Sun, Sep 25, 2016 at 5:54 PM, Morten Mikkelsen wrote: > So I have a scenario where I am tracing through a tree structure and the > odds of hitting the same node is very significant
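As a rough mental model of what "iteration over unique elements" means, here is a scalar emulation in Python. It is only a sketch: the function name mirrors the ISPC construct, but the signature, the list-of-lanes representation and the first-appearance iteration order are assumptions made for illustration.

```python
def foreach_unique(lane_values, active=None):
    """Yield (value, lanes) once per distinct value held by the active lanes."""
    if active is None:
        active = [True] * len(lane_values)
    groups = {}  # dicts keep insertion order, so values come out by first appearance
    for lane, (value, on) in enumerate(zip(lane_values, active)):
        if on:
            groups.setdefault(value, []).append(lane)
    yield from groups.items()

# Eight lanes that mostly hit the same tree nodes.
node_ids = [7, 3, 7, 7, 3, 9, 7, 3]
visits = list(foreach_unique(node_ids))
for node, lanes in visits:
    # The body runs once per node, with only the listed lanes active.
    print(f"node {node}: lanes {lanes}")
```

Each distinct node is processed once with the node id available as a single (uniform-like) value, which is the serialization the question asks about.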

Re: question about avx512

2016-10-21 Thread Dmitry Babokin
You basically ask the world to be simpler. I wish it would be simpler, but it's not :) AVX512 is umbrella name for the set of ISA extensions, which work with 512 bit registers. https://en.wikipedia.org/wiki/AVX-512 explains it in more details. KNL (Xeon Phi x200, code name Knights Landing), the l

Re: ISPC Profiling program

2017-01-05 Thread Dmitry Babokin
Any profiling tool on your platform should work with ispc binaries. For example, Intel VTune. Dmitry. On Thu, Jan 5, 2017 at 12:07 AM, Dženana Kapetanović < dzenanakapetanovi...@gmail.com> wrote: > Can somebody tell me any profiling tool for ISPC programs? > > -- > You received this message bec

Re: Using ISPC in production

2017-05-04 Thread Dmitry Babokin
Ali, We have customers who use ISPC in production. Though we do have stability issues, but we prefer to release without known regressions. If you have something which blocks you, let us know and we'll give it higher priority. Dmitry. On Thu, May 4, 2017 at 7:01 AM, Ali Nakipoğlu wrote: > Hi,

Re: Using ISPC in production

2017-05-05 Thread Dmitry Babokin
to address. On Fri, May 5, 2017 at 9:44 AM, Ali Nakipoğlu wrote: > Hi Dmitry, > > Thank you very much for your kind response. > > I will start experimenting. Is there any release schedule? > > Ali > > On Thursday, May 4, 2017 at 11:55:38 PM UTC+1, Dmitry Babokin wrote: >

ispc 1.9.2rc1

2017-05-17 Thread Dmitry Babokin
Hello, We are going to release ispc 1.9.2 soon and have prepared a release candidate for those of you who prefer to use pre-built binaries. Please give it a try and let us know if you see any problems with your code. Windows (VS2015): https://drive.google.com/open?id=0Bxh4sVF04yhxYnRRczhlNzZOblU

Re: ispc 1.9.2rc1

2017-05-17 Thread Dmitry Babokin
personal release? I would expect the > binaries to be officially hosted on the Website as pre-release over being > emailed out and definitely not stored on a google drive. > > Odd to anyone else? > > -bret > > On Wed, May 17, 2017 at 12:33 AM, Dmitry Babokin > wrote: >

Re: fast hardware for ispc realtime simultation

2017-05-18 Thread Dmitry Babokin
Hi, ISPC generates single-threaded code unless you take care of multi-threading explicitly. Look at this section for more details: http://ispc.github.io/ispc.html#task-parallel-execution Dmitry. On Thu, May 18, 2017 at 2:40 AM, nabi wrote: > We are trying to build a custom machine to run

Re: Coalescing double loads?

2017-05-23 Thread Dmitry Babokin
Philip, Note that i is a varying int, not uniform. I.e. on the first iteration it has the value (0,1,2,3), if you are compiling for avx1-i32x4 (or another 4-wide target). Hence, 4*i has the value (0,4,8,12), which means array[4*i] is not a contiguous load; it has to be a gather. So I assume your data layout is x,
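The index arithmetic in this reply can be checked with a few lines of Python. This is a sketch of the reasoning only; `lane_indices` and `is_contiguous` are illustrative helper names, not ISPC functions, and a 4-wide gang is assumed as in the message.

```python
def lane_indices(program_index, stride):
    """Per-lane array index for array[stride * i] with a varying i."""
    return [stride * i for i in program_index]

def is_contiguous(indices):
    """True when consecutive lanes touch consecutive elements (a plain vector load)."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))

program_index = [0, 1, 2, 3]                 # first iteration on a 4-wide target
unit = lane_indices(program_index, 1)        # array[i]
strided = lane_indices(program_index, 4)     # array[4*i]

print(unit, "contiguous" if is_contiguous(unit) else "needs a gather")
print(strided, "contiguous" if is_contiguous(strided) else "needs a gather")
```

Only the unit-stride access maps to a single vector load; the stride-4 access touches elements that are four slots apart, so each lane has to be fetched separately.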

Re: question about avx512

2017-05-30 Thread Dmitry Babokin
me cases (in other cases, it has a negligible effect). > > I don't know if Clang or ISPC support fat binaries, but that's a better > option than the union ISA if available, although it obviously has an impact > on code size and introduces runtime dispatch overhead. > > Je

Re: question about avx512

2017-05-30 Thread Dmitry Babokin
X-512 are likely to do so in an implementation/uarch-specific. > > Jeff > > On Tue, May 30, 2017 at 4:20 PM, Dmitry Babokin wrote: > >> Jeff, >> >> This is definitely an option. But from practical point of view, I don't >> think it has enough ROI - mostly

Re: ispc 1.9.2rc1

2017-06-05 Thread Dmitry Babokin
cond(s) > > > Thanks for your effort in improving the ISPC compiler! > > > > On Wednesday, May 17, 2017 at 9:33:58 AM UTC+2, Dmitry Babokin wrote: >> >> Hello, >> >> We are going to release ispc 1.9.2 soon and have prepared a release >> candidate for

Re: ispc 1.9.2rc1

2017-06-16 Thread Dmitry Babokin
0 > iters, 4096x4096) > > ISPC 1.9.2rc1 > No scatter / gather performance warnings and much better performance: > > Nexus::TestWaterSimPerformance: water sim took 13.282045 second(s) (1000 > iters, 4096x4096) > > > On Monday, June 5, 2017 at 7:01:38 PM UTC+2, Dmitry Babok

Re: Subtraction error

2017-06-16 Thread Dmitry Babokin
I can reproduce and I'm looking at this issue. Though I'm currently on vacation, so it may take a little longer than usual. On Tue, Jun 13, 2017 at 4:28 AM, wrote: > Hi, > > We are writing a molecular dynamics simulation code that will be used at > Los Alamos National Laboratory. So far with I

Re: Subtraction error

2017-06-22 Thread Dmitry Babokin
a.particles_fx[]. On Fri, Jun 16, 2017 at 2:05 AM, Dmitry Babokin wrote: > I can reproduce and I'm looking at this issue. Though I'm currently on > vacation, so it may take a little longer than usual. > > On Tue, Jun 13, 2017 at 4:28 AM, wrote: > >> Hi, >>

Re: ispc 1.9.2rc1

2017-08-16 Thread Dmitry Babokin
code with 1.9.1). > > If the release is not happening in the near future, would you happen to > have a MSVC 2013 -compatible set of binaries for the release candidate? > > On Friday, June 16, 2017 at 10:52:21 AM UTC+3, Dmitry Babokin wrote: >> >> Yes, this is one of known p

Re: ispc 1.9.2rc1

2017-08-17 Thread Dmitry Babokin
or the information. I built myself the 1.9.2-dev yesterday > based on LLVM/Clang 3.9.1 on MSVC 2013 (x86) - it's working as expected. > > On that note, is LLVM 5.0 bringing new/enhanced optimizations to the next > ISPC release? > > On Wednesday, August 16, 2017 at 8:39:23 PM UTC+3, D

Re: ispc 1.9.2rc1

2017-10-24 Thread Dmitry Babokin
Hope to get it out in a couple of weeks. On Tue, Oct 24, 2017 at 3:29 AM, Ali Nakipoğlu wrote: > Hi Dmitry, > > Any ETA for 1.9.2? > > Kind Regards, > > Ali > > > On Wednesday, May 17, 2017 at 8:33:59 AM UTC+1, Dmitry Babokin wrote: >> >> Hello, >>

Re: Working with int8

2017-11-01 Thread Dmitry Babokin
Have you tried compiling with an i8 target? I.e. sse4-i8x16. Sent from my iPhone > On Nov 1, 2017, at 10:19 AM, Bruno Martínez wrote: > > Hi, > > ispc generates code ten times slower than msvc2013 for a simple loop add: > > void simple_msvc(int8_t vin1[], int8_t vin2[], int8_t vout[], int count

Re: Working with int8

2017-11-01 Thread Dmitry Babokin
't exist. Can I add it if I compile ispc myself? > What about avx2-i8x32? It would be nice to just tell ispc to optimize for > i8 without mentioning sse/avx. > > Bruno > > On Wednesday, November 1, 2017 at 2:46:23 PM UTC-3, Dmitry Babokin wrote: > >> Have you t

Re: crash -O0 and generic-x1

2017-11-05 Thread Dmitry Babokin
Generic targets are not well maintained and there are multiple known failures. And I don't think that's going to change. On Thu, Nov 2, 2017 at 5:12 PM, Brian Green wrote: > I am encountering a crash when combining the generic-x1 target at the -O0 > flag. For example: > > foo.ispc: > > void foo

ISPC 1.9.2 is released

2017-11-10 Thread Dmitry Babokin
Download page: http://ispc.github.io/downloads.html === v1.9.2 === (10 November 2017) An ISPC update, which brings out-of-the-box debug support on Windows, better performance of most of the targets and a bunch of stability and performance bug fixes. The release is based on patched LLVM 5.0 backe

Re: Arm Neon Support Status of ISPC

2017-12-27 Thread Dmitry Babokin
I'm not aware of active users of ARM Neon port. It should be functional, if not it should be easy to bring it back to life. Dmitry On Fri, Dec 22, 2017 at 9:30 PM, B.Stastny wrote: > Does anyone know the status of ARM Neon support for ISPC? I see there are > some updates in the git repo from j

Re: new compiler error in 1.9.2

2018-01-23 Thread Dmitry Babokin
This looks like a correct code to me. The code to check this kind of things was changed in 1.9.2, but reported behavior is not intended. Please file an issue, I'll have a look. On Tue, Jan 23, 2018 at 8:14 PM, Brian Green wrote: > When upgrading to 1.9.2 from 1.9.1 the following code no longer

Re: Linking an object file

2018-04-26 Thread Dmitry Babokin
What ispc and VS versions are you using? Dmitry. On Thu, Apr 26, 2018 at 7:10 AM, wrote: > Hello, > > I am getting some problems regarding the compilation and linking the ispc > objects files with visual studio. Visual studio doesn't recognize symbols > exported to object file that i am getting

Re: Linking an object file

2018-04-27 Thread Dmitry Babokin
Interesting. I haven't tried ispc with VS2017 yet, but I think people tried it and it worked well. Do you have a chance to try reproducing the problem with VS2015? On Fri, Apr 27, 2018 at 2:15 AM, wrote: > and ispc version 192. > > Best, > > -- > You received this message because you are subscr

Re: Produce Linux object files under Windows

2018-05-12 Thread Dmitry Babokin
Hi, You've got it pretty much right, you need to define target triple to the target platform that you need and you are almost done. But there are other minor issues. We are building PS4 target, which is Windows cross compiler targeting FreeBSD. You can have a look at this branch https://github.com

Re: Linking an object file

2018-05-21 Thread Dmitry Babokin
You are using task parallelism. To make it work, you need to provide implementation of ISPCAlloc/ISPCLaunch/ISPCSync. There's reference implementation with different underlying runtime libraries in examples/tasksys.cpp More info here: http://ispc.github.io/ispc.html#task-parallelism-runtime-requir

Re: Linking an object file

2018-05-22 Thread Dmitry Babokin
8 at 12:34 AM, Bruno Martins wrote: > I am following triangle_geometry_device tutorial that uses the same > functions and remember to use them without cpp implemention. Am i missing > something? > > Best Regards > Bruno > > On 21 May 2018 at 23:22, Dmitry Babokin wrote: >

Re: all, any, none rationale

2018-08-10 Thread Dmitry Babokin
The issue is fixed by pull request #1333: https://github.com/ispc/ispc/pull/1333 On Mon, Aug 6, 2018 at 3:04 PM wrote: > Hi Brian, > > Were your tests only for all() or were you able to see the same behavior > with any() and none() as well? I ran some code samples and do not see the > same prob

Re: ISPC 1.9.2 is released

2018-08-17 Thread Dmitry Babokin
On Saturday, November 11, 2017 at 6:12:41 AM UTC+2, Dmitry Babokin wrote: >> >> Download page: http://ispc.github.io/downloads.html >> >> === v1.9.2 === (10 November 2017) >> >> An ISPC update, which brings out-of-the-box debug support on Windows, >> be

Re: more numerical oriented examples

2018-08-17 Thread Dmitry Babokin
I don't know about this specific MATLAB solver, but I think they have computational intensive code implemented in native libraries. ISPC can get quite close to the peak hardware performance, so it's more a question of the quality of the MATLAB solver. On Fri, Aug 17, 2018 at 2:24 AM Royi wrote:

Re: How to Use the Box Blur Example

2018-08-17 Thread Dmitry Babokin
box3x3(uniform float image[32][32], int x, int y) takes varying x and y parameters and exploits parallelism coming from these vectors. For SSE4 it will handle 4 pixels at a time. So you need to organise outer loops to supply these pixels in chunks of 4 (or 8, or 16, depending on your target). This
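The outer-loop organisation described here can be sketched in Python by modelling a gang as a short list of coordinates. This is an emulation under stated assumptions, not the ISPC example itself: a gang width of 4 (as for SSE4), a 32x32 image, borders left untouched, and a hypothetical `blur` driver that plays the role of the outer loops.

```python
GANG = 4   # lanes per gang, e.g. an SSE4 target with 32-bit elements
N = 32     # image is N x N, matching image[32][32]

def box3x3(image, xs, ys):
    """Blur one gang of pixels: xs and ys hold one coordinate per lane."""
    out = []
    for x, y in zip(xs, ys):
        total = 0.0
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                total += image[y + dy][x + dx]
        out.append(total / 9.0)
    return out

def blur(image):
    """Outer loops: feed interior pixels to box3x3 in chunks of GANG."""
    result = [row[:] for row in image]
    for y in range(1, N - 1):
        for x0 in range(1, N - 1, GANG):
            # Lanes that would fall outside the interior are masked off.
            xs = [x for x in range(x0, x0 + GANG) if x < N - 1]
            ys = [y] * len(xs)   # the scalar row index broadcast to every lane
            for x, value in zip(xs, box3x3(image, xs, ys)):
                result[y][x] = value
    return result

image = [[0.0] * N for _ in range(N)]
image[5][5] = 9.0   # a single bright pixel
blurred = blur(image)
print(blurred[4][4:7], blurred[5][4:7], blurred[6][4:7])
```

The row index is the same for all lanes of a chunk while the column index varies per lane, which corresponds to a uniform value being broadcast to a varying one.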

Re: How to Use the Box Blur Example

2018-08-20 Thread Dmitry Babokin
first iteration it will > handle pixels (0,0), (1,0), (2,0), (3,0), i.e. jj is (0,1,2,3), ii is > uniform int 0, which is casted to varying int (0,0,0,0). > } > } > > > Which seems to require the whole image as input. > > > On Friday, August 17, 2018 at 10:01:

Re: How to Use the Box Blur Example

2018-08-20 Thread Dmitry Babokin
t? > > I will create a C wrapper which works on chunks of 4-8 pixels. > It gives the chunk to ISPC function which updated a buffer chunk which is > given as well. > Would that be more efficient or better have 2 levels of ISPC? > > > On Tue, Aug 21, 2018 at 2:29 AM Dmitry Babokin

Re: How to Use the Box Blur Example

2018-08-26 Thread Dmitry Babokin
try, > > What about the case you have no obligations. > What's the best way to apply 2D Convolution using ISPC? > > Thank You. > > On Tuesday, August 21, 2018 at 9:35:42 AM UTC+3, Dmitry Babokin wrote: >> >> Hi Royi, >> >> There's no benefit

Re: Confused about foreach semantics

2018-09-20 Thread Dmitry Babokin
Scott, First of all, I don't see how foreach may increase parallelism in this case, as the swap happen for varying values, not for scalars. I.e. the following code fragment float aux[2]; aux[0] = c[2*x + 0]; aux[1] = c[2*x + 1]; c[2*x + 0] = c[2*y + 0]; c[2*

Re: Confused about foreach semantics

2018-09-21 Thread Dmitry Babokin
Dmitry, > > On Thursday, September 20, 2018 at 5:05:34 PM UTC-6, Dmitry Babokin wrote: >> >> First of all, I don't see how foreach may increase parallelism in this >> case, as the swap happen for varying values, not for scalars. I.e. the >> following code frag

Re: Running windows binaries

2018-10-27 Thread Dmitry Babokin
You probably need VS redistributables (either 2013 or 2015, based on the version that you downloaded). Dmitry. On Sat, Oct 27, 2018 at 8:26 AM Tomek wrote: > Hi, > > I haven't yet tried to run ISPC under Windows until today. I have bare > Windows 7 installation on VirtualBox - downloaded ISPC a

ISPC 1.10.0 is released

2019-01-18 Thread Dmitry Babokin
Download page: http://ispc.github.io/downloads.html === v1.10.0 === (18 January 2019) An ISPC update, which brings several new features, has a bunch of stability and performance bug fixes, and infrastructure improvements for those who are interested in participating in hacking on the ISPC trunk.

Re: [ispc 1.10.0] Broken -MMM dependencies on Linux

2019-02-05 Thread Dmitry Babokin
Hi Jean, We've reproduced the issue. Looks like the problem was introduced when we migrated to a CMake-based build. We are working on the fix. You are welcome to create an issue on github. Thanks for reporting the problem! Dmitry. On Mon, Feb 4, 2019 at 12:04 PM Jean Ben wrote: > > Hi, > > I've

Re: Disable specific code paths similar to __builtin_unreachable in GCC and LLVM

2019-03-15 Thread Dmitry Babokin
No, we don't have mechanisms like __builtin_unreachable(). I'm not sure if extracting some data flow facts from "if ( none ( S > 25 ) ) flag_unreachable;" to apply to "if (S > 25)" is good idea, but we should definitely think about mechanisms to fine tune CFG. Would be good if you file an issue w

Re: Disable specific code paths similar to __builtin_unreachable in GCC and LLVM

2019-03-15 Thread Dmitry Babokin
given some pointers on > where to start looking. > > Cheers, > Christoph > > On Friday, March 15, 2019 at 5:53:19 PM UTC+1, Dmitry Babokin wrote: >> >> No, we don't have mechanisms like __builtin_unreachable(). I'm not sure >> if extracting some data flow fac

Re: Storing interleaved values without scatter

2019-04-09 Thread Dmitry Babokin
There are two options - (1) file a bug and wait until we fix it. It's generally the compiler's responsibility to do this optimization. It actually was implemented at some point, but it doesn't work in most cases and we are planning to fix it. So one more test case and a reminder in the bug t

Re: how to compile w/ispc for multiple avx targets

2019-04-11 Thread Dmitry Babokin
Hi Shachar, When you use multiple targets in the same compilation, it triggers enabling of auto-dispatch code. The problem is that auto-dispatch needs to make decision for the dispatch solely on CPUID. Hence two targets for the same ISA, but with different width cause ambiguity. Who are you going
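To see why two targets with the same ISA but different widths are ambiguous, consider a toy model of a dispatcher that can only look at CPU features. This is not ISPC's dispatch code; the ranking table, `build_dispatch_table` and `dispatch` are hypothetical names used purely to illustrate the argument.

```python
ISA_RANK = {"sse4": 1, "avx1": 2, "avx2": 3}   # higher rank is preferred

def isa_of(target):
    """'avx2-i32x8' -> 'avx2': the only part a CPUID-style check can tell apart."""
    return target.split("-")[0]

def build_dispatch_table(targets):
    table = {}
    for target in targets:
        isa = isa_of(target)
        if isa in table:
            # Same ISA, different width: CPU features alone cannot pick one.
            raise ValueError(f"ambiguous: {table[isa]} and {target} both map to {isa}")
        table[isa] = target
    return table

def dispatch(table, cpu_isas):
    """Pick the best compiled variant that the running CPU supports."""
    usable = [isa for isa in table if isa in cpu_isas]
    return table[max(usable, key=ISA_RANK.__getitem__)]

table = build_dispatch_table(["sse4-i32x4", "avx1-i32x8", "avx2-i32x8"])
print(dispatch(table, {"sse4", "avx1"}))           # older CPU
print(dispatch(table, {"sse4", "avx1", "avx2"}))   # newer CPU
```

In this model, one variant per ISA dispatches cleanly, while registering both avx2-i32x8 and avx2-i32x16 fails because nothing in the CPU's feature set distinguishes them.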

Re: how to compile w/ispc for multiple avx targets

2019-04-12 Thread Dmitry Babokin
avx+avx2. Now the > question is what is the target I need to compile with? > —target=avx > —target-avx2 > Both? Other? > Any help is appreciated > > Thanks > Shachar > > > On Fri, Apr 12, 2019 at 1:13 AM Dmitry Babokin wrote: > >> Hi Shachar, >> >>

Re: Storing interleaved values without scatter

2019-04-22 Thread Dmitry Babokin
n[i]) / 3.0f; > *(data + 2 * i + 0) = data0; > *(data + 2 * i + 1) = data1; > } > } > > Just switch between 1.10 and 1.11 and see the magic :) > > > On Tue, Apr 9, 2019 at 7:57 PM Dmitry Babokin wrote: > >> There are two options - (1) file a bug and wait until we fix i

Re: varying pointers to varying types

2019-04-30 Thread Dmitry Babokin
Sorry for the late response. I agree with Matt that it looks like a bug. I was checking our test suite and found a test that does exactly what you've described with the struct wrapper. So we are probably missing the very simple test. I don't think that something changed in the latest

Re: Passing vectors from C++

2019-05-30 Thread Dmitry Babokin
Short vectors can't be passed by value in extern functions. You can pass them by pointer. Also note, that short vectors definition as a struct in interface header has alignment attribute, so it's aligned at least to 16 bytes. On Thu, May 30, 2019 at 7:49 AM Edward Catchpole wrote: > Is there a

Re: Gather warnings in a code with plain memory layout

2019-07-19 Thread Dmitry Babokin
Hi Tomek, We do have some problems with gathers optimization. But in this particular case I don't see any warnings. https://godbolt.org/z/XnQNlX Please send a more complete example (in your code snippet it's not clear what REAL is); a link to godbolt.org would be ideal. And the version of ISPC t

Re: Gather warnings in a code with plain memory layout

2019-07-31 Thread Dmitry Babokin
e efficient instructions? See the mini > project, attached (type "make" or "make release" inside). It's just about > cluttering the console when we develop - to remove these type of warnings > we need to use many pragmas:) > > Tomek > > > On Friday, Jul

Re: Where are the AVX-512 speedup results for the ISPC example programs ?

2019-09-25 Thread Dmitry Babokin
Hi, We haven't updated performance numbers for a while, thanks for pointing this out. I'll make measurements on the machine that I have and will post the results here. And we'll update the "official" numbers a bit later. AVX512 is indeed ideal target for ISPC. Though a few factors need to be tak

Re: Where are the AVX-512 speedup results for the ISPC example programs ?

2019-09-25 Thread Dmitry Babokin
't have enough work for that many cores. Raw speedup geomean (against clang-8 compiler) is: avx2-i32x8: 6.85 avx2-i32x16: 7.33 avx512skx-i32x8: 7.13 avx512skx-i32x16: 9.18 Dmitry. On Wed, Sep 25, 2019 at 3:56 PM Dmitry Babokin wrote: > Hi, > > We haven't updated performance nu

Re: Where are the AVX-512 speedup results for the ISPC example programs ?

2019-10-04 Thread Dmitry Babokin
> gathers. I could test this myself, it just isn't the easiest thing to turn > on and off on production systems. > > Cheers, > -Brian > > On Wednesday, September 25, 2019 at 6:41:24 PM UTC-7, Dmitry Babokin wrote: >> >> I'm attaching perf measurements on SKX m

Re: returning struct by value from ispc-exported function?

2019-10-21 Thread Dmitry Babokin
Michael, This is supposed to work. Please file an issue. Dmitry. On Mon, Oct 21, 2019 at 2:04 PM Михаил Усачёв wrote: > ispc-program compiled to avx1 always gives zeros. My CPU is AMD fx-8350 > (avx1 is supported). > I can not attach VS 2017 solution (groups.google.com gives me "error > occur

Re: Dynamic allocation of SOA type.

2019-10-27 Thread Dmitry Babokin
Hi Alex, soa<> support is quite buggy, this is yet another problem in the collection of soa<> problems. We need either to solve them at once or deprecate soa<> support. Could you please file an issue with this problem? Dmitry. On Thu, Oct 24, 2019 at 9:08 AM Alex Yuan wrote: > Hi, > > Can anyo

Re: A mishap or a bug in ISPC? (pointer assignment issue)

2019-11-04 Thread Dmitry Babokin
Hi Tomek, It's definitely a bug. In both cases address of static variable is a constant, so it should work as it works in C. please file bug, we'll fix it. Dmitry. On Mon, Nov 4, 2019 at 2:15 AM Tomek wrote: > Hi:) > > I would like to do this: > > /* a simple case */ > > static uniform float a

Re: Array initialization

2019-11-06 Thread Dmitry Babokin
Tomek, It will initialize first value to the constant that you've supplied and the rest will be zero-initialized. Compile your code to see what's going on: > ispc --emit-llvm-text t.ispc -o - Dmitry. On Tue, Nov 5, 2019 at 3:38 AM Tomek wrote: > Hi, > > I'd like to initialise like this: > > f

Re: Is ispc deterministic or is it possible to make it be?

2020-02-25 Thread Dmitry Babokin
Non-determinism on this slide refers to allowing the compiler to rearrange/reassociate/optimize floating point expressions in a way that may change the numeric value of the result (while still being a valid result in terms of satisfying the same mathematical formula in the source code). Note, compiled bin
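A small Python sketch shows how reassociation alone changes a floating point result. The two summation orders below are illustrative stand-ins for "source order" and "an order a vectorizing compiler might choose"; neither is claimed to be what ISPC generates, and Python doubles are used instead of ISPC floats.

```python
def sum_left_to_right(xs):
    """Accumulate in source order."""
    acc = 0.0
    for x in xs:
        acc += x
    return acc

def sum_pairwise(xs):
    """Tree-shaped reduction, an order a vectorized sum might use."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])

data = [1e16, 1.0, -1e16, 1.0]   # the exact mathematical sum is 2.0

sequential = sum_left_to_right(data)
tree = sum_pairwise(data)
print(sequential, tree)
```

Both orders evaluate the same mathematical formula, yet they round at different points and return different values. Each order is itself fully repeatable, which is the distinction between reassociation freedom and run-to-run randomness.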

Re: Is ispc deterministic or is it possible to make it be?

2020-02-25 Thread Dmitry Babokin
We don't guarantee bit reproducibility for floating point results across platforms (especially across x86 / ARM) - we never specifically tested for that. But I would expect that in most of the cases the same results. If you find the case with different results, please report them, I would be intere

Re: How to use struct without gather/scatter performance warnings

2020-03-22 Thread Dmitry Babokin
David, Typically if you target avx2 (using --target=avx2-i32x8), the code should use ymm registers. If you use --target=avx2-i32x4, then your data is 128 bit wide (for int and float vectors), which means that xmm registers will be used. Another possibility that you are using uniform float<4>, whi

Re: How to use struct without gather/scatter performance warnings

2020-03-23 Thread Dmitry Babokin
de is faster than >> the foreach version but lacks YMM, so I'm guessing it could be made more >> performant. >> Thank you for your help. >> >> David >> >> On Sunday, March 22, 2020 at 3:51:36 AM UTC-5, Dmitry Babokin wrote: >>> >>> D

Re: How to use struct without gather/scatter performance warnings

2020-03-25 Thread Dmitry Babokin
day, March 23, 2020 at 3:18:25 PM UTC-5, Pete Brubaker wrote: >>>>>>> >>>>>>> Hi David, >>>>>>> >>>>>>> In looking over your code, unless you can reorder the data to SoA >>>>>>> your best strategy is

Re: How to use struct without gather/scatter performance warnings

2020-03-26 Thread Dmitry Babokin
c23); > > __m256 res_c = _mm256_add_ps(res_a, res_b); > > __m128 res_upper = _mm256_extractf128_ps(res_c, 1); > __m128 res_lower = _mm256_extractf128_ps(res_c, 0); > > __m128 res = _mm_add_ps(res_upper, res_lower); > > _mm_stream_ps(&result[i].data[0], res); > } >

Downloads has moved to github releases

2020-04-02 Thread Dmitry Babokin
Hello, We moved ISPC downloads to github releases ( https://github.com/ispc/ispc/releases). Sourceforge location will be discontinued in about a week, unless someone really needs it and speaks up. If you are downloading ISPC using scripts or pointing to ISPC downloads in your documentation, it's a

Re: Shipping ISPC binaries with software

2020-04-02 Thread Dmitry Babokin
Hello, Disclaimer: I'm not a lawyer and my understanding of licensing issues is far from perfect. Our license is 3-clause BSD license, which is quite permissive. We just moved our downloads to Github release, which should be reliable download location. So you can consider downloading ISPC on use

Re: ISPC emits vcmpleps + vblendvps instead of vminps/vmaxps

2020-04-13 Thread Dmitry Babokin
Hi Michael, I'm glad that I have an easy answer for you. Try ispc trunk, instead of v1.12. It has a fix. Download page has links to ispc trunk for Linux and Windows. If you need Mac, I can build it for you manually. Compiler Explorer also has ispc trunk. I'm curious if this fixes all code gen issu

Re: ISPC emits vcmpleps + vblendvps instead of vminps/vmaxps

2020-04-13 Thread Dmitry Babokin
/ispc/issues/1711. > > Cheers, > Michael > > Den måndag 13 april 2020 kl. 18:38:44 UTC+2 skrev Dmitry Babokin: >> >> Hi Michael, >> >> I'm glad that I have an easy answer for you. Try ispc trunk, instead of >> v1.12. It has a fix. Download page has links

Re: Shipping ISPC binaries with software

2020-04-20 Thread Dmitry Babokin
ramework called NMODL : > https://github.com/BlueBrain/nmodl. The draft manuscript describing > overall framework is available on arXiv here: > https://arxiv.org/pdf/1905.02241.pdf > > I will be happy to provide any additional information needed. > > -Pramod > > On Friday, April

Re: ISPC emits vcmpleps + vblendvps instead of vminps/vmaxps

2020-04-21 Thread Dmitry Babokin
Den måndag 13 april 2020 kl. 21:47:40 UTC+2 skrev Dmitry Babokin: >> >> Thanks for reporting it! It's related to recent changes. We'll fix it. >> >> Dmitry. >> >> On Mon, Apr 13, 2020 at 11:37 AM Michael Andersson < >> andersso...@hotmail.com>

Fwd: [ispc/ispc] Release v1.13.0 - === v1.13.0 === (23 April 2020)

2020-04-23 Thread Dmitry Babokin
ISPC v1.13.0 was released! -- Forwarded message - From: Dmitry Babokin Date: Thu, Apr 23, 2020 at 11:28 PM Subject: [ispc/ispc] Release v1.13.0 - === v1.13.0 === (23 April 2020) To: ispc/ispc Cc: Subscribed === v1.13.0 === (23 April 2020) <https://github.com/ispc/i

Re: Why there isn't a three foreach loop?

2020-05-22 Thread Dmitry Babokin
We have it, for example here: https://github.com/ispc/ispc/blob/master/examples/aobench/ao.ispc#L216 Note that the tiled version is not much different. If you remove "_tiled" it will work as well. If you are asking why one would write the former version rather than the latter, it yields a different code sequence. Thi

Re: Data dependent lane conflicts

2020-07-21 Thread Dmitry Babokin
Hi Lars, Thank you for the detailed motivating examples describing the need for conflict detection capability in the language! Currently there's no way to generate the vpconflictd instruction in ISPC. I see two paths to generate it: (1) introduce a library function, which has the semantics of vpconflictd/q

Re: Data dependent lane conflicts

2020-07-21 Thread Dmitry Babokin
at > may leave some partitions underutilized and others full). > > As for *vp2intersectd*, it seems like it ought to be useful for > _something_ but just what hasn't come to mind yet. I expect once I get the > opportunity to play with it a bit more something may resolve itsel

Fwd: [ispc/ispc] Release v1.14.0 - === v1.14.0 === (30 July 2020)

2020-07-31 Thread Dmitry Babokin
-- Forwarded message - From: Dmitry Babokin Date: Fri, Jul 31, 2020 at 2:40 PM Subject: [ispc/ispc] Release v1.14.0 - === v1.14.0 === (30 July 2020) To: ispc/ispc Cc: Subscribed === v1.14.0 === (30 July 2020) <https://github.com/ispc/ispc/releases/tag/v1.14.0> Repo

Re: Significant Performance Regression After Switching Kernel to ISPC

2020-08-04 Thread Dmitry Babokin
Zach, What is the implementation you are comparing against? I assume you are comparing with C++, right? What compiler are you using? The key for performance in your case is "exp()" call. If C++ compiler has SVML library, that should explain the difference. ISPC scalarizes exp() call by default. L

Fwd: [ispc/ispc] Release v1.14.1 - === v1.14.1 === (28 August 2020)

2020-08-28 Thread Dmitry Babokin
-- Forwarded message - From: Dmitry Babokin Date: Fri, Aug 28, 2020 at 5:11 PM Subject: [ispc/ispc] Release v1.14.1 - === v1.14.1 === (28 August 2020) To: ispc/ispc Cc: Subscribed === v1.14.1 === (28 August 2020) <https://github.com/ispc/ispc/releases/tag/v1.14.1> Repo

Re: Windows ASTC compression fine, linux has errors?

2020-09-01 Thread Dmitry Babokin
Do you have your code open sourced? Are you using released binaries or build ISPC yourself? I also would encourage you to try the latest release - v1.14.1. We fixed some bugs related to handling bools and others that might affect your code. But generally speaking observing differences between Win

Re: No armv7 softfp support?

2020-09-10 Thread Dmitry Babokin
ISPC doesn't support soft float ABI. Seems we missed that current Android is soft float ABI... We need to fix that. Dmitry. On Thu, Sep 10, 2020 at 3:00 PM Nives Ktich wrote: > I'm attempting to link an ISPC 1.13.0 generated object into an Android > armv7 (32bit) shared object via clang but a

Re: No armv7 softfp support?

2020-09-10 Thread Dmitry Babokin
hard float (though I can't > find an official reference for that). > On Thursday, September 10, 2020 at 3:55:44 PM UTC-7 Dmitry Babokin wrote: > >> ISPC doesn't support soft float ABI. Seems we missed that current Android >> is soft float ABI... We need to fix that. >

Release v1.15.0

2020-12-18 Thread Dmitry Babokin
=== v1.15.0 === (18 December 2020) An ISPC release with several improvements for CPU and Beta support of Intel graphics hardware architectures. The binaries in this release include CPU versions for Windows, Linux, and macOS, and a GPU-enabled Linux binary, which supports both CPU and GPU. CPU bina

Re: Retaining asserts' hints under --opt=disable-assertions?

2021-01-14 Thread Dmitry Babokin
Hi, This issue has been sitting in my personal to-do list for quite long. We can do so many performance hints with asserts, likely and unlikely hints and it would be good not to have them as runtime checks. I think right now we don't have a mechanism to have a workaround for that. Could you plea

Re: Installing VS Code extension

2021-04-23 Thread Dmitry Babokin
Could you please open an issue on github issue tracker? https://github.com/ispc/ispc/issues Dmitry. On Fri, Apr 23, 2021 at 4:01 PM Adhitha Dias wrote: > Hi, > > I am not able to install the ispc extension to VS code. I am getting the > below error. > > *[2021-04-19 21:45:32.215] [renderer1] [

Re: Converting ISPC compilation to clang format.

2021-05-03 Thread Dmitry Babokin
ISPC switches are definitely not compatible with clang switches. Could you point me to the ccache documentation stating the requirements to compiler switches format? I think it would be easier to add ISPC support to ccache directly, but not hacking the issue around making switches look like clang

Release v1.16.0

2021-06-11 Thread Dmitry Babokin
=== v1.16.0 === (11 June 2021) An ISPC release with language extensions for performance fine tuning, cpu definitions for AlderLake and SapphireRapids targets, support for macOS ARM targets, and massive update of Intel GPUs support. Windows and Linux binaries in this release support both CPU and GP

Re: SIMD array append idiom

2021-08-19 Thread Dmitry Babokin
Mike, I'm not sure that I understand your question correctly. Are you trying to pick some lanes of varying computations (i.e. based on should_append(x) selector) and serialize them in an array? If so, you probably need this: https://ispc.github.io/ispc.html#packed-load-and-store-operations Dmitr
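The "pick some lanes and serialize them in an array" idiom can be emulated in scalar Python. The sketch below is modelled loosely on the packed store operations that the linked documentation section describes; the Python signature, the explicit mask argument and the `should_append` predicate are assumptions for illustration, not the ISPC standard library interface.

```python
def packed_store_active(out, values, mask):
    """Append the values of active lanes to `out` contiguously; return how many were written."""
    count = 0
    for value, on in zip(values, mask):
        if on:
            out.append(value)
            count += 1
    return count

def should_append(x):
    # Hypothetical selector standing in for the one in the question.
    return x % 3 == 0

GANG = 4
data = list(range(10))
result = []
written = 0
for base in range(0, len(data), GANG):
    gang = data[base:base + GANG]                 # one gang's worth of lanes
    mask = [should_append(x) for x in gang]       # per-lane selector
    written += packed_store_active(result, gang, mask)

print(result, written)
```

The returned count is what lets the caller advance the output position by the number of lanes that actually stored something, so the output stays dense regardless of which lanes were selected.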

Re: Regarding running my first simple executable

2022-03-21 Thread Dmitry Babokin
Now you have a "simple" executable, that you can run: > ./simple PS mailing lists are deprecated, please use Github Discussions ( https://github.com/ispc/ispc/discussions) or Issues ( https://github.com/ispc/ispc/issues) On Sun, Mar 20, 2022 at 9:58 PM Rupesh Kumar wrote: > I am new to ispc .
