Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Jacob Lifshay
On Tue, Apr 20, 2021, 09:25 Marek Olšák  wrote:

> Daniel, imagine hardware that can only do what Windows does: future fences
> signalled by userspace whenever userspace wants, and no kernel queues like
> we have today.
>

Hmm, that sounds kinda like what we're trying to do for Libre-SOC's GPU,
where the CPU (exactly the same cores as the GPU) runs a user-space software
renderer with extra instructions to make it go fast, so the kernel only gets
involved for futex-wait or for video scan-out. This causes problems when
figuring out how to interact with dma-fences for
interoperability...
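
For example, a minimal sketch (illustrative Rust, assuming the libc crate on
Linux; not Libre-SOC code) of the futex-based wait/signal that would be the
kernel's only involvement:

use std::sync::atomic::{AtomicU32, Ordering};

fn futex(word: &AtomicU32, op: i32, val: u32) {
    // error handling omitted for brevity
    unsafe {
        libc::syscall(
            libc::SYS_futex,
            word as *const AtomicU32,
            op,
            val,
            std::ptr::null::<libc::timespec>(),
        );
    }
}

pub fn wait_fence(word: &AtomicU32) {
    // sleep in the kernel until another thread signals the fence
    while word.load(Ordering::Acquire) == 0 {
        futex(word, libc::FUTEX_WAIT, 0);
    }
}

pub fn signal_fence(word: &AtomicU32) {
    word.store(1, Ordering::Release);
    futex(word, libc::FUTEX_WAKE, i32::MAX as u32); // wake all waiters
}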

Jacob Lifshay
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Rust drivers in Mesa

2020-10-14 Thread Jacob Lifshay
On Tue, Oct 13, 2020, 23:52 Thomas Zimmermann  wrote:

> Hi
>
> On Tue, 13 Oct 2020 13:01:58 -0700 Eric Anholt  wrote:
>
> > On Tue, Oct 13, 2020 at 12:08 AM Thomas Zimmermann 
> > wrote:
> > >
> > > Hi
> > >
> > > On Fri, 02 Oct 2020 08:04:43 -0700 "Dylan Baker" 
> > > wrote:
> > >
> > > > I have serious concerns about cargo and crate usage. Cargo is
> > > > basically npm for rust, and shares all of the bad design decisions of
> > > > npm, including linking multiple versions of the same library together
> > > > and ballooning dependency lists that are fetched from the internet.
> > > > This is both a security problem and directly in conflict with meson's
> > > > design of one and only one version of a project. And while rust
> > > > prevents certain kinds of bugs, it doesn't prevent design bugs or
> > > > malicious code. As a meson developer, I've found the rust community
> > > > incredibly hard to work with and basically hostile to every request
> > > > we've made; "cargo is how you build rust" is essentially the answer
> > > > we've gotten from them at every turn. And if you're not going to use
> > > > cargo, is rust really a win? The standard library is rather minimal
> > > > "because just pull in 1000 crates". The distro people can correct me
> > > > if I'm wrong, but when librsvg went to rust it was a nightmare;
> > > > several distros went a long time without updates because of cargo.
> > >
> > > I can't say much about meson, but using Rust has broken the binaries of
> > > several packages on i586 for us, which consequently affects Gnome and
> > > KDE. [1][2] Rust uses SSE2 instructions on platforms that don't have
> > > them. There's a proposed workaround, but it's not yet clear if that's
> > > feasible in practice.
> > >
> > > Best regards
> > > Thomas
> > >
> > > [1] https://bugzilla.opensuse.org/show_bug.cgi?id=1162283
> > > [2] https://bugzilla.opensuse.org/show_bug.cgi?id=1077870
> >
> > From the first bug:
> >
> > > Not entirely sure what to do about this. i586 is unsupported by Rust
> > > (tier 2) and as such the package is built for i686
> >
> > This really sounds like your distro is just building with the wrong
> > rust target for packages targeting an earlier processor.
>
> Every other language/compiler combination appears to get this right. And
> even i686 does not require SSE2. As I said before, there might be a
> workaround.
>

Rust has co-opted i586-unknown-linux-gnu to mean x86-32 without SSE,
and i686-unknown-linux-gnu to mean x86-32 with SSE2 (technically it
implements that by using the pentium4 target CPU, so it may require more
than just SSE2).

Rust just doesn't provide official binaries for i586-unknown-linux-gnu --
that does not imply that rustc and cargo won't work fine if they are
recompiled for i586-unknown-linux-gnu. The only major caveat I'd expect is
floats having slightly different semantics due to x87, which may cause some
of rustc's tests to fail.
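
For example (illustrative only), the same source compiled for the two
triples reports the difference via rustc's built-in cfg! macro:

// rustc --target i686-unknown-linux-gnu main.rs  (SSE2 on by default)
// rustc --target i586-unknown-linux-gnu main.rs  (SSE2 off by default)
fn main() {
    if cfg!(target_feature = "sse2") {
        println!("SSE2 codegen enabled");
    } else {
        println!("no SSE2; floats go through x87");
    }
}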

Jacob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Rust drivers in Mesa

2020-10-04 Thread Jacob Lifshay
On Sun, Oct 4, 2020, 10:13 Jacob Lifshay  wrote:

> On Sun, Oct 4, 2020, 08:19 Alyssa Rosenzweig <
> alyssa.rosenzw...@collabora.com> wrote:
>
>> Cc'd.
>>
>> On Sun, Oct 04, 2020 at 03:17:28PM +0300, Michael Shigorin wrote:
>> >   Hello,
>> > regarding this proposal:
>> > http://lists.freedesktop.org/archives/mesa-dev/2020-October/224639.html
>> >
>> > Alyssa, Rust is not "naturally fit for graphics driver
>> > development" since it's not as universally available as
>> > C/C++ in terms of mature compilers; you're basically
>> > trying to put a cart before a horse, which will just
>> > put more pressure on both Rust developers team *and*
>> > on those of us working on non-x86 arches capable of
>> > driving modern GPUs with Mesa.
>> >
>> > For one, I'm porting ALT Linux onto e2k platform
>> > (there's only an early non-optimizing version of
>> > Rust port there by now), and we're maintaining repos
>> > for aarch64, armv7hf, ppc64el, mipsel, and riscv64 as well
>> > -- none of which seem to be described as better than
>> > "experimental" on the http://doc.rust-lang.org/core/arch page
>> > in terms of Rust support.
>>
>
Those are the ISA-specific intrinsics, which almost no Rust code actually
needs.

If you're looking for the official support list, see:
https://doc.rust-lang.org/nightly/rustc/platform-support.html

You can also see the list of official binaries here:
https://forge.rust-lang.org/infra/other-installation-methods.html

Jacob

>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Rust drivers in Mesa

2020-10-04 Thread Jacob Lifshay
On Sun, Oct 4, 2020, 08:19 Alyssa Rosenzweig <
alyssa.rosenzw...@collabora.com> wrote:

> Cc'd.
>
> On Sun, Oct 04, 2020 at 03:17:28PM +0300, Michael Shigorin wrote:
> >   Hello,
> > regarding this proposal:
> > http://lists.freedesktop.org/archives/mesa-dev/2020-October/224639.html
> >
> > Alyssa, Rust is not "naturally fit for graphics driver
> > development" since it's not as universally available as
> > C/C++ in terms of mature compilers; you're basically
> > trying to put a cart before a horse, which will just
> > put more pressure on both Rust developers team *and*
> > on those of us working on non-x86 arches capable of
> > driving modern GPUs with Mesa.
> >
> > For one, I'm porting ALT Linux onto e2k platform
> > (there's only an early non-optimizing version of
> > Rust port there by now), and we're maintaining repos
> > for aarch64, armv7hf, ppc64el, mipsel, and riscv64 as well
> > -- none of which seem to be described as better than
> > "experimental" on the http://doc.rust-lang.org/core/arch page
> > in terms of Rust support.
>

AArch64 on Linux is moving to tier 1, with official support from ARM:
https://github.com/rust-lang/rfcs/pull/2959

In my experience on armv7hf and powerpc64le with Linux, and on aarch64 on
Android (running rustc in termux), I've never had issues with Rust.

Jacob Lifshay
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Rust drivers in Mesa

2020-10-02 Thread Jacob Lifshay
On Fri, Oct 2, 2020, 10:53 Jason Ekstrand  wrote:

>
>  2. Rust's enums look awesome but are only mostly awesome:
> a. Pattern matching on them can lead to some pretty deep
> indentation which is a bit annoying.
> b. There's no good way to have multiple cases handled by the same
> code like you can with a C switch; you have to either repeat it or
> break it out into a generic helper.
>

You can use | in match, which could help with 2.b:
https://doc.rust-lang.org/reference/expressions/match-expr.html

Example:

struct MyStruct(String);

enum MyEnum {
    A(String),
    B { intv: i32, strv: String },
    C(MyStruct),
}

fn print_it(value: MyEnum) {
    use MyEnum::*;
    match value {
        // one arm handles both B and C, like C switch fall-through;
        // both alternatives must bind `s` with the same type
        B { strv: s, .. } | C(MyStruct(s)) => println!("B or C: the string is {}", s),
        A(_) => {}
    }
}

Jacob Lifshay
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Libre-soc-dev] Loading Vulkan Driver

2020-08-23 Thread Jacob Lifshay
On Sun, Aug 23, 2020, 18:38 Dave Airlie  wrote:

> On Mon, 24 Aug 2020 at 10:12, Jacob Lifshay 
> wrote:
> > no, that is the existing LLVM backend from AMD's opengl/opencl drivers.
> amdvlk came later.
>
> Those are the same codebase, amdvlk just uses a fork of llvm, but the
> differences are only minor changes for impedance mismatch and release
> timing, they never diverge more than necessary.
>

Yeah, what I meant is that the LLVM AMDGPU backend was not originally
created for amdvlk, since amdvlk didn't exist then.

Jacob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Libre-soc-dev] Loading Vulkan Driver

2020-08-23 Thread Jacob Lifshay
On Sun, Aug 23, 2020, 17:08 Luke Kenneth Casson Leighton 
wrote:

>
>
> On Monday, August 24, 2020, Dave Airlie  wrote:
>
>>
>> amdgpu is completely scalar,
>
>
> it is?? waah! that's new information to me.  does it even squash vec2/3/4,
> predication and swizzle?
>

yes, iirc.

>
> what about the upstream amdgpu LLVM-IR?  that still preserves vector
> intrinsics, right?
>
> i'm assuming that AMDVLK preserves vector intrinsics?
>

> AMDVLK's associated LLVM port was what ended up upstream, is that right?
>

no, that is the existing LLVM backend from AMD's opengl/opencl drivers.
amdvlk came later.

Jacob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Libre-soc-dev] Loading Vulkan Driver

2020-08-23 Thread Jacob Lifshay
On Sun, Aug 23, 2020, 16:31 Dave Airlie  wrote:

> It's hard to know then what you can expect to leverage from Mesa for
> that sort of architecture, the SPIRV->NIR translation, and then you
> probably want some sort of driver specific NIR->LLVM translation,
> amdgpu is completely scalar, llvmpipe is manually vectorised, swr does
> something like you might want, afaik it does a scalar translation and
> then magic execution (but I haven't dug into it except to fix
> interfaces).
>

Kazan isn't built on Mesa or NIR (though it was originally intended to
integrate with Mesa). Luke decided that Libre-SOC should also have a
similar Mesa-based driver and applied to NLNet for funding for it; Vivek is
currently starting to implement that Mesa driver.

>
> I think you will hit problems with vectorisation, because it's always
> been a problem for every effort in this area, but if the CPU design is
> such that everything can be vectorised and you never hit a scalar
> path, and you workout how texture derivatives work early, it might be
> something prototype-able.
>

My current plan for screen-space derivatives in Kazan is to have the
fragment shaders vectorized in multiples of 4 pixels (IIRC there's an
NVIDIA Vulkan extension that basically requires that), so values from
adjacent fragment shaders can be subtracted as needed to calculate
derivatives.
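
As a sketch of the derivative math over one 2x2 quad (illustrative Rust;
the lane layout is a hypothetical choice: 0 = top-left, 1 = top-right,
2 = bottom-left, 3 = bottom-right):

type Quad = [f32; 4];

// coarse d/dx: difference across each row, broadcast to the row
fn ddx(v: Quad) -> Quad {
    let top = v[1] - v[0];
    let bottom = v[3] - v[2];
    [top, top, bottom, bottom]
}

// coarse d/dy: difference down each column, broadcast to the column
fn ddy(v: Quad) -> Quad {
    let left = v[2] - v[0];
    let right = v[3] - v[1];
    [left, left, right, right]
}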

>
> But none of the current mesa drivers are even close to this
> architecture, llvmpipe or swr are probably a bit closer than anything,
> so it's likely you'll be doing most of this out-of-tree anyways. My
> other feeling is it sounds like over architecting, and reaching for
> the stars here, where it might be practical to bring vallium/llvmpipe
>

I would expect it to be quite easy to get llvmpipe to run on Libre-SOC's
processor, since it is PowerPC-compatible; it just won't have very good
performance, due to llvmpipe's architecture.

> up on the architecture first then work out how to do the big new
> thing, or bring up this architecture on an x86 chip and see it works
> at all.
>

The plan is to get Kazan working with vectorization on x86, then change the
backend out to unmodified PowerPC (LLVM may scalarize the instructions
here), then add the new vector ISA to the PowerPC backend, then add custom
instructions as-needed to improve performance.

Jacob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Libre-soc-dev] Loading Vulkan Driver

2020-08-23 Thread Jacob Lifshay
On Sun, Aug 23, 2020, 15:55 Dave Airlie  wrote:

> What is your work submission model then, Vulkan is designed around
> having work submitted to a secondary processor from a control
> processor. Do you intend for the device to be used only as a
> coprocessor, or as a CPU for normal tasks that can then use the
> features in their threads, because none of the API are designed around
> that model. They have a running thread of execution that queues stuff
> up to a separate execution domain.
>

It's intended to be a combination, where CPU threads schedule work and
render threads dequeue and run the work, probably using something like
Rayon:
https://github.com/rayon-rs/rayon

When a CPU thread waits in the Vulkan driver, it can possibly be used as a
render thread instead of blocking on a futex, avoiding needing excessive
numbers of Linux kernel-level threads.

The CPU and render threads run on the same cores, as scheduled by Linux.

>
> What represents a vulkan queue,


The rayon task queues.

> what will be recorded into vulkan
> command buffers,


a command sequence encoded as bytecode, a list of Rust enums, or something
similar.

> what will execute those command buffers.


the render threads will dequeue the command buffers, run through all the
commands in them, and schedule the appropriate rendering tasks to rayon's
task execution mechanism.
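
A rough sketch of that flow (illustrative Rust; the enum and the mpsc
channel are hypothetical stand-ins for the real command encoding and
rayon's queues):

use std::sync::mpsc;

enum Command {
    Draw { vertex_count: u32 },
    Dispatch { x: u32, y: u32, z: u32 },
}

fn main() {
    let (queue, commands) = mpsc::channel::<Vec<Command>>();

    // "vkQueueSubmit": hand a recorded command buffer to the queue;
    // this returns quickly on the application thread
    queue.send(vec![Command::Draw { vertex_count: 3 }]).unwrap();
    drop(queue);

    // render thread: dequeue command buffers, run through the
    // commands, and schedule the actual work (rayon in the real plan)
    for command_buffer in commands {
        for command in command_buffer {
            match command {
                Command::Draw { vertex_count } => {
                    println!("rasterizing {} vertices", vertex_count);
                }
                Command::Dispatch { x, y, z } => {
                    println!("running {}x{}x{} workgroups", x, y, z);
                }
            }
        }
    }
}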

> These are
> the questions you need to ask yourself and answer before writing any
> code.


Yup, did that 2 years ago, though I don't remember if I explicitly wrote
them down before.

Jacob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Libre-soc-dev] Loading Vulkan Driver

2020-08-23 Thread Jacob Lifshay
On Sun, Aug 23, 2020, 12:49 Dave Airlie  wrote:

> The big thing doing what Jacob did before, and from memory where he
> went wrong despite advice to the contrary is he skipped the
> "vectorises it" stage, which is highly important for vector CPU
> execution, as opposed to scalar GPU.
>

IIRC, my plan for Kazan all along was/is to vectorize and multi-thread it;
I just didn't get that far during the GSoC portion due to running out of
time. So, in order to get it to do something more impressive than parse
SPIR-V, I kludged a rasterizer onto it during the last few weeks of GSoC.

Originally, my plan was to write a whole-function vectorization pass as an
LLVM pass. The new plan is to have the vectorization pass be done in a new
IR (shader-compiler-ir) before translating to LLVM IR, since LLVM IR isn't
very good at retaining the structured IR needed to do full-function
vectorization. That also has the benefit of supporting non-LLVM backends,
such as cranelift.

For the rewritten version in Rust that I'm still working on (when not
working on the hardware design side), I'm not going to skip ahead; I'll
actually get it to work properly.

Jacob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Libre-soc-dev] Loading Vulkan Driver

2020-08-20 Thread Jacob Lifshay
On Thu, Aug 20, 2020, 22:28 vivek pandya  wrote:

>  Thanks Jason for taking the time to reply. I understand the error, but I
> am not very familiar with the entry point generation code. Currently I just
> make it compile (I want to quickly develop a broken pipeline that simply
> returns from the entry point).
> I will study the code. Is there any documentation to read about that? I
> want to understand how loaders and ICDs interact.
>

IIRC the docs are here:
https://github.com/KhronosGroup/Vulkan-Loader/blob/master/loader/LoaderAndLayerInterface.md

Jacob Lifshay

>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Jacob Lifshay
On Tue, Mar 17, 2020 at 11:35 PM Jason Ekstrand  wrote:
>
> On Wed, Mar 18, 2020 at 12:20 AM Jacob Lifshay  
> wrote:
> >
> > The main issue with doing everything immediately is that a lot of the
> > function calls that games expect to take a very short time (e.g.
> > vkQueueSubmit) would instead take a much longer time, potentially
> > causing problems.
>
> Do you have any evidence that it will cause problems?  What I said
> above is what switfshader is doing and they're running real apps and
> I've not heard of it causing any problems.  It's also worth noting
> that you would only really have to stall at sync_file export.  You can
> async as much as you want internally.

Ok, seems worth trying out.

> > One idea for a safe userspace-backed sync_file is to have a step
> > counter that counts down until the sync_file is ready, where if
> > userspace doesn't tell it to count any steps in a certain amount of
> > time, then the sync_file switches to the error state. This way, it
> > will error shortly after a process deadlocks for some reason, while
> > still having the finite-time guarantee.
> >
> > When the sync_file is created, the step counter would be set to the
> > number of jobs that the fence is waiting on.
> >
> > It can also be set to pause the timeout to wait until another
> > sync_file signals, to handle cases where a sync_file is waiting on a
> > userspace process that is waiting on another sync_file.
> >
> > The main issue is that the kernel would have to make sure that the
> > sync_file graph doesn't have loops, maybe by erroring all sync_files
> > that it finds in the loop.
> >
> > Does that sound like a good idea?
>
> Honestly, I don't think you'll ever be able to sell that to the kernel
> community.  All of the deadlock detection would add massive complexity
> to the already non-trivial dma_fence infrastructure and for what
> benefit?  So that a software rasterizer can try to pretend to be more
> like a GPU?  You're going to have some very serious perf numbers
> and/or other proof of necessity if you want to convince the kernel to
> people to accept that level of complexity/risk.  "I designed my
> software to work this way" isn't going to convince anyone of anything
> especially when literally every other software rasterizer I'm aware of
> is immediate and they work just fine.

After some further research, it turns out that it will work to have
all the sync_files that a sync_file needs to depend on specified at
creation, which forces the dependence graph to be a DAG: you can't
depend on a sync_file that doesn't exist yet, so loops are
impossible by design.
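
A tiny userspace model of that invariant (illustrative Rust, not kernel
code):

struct Registry {
    next_id: u64,
    deps: Vec<Vec<u64>>, // deps[id] = ids this fence waits on
}

impl Registry {
    fn create(&mut self, wait_on: Vec<u64>) -> Result<u64, &'static str> {
        // a cycle would require depending on an id not created yet
        if wait_on.iter().any(|&d| d >= self.next_id) {
            return Err("dependency does not exist yet");
        }
        let id = self.next_id;
        self.next_id += 1;
        self.deps.push(wait_on);
        Ok(id)
    }
}

fn main() {
    let mut r = Registry { next_id: 0, deps: Vec::new() };
    let a = r.create(vec![]).unwrap();
    let b = r.create(vec![a]).unwrap(); // b -> a is fine
    assert!(r.create(vec![b + 1]).is_err()); // future fence: rejected
}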

Since kernel deadlock detection isn't actually required, just timeouts
for the case of halted userspace, does this seem feasible?

I'd guess that it'd require maybe 200-300 lines of code in a
self-contained driver similar to the sync_file debugging driver
mentioned previously but with the additional timeout code for safety.

Jacob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-17 Thread Jacob Lifshay
On Tue, Mar 17, 2020 at 7:08 PM Jason Ekstrand  wrote:
>
> On Tue, Mar 17, 2020 at 7:16 PM Jacob Lifshay  
> wrote:
> >
> > On Tue, Mar 17, 2020 at 11:14 AM Lucas Stach  wrote:
> > >
> > > On Tuesday 2020-03-17 at 10:59 -0700, Jacob Lifshay wrote:
> > > > I think I found a userspace-accessible way to create sync_files and
> > > > dma_fences that would fulfill the requirements:
> > > > https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c
> > > >
> > > > I'm just not sure if that's a good interface to use, since it appears
> > > > to be designed only for debugging. Will have to check for additional
> > > > requirements of signalling an error when the process that created the
> > > > fence is killed.
>
> It is expressly only for debugging and testing.  Exposing such an API
> to userspace would break the finite time guarantees that are relied
> upon to keep sync_file a secure API.

Ok, I was figuring that was probably the case.

> > > Something like that can certainly be lifted for general use if it makes
> > > sense. But then with a software renderer I don't really see how fences
> > > help you at all. With a software renderer you know exactly when the
> > > frame is finished and you can just defer pushing it over to the next
> > > pipeline element until that time. You won't gain any parallelism by
> > > using fences as the CPU is busy doing the rendering and will not run
> > > other stuff concurrently, right?
> >
> > There definitely may be other hardware and/or processes that can
> > process some stuff concurrently with the main application, such as the
> > compositor and or video encoding processes (for video capture).
> > Additionally, from what I understand, sync_file is the standard way to
> > export and import explicit synchronization between processes and
> > between drivers on Linux, so it seems like a good idea to support it
> > from an interoperability standpoint even if it turns out that there
> > aren't any scheduling/timing benefits.
>
> There are different ways that one can handle interoperability,
> however.  One way is to try and make the software rasterizer look as
> much like a GPU as possible:  lots of threads to make things as
> asynchronous as possible, "real" implementations of semaphores and
> fences, etc.

This is basically the route I've picked, though rather than making
lots of native threads, I'm planning on having just one thread per
core and having a work-stealing scheduler (inspired by Rust's rayon
crate) schedule all the individual render/compute jobs, because that
allows creating many more jobs, enabling finer load balancing.
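
For example (illustrative only, assuming the rayon crate), many small
fragment jobs balanced across one worker per core:

use rayon::prelude::*;

fn main() {
    let shaded: Vec<u32> = (0u32..1920 * 1080)
        .into_par_iter()           // fan pixels out across workers
        .map(|pixel| pixel ^ 0xff) // stand-in for a fragment job
        .collect();
    assert_eq!(shaded.len(), 1920 * 1080);
}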

> Another is to let a SW rasterizer be a SW rasterizer: do
> everything immediately, thread only so you can exercise all the CPU
> cores, and minimally implement semaphores and fences well enough to
> maintain compatibility.  If you take the first approach, then we have
> to solve all these problems with letting userspace create unsignaled
> sync_files which it will signal later and figure out how to make it
> safe.  If you take the second approach, you'll only ever have to
> return already signaled sync_files and there's no problem with the
> sync_file finite time guarantees.

The main issue with doing everything immediately is that a lot of the
function calls that games expect to take a very short time (e.g.
vkQueueSubmit) would instead take a much longer time, potentially
causing problems.

One idea for a safe userspace-backed sync_file is to have a step
counter that counts down until the sync_file is ready, where if
userspace doesn't tell it to count any steps in a certain amount of
time, then the sync_file switches to the error state. This way, it
will error shortly after a process deadlocks for some reason, while
still having the finite-time guarantee.

When the sync_file is created, the step counter would be set to the
number of jobs that the fence is waiting on.
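
A userspace model of the proposed semantics (illustrative Rust, not kernel
code; names hypothetical):

use std::time::{Duration, Instant};

struct StepFence {
    steps_left: u32,
    watchdog: Duration,
    deadline: Instant,
    errored: bool,
}

impl StepFence {
    // created with one step per job the fence is waiting on
    fn new(steps: u32, watchdog: Duration) -> Self {
        StepFence {
            steps_left: steps,
            watchdog,
            deadline: Instant::now() + watchdog,
            errored: false,
        }
    }

    // userspace reports one completed job; each step resets the watchdog
    fn step(&mut self) {
        if !self.errored && self.steps_left > 0 {
            self.steps_left -= 1;
            self.deadline = Instant::now() + self.watchdog;
        }
    }

    // Some(true) = signaled, Some(false) = errored, None = still pending
    fn poll(&mut self) -> Option<bool> {
        if self.errored {
            Some(false)
        } else if self.steps_left == 0 {
            Some(true) // all steps counted: fence is ready
        } else if Instant::now() >= self.deadline {
            self.errored = true; // userspace went silent: error out
            Some(false)
        } else {
            None
        }
    }
}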

It can also be set to pause the timeout to wait until another
sync_file signals, to handle cases where a sync_file is waiting on a
userspace process that is waiting on another sync_file.

The main issue is that the kernel would have to make sure that the
sync_file graph doesn't have loops, maybe by erroring all sync_files
that it finds in the loop.

Does that sound like a good idea?

Jacob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-17 Thread Jacob Lifshay
On Tue, Mar 17, 2020 at 11:14 AM Lucas Stach  wrote:
>
> On Tuesday 2020-03-17 at 10:59 -0700, Jacob Lifshay wrote:
> > I think I found a userspace-accessible way to create sync_files and
> > dma_fences that would fulfill the requirements:
> > https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c
> >
> > I'm just not sure if that's a good interface to use, since it appears
> > to be designed only for debugging. Will have to check for additional
> > requirements of signalling an error when the process that created the
> > fence is killed.
>
> Something like that can certainly be lifted for general use if it makes
> sense. But then with a software renderer I don't really see how fences
> help you at all. With a software renderer you know exactly when the
> frame is finished and you can just defer pushing it over to the next
> pipeline element until that time. You won't gain any parallelism by
> using fences as the CPU is busy doing the rendering and will not run
> other stuff concurrently, right?

There may well be other hardware and/or processes that can process some
stuff concurrently with the main application, such as the compositor
and/or video encoding processes (for video capture). Additionally, from
what I understand, sync_file is the standard way to export and import
explicit synchronization between processes and between drivers on Linux,
so it seems like a good idea to support it from an interoperability
standpoint, even if it turns out that there aren't any scheduling/timing
benefits.

Jacob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-17 Thread Jacob Lifshay
On Tue, Mar 17, 2020 at 10:21 AM Lucas Stach  wrote:
>
> On Tuesday 2020-03-17 at 10:12 -0700, Jacob Lifshay wrote:
> > One related issue with explicit sync using sync_file: combined CPUs/GPUs
> > (the CPU cores *are* the GPU cores) that do all the rendering in userspace
> > (like llvmpipe but for Vulkan and with extra instructions for GPU tasks),
> > but need to synchronize with other drivers/processes, need some way to
> > create an explicit fence/semaphore from userspace and signal it later.
> > This seems to conflict with the requirement for a sync_file to complete in
> > finite time, since the user process could be stopped or killed.
> >
> > Any ideas?
>
> Finite just means "not infinite". If you stop the process that's doing
> part of the pipeline processing you block the pipeline, you get to keep
> the pieces in that case.

Seems reasonable.

> That's one of the issues with implicit sync
> that explicit may solve: a single client taking way too much time to
> render something can block the whole pipeline up until the display
> flip. With explicit sync the compositor can just decide to use the last
> client buffer if the latest buffer isn't ready by some deadline.
>
> With regard to the process getting killed: whatever you sync primitive
> is, you need to make sure to signal the fence (possibly with an error
> condition set) when you are not going to make progress anymore. So
> whatever your means to creating the sync_fd from your software renderer
> is, it needs to signal any outstanding fences on the sync_fd when the
> fd is closed.

I think I found a userspace-accessible way to create sync_files and
dma_fences that would fulfill the requirements:
https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c

I'm just not sure if that's a good interface to use, since it appears
to be designed only for debugging. Will have to check for additional
requirements of signalling an error when the process that created the
fence is killed.

Jacob

>
> Regards,
> Lucas
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-17 Thread Jacob Lifshay
One related issue with explicit sync using sync_file: combined CPUs/GPUs
(the CPU cores *are* the GPU cores) that do all the rendering in userspace
(like llvmpipe but for Vulkan and with extra instructions for GPU tasks),
but need to synchronize with other drivers/processes, need some way to
create an explicit fence/semaphore from userspace and signal it later.
This seems to conflict with the requirement for a sync_file to complete in
finite time, since the user process could be stopped or killed.

Any ideas?

Jacob Lifshay
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-03-01 Thread Jacob Lifshay
One idea for Marge-bot (I don't know if you already do this):
Rust-lang has their bot (bors) automatically group a few merge requests
together into a single merge commit, which it then tests; when the tests
pass, it merges. This could help reduce CI runs to once a day (or some
other rate). If the tests fail, it could automatically deduce which one
failed, by recursive subdivision or similar. There's also a mechanism to
adjust priority and grouping behavior when the defaults aren't
sufficient.
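
A sketch of the batch-and-bisect part (illustrative Rust; `passes` stands
in for a full CI run over a set of MR numbers):

fn find_failures(batch: &[u32], passes: &dyn Fn(&[u32]) -> bool, bad: &mut Vec<u32>) {
    if batch.is_empty() || passes(batch) {
        return; // one CI run covered the whole batch
    }
    if batch.len() == 1 {
        bad.push(batch[0]); // isolated a failing MR
        return;
    }
    let (left, right) = batch.split_at(batch.len() / 2);
    find_failures(left, passes, bad);
    find_failures(right, passes, bad);
}

fn main() {
    // hypothetical: MR 3 breaks CI, everything else passes
    let ci = |mrs: &[u32]| !mrs.contains(&3);
    let mut bad = Vec::new();
    find_failures(&[1, 2, 3, 4, 5, 6, 7, 8], &ci, &mut bad);
    assert_eq!(bad, vec![3]);
}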

Jacob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] Add GL_MESA_ieee_fp_alu_mode specification draft

2020-02-24 Thread Jacob Lifshay
See also: http://bugs.libre-riscv.org/show_bug.cgi?id=188

Might it be worthwhile to consider a Vulkan extension to support this, as a
translation target for DX9 as well as for other old HW?

Jacob Lifshay
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] NLNet Funded development of a software/hardware MESA driver for the Libre GPGPU

2020-01-13 Thread Jacob Lifshay
On Mon, Jan 13, 2020 at 9:39 AM Jason Ekstrand  wrote:
>
> On Mon, Jan 13, 2020 at 11:27 AM Luke Kenneth Casson Leighton  
> wrote:
>> jason i'd be interested to hear your thoughts on what jacob wrote, does it 
>> alleviate your concerns, (we're not designing hardware specifically around 
>> vec2/3/4, it simply has that capability).
>
>
> Not at all.  If you just want a SW renderer that runs on RISC-V, feel free to 
> write one.  If you want to vectorize in hardware and actually get serious 
> performance out of it, I highly doubt his plan will work.  That said, I 
> wasn't planning to work on it so none of this is my problem so you're welcome 
> to take or leave anything I say. :-)

So, since it may not have been clearly explained before: the GPU we're
building has masked vectorization like most other GPUs; it's just that
it additionally supports the masked vectors' elements being 1- to
4-element subvectors.

If it turns out that using subvectors makes the GPU slower, we can add
a scalarization pass before the SIMT-to-vector translation, converting
everything to more conventional operations.
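
For example, the semantics of a masked add over vec3 subvector elements
(illustrative Rust, plain arrays standing in for the real vector hardware):

const LANES: usize = 8;
type Vec3 = [f32; 3];

fn masked_add_vec3(
    dst: &mut [Vec3; LANES],
    a: &[Vec3; LANES],
    b: &[Vec3; LANES],
    mask: &[bool; LANES], // per-lane predicate from SIMT control flow
) {
    for lane in 0..LANES {
        if mask[lane] {
            for c in 0..3 {
                dst[lane][c] = a[lane][c] + b[lane][c];
            }
        }
    }
}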

Jacob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] NLNet Funded development of a software/hardware MESA driver for the Libre GPGPU

2020-01-13 Thread Jacob Lifshay
On Thu, Jan 9, 2020 at 3:56 AM Luke Kenneth Casson Leighton
 wrote:
>
> On 1/9/20, Jason Ekstrand  wrote:
> >> 2. as a flexible Vector Processor, soft-programmable, then over time if
> >> the industry moves to dropping vec4, so can we.
> >>
> >
> > That's very nice.  My primary reason for sending the first e-mail was that
> > SwiftShader vs. Mesa is a pretty big decision that's hard to reverse after
> > someone has poured several months into working on a driver and the argument
> > you gave in favor of Mesa was that it supports vec4.
>
> not quite :)  i garbled it (jacob spent some time explaining it, a few
> months back, so it's 3rd hand if you know what i mean).  what i can
> recall of what he said was: it's something to do with the data types,
> particularly predication, being maintained as part of SPIR-V (and
> NIR), which, if you drop that information, you have to use
> auto-vectorisation and other rather awful tricks to get it back when
> you get to the assembly level.
>
> jacob perhaps you could clarify, here?

So the major issue with the approach AMDGPU took, where the SIMT-to-
predicated-vector translation is done by the LLVM backend, is that LLVM
doesn't really maintain a reducible CFG, which is needed to correctly
vectorize the code without devolving to a switch-in-a-loop. This
kinda-sorta works for AMDGPU because the backend can specifically tell
the optimization passes to try to maintain a reducible CFG. However,
that won't work for Libre-RISCV's GPU, because we don't have a separate
GPU ISA (it's just RISC-V or Power, we're still deciding), so the
backends don't tell the optimization passes that they need to maintain
a reducible CFG. Additionally, the AMDGPU vectorization is done as part
of the translation from LLVM IR to MIR, which makes it very hard to
adapt to a different ISA.

Because of all of those issues, I decided that it would be better to
vectorize before translating to LLVM IR, since that way the CFG
reducibility can be easily maintained. This also gives the benefit that
it's much easier to substitute a different backend compiler such as
gccjit or cranelift, since all of the required SIMT-specific
transformations are already completed before the code goes to the
backend.

Both NIR and the IR I'm currently implementing in Kazan (the non-Mesa
Vulkan driver for libre-riscv) maintain a reducible CFG throughout the
optimization process. In fact, the IR I'm implementing can't express
non-reducible CFGs, since it's built as a tree of loops and code blocks
where control-transfer operations can only continue a loop or exit a
loop or block. Switches work by having a nested set of blocks; the
switch instruction picks which block to break out of.
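
A sketch of the shape of such an IR (illustrative Rust, not the actual
shader-compiler-ir definitions):

enum Node {
    Instr(String),       // stand-in for a real instruction
    Block(Vec<Node>),    // exiting it is the only way to jump forward
    Loop(Vec<Node>),     // continuing it is the only way to jump backward
    ExitBlock(usize),    // break out of the n-th enclosing block/loop
    ContinueLoop(usize), // restart the n-th enclosing loop
}
// a switch becomes nested Blocks, and the switch instruction picks
// which Block to break out of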

Hopefully, that all made sense. :)

Jacob Lifshay
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] Adding support for EXT_sRGB for Opengl ES

2018-09-14 Thread Jacob Lifshay
Any progress on adding EXT_sRGB support to Mesa?

Jacob Lifshay
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Vulkan-CPU has been renamed to Kazan

2017-09-05 Thread Jacob Lifshay
The new name means "Volcano" in Japanese.

For those who don't remember, Kazan is a work-in-progress
software-rendering Vulkan implementation.

I moved the source code to https://github.com/kazan-3d/kazan. Additionally,
I registered the domain name kazan-3d.org, which currently redirects to the
source code on GitHub.

I renamed the project because vulkan-cpu infringes the Vulkan trademark.

The source code will still be available at the old URL
(https://github.com/programmerjake/vulkan-cpu) to avoid breaking any links;
however, I probably won't keep the old repository up-to-date.

Jacob Lifshay
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Vulkan on CPU Google Summer of Code project

2017-08-28 Thread Jacob Lifshay
GSoC 2017 is effectively over for me. Here's the link to the GSoC landing
page that I will be submitting later:
https://github.com/programmerjake/vulkan-cpu/blob/gsoc-2017/docs/gsoc-2017-landing-page.md

I've only completed part of the Vulkan on CPU project:
- I've completely implemented generating the SPIR-V parser from Khronos's
JSON grammar description files.
- I've completely implemented using LLVM ORC as the JIT compiler back-end.
I implemented an ORC layer that integrates with the old JITEventListener
interface, allowing me to use LLVM's debugger integration.
- I've developed this project on GNU/Linux, so that platform has complete
support.

- I've partially implemented translating from SPIR-V to LLVM IR.
- I've partially implemented generating a graphics pipeline by compiling
several shaders together.
- I've partially implemented image support: currently only
VK_FORMAT_B8G8R8A8_UNORM.
- I have a temporary rasterizer implementation: I've implemented a scanline
rasterizer just to get stuff to draw; I'm planning on replacing it with the
originally-planned tiled binning rasterizer.
- I have fewer than 5 functions left to implement before it will work on
Win32, mostly filesystem support.
- I've not yet started implementing the actual Vulkan ICD interface.
- I've not yet started implementing the whole-function vectorization pass.
- I've not yet started implementing support for other platforms; however,
it should compile and run without problems on most Unix systems.

Some interesting things I learned:
- Vulkan doesn't actually specify using the top-left fill rule. (I couldn't
find it in the rasterization section of the Vulkan specification.)
- SPIR-V is surprisingly complicated to parse for an IR that's designed to
be simple to parse: OpSwitch requires you to determine the bit-width of the
value being switched on before you can parse the list of cases. See
https://bugs.freedesktop.org/show_bug.cgi?id=101560

I got delayed by implementing a SPIR-V parser myself. In hindsight, the
time needed to run the SPIR-V to LLVM IR translator is dwarfed by running
LLVM's optimizations, so I could have just used the SPIR-V parser that
Khronos has already written, as it's fast enough.

Jacob Lifshay
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc

2017-06-11 Thread Jacob Lifshay
sin/cos then it would be dead slow.
>
I am planning on using C++ templates to help with a lot of the texture
sampler code generation -- clang can convert them to LLVM IR and then I can
inline the result into the appropriate places. I think that all of the
non-compressed image formats should be pretty easy to handle that way, as
they are all pretty similar (bits packed into a long word, or members of a
struct). I can implement interpolation on top of the functions that load
and unpack the image elements from memory. I'd estimate that, excluding the
compressed texture formats, I'd need less than 10k lines and maybe a week
or two to implement it all. (Glad I don't have to implement that in C.)

I am planning on compiling fdlibm with clang into LLVM IR, then running my
vectorization algorithm on all the functions. LLVM has a spot where you can
tell it that you have optimized vectorized math intrinsics; I could add
them there, or implement another lowering pass to convert the intrinsics to
function calls, which can then be inlined. Hopefully, that will save most
of the work needed to implement vectorized math functions. Also, LLVM is
already pretty good at converting vectorized sqrt intrinsics to vector sqrt
instructions, which x86 SSE/AVX and (I think) ARM NEON already have.

>
>
> Anyway, I hope this helps.  Best of luck.
>
Thanks,
Jacob Lifshay
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] vulkan/wsi: Improve the DRI3 error message

2017-02-28 Thread Jacob Lifshay
This commit improves the message by telling the user that they could probably
enable DRI3.  More importantly, it includes a little heuristic to check
to see if we're running on AMD or NVIDIA's proprietary X11 drivers and,
if we are, doesn't emit the warning.  This way, users with both a discrete
card and Intel graphics don't get the warning when they're just running
on the discrete card.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99715
Co-authored-by: Jason Ekstrand 
Reviewed-by: Kai Wasserbäch 
Tested-by: Rene Lindsay 
---
 src/vulkan/wsi/wsi_common_x11.c | 51 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c
index 64ba921..323209c 100644
--- a/src/vulkan/wsi/wsi_common_x11.c
+++ b/src/vulkan/wsi/wsi_common_x11.c
@@ -49,6 +49,7 @@
 struct wsi_x11_connection {
bool has_dri3;
bool has_present;
+   bool is_proprietary_x11;
 };
 
 struct wsi_x11 {
@@ -63,8 +64,8 @@ static struct wsi_x11_connection *
 wsi_x11_connection_create(const VkAllocationCallbacks *alloc,
   xcb_connection_t *conn)
 {
-   xcb_query_extension_cookie_t dri3_cookie, pres_cookie;
-   xcb_query_extension_reply_t *dri3_reply, *pres_reply;
+   xcb_query_extension_cookie_t dri3_cookie, pres_cookie, amd_cookie, nv_cookie;
+   xcb_query_extension_reply_t *dri3_reply, *pres_reply, *amd_reply, *nv_reply;
 
struct wsi_x11_connection *wsi_conn =
   vk_alloc(alloc, sizeof(*wsi_conn), 8,
@@ -75,20 +76,43 @@ wsi_x11_connection_create(const VkAllocationCallbacks *alloc,
dri3_cookie = xcb_query_extension(conn, 4, "DRI3");
pres_cookie = xcb_query_extension(conn, 7, "PRESENT");
 
+   /* We try to be nice to users and emit a warning if they try to use a
+    * Vulkan application on a system without DRI3 enabled.  However, this ends
+    * up spewing the warning when a user has, for example, both Intel
+    * integrated graphics and a discrete card with proprietary drivers and are
+    * running on the discrete card with the proprietary DDX.  In this case, we
+    * really don't want to print the warning because it just confuses users.
+    * As a heuristic to detect this case, we check for a couple of proprietary
+    * X11 extensions.
+    */
+   amd_cookie = xcb_query_extension(conn, 11, "ATIFGLRXDRI");
+   nv_cookie = xcb_query_extension(conn, 10, "NV-CONTROL");
+
dri3_reply = xcb_query_extension_reply(conn, dri3_cookie, NULL);
pres_reply = xcb_query_extension_reply(conn, pres_cookie, NULL);
-   if (dri3_reply == NULL || pres_reply == NULL) {
+   amd_reply = xcb_query_extension_reply(conn, amd_cookie, NULL);
+   nv_reply = xcb_query_extension_reply(conn, nv_cookie, NULL);
+   if (!dri3_reply || !pres_reply) {
   free(dri3_reply);
   free(pres_reply);
+  free(amd_reply);
+  free(nv_reply);
   vk_free(alloc, wsi_conn);
   return NULL;
}
 
wsi_conn->has_dri3 = dri3_reply->present != 0;
wsi_conn->has_present = pres_reply->present != 0;
+   wsi_conn->is_proprietary_x11 = false;
+   if (amd_reply && amd_reply->present)
+  wsi_conn->is_proprietary_x11 = true;
+   if (nv_reply && nv_reply->present)
+  wsi_conn->is_proprietary_x11 = true;
 
free(dri3_reply);
free(pres_reply);
+   free(amd_reply);
+   free(nv_reply);
 
return wsi_conn;
 }
@@ -100,6 +124,18 @@ wsi_x11_connection_destroy(const VkAllocationCallbacks *alloc,
vk_free(alloc, conn);
 }
 
+static bool
+wsi_x11_check_for_dri3(struct wsi_x11_connection *wsi_conn)
+{
+  if (wsi_conn->has_dri3)
+    return true;
+  if (!wsi_conn->is_proprietary_x11) {
+    fprintf(stderr, "vulkan: No DRI3 support detected - required for presentation\n"
+                    "Note: you can probably enable DRI3 in your Xorg config\n");
+  }
+  return false;
+}
+
 static struct wsi_x11_connection *
 wsi_x11_get_connection(struct wsi_device *wsi_dev,
   const VkAllocationCallbacks *alloc,
@@ -264,11 +300,8 @@ VkBool32 wsi_get_physical_device_xcb_presentation_support(
if (!wsi_conn)
   return false;
 
-   if (!wsi_conn->has_dri3) {
-  fprintf(stderr, "vulkan: No DRI3 support detected - required for presentation\n");
-  fprintf(stderr, "Note: Buggy applications may crash, if they do please report to vendor\n");
+   if (!wsi_x11_check_for_dri3(wsi_conn))
    return false;
-   }
 
unsigned visual_depth;
if (!connection_get_visualtype(connection, visual_id, &visual_depth))
@@ -313,9 +346,7 @@ x11_surface_get_support(VkIcdSurfaceBase *icd_surface,
if (!wsi_conn)
   return VK_ERROR_OUT_OF_HOST_MEMORY;
 
-   if (!wsi_conn->has_dri3) {
-  fprintf(stderr, "vulkan: No DRI3 support detected - required for presentation\n");
-  fprintf(stderr, "Note: Buggy applications may crash, if they do please report to vendor\n");
+   if (!wsi_x11_check_for_dri3(wsi_conn)) {

Re: [Mesa-dev] [PATCH] vulkan/wsi: Improve the DRI3 error message

2017-02-25 Thread Jacob Lifshay
Just to double check, is there anything else I need to do to have this
patch committed?
Jacob Lifshay

On Feb 19, 2017 02:08, "Kai Wasserbäch" <k...@dev.carbon-project.org> wrote:

> Jason Ekstrand wrote on 19.02.2017 06:01:
> > On Feb 18, 2017 12:37 PM, "Kai Wasserbäch" <k...@dev.carbon-project.org>
> > wrote:
> >
> > Hey Jacob,
> > sorry for not spotting this the first time, but I have an additional
> > comment.
> > Please see below.
> >
> > Jacob Lifshay wrote on 18.02.2017 18:48:
> >> This commit improves the message by telling the user that they could probably
> >> enable DRI3.  More importantly, it includes a little heuristic to check
> >> to see if we're running on AMD or NVIDIA's proprietary X11 drivers and,
> >> if we are, doesn't emit the warning.  This way, users with both a
> discrete
> >> card and Intel graphics don't get the warning when they're just running
> >> on the discrete card.
> >>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99715
> >> Co-authored-by: Jason Ekstrand <jason.ekstr...@intel.com>
> >> ---
> >>  src/vulkan/wsi/wsi_common_x11.c | 47 +++++++++++++++++++++++++++++++++++++----------
> >>  1 file changed, 37 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c
> >> index 64ba921..b3a017a 100644
> >> --- a/src/vulkan/wsi/wsi_common_x11.c
> >> +++ b/src/vulkan/wsi/wsi_common_x11.c
> >> @@ -49,6 +49,7 @@
> >>  struct wsi_x11_connection {
> >> bool has_dri3;
> >> bool has_present;
> >> +   bool is_proprietary_x11;
> >>  };
> >>
> >>  struct wsi_x11 {
> >> @@ -63,8 +64,8 @@ static struct wsi_x11_connection *
> >>  wsi_x11_connection_create(const VkAllocationCallbacks *alloc,
> >>xcb_connection_t *conn)
> >>  {
> >> -   xcb_query_extension_cookie_t dri3_cookie, pres_cookie;
> >> -   xcb_query_extension_reply_t *dri3_reply, *pres_reply;
> >> +   xcb_query_extension_cookie_t dri3_cookie, pres_cookie, amd_cookie, nv_cookie;
> >> +   xcb_query_extension_reply_t *dri3_reply, *pres_reply, *amd_reply, *nv_reply;
> >>
> >> struct wsi_x11_connection *wsi_conn =
> >>vk_alloc(alloc, sizeof(*wsi_conn), 8,
> >> @@ -75,20 +76,39 @@ wsi_x11_connection_create(const VkAllocationCallbacks *alloc,
> >> dri3_cookie = xcb_query_extension(conn, 4, "DRI3");
> >> pres_cookie = xcb_query_extension(conn, 7, "PRESENT");
> >>
> >> +   /* We try to be nice to users and emit a warning if they try to use a
> >> +    * Vulkan application on a system without DRI3 enabled.  However, this ends
> >> +    * up spewing the warning when a user has, for example, both Intel
> >> +    * integrated graphics and a discrete card with proprietary drivers and are
> >> +    * running on the discrete card with the proprietary DDX.  In this case, we
> >> +    * really don't want to print the warning because it just confuses users.
> >> +    * As a heuristic to detect this case, we check for a couple of proprietary
> >> +    * X11 extensions.
> >> +    */
> >> +   amd_cookie = xcb_query_extension(conn, 11, "ATIFGLRXDRI");
> >> +   nv_cookie = xcb_query_extension(conn, 10, "NV-CONTROL");
> >> +
> >> dri3_reply = xcb_query_extension_reply(conn, dri3_cookie, NULL);
> >> pres_reply = xcb_query_extension_reply(conn, pres_cookie, NULL);
> >> -   if (dri3_reply == NULL || pres_reply == NULL) {
> >> +   amd_reply = xcb_query_extension_reply(conn, amd_cookie, NULL);
> >> +   nv_reply = xcb_query_extension_reply(conn, nv_cookie, NULL);
> >> +   if (!dri3_reply || !pres_reply || !amd_reply || !nv_reply) {
> >
> > I don't feel wsi_x11_connection_create should fail if there's no amd_reply
> > or nv_reply. That should just lead to unconditionally warning, in case
> > there's no DRI3 support.
> >
> >
> > If there is no reply then we either lost our connection to the X server or
> > ran out of memory.  Either of those seems like a valid excuse to fail.  The
> > chances of successfully connecting to X to create a swapchain at that point
> > are pretty close to zero.
>
> Fair enough.
>
> > With that fixed, t

[Mesa-dev] [PATCH] vulkan/wsi: Improve the DRI3 error message

2017-02-18 Thread Jacob Lifshay
This commit improves the message by telling the user that they could probably
enable DRI3.  More importantly, it includes a little heuristic to check
to see if we're running on AMD or NVIDIA's proprietary X11 drivers and,
if we are, doesn't emit the warning.  This way, users with both a discrete
card and Intel graphics don't get the warning when they're just running
on the discrete card.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99715
Co-authored-by: Jason Ekstrand 
Reviewed-by: Kai Wasserbäch 
---
 src/vulkan/wsi/wsi_common_x11.c | 51 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c
index 64ba921..a6fd094 100644
--- a/src/vulkan/wsi/wsi_common_x11.c
+++ b/src/vulkan/wsi/wsi_common_x11.c
@@ -49,6 +49,7 @@
 struct wsi_x11_connection {
bool has_dri3;
bool has_present;
+   bool is_proprietary_x11;
 };
 
 struct wsi_x11 {
@@ -63,8 +64,8 @@ static struct wsi_x11_connection *
 wsi_x11_connection_create(const VkAllocationCallbacks *alloc,
   xcb_connection_t *conn)
 {
-   xcb_query_extension_cookie_t dri3_cookie, pres_cookie;
-   xcb_query_extension_reply_t *dri3_reply, *pres_reply;
+   xcb_query_extension_cookie_t dri3_cookie, pres_cookie, amd_cookie, nv_cookie;
+   xcb_query_extension_reply_t *dri3_reply, *pres_reply, *amd_reply, *nv_reply;
 
struct wsi_x11_connection *wsi_conn =
   vk_alloc(alloc, sizeof(*wsi_conn), 8,
@@ -75,20 +76,43 @@ wsi_x11_connection_create(const VkAllocationCallbacks *alloc,
dri3_cookie = xcb_query_extension(conn, 4, "DRI3");
pres_cookie = xcb_query_extension(conn, 7, "PRESENT");
 
+   /* We try to be nice to users and emit a warning if they try to use a
+    * Vulkan application on a system without DRI3 enabled.  However, this ends
+    * up spewing the warning when a user has, for example, both Intel
+    * integrated graphics and a discrete card with proprietary drivers and are
+    * running on the discrete card with the proprietary DDX.  In this case, we
+    * really don't want to print the warning because it just confuses users.
+    * As a heuristic to detect this case, we check for a couple of proprietary
+    * X11 extensions.
+    */
+   amd_cookie = xcb_query_extension(conn, 11, "ATIFGLRXDRI");
+   nv_cookie = xcb_query_extension(conn, 10, "NV-CONTROL");
+
dri3_reply = xcb_query_extension_reply(conn, dri3_cookie, NULL);
pres_reply = xcb_query_extension_reply(conn, pres_cookie, NULL);
-   if (dri3_reply == NULL || pres_reply == NULL) {
+   amd_reply = xcb_query_extension_reply(conn, amd_cookie, NULL);
+   nv_reply = xcb_query_extension_reply(conn, nv_cookie, NULL);
+   if (!dri3_reply || !pres_reply) {
   free(dri3_reply);
   free(pres_reply);
+  free(amd_reply);
+  free(nv_reply);
   vk_free(alloc, wsi_conn);
   return NULL;
}
 
wsi_conn->has_dri3 = dri3_reply->present != 0;
wsi_conn->has_present = pres_reply->present != 0;
+   wsi_conn->is_proprietary_x11 = false;
+   if (amd_reply && amd_reply->present)
+  wsi_conn->is_proprietary_x11 = true;
+   if (nv_reply && nv_reply->present)
+  wsi_conn->is_proprietary_x11 = true;
 
free(dri3_reply);
free(pres_reply);
+   free(amd_reply);
+   free(nv_reply);
 
return wsi_conn;
 }
@@ -100,6 +124,18 @@ wsi_x11_connection_destroy(const VkAllocationCallbacks *alloc,
vk_free(alloc, conn);
 }
 
+static bool
+wsi_x11_check_for_dri3(struct wsi_x11_connection *wsi_conn)
+{
+  if (wsi_conn->has_dri3)
+    return true;
+  if (!wsi_conn->is_proprietary_x11) {
+    fprintf(stderr, "vulkan: No DRI3 support detected - required for presentation\n"
+                    "Note: you can probably enable DRI3 in your Xorg config\n");
+  }
+  return false;
+}
+
 static struct wsi_x11_connection *
 wsi_x11_get_connection(struct wsi_device *wsi_dev,
   const VkAllocationCallbacks *alloc,
@@ -264,11 +300,8 @@ VkBool32 wsi_get_physical_device_xcb_presentation_support(
if (!wsi_conn)
   return false;
 
-   if (!wsi_conn->has_dri3) {
-  fprintf(stderr, "vulkan: No DRI3 support detected - required for presentation\n");
-  fprintf(stderr, "Note: Buggy applications may crash, if they do please report to vendor\n");
+   if (!wsi_x11_check_for_dri3(wsi_conn))
    return false;
-   }
 
unsigned visual_depth;
if (!connection_get_visualtype(connection, visual_id, &visual_depth))
@@ -313,9 +346,7 @@ x11_surface_get_support(VkIcdSurfaceBase *icd_surface,
if (!wsi_conn)
   return VK_ERROR_OUT_OF_HOST_MEMORY;
 
-   if (!wsi_conn->has_dri3) {
-  fprintf(stderr, "vulkan: No DRI3 support detected - required for presentation\n");
-  fprintf(stderr, "Note: Buggy applications may crash, if they do please report to vendor\n");
+   if (!wsi_x11_check_for_dri3(wsi_conn)) {
   *pSupported = false;
   return 

[Mesa-dev] [PATCH] vulkan/wsi: Improve the DRI3 error message

2017-02-18 Thread Jacob Lifshay
This commit improves the message by telling the user that they could probably
enable DRI3.  More importantly, it includes a little heuristic to check
to see if we're running on AMD or NVIDIA's proprietary X11 drivers and,
if we are, doesn't emit the warning.  This way, users with both a discrete
card and Intel graphics don't get the warning when they're just running
on the discrete card.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99715
Co-authored-by: Jason Ekstrand 
Reviewed-by: Kai Wasserbäch 
---
 src/vulkan/wsi/wsi_common_x11.c | 51 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c
index 64ba921..e4906f1 100644
--- a/src/vulkan/wsi/wsi_common_x11.c
+++ b/src/vulkan/wsi/wsi_common_x11.c
@@ -49,6 +49,7 @@
 struct wsi_x11_connection {
bool has_dri3;
bool has_present;
+   bool is_proprietary_x11;
 };
 
 struct wsi_x11 {
@@ -63,8 +64,8 @@ static struct wsi_x11_connection *
 wsi_x11_connection_create(const VkAllocationCallbacks *alloc,
   xcb_connection_t *conn)
 {
-   xcb_query_extension_cookie_t dri3_cookie, pres_cookie;
-   xcb_query_extension_reply_t *dri3_reply, *pres_reply;
+   xcb_query_extension_cookie_t dri3_cookie, pres_cookie, amd_cookie, nv_cookie;
+   xcb_query_extension_reply_t *dri3_reply, *pres_reply, *amd_reply, *nv_reply;
 
struct wsi_x11_connection *wsi_conn =
   vk_alloc(alloc, sizeof(*wsi_conn), 8,
@@ -75,20 +76,43 @@ wsi_x11_connection_create(const VkAllocationCallbacks *alloc,
dri3_cookie = xcb_query_extension(conn, 4, "DRI3");
pres_cookie = xcb_query_extension(conn, 7, "PRESENT");
 
+   /* We try to be nice to users and emit a warning if they try to use a
+    * Vulkan application on a system without DRI3 enabled.  However, this ends
+    * up spewing the warning when a user has, for example, both Intel
+    * integrated graphics and a discrete card with proprietary drivers and are
+    * running on the discrete card with the proprietary DDX.  In this case, we
+    * really don't want to print the warning because it just confuses users.
+    * As a heuristic to detect this case, we check for a couple of proprietary
+    * X11 extensions.
+    */
+   amd_cookie = xcb_query_extension(conn, 11, "ATIFGLRXDRI");
+   nv_cookie = xcb_query_extension(conn, 10, "NV-CONTROL");
+
dri3_reply = xcb_query_extension_reply(conn, dri3_cookie, NULL);
pres_reply = xcb_query_extension_reply(conn, pres_cookie, NULL);
-   if (dri3_reply == NULL || pres_reply == NULL) {
+   amd_reply = xcb_query_extension_reply(conn, amd_cookie, NULL);
+   nv_reply = xcb_query_extension_reply(conn, nv_cookie, NULL);
+   if (!dri3_reply || !pres_reply) {
   free(dri3_reply);
   free(pres_reply);
+  free(amd_reply);
+  free(nv_reply);
   vk_free(alloc, wsi_conn);
   return NULL;
}
 
wsi_conn->has_dri3 = dri3_reply->present != 0;
wsi_conn->has_present = pres_reply->present != 0;
+   wsi_conn->is_proprietary_x11 = false;
+   if (amd_reply && amd_reply->present)
+  wsi_conn->is_proprietary_x11 = true;
+   if (nv_reply && nv_reply->present)
+  wsi_conn->is_proprietary_x11 = true;
 
free(dri3_reply);
free(pres_reply);
+   free(amd_reply);
+   free(nv_reply);
 
return wsi_conn;
 }
@@ -100,6 +124,18 @@ wsi_x11_connection_destroy(const VkAllocationCallbacks *alloc,
vk_free(alloc, conn);
 }
 
+static bool
+wsi_x11_check_for_dri3(struct wsi_x11_connection *wsi_conn)
+{
+  if (wsi_conn->has_dri3)
+    return true;
+  if (!wsi_conn->is_proprietary_x11) {
+    fprintf(stderr, "vulkan: No DRI3 support detected - required for presentation\n"
+                    "Note: you can probably enable DRI3 in your Xorg config\n");
+  }
+  return false;
+}
+
 static struct wsi_x11_connection *
 wsi_x11_get_connection(struct wsi_device *wsi_dev,
   const VkAllocationCallbacks *alloc,
@@ -264,11 +300,8 @@ VkBool32 wsi_get_physical_device_xcb_presentation_support(
if (!wsi_conn)
   return false;
 
-   if (!wsi_conn->has_dri3) {
-  fprintf(stderr, "vulkan: No DRI3 support detected - required for presentation\n");
-  fprintf(stderr, "Note: Buggy applications may crash, if they do please report to vendor\n");
+   if (!wsi_x11_check_for_dri3(wsi_conn))
    return false;
-   }
 
unsigned visual_depth;
 if (!connection_get_visualtype(connection, visual_id, &visual_depth))
@@ -313,9 +346,7 @@ x11_surface_get_support(VkIcdSurfaceBase *icd_surface,
if (!wsi_conn)
   return VK_ERROR_OUT_OF_HOST_MEMORY;
 
-   if (!wsi_conn->has_dri3) {
-  fprintf(stderr, "vulkan: No DRI3 support detected - required for 
presentation\n");
-  fprintf(stderr, "Note: Buggy applications may crash, if they do please 
report to vendor\n");
+   if (!wsi_x11_check_for_dri3(wsi_conn)) {
   *pSupported = false;
   return VK_SUCCESS;
 }

[Mesa-dev] [PATCH] vulkan/wsi: Improve the DRI3 error message

2017-02-18 Thread Jacob Lifshay
This commit improves the message by telling users that they can probably
enable DRI3.  More importantly, it includes a little heuristic to check
to see if we're running on AMD or NVIDIA's proprietary X11 drivers and,
if we are, doesn't emit the warning.  This way, users with both a discrete
card and Intel graphics don't get the warning when they're just running
on the discrete card.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99715
Co-authored-by: Jason Ekstrand 
---
 src/vulkan/wsi/wsi_common_x11.c | 47 -
 1 file changed, 37 insertions(+), 10 deletions(-)

diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c
index 64ba921..b3a017a 100644
--- a/src/vulkan/wsi/wsi_common_x11.c
+++ b/src/vulkan/wsi/wsi_common_x11.c
@@ -49,6 +49,7 @@
 struct wsi_x11_connection {
bool has_dri3;
bool has_present;
+   bool is_proprietary_x11;
 };
 
 struct wsi_x11 {
@@ -63,8 +64,8 @@ static struct wsi_x11_connection *
 wsi_x11_connection_create(const VkAllocationCallbacks *alloc,
   xcb_connection_t *conn)
 {
-   xcb_query_extension_cookie_t dri3_cookie, pres_cookie;
-   xcb_query_extension_reply_t *dri3_reply, *pres_reply;
+   xcb_query_extension_cookie_t dri3_cookie, pres_cookie, amd_cookie, nv_cookie;
+   xcb_query_extension_reply_t *dri3_reply, *pres_reply, *amd_reply, *nv_reply;
 
struct wsi_x11_connection *wsi_conn =
   vk_alloc(alloc, sizeof(*wsi_conn), 8,
@@ -75,20 +76,39 @@ wsi_x11_connection_create(const VkAllocationCallbacks *alloc,
dri3_cookie = xcb_query_extension(conn, 4, "DRI3");
pres_cookie = xcb_query_extension(conn, 7, "PRESENT");
 
+   /* We try to be nice to users and emit a warning if they try to use a
+* Vulkan application on a system without DRI3 enabled.  However, this ends
+* up spewing the warning when a user has, for example, both Intel
+* integrated graphics and a discrete card with proprietary drivers and are
+* running on the discrete card with the proprietary DDX.  In this case, we
+* really don't want to print the warning because it just confuses users.
+* As a heuristic to detect this case, we check for a couple of proprietary
+* X11 extensions.
+*/
+   amd_cookie = xcb_query_extension(conn, 11, "ATIFGLRXDRI");
+   nv_cookie = xcb_query_extension(conn, 10, "NV-CONTROL");
+
dri3_reply = xcb_query_extension_reply(conn, dri3_cookie, NULL);
pres_reply = xcb_query_extension_reply(conn, pres_cookie, NULL);
-   if (dri3_reply == NULL || pres_reply == NULL) {
+   amd_reply = xcb_query_extension_reply(conn, amd_cookie, NULL);
+   nv_reply = xcb_query_extension_reply(conn, nv_cookie, NULL);
+   if (!dri3_reply || !pres_reply || !amd_reply || !nv_reply) {
   free(dri3_reply);
   free(pres_reply);
+  free(amd_reply);
+  free(nv_reply);
   vk_free(alloc, wsi_conn);
   return NULL;
}
 
wsi_conn->has_dri3 = dri3_reply->present != 0;
wsi_conn->has_present = pres_reply->present != 0;
+   wsi_conn->is_proprietary_x11 = amd_reply->present || nv_reply->present;
 
free(dri3_reply);
free(pres_reply);
+   free(amd_reply);
+   free(nv_reply);
 
return wsi_conn;
 }
@@ -100,6 +120,18 @@ wsi_x11_connection_destroy(const VkAllocationCallbacks *alloc,
vk_free(alloc, conn);
 }
 
+static bool
+wsi_x11_check_for_dri3(struct wsi_x11_connection *wsi_conn)
+{
+  if (wsi_conn->has_dri3)
+    return true;
+  if (!wsi_conn->is_proprietary_x11) {
+    fprintf(stderr, "vulkan: No DRI3 support detected - required for presentation\n");
+    fprintf(stderr, "Note: you can probably enable DRI3 in your Xorg config\n");
+  }
+  return false;
+}
+
 static struct wsi_x11_connection *
 wsi_x11_get_connection(struct wsi_device *wsi_dev,
   const VkAllocationCallbacks *alloc,
@@ -264,11 +296,8 @@ VkBool32 wsi_get_physical_device_xcb_presentation_support(
if (!wsi_conn)
   return false;
 
-   if (!wsi_conn->has_dri3) {
-  fprintf(stderr, "vulkan: No DRI3 support detected - required for 
presentation\n");
-  fprintf(stderr, "Note: Buggy applications may crash, if they do please 
report to vendor\n");
+   if (!wsi_x11_check_for_dri3(wsi_conn))
   return false;
-   }
 
unsigned visual_depth;
 if (!connection_get_visualtype(connection, visual_id, &visual_depth))
@@ -313,9 +342,7 @@ x11_surface_get_support(VkIcdSurfaceBase *icd_surface,
if (!wsi_conn)
   return VK_ERROR_OUT_OF_HOST_MEMORY;
 
-   if (!wsi_conn->has_dri3) {
-  fprintf(stderr, "vulkan: No DRI3 support detected - required for 
presentation\n");
-  fprintf(stderr, "Note: Buggy applications may crash, if they do please 
report to vendor\n");
+   if (!wsi_x11_check_for_dri3(wsi_conn)) {
   *pSupported = false;
   return VK_SUCCESS;
}
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] vulkan/wsi: Improve the DRI3 error message

2017-02-18 Thread Jacob Lifshay
This commit improves the message by telling users that they can probably
enable DRI3 and giving a URL to an Ask Ubuntu question showing how to do
that.  More importantly, it includes a little heuristic to check to see
if we're running on AMD or NVIDIA's proprietary X11 drivers and, if we
are, doesn't emit the warning.  This way, users with both a discrete
card and Intel graphics don't get the warning when they're just running
on the discrete card.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99715
Co-authored-by: Jason Ekstrand 
---
 src/vulkan/wsi/wsi_common_x11.c | 55 +
 1 file changed, 45 insertions(+), 10 deletions(-)

diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c
index 64ba921..ac7a972 100644
--- a/src/vulkan/wsi/wsi_common_x11.c
+++ b/src/vulkan/wsi/wsi_common_x11.c
@@ -48,7 +48,9 @@
 
 struct wsi_x11_connection {
bool has_dri3;
+   bool has_dri2;
bool has_present;
+   bool is_proprietary_x11;
 };
 
 struct wsi_x11 {
@@ -63,8 +65,8 @@ static struct wsi_x11_connection *
 wsi_x11_connection_create(const VkAllocationCallbacks *alloc,
   xcb_connection_t *conn)
 {
-   xcb_query_extension_cookie_t dri3_cookie, pres_cookie;
-   xcb_query_extension_reply_t *dri3_reply, *pres_reply;
+   xcb_query_extension_cookie_t dri3_cookie, dri2_cookie, pres_cookie, amd_cookie, nv_cookie;
+   xcb_query_extension_reply_t *dri3_reply, *dri2_reply, *pres_reply, *amd_reply, *nv_reply;
 
struct wsi_x11_connection *wsi_conn =
   vk_alloc(alloc, sizeof(*wsi_conn), 8,
@@ -73,22 +75,46 @@ wsi_x11_connection_create(const VkAllocationCallbacks *alloc,
   return NULL;
 
dri3_cookie = xcb_query_extension(conn, 4, "DRI3");
+   dri2_cookie = xcb_query_extension(conn, 4, "DRI2");
pres_cookie = xcb_query_extension(conn, 7, "PRESENT");
 
+   /* We try to be nice to users and emit a warning if they try to use a
+* Vulkan application on a system without DRI3 enabled.  However, this ends
+* up spewing the warning when a user has, for example, both Intel
+* integrated graphics and a discrete card with proprietary drivers and are
+* running on the discrete card with the proprietary DDX.  In this case, we
+* really don't want to print the warning because it just confuses users.
+* As a heuristic to detect this case, we check for a couple of proprietary
+* X11 extensions.
+*/
+   amd_cookie = xcb_query_extension(conn, 11, "ATIFGLRXDRI");
+   nv_cookie = xcb_query_extension(conn, 10, "NV-CONTROL");
+
dri3_reply = xcb_query_extension_reply(conn, dri3_cookie, NULL);
+   dri2_reply = xcb_query_extension_reply(conn, dri2_cookie, NULL);
pres_reply = xcb_query_extension_reply(conn, pres_cookie, NULL);
-   if (dri3_reply == NULL || pres_reply == NULL) {
+   amd_reply = xcb_query_extension_reply(conn, amd_cookie, NULL);
+   nv_reply = xcb_query_extension_reply(conn, nv_cookie, NULL);
+   if (!dri3_reply || !dri2_reply || !pres_reply || !amd_reply || !nv_reply) {
   free(dri3_reply);
+  free(dri2_reply);
   free(pres_reply);
+  free(amd_reply);
+  free(nv_reply);
   vk_free(alloc, wsi_conn);
   return NULL;
}
 
wsi_conn->has_dri3 = dri3_reply->present != 0;
+   wsi_conn->has_dri2 = dri2_reply->present != 0;
wsi_conn->has_present = pres_reply->present != 0;
+   wsi_conn->is_proprietary_x11 = amd_reply->present || nv_reply->present;
 
free(dri3_reply);
+   free(dri2_reply);
free(pres_reply);
+   free(amd_reply);
+   free(nv_reply);
 
return wsi_conn;
 }
@@ -100,6 +126,20 @@ wsi_x11_connection_destroy(const VkAllocationCallbacks *alloc,
vk_free(alloc, conn);
 }
 
+static bool
+wsi_x11_check_for_dri3(struct wsi_x11_connection *wsi_conn)
+{
+  if (wsi_conn->has_dri3)
+    return true;
+  if (!wsi_conn->is_proprietary_x11) {
+    fprintf(stderr, "vulkan: No DRI3 support detected - required for presentation\n");
+    if (wsi_conn->has_dri2)
+      fprintf(stderr, "Note: DRI2 support detected, you can probably enable DRI3 in your Xorg config;\n"
+              "  see http://askubuntu.com/questions/817226/how-to-enable-dri3-on-ubuntu-16-04\n");
+  }
+  return false;
+}
+
 static struct wsi_x11_connection *
 wsi_x11_get_connection(struct wsi_device *wsi_dev,
   const VkAllocationCallbacks *alloc,
@@ -264,11 +304,8 @@ VkBool32 wsi_get_physical_device_xcb_presentation_support(
if (!wsi_conn)
   return false;
 
-   if (!wsi_conn->has_dri3) {
-  fprintf(stderr, "vulkan: No DRI3 support detected - required for 
presentation\n");
-  fprintf(stderr, "Note: Buggy applications may crash, if they do please 
report to vendor\n");
+   if (!wsi_x11_check_for_dri3(wsi_conn))
   return false;
-   }
 
unsigned visual_depth;
 if (!connection_get_visualtype(connection, visual_id, &visual_depth))
@@ -313,9 +350,7 @@ x11_surface_get_support(VkIcdSurfaceBase *icd_surface,
   

Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc

2017-02-14 Thread Jacob Lifshay
On Feb 14, 2017 12:18 AM, "Nicolai Hähnle" <nhaeh...@gmail.com> wrote:

On 13.02.2017 17:54, Jacob Lifshay wrote:

> the algorithm i was going to use would get the union of the sets of live
> variables at the barriers (union over barriers), create an array of
> structs that holds them all, then for each barrier, insert the code to
> store all live variables, then end the for loop over tid_in_workgroup,
> then run the memory barrier, then start another for loop over
> tid_in_workgroup, then load all live variables.
>

Okay, sounds reasonable in theory.

There are some issues, like: how do you actually determine live variables?
If you're working off TGSI like llvmpipe does today, you'd need to write
your own analysis for that, but in a structured control flow graph like
TGSI has, that shouldn't be too difficult.
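
For illustration, here is a minimal sketch of that backward liveness scan
over a toy straight-line IR (the struct inst IR and every name in it are
invented for the example; a real pass over TGSI or LLVM IR would also have
to walk the structured control flow):

#include <stdbool.h>
#include <stdio.h>

#define MAX_VARS 8

struct inst {
   int def;        /* variable index written, or -1 */
   int use[2];     /* variable indices read, or -1 */
   bool is_barrier;
};

static void
find_live_at_barriers(const struct inst *insts, int n,
                      bool live_at_barrier[MAX_VARS])
{
   bool live[MAX_VARS] = { false };

   /* Walk backwards: a use makes a variable live, a definition kills it,
    * and whatever is live when we pass a barrier must be spilled. */
   for (int i = n - 1; i >= 0; i--) {
      if (insts[i].is_barrier)
         for (int v = 0; v < MAX_VARS; v++)
            if (live[v])
               live_at_barrier[v] = true;   /* union over all barriers */
      if (insts[i].def >= 0)
         live[insts[i].def] = false;
      for (int j = 0; j < 2; j++)
         if (insts[i].use[j] >= 0)
            live[insts[i].use[j]] = true;
   }
}

int main(void)
{
   enum { A, B, TID, D, F };
   /* a = b + tid; barrier(); d = a + f;  -- 'a' is live across the barrier */
   struct inst prog[] = {
      { A,  { B, TID }, false },
      { -1, { -1, -1 }, true  },
      { D,  { A, F },   false },
   };
   bool live_at_barrier[MAX_VARS] = { false };

   find_live_at_barriers(prog, 3, live_at_barrier);
   printf("a live at barrier: %d\n", live_at_barrier[A]);   /* prints 1 */
   return 0;
}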


I was planning on using the spir-v to llvm translator and never using tgsi.
I could implement the pass using llvm coroutines; however, I'd need
several additional passes to convert the output, and it might not optimize
all the way because the switch on the suspend-point index would still be
left. Also, according to the docs from llvm trunk, llvm doesn't reduce the
coroutine frame to the minimum size needed to hold the live variables at
the most demanding suspend point; instead, it allocates separate space for
each variable at each suspend point:
http://llvm.org/docs/Coroutines.html#areas-requiring-attention


I'd still recommend that you at least seriously read through the LLVM
coroutine stuff.

Cheers,
Nicolai

Jacob Lifshay
>
> On Feb 13, 2017 08:45, "Nicolai Hähnle" <nhaeh...@gmail.com> wrote:
>
> [ re-adding mesa-dev on the assumption that it got dropped by accident ]
>
> On 13.02.2017 17:27, Jacob Lifshay wrote:
>
> I would start a thread for each cpu, then have each thread run the
> compute shader a number of times instead of having a thread per
> shader invocation.
>
>
> This will not work.
>
> Please, read again what the barrier() instruction does: When the
> barrier() call is reached, _all_ threads within the workgroup are
> supposed to be run until they reach that barrier() call.
>
>
> to clarify, I had meant that each os thread would run the sections of
> the shader between the barriers for all the shaders in a work group,
> then, when it finished the work group, it would go to the next work
> group assigned to the os thread.
>
> so, if our shader is:
> a = b + tid;
> barrier();
> d = e + f;
>
> and our simd width is 4, our work-group size is 128, and we have 16 os
> threads, then it will run for each os thread:
> for(workgroup = os_thread_index; workgroup < workgroup_count; workgroup += os_thread_count)
> {
> for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
> {
> ivec4 tid = ivec4(0, 1, 2, 3) + ivec4(tid_in_workgroup + workgroup * 128);
> a[tid_in_workgroup / 4] = ivec_add(b[tid_in_workgroup / 4], tid);
> }
> memory_fence(); // if needed
> for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
> {
> d[tid_in_workgroup / 4] = vec_add(e[tid_in_workgroup / 4], f[tid_in_workgroup / 4]);
> }
> }
> // after this, we run the next rendering or compute job
>
>
> Okay good, that's the right concept.
>
> Actually doing that is not at all straightforward though: consider
> that the barrier() might occur inside a loop in the shader.
>
> So if you implemented that within the framework of llvmpipe, you'd
> make a lot of people very happy: it would allow finally adding
> compute shader support to llvmpipe. Mind you, that in itself would
> already be a pretty decent-sized project for GSoC!
>
> Cheers,
> Nicolai
>
>

Jacob Lifshay
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc

2017-02-13 Thread Jacob Lifshay
The algorithm I was going to use would get the union of the sets of live
variables at the barriers (union over barriers) and create an array of
structs that holds them all. Then, for each barrier, it would insert the
code to store all live variables, end the for loop over tid_in_workgroup,
run the memory barrier, start another for loop over tid_in_workgroup, and
load all live variables.
Jacob Lifshay
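
A hand-lowered sketch of that transformation, applied to the example shader
used in this thread (adapted to  a = b + tid; barrier(); d = a + f;  so
that 'a' is actually live across the barrier; the spill struct and all
names are illustrative only):

#define MAX_WORKGROUP_SIZE 1024

struct spill { int a; };   /* union of variables live across any barrier */

void lowered_workgroup(int workgroup_size, const int *b, const int *f,
                       int *d_out)
{
   struct spill spills[MAX_WORKGROUP_SIZE];

   /* section before the barrier, looped over every invocation */
   for (int tid = 0; tid < workgroup_size; tid++) {
      int a = b[tid] + tid;
      spills[tid].a = a;               /* store all live variables */
   }

   /* the barrier itself reduces to (at most) a memory fence here */

   /* section after the barrier */
   for (int tid = 0; tid < workgroup_size; tid++) {
      int a = spills[tid].a;           /* reload all live variables */
      d_out[tid] = a + f[tid];
   }
}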

On Feb 13, 2017 08:45, "Nicolai Hähnle" <nhaeh...@gmail.com> wrote:

> [ re-adding mesa-dev on the assumption that it got dropped by accident ]
>
> On 13.02.2017 17:27, Jacob Lifshay wrote:
>
>> I would start a thread for each cpu, then have each thread run the
>> compute shader a number of times instead of having a thread per
>> shader
>> invocation.
>>
>>
>> This will not work.
>>
>> Please, read again what the barrier() instruction does: When the
>> barrier() call is reached, _all_ threads within the workgroup are
>> supposed to be run until they reach that barrier() call.
>>
>>
>> to clarify, I had meant that each os thread would run the sections of
>> the shader between the barriers for all the shaders in a work group,
>> then, when it finished the work group, it would go to the next work
>> group assigned to the os thread.
>>
>> so, if our shader is:
>> a = b + tid;
>> barrier();
>> d = e + f;
>>
>> and our simd width is 4, our work-group size is 128, and we have 16 os
>> threads, then it will run for each os thread:
>> for(workgroup = os_thread_index; workgroup < workgroup_count; workgroup += os_thread_count)
>> {
>> for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
>> {
>> ivec4 tid = ivec4(0, 1, 2, 3) + ivec4(tid_in_workgroup + workgroup * 128);
>> a[tid_in_workgroup / 4] = ivec_add(b[tid_in_workgroup / 4], tid);
>> }
>> memory_fence(); // if needed
>> for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
>> {
>> d[tid_in_workgroup / 4] = vec_add(e[tid_in_workgroup / 4], f[tid_in_workgroup / 4]);
>> }
>> }
>> // after this, we run the next rendering or compute job
>>
>
> Okay good, that's the right concept.
>
> Actually doing that is not at all straightforward though: consider that
> the barrier() might occur inside a loop in the shader.
>
> So if you implemented that within the framework of llvmpipe, you'd make a
> lot of people very happy: it would allow finally adding compute shader
> support to llvmpipe. Mind you, that in itself would already be a pretty
> decent-sized project for GSoC!
>
> Cheers,
> Nicolai
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc

2017-02-13 Thread Jacob Lifshay
forgot to add mesa-dev when I sent (again).
-- Forwarded message --
From: "Jacob Lifshay" <programmerj...@gmail.com>
Date: Feb 13, 2017 8:27 AM
Subject: Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc
To: "Nicolai Hähnle" <nhaeh...@gmail.com>
Cc:


>
> On Feb 13, 2017 7:54 AM, "Nicolai Hähnle" <nhaeh...@gmail.com> wrote:
>
> On 13.02.2017 03:17, Jacob Lifshay wrote:
>
>> On Feb 12, 2017 5:34 PM, "Dave Airlie" <airl...@gmail.com
>> <mailto:airl...@gmail.com>> wrote:
>>
>> > I'm assuming that control barriers in Vulkan are identical to barriers
>> > across a work-group in opencl. I was going to have a work-group be a single
>> > OS thread, with the different work-items mapped to SIMD lanes. If we need to
>> > have additional scheduling, I have written a javascript compiler that
>> > supports generator functions, so I mostly know how to write a llvm pass to
>> > implement that. I was planning on writing the shader compiler using llvm,
>> > using the whole-function-vectorization pass I will write, and using the
>> > pre-existing spir-v to llvm translation layer. I would also write some llvm
>> > passes to translate from texture reads and stuff to basic vector ops.
>>
>> Well the problem is number of work-groups that gets launched could be
>> quite high, and this can cause a large overhead in number of host threads
>> that have to be launched. There was some discussion on this in mesa-dev
>> archives back when I added softpipe compute shaders.
>>
>>
>> I would start a thread for each cpu, then have each thread run the
>> compute shader a number of times instead of having a thread per shader
>> invocation.
>>
>
> This will not work.
>
> Please, read again what the barrier() instruction does: When the barrier()
> call is reached, _all_ threads within the workgroup are supposed to be run
> until they reach that barrier() call.
>
>
> to clarify, I had meant that each os thread would run the sections of the
> shader between the barriers for all the shaders in a work group, then, when
> it finished the work group, it would go to the next work group assigned to
> the os thread.
>
> so, if our shader is:
> a = b + tid;
> barrier();
> d = e + f;
>
> and our simd width is 4, our work-group size is 128, and we have 16 os
> threads, then it will run for each os thread:
> for(workgroup = os_thread_index; workgroup < workgroup_count; workgroup += os_thread_count)
> {
> for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
> {
> ivec4 tid = ivec4(0, 1, 2, 3) + ivec4(tid_in_workgroup + workgroup * 128);
> a[tid_in_workgroup / 4] = ivec_add(b[tid_in_workgroup / 4], tid);
> }
> memory_fence(); // if needed
> for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
> {
> d[tid_in_workgroup / 4] = vec_add(e[tid_in_workgroup / 4], f[tid_in_workgroup / 4]);
> }
> }
> // after this, we run the next rendering or compute job
>
>
>> > I have a prototype rasterizer, however I haven't implemented binning for
>> > triangles yet or implemented interpolation. currently, it can handle
>> > triangles in 3D homogeneous and calculate edge equations.
>> > https://github.com/programmerjake/tiled-renderer
>> > A previous 3d renderer that doesn't implement any vectorization and has
>> > opengl 1.x level functionality:
>> > https://github.com/programmerjake/lib3d/blob/master/softrender.cpp
>>
>> Well I think we already have a completely fine rasterizer and binning
>> and whatever else in the llvmpipe code base. I'd much rather any Mesa
>> based project doesn't throw all of that away, there is no reason the
>> same swrast backend couldn't be abstracted to be used for both GL and
>> Vulkan and introducing another just because it's interesting isn't a
>> great fit for long term project maintenance..
>>
>> If there are improvements to llvmpipe that need to be made, then that
>> is something to possibly consider, but I'm not sure why a swrast vulkan
>> needs a
>>

Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc

2017-02-12 Thread Jacob Lifshay
forgot to add mesa-dev when I sent.
-- Forwarded message --
From: "Jacob Lifshay" <programmerj...@gmail.com>
Date: Feb 12, 2017 6:16 PM
Subject: Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc
To: "Dave Airlie" <airl...@gmail.com>
Cc:



On Feb 12, 2017 5:34 PM, "Dave Airlie" <airl...@gmail.com> wrote:

> I'm assuming that control barriers in Vulkan are identical to barriers
> across a work-group in opencl. I was going to have a work-group be a single
> OS thread, with the different work-items mapped to SIMD lanes. If we need to
> have additional scheduling, I have written a javascript compiler that
> supports generator functions, so I mostly know how to write a llvm pass to
> implement that. I was planning on writing the shader compiler using llvm,
> using the whole-function-vectorization pass I will write, and using the
> pre-existing spir-v to llvm translation layer. I would also write some llvm
> passes to translate from texture reads and stuff to basic vector ops.

Well the problem is number of work-groups that gets launched could be
quite high, and this can cause a large overhead in number of host threads
that have to be launched. There was some discussion on this in mesa-dev
archives back when I added softpipe compute shaders.


I would start a thread for each cpu, then have each thread run the compute
shader a number of times instead of having a thread per shader invocation.
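
As a minimal sketch of that dispatch model (one OS thread per CPU, each
striding over its share of the workgroups; all names here are invented for
the example, not taken from any existing driver):

#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 64

struct dispatch {
   void (*run_workgroup)(void *user, int workgroup);
   void *user;
   int workgroup_count;
   int thread_count;
   int thread_index;
};

static void *dispatch_thread(void *arg)
{
   struct dispatch *d = arg;

   /* strided partitioning: thread i runs workgroups i, i+N, i+2N, ... */
   for (int wg = d->thread_index; wg < d->workgroup_count;
        wg += d->thread_count)
      d->run_workgroup(d->user, wg);
   return NULL;
}

void dispatch_compute(void (*run_workgroup)(void *, int), void *user,
                      int workgroup_count, int thread_count)
{
   pthread_t threads[MAX_THREADS];
   struct dispatch args[MAX_THREADS];

   assert(thread_count <= MAX_THREADS);
   for (int i = 0; i < thread_count; i++) {
      args[i] = (struct dispatch){ run_workgroup, user, workgroup_count,
                                   thread_count, i };
      pthread_create(&threads[i], NULL, dispatch_thread, &args[i]);
   }
   for (int i = 0; i < thread_count; i++)
      pthread_join(threads[i], NULL);
}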


> I have a prototype rasterizer, however I haven't implemented binning for
> triangles yet or implemented interpolation. currently, it can handle
> triangles in 3D homogeneous and calculate edge equations.
> https://github.com/programmerjake/tiled-renderer
> A previous 3d renderer that doesn't implement any vectorization and has
> opengl 1.x level functionality:
> https://github.com/programmerjake/lib3d/blob/master/softrender.cpp

Well I think we already have a completely fine rasterizer and binning and
whatever else in the llvmpipe code base. I'd much rather any Mesa based
project doesn't throw all of that away, there is no reason the same swrast
backend couldn't be abstracted to be used for both GL and Vulkan and
introducing another just because it's interesting isn't a great fit for
long term project maintenance..

If there are improvements to llvmpipe that need to be made, then that is
something to possibly consider, but I'm not sure why a swrast vulkan needs
a from scratch raster implemented. For a project that is so large in
scope, I'd think reusing that code would be of some use. Since most of the
fun stuff is all the texture sampling etc.


I actually think implementing the rasterization algorithm is the best part.
I wanted the rasterization algorithm to be included in the shaders, e.g.
triangle setup and binning would be tacked on to the end of the vertex
shader and parameter interpolation and early z tests would be tacked on to
the beginning of the fragment shader and blending on to the end. That way,
llvm could do more specialization and instruction scheduling than is
possible in llvmpipe now.

so the tile rendering function would essentially be:

for(i = 0; i < triangle_count; i += vector_width)
jit_functions[i](tile_x, tile_y, &triangle_setup_results[i]);

as opposed to the current llvmpipe code where there is a large amount of
fixed code that isn't optimized with the shaders.


> The scope that I intended to complete is the bare minimum to be vulkan
> conformant (i.e. no tessellation and no geometry shaders), so
implementing a
> loadable ICD for linux and windows that implements a single queue, vertex,
> fragment, and compute shaders, implementing events, semaphores, and
fences,
> implementing images with the minimum requirements, supporting a f32 depth
> buffer or a f24 with 8bit stencil, and supporting a yet-to-be-determined
> compressed format. For the image optimal layouts, I will probably use the
> same chunked layout I use in
> https://github.com/programmerjake/tiled-renderer/blob/master2/image.h#L59 ,
> where I have a linear array of chunks where each chunk has a linear array of
> texels. If you think that's too big, we could leave out all of the image
> formats except the two depth-stencil formats, the 8-bit and 32-bit integer
> and 32-bit float formats.
>

Seems like a quite large scope, possibly a bit big for a GSoC though,
esp one that intends to not use any existing Mesa code.


Most of the vulkan functions have a simple implementation when we don't
need to worry about building stuff for a gpu and synchronization (because
we have only one queue), and llvm implements most of the rest of the needed
functionality. If we leave out most of the image formats, that would
probably cut the amount of code by a third.


Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc

2017-02-12 Thread Jacob Lifshay
I'm assuming that control barriers in Vulkan are identical to barriers
across a work-group in opencl. I was going to have a work-group be a single
OS thread, with the different work-items mapped to SIMD lanes. If we need
to have additional scheduling, I have written a javascript compiler that
supports generator functions, so I mostly know how to write a llvm pass to
implement that. I was planning on writing the shader compiler using llvm,
using the whole-function-vectorization pass I will write, and using the
pre-existing spir-v to llvm translation layer. I would also write some llvm
passes to translate from texture reads and stuff to basic vector ops.

I have a prototype rasterizer, however I haven't implemented binning for
triangles yet or implemented interpolation. Currently, it can handle
triangles in 3D homogeneous coordinates and calculate edge equations.
https://github.com/programmerjake/tiled-renderer
A previous 3d renderer that doesn't implement any vectorization and has
opengl 1.x level functionality:
https://github.com/programmerjake/lib3d/blob/master/softrender.cpp
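
For reference, a minimal sketch of the 2D edge equation in question: the
signed doubled area of triangle (a, b, p), with a point inside when all
three edge functions agree in sign (the homogeneous 3D variant evaluates
the same determinants on clip-space coordinates; names are invented for
the example):

static float edge(float ax, float ay, float bx, float by,
                  float px, float py)
{
   return (px - ax) * (by - ay) - (py - ay) * (bx - ax);
}

static int inside_triangle(const float x[3], const float y[3],
                           float px, float py)
{
   float e0 = edge(x[0], y[0], x[1], y[1], px, py);
   float e1 = edge(x[1], y[1], x[2], y[2], px, py);
   float e2 = edge(x[2], y[2], x[0], y[0], px, py);

   /* accept either winding by allowing all-nonnegative or all-nonpositive */
   return (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
          (e0 <= 0 && e1 <= 0 && e2 <= 0);
}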

The scope that I intended to complete is the bare minimum to be vulkan
conformant (i.e. no tessellation and no geometry shaders), so implementing
a loadable ICD for linux and windows that implements a single queue,
vertex, fragment, and compute shaders, implementing events, semaphores, and
fences, implementing images with the minimum requirements, supporting a f32
depth buffer or a f24 with 8bit stencil, and supporting a
yet-to-be-determined compressed format. For the image optimal layouts, I
will probably use the same chunked layout I use in
https://github.com/programmerjake/tiled-renderer/blob/master2/image.h#L59 ,
where I have a linear array of chunks where each chunk has a linear array
of texels. If you think that's too big, we could leave out all of the image
formats except the two depth-stencil formats, the 8-bit and 32-bit integer
and 32-bit float formats.
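
A minimal sketch of the address calculation such a chunked layout implies
(the chunk size and names are invented for the example; the linked image.h
is the authoritative version):

#include <stddef.h>

#define CHUNK_W 4
#define CHUNK_H 4

/* byte offset of texel (x, y) in a chunked image: chunks are stored
 * row-major, and texels inside each chunk are stored row-major too */
static size_t
chunked_texel_offset(unsigned x, unsigned y,
                     unsigned width_in_chunks, size_t texel_size)
{
   unsigned chunk_x = x / CHUNK_W, in_x = x % CHUNK_W;
   unsigned chunk_y = y / CHUNK_H, in_y = y % CHUNK_H;
   size_t chunk_index = (size_t)chunk_y * width_in_chunks + chunk_x;
   size_t texel_in_chunk = (size_t)in_y * CHUNK_W + in_x;

   return (chunk_index * (CHUNK_W * CHUNK_H) + texel_in_chunk) * texel_size;
}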

As mentioned by Roland Mainz, I plan to implement it so all state is stored
in the VkDevice structure or structures created from VKDevice, so there are
no global variables that prevent the library from being completely
reentrant. I might have global variables for something like detecting cpu
features, but that will be protected by a mutex.
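
A minimal sketch of that one-time feature probe (names are illustrative;
__builtin_cpu_supports is a GCC/Clang builtin), using pthread_once so the
only global state is written exactly once and everything else stays
reentrant:

#include <pthread.h>
#include <stdbool.h>

static struct {
   bool has_sse2;
   bool has_avx2;
} cpu_features;

static pthread_once_t cpu_features_once = PTHREAD_ONCE_INIT;

static void detect_cpu_features(void)
{
   __builtin_cpu_init();
   cpu_features.has_sse2 = __builtin_cpu_supports("sse2");
   cpu_features.has_avx2 = __builtin_cpu_supports("avx2");
}

bool device_supports_avx2(void)
{
   pthread_once(&cpu_features_once, detect_cpu_features);
   return cpu_features.has_avx2;
}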

Jacob Lifshay

On Sun, Feb 12, 2017 at 3:14 PM Dave Airlie <airl...@gmail.com> wrote:

> On 11 February 2017 at 09:03, Jacob Lifshay <programmerj...@gmail.com>
> wrote:
> > I would like to write a software implementation of Vulkan for inclusion in
> > mesa3d. I wanted to use a tiled renderer coupled with llvm and either write
> > or use a whole-function-vectorization pass. Would anyone be willing to
> > mentor me for this project? I would probably only need help getting it
> > committed, and would be able to do the rest with minimal help.
>
> So I started writing a vulkan->gallium swrast layer
>
> https://cgit.freedesktop.org/~airlied/mesa/log/?h=not-a-vulkan-swrast
>
> with the intention of using it to prove a vulkan swrast driver on top
> of llvmpipe eventually.
>
> This was because I was being too lazy to just rewrite llvmpipe as a
> vulkan driver, and it seemed easier to just write the layer to
> investigate. The thing about vulkan is it's already very based around
> the idea of command streams and parallel building/execution, so having
> the gallium/vulkan layer record a CPU command stream and execute that
> isn't going to be as large an overhead as doing something similar with
> hw drivers.
>
> I got it working with softpipe after adding a bunch of features to
> softpipe, however to get it going with llvmpipe, there would need to be
> a lot of work on improving llvmpipe.
>
> Vulkan really wants images and compute shaders (i.e. it requires them),
> and so far we haven't got image and compute shader support for llvmpipe.
> There are a few threads previously on this, but the main problem with
> compute shaders is getting efficient barriers working, which needs some
> kind of threading model; maybe llvm's coroutine support is useful for
> this, we won't know until we try I suppose.
>
> I'd probably be happy to mentor on the project, but you'd want to
> define the scope of it pretty well, as there is a lot of work to get
> the non-graphics pieces even if you are just ripping stuff out of
> llvmpipe.
>
> Dave.
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc

2017-02-11 Thread Jacob Lifshay
By tiled renderer, I meant that I would split the render target into small
pieces, then, for each triangle, decide which pieces contain the triangle
and add that triangle to per-piece render lists. Afterwards, I'd use the
constructed render lists and render all the parts of triangles in a piece,
then go to the next piece. Obviously, I'd use multiple threads that are all
rendering their separate pieces simultaneously. I'm not sure if you'd be
able to use the whole-function-vectorization pass with gallium3d; you'd
need to translate the shader to llvm ir and back. The
whole-function-vectorization pass would still output scalar code for
statically uniform values; llvm (as of 3.9.1) doesn't have a pass to
devectorize vectors where all elements are identical.
Jacob Lifshay
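
A minimal sketch of the binning step described above: each triangle is
appended to the render list of every tile its screen-space bounding box
touches (TILE_SIZE and the data layout are invented for the example; a
tighter binner would also test the edge equations against tile corners):

#include <math.h>
#include <stdlib.h>

#define TILE_SIZE 64

struct tri { float x[3], y[3]; };

struct tile_list {
   int *tris;
   int count, capacity;
};

static void tile_list_push(struct tile_list *t, int tri)
{
   if (t->count == t->capacity) {
      t->capacity = t->capacity ? t->capacity * 2 : 16;
      t->tris = realloc(t->tris, t->capacity * sizeof(int));
   }
   t->tris[t->count++] = tri;
}

static int clampi(int v, int lo, int hi)
{
   return v < lo ? lo : v > hi ? hi : v;
}

void bin_triangles(const struct tri *tris, int tri_count,
                   struct tile_list *tiles, int tiles_x, int tiles_y)
{
   for (int i = 0; i < tri_count; i++) {
      const struct tri *t = &tris[i];
      float minx = fminf(t->x[0], fminf(t->x[1], t->x[2]));
      float maxx = fmaxf(t->x[0], fmaxf(t->x[1], t->x[2]));
      float miny = fminf(t->y[0], fminf(t->y[1], t->y[2]));
      float maxy = fmaxf(t->y[0], fmaxf(t->y[1], t->y[2]));

      int tx0 = clampi((int)(minx / TILE_SIZE), 0, tiles_x - 1);
      int tx1 = clampi((int)(maxx / TILE_SIZE), 0, tiles_x - 1);
      int ty0 = clampi((int)(miny / TILE_SIZE), 0, tiles_y - 1);
      int ty1 = clampi((int)(maxy / TILE_SIZE), 0, tiles_y - 1);

      for (int ty = ty0; ty <= ty1; ty++)
         for (int tx = tx0; tx <= tx1; tx++)
            tile_list_push(&tiles[ty * tiles_x + tx], i);
   }
}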

On Feb 11, 2017 11:11, "Roland Scheidegger" <srol...@vmware.com> wrote:

On 11.02.2017 00:03, Jacob Lifshay wrote:
> I would like to write a software implementation of Vulkan for inclusion
> in mesa3d. I wanted to use a tiled renderer coupled with llvm and either
> write or use a whole-function-vectorization pass. Would anyone be
> willing to mentor me for this project? I would probably only need help
> getting it committed, and would be able to do the rest with minimal help.
> Jacob Lifshay

This sounds like a potentially interesting project, though I don't have
much of an idea if it's feasible as gsoc.
By "using a tiled renderer" do you mean you want to "borrow" that,
presumably from either llvmpipe or openswr?
The whole-function-vectorization idea for shader execution looks
reasonable to me, just not sure if it will deliver good results. I guess
it would be nice if that could sort of be used as a replacement for the
current gallivm cpu shader execution implementation (used by both
llvmpipe and openswr).

Roland
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] removed report to vendor message when dri3 is not detected

2017-02-10 Thread Jacob Lifshay
fixes bug 99715

Signed-off-by: Jacob Lifshay <programmerj...@gmail.com>
---
 src/vulkan/wsi/wsi_common_x11.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c
index 64ba921..e092066 100644
--- a/src/vulkan/wsi/wsi_common_x11.c
+++ b/src/vulkan/wsi/wsi_common_x11.c
@@ -266,7 +266,6 @@ VkBool32 wsi_get_physical_device_xcb_presentation_support(
 
if (!wsi_conn->has_dri3) {
   fprintf(stderr, "vulkan: No DRI3 support detected - required for 
presentation\n");
-  fprintf(stderr, "Note: Buggy applications may crash, if they do please 
report to vendor\n");
   return false;
}
 
@@ -315,7 +314,6 @@ x11_surface_get_support(VkIcdSurfaceBase *icd_surface,
 
if (!wsi_conn->has_dri3) {
   fprintf(stderr, "vulkan: No DRI3 support detected - required for 
presentation\n");
-  fprintf(stderr, "Note: Buggy applications may crash, if they do please 
report to vendor\n");
   *pSupported = false;
   return VK_SUCCESS;
}
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc

2017-02-10 Thread Jacob Lifshay
I think vulkan is supposed to be reentrant already.
Jacob Lifshay

On Feb 10, 2017 3:38 PM, "Roland Mainz" <roland.ma...@nrubsig.org> wrote:

> On Sat, Feb 11, 2017 at 12:03 AM, Jacob Lifshay
> <programmerj...@gmail.com> wrote:
> > I would like to write a software implementation of Vulkan for inclusion in
> > mesa3d. I wanted to use a tiled renderer coupled with llvm and either write
> > or use a whole-function-vectorization pass. Would anyone be willing to
> > mentor me for this project? I would probably only need help getting it
> > committed, and would be able to do the rest with minimal help.
>
> Please do me a favour and implement the renderer in a reentrant way,
> i.e. no global variables (e.g. put all variables which are "global" in
> a "handle" struct which is then passed around, e.g. like libpam was
> implemented). This helps a lot with later multithreading and helps
> with debugging the code.
>
> 
>
> Bye,
> Roland
>
> --
>   __ .  . __
>  (o.\ \/ /.o) roland.ma...@nrubsig.org
>   \__\/\/__/  MPEG specialist, C&&& programmer
>   /O /==\ O\  TEL +49 641 3992797
>  (;O/ \/ \O;)
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] software implementation of vulkan for gsoc/evoc

2017-02-10 Thread Jacob Lifshay
I would like to write a software implementation of Vulkan for inclusion in
mesa3d. I wanted to use a tiled renderer coupled with llvm and either write
or use a whole-function-vectorization pass. Would anyone be willing to
mentor me for this project? I would probably only need help getting it
committed, and would be able to do the rest with minimal help.
Jacob Lifshay
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev