Hi,
we've already implemented an open source library that integrates WebGL,
WebAudio, DeviceOrientation and gUM to create an easy-to-use VR/AR
framework that runs on Rift, Glass, mobile, tablet, PC, etc.
Here's an overview of our API.
https://buildar.com/awe/tutorials/intro_to_awe.js/index.html
And the project is in our GitHub repo.
https://github.com/buildar/awe.js
Hope that's relevant.
PS: We're also working on a Depth Stream Extension proposal to add depth
camera support to gUM.
roBman
On 27/03/14 5:58 AM, Lars Knudsen wrote:
I think it could make sense to put stuff like this as an extension on
top of WebGL and WebAudio as they are the only two current APIs close
enough to the bare metal/low latency/high performance to get a decent
experience. Also - I seem to remember that some earlier generation VR
glasses solved the game support problem by providing their own GL and
Joystick drivers (today - probably device orientation events) so many
games didn't have to bother (too much) with the integration.
In theory - we could:
- extend (if needed at all) WebGL to provide stereo vision
- hook up WebAudio as is (as it supports audio objects, Doppler
effect, etc., similar to OpenAL)
- hook up DeviceOrientation/Motion in Desktop browsers if a WiiMote,
HMD or other is connected
- hook up getUserMedia as is to the potential VR camera
..and make it possible to do low latency paths/hooks between them if
needed.
It seems that all (or at least most) of the components are already
present - but proper hooks need to be made for desktop browsers at
least (afaik .. it's been a while ;))
- Lars
On Wed, Mar 26, 2014 at 7:18 PM, Brandon Jones wrote:
So there are a few things to consider regarding this. For one, I
think your ViewEvent structure would need to look more like this:
interface ViewEvent : UIEvent {
    // Quaternion is 4 floats; using it prevents gimbal lock.
    readonly attribute Quaternion orientation;
    readonly attribute float offsetX; // offset X from the calibrated center 0, in millimeters
    readonly attribute float offsetY; // offset Y from the calibrated center 0, in millimeters
    readonly attribute float offsetZ; // offset Z from the calibrated center 0, in millimeters
    readonly attribute float accelerationX; // acceleration along the X axis, in m/s^2
    readonly attribute float accelerationY; // acceleration along the Y axis, in m/s^2
    readonly attribute float accelerationZ; // acceleration along the Z axis, in m/s^2
}
You have to deal with explicit units for a case like this and not
clamped/normalized values. What would a normalized offset of 1.0
mean? Am I slightly off center? At the other end of the room? It's
meaningless without a frame of reference. Same goes
for acceleration. You can argue that you can normalize to 1.0 ==
9.8 m/s^2 but the accelerometers will happily report values
outside that range, and at that point you might as well just
report in a standard unit.
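
To illustrate the gimbal lock point, here is a minimal TypeScript sketch. The `Quaternion` shape, the helper names, and the rotation order (yaw about Z, then pitch about Y, then roll about X) are assumptions made for illustration; neither proposal in this thread fixes an axis convention.

```typescript
// Hypothetical helper type; not part of either proposed interface.
interface Quaternion { w: number; x: number; y: number; z: number; }

// Assumed convention: q = qz(yaw) * qy(pitch) * qx(roll), angles in radians.
function eulerToQuaternion(roll: number, pitch: number, yaw: number): Quaternion {
  const cr = Math.cos(roll / 2), sr = Math.sin(roll / 2);
  const cp = Math.cos(pitch / 2), sp = Math.sin(pitch / 2);
  const cy = Math.cos(yaw / 2), sy = Math.sin(yaw / 2);
  return {
    w: cr * cp * cy + sr * sp * sy,
    x: sr * cp * cy - cr * sp * sy,
    y: cr * sp * cy + sr * cp * sy,
    z: cr * cp * sy - sr * sp * cy,
  };
}

// Component-wise comparison with a small tolerance for rounding error.
function nearlyEqual(a: Quaternion, b: Quaternion, eps: number = 1e-9): boolean {
  return Math.abs(a.w - b.w) < eps && Math.abs(a.x - b.x) < eps &&
         Math.abs(a.y - b.y) < eps && Math.abs(a.z - b.z) < eps;
}

// With pitch at 90 degrees, only the difference (roll - yaw) affects the
// result under this convention, so two different roll/yaw pairs describe
// the same orientation. One degree of freedom is lost in the Euler form.
const lockedA = eulerToQuaternion(0.3, Math.PI / 2, 0.1);
const lockedB = eulerToQuaternion(0.5, Math.PI / 2, 0.3);
console.log(nearlyEqual(lockedA, lockedB));
```

Reporting the quaternion directly avoids forcing every consumer to guess the rotation order, which is one more reason to prefer it over three separate angles.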
As for things like eye position and such, you'd want to query that
separately (no sense in sending it with every event), along with
other information about the device capabilities (screen
resolution, FOV, lens distortion factors, etc.). And you'll
want to account for the scenario where more than one
device is connected to the browser.
Also, if this is going to be a high quality experience you'll want
to be able to target rendering to the HMD directly and not rely on
OS mirroring to render the image. This is a can of worms in and of
itself: How do you reference the display? Can you manipulate a DOM
tree on it, or is it limited to WebGL/Canvas2D? If you can render
HTML there how do the appropriate distortions get applied, and how
do things like depth get communicated? Does this new rendering
surface share the same JavaScript scope as the page that launched
it? If the HMD refreshes at 90 Hz and your monitor refreshes at
60 Hz, when does requestAnimationFrame fire? These are not simple
questions, and need to be considered carefully to make sure that
any resulting API is useful.
Finally, it's worth considering that for a VR experience to be
effective it needs to be pretty low latency. Put bluntly: Browsers
suck at this. Optimizing for scrolling large pages of flat
content, text, and images is very different from optimizing for
realtime, super low latency I/O. If you take an Oculus
Rift and plug it into one of the existing browser/Rift demos
<https://github.com/Instrument/oculus-bridge> with Chrome, you'll
probably find that in the best case the rendering lags behind your
head movement by about 4 frames. Even if your code is rendering at
a consistent 60 Hz, that means you're seeing ~67 ms of lag, which
will result in a motion-sickness-inducing "swimming" effect where
the world is constantly catching up to your head position. And
that's not even taking into account the question of how well
JavaScript/WebGL can keep up with rendering two high resolution
views of a moderately complex scene, something that even modern
gaming PCs can struggle with.
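
The lag figure above is plain arithmetic, sketched here in TypeScript. The function name `frameLagMs` is made up for illustration, and the 4-frame delay is the rough best-case estimate from the paragraph above, not a measured value.

```typescript
// Milliseconds of lag when rendering trails head movement by a number of
// frames at a given refresh rate.
function frameLagMs(framesBehind: number, refreshHz: number): number {
  return (framesBehind / refreshHz) * 1000;
}

// 4 frames behind at 60 Hz, as in the estimate above.
console.log(frameLagMs(4, 60).toFixed(1)); // "66.7"

// Duration of a single frame at 60 Hz versus 90 Hz, which is the gap
// behind the requestAnimationFrame question raised earlier.
console.log(frameLagMs(1, 60).toFixed(1)); // "16.7"
console.log(frameLagMs(1, 90).toFixed(1)); // "11.1"
```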
That's an awful lot of work for technology that, right now, does
not have a large user base and for which the standards and
conventions are still being defined. I think that you'll have a
hard time drumming up support for such an API until the technology
becomes a little more widespread.
(Disclaimer: I'm very enthusiastic about current VR research. If I
sound negative it's because I'm being practical, not because I
don't want to see this happen)
--Brandon
On Wed, Mar 26, 2014 at 12:34 AM, Brandon Andrews wrote:
I searched, but I can't find anything relevant in the
archives. Since pointer lock is now well supported, I think
it's time to begin thinking about virtual reality APIs. Since
this is a complex topic I think any spec should start simple.
With that I'm proposing we have a discussion on adding head
tracking. This should be very generic with just position and
orientation information. So no matter if the data is coming
from a webcam, a VR headset, or a pair of glasses with eye
tracking in the future the interface would be the same. This
event would be similar to mouse move with a high sample rate
(which is why in the event the head tracking and eye tracking
are in the same event representing a user's total view).
interface ViewEvent : UIEvent {
    readonly attribute float roll; // radians, positive is slanting the head to the right
    readonly attribute float pitch; // radians, positive is looking up
    readonly attribute float yaw; // radians, positive is looking to the right
    readonly attribute float offsetX; // offset X from the calibrated center 0 in the range -1 to 1
    readonly attribute float offsetY; // offset Y from the calibrated center 0 in the range -1 to 1
    readonly attribute float offsetZ; // offset Z from the calibrated center 0 in the range -1 to 1, and 0 if not supported
    readonly attribute float leftEyeX; // left eye X position in screen coordinates from -1 to 1 (but not clamped) where 0 is the default if not supported
    readonly attribute float leftEyeY; // left eye Y position in screen coordinates from -1 to 1 (but not clamped) where 0 is the default if not supported
    readonly attribute float rightEyeX; // right eye X position in screen coordinates from -1 to 1 (but not clamped) where 0 is the default if not supported
    readonly attribute float rightEyeY; // right eye Y position in screen coordinates from -1 to 1 (but not clamped) where 0 is the default if not supported
}
Then like the pointer lock spec the user would be able to
request view lock to begin sampling head tracking data from
the selected source. There would thus be a view lock change event.
(It's not clear how the browser would list which sources to
let the user choose from. So if they had a webcam method that
the browser offered and an Oculus Rift then both would show
and the user would need to choose).
Now for discussion. Are there any features missing from the
proposed head tracking API or features that VR headsets offer
that need to be included from the beginning? Also I'm not sure
what it should be called. I like "view lock", but it was my
first thought so "head tracking" or something else might fit
the scope of the problem better.
Some justifications. The offset and head orientation are self
explanatory and calibrated by the device. The eye offsets
would be more for a UI that selects or highlights things as
the user moves their eyes around. Examples would be a web
enabled HUD on VR glasses and a laptop with a precision
webcam. The user calibrates with their device software which
reports the range (-1, -1) to (1, 1) in screen space. The
values are not clamped so the user can look beyond the
calibrated ranges. Separate left and right eye values enable
precision and versatility since most hardware supporting eye
tracking will have raw values for each eye.
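
As a rough illustration of how a page might consume the unclamped eye values, here is a TypeScript sketch that maps them to pixels. The function name, the return shape, and the choice that +Y points up in the normalized space are all assumptions; the proposal above does not specify axis directions.

```typescript
// Hypothetical result shape, for illustration only.
interface GazePoint { x: number; y: number; onScreen: boolean; }

// Maps a normalized eye position, where (-1, -1) to (1, 1) spans the
// calibrated screen, to pixel coordinates with the origin at the top left.
// Assumption: +Y is up in normalized space, so it is flipped for pixels.
function eyeToPixels(eyeX: number, eyeY: number,
                     width: number, height: number): GazePoint {
  const x = ((eyeX + 1) / 2) * width;
  const y = ((1 - eyeY) / 2) * height;
  // Values are not clamped, so the gaze can fall outside the screen.
  const onScreen = eyeX >= -1 && eyeX <= 1 && eyeY >= -1 && eyeY <= 1;
  return { x, y, onScreen };
}

// Looking at the calibrated center of a 1920x1080 screen.
console.log(eyeToPixels(0, 0, 1920, 1080)); // { x: 960, y: 540, onScreen: true }

// Looking past the right edge: x exceeds the width and onScreen is false.
console.log(eyeToPixels(1.2, 0, 1920, 1080));
```

A UI that highlights elements under the gaze would likely ignore samples where `onScreen` is false, or use them to trigger edge behaviors such as scrolling.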
--
Rob
Check out my new book "Getting started with WebRTC" - it's a 5 star hit
on Amazon http://www.amazon.com/dp/1782166300/?tag=packtpubli-20
CEO & co-founder
http://MOB-labs.com
Chair of the W3C Augmented Web Community Group
http://www.w3.org/community/ar
Invited Expert with the ISO, Khronos Group & W3C