First of all, thanks for your efforts and for sending this out for public review, it is much appreciated. It took me a while to go through, but I hope the below is useful. I've got a couple of higher-level comments, and a set of specific ones in-line. I've read the various blog posts, but in the end what matters is the spec, so I concentrate on that only. I'm going to answer some of the bits out of context too, just as a warning.
* What's not clear is how exactly this fits in with the XI2.1 extension
  proposal draft I sent out earlier. Is this supposed to be parallel or
  exclusive, i.e. will a set of touchpoints that generate a gesture still be
  sent as touchpoints to the client? Or will they be converted and the raw
  touchpoint data thus discarded? Or is it to replace the other draft
  completely? (I don't think so, given Section 1.)

* If this supplements the XI2.1 MT proposal, how does gesture recognition
  work when there's a grab active on any device?

* What kind of event stream is sent to the gesture engine (GE)? It just says
  "stream of events" but doesn't specify anything more.

* Why does the GE need to be registered by the X server? Kristian asked that
  already and I don't think Section 3 answers this sufficiently.

I'll be upfront again and re-state what I've said in the past - gestures IMO
do not belong in the X server, so consider my view biased in this regard. The
X protocol is not the ideal vehicle for gesture recognition and inter-client
communication, especially given its total lack of knowledge of user and/or
session. I think the same goal could be achieved by having a daemon that
communicates out-of-band with the applications to interpret gestures as
needed, in the spirit of e.g. ibus.

In your reply to Kristian, you said:

> Also, we think that there's a case to be made for environmental gestures
> that should override gestures recognized by clients. This is provided by
> the mutual exclusion flag when selecting for events. Again, this
> wouldn't be possible without integrating it into the X propagation
> mechanism.

http://lists.freedesktop.org/archives/xorg-devel/2010-August/012045.html

Unless I screwed up, the passive grab mechanism for touches allows you to
intercept touchpoints and interpret them as gestures as required. The mutex
mask described below is like a synchronised passive grab mask.
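To make the comparison concrete, here's a minimal sketch (in Python rather than protocol terms, with all names invented) of what a synchronised passive touch grab gives you: the grab owner sees the touchpoints first and either consumes them as a gesture or has the server replay them to the normal client.

```python
# Hypothetical sketch, NOT real protocol code: a client holding a
# synchronised passive touch grab (e.g. a gesture recogniser) intercepts
# touch events and either accepts them (consumed as a gesture) or rejects
# them (the server replays them to the regular client).

class TouchGrabOwner:
    """Stands in for a client holding a passive touch grab."""
    def __init__(self, recognise):
        self.recognise = recognise  # returns a gesture name or None

    def observe(self, touch_events):
        gesture = self.recognise(touch_events)
        if gesture is not None:
            return ("accept", gesture)   # touches consumed as a gesture
        return ("reject", None)          # server replays to the next client

def deliver(touch_events, grab_owner, regular_client_log):
    """Server-side dispatch: grab owner decides, rejects are replayed."""
    decision, gesture = grab_owner.observe(touch_events)
    if decision == "reject":
        # Replay: events propagate to the regular client as plain touches.
        regular_client_log.extend(touch_events)
        return None
    return gesture

# Toy recogniser: two touches moving in opposite x directions is a "pinch".
def toy_recognise(events):
    if len(events) == 2 and events[0]["dx"] * events[1]["dx"] < 0:
        return "pinch"
    return None

log = []
g = deliver([{"dx": -5}, {"dx": 5}], TouchGrabOwner(toy_recognise), log)
assert g == "pinch" and log == []   # consumed, nothing replayed
g = deliver([{"dx": 3}], TouchGrabOwner(toy_recognise), log)
assert g is None and len(log) == 1  # replayed unmodified
```

The point is only that the accept/reject decision already covers the interception use case, without a separate registration mechanism.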
The major difference here is that unlike a truly synchronised grab, the
events are continuously sent to the client until the replay is requested
(called "flushing" here in this document). I think this requirement can be
folded into the XI2.1 draft.

> Gesture primitives comprise events such as pans, pinches, and rotatation.

Question: what are some more gestures that are not pan/swipe (horizontal or
vertical), pinch/zoom and rotation? Can you think of any? The problem with
gestures is that there's a million of them, but at the same time there's only
a handful that are "obvious". Beyond that you're in uncharted territory and
_highly_ context dependent (at this point I'd like to thank Daniel Wigdor for
a highly interesting talk I recently attended that outlined these issues).

One example (though gestures are notoriously difficult to explain via email):
I have a map application like google maps. I put two fingers down and move
them apart. What does the app do? Zoom? No, what I'd like it to do is to move
the two waypoints I touched to two different locations. But only the app that
controls these objects can know that. The app needs to decide whether this
particular event is to be a gesture or not. The current proposal allows for
an all-or-nothing approach only, so the app above must choose between
supporting this particular type of interaction and supporting the X-provided
gestures. A more flexible approach would be to have the app hand events to
the GE, thus making gestures a last resort if no other meaningful interaction
can be performed. (Exactly the inverse of the AllowEvents approach, which
only flushes the events once no gesture has been recognised.)

> When a gesture engine recognizes a primitive that occurs within a window
> with a client that selected to receive events for the primitive, a gesture
> event is sent to the client.

What happens if a group of touches represents a gesture but another group of
touches present is not represented as part of a gesture?
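A rough sketch of that inverted, app-first flow (all names and structure hypothetical, purely to illustrate the ordering): the application sees the raw touches first, and only touches it has no object-specific use for are handed to the gesture engine.

```python
# Hypothetical sketch of the inverted model: the app decides first, the
# gesture engine is the last resort (the opposite of AllowEvents, where
# events are only flushed to the app once no gesture was recognised).

def app_first_dispatch(touch, app_handles, gesture_engine):
    if app_handles(touch):
        return "handled-by-app"     # e.g. moving a waypoint under the finger
    return gesture_engine(touch)    # gesture recognition as a last resort

# Map app example: touches landing on a waypoint are claimed by the app.
waypoints = {(10, 10), (50, 50)}

def on_waypoint(touch):
    return (touch["x"], touch["y"]) in waypoints

def ge(touch):
    return "sent-to-gesture-engine"

assert app_first_dispatch({"x": 10, "y": 10}, on_waypoint, ge) == "handled-by-app"
assert app_first_dispatch({"x": 0, "y": 0}, on_waypoint, ge) == "sent-to-gesture-engine"
```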
Use-case: I pinch to zoom while trying to rotate with another finger in the
same app.

> It is necessary for a gesture engine to recognize mutually exclusive gesture
> primitives for each set of primitives defined for a given number of touches.

In light of the often-used pinch and rotate gestures - how do you plan to
handle this then? Require the user to lift the fingers before rotating the
photo they just zoomed? I doubt it, but I don't quite know how this is
supposed to work given the details here.

On Mon, Aug 16, 2010 at 11:13:20AM -0400, Chase Douglas wrote:
> The X Gesture Extension
> Version 1.0

1.0 is optimistic if you haven't had any exposure yet. I recommend changing
this to something pre-1.0, especially given the massive draft proposal
warning just below.

> ********************************************************************************
> ********************************************************************************
> **************************                                ***************************
> **************************      DRAFT PROPOSAL (v1)       ***************************
> **************************                                ***************************
> ********************************************************************************
> ********************************************************************************
>
> 1. Introduction
>
> The X Gesture Extension is a mechanism to provide the following:
>  - Interface for X clients to register and receive gesture primitive events
>  - Interface for an X client to act as a gesture engine
>
> Gestures may be seen as logical groupings of multitouch input events. Thus,
> this extension is dependent on the X Input Extension version 2.1, which
> implements multitouch input support.
>
> ❧❧❧❧❧❧❧❧❧❧❧
>
> 2. Notations used in this document

<skip>

> ❧❧❧❧❧❧❧❧❧❧❧
> 3. Data types
>
> DEVICE { DEVICEID, AllDevices }
>     A DEVICE specifies either an X Input DEVICEID or AllDevices.

AllMasterDevices is missing here, and in the other mentions below.
> GESTUREID { CARD16 }
>     A GESTUREID is a numerical ID for an X Gesture type currently available
>     in the server. The server may re-use a gesture ID if available gesture
>     types change.
>
> GESTUREFLAG { MUTEX }
>     A flag telling the server to not propagate gestures to child clients
>     when a gesture type in the associated mask is set. The gesture will only
>     be sent to the registering client.
>     When registering for gestures, a client using this flag may not listen
>     for gesture IDs that any other client has registered for with the MUTEX
>     flag.
>
> GESTUREMASK
>     A GESTUREMASK is a binary mask defined as (1 << gesture ID). A
>     SETofGESTUREMASK is a binary OR of zero or more GESTUREMASK.
>
> GESTUREPROP { property: ATOM
>               property_type: ATOM }
>     A GESTUREPROP is the definition of a single property of a gesture. The
>     property field specifies a label for the gesture. The property_type
>     field specifies the data type of the property. For example, the property
>     type may be the atom representing "FLOAT" for an IEEE 754 32-bit
>     representation of a floating point number. Where applicable, both the
>     property and the type should conform to the standard gesture
>     definitions.
>
> EVENTRANGE { start: CARD16
>              end: CARD16 }
>     An EVENTRANGE specifies a range of event sequence numbers, inclusively.
>
> ❧❧❧❧❧❧❧❧❧❧❧
> 3. Gesture Primitives and Events

btw, two sections 3 here.

> The XInput protocol through version 2.0 supports single touch input devices.
> Beginning with version 2.1, XInput will support multitouch input devices. The
> protocol extension provides a mechansim for selection and propagation of
> multitouch events to windows they occur within. The selection and propagation
> is performed per touch; if a user touches a parent window and a child window
> with separate touches simultaneously, and both windows have input events
> selected for, the touch events will be sent to their respective windows.
> This functionality is useful for general multitouch events, but is ill-suited
> for user interface paradigms that require groupings of touches to be
> interpreted as a single unit.
>
> The X Gesture extension aims to provide for selection and propagation of
> gesture primitives. Gesture primitives can be best thought of as fundamental
> multitouch input groupings that convey specific meaning. Gesture primitives
> comprise events such as pans, pinches, and rotatation.

^ typo, "rotation"

> Primitives are also defined by the number of touches that comprise them: two
> finger pans and three finger pans are separate primitives.
>
> 3.1 Gesture Event Selection and Propagation
>
> The mechanism for gesture selection and propagation follows the typical X
> window event model. An X client requests for gesture events for primitives it
> selects for on a given window. When a gesture engine recognizes a primitive
> that occurs within a window with a client that selected to receive events
> for the primitive, a gesture event is sent to the client.
>
> Proper selection and propagation of new gesture events depends on a few
> variables: where each touch event occurs, which clients requested for a
> recognized primitive, and if any clients hold primitive mutual exclusion for
> the window. There are two gesture primitive selections a client can make:
> initiation and continuation selections.
>
> 3.1.1 Gesture Life
>
> The lifetime of a gesture is comprised of its beginning, continuation, and
> ending. During this time, a gesture may change primitive forms. For example,
> a user may initiate a gesture with a three finger pan, but then continue the
> same gesture by lifting one of the fingers. When a client makes a gesture
> event selection request, the initiation selection is used to determine which
> clients may begin receiving a newly recognized gesture.
> The continuation selection is used to determine whether a gesture continues
> or ends when the primitive form changes. Thus, a gesture may continue even if
> the number of fingers or the type of gesture primitve changes.
>
> 3.1.2 Gesture Mutual Exclusion
>
> Clients can request to receive gesture events on any windows, including
> windows it does not own. This level of control provides for many use cases,
> but it limits the ability of gestures to be controlled in an environmental
> manner.

I think I know what you mean here but "gestures that apply to the whole
desktop" may be a better way to put it. The term "environmental gestures" is
confusing, to me anyway.

> For example, a window manager may wish to define gestures that should
> always be available to a user, regardless of what other clients select for.
> The X Gesture extension provides a mechanism for mutual exclusion of
> gestures. When a client registers for a selection of gestures and uses the
> MUTEX flag in the request, it may be granted mutual exclusion for the
> selected gestures on the window and all its children. At any given moment,
> only one client may select for mutual exclusion on each window. Any gesture
> occurring within the window and its children that matches a mutually
> exclusive client gesture initiation selection will only be sent to the
> mutually exclusive client. Further, mutually exclusive gesture initiation
> selections of parent windows are given priority over their child windows.
> Note that mutually exclusive gesture propagation is in opposite priority
> ordering from non-mutually exclusive gesture propagation.

If I understand this correctly, this is the same principle we use for passive
grabs, is this correct? If so, I recommend choosing the same wording for
consistency with other specs.
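To illustrate the parallel, here's a toy model (hypothetical names, Python) of the two propagation orderings as I read them: normal delivery goes to the youngest window in the ancestry that selects for the primitive, while a mutual-exclusion selection on the eldest matching ancestor overrides it outright — the same parent-over-child priority passive grabs use.

```python
# Toy model of the two propagation orderings sketched in 3.1.2/3.1.3.
# Window names and dict layout are invented for illustration only.

def pick_event_window(ancestry, primitive):
    """ancestry is ordered root -> ... -> child; each window is a dict."""
    # Mutual exclusion: walk from the root down; the eldest mutex holder
    # for this primitive wins outright (parent priority, like passive grabs).
    for w in ancestry:
        if primitive in w.get("mutex", set()):
            return w["name"], "exclusive"
    # Normal delivery: walk from the child up; the first window with an
    # initiation selection for the primitive is the normal event window.
    for w in reversed(ancestry):
        if primitive in w.get("init", set()):
            return w["name"], "normal"
    return None, None

tree = [
    {"name": "root", "mutex": {"3-finger-swipe"}},   # e.g. a window manager
    {"name": "toplevel", "init": {"pinch"}},
    {"name": "canvas", "init": {"pinch", "rotate"}},
]
assert pick_event_window(tree, "pinch") == ("canvas", "normal")
assert pick_event_window(tree, "3-finger-swipe") == ("root", "exclusive")
assert pick_event_window(tree, "rotate") == ("canvas", "normal")
```

If that reading is right, the grab wording really would map onto this almost one-to-one.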
> 3.1.3 Gesture Event Initiation
>
> Each touch event location occurs within a hierarchy of windows from the
> child window, the top-most window the touch occurred in, and the root window
> of the screen in which the touch event occurred. The common ancestry of all
> touch events is used for propagation.
>
> The common ancestry is traversed from child windows to parent windows to
> find the first window with a client selecting for initiation of the gesture
> primitive comprising the touches. The first window meeting this criteria is
> the normal event window.
>
> Propagation continues through the ancestry tree to find the eldest window
> with a client holding a mutual exclusion flag for the gesture primitive. If
> such a window and client are found, the event is sent to the client
> exclusively. Otherwise, the event is sent to all clients of the normal event
> window that have requested for the gesture primitive.
>
> 3.1.4 Gesture Event Continuation
>
> After clients are selected to receive events, any future gesture events for
> the same gesture primitive are sent to them exclusively, until the gesture
> primitive changes or the all touches end.
>
> When a gesture primitive during the lifetime of a gesture changes, the set
> of all clients selected for the gesture is checked for their continuation
> selection. If the new gesture primitive type is within the continuation
> selection for a client, it will continue to receive gesture events with the
> new primitive. Otherwise, the gesture will end for the client, and the
> client will be removed from the set of clients selected for receiving
> gesture events. Once all clients have ended their gesture events, new
> gesture events can occur through the gesture event initiation process.
>
> ❧❧❧❧❧❧❧❧❧❧❧
> 4. Gesture Primitive Events
>
> Gesture primitive events provide a complete picture of the gesture and the
> touches that comprise it.
> Each gesture provides typical X data such as the window ID of the window the
> gesture occurred within, and the time when the event occurred. The gesture
> specific data includes information such as the focus point of the event, the
> location and IDs of each touch comprising the gesture, the properties of the
> gesture, and the status of the gesture.
>
> The focus point is a point computed by the gesture recognition engine to
> provide context about where an event occurred. For example, the focus point
> of a rotation primitive is the pivot point of the rotation. The location and
> IDs of each touch point can be used to map each touch to a point on screen.
> The properties of a gesture event are dependent on the gesture primitive
> type. For example, a pan gesture has properties such as the change in the X
> and Y coordinates. The status of the gesture defines the state of the
> gesture through its lifetime.
>
> ❧❧❧❧❧❧❧❧❧❧❧
> 5. Gesture Engine
>
> A gesture engine is a client of X that recognizes gesture primitives. Only
> one engine may be registered at any moment. When the gesture engine
> registers, it provides the server with the set of gesture primitives it
> recognizes. A Primitive is defined by its name and properties. The name and
> properties are defined by strings, represented as X atoms in the protocol,
> and a property has a type that is also defined by a string in the same
> manner. For example, the type of a property may be "FLOAT" to designate an
> IEEE 754 floating point binary format of the property data.
>
> When a gesture engine registers or unregisters, the primitives it can
> recognize become available for other clients to select. Clients will be sent
> gesture availability update events if they have registered for them or if
> they are currently selecting for any gesture primitives. In the latter case,
> gesture primitive selections will be unregistered.
> An exception is made for mutual exclusion clients: their event selections
> are reset to an empty set, but not unregistered.
>
> It is necessary for a gesture engine to recognize mutually exclusive gesture
> primitives for each set of primitives defined for a given number of touches.
> This is a limitation required for proper selection and propagation of
> gestures. For example, two-finger pan, pinch, and rotate are mutually
> exclusive. A pan cannot also be a pinch nor a rotate. However, a gesture
> engine should not provide for pan, pinch, rotate, and a separate gesture
> primitive that is a combination of some or all of the other primitives.
>
> 5.1 Gesture Engine Operation
>
> Once a gesture engine is registered, it will begin receiving a stream of
> events from the X server. The events are buffered inside the server until a
> request by the engine is received with instructions for how to handle the
> events.
>
> When the engine recognizes a gesture primitive, it sends a gesture event to
> the server with the set of input event sequence numbers that comprise the
> gesture primitive. The server then selects and propagates the gesture event
> to clients. If clients were selected for propagation, the input events
> comprising the gesture primitive are discarded. Otherwise, the input events
> are released to propagate through to clients as normal XInput events.
>
> When the engine does not recognize any gesture primitive for a set of input
> events, it sends a request to the server to release the input events to
> propagate through to clients as normal XInput events.
>
> The server may set a timeout for receiving requests from the gesture engine.
> If no request from the engine is received within the timeout period, the
> server may release input events to propagate through to clients as normal
> XInput events. The timeout value is implementation specific.

Any guesses as to what this timeout may be?
Do we have any data on how long the average gesture takes to be recognised?
As I read this at the moment, this approach means that _any_ touch event is
delayed until the server gets the ok from the GE. The passive grab approach
for gesture recognition also means that any event is delayed if there is at
least one client that wants gestures on the root window. What's the impact on
the UI here?

> ❧❧❧❧❧❧❧❧❧❧❧
> 6. Errors
>
> Errors are sent using core X error reports.
>
> GestureEngineRegistered
>     A client attempts to register a gesture engine while an engine is
>     already registered.

This error isn't really necessary, you might as well return BadMatch from the
one request that may generate the error. It's the semantics that count
anyway.

> ❧❧❧❧❧❧❧❧❧❧❧
> 7. Requests:

<skip>

> 7.1 General Client Requests
>
> ┌───
> GestureQueryVersion

<skip>

> ┌───
> GestureQueryAvailableGestures

ListGestures would be more explanatory.

>     ▶
>     num_gestures: CARD16
>     gestures: LISTofGESTUREMAP
> └───
>
> GESTUREMAP { gesture_id: GESTUREID
>              gesture_type: ATOM }
>
> GestureQueryAvailableGestures details information about the gesture types
> that are recognizeable by the gesture recognizer registered with the server.
>
> Each gesture is detailed as follows:
> gesture_id
>     The unique ID of the gesture. Gesture IDs may be re-used when the
>     available gestures to be recognized changes.
> gesture_type
>     An ATOM specifying the type of gesture. The gesture type should conform
>     to the list of standard gesture types if applicable.

Why do we need id and type? Can there be more IDs of one type? The
QueryProperties request doesn't allow for per-id selection, so the ID seems
superfluous, it might as well be just the type, right?

> ┌───
> GestureListenForGestureChanges
>     listen: CARD8
> └───

tbh I got confused by the description below and I'm not sure what it actually
does. Though if I understand it correctly, this request should be folded into
the SelectEvents request.
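As a side note before the mask fields come up: section 3 defines a GESTUREMASK as (1 << gesture ID) and a SETofGESTUREMASK as the binary OR of zero or more of them. A quick sketch of that arithmetic, and of the holes that sparse or re-used gesture IDs leave in the mask, since that's what bothers me about binding bits to IDs (helper names are mine, not from the spec):

```python
# Sketch of the GESTUREMASK arithmetic from section 3 of the draft:
# a mask bit is (1 << gesture ID), a set of masks is their binary OR.

def gesture_mask(gesture_id):
    return 1 << gesture_id

def mask_set(*gesture_ids):
    out = 0
    for gid in gesture_ids:
        out |= gesture_mask(gid)
    return out

def selected(mask, gesture_id):
    return bool(mask & gesture_mask(gesture_id))

PAN, PINCH, ROTATE = 0, 1, 2
init_mask = mask_set(PAN, PINCH)
assert selected(init_mask, PAN)
assert not selected(init_mask, ROTATE)

# Binding bits to IDs means sparse IDs leave holes: if the only registered
# gestures happen to get IDs 0 and 9, every mask must still carry 10 bits.
sparse = mask_set(0, 9)
assert sparse.bit_length() == 10
```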
> GestureListenForGestureChanges registers the client to be sent a
> GesturesChanged event.
>
> listen
>     Inequal to 0 if the client wants to receive GestureChanged events

I don't quite see why this is necessary. If a client doesn't care, let it not
register the event mask, otherwise just send the event and let the client
ignore it. A simple bit in SelectEvents is enough here.

> On receipt of a GestureChanged event, the client may send the
> GestureQueryAvailableGestures request to receive the new list of available
> gestures.
> Note that all clients listening for any gesture events on any window will
> receive a GestureChanged event regardless of whether they have called this
> request with any value for listen. However, the server will honor the last
> listen value sent in this request whenever the client is not listening for
> gesture events.

I'm confused...

> ┌───
> GestureQueryGestureProperties
>     gesture_type: ATOM
>     ▶
>     num_properties: CARD16
>     properties: ListofGESTUREPROP
> └───
>
> GestureQueryGestureProperties details properties of the requested gesture
> type.
>
> ┌───
> GestureSelectEvents
>     window: Window
>     device_id: CARD16
>     flags: SETofGESTUREFLAG
>     mask_len: CARD16
>     init_mask: GESTUREMASK
>     cont_mask: GESTUREMASK
> └───
>
> window
>     The window to select the events on.
> device_id
>     Numerical device ID, or AllDevices.
> flags
>     Flags that may affect gesture recognition, selection, or propagation.
> mask_len
>     Length of mask in 4 byte units.
> init_mask
>     Gesture mask for initiation. A gesture mask for an event type T is
>     defined as (1 << T).

Don't do this (1 << T) thing, this wasn't one of the smarter decisions in
XI2. Simply define the masks as they are, don't bind them to event types.
Though it hasn't become a problem yet, I already ran into a few proposals
where this would either be too inflexible or would create holes in the mask
sets (the latter not really a problem, but...).

> cont_mask
>     Gesture mask for continuation.
> A gesture mask for an event type T is defined as (1 << T).
>
> GestureSelectEvents selects for gesture events on window.
>
> The mask sets the (and overwrites a previous) gesture event mask for the
> DEVICE specified through device_id. The device AllDevices is treated as a
> separate device by server. A client's gesture mask is the union of
> AllDevices and the per-device gesture mask.

I'd add a reference to the XI2 definition of AllDevices and AllMasterDevices
event mask handling here to avoid duplicating (and possibly accidentally
changing) the definition.

> The removal of a device from the server unsets the gesture masks for the
> device. If a gesture mask is set for AllDevices, the gesture mask is not
> cleared on device removal and affects all future devices.
>
> If mask_len is 0, the gesture mask for the given device is cleared. However,
> a client requesting for mutual exclusion may register for any valid mask_len
> length of mask with all bits set to 0. This allows a mutual exclusion client
> to prohibit any other client from gaining exclusive privilege.
>
> ┌───
> GestureGetSelectedEvents

<skip>

> ┌───
> GestureGetAllSelectedEvents
>     window: Window
>     ▶
>     num_masks: CARD8
>     masks: LISTofCLIENTEVENTMASK
> └───
>
> CLIENTEVENTMASK { client_id: CLIENT,
>                   device_id: DEVICE,
>                   mask_len: CARD8,
>                   init_mask: GESTUREMASK
>                   cont_mask: GESTUREMASK }

<skip>

> GestureGetAllSelectedEvents retrieves the gesture selections for all clients
> on the given window.

Is there a specific need for this request?

> 7.2 Gesture engine client requests
>
> ┌───
> GestureEngineRegister
>     num_gestures: CARD16
>     gestures: LISTofGESTUREINFO
> └───
>
> GESTUREINFO { gesture_type: ATOM,
>               num_properties: CARD16,
>               properties: LISTofGESTUREPROP }
>
> GestureEngineRegister is the mechanism by which a gesture engine registers
> with the X Gesture extension to be able to process gesture events. Only one
> gesture engine may be registered to the server at any given time.
> Further registration requests will cause a GestureEngineRegistered error.
> When the gesture engine is registered successfully, a GesturesChanged event
> is sent to all clients registered to listen for the event. The clients may
> then request the new list of available gestures from the server.
>
> ┌───
> GestureEngineUnregister
> └───
>
> GestureEngineUnregister unregisters the gesture engine from the server. If
> the client has not registered a gesture engine successfully through the
> GestureEngineRegister request, a BadValue error will result. Otherwise, a
> GesturesChanged event will be sent to all clients registered to listen for
> the event.
>
> ┌───
> GestureAllowInputEvents
>     num_ranges: CARD16
>     ranges: LISTofEVENTRANGE
> └───
>
> GestureAllowInputEvents instructs the server to flush the input events to
> clients unmodified. This is used when no gestures are recognized from
> sequences of input events.
> If any of the EVENTRANGE values are invalid, the BadValue error is reported
> and no input events are flushed.
>
> ┌───
> GestureRecognized
>     num_ranges: CARD16
>     ranges: LISTofEVENTRANGE

Sequence numbers are not suited for this type of range. The sequence number
set in the EVENTHEADER only increments if additional requests are processed.
For clients that purely listen to events and e.g. dump them into a file, the
sequence number does not change after the first set of requests.

>     gesture_id: CARD16
>     gesture_instance: CARD16
>     device_id: CARD16
>     root_x: Float
>     root_y: Float
>     event_x: Float
>     event_y: Float

Probably better to use the same type as in the XI2 spec.

>     status: STATUS
>     num_touches: CARD8
>     num_properties: CARD8
>     touches: TOUCHES
>     properties: LISTofCARD32
> └───
>
> See the GestureNotify event definition below for gesture data definitions.
>
> GestureRecognized instructs the server to perform propagation and selection
> for a potential GestureNotify event to be sent to clients.
> If clients are selected to receive events, the input events specified in the
> event ranges must be discarded by the server. Otherwise, the input events
> must be propagated to clients as defined by the XInput protocol.
>
> 8. Events:

<skip>

> The following event types are available in X Gesture.
>
> GesturesChanged
> GestureNotify
>
> All events have a set of common fields specified as EVENTHEADER.
>
> EVENTHEADER { type: BYTE
>               extension: BYTE
>               sequenceNumber: CARD16
>               length: CARD32
>               evtype: CARD16 }
>
> type
>     Always GenericEvent.
> extension
>     Always the X Gesture extension offset.
> sequenceNumber
>     Sequence number of last request processed by the server.
> length
>     Length in 4-byte units after the initial 32 bytes.
> evtype
>     X Gesture-specific event type.
>
> ┌───
> GesturesChanged:
>     EVENTHEADER
> └───
>
> A GesturesChanged event is sent whenever the gestures available from the
> server have changed. A client receiving this event may choose to request the
> new list of available gestures from the server.
>
> ┌───
> GestureNotify:
>     EVENTHEADER
>     gesture_id: CARD16
>     gesture_instance: CARD16
>     device_id: CARD16
>     time: Time
>     root: Window
>     event: Window
>     child: Window
>     root_x: Float
>     root_y: Float
>     event_x: Float
>     event_y: Float
>     status: STATUS
>     num_touches: CARD8
>     num_properties: CARD8
>     touches: TOUCHES
>     properties: LISTofCARD32
> └───
>
> STATUS { Begin, Update, Continue }
>
> TOUCHES { x: FLOAT
>           y: FLOAT
>           valuators_len CARD16
>           valuators LISTofFP3232 }
>
> x
> y
>     Location of touch in screen coordinates.
> valuators_len
>     Number of values in axisvalues.
> axisvalues

Need to pick one of axisvalues or valuators.

>     Values of all valuators for the touch. Valuators are defined in the
>     XInput protocol specification. The specific meaning of each valuator is
>     specific to the input device.
>
> A GestureNotify event is generated whenever a gesture occurs in a window for
> which the client has requested the gesture ID.
> gesture_id
>     Gesture ID of the gesture type.
> gesture_instance
>     Unique ID of this gesture instance. The ID will be maintained as the
>     gesture progresses from start to end as signified in the status field.
>     This value is monotonically increased by the server for every gesture
>     causing events to be sent to a client. The ID will only be reused once a
>     32-bit wrap occurs.
> device_id
>     X Input device ID of the slave device generating this gesture.

Why the slave device? Why not use device_id and source_id?

I think that's all I have for now, but I guess more will come up in the
discussion. Thanks again for this draft!

Cheers,
  Peter

_______________________________________________
[email protected]: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: http://lists.x.org/mailman/listinfo/xorg-devel
