Re: [PATCH v3 5/5] drm/i915: Implement proper fallback training for MST

2018-03-12 Thread Lyude Paul
On Mon, 2018-03-12 at 15:05 -0700, Manasi Navare wrote:
> On Fri, Mar 09, 2018 at 04:32:31PM -0500, Lyude Paul wrote:
> > For a while we actually haven't had any way of retraining MST links with
> > fallback link parameters like we do with SST. While uncommon, certain
> > setups such as my Caldigit TS3 + EVGA MST hub require this since
> > otherwise, they end up getting stuck in an infinite MST retraining loop.
> > 
> > MST retraining is somewhat different then SST retraining. While it's
> > possible during the normal link retraining sequence for a hub to indicate
> > bad link status, it's also possible for a hub to only indicate this
> > status through ESI messages and it's possible for this to happen after
> > the initial link training succeeds. This can lead to a pattern that
> > looks like this:
> > 
> > - Train MST link
> > - Training completes successfully
> > - MST hub sets Channel EQ failed bit in ESI
> > - Retraining starts
> > - Retraining completes successfully
> > - MST hub sets Channel EQ failed bit in ESI again
> > - Rinse and repeat
> > 
> > In these situations, we need to be able to actually trigger fallback
> > link training from the ESI handler as well, along with using the ESI
> > handler during retraining to figure out whether or not our retraining
> > actually succeeded.
> > 
> > This gets a bit more complicated since we have to ensure that we don't
> > block the ESI handler at all while doing retraining. If we do, due to
> > DisplayPort's general issues with being sensitive to IRQ latency most
> > MST hubs will just stop responding to us if their interrupts aren't
> > handled in a timely manner.
> > 
> > So: move retraining into it's own seperate handler. Running in a
> > seperate handler allows us to avoid stalling the ESI during link
> > retraining, and we can have the ESI signal that the channel EQ bit was
> > cleared through a simple completion struct. Additionally, we take care
> > to stick as much of this into the SST retraining path as possible since
> > sharing is caring.
> > 
> 
> Thanks for the patch for MST retraining. So just to confirm my understanding
> of the
> cases where MS retraining is handled:
> 1. On link the first link training failure during the modeset, this would
> just
> use SST modeset retry function and set the link status to BAD through
> drm_dp_mst_topology_mgr_lower_link_rate()
This shoyuld be the  the case for hubs that are a bit less awkward then mine.
I haven't actually seen the link training fail during the initial modeset once
on my setup here, only the channel EQ bit in the ESI handler ever seems to get
set.
> 
> 2. In case that it suceeds here but then loses synchronization in between,
> that time it will send IRQ_HPD and
> indicate this through ESI and the way its handled is through
> intel_dp_mst_check_link_status() and then through
> the separate mst_retrain_link work. And this time we first try to retrain at
> the current values for 5 times and
> then fallback and retry by sending hotplug uevent.
Yes. As well, there's two ways we could run into a situation that would count
as a failure:

 * (Note, this doesn't happen because I forgot to include it in this patch
   series, but it'll be fixed in the next revision) If the hub does everything
   it's supposed to and actually reports the link training status as failing
   through the registers that intel_dp_start_link_train() relies on, then
   intel_dp_start_link_train() will try five times; fail; and then we'll skip
   any additional attempts in intel_dp_retrain_link() and start the modeset
   retry work.
 * If the hub doesn't do everything that it's supposed to like mine does and
   only reports channel EQ failures through the ESI handler, we'll end up
   successfully link training; time out waiting for the ESI handler to signal
   through mst_retrain_completion that the channel EQ bit has been cleared,
   and repeat five times until we give up and fall back to a lower link rate
   with the modeset retry work.

> 
> Is this correct?
> 
> Manasi
> 
> > Signed-off-by: Lyude Paul 
> > Cc: Manasi Navare 
> > Cc: Ville Syrjälä 
> > ---
> >  drivers/gpu/drm/i915/intel_dp.c | 342 +++--
> > ---
> >  drivers/gpu/drm/i915/intel_dp_mst.c |  42 -
> >  drivers/gpu/drm/i915/intel_drv.h|   8 +
> >  3 files changed, 302 insertions(+), 90 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_dp.c
> > b/drivers/gpu/drm/i915/intel_dp.c
> > index 5645a194de92..7626652732b6 100644
> > --- a/drivers/gpu/drm/i915/intel_dp.c
> > +++ b/drivers/gpu/drm/i915/intel_dp.c
> > @@ -45,6 +45,8 @@
> >  
> >  #define DP_DPRX_ESI_LEN 14
> >  
> > +#define DP_MST_RETRAIN_TIMEOUT (msecs_to_jiffies(100))
> > +
> >  /* Compliance test status bits  */
> >  #define INTEL_DP_RESOLUTION_SHIFT_MASK 0
> >  #define INTEL_DP_RESOLUTION_PREFERRED  (1 <<
> > INTEL_DP_RESOLUTION_SHIFT_MASK)
> > @@ -4224,6 +4226,118 

Re: [PATCH v3 5/5] drm/i915: Implement proper fallback training for MST

2018-03-12 Thread Manasi Navare
On Fri, Mar 09, 2018 at 04:32:31PM -0500, Lyude Paul wrote:
> For a while we actually haven't had any way of retraining MST links with
> fallback link parameters like we do with SST. While uncommon, certain
> setups such as my Caldigit TS3 + EVGA MST hub require this since
> otherwise, they end up getting stuck in an infinite MST retraining loop.
> 
> MST retraining is somewhat different then SST retraining. While it's
> possible during the normal link retraining sequence for a hub to indicate
> bad link status, it's also possible for a hub to only indicate this
> status through ESI messages and it's possible for this to happen after
> the initial link training succeeds. This can lead to a pattern that
> looks like this:
> 
> - Train MST link
> - Training completes successfully
> - MST hub sets Channel EQ failed bit in ESI
> - Retraining starts
> - Retraining completes successfully
> - MST hub sets Channel EQ failed bit in ESI again
> - Rinse and repeat
> 
> In these situations, we need to be able to actually trigger fallback
> link training from the ESI handler as well, along with using the ESI
> handler during retraining to figure out whether or not our retraining
> actually succeeded.
> 
> This gets a bit more complicated since we have to ensure that we don't
> block the ESI handler at all while doing retraining. If we do, due to
> DisplayPort's general issues with being sensitive to IRQ latency most
> MST hubs will just stop responding to us if their interrupts aren't
> handled in a timely manner.
> 
> So: move retraining into it's own seperate handler. Running in a
> seperate handler allows us to avoid stalling the ESI during link
> retraining, and we can have the ESI signal that the channel EQ bit was
> cleared through a simple completion struct. Additionally, we take care
> to stick as much of this into the SST retraining path as possible since
> sharing is caring.
>

Thanks for the patch for MST retraining. So just to confirm my understanding of 
the
cases where MS retraining is handled:
1. On link the first link training failure during the modeset, this would just
use SST modeset retry function and set the link status to BAD through 
drm_dp_mst_topology_mgr_lower_link_rate()

2. In case that it suceeds here but then loses synchronization in between, that 
time it will send IRQ_HPD and
indicate this through ESI and the way its handled is through 
intel_dp_mst_check_link_status() and then through
the separate mst_retrain_link work. And this time we first try to retrain at 
the current values for 5 times and
then fallback and retry by sending hotplug uevent.

Is this correct?

Manasi

> Signed-off-by: Lyude Paul 
> Cc: Manasi Navare 
> Cc: Ville Syrjälä 
> ---
>  drivers/gpu/drm/i915/intel_dp.c | 342 
> +++-
>  drivers/gpu/drm/i915/intel_dp_mst.c |  42 -
>  drivers/gpu/drm/i915/intel_drv.h|   8 +
>  3 files changed, 302 insertions(+), 90 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
> index 5645a194de92..7626652732b6 100644
> --- a/drivers/gpu/drm/i915/intel_dp.c
> +++ b/drivers/gpu/drm/i915/intel_dp.c
> @@ -45,6 +45,8 @@
>  
>  #define DP_DPRX_ESI_LEN 14
>  
> +#define DP_MST_RETRAIN_TIMEOUT (msecs_to_jiffies(100))
> +
>  /* Compliance test status bits  */
>  #define INTEL_DP_RESOLUTION_SHIFT_MASK   0
>  #define INTEL_DP_RESOLUTION_PREFERRED(1 << 
> INTEL_DP_RESOLUTION_SHIFT_MASK)
> @@ -4224,6 +4226,118 @@ static void intel_dp_handle_test_request(struct 
> intel_dp *intel_dp)
>   DRM_DEBUG_KMS("Could not write test response to sink\n");
>  }
>  
> +/* Get a mask of the CRTCs that are running on the given intel_dp struct. For
> + * MST, this returns a crtc mask containing all of the CRTCs driving
> + * downstream sinks, for SST it just returns a mask of the attached
> + * connector's CRTC.
> + */
> +int
> +intel_dp_get_crtc_mask(struct intel_dp *intel_dp)
> +{
> + struct drm_device *dev = dp_to_dig_port(intel_dp)->base.base.dev;
> + struct drm_connector *connector;
> + struct drm_connector_state *conn_state;
> + struct intel_connector *intel_connector;
> + struct drm_crtc *crtc;
> + int crtc_mask = 0;
> +
> + WARN_ON(!drm_modeset_is_locked(>mode_config.connection_mutex));
> +
> + if (intel_dp->is_mst) {
> + struct drm_connector_list_iter conn_iter;
> +
> + drm_connector_list_iter_begin(dev, _iter);
> + for_each_intel_connector_iter(intel_connector, _iter) {
> + if (intel_connector->mst_port != intel_dp)
> + continue;
> +
> + conn_state = intel_connector->base.state;
> + if (!conn_state->crtc)
> + continue;
> +
> + crtc_mask |= drm_crtc_mask(conn_state->crtc);
> + }
> + 

Re: [PATCH v3 5/5] drm/i915: Implement proper fallback training for MST

2018-03-12 Thread Ville Syrjälä
On Fri, Mar 09, 2018 at 04:32:31PM -0500, Lyude Paul wrote:
> @@ -6266,25 +6368,98 @@ static bool intel_edp_init_connector(struct intel_dp 
> *intel_dp,
>   return false;
>  }
>  
> +static void intel_dp_mst_retrain_link_work(struct work_struct *work)
> +{

Why do we need another work for this? Can't we just retrain from the
hotplug work always?

> + struct drm_modeset_acquire_ctx ctx;
> + struct intel_dp *intel_dp = container_of(work, typeof(*intel_dp),
> +  mst_retrain_work);
> + struct intel_encoder *intel_encoder = _to_dig_port(intel_dp)->base;
> + struct drm_device *dev = intel_encoder->base.dev;
> + int ret;
> + bool had_error = false;
> +
> + drm_modeset_acquire_init(, 0);
> +
> + for (;;) {
> + ret = intel_dp_retrain_link(intel_encoder, );
> + if (ret == -EDEADLK) {
> + drm_modeset_backoff();
> + continue;
> + }
> +
> + break;
> + }
> + if (!ret) {
> + DRM_DEBUG_KMS("Retrain complete\n");
> + goto out;
> + } else if (ret == -EIO) {
> + DRM_ERROR("IO error with sink during retrain? Aborting\n");
> + had_error = true;
> + goto out;
> + }
> +
> + DRM_DEBUG_KMS("Retraining failed with %d, marking link status as bad\n",
> +   ret);
> +
> + /* We ran out of retries, if the sink hasn't changed the link rate in
> +  * it's dpcd yet force us to fallback to a lower link rate/count */
> + if (ret == -EINVAL) {
> + ret = intel_dp_get_dpcd(intel_dp);
> + if (!ret) {
> + DRM_ERROR("IO error while reading dpcd from sink\n");
> + had_error = true;
> + goto out;
> + }
> +
> + if (intel_dp->link_rate == intel_dp_max_link_rate(intel_dp) &&
> + intel_dp->lane_count == intel_dp_max_lane_count(intel_dp)) {
> + intel_dp_get_link_train_fallback_values(
> + intel_dp, intel_dp_max_link_rate(intel_dp),
> + intel_dp_max_lane_count(intel_dp));
> + }
> + }
> +
> + intel_dp->mst_link_is_bad = true;
> + intel_dp->mst_bw_locked = false;
> + schedule_work(_dp->modeset_retry_work);
> +out:
> + drm_modeset_drop_locks();
> + drm_modeset_acquire_fini();
> + if (had_error)
> + drm_kms_helper_hotplug_event(dev);
> +}
> +

-- 
Ville Syrjälä
Intel OTC
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v3 5/5] drm/i915: Implement proper fallback training for MST

2018-03-09 Thread Lyude Paul
For a while we actually haven't had any way of retraining MST links with
fallback link parameters like we do with SST. While uncommon, certain
setups such as my Caldigit TS3 + EVGA MST hub require this since
otherwise, they end up getting stuck in an infinite MST retraining loop.

MST retraining is somewhat different then SST retraining. While it's
possible during the normal link retraining sequence for a hub to indicate
bad link status, it's also possible for a hub to only indicate this
status through ESI messages and it's possible for this to happen after
the initial link training succeeds. This can lead to a pattern that
looks like this:

- Train MST link
- Training completes successfully
- MST hub sets Channel EQ failed bit in ESI
- Retraining starts
- Retraining completes successfully
- MST hub sets Channel EQ failed bit in ESI again
- Rinse and repeat

In these situations, we need to be able to actually trigger fallback
link training from the ESI handler as well, along with using the ESI
handler during retraining to figure out whether or not our retraining
actually succeeded.

This gets a bit more complicated since we have to ensure that we don't
block the ESI handler at all while doing retraining. If we do, due to
DisplayPort's general issues with being sensitive to IRQ latency most
MST hubs will just stop responding to us if their interrupts aren't
handled in a timely manner.

So: move retraining into it's own seperate handler. Running in a
seperate handler allows us to avoid stalling the ESI during link
retraining, and we can have the ESI signal that the channel EQ bit was
cleared through a simple completion struct. Additionally, we take care
to stick as much of this into the SST retraining path as possible since
sharing is caring.

Signed-off-by: Lyude Paul 
Cc: Manasi Navare 
Cc: Ville Syrjälä 
---
 drivers/gpu/drm/i915/intel_dp.c | 342 +++-
 drivers/gpu/drm/i915/intel_dp_mst.c |  42 -
 drivers/gpu/drm/i915/intel_drv.h|   8 +
 3 files changed, 302 insertions(+), 90 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 5645a194de92..7626652732b6 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -45,6 +45,8 @@
 
 #define DP_DPRX_ESI_LEN 14
 
+#define DP_MST_RETRAIN_TIMEOUT (msecs_to_jiffies(100))
+
 /* Compliance test status bits  */
 #define INTEL_DP_RESOLUTION_SHIFT_MASK 0
 #define INTEL_DP_RESOLUTION_PREFERRED  (1 << INTEL_DP_RESOLUTION_SHIFT_MASK)
@@ -4224,6 +4226,118 @@ static void intel_dp_handle_test_request(struct 
intel_dp *intel_dp)
DRM_DEBUG_KMS("Could not write test response to sink\n");
 }
 
+/* Get a mask of the CRTCs that are running on the given intel_dp struct. For
+ * MST, this returns a crtc mask containing all of the CRTCs driving
+ * downstream sinks, for SST it just returns a mask of the attached
+ * connector's CRTC.
+ */
+int
+intel_dp_get_crtc_mask(struct intel_dp *intel_dp)
+{
+   struct drm_device *dev = dp_to_dig_port(intel_dp)->base.base.dev;
+   struct drm_connector *connector;
+   struct drm_connector_state *conn_state;
+   struct intel_connector *intel_connector;
+   struct drm_crtc *crtc;
+   int crtc_mask = 0;
+
+   WARN_ON(!drm_modeset_is_locked(>mode_config.connection_mutex));
+
+   if (intel_dp->is_mst) {
+   struct drm_connector_list_iter conn_iter;
+
+   drm_connector_list_iter_begin(dev, _iter);
+   for_each_intel_connector_iter(intel_connector, _iter) {
+   if (intel_connector->mst_port != intel_dp)
+   continue;
+
+   conn_state = intel_connector->base.state;
+   if (!conn_state->crtc)
+   continue;
+
+   crtc_mask |= drm_crtc_mask(conn_state->crtc);
+   }
+   drm_connector_list_iter_end(_iter);
+   } else {
+   connector = _dp->attached_connector->base;
+   crtc = connector->state->crtc;
+
+   if (crtc)
+   crtc_mask |= drm_crtc_mask(crtc);
+   }
+
+   return crtc_mask;
+}
+
+static bool
+intel_dp_needs_link_retrain(struct intel_dp *intel_dp,
+   const u8 esi[DP_DPRX_ESI_LEN])
+{
+   u8 buf[max(DP_LINK_STATUS_SIZE, DP_DPRX_ESI_LEN)];
+   const u8 *link_status = NULL;
+
+   if (intel_dp->is_mst) {
+   if (!intel_dp->active_mst_links)
+   return false;
+   if (intel_dp->mst_link_is_bad)
+   return false;
+
+   if (esi) {
+   link_status = [10];
+   } else {
+   /* We're not running from the ESI handler, so wait a
+* little bit to see if the ESI handler lets us know
+