Re: [PATCH 1/3] drm/ttm: fix ttm_bo_bulk_move_helper
Am 08.09.2018 um 09:54 schrieb Huang Rui: On Fri, Sep 07, 2018 at 07:18:59PM +0800, Christian K�nig wrote: Am 07.09.2018 um 13:02 schrieb Huang, Ray: Yes, that was one problem. Another was that the cutting code was buggy and determined one of the positions to cut at the wrong time. I went through again about the list cutting behavior and wrote down with the attached picture. After do the second list_cut_position, the list2 should be point the end of "before" list. And list2 won't be used anymore after list cutting. May I know am I something missed? Let's take a look at the original code: list1 = is_swap ? >last->swap : >last->lru; list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev; list_cut_position(, lru, list1); list_cut_position(, , list2); As far as I can see the problem is that the first list_cur_position could modify the value of pos->first->lru.prev and so make the second list_cut_position work on the wrong position. I think I understood. In this case (the first element of LRU == pos->first->lru), please see the picture. If we store first->lru.prev as list2(LRU head), after do list cutting, the first->lru.prev will overwrite as new head (entries), however, the orignal list2 will still point previous head (that is the wrong position now). We actually expected to use the latest first->lru.prev as the second cutting position. So we should adjust code sequence like below: list1 = is_swap ? >last->swap : >last->lru; list_cut_position(, lru, list1); list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev; list_cut_position(, , list2); Am I understanding right? Yes, exactly. And the fix looks like what I did as well. Just only used one variable and added some extra BUG_ON(). Christian. Thanks, Ray Regards, Christian. Thanks, Ray From: amd-gfx on behalf of Christian König Sent: Thursday, September 6, 2018 6:06 PM To: Huang, Ray Cc: Michel Dänzer; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] drm/ttm: fix ttm_bo_bulk_move_helper Am 06.09.2018 um 12:02 schrieb Huang Rui: On Fri, Aug 31, 2018 at 05:17:33PM +0200, Christian König wrote: Am 31.08.2018 um 17:15 schrieb Michel Dänzer: On 2018-08-31 3:10 p.m., Christian König wrote: Staring at the function for six hours, just to essentially move one line of code. That sucks, but the commit log should describe what the problem was and how this patch solves it. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 35d53d81f486..138c98902033 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -250,15 +250,18 @@ EXPORT_SYMBOL(ttm_bo_move_to_lru_tail); static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos, struct list_head *lru, bool is_swap) { + struct list_head *list; LIST_HEAD(entries); LIST_HEAD(before); - struct list_head *list1, *list2; - list1 = is_swap ? >last->swap : >last->lru; - list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev; + reservation_object_assert_held(pos->last->resv); + list = is_swap ? >last->swap : >last->lru; + list_cut_position(, lru, list); + + reservation_object_assert_held(pos->first->resv); + list = is_swap ? pos->first->swap.prev : pos->first->lru.prev; + list_cut_position(, , list); So the problem was that the first list_cut_position call could result in list2 pointing to la-la-land? Good catch! Yes, exactly. Thought that would be obvious, but going to add that to the commit log. Can I get a tested-by? You where much better at reproducing that than I'm. Michel, Christian, thanks so much to take care of this when I was on vacation. And sorry to let you take a long time for finding the cause. Is that because I didn't hold the resveration before cut the list from position "first" and &
Re: [PATCH 1/3] drm/ttm: fix ttm_bo_bulk_move_helper
On Fri, Sep 07, 2018 at 07:18:59PM +0800, Christian K�nig wrote: > Am 07.09.2018 um 13:02 schrieb Huang, Ray: > > Yes, that was one problem. Another was that the cutting code was > buggy > and determined one of the positions to cut at the wrong time. > > I went through again about the list cutting behavior and wrote down with > the attached picture. > After do the second list_cut_position, the list2 should be point the end > of "before" list. And list2 won't be used anymore after list cutting. May I > know am I something missed? > > > Let's take a look at the original code: > > list1 = is_swap ? >last->swap : >last->lru; > list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev; > > list_cut_position(, lru, list1); > list_cut_position(, , list2); > > > As far as I can see the problem is that the first list_cur_position could > modify the value of pos->first->lru.prev and so make the second > list_cut_position work on the wrong position. > I think I understood. In this case (the first element of LRU == pos->first->lru), please see the picture. If we store first->lru.prev as list2(LRU head), after do list cutting, the first->lru.prev will overwrite as new head (entries), however, the orignal list2 will still point previous head (that is the wrong position now). We actually expected to use the latest first->lru.prev as the second cutting position. So we should adjust code sequence like below: list1 = is_swap ? >last->swap : >last->lru; list_cut_position(, lru, list1); list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev; list_cut_position(, , list2); Am I understanding right? Thanks, Ray > Regards, > Christian. > > > > Thanks, > Ray > > From: amd-gfx on behalf of > Christian König > Sent: Thursday, September 6, 2018 6:06 PM > To: Huang, Ray > Cc: Michel Dänzer; amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH 1/3] drm/ttm: fix ttm_bo_bulk_move_helper > > > Am 06.09.2018 um 12:02 schrieb Huang Rui: > > On Fri, Aug 31, 2018 at 05:17:33PM +0200, Christian König wrote: > > Am 31.08.2018 um 17:15 schrieb Michel Dänzer: > > On 2018-08-31 3:10 p.m., Christian König wrote: > > Staring at the function for six hours, just to > essentially move one line > of code. > > That sucks, but the commit log should describe what the > problem was and > how this patch solves it. > > > > Signed-off-by: Christian König > --- >drivers/gpu/drm/ttm/ttm_bo.c | 13 - >1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c > b/drivers/gpu/drm/ttm/ttm_bo.c > index 35d53d81f486..138c98902033 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -250,15 +250,18 @@ > EXPORT_SYMBOL(ttm_bo_move_to_lru_tail); >static void ttm_bo_bulk_move_helper(struct > ttm_lru_bulk_move_pos *pos, > struct list_head *lru, > bool is_swap) >{ > + struct list_head *list; > LIST_HEAD(entries); > LIST_HEAD(before); > - struct list_head *list1, *list2; > - list1 = is_swap ? >last->swap : >last->lru; > - list2 = is_swap ? pos->first->swap.prev : > pos->first->lru.prev; > + reservation_object_assert_held(pos->last->resv); > + list = is_swap ? >last->swap : >last->lru; > + list_cut_position(, lru, list); > + > + reservation_object_assert_held(pos->first->resv); > + list = is_swap ? pos->first->swap.prev : > pos->first->lru.prev; > + list_cut_position(, , list); > > So the problem was that the first list_cut_position call > could result in > list2 pointing to la-la-land? Good catch! > > Yes, exactly. Thought that would be obvious, but going to add that > to the commit log. > > Can I get a tested-by? You where much better at reproducing that >
Re: [PATCH 1/3] drm/ttm: fix ttm_bo_bulk_move_helper
Am 07.09.2018 um 13:02 schrieb Huang, Ray: Yes, that was one problem. Another was that the cutting code was buggy and determined one of the positions to cut at the wrong time. I went through again about the list cutting behavior and wrote down with the attached picture. After do the second list_cut_position, the list2 should be point the end of "before" list. And list2 won't be used anymore after list cutting. May I know am I something missed? Let's take a look at the original code: list1 = is_swap ? >last->swap : >last->lru; list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev; list_cut_position(, lru, list1); list_cut_position(, , list2); As far as I can see the problem is that the first list_cur_position could modify the value of pos->first->lru.prev and so make the second list_cut_position work on the wrong position. Regards, Christian. Thanks, Ray From: amd-gfx on behalf of Christian König Sent: Thursday, September 6, 2018 6:06 PM To: Huang, Ray Cc: Michel Dänzer; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] drm/ttm: fix ttm_bo_bulk_move_helper Am 06.09.2018 um 12:02 schrieb Huang Rui: On Fri, Aug 31, 2018 at 05:17:33PM +0200, Christian König wrote: Am 31.08.2018 um 17:15 schrieb Michel Dänzer: On 2018-08-31 3:10 p.m., Christian König wrote: Staring at the function for six hours, just to essentially move one line of code. That sucks, but the commit log should describe what the problem was and how this patch solves it. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 35d53d81f486..138c98902033 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -250,15 +250,18 @@ EXPORT_SYMBOL(ttm_bo_move_to_lru_tail); static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos, struct list_head *lru, bool is_swap) { + struct list_head *list; LIST_HEAD(entries); LIST_HEAD(before); - struct list_head *list1, *list2; - list1 = is_swap ? >last->swap : >last->lru; - list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev; + reservation_object_assert_held(pos->last->resv); + list = is_swap ? >last->swap : >last->lru; + list_cut_position(, lru, list); + + reservation_object_assert_held(pos->first->resv); + list = is_swap ? pos->first->swap.prev : pos->first->lru.prev; + list_cut_position(, , list); So the problem was that the first list_cut_position call could result in list2 pointing to la-la-land? Good catch! Yes, exactly. Thought that would be obvious, but going to add that to the commit log. Can I get a tested-by? You where much better at reproducing that than I'm. Michel, Christian, thanks so much to take care of this when I was on vacation. And sorry to let you take a long time for finding the cause. Is that because I didn't hold the resveration before cut the list from position "first" and "last"? Yes, that was one problem. Another was that the cutting code was buggy and determined one of the positions to cut at the wrong time. May I know in which cases, we must hold the bo's reservation firstly? BOs are reserved to prevent moving them. E.g. when the BO isn't reserved it can move around and so the LRU where you want to add/remove it could change. Christian. Thanks, Ray ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx amd-gfx Info Page - freedesktop.org lists.freedesktop.org To see the collection of prior postings to the list, visit the amd-gfx Archives.. Using amd-gfx: To post a message to all the list members, send email to amd-gfx@lists.freedesktop.org. You can subscribe to the list, or change your existing subscription, in the sections below. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 1/3] drm/ttm: fix ttm_bo_bulk_move_helper
Am 06.09.2018 um 12:02 schrieb Huang Rui: On Fri, Aug 31, 2018 at 05:17:33PM +0200, Christian König wrote: Am 31.08.2018 um 17:15 schrieb Michel Dänzer: On 2018-08-31 3:10 p.m., Christian König wrote: Staring at the function for six hours, just to essentially move one line of code. That sucks, but the commit log should describe what the problem was and how this patch solves it. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 35d53d81f486..138c98902033 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -250,15 +250,18 @@ EXPORT_SYMBOL(ttm_bo_move_to_lru_tail); static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos, struct list_head *lru, bool is_swap) { + struct list_head *list; LIST_HEAD(entries); LIST_HEAD(before); - struct list_head *list1, *list2; - list1 = is_swap ? >last->swap : >last->lru; - list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev; + reservation_object_assert_held(pos->last->resv); + list = is_swap ? >last->swap : >last->lru; + list_cut_position(, lru, list); + + reservation_object_assert_held(pos->first->resv); + list = is_swap ? pos->first->swap.prev : pos->first->lru.prev; + list_cut_position(, , list); So the problem was that the first list_cut_position call could result in list2 pointing to la-la-land? Good catch! Yes, exactly. Thought that would be obvious, but going to add that to the commit log. Can I get a tested-by? You where much better at reproducing that than I'm. Michel, Christian, thanks so much to take care of this when I was on vacation. And sorry to let you take a long time for finding the cause. Is that because I didn't hold the resveration before cut the list from position "first" and "last"? Yes, that was one problem. Another was that the cutting code was buggy and determined one of the positions to cut at the wrong time. May I know in which cases, we must hold the bo's reservation firstly? BOs are reserved to prevent moving them. E.g. when the BO isn't reserved it can move around and so the LRU where you want to add/remove it could change. Christian. Thanks, Ray ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 1/3] drm/ttm: fix ttm_bo_bulk_move_helper
On Fri, Aug 31, 2018 at 05:17:33PM +0200, Christian König wrote: > Am 31.08.2018 um 17:15 schrieb Michel Dänzer: > >On 2018-08-31 3:10 p.m., Christian König wrote: > >>Staring at the function for six hours, just to essentially move one line > >>of code. > >That sucks, but the commit log should describe what the problem was and > >how this patch solves it. > > > > > >>Signed-off-by: Christian König > >>--- > >> drivers/gpu/drm/ttm/ttm_bo.c | 13 - > >> 1 file changed, 8 insertions(+), 5 deletions(-) > >> > >>diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > >>index 35d53d81f486..138c98902033 100644 > >>--- a/drivers/gpu/drm/ttm/ttm_bo.c > >>+++ b/drivers/gpu/drm/ttm/ttm_bo.c > >>@@ -250,15 +250,18 @@ EXPORT_SYMBOL(ttm_bo_move_to_lru_tail); > >> static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos, > >>struct list_head *lru, bool is_swap) > >> { > >>+ struct list_head *list; > >>LIST_HEAD(entries); > >>LIST_HEAD(before); > >>- struct list_head *list1, *list2; > >>- list1 = is_swap ? >last->swap : >last->lru; > >>- list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev; > >>+ reservation_object_assert_held(pos->last->resv); > >>+ list = is_swap ? >last->swap : >last->lru; > >>+ list_cut_position(, lru, list); > >>+ > >>+ reservation_object_assert_held(pos->first->resv); > >>+ list = is_swap ? pos->first->swap.prev : pos->first->lru.prev; > >>+ list_cut_position(, , list); > >So the problem was that the first list_cut_position call could result in > >list2 pointing to la-la-land? Good catch! > > Yes, exactly. Thought that would be obvious, but going to add that > to the commit log. > > Can I get a tested-by? You where much better at reproducing that than I'm. > Michel, Christian, thanks so much to take care of this when I was on vacation. And sorry to let you take a long time for finding the cause. Is that because I didn't hold the resveration before cut the list from position "first" and "last"? May I know in which cases, we must hold the bo's reservation firstly? Thanks, Ray ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 1/3] drm/ttm: fix ttm_bo_bulk_move_helper
On 08/31/2018 09:10 PM, Christian König wrote: Staring at the function for six hours, just to essentially move one line of code. Signed-off-by: Christian König Reviewed-by: Junwei Zhang --- drivers/gpu/drm/ttm/ttm_bo.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 35d53d81f486..138c98902033 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -250,15 +250,18 @@ EXPORT_SYMBOL(ttm_bo_move_to_lru_tail); static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos, struct list_head *lru, bool is_swap) { + struct list_head *list; LIST_HEAD(entries); LIST_HEAD(before); - struct list_head *list1, *list2; - list1 = is_swap ? >last->swap : >last->lru; - list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev; + reservation_object_assert_held(pos->last->resv); + list = is_swap ? >last->swap : >last->lru; + list_cut_position(, lru, list); + + reservation_object_assert_held(pos->first->resv); + list = is_swap ? pos->first->swap.prev : pos->first->lru.prev; + list_cut_position(, , list); - list_cut_position(, lru, list1); - list_cut_position(, , list2); list_splice(, lru); list_splice_tail(, lru); } ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 1/3] drm/ttm: fix ttm_bo_bulk_move_helper
Am 31.08.2018 um 17:15 schrieb Michel Dänzer: On 2018-08-31 3:10 p.m., Christian König wrote: Staring at the function for six hours, just to essentially move one line of code. That sucks, but the commit log should describe what the problem was and how this patch solves it. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 35d53d81f486..138c98902033 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -250,15 +250,18 @@ EXPORT_SYMBOL(ttm_bo_move_to_lru_tail); static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos, struct list_head *lru, bool is_swap) { + struct list_head *list; LIST_HEAD(entries); LIST_HEAD(before); - struct list_head *list1, *list2; - list1 = is_swap ? >last->swap : >last->lru; - list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev; + reservation_object_assert_held(pos->last->resv); + list = is_swap ? >last->swap : >last->lru; + list_cut_position(, lru, list); + + reservation_object_assert_held(pos->first->resv); + list = is_swap ? pos->first->swap.prev : pos->first->lru.prev; + list_cut_position(, , list); So the problem was that the first list_cut_position call could result in list2 pointing to la-la-land? Good catch! Yes, exactly. Thought that would be obvious, but going to add that to the commit log. Can I get a tested-by? You where much better at reproducing that than I'm. Christian. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 1/3] drm/ttm: fix ttm_bo_bulk_move_helper
On 2018-08-31 3:10 p.m., Christian König wrote: > Staring at the function for six hours, just to essentially move one line > of code. That sucks, but the commit log should describe what the problem was and how this patch solves it. > Signed-off-by: Christian König > --- > drivers/gpu/drm/ttm/ttm_bo.c | 13 - > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index 35d53d81f486..138c98902033 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -250,15 +250,18 @@ EXPORT_SYMBOL(ttm_bo_move_to_lru_tail); > static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos, > struct list_head *lru, bool is_swap) > { > + struct list_head *list; > LIST_HEAD(entries); > LIST_HEAD(before); > - struct list_head *list1, *list2; > > - list1 = is_swap ? >last->swap : >last->lru; > - list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev; > + reservation_object_assert_held(pos->last->resv); > + list = is_swap ? >last->swap : >last->lru; > + list_cut_position(, lru, list); > + > + reservation_object_assert_held(pos->first->resv); > + list = is_swap ? pos->first->swap.prev : pos->first->lru.prev; > + list_cut_position(, , list); So the problem was that the first list_cut_position call could result in list2 pointing to la-la-land? Good catch! -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx