Re: [PATCH v2 0/4] Update memcpy, memset etc. for M7/M8 architectures
David, Thanks for applying.

On 8/10/2017 4:38 PM, David Miller wrote:
> From: Babu Moger
> Date: Mon, 7 Aug 2017 17:52:48 -0600
>
>> This series of patches updates the memcpy, memset, copy_to_user,
>> copy_from_user etc for SPARC M7/M8 architecture.
>
> This doesn't build, you cannot assume the existence of "%ncc", it is
> a recent addition. Furthermore there is no need to ever use %ncc in
> v9 targetted code anyways.
>
> I'll fix that up, but this was a really disappointing build failure
> to hit.

Thank you..

> Meanwhile, two questions:
>
> 1) Is this also faster on T4 as well? If it is, we can just get rid
>    of the T4 routines and use this on those chips as well.

At the time of this work, our focus was mostly on T7 and T8. We did not test this code on T4. For T4 and other older configs we used NG4 versions. I would think it would require some changes to make it work on T4.

> 2) There has been a lot of discussion and consideration put into how
>    a memcpy/memset routine might be really great for the local cpu
>    but overall pessimize performance for other cpus either locally on
>    the same core (contention for physical resources such as ports to
>    the store buffer and/or L3 cache) or on other cores.
>
>    Has any such study been done into these issues wrt. this new code?

No, we have not done this kind of study.
Re: [PATCH v2 0/4] Update memcpy, memset etc. for M7/M8 architectures
From: Babu Moger
Date: Mon, 7 Aug 2017 17:52:48 -0600

> This series of patches updates the memcpy, memset, copy_to_user,
> copy_from_user etc for SPARC M7/M8 architecture.

This doesn't build, you cannot assume the existence of "%ncc", it is a recent addition. Furthermore there is no need to ever use %ncc in v9 targetted code anyways.

I'll fix that up, but this was a really disappointing build failure to hit.

Meanwhile, two questions:

1) Is this also faster on T4 as well? If it is, we can just get rid of the T4 routines and use this on those chips as well.

2) There has been a lot of discussion and consideration put into how a memcpy/memset routine might be really great for the local cpu but overall pessimize performance for other cpus either locally on the same core (contention for physical resources such as ports to the store buffer and/or L3 cache) or on other cores.

   Has any such study been done into these issues wrt. this new code?