Re: [vpp-dev] Multiarch/target select for dpdk_device_input

Damjan Marion Fri, 01 Jun 2018 09:10:04 -0700

Dear Nitin,

I really don't have anything else to add. It's your call how you want to
proceed...


Regards,

Damjan

> On 1 Jun 2018, at 18:02, Nitin Saxena <nitin.sax...@cavium.com> wrote:
> 
> Hi Damjan,
> 
> Answers inline.
> 
> Thanks,
> Nitin
> 
> On Friday 01 June 2018 08:49 PM, Damjan Marion wrote:
>> Hi Nitin,
>> inline...
>>> On 1 Jun 2018, at 15:23, Nitin Saxena <nitin.sax...@cavium.com> wrote:
>>> 
>>> Hi Damjan,
>>> 
>>>> It was hard to know that you have a subset of patches hidden somewhere.
>>> I wouldn't say the patches are hidden. We are trying to fine-tune
>>> dpdk-input on our end first, and later we will seek your expertise while
>>> upstreaming.
>> For me they were hidden.
>>>> Typically it makes sense to discuss such changes with the person who
>>>> "maintains" the code before starting to write the code.
>>> Agreed. However, we prefer to do an internal analysis/POC before reaching
>>> out to MAINTAINERS. That way we can better understand code review comments.
>> Perfectly fine, but then don't blame us for not knowing that you are
>> doing something internally...
> The intention was not to blame anybody, but to understand the modular
> approach in vpp for accommodating multiple architectures.
>>> 
>>>> Maybe, but it sounds to me like we are still in the guessing phase.
>>> I wouldn't do any guesswork with MAINTAINERS.
>>> 
>>>> Maybe we even need a different function for each ARM CPU core, as they
>>>> may have different memory subsystems and pipelines...
>>> This is what I am looking for. Is it OK to detect our hardware natively
>>> from autoconf and append a target-specific macro to CFLAGS, and then have a
>>> separate function for our target in dpdk/device/node.c? Sorry, my
>>> multi-arch select example was incorrect; that's not what I am looking for.
>> I will be able to help here once I have a reasonable understanding of what
>> the "big" plan is.
> The "big" plan is to optimize each vpp node for AArch64. For now the focus
> is dpdk-input.
>> I don't want us to end up, 6 months from now, with Cavium patches, NXP
>> patches, Marvell patches, and so on.
> Is that a problem? If so, I am not able to see why, as the same problem
> would exist for any architecture, not just AArch64.
>>> 
>>>> Is there an agreement between ARM vendors on which core you want the code
>>>> tuned for, or are you simply tuning for whatever core Cavium uses?
>>> I am trying to optimize for Cavium's SoC, and my question is only in that
>>> regard. However, the ARM community is also working on optimizing for
>>> Cortex cores.
>> What about agreeing on a plan for optimising for all ARM cores first, and
>> then starting the optimisation?
> This is a cross-company question, so it is hard to answer, but Cavium has
> the "big" plan described above.
>>> 
>>> Thanks,
>>> Nitin
>>> 
>>> On Friday 01 June 2018 01:55 AM, Damjan Marion wrote:
>>>> inline...
>>>> -- 
>>>> Damjan
>>>>> On 31 May 2018, at 21:10, Saxena, Nitin <nitin.sax...@cavium.com> wrote:
>>>>> 
>>>>> Hi Damjan,
>>>>> 
>>>>> Answers inline.
>>>>> 
>>>>> Thanks,
>>>>> Nitin
>>>>> 
>>>>>> On 01-Jun-2018, at 12:15 AM, Damjan Marion <dmarion.li...@gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> Dear Nitin,
>>>>>> 
>>>>>> See inline….
>>>>>> 
>>>>>> 
>>>>>>> On 31 May 2018, at 19:59, Nitin Saxena <nitin.sax...@cavium.com> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I am working on optimising the dpdk-input node (based on vpp v1804) for
>>>>>>> our target. I am able to get performance improvements on our target, but
>>>>>>> the problems I am finding now are:
>>>>>>> 
>>>>>>> 1) The dpdk-input code on the master branch has changed completely since
>>>>>>> v1804.
>>>>>> 
>>>>>> Why is this a problem? It was done with reason and for tangible benefit.
>>>>> This is a problem for me because I cannot apply my v1804 changes directly
>>>>> to the master branch. I would have to rework them on the master branch,
>>>>> which is why I am not able to move to the master branch or to v1807 in
>>>>> the future.
>>>> It was hard to know that you have a subset of patches hidden somewhere.
>>>> Typically it makes sense to discuss such changes with the person who
>>>> "maintains" the code before starting to write the code.
>>>>>> 
>>>>>>> Not to mention that the dpdk-input code on the master branch does not
>>>>>>> give better numbers on our target than v1804 does.
>>>>>> 
>>>>>> Sad to hear that; the good thing is, it gives better numbers on x86.
>>>>> As I understand it, one dpdk_device_input function cannot be the same for
>>>>> all architectures, because if the underlying micro-architecture is
>>>>> different, the hot spots change.
>>>> Maybe, but it sounds to me like we are still in the guessing phase.
>>>> Maybe we even need a different function for each ARM CPU core, as they may
>>>> have different memory subsystems and pipelines...
>>>> Is there an agreement between ARM vendors on which core you want the code
>>>> tuned for, or are you simply tuning for whatever core Cavium uses?
>>>>> I have seen the dpdk-input changes on the master branch and, on a
>>>>> positive note, those changes make sense. However, some of the code is
>>>>> tuned for x86, especially Skylake. I was looking for some way to have a
>>>>> multiarch select function for the Rx path, like the way it's done for the
>>>>> Tx path.
>>>> Not sure why you need that, unless you are going to have code optimised
>>>> for different CPU variants (e.g. Cortex-A53 and Cortex-A72) in the same
>>>> binary.
>>>>>> 
>>>>>>> 2) I don't know which modular approach I should follow to merge my
>>>>>>> changes, as I have completely changed the quad-loop handling and the
>>>>>>> prefetch order in dpdk-input.
>>>>>> 
>>>>>> I carefully tuned that code. It was a multi-day exercise, and losing even
>>>>>> a single clock per packet on x86 because of additional modifications is
>>>>>> not acceptable. Still, I'm open to discussing how to address this problem.
>>>>>> 
>>>>>>> 
>>>>>>> Note: I am currently far from upstreaming the code, as my optimisation
>>>>>>> is still in progress. It would be better to know the proper way of doing
>>>>>>> it up front.
>>>>>> 
>>>>>> I suggest that you don't even start working on upstreaming before we have
>>>>>> a deep understanding of what needs to be done and why, and we are all in
>>>>>> agreement.
>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Nitin
>>> 
>>> 
> 
>
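
Damjan's remark about code optimised for Cortex-A53 and Cortex-A72 "in the same binary" describes runtime selection: several builds of the same Rx loop are linked in, and one is chosen once at startup based on the detected CPU. The sketch below illustrates that pattern in plain C under stated assumptions. It is not VPP's multiarch mechanism: the function names, the simplified signature and the placeholder bodies are hypothetical, and it assumes the CPU is identified from the "CPU implementer" and "CPU part" lines of Linux /proc/cpuinfo on AArch64 (0x41 is Arm Ltd.; 0xd03 and 0xd08 are the Cortex-A53 and Cortex-A72 part numbers).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified signature; a real VPP node function takes
   vlib_main_t *, vlib_node_runtime_t * and vlib_frame_t *. */
typedef unsigned (*rx_fn_t) (void);

/* Placeholder variants of the same Rx loop, one per tuning target.
   Each body just returns an id so the selected variant is observable. */
static unsigned rx_generic (void) { return 0; }
static unsigned rx_cortex_a53 (void) { return 53; }
static unsigned rx_cortex_a72 (void) { return 72; }

#define ARM_IMPLEMENTER_ARM 0x41UL /* "CPU implementer" of Arm Ltd. cores */
#define ARM_PART_CORTEX_A53 0xd03UL
#define ARM_PART_CORTEX_A72 0xd08UL

/* Return the numeric value after "<key> :" in /proc/cpuinfo-style text,
   or 0 if the key is missing. Only the first occurrence (first core) is
   read, so heterogeneous big.LITTLE systems are not handled. */
static unsigned long
cpuinfo_field (const char *text, const char *key)
{
  const char *p = strstr (text, key);
  if (p == NULL)
    return 0;
  p = strchr (p, ':');
  if (p == NULL)
    return 0;
  /* Base 0 accepts the "0x..." hex form printed by the kernel. */
  return strtoul (p + 1, NULL, 0);
}

/* Pick the Rx variant once (e.g. at init) so the hot path only pays for
   one indirect call instead of re-checking the CPU per packet. */
static rx_fn_t
select_rx_fn (const char *cpuinfo)
{
  unsigned long impl = cpuinfo_field (cpuinfo, "CPU implementer");
  unsigned long part = cpuinfo_field (cpuinfo, "CPU part");

  if (impl == ARM_IMPLEMENTER_ARM && part == ARM_PART_CORTEX_A53)
    return rx_cortex_a53;
  if (impl == ARM_IMPLEMENTER_ARM && part == ARM_PART_CORTEX_A72)
    return rx_cortex_a72;
  return rx_generic; /* unknown core: fall back to the portable loop */
}
```

At init one would read /proc/cpuinfo into a buffer and store select_rx_fn (buf) in a function pointer. The compile-time alternative Nitin proposes (autoconf detecting the host and appending a target macro to CFLAGS) would instead pick one variant per build with #ifdef, avoiding the indirect call but requiring a separate binary per target.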
