Re: [vpp-dev] Multiarch/target select for dpdk_device_input

2018-06-02 Thread Damjan Marion

Dear Nitin,

Anybody is free to submit a patch to gerrit.fd.io or to
comment on any patch there.

Regards,

Damjan

> On 2 Jun 2018, at 03:11, Nitin Saxena  wrote:
> 
> Hi Damjan,
> 
> If VPP is an open-source project that supports multiple architectures, then 
> there should be a review of every commit which provides others using the open 
> source project an opportunity to raise their concerns. So my request is to 
> post changes for review before they are committed to ensure VPP stays true to 
> open-source philosophy. Please let me know if this is possible. If not, i'd 
> like to understand the reasons for it.
> 
> Regards,
> Nitin
> 

Re: [vpp-dev] Multiarch/target select for dpdk_device_input

2018-06-01 Thread Nitin Saxena
Hi Damjan,

If VPP is an open-source project that supports multiple architectures, then 
there should be a review of every commit which provides others using the open 
source project an opportunity to raise their concerns. So my request is to post 
changes for review before they are committed to ensure VPP stays true to 
open-source philosophy. Please let me know if this is possible. If not, I'd 
like to understand the reasons for it.

Regards,
Nitin

On 02-Jun-2018, at 00:17, Damjan Marion <dmar...@me.com> wrote:

> Dear Nitin,
>
> That doesn't work that way.
>
> Regards,
>
> Damjan


Re: [vpp-dev] Multiarch/target select for dpdk_device_input

2018-06-01 Thread Damjan Marion

Dear Nitin,

That doesn't work that way. 

Regards,

Damjan

> On 1 Jun 2018, at 19:41, Saxena, Nitin  wrote:
> 
> Hi Damjan,
> 
>  Now that you are aware that Cavium is working on optimisations for ARM, can 
> I request that you check with us on implications for ARM(at least Cavium), 
> before bringing changes in dpdk-input? 
> 
> Regards,
> Nitin
> 

Re: [vpp-dev] Multiarch/target select for dpdk_device_input

2018-06-01 Thread Nitin Saxena
Hi Damjan,

 Now that you are aware that Cavium is working on optimisations for ARM, can I 
request that you check with us on implications for ARM(at least Cavium), before 
bringing changes in dpdk-input?

Regards,
Nitin

On 01-Jun-2018, at 21:39, Damjan Marion <dmar...@me.com> wrote:

> Dear Nitin,
>
> I really don't have anything else to add. It's your call how you want to
> proceed.
>
> Regards,
>
> Damjan


Re: [vpp-dev] Multiarch/target select for dpdk_device_input

2018-06-01 Thread Damjan Marion

Dear Nitin,

I really don't have anything else to add. It's your call how you want to 
proceed.

Regards,

Damjan


Re: [vpp-dev] Multiarch/target select for dpdk_device_input

2018-06-01 Thread Nitin Saxena

Hi Damjan,

Answers Inline.

Thanks,
Nitin

On Friday 01 June 2018 08:49 PM, Damjan Marion wrote:
> Hi Nitin,
> inline...
>
>> On 1 Jun 2018, at 15:23, Nitin Saxena wrote:
>>
>> Hi Damjan,
>>
>>> It was hard to know that you have subset of patches hidden somewhere.
>> I wouldn't say patches are hidden. We are trying to fine tune dpdk-input
>> initially from our end first and later we will seek your expertise while
>> upstreaming.
> for me they were hidden.
>>> Typically it makes sense to discuss such kind of changes with person
>>> who "maintains" the code before starting writing the code.
>> Agreed. However we prefer to do internal analysis/POC first before
>> reaching out to MAINTAINERS. That way we can better understand code
>> review comments.
> Perfectly fine, but then don't put blame on us for not knowing that you are
> doing something internally...
The intention was not to blame anybody but to understand modular
approach in vpp to accommodate multi-arch(s).

>>> Maybe, but sounds to me like we are still in guessing phase.
>> I wouldn't do any guess work with MAINTAINERS.
>>
>>> Maybe we even need different function for each ARM CPU core as they
>>> maybe have different memory subsystem and pipeline
>> This is what I am looking for. Is it ok to detect our hardware natively
>> from autoconf and append target specific macro to CFLAGS? And then
>> separate function for our target in dpdk/device/node.c? Sorry my
>> multi-arch select example was incorrect and that's not what I am
>> looking at.
> Here I will be able to help when I get reasonable understanding what is the
> "big" plan.
The "Big" plan is to optimize each vpp node for Aarch64. For now focus
is dpdk-input.

> I don't want that we end up in 6 months with cavium patches, nxp patches,
> marvell patches, and so on.
Is it a problem? If yes then I am not able to visualize it as the same
problem would exist for any architecture and not just for Aarch64.

>>> Is there an agreement between ARM vendors what is the targeted core
>>> you want to have code tuned for or you are simply tuning to whatever
>>> core Cavium uses?
>> I am trying to optimize Cavium's SOC. This question is in this regard
>> only. However efforts are going on optimizing Cortex cores as well by
>> ARM community.
> What about agreeing on plan for optimising on all ARM cores, and then
> starting doing optimisation?
This is cross-company question so hard to answer but Cavium has the
"big" plan described above.

>> Thanks,
>> Nitin


Re: [vpp-dev] Multiarch/target select for dpdk_device_input

2018-06-01 Thread Nitin Saxena

Hi Damjan,

> It was hard to know that you have subset of patches hidden somewhere.
I wouldn't say patches are hidden. We are trying to fine tune dpdk-input 
initially from our end first and later we will seek your expertise while 
upstreaming.


> Typically it makes sense to discuss such kind of changes with person
> who "maintains" the code before starting writing the code.
Agreed. However we prefer to do internal analysis/POC first before 
reaching out to MAINTAINERS. That way we can better understand code 
review comments.


> Maybe, but sounds to me like we are still in guessing phase.
I wouldn't do any guess work with MAINTAINERS.

> Maybe we even need different function for each ARM CPU core as they
> maybe have different memory subsystem and pipeline
This is what I am looking for. Is it ok to detect our hardware natively 
from autoconf and append target specific macro to CFLAGS? And then 
separate function for our target in dpdk/device/node.c? Sorry my 
multi-arch select example was incorrect and that's not what I am looking at.
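
To make the question concrete, here is a minimal sketch of the compile-time approach. It is not VPP code. It assumes the build system defines a hypothetical macro EXAMPLE_TARGET_VENDOR_X (for instance by appending -DEXAMPLE_TARGET_VENDOR_X to CFLAGS once the hardware is detected). All example_* names are invented for illustration, and a "packet" is reduced to a length value so the block stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* EXAMPLE_TARGET_VENDOR_X is a hypothetical macro. In this sketch the build
 * system would append -DEXAMPLE_TARGET_VENDOR_X to CFLAGS after detecting
 * the target. Nothing here is an existing VPP symbol. */
#if defined(EXAMPLE_TARGET_VENDOR_X)

/* Target-specific variant: two packets per iteration. */
static uint32_t
example_rx_burst (const uint32_t *lens, uint32_t n)
{
  uint32_t bytes = 0, i = 0;
  for (; i + 1 < n; i += 2)
    bytes += lens[i] + lens[i + 1];
  if (i < n)
    bytes += lens[i];
  return bytes;
}

#else

/* Generic fallback: one packet per iteration. */
static uint32_t
example_rx_burst (const uint32_t *lens, uint32_t n)
{
  uint32_t bytes = 0;
  for (uint32_t i = 0; i < n; i++)
    bytes += lens[i];
  return bytes;
}

#endif

/* Single entry point; the variant is fixed when the binary is built. */
uint32_t
example_device_input (const uint32_t *lens, uint32_t n)
{
  assert (lens != NULL || n == 0);
  return example_rx_burst (lens, n);
}
```

With this layout only one variant exists in a given binary, so there is no runtime selection cost, but a separate build is needed per target.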


> Is there an agreement between ARM vendors what is the targeted core
> you want to have code tuned for or you are simply tuning to whatever
> core Cavium uses?
I am trying to optimize Cavium's SOC. This question is in this regard 
only. However efforts are going on optimizing Cortex cores as well by 
ARM community.


Thanks,
Nitin

On Friday 01 June 2018 01:55 AM, Damjan Marion wrote:

inline...
--
Damjan

On 31 May 2018, at 21:10, Saxena, Nitin wrote:


Hi Damjan,

Answers inline.

Thanks,
Nitin

On 01-Jun-2018, at 12:15 AM, Damjan Marion wrote:



Dear Nitin,

See inline….


On 31 May 2018, at 19:59, Nitin Saxena wrote:


Hi,

I am working on optimising dpdk-input node (based on vpp v1804) for 
our target. I am able to get performance improvements on our target 
but the problem I am finding now are:


1) The dpdk-input code is completely changed on master branch from 
v1804.


Why is this a problem? It was done with reason and for tangible benefit.
This is a problem for me as I can not apply my v1804 changes directly 
to the master branch. I have to again rework on master branch and 
that’s why I am not able to move to master branch or v1807 in future.


It was hard to know that you have subset of patches hidden somewhere. 
Typically it makes sense to discuss such kind of changes with person who 
"maintains" the code before starting writing the code.




Not to mention the dpdk-input master branch code do not give better 
numbers on our target as compared to v1804


Sad to hear that, good thing is, it gives better numbers on x86.
As I understand one dpdk_device_input function cannot be same for all 
architectures because if the underlying micro-architecture is 
different, the hot spots changes.


Maybe, but sounds to me like we are still in guessing phase.
Maybe we even need different function for each ARM CPU core as they 
maybe have different memory subsystem and pipeline


Is there an agreement between ARM vendors what is the targeted core you 
want to have code tuned for or you are simply tuning to whatever core 
Cavium uses?



I have seen dpdk-input master branch changes and on a positive notes 
those changes make sense however some codes are tuned for x86 
specially Skylake. I was looking for some kind of  way to have 
mutiarch select function for the Rx path, like the way it’s done for 
tx path.


Not sure why do you need that, unless you are going to have code 
optimised for different CPU variants (i.e. Cortex-A53 and Cortex-A72) in 
the same binary.
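
The "different CPU variants in the same binary" case can be sketched as runtime selection through a function pointer. This is an illustration only, not the VPP multiarch mechanism: the example_* names, the CPU enum and the unroll factors are all assumptions, and how the CPU type is actually detected is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*example_rx_fn_t) (const uint32_t *lens, uint32_t n);

/* Hypothetical CPU identifiers; real detection is outside this sketch. */
typedef enum
{
  EXAMPLE_CPU_GENERIC = 0,
  EXAMPLE_CPU_A53,
  EXAMPLE_CPU_A72,
  EXAMPLE_CPU_COUNT
} example_cpu_t;

/* One packet per iteration. */
static uint32_t
example_rx_generic (const uint32_t *lens, uint32_t n)
{
  uint32_t bytes = 0;
  for (uint32_t i = 0; i < n; i++)
    bytes += lens[i];
  return bytes;
}

/* Two packets per iteration (placeholder factor, not a tuning claim). */
static uint32_t
example_rx_a53 (const uint32_t *lens, uint32_t n)
{
  uint32_t bytes = 0, i = 0;
  for (; i + 1 < n; i += 2)
    bytes += lens[i] + lens[i + 1];
  for (; i < n; i++)
    bytes += lens[i];
  return bytes;
}

/* Four packets per iteration (placeholder factor, not a tuning claim). */
static uint32_t
example_rx_a72 (const uint32_t *lens, uint32_t n)
{
  uint32_t bytes = 0, i = 0;
  for (; i + 3 < n; i += 4)
    bytes += lens[i] + lens[i + 1] + lens[i + 2] + lens[i + 3];
  for (; i < n; i++)
    bytes += lens[i];
  return bytes;
}

/* Called once at init; the returned pointer is then used on the Rx path. */
example_rx_fn_t
example_select_rx (example_cpu_t cpu)
{
  static const example_rx_fn_t table[EXAMPLE_CPU_COUNT] = {
    [EXAMPLE_CPU_GENERIC] = example_rx_generic,
    [EXAMPLE_CPU_A53] = example_rx_a53,
    [EXAMPLE_CPU_A72] = example_rx_a72,
  };
  assert ((unsigned) cpu < (unsigned) EXAMPLE_CPU_COUNT);
  return table[cpu];
}
```

The trade-off against a compile-time macro is one indirect call per burst in exchange for a single binary that covers several cores.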




2) I don’t know the modular approach I should follow to merge my 
changes as I have completely changed the quad loop handling and the 
prefetches order in dpdk-input.


I carefully tuned that code. It was multi day exercise and losing 
single clock/packet on x86 with additional modifications are not 
acceptable. Still I’m open for discussion how to address this problem.




Note: I am far away from upstreaming the code currently as my 
optimisation is still in progress. It will be better if I know the 
proper way of doing it.


I suggest that you don’t even start on working on upstreaming before 
we have deep understanding of what and why needs to be done and we 
are all in agreement.




Thanks,
Nitin





-=-=-=-=-=-=-=-=-=-=-=-
Links:

You receive all messages sent to this group.

View/Reply Online (#9492): https://lists.fd.io/g/vpp-dev/message/9492
View All Messages In Topic (5): https://lists.fd.io/g/vpp-dev/topic/20748102
Mute This Topic: https://lists.fd.io/mt/20748102/21656
New Topic: https://lists.fd.io/g/vpp-dev/post

Change Your Subscription: https://lists.fd.io/g/vpp-dev/editsub/21656
Group Home: https://lists.fd.io/g/vpp-dev
Contact Group Owner: vpp-dev+ow...@lists.fd.io
Terms of Service: https://lists.fd.io/static/tos
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub
Email sent to: 

Re: [vpp-dev] Multiarch/target select for dpdk_device_input

2018-05-31 Thread Damjan Marion
inline...
-- 
Damjan

> On 31 May 2018, at 21:10, Saxena, Nitin  wrote:
> 
> Hi Damjan,
> 
> Answers inline.
> 
> Thanks,
> Nitin
> 
>> On 01-Jun-2018, at 12:15 AM, Damjan Marion  wrote:
>> 
>> 
>> Dear Nitin,
>> 
>> See inline….
>> 
>> 
>>> On 31 May 2018, at 19:59, Nitin Saxena  wrote:
>>> 
>>> Hi,
>>> 
>>> I am working on optimising dpdk-input node (based on vpp v1804) for our 
>>> target. I am able to get performance improvements on our target but the 
>>> problem I am finding now are:
>>> 
>>> 1) The dpdk-input code is completely changed on master branch from v1804.
>> 
>> Why is this a problem? It was done with reason and for tangible benefit.
> This is a problem for me as I can not apply my v1804 changes directly to the 
> master branch. I have to again rework on master branch and that’s why I am 
> not able to move to master branch or v1807 in future. 

It was hard to know that you have a subset of patches hidden somewhere. 
Typically it makes sense to discuss this kind of change with the person who 
"maintains" the code before starting to write the code.

>> 
>>> Not to mention the dpdk-input master branch code do not give better numbers 
>>> on our target as compared to v1804
>> 
>> Sad to hear that, good thing is, it gives better numbers on x86.
> As I understand one dpdk_device_input function cannot be same for all 
> architectures because if the underlying micro-architecture is different, the 
> hot spots changes.

Maybe, but it sounds to me like we are still in the guessing phase.
Maybe we even need a different function for each ARM CPU core, as they may have 
different memory subsystems and pipelines.

Is there an agreement between ARM vendors on which core you want to have the 
code tuned for, or are you simply tuning for whatever core Cavium uses?


> I have seen dpdk-input master branch changes and on a positive notes those 
> changes make sense however some codes are tuned for x86 specially Skylake. I 
> was looking for some kind of  way to have mutiarch select function for the Rx 
> path, like the way it’s done for tx path.

Not sure why you need that, unless you are going to have code optimised for 
different CPU variants (e.g. Cortex-A53 and Cortex-A72) in the same binary.

>> 
>>> 2) I don’t know the modular approach I should follow to merge my changes as 
>>> I have completely changed the quad loop handling and the prefetches order 
>>> in dpdk-input.
>> 
>> I carefully tuned that code. It was multi day exercise and losing single 
>> clock/packet on x86 with additional modifications are not acceptable. Still 
>> I’m open for discussion how to address this problem.
>> 
>>> 
>>> Note: I am far away from upstreaming the code currently as my optimisation 
>>> is still in progress. It will be better if I know the proper way of doing 
>>> it.
>> 
>> I suggest that you don’t even start on working on upstreaming before we have 
>> deep understanding of what and why needs to be done and we are all in 
>> agreement.
>> 
>>> 
>>> Thanks,
>>> Nitin
>>> 
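
As an illustration of the multiarch select idea discussed in this message, 
here is a minimal sketch in C of choosing one of several CPU-specific builds of 
the same Rx function at startup. It is not VPP's actual mechanism. Every name 
in it (rx_fn_t, rx_variant_t, rx_generic, rx_cortex_a53, rx_cortex_a72, 
select_rx_variant) is hypothetical, and how the CPU name is detected is left 
out as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature for an Rx burst function: takes a packet count,
   returns how many packets were handled. Not the real dpdk_device_input. */
typedef uint32_t (*rx_fn_t) (uint32_t n_packets);

/* One entry per CPU-specific build of the same function. */
typedef struct
{
  const char *cpu_name;		/* NULL marks the generic fallback */
  rx_fn_t fn;
} rx_variant_t;

/* Stand-ins for separately tuned builds. Real variants would differ in
   loop structure and prefetch placement, not in their results. */
static uint32_t rx_generic (uint32_t n) { return n; }
static uint32_t rx_cortex_a53 (uint32_t n) { return n; }
static uint32_t rx_cortex_a72 (uint32_t n) { return n; }

static const rx_variant_t rx_variants[] = {
  {"cortex-a53", rx_cortex_a53},
  {"cortex-a72", rx_cortex_a72},
  {NULL, rx_generic},		/* fallback, must stay last */
};

/* Return the variant whose name matches cpu_name, or the generic
   fallback when there is no match (or cpu_name is NULL). */
const rx_variant_t *
select_rx_variant (const char *cpu_name)
{
  const rx_variant_t *v = rx_variants;

  while (v->cpu_name != NULL)
    {
      if (cpu_name != NULL && strcmp (v->cpu_name, cpu_name) == 0)
	break;
      v++;
    }
  assert (v->fn != NULL);
  return v;
}
```

The selection would run once at startup and the chosen pointer would then be 
called on the fast path, so the lookup cost is not paid per packet. A table 
like this only pays off when more than one variant ships in the same binary.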



Re: [vpp-dev] Multiarch/target select for dpdk_device_input

2018-05-31 Thread Nitin Saxena
Hi Damjan,

Answers inline.

Thanks,
Nitin

> On 01-Jun-2018, at 12:15 AM, Damjan Marion  wrote:
> 
> 
> Dear Nitin,
> 
> See inline….
> 
> 
>> On 31 May 2018, at 19:59, Nitin Saxena  wrote:
>> 
>> Hi,
>> 
>> I am working on optimising dpdk-input node (based on vpp v1804) for our 
>> target. I am able to get performance improvements on our target but the 
>> problem I am finding now are:
>> 
>> 1) The dpdk-input code is completely changed on master branch from v1804.
> 
> Why is this a problem? It was done with reason and for tangible benefit.
This is a problem for me as I cannot apply my v1804 changes directly to the 
master branch. I have to redo the work on the master branch, and that’s why I 
am not able to move to the master branch or v1807 in future. 
> 
>> Not to mention the dpdk-input master branch code do not give better numbers 
>> on our target as compared to v1804
> 
> Sad to hear that, good thing is, it gives better numbers on x86.
As I understand it, one dpdk_device_input function cannot be the same for all 
architectures because, if the underlying micro-architecture is different, the 
hot spots change. I have seen the dpdk-input master branch changes and, on a 
positive note, those changes make sense; however, some of the code is tuned for 
x86, especially Skylake. I was looking for some way to have a multiarch select 
function for the Rx path, the way it’s done for the tx path.
> 
>> 2) I don’t know the modular approach I should follow to merge my changes as 
>> I have completely changed the quad loop handling and the prefetches order in 
>> dpdk-input.
> 
> I carefully tuned that code. It was multi day exercise and losing single 
> clock/packet on x86 with additional modifications are not acceptable. Still 
> I’m open for discussion how to address this problem.
> 
>> 
>> Note: I am far away from upstreaming the code currently as my optimisation 
>> is still in progress. It will be better if I know the proper way of doing it.
> 
> I suggest that you don’t even start on working on upstreaming before we have 
> deep understanding of what and why needs to be done and we are all in 
> agreement.
> 
>> 
>> Thanks,
>> Nitin
>> 
>> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#9484): https://lists.fd.io/g/vpp-dev/message/9484
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Multiarch/target select for dpdk_device_input

2018-05-31 Thread Damjan Marion


Dear Nitin,

See inline….


> On 31 May 2018, at 19:59, Nitin Saxena  wrote:
> 
> Hi,
> 
> I am working on optimising dpdk-input node (based on vpp v1804) for our 
> target. I am able to get performance improvements on our target but the 
> problem I am finding now are:
> 
> 1) The dpdk-input code is completely changed on master branch from v1804.

Why is this a problem? It was done for a reason and for tangible benefit.

> Not to mention the dpdk-input master branch code do not give better numbers 
> on our target as compared to v1804

Sad to hear that. The good thing is that it gives better numbers on x86.

> 2) I don’t know the modular approach I should follow to merge my changes as I 
> have completely changed the quad loop handling and the prefetches order in 
> dpdk-input.

I carefully tuned that code. It was a multi-day exercise, and losing a single 
clock/packet on x86 with additional modifications is not acceptable. Still, I’m 
open to discussion on how to address this problem.

> 
> Note: I am far away from upstreaming the code currently as my optimisation is 
> still in progress. It will be better if I know the proper way of doing it.

I suggest that you don’t even start working on upstreaming before we have a 
deep understanding of what needs to be done and why, and we are all in agreement.

> 
> Thanks,
> Nitin
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#9482): https://lists.fd.io/g/vpp-dev/message/9482
-=-=-=-=-=-=-=-=-=-=-=-



[vpp-dev] Multiarch/target select for dpdk_device_input

2018-05-31 Thread Nitin Saxena
Hi,

I am working on optimising the dpdk-input node (based on VPP v1804) for our 
target. I am able to get performance improvements on our target, but the 
problems I am now finding are:

1) The dpdk-input code on the master branch is completely changed from v1804. 
Not to mention that the dpdk-input master branch code does not give better 
numbers on our target than v1804.
2) I don’t know what modular approach I should follow to merge my changes, as I 
have completely changed the quad loop handling and the prefetch order in 
dpdk-input.

Note: I am far from upstreaming the code currently, as my optimisation is 
still in progress. It would be better if I knew the proper way of doing it.

Thanks,
Nitin
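
For readers unfamiliar with the terms in point 2, the following is a minimal 
sketch in C of a quad loop with prefetching: four packets are handled per 
iteration while the next four are prefetched, and a single loop handles the 
leftovers. It is not the dpdk-input code from either v1804 or master. The 
names pkt_t and process_burst are hypothetical, the per-packet work (summing 
lengths) is a placeholder, and __builtin_prefetch assumes a GCC or Clang 
compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor; real code would use vlib/DPDK buffers. */
typedef struct
{
  uint32_t len;
} pkt_t;

/* Sum packet lengths over a burst, four packets per iteration, while
   prefetching the following four. Returns the total length. */
uint64_t
process_burst (pkt_t **pkts, size_t n)
{
  uint64_t total = 0;
  size_t i = 0;

  assert (pkts != NULL || n == 0);

  /* Quad loop: handle packets i .. i+3 together. */
  while (n - i >= 4)
    {
      /* Prefetch the next quad only if it lies fully inside the burst. */
      if (n - i >= 8)
	{
	  __builtin_prefetch (pkts[i + 4]);
	  __builtin_prefetch (pkts[i + 5]);
	  __builtin_prefetch (pkts[i + 6]);
	  __builtin_prefetch (pkts[i + 7]);
	}

      total += (uint64_t) pkts[i]->len + pkts[i + 1]->len
	+ pkts[i + 2]->len + pkts[i + 3]->len;
      i += 4;
    }

  /* Single loop: fewer than four packets remain. */
  while (i < n)
    {
      total += pkts[i]->len;
      i++;
    }

  return total;
}
```

Where the prefetches are issued relative to the work, and how far ahead they 
reach, is the kind of detail that would be tuned per target. The distance of 
four packets used here is arbitrary.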
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#9480): https://lists.fd.io/g/vpp-dev/message/9480
-=-=-=-=-=-=-=-=-=-=-=-