Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-03-31 Thread Mark Ruzindana
Thanks a lot for the quick responses, John and David! I really appreciate it.

I will definitely update the version of Hashpipe that I currently have on the
server, as well as make sure that the network tuning is good.

I'm currently using the standard "socket()" function, and given the
description you gave, a switch to packet sockets seems like it will definitely
be beneficial.

I also currently pin the threads to the desired cores with "-c #" on the
command line, but thank you for mentioning it in case I hadn't been. The NUMA
info is also very helpful; I'll make sure the architecture is configured as
optimally as possible.

Thanks again! This was very helpful and I'll update you with the progress
that I make.

Mark





Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-03-31 Thread David MacMahon
Just to expand on John's excellent tips, Hashpipe does lock its shared memory
buffers with mlock.  These buffers will have the NUMA node affinity of the
thread that created them, so be sure to pin the threads to the desired core or
cores by preceding the thread names on the command line with a -c # (set thread
affinity to a single core) or -m # (set thread affinity to multiple cores)
option.  Alternatively (or additionally) you can run the entire hashpipe process
with numactl.  For example...

numactl --cpunodebind=1 --membind=1 hashpipe [...]

...will restrict hashpipe and all its threads to run on NUMA node 1 and all 
memory allocations will (to the extent possible) be made within memory that is 
affiliated with NUMA node 1.  You can use various tools to find out which 
hardware is associated with which NUMA node such as "numactl --hardware" or 
"lstopo".  Hashpipe includes its own such utility: "hashpipe_topology.sh".

On NUMA (i.e. multi-socket) systems, each PCIe slot is associated with a 
specific NUMA node.  It can be beneficial to have relevant peripherals (e.g. 
NIC and GPU) be in PCIe slots that are on the same NUMA node.
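
On Linux this association can usually be read straight from sysfs (the
interface name below is a placeholder; a value of -1 means the platform
reports no NUMA affinity for that slot):

cat /sys/class/net/eth4/device/numa_node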

Of course, if you have a single-socket mainboard, then all this NUMA stuff is 
irrelevant. :P

Cheers,
Dave


Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-03-31 Thread David MacMahon
Hi, Mark,

That packet rate should be very manageable.  Are you using the standard 
"socket()" and "recv()" functions or are you using packet sockets?  Packet 
sockets are a more efficient way to get packets from the kernel because they 
bypass the kernel's IP stack.  They're not as efficient as IBVerbs or DPDK, but they are 
widely supported and should be more than adequate for the packet/data rates you 
are dealing with.  Hashpipe has functions that make it easy to work with packet 
sockets by providing a somewhat higher level interface to them.  If your 
version of Hashpipe doesn't have a "hashpipe_pktsock.h" then you should update 
for sure.
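
For orientation, a bare-bones packet-socket receive loop looks roughly like the
sketch below.  This is only an illustration of the underlying AF_PACKET
facility (the interface name is a placeholder, error handling is minimal, and
it needs root or CAP_NET_RAW); the hashpipe_pktsock.h wrappers provide a
higher-level interface on top of the same mechanism.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>        /* htons */
#include <net/if.h>           /* if_nametoindex */
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* struct sockaddr_ll */

int main(void)
{
    /* Raw packet socket: frames are delivered straight from the driver,
     * bypassing the kernel's IP/UDP stack. */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }

    /* Bind to one interface so we only see its traffic. */
    struct sockaddr_ll sll;
    memset(&sll, 0, sizeof(sll));
    sll.sll_family   = AF_PACKET;
    sll.sll_protocol = htons(ETH_P_ALL);
    sll.sll_ifindex  = if_nametoindex("eth4");   /* placeholder NIC name */
    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
        perror("bind"); return 1;
    }

    unsigned char frame[9000];                   /* jumbo-frame sized buffer */
    for (;;) {
        ssize_t n = recv(fd, frame, sizeof(frame), 0);
        if (n > 0) {
            /* frame[] holds the raw Ethernet frame, headers included;
             * parse and copy the payload into the shared ring buffer here. */
        }
    }

    close(fd);   /* not reached in this sketch */
    return 0;
}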

HTH,
Dave



Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-03-31 Thread John Ford
Hi Mark.  Since the newer version has a script called
"hashpipe_irqaffinity.sh", I would think that the most expedient thing to do
is to upgrade to the newer version.  It's likely to fix some or all of this.

That said, there are a lot of things you can check beyond the IRQ affinity:
make sure that your network tuning is good, that your network card IRQs are
handled by cores whose memory is local to the card, and that the hashpipe
threads are mapped to processor cores that are also local to that memory.
Sometimes it's counterproductive to pin processes to cores of their own if
they need data that is produced by a different core that's far away, NUMA-wise.
And lock all the memory in core with mlockall() or one of its friends.
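
A minimal sketch of the memory-locking part (assuming the process has a
sufficient RLIMIT_MEMLOCK, or CAP_IPC_LOCK/root):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Lock every current and future page of this process into RAM so the
     * kernel cannot page capture buffers out in the middle of a scan. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall");

    /* ... allocate buffers and start the acquisition threads ... */
    return 0;
}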

Good luck with it!

John






[casper] Dropped packets during HASHPIPE data acquisition

2020-03-31 Thread Mark Ruzindana
Hi all,

I am fairly new to asking questions on a forum so if I need to provide more
details, please let me know.

Worth noting that just as I was about to send this out, I checked and I
don't have the most recent version of HASHPIPE with hashpipe_irqaffinity.sh
among other additions and modifications. So this might fix my problem, but
maybe not and someone else has more insight. I will update everyone if it
does.

I am trying to reduce the number of packets lost/dropped when running
HASHPIPE on a 32-core RHEL 7 server. I have run enough tests and diagnostics
to be confident that the problem is not any HASHPIPE thread running for too
long. The percentage of packets dropped on any given scan is between about
0.3% and 0.8%, i.e. roughly 5,000 packets out of a total of 1,650,000 in a
30-second scan. So while it's a small percentage, the number of packets lost
is still quite large. I have also done enough tests with 'top' and 'iostat',
as well as timing HASHPIPE between windows in which no packets are dropped, to
diagnose the issue further. My colleagues and I have come to the conclusion
that the kernel is allowing other processes to interrupt HASHPIPE while it is
running.

So I have researched and run tests involving 'niceness', and I am currently
trying to configure SMP affinities and IRQ balancing, but the changes I make
to the smp_affinity files aren't having any effect. My plan was to have the
interrupts run on the 20 cores that aren't being used by HASHPIPE. Disabling
'irqbalance' didn't do anything either, and when I restarted the machine to
see whether the changes were permanent, the system reverted to its previous
state.
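
For reference, this is the kind of change I have been attempting (the IRQ
number and mask below are placeholders, run as root; irqbalance is stopped
first so it cannot rewrite the mask):

systemctl stop irqbalance
echo fffff000 > /proc/irq/45/smp_affinity   # hex mask with bits 12-31 set: the 20 cores not used by HASHPIPE
cat /proc/irq/45/smp_affinity               # read back to confirm the kernel accepted the new mask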

I might be missing something, or trying the wrong things. Has anyone
experienced this? And could you point me in the right direction if you have
any insight?

If you need any more details, please let me know. I didn't add as much as I
could have because I wanted to keep this message a reasonable size.

Thanks,

Mark Ruzindana
