Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-12-18 Thread David MacMahon
Hi, Mark,

Glad to hear that your segfault issue has gone away :) even though it sounds 
frustrating not to understand why :(.  Here are some additional responses for 
you:

> On Dec 15, 2020, at 21:00, Mark Ruzindana  wrote:
> 
> I'm taking note of the following change for documentation purposes. It's not 
> the reason for my issue. Feel free to ignore or comment on it. This change 
> was made before and remained after I observed the segfault issue. To flush 
> the packets in the port before the thread is run, I am using 
> "p_frame=hashpipe_pktsock_recv_udp_frame_nonblock(p_ps, bindport)" instead of 
> "p_frame=hashpipe_pktsock_recv_frame_nonblock(p_ps, bindport)" in the while 
> loop; otherwise, there's an infinite loop because there are packets with 
> other protocols constantly being captured by the port. 

Looping until hashpipe_pktsock_recv_udp_frame_nonblock() returns NULL will only 
discard the initial UDP packets and one non-UDP packet.  What sort of packet 
rate is the interface receiving?  I find it hard to imagine packets arriving so 
fast that the discard loop never completes.
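
For the archives, the flush loop being described looks roughly like this (a 
minimal sketch based on the calls quoted above; p_ps and bindport are assumed 
to be the already-initialized packet socket state and bound UDP port):

unsigned char *p_frame;
// Drain any UDP packets already queued in the ring before the thread starts.
while ((p_frame = hashpipe_pktsock_recv_udp_frame_nonblock(p_ps, bindport))) {
    hashpipe_pktsock_release_frame(p_frame);
}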

> Okay, so now, I'm still experiencing dropped packets. Given a kernel page 
> size of 4096 bytes and a frame size of 16384 bytes, I have tried buffer 
> parameters ranging from 480 to 128,000 total frames and 60 to 1,000 blocks 
> respectively, with improvements in throughput in one instance but not in the 
> other three that I have running. The one instance with improvements, on the 
> upper end of that range, exceeds the number of packets expected in a hashpipe 
> shared memory buffer block (the ring buffers between threads), but only for 
> about four or so blocks at the very beginning of a scan. No dropped packets 
> for the rest of the scan. The other instances, with no recognizable 
> improvements, drop packets throughout the scan, with one of them dropping 
> significantly more than the other two.

If you are running four instances on the same host, do they each bind to a 
different interface?  Multiple instances binding to the same interface will not 
improve performance because each instance will receive copies of all packets 
that arrive at the interface.  This is almost certainly not what you want.  The 
way the packet socket buffer is specified in terms of frames and blocks is a 
bit unusual, and IMHO the rationale for it could be better explained in the 
kernel docs.
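
For the archives, the frames/blocks specification maps onto the kernel's 
struct tpacket_req roughly like this (a sketch, not hashpipe's actual setup 
code; the sizes are just illustrative):

#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>

int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)); // raw packet socket (needs CAP_NET_RAW)
struct tpacket_req req;
memset(&req, 0, sizeof(req));
req.tp_frame_size = 16384;                  // bytes per frame; one received packet per frame
req.tp_block_size = 8 * req.tp_frame_size;  // bytes per block; must be a multiple of the page size
req.tp_block_nr   = 60;                     // number of blocks
req.tp_frame_nr   = 8 * req.tp_block_nr;    // total frames = frames-per-block * blocks
setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));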

> I'm currently trying a few things to debug this, but I figured that I would 
> ask sooner rather than later. Is there a configuration or step that I may 
> have missed in the implementation of packet sockets? My understanding is that 
> it should handle my current data rates with no problem. So with multiple 
> instances running (four in my case), I should be able to capture data with 0 
> dropped packets (100% data throughput).

What is the incoming packet rate and data rate?  What packet and data rate are 
your instances achieving?  Obviously the latter have to be at least as high as 
the former or things won't work.
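
To make that concrete with illustrative numbers (the actual rates aren't 
stated in this thread): 8168-byte UDP payloads arriving at 10 Gb/s line rate 
work out to roughly 150,000 packets per second, so four instances splitting 
that stream evenly would each need to sustain on the order of 37,500 packets 
(about 300 MB) per second.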

>  Just a note, with a packet size of 8168 bytes, and a frame size of 8192 
> bytes, hashpipe was crashing, but in a way completely unrelated to how it did 
> before. It was not a segfault after capturing the exact number of packets 
> that correspond to the number of frames in the packet socket ring buffer as I 
> described in previous emails. The crashes were more inconsistent and I think 
> it's because the frame size needs to be considerably larger than the packet 
> size. A factor of 2 seemed to be enough. I currently have the frame size set 
> to 16384 (also a multiple of the kernel page size), and do not have an issue 
> with hashpipe crashing.

The frame size used when sizing the buffers needs to be large enough to hold 
the entire packet (including network headers) plus TPACKET_HDRLEN.  A 
frame_size of 8192 bytes and a packet size of 8168 bytes leaves just 24 bytes, 
which is definitely less than TPACKET_HDRLEN.  You could probably use 12,288 
bytes (3*4096) instead of 16,384 for a frame size if you really want/need to 
minimize memory usage.  I'm not sure what happens if the frame size is not 
large enough.  At best the packets will get truncated, but that's still not 
good.
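
A quick back-of-the-envelope check of the minimum frame size (a sketch; the 
exact per-frame overhead depends on the TPACKET version in use):

#include <stdio.h>
#include <linux/if_packet.h>

int main(void) {
    // On-wire packet: 14-byte Ethernet + 20-byte IP + 8-byte UDP + 8168-byte payload.
    unsigned long pkt_len = 14 + 20 + 8 + 8168;                    // 8210 bytes
    unsigned long needed  = TPACKET_ALIGN(TPACKET_HDRLEN + pkt_len);
    unsigned long frame   = (needed + 4095) & ~4095UL;             // round up to 4096-byte pages
    printf("needed=%lu frame=%lu\n", needed, frame);               // frame comes out to 12288
    return 0;
}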

Dave



Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-12-15 Thread Mark Ruzindana
Also, I tried to condense/summarize the issue, so if you would like
additional details, please feel free to ask and I'll provide them.

Thanks again,

Mark Ruzindana

On Tue, Dec 15, 2020 at 10:00 PM Mark Ruzindana  wrote:

> Hi all,
>
> While running hashpipe with the intention of debugging using gdb as
> suggested, I failed to replicate my segfault issue. On one hand, it should
> have been working given what I understand about the packet socket
> implementation and the way that I wrote the code, but on the other, I don't
> know why it works now and not before, because I didn't make any changes
> between runs. It's a stretch, but there were a few reboots and improvements
> in cable organization within the rack, but that's about it.
>
> I'm taking note of the following change for documentation purposes. It's
> not the reason for my issue. Feel free to ignore or comment on it. This
> change was made before and remained after I observed the segfault issue. To
> flush the packets in the port before the thread is run, I am using "
> p_frame=hashpipe_pktsock_recv_udp_frame_nonblock(p_ps, bindport)" instead
> of "p_frame=hashpipe_pktsock_recv_frame_nonblock(p_ps, bindport)" in the
> while loop; otherwise, there's an infinite loop because there are packets
> with other protocols constantly being captured by the port.
>
> I'm hoping I figure out what change was made as I am debugging the rest of
> this, but for now the specific segfault that I was having is no longer an
> issue. It's unsatisfying and I'll come back to it if I don't figure it out
> as I go, but for now, I'm moving on.
>
> Okay, so now, I'm still experiencing dropped packets. Given a kernel page
> size of 4096 bytes and a frame size of 16384 bytes, I have tried buffer
> parameters ranging from 480 to 128,000 total frames and 60 to 1,000 blocks
> respectively, with improvements in throughput in one instance but not in
> the other three that I have running. The one instance with improvements, on
> the upper end of that range, exceeds the number of packets expected in a
> hashpipe shared memory buffer block (the ring buffers between threads), but
> only for about four or so blocks at the very beginning of a scan. No
> dropped packets for the rest of the scan. The other instances, with no
> recognizable improvements, drop packets throughout the scan, with one of
> them dropping significantly more than the other two.
>
> I'm currently trying a few things to debug this, but I figured that I
> would ask sooner rather than later. Is there a configuration or step that I
> may have missed in the implementation of packet sockets? My understanding
> is that it should handle my current data rates with no problem. So with
> multiple instances running (four in my case), I should be able to capture
> data with 0 dropped packets (100% data throughput).
>
> Just a note, with a packet size of 8168 bytes, and a frame size of 8192
> bytes, hashpipe was crashing, but in a way completely unrelated to how it
> did before. It was *not* a segfault after capturing the exact number of
> packets that correspond to the number of frames in the packet socket ring
> buffer as I described in previous emails. The crashes were more
> inconsistent and I think it's because the frame size needs to be
> considerably larger than the packet size. A factor of 2 seemed to be
> enough. I currently have the frame size set to 16384 (also a multiple of
> the kernel page size), and do not have an issue with hashpipe crashing.
>
> Let me know if you have any thoughts and suggestions. I really appreciate
> the help.
>
> Thanks,
>
> Mark Ruzindana
>
> On Thu, Dec 3, 2020 at 11:16 AM Mark Ruzindana 
> wrote:
>
>> Thanks for the suggestion David!
>>
>> I was starting hashpipe in the debugger. I'll use gdb and the core file,
>> and let you know what I find. If I still can't figure out the problem, I
>> will send you a minimum non-working example. I definitely think it's some
>> sort of pointer arithmetic error as well, I just can't see it yet. I really
>> appreciate the help.
>>
>> Thanks again,
>>
>> Mark
>>
>> On Thu, Dec 3, 2020 at 1:30 AM David MacMahon 
>> wrote:
>>
>>> Hi, Mark,
>>>
>>> Sorry to hear you're still getting a segfault.  It sounds like you made
>>> some progress with gdb, but the fact that you ended up with a different
>>> sort of error suggests that you were starting hashpipe in the debugger.  To
>>> debug your initial segfault problem, you can run hashpipe without the
>>> debugger, let it segfault and generate a core file, then use gdb and the
>>> core file (and hashpipe) to examine the state of the program when the
>>> segfault occurred.  The tricky part is getting the core file to be
>>> generated on a segfault.  You typically have to increase the core file size
>>> limit using "ulimit -c unlimited" and (because hashpipe is typically
>>> installed with the suid bit set) you have to let the kernel know it's OK to
>>> dump core files for suid programs using "sudo 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-12-15 Thread Mark Ruzindana
Hi all,

While running hashpipe with the intention of debugging using gdb as
suggested, I failed to replicate my segfault issue. On one hand, it should
have been working given what I understand about the packet socket
implementation and the way that I wrote the code, but on the other, I don't
know why it works now and not before, because I didn't make any changes
between runs. It's a stretch, but there were a few reboots and improvements
in cable organization within the rack, but that's about it.

I'm taking note of the following change for documentation purposes. It's
not the reason for my issue. Feel free to ignore or comment on it. This
change was made before and remained after I observed the segfault issue. To
flush the packets in the port before the thread is run, I am using "p_frame=
hashpipe_pktsock_recv_udp_frame_nonblock(p_ps, bindport)" instead of "
p_frame=hashpipe_pktsock_recv_frame_nonblock(p_ps, bindport)" in the while
loop; otherwise, there's an infinite loop because there are packets with
other protocols constantly being captured by the port.

I'm hoping I figure out what change was made as I am debugging the rest of
this, but for now the specific segfault that I was having is no longer an
issue. It's unsatisfying and I'll come back to it if I don't figure it out
as I go, but for now, I'm moving on.

Okay, so now, I'm still experiencing dropped packets. Given a kernel page
size of 4096 bytes and a frame size of 16384 bytes, I have tried buffer
parameters ranging from 480 to 128,000 total frames and 60 to 1,000 blocks
respectively, with improvements in throughput in one instance but not in
the other three that I have running. The one instance with improvements, on
the upper end of that range, exceeds the number of packets expected in a
hashpipe shared memory buffer block (the ring buffers between threads), but
only for about four or so blocks at the very beginning of a scan. No
dropped packets for the rest of the scan. The other instances, with no
recognizable improvements, drop packets throughout the scan, with one of
them dropping significantly more than the other two.

I'm currently trying a few things to debug this, but I figured that I would
ask sooner rather than later. Is there a configuration or step that I may
have missed in the implementation of packet sockets? My understanding is
that it should handle my current data rates with no problem. So with
multiple instances running (four in my case), I should be able to capture
data with 0 dropped packets (100% data throughput).

Just a note, with a packet size of 8168 bytes, and a frame size of 8192
bytes, hashpipe was crashing, but in a way completely unrelated to how it
did before. It was *not* a segfault after capturing the exact number of
packets that correspond to the number of frames in the packet socket ring
buffer as I described in previous emails. The crashes were more
inconsistent and I think it's because the frame size needs to be
considerably larger than the packet size. A factor of 2 seemed to be
enough. I currently have the frame size set to 16384 (also a multiple of
the kernel page size), and do not have an issue with hashpipe crashing.

Let me know if you have any thoughts and suggestions. I really appreciate
the help.

Thanks,

Mark Ruzindana

On Thu, Dec 3, 2020 at 11:16 AM Mark Ruzindana  wrote:

> Thanks for the suggestion David!
>
> I was starting hashpipe in the debugger. I'll use gdb and the core file,
> and let you know what I find. If I still can't figure out the problem, I
> will send you a minimum non-working example. I definitely think it's some
> sort of pointer arithmetic error as well, I just can't see it yet. I really
> appreciate the help.
>
> Thanks again,
>
> Mark
>
> On Thu, Dec 3, 2020 at 1:30 AM David MacMahon  wrote:
>
>> Hi, Mark,
>>
>> Sorry to hear you're still getting a segfault.  It sounds like you made
>> some progress with gdb, but the fact that you ended up with a different
>> sort of error suggests that you were starting hashpipe in the debugger.  To
>> debug your initial segfault problem, you can run hashpipe without the
>> debugger, let it segfault and generate a core file, then use gdb and the
>> core file (and hashpipe) to examine the state of the program when the
>> segfault occurred.  The tricky part is getting the core file to be
>> generated on a segfault.  You typically have to increase the core file size
>> limit using "ulimit -c unlimited" and (because hashpipe is typically
>> installed with the suid bit set) you have to let the kernel know it's OK to
>> dump core files for suid programs using "sudo sysctl -w fs.suid_dumpable=1"
>> (or maybe 2 if 1 doesn't quite do it).  You can read more about these steps
>> with "help ulimit" (ulimit is a bash builtin) and "man 5 proc".
>>
>> Once you have the core file (typically named "core" but it may have a
>> numeric extension from the PID of the crashing process) you can debug
>> things with "gbd /path/to/hashpipe 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-12-03 Thread Mark Ruzindana
Thanks for the suggestion David!

I was starting hashpipe in the debugger. I'll use gdb and the core file,
and let you know what I find. If I still can't figure out the problem, I
will send you a minimum non-working example. I definitely think it's some
sort of pointer arithmetic error as well, I just can't see it yet. I really
appreciate the help.

Thanks again,

Mark

On Thu, Dec 3, 2020 at 1:30 AM David MacMahon  wrote:

> Hi, Mark,
>
> Sorry to hear you're still getting a segfault.  It sounds like you made
> some progress with gdb, but the fact that you ended up with a different
> sort of error suggests that you were starting hashpipe in the debugger.  To
> debug your initial segfault problem, you can run hashpipe without the
> debugger, let it segfault and generate a core file, then use gdb and the
> core file (and hashpipe) to examine the state of the program when the
> segfault occurred.  The tricky part is getting the core file to be
> generated on a segfault.  You typically have to increase the core file size
> limit using "ulimit -c unlimited" and (because hashpipe is typically
> installed with the suid bit set) you have to let the kernel know it's OK to
> dump core files for suid programs using "sudo sysctl -w fs.suid_dumpable=1"
> (or maybe 2 if 1 doesn't quite do it).  You can read more about these steps
> with "help ulimit" (ulimit is a bash builtin) and "man 5 proc".
>
> Once you have the core file (typically named "core" but it may have a
> numeric extension from the PID of the crashing process) you can debug
> things with "gbd /path/to/hashpipe /path/to/core/file".  Note that the core
> file may be created with permissions that only let root read it, so you
> might have to "sudo chown a+r core" or similar to get read access to it.
> This starts the debugger in a a sort of forensic mode using the core file
> as a snapshot of the process and its memory space at the time of the
> segfault.  You can use "info threads" to see which threads existed, "thread
> N" to switch between threads (N is a thread number as shown by "info
> threads"), "bt" to see the function call backtrace fo the current thread,
> and "frame N" to switch to a specific frame in the function call
> backtrace.  Once you zero in on which part of your code was executing when
> the segfault occurred you can examine variables to see what exactly caused
> the segfault to occur.  You might find that the "interesting" or "relevant"
> variables have been optimized away, so you may want/need to recompile with
> a lower optimization level (e.g. -O1 or maybe even -O0?) to prevent that
> from happening.
>
> Because this happens when you reach the end of your data buffer, I have to
> think it's a pointer arithmetic error of some sort.  If you can't figure
> out the problem from the core file, please create a "minimum working
> example" (well, in this case I guess a minimum non-working example),
> including a dummy packet generator script that creates suitable packets,
> and I'll see if I can recreate the problem.
>
> HTH,
> Dave
>
> On Nov 30, 2020, at 14:45, Mark Ruzindana  wrote:
>
> I'm currently using gdb to debug and it either tells me that I have a
> segmentation fault at the memcpy() in process_packet() or something very
> strange happens where the starting mcnt of a block greatly exceeds the mcnt
> corresponding to the packet being processed and there's no segmentation
> fault because the mcnt distance becomes negative so the memcpy() is
> skipped. Hopefully that wasn't too hard to track. Very strange problem that
> only occurs with gdb and not when I run hashpipe without it. Without gdb, I
> get the same segmentation fault at the end of the circular buffer as
> mentioned above.
>
>



Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-12-03 Thread David MacMahon
Hi, Mark,

Sorry to hear you're still getting a segfault.  It sounds like you made some 
progress with gdb, but the fact that you ended up with a different sort of 
error suggests that you were starting hashpipe in the debugger.  To debug your 
initial segfault problem, you can run hashpipe without the debugger, let it 
segfault and generate a core file, then use gdb and the core file (and 
hashpipe) to examine the state of the program when the segfault occurred.  The 
tricky part is getting the core file to be generated on a segfault.  You 
typically have to increase the core file size limit using "ulimit -c unlimited" 
and (because hashpipe is typically installed with the suid bit set) you have to 
let the kernel know it's OK to dump core files for suid programs using "sudo 
sysctl -w fs.suid_dumpable=1" (or maybe 2 if 1 doesn't quite do it).  You can 
read more about these steps with "help ulimit" (ulimit is a bash builtin) and 
"man 5 proc".

Once you have the core file (typically named "core" but it may have a numeric 
extension from the PID of the crashing process) you can debug things with "gdb 
/path/to/hashpipe /path/to/core/file".  Note that the core file may be created 
with permissions that only let root read it, so you might have to "sudo chmod 
a+r core" or similar to get read access to it.  This starts the debugger in a 
sort of forensic mode using the core file as a snapshot of the process and its 
memory space at the time of the segfault.  You can use "info threads" to see 
which threads existed, "thread N" to switch between threads (N is a thread 
number as shown by "info threads"), "bt" to see the function call backtrace of 
the current thread, and "frame N" to switch to a specific frame in the function 
call backtrace.  Once you zero in on which part of your code was executing when 
the segfault occurred you can examine variables to see what exactly caused the 
segfault to occur.  You might find that the "interesting" or "relevant" 
variables have been optimized away, so you may want/need to recompile with a 
lower optimization level (e.g. -O1 or maybe even -O0?) to prevent that from 
happening.
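
A typical post-mortem session with the commands described above might look 
like this (the thread and frame numbers, and the variable name, are 
hypothetical):

gdb /path/to/hashpipe /path/to/core
(gdb) info threads    # list the threads that existed at the crash
(gdb) thread 2        # switch to the thread of interest
(gdb) bt              # backtrace of the current thread
(gdb) frame 3         # select the frame in your plug-in code
(gdb) print p_frame   # examine the variables involved in the crash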

Because this happens when you reach the end of your data buffer, I have to 
think it's a pointer arithmetic error of some sort.  If you can't figure out 
the problem from the core file, please create a "minimum working example" 
(well, in this case I guess a minimum non-working example), including a dummy 
packet generator script that creates suitable packets, and I'll see if I can 
recreate the problem.

HTH,
Dave

> On Nov 30, 2020, at 14:45, Mark Ruzindana  wrote:
> 
> I'm currently using gdb to debug and it either tells me that I have a 
> segmentation fault at the memcpy() in process_packet() or something very 
> strange happens where the starting mcnt of a block greatly exceeds the mcnt 
> corresponding to the packet being processed and there's no segmentation fault 
> because the mcnt distance becomes negative so the memcpy() is skipped. 
> Hopefully that wasn't too hard to track. Very strange problem that only 
> occurs with gdb and not when I run hashpipe without it. Without gdb, I get 
> the same segmentation fault at the end of the circular buffer as mentioned 
> above.
> 



Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-12-02 Thread Mark Ruzindana
Thanks for the response John!

Yes, I have ensured that those numbers add up. I did it a little while ago,
but I just verified it to make sure. The PKT_UDP_SIZE macro includes both
the UDP header and our packet header (each 8 bytes), which we do not want
included when copying the payload, hence the '-16'. And since PKT_UDP_DATA
excludes both the IP and UDP headers (0x1c, i.e. 28 bytes: a 20-byte IP
header plus an 8-byte UDP header), when copying the payload we also need to
skip our own header before copying the data, as follows:

uint8_t * dest_p = db->block[pkt_block_idx].data + flag_input_e_databuf_idx(binfo->m, binfo->f, 0, 0, 0);

const uint8_t * payload_p = (uint8_t *)(PKT_UDP_DATA(p_frame)+8); // Ignore header

// Copy data into buffer
memcpy(dest_p, payload_p, PKT_UDP_SIZE(p_frame)-16); // Ignore UDP header (8 bytes) and packet header (8 bytes)

where 'p_frame' is an unsigned char *.
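
Laid out as a byte map (assembled from the macros and sizes described above; 
the 8-byte packet header is the application header from our packet format):

PKT_NET(p_frame)           -> start of IP header
PKT_NET(p_frame) + 20      -> start of UDP header
PKT_NET(p_frame) + 28      -> UDP payload (= PKT_UDP_DATA(p_frame), offset 0x1c)
PKT_UDP_DATA(p_frame) + 8  -> sample data, after the 8-byte packet header
copy length = PKT_UDP_SIZE(p_frame) - 8 (UDP header) - 8 (packet header)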

And I'm fairly confident that the offsets are correct, I've checked them
fairly thoroughly, but then again, there is still an issue so it's possible
that I missed something.

I've only included snippets of where the segmentation fault occurs for
simplicity since the code is quite dense, but I'm more than happy to share
more. I'm not entirely sure what I could share other than what I already
have to paint a better picture. Let me know if you can see what I might be
missing.

Thanks again,

Mark Ruzindana

On Mon, Nov 30, 2020 at 4:54 PM John Ford  wrote:

> Hi Mark.  Spelunking through the hashpipe_pktsock.h  header file I see
> that
>
> #define PKT_UDP_DATA(p) (PKT_NET(p) + 0x1c)
>
> In your code you posted earlier, you have this:
>
> memcpy(dest_p, payload, PKT_UDP_SIZE(frame) - 16);  // Ignore both UDP (8 bytes) and packet header (8 bytes)
>
> Have you verified that all these magic numbers add up, that is, the 16 and
> the 0x1c, and other constants such as these?  It seems clear from your
> description that you are trying to read from unallocated memory, but it's
> difficult to see where from the snippets of code we have.  Also, make sure
> that any pointer arithmetic uses the correct casts before adding the
> offsets.
>
>
>
> On Mon, Nov 30, 2020 at 3:45 PM Mark Ruzindana 
> wrote:
>
>> Hi David,
>>
>> Hope everything is fine. It's okay if you haven't seen it yet or forgot,
>> but I'm still struggling with this issue. Would you mind giving me some
>> thoughts on it if you have any? Here is the issue again, along with a
>> summary of what I did to catch you up, just in case you need it:
>>
>> I was able to install hashpipe with the suid bit set as you suggested
>> previously. So far, I have been able to capture data with the first round
>> of frames of the circular buffer i.e. if I have 160 frames, I am able to
>> capture packets of frames 0 to 159 at which point right at the memcpy()
>> in the process_packet() function of the net thread, I get a segmentation
>> fault.
>>
>> And the suggestions that you provided were very helpful with diagnosis,
>> but the problem hasn't been resolved yet.
>>
>> I'm currently using gdb to debug and it either tells me that I have a
>> segmentation fault at the memcpy() in process_packet() or something very
>> strange happens where the starting mcnt of a block greatly exceeds the mcnt
>> corresponding to the packet being processed and there's no segmentation
>> fault because the mcnt distance becomes negative so the memcpy() is
>> skipped. Hopefully that wasn't too hard to track. Very strange problem that
>> only occurs with gdb and not when I run hashpipe without it. Without gdb, I
>> get the same segmentation fault at the end of the circular buffer as
>> mentioned above.
>>
>> I also omitted the "+ input_databuf_idx(...)" to test for buffer
>> overflow, and the same result (segmentation fault).
>>
>> I checked to make sure that the blocks are large enough for the number of
>> frames. Right now, I have 480 total frames and 60 blocks so 8 frames per
>> block. And my frame size (8192) is a multiple of the kernel page size
>> (4096). I've also tried frame sizes 4096, and 16384 with the same results.
>>
>> I tried using 'hashpipe_dump_databuf -b "block number"' and I see binary
>> symbols in stdout regardless of what values I put in memset(). So that part
>> wasn't as helpful with diagnosis as I'd hoped.
>>
>> I should also mention that there is data being received on the same
>> interface from other ports, but the code ignores data from them as far as I
>> can tell, and only captures/processes data from the user suggested port.
>> But maybe somehow it's causing these issues and I'm not able to see how.
>>
>> As a test, I also tried removing the release_frame() call after
>> process_packet() and I got the same segmentation fault. So I still think
>> there's something about how release_frame() is meant to be used that I'm
>> missing, or it's not actually releasing the frame. I'm not sure.
>>
>> I appreciate any feedback. I'll respond ASAP if you 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-11-30 Thread John Ford
Hi Mark.  Spelunking through the hashpipe_pktsock.h  header file I see that

#define PKT_UDP_DATA(p) (PKT_NET(p) + 0x1c)

In your code you posted earlier, you have this:

memcpy(dest_p, payload, PKT_UDP_SIZE(frame) - 16);  // Ignore both UDP (8 bytes) and packet header (8 bytes)

Have you verified that all these magic numbers add up, that is, the 16 and
the 0x1c, and other constants such as these?  It seems clear from your
description that you are trying to read from unallocated memory, but it's
difficult to see where from the snippets of code we have.  Also, make sure
that any pointer arithmetic uses the correct casts before adding the
offsets.
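
On the casts point: offsets like the 0x1c are byte offsets, so they must be
added to a byte-sized pointer. A hypothetical illustration (frame_start is a
stand-in for the frame pointer):

uint32_t *words = (uint32_t *)frame_start;
uint8_t  *bad   = (uint8_t *)(words + 0x1c);  // WRONG: advances 0x1c * 4 bytes
uint8_t  *good  = (uint8_t *)words + 0x1c;    // advances 0x1c bytes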



On Mon, Nov 30, 2020 at 3:45 PM Mark Ruzindana  wrote:

> Hi David,
>
> Hope everything is fine. It's okay if you haven't seen it yet or forgot,
> but I'm still struggling with this issue. Would you mind giving me some
> thoughts on it if you have any? Here is the issue again, along with a
> summary of what I did to catch you up, just in case you need it:
>
> I was able to install hashpipe with the suid bit set as you suggested
> previously. So far, I have been able to capture data with the first round
> of frames of the circular buffer i.e. if I have 160 frames, I am able to
> capture packets of frames 0 to 159 at which point right at the memcpy()
> in the process_packet() function of the net thread, I get a segmentation
> fault.
>
> And the suggestions that you provided were very helpful with diagnosis,
> but the problem hasn't been resolved yet.
>
> I'm currently using gdb to debug and it either tells me that I have a
> segmentation fault at the memcpy() in process_packet() or something very
> strange happens where the starting mcnt of a block greatly exceeds the mcnt
> corresponding to the packet being processed and there's no segmentation
> fault because the mcnt distance becomes negative so the memcpy() is
> skipped. Hopefully that wasn't too hard to track. Very strange problem that
> only occurs with gdb and not when I run hashpipe without it. Without gdb, I
> get the same segmentation fault at the end of the circular buffer as
> mentioned above.
>
> I also omitted the "+ input_databuf_idx(...)" to test for buffer overflow,
> and the same result (segmentation fault).
>
> I checked to make sure that the blocks are large enough for the number of
> frames. Right now, I have 480 total frames and 60 blocks so 8 frames per
> block. And my frame size (8192) is a multiple of the kernel page size
> (4096). I've also tried frame sizes 4096, and 16384 with the same results.
>
> I tried using 'hashpipe_dump_databuf -b "block number"' and I see binary
> symbols in stdout regardless of what values I put in memset(). So that part
> wasn't as helpful with diagnosis as I'd hoped.
>
> I should also mention that there is data being received on the same
> interface from other ports, but the code ignores data from them as far as I
> can tell, and only captures/processes data from the user suggested port.
> But maybe somehow it's causing these issues and I'm not able to see how.
>
> As a test, I also tried removing the release_frame() call after
> process_packet() and I got the same segmentation fault. So I still think
> there's something about how release_frame() is meant to be used that I'm
> missing, or it's not actually releasing the frame. I'm not sure.
>
> I appreciate any feedback. I'll respond ASAP if you have any questions.
>
> Thanks,
>
> Mark Ruzindana
>
> On Fri, Oct 2, 2020 at 12:23 AM Mark Ruzindana 
> wrote:
>
>> Hi David,
>>
>> Sorry it's been a while, I've been working on other tasks besides the
>> packet socket implementation and I've gotten the opportunity to come back
>> to it. I know you have access to the previous emails, but just to catch you
>> up with a summary of what the issue was in implementing packet sockets:
>>
>> I was able to install hashpipe with the suid bit set as you suggested
>> previously. So far, I have been able to capture data with the first round
>> of frames of the circular buffer i.e. if I have 160 frames, I am able to
>> capture packets of frames 0 to 159 at which point right at the memcpy() in
>> the process_packet() function of the net thread, I get a segmentation fault.
>>
>> And the suggestions that you provided were very helpful with diagnosis,
>> but the problem hasn't been resolved yet.
>>
>> I'm currently using gdb to debug and it either tells me that I have a
>> segmentation fault at the memcpy() in process_packet() or something very
>> strange happens where the starting mcnt of a block greatly exceeds the mcnt
>> corresponding to the packet being processed and there's no segmentation
>> fault because the mcnt distance becomes negative so the memcpy() is
>> skipped. Hopefully that wasn't too hard to track. Very strange problem that
>> only occurs with gdb and not when I run hashpipe without it. Without gdb, I
>> get the same segmentation fault at the end of the circular buffer as
>> mentioned above.
>>

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-11-30 Thread Mark Ruzindana
Hi David,

Hope everything is fine. It's okay if you haven't seen it yet or forgot,
but I'm still struggling with this issue. Would you mind giving me some
thoughts on it if you have any? Here is the issue again, along with a
summary of what I did to catch you up, just in case you need it:

I was able to install hashpipe with the suid bit set as you suggested
previously. So far, I have been able to capture data with the first round
of frames of the circular buffer i.e. if I have 160 frames, I am able to
capture packets of frames 0 to 159 at which point right at the memcpy() in
the process_packet() function of the net thread, I get a segmentation fault.

And the suggestions that you provided were very helpful with diagnosis, but
the problem hasn't been resolved yet.

I'm currently using gdb to debug and it either tells me that I have a
segmentation fault at the memcpy() in process_packet() or something very
strange happens where the starting mcnt of a block greatly exceeds the mcnt
corresponding to the packet being processed and there's no segmentation
fault because the mcnt distance becomes negative so the memcpy() is
skipped. Hopefully that wasn't too hard to track. Very strange problem that
only occurs with gdb and not when I run hashpipe without it. Without gdb, I
get the same segmentation fault at the end of the circular buffer as
mentioned above.

I also omitted the "+ input_databuf_idx(...)" to test for buffer overflow,
and the same result (segmentation fault).

I checked to make sure that the blocks are large enough for the number of
frames. Right now, I have 480 total frames and 60 blocks so 8 frames per
block. And my frame size (8192) is a multiple of the kernel page size
(4096). I've also tried frame sizes 4096, and 16384 with the same results.

I tried using 'hashpipe_dump_databuf -b "block number"' and I see binary
symbols in stdout regardless of what values I put in memset(). So that part
wasn't as helpful with diagnosis as I'd hoped.

I should also mention that there is data being received on the same
interface from other ports, but the code ignores data from them as far as I
can tell, and only captures/processes data from the user suggested port.
But maybe somehow it's causing these issues and I'm not able to see how.

As a test, I also tried removing the release_frame() call after
process_packet() and I got the same segmentation fault. So I still think
there's something about how release_frame() is meant to be used that I'm
missing, or it's not actually releasing the frame. I'm not sure.

I appreciate any feedback. I'll respond ASAP if you have any questions.

Thanks,

Mark Ruzindana

On Fri, Oct 2, 2020 at 12:23 AM Mark Ruzindana  wrote:

> Hi David,
>
> Sorry it's been a while, I've been working on other tasks besides the
> packet socket implementation and I've gotten the opportunity to come back
> to it. I know you have access to the previous emails, but just to catch you
> up with a summary of what the issue was in implementing packet sockets:
>
> I was able to install hashpipe with the suid bit set as you suggested
> previously. So far, I have been able to capture data with the first round
> of frames of the circular buffer i.e. if I have 160 frames, I am able to
> capture packets of frames 0 to 159 at which point right at the memcpy() in
> the process_packet() function of the net thread, I get a segmentation fault.
>
> And the suggestions that you provided were very helpful with diagnosis,
> but the problem hasn't been resolved yet.
>
> I'm currently using gdb to debug and it either tells me that I have a
> segmentation fault at the memcpy() in process_packet() or something very
> strange happens where the starting mcnt of a block greatly exceeds the mcnt
> corresponding to the packet being processed and there's no segmentation
> fault because the mcnt distance becomes negative so the memcpy() is
> skipped. Hopefully that wasn't too hard to track. Very strange problem that
> only occurs with gdb and not when I run hashpipe without it. Without gdb, I
> get the same segmentation fault at the end of the circular buffer as
> mentioned above.
>
> I also omitted the "+ input_databuf_idx(...)" to test for buffer overflow,
> and the same result (segmentation fault).
>
> I checked to make sure that the blocks are large enough for the number of
> frames. Right now, I have 480 total frames and 60 blocks so 8 frames per
> block. And my frame size (8192) is a multiple of the kernel page size
> (4096). I've also tried frame sizes 4096, and 16384 with the same results.
>
> I tried using 'hashpipe_dump_databuf -b "block number"' and I see binary
> symbols in stdout regardless of what values I put in memset(). So that part
> wasn't as helpful with diagnosis as I'd hoped.
>
> I should also mention that there is data being received on the same
> interface from other ports, but the code ignores data from them as far as I
> can tell, and only captures/processes data from the user 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-10-02 Thread Mark Ruzindana
Hi David,

Sorry it's been a while, I've been working on other tasks besides the
packet socket implementation and I've gotten the opportunity to come back
to it. I know you have access to the previous emails, but just to catch you
up with a summary of what the issue was in implementing packet sockets:

I was able to install hashpipe with the suid bit set as you suggested
previously. So far, I have been able to capture data with the first round
of frames of the circular buffer i.e. if I have 160 frames, I am able to
capture packets of frames 0 to 159 at which point right at the memcpy() in
the process_packet() function of the net thread, I get a segmentation fault.

And the suggestions that you provided were very helpful with diagnosis, but
the problem hasn't been resolved yet.

I'm currently using gdb to debug and it either tells me that I have a
segmentation fault at the memcpy() in process_packet() or something very
strange happens where the starting mcnt of a block greatly exceeds the mcnt
corresponding to the packet being processed and there's no segmentation
fault because the mcnt distance becomes negative so the memcpy() is
skipped. Hopefully that wasn't too hard to track. Very strange problem that
only occurs with gdb and not when I run hashpipe without it. Without gdb, I
get the same segmentation fault at the end of the circular buffer as
mentioned above.

I also omitted the "+ input_databuf_idx(...)" to test for buffer overflow,
and the same result (segmentation fault).

I checked to make sure that the blocks are large enough for the number of
frames. Right now, I have 480 total frames and 60 blocks so 8 frames per
block. And my frame size (8192) is a multiple of the kernel page size
(4096). I've also tried frame sizes 4096, and 16384 with the same results.

I tried using 'hashpipe_dump_databuf -b "block number"' and I see binary
symbols in stdout regardless of what values I put in memset(). So that part
wasn't as helpful with diagnosis as I'd hoped.

I should also mention that there is data being received on the same
interface from other ports, but the code ignores data from them as far as I
can tell, and only captures/processes data from the user suggested port.
But maybe somehow it's causing these issues and I'm not able to see how.

As a test, I also tried removing the release_frame() call after
process_packet() and I got the same segmentation fault. So I still think
there's something about how release_frame() is meant to be used that I'm
missing, or it's not actually releasing the frame. I'm not sure.

I appreciate any feedback. I'll respond ASAP if you have any questions.

Thanks,

Mark Ruzindana




On Mon, May 25, 2020 at 6:14 PM Mark Ruzindana  wrote:

> Thanks for the additional suggestions. I will try those and let you know
> what happens.
>
> Mark
>
> On Mon, May 25, 2020 at 6:07 PM David MacMahon 
> wrote:
>
>> A few more suggestions:
>>
>> 1) Enable core dumps.  Usually you have to run "ulimit -c unlimited" and
>> for suid executables there's an extra step related to
>> /proc/sys/fs/suid_dumpable.  See "man 5 core" and "man 5 proc" for
>> details.  Once you have a core file, you can use gdb to examine the state
>> of things when the segfault happened.  You might want to recompile your
>> plug-in with debugging enabled and fewer optimizations to get the most out
>> of this approach: "gdb /path/to/hashpipe /path/to/core".  (Gotta love how
>> it's still called "core"!).  gdb can be a bit cryptic, but it's also very
>> powerful.
>>
>> 2) Another idea, just for diagnostic purposes, is to omit the "+
>> input_databuf_idx(...)" part of the dest_p assignment.  That will write all
>> payloads to the first part of the data block, so no buffer overflow for
>> sure (assuming idx is in range :)).  It's just a way to eliminate a
>> variable.
>>
>> 3) Make sure the packet socket blocks are large enough for the packet
>> frames.  I agree it looks like you're not reading past the end of the
>> packet payload size, but maybe the payload itself goes beyond the end of
>> the packet socket blocks?  The kernel might silently truncate the packets
>> in that case.
>>
>> 4) If you're using tagged VLANs the PKT_UDP_xxx macros won't work right.
>> It sounds like that's not happening because you're seeing the expected
>> size, but it's worth mentioning for mail archive completeness.
>>
>> 5) You can use hashpipe_dump_databuf to examine the 159 payloads you were
>> able to copy before the segfault to see whether every byte is properly
>> positioned and has believable values.  You could change memcpy(..) to
>> memset(p_dest, 'X', PKT_UDP_SIZE(frame)-16) so you'll know the exact value
>> that every byte should have. Instead of 'X' you could use pkt_num+1 (i.e. a
>> 1-based packet counter) so you'll know which bytes correspond to which
>> packets.  Using memset() would also eliminate reading from the packet
>> socket blocks (another variable gone).
>>
>> Happy hunting,
>> Dave
>>
>> On May 25, 2020, 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-05-25 Thread Mark Ruzindana
Thanks for the additional suggestions. I will try those and let you know
what happens.

Mark

On Mon, May 25, 2020 at 6:07 PM David MacMahon  wrote:

> A few more suggestions:
>
> 1) Enable core dumps.  Usually you have to run "ulimit -c unlimited" and
> for suid executables there's an extra step related to
> /proc/sys/fs/suid_dumpable.  See "man 5 core" and "man 5 proc" for
> details.  Once you have a core file, you can use gdb to examine the state
> of things when the segfault happened.  You might want to recompile your
> plug-in with debugging enabled and fewer optimizations to get the most out
> of this approach: "gdb /path/to/hashpipe /path/to/core".  (Gotta love how
> it's still called "core"!).  gdb can be a bit cryptic, but it's also very
> powerful.
>
> 2) Another idea, just for diagnostic purposes, is to omit the "+
> input_databuf_idx(...)" part of the dest_p assignment.  That will write all
> payloads to the first part of the data block, so no buffer overflow for
> sure (assuming idx is in range :)).  It's just a way to eliminate a
> variable.
>
> 3) Make sure the packet socket blocks are large enough for the packet
> frames.  I agree it looks like you're not reading past the end of the
> packet payload size, but maybe the payload itself goes beyond the end of
> the packet socket blocks?  The kernel might silently truncate the packets
> in that case.
>
> 4) If you're using tagged VLANs the PKT_UDP_xxx macros won't work right.
> It sounds like that's not happening because you're seeing the expected
> size, but it's worth mentioning for mail archive completeness.
>
> 5) You can use hashpipe_dump_databuf to examine the 159 payloads you were
> able to copy before the segfault to see whether every byte is properly
> positioned and has believable values.  You could change memcpy(..) to
> memset(p_dest, 'X', PKT_UDP_SIZE(frame)-16) so you'll know the exact value
> that every byte should have. Instead of 'X' you could use pkt_num+1 (i.e. a
> 1-based packet counter) so you'll know which bytes correspond to which
> packets.  Using memset() would also eliminate reading from the packet
> socket blocks (another variable gone).
>
> Happy hunting,
> Dave
>
> On May 25, 2020, at 16:33, Mark Ruzindana  wrote:
>
> Thanks for the suggestions. I neglected to mention that I'm printing out
> the PKT_UDP_SIZE() and PKT_UDP_DST() right before the memcpy(); I take into
> account the 8-byte UDP header, and the size and port are correct. When
> performing the memcpy(), I am taking into account that PKT_UDP_DATA()
> returns a pointer to the payload and excludes the UDP header. However, I
> also have an 8 byte packet header within that payload (this gives me the
> mcnt, f-engine, and x-engine indices) and I exclude it when performing the
> memcpy(). This is what it looks like:
>
> uint8_t * dest_p = db->block[idx].data + input_databuf_idx(m, f, 0,0,0); // This macro index shifts every mcnt and f-engine index
> const uint8_t * payload = (uint8_t *)(PKT_UDP_DATA(frame)+8); // Ignore packet header
>
> fprintf(...); // prints PKT_UDP_SIZE() and PKT_UDP_DST()
> memcpy(dest_p, payload, PKT_UDP_SIZE(frame) - 16);  // Ignore both UDP (8 bytes) and packet header (8 bytes)
>
> I will look into the other possible issues that you suggested, but as far
> as I can tell, it doesn't seem like there should be a segfault given what
> I'm doing before that memcpy(). I will let you know what else I find.
>
> Thanks again, I really appreciate the help.
>
> Mark
>
> On Mon, May 25, 2020 at 4:30 PM David MacMahon 
> wrote:
>
>> Hi, Mark,
>>
>> Sounds like progress!
>>
>> On May 25, 2020, at 13:56, Mark Ruzindana  wrote:
>>
>> I have been able to capture data with the first round of frames of the
>> circular buffer i.e. if I have 160 frames, I am able to capture packets of
>> frames 0 to 159 at which point right at the memcpy() in the
>> process_packet() function of the net thread, I get a segmentation fault.
>>
>>
>> The fact that you get the segfault right at the memcpy of the final
>> frame of the ring buffer suggests that there is a problem with the parameters
>> passed to memcpy.  Most likely src+length-1 exceeds the end of the frame so
>> you get a segfault when memcpy tries to read from beyond the allocated
>> memory.  This would explain why it segfaults on the final frame and not the
>> previous frames because reading beyond a previous frame still reads from
>> "legal" (though incorrect) memory locations.  It's also possible that the
>> segfault happens due to a bad address on the destination side of the
>> memcpy(), but unless the destination buffer is also 160 frames in size that
>> seems less likely.
>>
>> The release_frame function is not likely to be a culprit here unless the
>> pointer you are passing it differs from the pointer that the pktsock_recv
>> function returned.
>>
>> For debugging, I suggest logging dst, src, len before calling memcpy.
>> Normally you wouldn't generate a log message for every packet because that
>> 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-05-25 Thread David MacMahon
A few more suggestions:

1) Enable core dumps.  Usually you have to run "ulimit -c unlimited" and for 
suid executables there's an extra step related to /proc/sys/fs/suid_dumpable.  
See "man 5 core" and "man 5 proc" for details.  Once you have a core file, you 
can use gdb to examine the state of things when the segfault happened.  You 
might want to recompile your plug-in with debugging enabled and fewer 
optimizations to get the most out of this approach: "gdb /path/to/hashpipe 
/path/to/core".  (Gotta love how it's still called "core"!).  gdb can be a bit 
cryptic, but it's also very powerful.

2) Another idea, just for diagnostic purposes, is to omit the "+ 
input_databuf_idx(...)" part of the dest_p assignment.  That will write all 
payloads to the first part of the data block, so no buffer overflow for sure 
(assuming idx is in range :)).  It's just a way to eliminate a variable.

3) Make sure the packet socket blocks are large enough for the packet frames.  
I agree it looks like you're not reading past the end of the packet payload 
size, but maybe the payload itself goes beyond the end of the packet socket 
blocks?  The kernel might silently truncate the packets in that case.

4) If you're using tagged VLANs the PKT_UDP_xxx macros won't work right.  It 
sounds like that's not happening because you're seeing the expected size, but 
it's worth mentioning for mail archive completeness.

5) You can use hashpipe_dump_databuf to examine the 159 payloads you were able 
to copy before the segfault to see whether every byte is properly positioned and 
has believable values.  You could change memcpy(..) to memset(p_dest, 'X', 
PKT_UDP_SIZE(frame)-16) so you'll know the exact value that every byte should 
have. Instead of 'X' you could use pkt_num+1 (i.e. a 1-based packet counter) so 
you'll know which bytes correspond to which packets.  Using memset() would also 
eliminate reading from the packet socket blocks (another variable gone).
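
Concretely, the diagnostic swap in suggestion 5 would look something like this 
(a sketch; pkt_num is a hypothetical per-packet counter kept by the net 
thread):

// memcpy(dest_p, payload, PKT_UDP_SIZE(frame) - 16);
memset(dest_p, (pkt_num + 1) & 0xff, PKT_UDP_SIZE(frame) - 16);  // fill with a 1-based packet counter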

Happy hunting,
Dave

> On May 25, 2020, at 16:33, Mark Ruzindana  wrote:
> 
> Thanks for the suggestions. I neglected to mention that I'm printing out the 
> PKT_UDP_SIZE() and PKT_UDP_DST() right before the memcpy(); I take into 
> account the 8-byte UDP header, and the size and port are correct. When 
> performing the memcpy(), I am taking into account that PKT_UDP_DATA() returns 
> a pointer to the payload and excludes the UDP header. However, I also have an 
> 8 byte packet header within that payload (this gives me the mcnt, f-engine, 
> and x-engine indices) and I exclude it when performing the memcpy(). This is 
> what it looks like:
> 
> uint8_t * dest_p = db->block[idx].data + input_databuf_idx(m, f, 0,0,0); // This macro index shifts every mcnt and f-engine index
> const uint8_t * payload = (uint8_t *)(PKT_UDP_DATA(frame)+8); // Ignore packet header
> 
> fprintf(...); // prints PKT_UDP_SIZE() and PKT_UDP_DST()
> memcpy(dest_p, payload, PKT_UDP_SIZE(frame) - 16);  // Ignore both UDP (8 bytes) and packet header (8 bytes)
> 
> I will look into the other possible issues that you suggested, but as far as 
> I can tell, it doesn't seem like there should be a segfault given what I'm 
> doing before that memcpy(). I will let you know what else I find.
> 
> Thanks again, I really appreciate the help.
> 
> Mark
> 
> On Mon, May 25, 2020 at 4:30 PM David MacMahon  wrote:
> Hi, Mark,
> 
> Sounds like progress!
> 
>> On May 25, 2020, at 13:56, Mark Ruzindana  wrote:
>> 
>> I have been able to capture data with the first round of frames of the 
>> circular buffer i.e. if I have 160 frames, I am able to capture packets of 
>> frames 0 to 159 at which point right at the memcpy() in the process_packet() 
>> function of the net thread, I get a segmentation fault.
> 
> The fact that you get the segfault right at the memcpy of the final frame 
> of the ring buffer suggests that there is a problem with the parameters passed 
> to memcpy.  Most likely src+length-1 exceeds the end of the frame so you get 
> a segfault when memcpy tries to read from beyond the allocated memory.  This 
> would explain why it segfaults on the final frame and not the previous frames 
> because reading beyond a previous frame still reads from "legal" (though 
> incorrect) memory locations.  It's also possible that the segfault happens 
> due to a bad address on the destination side of the memcpy(), but unless the 
> destination buffer is also 160 frames in size that seems less likely.
> 
> The release_frame function is not likely to be a culprit here unless the 
> pointer you are passing it differs from the pointer that the pktsock_recv 
> function returned.
> 
> For debugging, I suggest logging dst, src, len before calling memcpy.  
> Normally you wouldn't generate a log message for every packet because that 
> would ruin your throughput, but since you know it's going to crash after the 
> first 160 packets there's not much throughput to ruin. 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-05-25 Thread Mark Ruzindana
Thanks for the suggestions. I neglected to mention that I'm printing out
the PKT_UDP_SIZE() and PKT_UDP_DST() right before the memcpy(); I take into
account the 8-byte UDP header, and the size and port are correct. When
performing the memcpy(), I am taking into account that PKT_UDP_DATA()
returns a pointer to the payload and excludes the UDP header. However, I
also have an 8 byte packet header within that payload (this gives me the
mcnt, f-engine, and x-engine indices) and I exclude it when performing the
memcpy(). This is what it looks like:

uint8_t * dest_p = db->block[idx].data + input_databuf_idx(m, f, 0,0,0); // This macro index shifts every mcnt and f-engine index
const uint8_t * payload = (uint8_t *)(PKT_UDP_DATA(frame)+8); // Ignore packet header

fprintf(...); // prints PKT_UDP_SIZE() and PKT_UDP_DST()
memcpy(dest_p, payload, PKT_UDP_SIZE(frame) - 16);  // Ignore both UDP (8 bytes) and packet header (8 bytes)

I will look into the other possible issues that you suggested, but as far
as I can tell, it doesn't seem like there should be a segfault given what
I'm doing before that memcpy(). I will let you know what else I find.

Thanks again, I really appreciate the help.

Mark

On Mon, May 25, 2020 at 4:30 PM David MacMahon  wrote:

> Hi, Mark,
>
> Sounds like progress!
>
> On May 25, 2020, at 13:56, Mark Ruzindana  wrote:
>
> I have been able to capture data with the first round of frames of the
> circular buffer i.e. if I have 160 frames, I am able to capture packets of
> frames 0 to 159 at which point right at the memcpy() in the
> process_packet() function of the net thread, I get a segmentation fault.
>
>
> The fact that you get the segfault right at the memcpy of the final
> frame of the ring buffer suggests that there is a problem with the parameters
> passed to memcpy.  Most likely src+length-1 exceeds the end of the frame so
> you get a segfault when memcpy tries to read from beyond the allocated
> memory.  This would explain why it segfaults on the final frame and not the
> previous frames because reading beyond a previous frame still reads from
> "legal" (though incorrect) memory locations.  It's also possible that the
> segfault happens due to a bad address on the destination side of the
> memcpy(), but unless the destination buffer is also 160 frames in size that
> seems less likely.
>
> The release_frame function is not likely to be a culprit here unless the
> pointer you are passing it differs from the pointer that the pktsock_recv
> function returned.
>
> For debugging, I suggest logging dst, src, len before calling memcpy.
> Normally you wouldn't generate a log message for every packet because that
> would ruin your throughput, but since you know it's going to crash after
> the first 160 packets there's not much throughput to ruin. :)
>
> One thing to remember is that PKT_UDP_DATA() evaluates to a pointer to the
> UDP payload of the packet, but PKT_UDP_SIZE() evaluates to the total UDP
> size (i.e. 8 bytes for the UDP header plus the length of the UDP payload).
> Passing PKT_UDP_SIZE() as "len" to memcpy without subtracting 8 for the
> header bytes is not correct and could potentially cause this problem.
>
> HTH,
> Dave
>


Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-05-25 Thread David MacMahon
Hi, Mark,

Sounds like progress!

> On May 25, 2020, at 13:56, Mark Ruzindana  wrote:
> 
> I have been able to capture data with the first round of frames of the 
> circular buffer i.e. if I have 160 frames, I am able to capture packets of 
> frames 0 to 159 at which point right at the memcpy() in the process_packet() 
> function of the net thread, I get a segmentation fault.

The fact that you get the segfault right at the memcpy of the final frame of 
the ring buffer suggests that there is a problem with the parameters passed to 
memcpy.  Most likely src+length-1 exceeds the end of the frame, so you get a 
segfault when memcpy tries to read from beyond the allocated memory.  This 
would explain why it segfaults on the final frame and not the previous frames 
because reading beyond a previous frame still reads from "legal" (though 
incorrect) memory locations.  It's also possible that the segfault happens due 
to a bad address on the destination side of the memcpy(), but unless the 
destination buffer is also 160 frames in size that seems less likely.

The release_frame function is not likely to be a culprit here unless the 
pointer you are passing it differs from the pointer that the pktsock_recv 
function returned.

For debugging, I suggest logging dst, src, len before calling memcpy.  Normally 
you wouldn't generate a log message for every packet because that would ruin 
your throughput, but since you know it's going to crash after the first 160 
packets there's not much throughput to ruin. :)

One thing to remember is that PKT_UDP_DATA() evaluates to a pointer to the UDP 
payload of the packet, but PKT_UDP_SIZE() evaluates to the total UDP size (i.e. 
8 bytes for the UDP header plus the length of the UDP payload).  Passing 
PKT_UDP_SIZE() as "len" to memcpy without subtracting 8 for the header bytes is 
not correct and could potentially cause this problem.
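
A minimal guarded copy along those lines might look like this ("dest" and
FRAME_SIZE are placeholders for the destination pointer and the pktsock
frame size, not Hashpipe names):

size_t len = PKT_UDP_SIZE(frame) - 8;  /* subtract the 8-byte UDP header */
if (len > FRAME_SIZE) {
    fprintf(stderr, "suspicious UDP size %zu\n", len);  /* bad parse, skip */
} else {
    memcpy(dest, PKT_UDP_DATA(frame), len);
}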

HTH,
Dave



Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-05-25 Thread Mark Ruzindana
Hi Dave,

Thanks for all of the help with the use of packet sockets. I will try to be
concise, but as detailed as I can. So if you need more details, I will
definitely provide them.

I was able to install hashpipe with the suid bit set as you suggested
previously. So far, I have been able to capture data with the first round
of frames of the circular buffer i.e. if I have 160 frames, I am able to
capture packets of frames 0 to 159 at which point right at the memcpy() in
the process_packet() function of the net thread, I get a segmentation fault.

I believe this has to do with how I am implementing the release_frame()
function in the net thread. I call the release_frame function right after
process_packet() which makes sense to me because as I understand it, after
a packet is read/processed, the user must zero the status field so the
kernel can use that frame again as stated in packet_mmap.txt. So calling
the release_frame function after process_packet makes the most sense to me.
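
In minimal form, that ordering looks something like this (function names as
I understand them from hashpipe_pktsock.h; exact signatures may differ
between versions, and bindport is a placeholder):

unsigned char *p_frame = hashpipe_pktsock_recv_udp_frame_nonblock(p_ps, bindport);
if (p_frame) {
    process_packet(db, p_frame);             // copy the payload into the shared buffer
    hashpipe_pktsock_release_frame(p_frame); // zero the status field so the kernel reuses the frame
}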

I am using the PKT_UDP_DATA(frame) macro to acquire the pointer to the
packet payload, and right at the memcpy() at the end of the first round of
the buffer, I get that segmentation fault. Given what I've done to debug
the code, as well as the information I have acquired about TP_STATUS, I
haven't yet seen how I'm accessing memory out of the allocated range. There
is probably something I'm missing or don't understand about the
release_frame function or otherwise. As of right now, it seems as though
the release_frame function is freeing the memory entirely or there are some
privilege issues that I'm unable to see.

Hopefully, my explanation makes sense. Let me know whether you need
additional information and I can definitely provide it. Thanks again for
the help.

Mark Ruzindana

On Fri, Apr 17, 2020 at 9:00 PM David MacMahon  wrote:

> Hi, Mark,
>
> Yeah, packet sockets do require extra privileges.  The solution/workaround
> that Hashpipe uses is to install hashpipe with the suid bit set.  The init()
> functions of the threads will be called with the privileges of the suid
> user.  Then hashpipe will drop the suid privileges before invoking the
> run() functions of the threads.  If you set up the packet sockets in the
> init() function, you can then use them in the run() functions.  It's not an
> ideal solution and could be considered a security hole, but given the
> limited and generally tightly controlled environments in which Hashpipe is
> typically used this is a working compromise.  The other option, as you
> indicated, is to do something with the CAP_NET_RAW privilege, but I've not
> explored how to utilize that.  My limited understanding of that is that it
> is for users rather than executables, but like I said I haven't explored
> that route so I'm not really sure what's possible there.  If you figure out
> something useful, please post here.
>
> Cheers,
> Dave
>
> On Apr 17, 2020, at 14:25, Mark Ruzindana  wrote:
>
> Hi all,
>
> Hope you're doing fine. I was able to add packet sockets and the functions
> provided by Hashpipe in hashpipe_pktsock.h, but I get permission issues
> when trying to capture packets as a non-root user.
>
> The method I am trying to use to overcome this is owning the
> plugins/executables as root and using the setuid flag to give root
> privileges to hashpipe. At this point, I still get an 'operation not
> permitted' when trying to open the socket. Then when trying to use the
> CAP_NET_RAW privilege (setcap cap_net_raw=pe 'program'), I'm told that the
> operation is not supported.
>
> Just to be clear, I don't have any of these issues when running the
> process as root, but I'd rather have non-root users running hashpipe. How
> were you able to overcome the permission issues when trying to capture raw
> packets with hashpipe as a non-root user? If you were running it as a
> non-root user.
>
> Let me know whether you need any more information or whether I'm not
> stating anything clearly.
>
> Thanks a lot for the help.
>
> Mark Ruzindana
>
> On Tue, Mar 31, 2020 at 5:08 PM Mark Ruzindana 
> wrote:
>
>> Thanks a lot for the quick responses John and David! I really appreciate
>> it.
>>
>> I will definitely be updating the version of Hashpipe that I currently
>> have on the server as well as ensure that the network tuning is good.
>>
>> I'm currently using the standard "socket()" function, and a switch to
>> packet sockets, with the description that you gave, seems like it will
>> definitely be beneficial.
>>
>> I also currently pin the threads to the desired cores with a "-c #" on
>> the command line, but thank you for mentioning it, I might have not been
>> doing so. The NUMA info is also very helpful. I'll make sure that the
>> architecture is as optimal as it should be.
>>
>> Thanks again! This was very helpful and I'll update you with the progress
>> that I make.
>>
>> Mark
>>
>>
>>
>>
>> On Tue, Mar 31, 2020 at 4:38 PM David MacMahon 
>> wrote:
>>
>>> Just to expand on John's excellent tips, Hashpipe 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-04-17 Thread David MacMahon
Hi, Mark,

Yeah, packet sockets do require extra privileges.  The solution/workaround that 
Hashpipe uses is to install hashpipe with the suid bit set.  The init() 
functions of the threads will be called with the privileges of the suid user.  
Then hashpipe will drop the suid privileges before invoking the run() functions 
of the threads.  If you set up the packet sockets in the init() function, you 
can then use them in the run() functions.  It's not an ideal solution and could 
be considered a security hole, but given the limited and generally tightly 
controlled environments in which Hashpipe is typically used this is a working 
compromise.  The other option, as you indicated, is to do something with the 
CAP_NET_RAW privilege, but I've not explored how to utilize that.  My limited 
understanding of that is that it is for users rather than executables, but like 
I said I haven't explored that route so I'm not really sure what's possible 
there.  If you figure out something useful, please post here.
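
For concreteness, the suid install amounts to something like this (the
install path here is an assumption):

sudo chown root:root /usr/local/bin/hashpipe   # owned by root
sudo chmod u+s /usr/local/bin/hashpipe         # set the suid bit

The packet sockets are then opened in the plugins' init() functions, which
run with the elevated privileges, and only used from the run() functions.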

Cheers,
Dave

> On Apr 17, 2020, at 14:25, Mark Ruzindana  wrote:
> 
> Hi all,
> 
> Hope you're doing fine. I was able to add packet sockets and the functions 
> provided by Hashpipe in hashpipe_pktsock.h, but I get permission issues when 
> trying to capture packets as a non-root user. 
> 
> The method I am trying to use to overcome this is owning the 
> plugins/executables as root and using the setuid flag to give root privileges 
> to hashpipe. At this point, I still get an 'operation not permitted' when 
> trying to open the socket. Then when trying to use the CAP_NET_RAW privilege 
> (setcap cap_net_raw=pe 'program'), I'm told that the operation is not 
> supported.
> 
> Just to be clear, I don't have any of these issues when running the process 
> as root, but I'd rather have non-root users running hashpipe. How were you 
> able to overcome the permission issues when trying to capture raw packets 
> with hashpipe as a non-root user? If you were running it as a non-root user.
> 
> Let me know whether you need any more information or whether I'm not stating 
> anything clearly.
> 
> Thanks a lot for the help.
> 
> Mark Ruzindana
> 
> On Tue, Mar 31, 2020 at 5:08 PM Mark Ruzindana wrote:
> Thanks a lot for the quick responses John and David! I really appreciate it.
> 
> I will definitely be updating the version of Hashpipe that I currently have 
> on the server as well as ensure that the network tuning is good.
> 
> I'm currently using the standard "socket()" function, and a switch to packet 
> sockets, with the description that you gave, seems like it will definitely be 
> beneficial.
> 
> I also currently pin the threads to the desired cores with a "-c #" on the 
> command line, but thank you for mentioning it, I might have not been doing 
> so. The NUMA info is also very helpful. I'll make sure that the architecture 
> is as optimal as it should be.
> 
> Thanks again! This was very helpful and I'll update you with the progress 
> that I make.
> 
> Mark
> 
> 
> 
> 
> On Tue, Mar 31, 2020 at 4:38 PM David MacMahon wrote:
> Just to expand on John's excellent tips, Hashpipe does lock its shared memory 
> buffers with mlock.  These buffers will have the NUMA node affinity of the 
> thread that created them so be sure to pin the threads to the desired core or 
> cores by preceding the thread names on the command line with a -c # (set 
> thread affinity to a single core) or -m # (set thread affinity to multiple 
> cores) option.  Alternatively (or additionally) you can run the entire hashpipe 
> process with numactl.  For example...
> 
> numactl --cpunodebind=1 --membind=1 hashpipe [...]
> 
> ...will restrict hashpipe and all its threads to run on NUMA node 1 and all 
> memory allocations will (to the extent possible) be made within memory that 
> is affiliated with NUMA node 1.  You can use various tools to find out which 
> hardware is associated with which NUMA node such as "numactl --hardware" or 
> "lstopo".  Hashpipe includes its own such utility: "hashpipe_topology.sh".
> 
> On NUMA (i.e. multi-socket) systems, each PCIe slot is associated with a 
> specific NUMA node.  It can be beneficial to have relevant peripherals (e.g. 
> NIC and GPU) be in PCIe slots that are on the same NUMA node.
> 
> Of course, if you have a single-socket mainboard, then all this NUMA stuff 
> is irrelevant. :P
> 
> Cheers,
> Dave
> 
>> On Mar 31, 2020, at 15:04, John Ford wrote:
>> 
>> 
>> 
>> Hi Mark.  Since the newer version has a script called 
>> "hashpipe_irqaffinity.sh" I would think that the most expedient thing to do 
>> is to upgrade to the newer version.  It's likely to fix some or all of this.
>> 
>> That said, there are a lot of things that you can check, and not only the 
>> irq affinity, but also make sure that your network tuning is good, that your 
>> network card irqs are attached to processes where the memory is local 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-04-17 Thread Mark Ruzindana
Hi all,

Hope you're doing fine. I was able to add packet sockets and the functions
provided by Hashpipe in hashpipe_pktsock.h, but I get permission issues
when trying to capture packets as a non-root user.

The method I am trying to use to overcome this is owning the
plugins/executables as root and using the setuid flag to give root
privileges to hashpipe. At this point, I still get an 'operation not
permitted' when trying to open the socket. Then when trying to use the
CAP_NET_RAW privilege (setcap cap_net_raw=pe 'program'), I'm told that the
operation is not supported.
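
For reference, the usual file-capability commands look like the sketch below
(the path is an assumption). One thing worth checking: setcap reports
"operation not supported" when the executable lives on a filesystem without
extended-attribute support (e.g. some NFS mounts), which may be what is
happening here.

sudo setcap cap_net_raw+ep /usr/local/bin/hashpipe   # grant the capability
getcap /usr/local/bin/hashpipe                       # verify it took effect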

Just to be clear, I don't have any of these issues when running the process
as root, but I'd rather have non-root users running hashpipe. How were you
able to overcome the permission issues when trying to capture raw packets
with hashpipe as a non-root user? If you were running it as a non-root user.

Let me know whether you need any more information or whether I'm not
stating anything clearly.

Thanks a lot for the help.

Mark Ruzindana

On Tue, Mar 31, 2020 at 5:08 PM Mark Ruzindana  wrote:

> Thanks a lot for the quick responses John and David! I really appreciate
> it.
>
> I will definitely be updating the version of Hashpipe that I currently
> have on the server as well as ensure that the network tuning is good.
>
> I'm currently using the standard "socket()" function, and a switch to
> packet sockets, with the description that you gave, seems like it will
> definitely be beneficial.
>
> I also currently pin the threads to the desired cores with a "-c #" on the
> command line, but thank you for mentioning it, I might have not been doing
> so. The NUMA info is also very helpful. I'll make sure that the
> architecture is as optimal as it should be.
>
> Thanks again! This was very helpful and I'll update you with the progress
> that I make.
>
> Mark
>
>
>
>
> On Tue, Mar 31, 2020 at 4:38 PM David MacMahon 
> wrote:
>
>> Just to expand on John's excellent tips, Hashpipe does lock its shared
>> memory buffers with mlock.  These buffers will have the NUMA node affinity
>> of the thread that created them so be sure to pin the threads to the
>> desired core or cores by preceding the thread names on the command line
>> with a -c # (set thread affinity to a single core) or -m # (set thread
>> affinity to multiple cores) option.  Alternatively (or additionally) you can
>> run the entire hashpipe process with numactl.  For example...
>>
>> numactl --cpunodebind=1 --membind=1 hashpipe [...]
>>
>> ...will restrict hashpipe and all its threads to run on NUMA node 1 and
>> all memory allocations will (to the extent possible) be made within memory
>> that is affiliated with NUMA node 1.  You can use various tools to find out
>> which hardware is associated with which NUMA node such as "numactl
>> --hardware" or "lstopo".  Hashpipe includes its own such utility:
>> "hashpipe_topology.sh".
>>
>> On NUMA (i.e. multi-socket) systems, each PCIe slot is associated with a
>> specific NUMA node.  It can be beneficial to have relevant peripherals
>> (e.g. NIC and GPU) be in PCIe slots that are on the same NUMA node.
>>
>> Of course, if you have a single-socket mainboard, then all this NUMA
>> stuff is irrelevant. :P
>>
>> Cheers,
>> Dave
>>
>> On Mar 31, 2020, at 15:04, John Ford  wrote:
>>
>>
>>
>> Hi Mark.  Since the newer version has a script called
>> "hashpipe_irqaffinity.sh" I would think that the most expedient thing to do
>> is to upgrade to the newer version.  It's likely to fix some or all of this.
>>
>> That said, there are a lot of things that you can check, and not only the
>> irq affinity, but also make sure that your network tuning is good, that
>> your network card irqs are attached to processes where the memory is local
>> to that processor, and that the hashpipe threads are mapped to processor
>> cores that are also local to that memory.   Sometimes it's
>> counterproductive to map processes to processor cores by themselves if they
>> need data that is produced by a different core that's far away, NUMA-wise.
>> And lock all the memory in core with mlockall() or one of its friends.
>>
>> Good luck with it!
>>
>> John
>>
>>
>>
>>
>> On Tue, Mar 31, 2020 at 12:09 PM Mark Ruzindana 
>> wrote:
>>
>>> Hi all,
>>>
>>> I am fairly new to asking questions on a forum so if I need to provide
>>> more details, please let me know.
>>>
>>> Worth noting that just as I was about to send this out, I checked and I
>>> don't have the most recent version of HASHPIPE with hashpipe_irqaffinity.sh
>>> among other additions and modifications. So this might fix my problem, but
>>> maybe not and someone else has more insight. I will update everyone if it
>>> does.
>>>
>>> I am trying to reduce the number of packets lost/dropped when running
>>> HASHPIPE on a 32 core RHEL 7 server. I have run enough tests and
>>> diagnostics to be confident that the problem is not any HASHPIPE thread
>>> running for too long. Also, the percentage of packets dropped on any given
>>> 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-03-31 Thread Mark Ruzindana
Thanks a lot for the quick responses John and David! I really appreciate it.

I will definitely be updating the version of Hashpipe that I currently have
on the server as well as ensure that the network tuning is good.

I'm currently using the standard "socket()" function, and a switch to
packet sockets, with the description that you gave, seems like it will
definitely be beneficial.

I also currently pin the threads to the desired cores with a "-c #" on the
command line, but thank you for mentioning it, I might have not been doing
so. The NUMA info is also very helpful. I'll make sure that the
architecture is as optimal as it should be.

Thanks again! This was very helpful and I'll update you with the progress
that I make.

Mark




On Tue, Mar 31, 2020 at 4:38 PM David MacMahon  wrote:

> Just to expand on John's excellent tips, Hashpipe does lock its shared
> memory buffers with mlock.  These buffers will have the NUMA node affinity
> of the thread that created them so be sure to pin the threads to the
> desired core or cores by preceding the thread names on the command line
> with a -c # (set thread affinity to a single core) or -m # (set thread
> affinity to multiple cores) option.  Alternatively (or additionally) you can
> run the entire hashpipe process with numactl.  For example...
>
> numactl --cpunodebind=1 --membind=1 hashpipe [...]
>
> ...will restrict hashpipe and all its threads to run on NUMA node 1 and
> all memory allocations will (to the extent possible) be made within memory
> that is affiliated with NUMA node 1.  You can use various tools to find out
> which hardware is associated with which NUMA node such as "numactl
> --hardware" or "lstopo".  Hashpipe includes its own such utility:
> "hashpipe_topology.sh".
>
> On NUMA (i.e. multi-socket) systems, each PCIe slot is associated with a
> specific NUMA node.  It can be beneficial to have relevant peripherals
> (e.g. NIC and GPU) be in PCIe slots that are on the same NUMA node.
>
> Of course, if you have a single-socket mainboard, then all this NUMA
> stuff is irrelevant. :P
>
> Cheers,
> Dave
>
> On Mar 31, 2020, at 15:04, John Ford  wrote:
>
>
>
> Hi Mark.  Since the newer version has a script called
> "hashpipe_irqaffinity.sh" I would think that the most expedient thing to do
> is to upgrade to the newer version.  It's likely to fix some or all of this.
>
> That said, there are a lot of things that you can check, and not only the
> irq affinity, but also make sure that your network tuning is good, that
> your network card irqs are attached to processes where the memory is local
> to that processor, and that the hashpipe threads are mapped to processor
> cores that are also local to that memory.   Sometimes it's
> counterproductive to map processes to processor cores by themselves if they
> need data that is produced by a different core that's far away, NUMA-wise.
> And lock all the memory in core with mlockall() or one of its friends.
>
> Good luck with it!
>
> John
>
>
>
>
> On Tue, Mar 31, 2020 at 12:09 PM Mark Ruzindana 
> wrote:
>
>> Hi all,
>>
>> I am fairly new to asking questions on a forum so if I need to provide
>> more details, please let me know.
>>
>> Worth noting that just as I was about to send this out, I checked and I
>> don't have the most recent version of HASHPIPE with hashpipe_irqaffinity.sh
>> among other additions and modifications. So this might fix my problem, but
>> maybe not and someone else has more insight. I will update everyone if it
>> does.
>>
>> I am trying to reduce the number of packets lost/dropped when running
>> HASHPIPE on a 32 core RHEL 7 server. I have run enough tests and
>> diagnostics to be confident that the problem is not any HASHPIPE thread
>> running for too long. Also, the percentage of packets dropped on any given
>> scan is between about 0.3 and 0.8%. Approx. 5,000 packets in a 30 second
>> scan with a total of 1,650,000 packets. So while it's a small percentage,
>> the number of packets lost is still quite large. I have also done enough
>> tests with 'top', 'iostat' as well as timing HASHPIPE in between time
>> windows where there are no packets dropped to diagnose the issue further. I
>> (as well as my colleagues) have come to the conclusion that the kernel is
>> allowing processes to interrupt HASHPIPE as it is running.
>>
>> So I have researched and run tests involving 'niceness' and I am
>> currently trying to configure smp affinities and irq balancing, but the
>> changes that I make to the smp_affinity files aren't doing anything. My
>> plan was to have the interrupts run on the 20 cores that aren't being used
>> by HASHPIPE. Also, disabling 'irqbalance' didn't do anything either. I also
>> restarted the machine to see whether the changes made are permanent, but
>> the system reverts back to what it was.
>>
>> I might be missing something, or trying the wrong things. Has anyone
>> experienced this? And could you point me in the right direction if you have
>> any insight?
>>
>> If 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-03-31 Thread David MacMahon
Just to expand on John's excellent tips, Hashpipe does lock its shared memory 
buffers with mlock.  These buffers will have the NUMA node affinity of the 
thread that created them so be sure to pin the threads to the desired core or 
cores by preceding the thread names on the command line with a -c # (set thread 
affinity to a single core) or -m # (set thread affinity to multiple cores) 
option.  Alternatively (or additionally) you can run the entire hashpipe process 
with numactl.  For example...

numactl --cpunodebind=1 --membind=1 hashpipe [...]

...will restrict hashpipe and all its threads to run on NUMA node 1 and all 
memory allocations will (to the extent possible) be made within memory that is 
affiliated with NUMA node 1.  You can use various tools to find out which 
hardware is associated with which NUMA node such as "numactl --hardware" or 
"lstopo".  Hashpipe includes its own such utility: "hashpipe_topology.sh".

On NUMA (i.e. multi-socket) systems, each PCIe slot is associated with a 
specific NUMA node.  It can be beneficial to have relevant peripherals (e.g. 
NIC and GPU) be in PCIe slots that are on the same NUMA node.

Of course, if you have a single-socket mainboard, then all this NUMA stuff is 
irrelevant. :P

Cheers,
Dave

> On Mar 31, 2020, at 15:04, John Ford  wrote:
> 
> 
> 
> Hi Mark.  Since the newer version has a script called 
> "hashpipe_irqaffinity.sh" I would think that the most expedient thing to do 
> is to upgrade to the newer version.  It's likely to fix some or all of this.
> 
> That said, there are a lot of things that you can check, and not only the irq 
> affinity, but also make sure that your network tuning is good, that your 
> network card irqs are attached to processes where the memory is local to that 
> processor, and that the hashpipe threads are mapped to processor cores that 
> are also local to that memory.   Sometimes it's counterproductive to map 
> processes to processor cores by themselves if they need data that is produced 
> by a different core that's far away, NUMA-wise.  And lock all the memory in 
> core with mlockall() or one of its friends.
> 
> Good luck with it!
> 
> John
> 
> 
> 
> 
> On Tue, Mar 31, 2020 at 12:09 PM Mark Ruzindana wrote:
> Hi all,
> 
> I am fairly new to asking questions on a forum so if I need to provide more 
> details, please let me know. 
> 
> Worth noting that just as I was about to send this out, I checked and I don't 
> have the most recent version of HASHPIPE with hashpipe_irqaffinity.sh among 
> other additions and modifications. So this might fix my problem, but maybe 
> not and someone else has more insight. I will update everyone if it does.
> 
> I am trying to reduce the number of packets lost/dropped when running 
> HASHPIPE on a 32 core RHEL 7 server. I have run enough tests and diagnostics 
> to be confident that the problem is not any HASHPIPE thread running for too 
> long. Also, the percentage of packets dropped on any given scan is between 
> about 0.3 and 0.8%. Approx. 5,000 packets in a 30 second scan with a total of 
> 1,650,000 packets. So while it's a small percentage, the number of packets 
> lost is still quite large. I have also done enough tests with 'top', 'iostat' 
> as well as timing HASHPIPE in between time windows where there are no packets 
> dropped to diagnose the issue further. I (as well as my colleagues) have come 
> to the conclusion that the kernel is allowing processes to interrupt HASHPIPE 
> as it is running. 
> 
> So I have researched and run tests involving 'niceness' and I am currently 
> trying to configure smp affinities and irq balancing, but the changes that I 
> make to the smp_affinity files aren't doing anything. My plan was to have the 
> interrupts run on the 20 cores that aren't being used by HASHPIPE. Also, 
> disabling 'irqbalance' didn't do anything either. I also restarted the 
> machine to see whether the changes made are permanent, but the system reverts 
> back to what it was.
> 
> I might be missing something, or trying the wrong things. Has anyone 
> experienced this? And could you point me in the right direction if you have 
> any insight?
> 
> If you need any more details, please let me know. I didn't add as much as I 
> could because I wanted this to be a reasonably sized message.
> 
> Thanks,
> 
> Mark Ruzindana
> 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-03-31 Thread David MacMahon
Hi, Mark,

That packet rate should be very manageable.  Are you using the standard 
"socket()" and "recv()" functions or are you using packet sockets?  Packet 
sockets are a more efficient way to get packets from the kernel that bypasses 
the kernel's IP stack.  It's not as efficient as IBVerbs or DPDK, but it is 
widely supported and should be more than adequate for the packet/data rates you 
are dealing with.  Hashpipe has functions that make it easy to work with packet 
sockets by providing a somewhat higher level interface to them.  If your 
version of Hashpipe doesn't have a "hashpipe_pktsock.h" then you should update 
for sure.
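
As a rough sketch of that interface (struct fields and function names per my
reading of hashpipe_pktsock.h, so double-check against your copy; the
interface name, port, and ring sizing are placeholders):

#include <linux/if_packet.h>    /* PACKET_RX_RING */
#include "hashpipe_pktsock.h"

struct hashpipe_pktsock p_ps;
p_ps.frame_size = 16384;   /* bytes per ring frame               */
p_ps.nframes    = 4800;    /* total frames in the ring           */
p_ps.nblocks    = 600;     /* blocks the frames are grouped into */
int rv = hashpipe_pktsock_open(&p_ps, "eth4", PACKET_RX_RING);
/* check rv against the error conventions in hashpipe_error.h */

int bindport = 12345;      /* placeholder UDP port */
unsigned char *p_frame = hashpipe_pktsock_recv_udp_frame_nonblock(&p_ps, bindport);
if (p_frame) {
    /* payload is at PKT_UDP_DATA(p_frame); PKT_UDP_SIZE(p_frame) includes the UDP header */
    hashpipe_pktsock_release_frame(p_frame);
}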

HTH,
Dave

> On Mar 31, 2020, at 12:09, Mark Ruzindana  wrote:
> 
> Hi all,
> 
> I am fairly new to asking questions on a forum so if I need to provide more 
> details, please let me know. 
> 
> Worth noting that just as I was about to send this out, I checked and I don't 
> have the most recent version of HASHPIPE with hashpipe_irqaffinity.sh among 
> other additions and modifications. So this might fix my problem, but maybe 
> not and someone else has more insight. I will update everyone if it does.
> 
> I am trying to reduce the number of packets lost/dropped when running 
> HASHPIPE on a 32 core RHEL 7 server. I have run enough tests and diagnostics 
> to be confident that the problem is not any HASHPIPE thread running for too 
> long. Also, the percentage of packets dropped on any given scan is between 
> about 0.3 and 0.8%. Approx. 5,000 packets in a 30 second scan with a total of 
> 1,650,000 packets. So while it's a small percentage, the number of packets 
> lost is still quite large. I have also done enough tests with 'top', 'iostat' 
> as well as timing HASHPIPE in between time windows where there are no packets 
> dropped to diagnose the issue further. I (as well as my colleagues) have come 
> to the conclusion that the kernel is allowing processes to interrupt HASHPIPE 
> as it is running. 
> 
> So I have researched and run tests involving 'niceness' and I am currently 
> trying to configure smp affinities and irq balancing, but the changes that I 
> make to the smp_affinity files aren't doing anything. My plan was to have the 
> interrupts run on the 20 cores that aren't being used by HASHPIPE. Also, 
> disabling 'irqbalance' didn't do anything either. I also restarted the 
> machine to see whether the changes made are permanent, but the system reverts 
> back to what it was.
> 
> I might be missing something, or trying the wrong things. Has anyone 
> experienced this? And could you point me in the right direction if you have 
> any insight?
> 
> If you need any more details, please let me know. I didn't add as much as I 
> could because I wanted this to be a reasonably sized message.
> 
> Thanks,
> 
> Mark Ruzindana
> 


Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-03-31 Thread John Ford
Hi Mark.  Since the newer version has a script called
"hashpipe_irqaffinity.sh" I would think that the most expedient thing to do
is to upgrade to the newer version.  It's likely to fix some or all of this.

That said, there are a lot of things that you can check, and not only the
irq affinity, but also make sure that your network tuning is good, that
your network card irqs are attached to processes where the memory is local
to that processor, and that the hashpipe threads are mapped to processor
cores that are also local to that memory.   Sometimes it's
counterproductive to map processes to processor cores by themselves if they
need data that is produced by a different core that's far away, NUMA-wise.
And lock all the memory in core with mlockall() or one of its friends.
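
A sketch of the IRQ side of that (the interface name and IRQ number are
placeholders; check /proc/interrupts for the real ones):

grep eth4 /proc/interrupts                            # find the NIC's IRQs
echo fffff000 | sudo tee /proc/irq/123/smp_affinity   # steer IRQ 123 to CPUs 12-31

Note that irqbalance, if left running, may rewrite these masks, so it
usually has to be stopped for manual settings to stick.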

Good luck with it!

John




On Tue, Mar 31, 2020 at 12:09 PM Mark Ruzindana  wrote:

> Hi all,
>
> I am fairly new to asking questions on a forum so if I need to provide
> more details, please let me know.
>
> Worth noting that just as I was about to send this out, I checked and I
> don't have the most recent version of HASHPIPE with hashpipe_irqaffinity.sh
> among other additions and modifications. So this might fix my problem, but
> maybe not and someone else has more insight. I will update everyone if it
> does.
>
> I am trying to reduce the number of packets lost/dropped when running
> HASHPIPE on a 32 core RHEL 7 server. I have run enough tests and
> diagnostics to be confident that the problem is not any HASHPIPE thread
> running for too long. Also, the percentage of packets dropped on any given
> scan is between about 0.3 and 0.8%. Approx. 5,000 packets in a 30 second
> scan with a total of 1,650,000 packets. So while it's a small percentage,
> the number of packets lost is still quite large. I have also done enough
> tests with 'top', 'iostat' as well as timing HASHPIPE in between time
> windows where there are no packets dropped to diagnose the issue further. I
> (as well as my colleagues) have come to the conclusion that the kernel is
> allowing processes to interrupt HASHPIPE as it is running.
>
> So I have researched and run tests involving 'niceness' and I am currently
> trying to configure smp affinities and irq balancing, but the changes that
> I make to the smp_affinity files aren't doing anything. My plan was to have
> the interrupts run on the 20 cores that aren't being used by HASHPIPE.
> Also, disabling 'irqbalance' didn't do anything either. I also restarted
> the machine to see whether the changes made are permanent, but the system
> reverts back to what it was.
>
> I might be missing something, or trying the wrong things. Has anyone
> experienced this? And could you point me in the right direction if you have
> any insight?
>
> If you need any more details, please let me know. I didn't add as much as I
> could because I wanted this to be a reasonably sized message.
>
> Thanks,
>
> Mark Ruzindana
>