Re: Pcap2Arrow - Packet capture and data conversion tool to Apache Arrow on the fly

2021-02-16 Thread Micah Kornfield
Nice work, glad Arrow proved useful.

On Mon, Feb 15, 2021 at 11:44 PM Kohei KaiGai  wrote:


Pcap2Arrow - Packet capture and data conversion tool to Apache Arrow on the fly

2021-02-15 Thread Kohei KaiGai
Hello,

Let me share my recent work:
https://github.com/heterodb/pg-strom/wiki/804:-Pcap2Arrow

This standalone command-line tool captures network packets from
network interface devices, converts them into the Apache Arrow data
format according to a pre-defined data schema for each supported
protocol (TCP, UDP, and ICMP, each over IPv4 or IPv6), and then writes
them out to the destination files.
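As a rough illustration of the row-to-columnar step described above, the Python sketch below parses raw IPv4 packets and appends each header field to its own list. This is the struct-of-arrays layout that Arrow columns use. It is not Pcap2Arrow's code. The column names, the `new_columns` / `append_ipv4_packet` helpers, and the assumption that link-layer headers are already stripped are all hypothetical.

```python
import socket
import struct

# Hypothetical column set for IPv4 packets; the real Pcap2Arrow
# schema is defined by the tool itself and may differ.
COLUMNS = ("src_addr", "dst_addr", "protocol", "src_port", "dst_port", "total_len")


def new_columns():
    """One Python list per column: a struct-of-arrays batch."""
    return {name: [] for name in COLUMNS}


def append_ipv4_packet(columns, packet):
    """Parse a raw IPv4 packet (link-layer header already stripped)
    and append each header field to its own column list."""
    version_ihl, _tos, total_len = struct.unpack_from("!BBH", packet, 0)
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (version_ihl & 0x0F) * 4          # IPv4 header length in bytes
    proto = packet[9]                       # 6 = TCP, 17 = UDP, 1 = ICMP
    src_port = dst_port = None              # stays null for ICMP
    if proto in (6, 17):
        # TCP and UDP headers both begin with source and destination ports
        src_port, dst_port = struct.unpack_from("!HH", packet, ihl)
    columns["src_addr"].append(socket.inet_ntoa(packet[12:16]))
    columns["dst_addr"].append(socket.inet_ntoa(packet[16:20]))
    columns["protocol"].append(proto)
    columns["src_port"].append(src_port)
    columns["dst_port"].append(dst_port)
    columns["total_len"].append(total_len)
```

ICMP rows leave the port columns null here, which hints at why a separate pre-defined schema per protocol is a natural design.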

Internally, it uses PF_RING [*1] to support fast network interface
cards (over 10Gb) and to minimize packet loss by using multiple CPU
cores. Although I confirmed that Pcap2Arrow can write out captured
network packets at more than 50Gb/s, my test cases used artificial and
biased traffic patterns. If you can test the software in your own
environment, your results would help improve it.
[*1] https://www.ntop.org/products/packet-capture/pf_ring/
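One common way to spread capture work across CPU cores is to hash each packet's flow to a worker. Every packet of a connection is then handled by the same core without locking. The sketch below shows that idea with a symmetric hash. Whether Pcap2Arrow or PF_RING balances traffic exactly this way is an assumption, and `flow_worker` / `dispatch` are hypothetical names.

```python
import zlib


def flow_worker(src_ip, src_port, dst_ip, dst_port, proto, n_workers):
    """Return the worker index for a packet's flow.

    Ports are ints (use 0 for port-less protocols such as ICMP).
    The two endpoints are sorted before hashing, so A->B and B->A
    (both directions of one connection) map to the same worker.
    """
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    key = f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}|{proto}".encode()
    # crc32 is stable across runs, unlike Python's salted hash()
    return zlib.crc32(key) % n_workers


def dispatch(packets, n_workers):
    """Split (src_ip, src_port, dst_ip, dst_port, proto) tuples into
    per-worker lists; each list would be consumed by one CPU core."""
    queues = [[] for _ in range(n_workers)]
    for pkt in packets:
        queues[flow_worker(*pkt, n_workers)].append(pkt)
    return queues
```

Sorting the two endpoints before hashing keeps both directions of a TCP connection on one worker. This matters if a worker later aggregates per-flow data.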

As you may know, network traffic data tends to grow very large, so it
is not easy to import it into database systems for analytics. Once the
data is converted into Apache Arrow, there is no need to import the
captured data again; we can just map the files prior to analytics.
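To illustrate how mapping avoids a separate import step, the sketch below memory-maps an Arrow IPC file and reads its framing in place, without copying the data anywhere. It only checks the layout given by the Arrow IPC file format: a leading "ARROW1" magic padded to 8 bytes, and a trailing little-endian int32 footer length followed by "ARROW1". `arrow_footer_size` is a hypothetical helper, not part of any library.

```python
import mmap
import os
import struct

ARROW_MAGIC = b"ARROW1"


def arrow_footer_size(path):
    """Memory-map an Arrow IPC file, verify its magic bytes, and return
    the footer length stored just before the trailing magic."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        # Smallest possible layout: magic+padding (8), footer size (4), magic (6)
        if size < 8 + 4 + len(ARROW_MAGIC):
            raise ValueError("file too small to be an Arrow IPC file")
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            tail = size - len(ARROW_MAGIC)
            if mm[:len(ARROW_MAGIC)] != ARROW_MAGIC or mm[tail:] != ARROW_MAGIC:
                raise ValueError("missing ARROW1 magic bytes")
            # Footer length: little-endian int32 right before the trailing magic
            (footer_len,) = struct.unpack_from("<i", mm, tail - 4)
            return footer_len
```

A full reader would go on to decode the FlatBuffers footer and record batches. With pyarrow, for example, `pyarrow.ipc.open_file(pyarrow.memory_map(path))` does this while still reading from the mapped file.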

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei