Re: Pcap2Arrow - Packet capture and data conversion tool to Apache Arrow on the fly
Nice work, glad Arrow proved useful.

On Mon, Feb 15, 2021 at 11:44 PM Kohei KaiGai wrote:
> Hello,
>
> Let me share my recent works below:
> https://github.com/heterodb/pg-strom/wiki/804:-Pcap2Arrow
>
> This standalone command-line tool allows to capture network packets
> from network interface devices,
> and convert them into Apache Arrow data format according to the
> pre-defined data schema for each
> supported protocol (TCP, UDP, ICMP x IPv4, IPv6), then write out the
> destination files.
>
> It internally uses PF_RING [*1] to support fast network interface card
> (> 10Gb), and to minimize
> packet losses by utilization of multi-core CPUs.
> Even though I confirmed that Pcap2Arrow write out the captured network
> packets more than
> 50Gb/s ratio, my test cases are artificial and biased traffic patterns.
> If you can test the software on your environment, it makes sense to
> improve the software.
> [*1] https://www.ntop.org/products/packet-capture/pf_ring/
>
> As you may know, network traffic data tends to grow so large, thus, it
> is not easy to import
> them into database systems for analytics. Once we can convert them
> into Apache Arrow,
> we don't need to import the captured data again. Just map the files
> prior to analytics.
>
> Best regards,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei
Pcap2Arrow - Packet capture and data conversion tool to Apache Arrow on the fly
Hello,

Let me share my recent work:
https://github.com/heterodb/pg-strom/wiki/804:-Pcap2Arrow

This standalone command-line tool captures network packets from network interface devices, converts them into the Apache Arrow data format according to a pre-defined data schema for each supported protocol (TCP, UDP, ICMP x IPv4, IPv6), and then writes them out to the destination files.

Internally it uses PF_RING [*1] to support fast network interface cards (> 10Gb) and to minimize packet loss by utilizing multi-core CPUs.
Although I have confirmed that Pcap2Arrow can write out captured network packets at a rate of more than 50Gb/s, my test cases used artificial and biased traffic patterns.
If you can test the software in your environment, your results would help to improve it.

[*1] https://www.ntop.org/products/packet-capture/pf_ring/

As you may know, network traffic data tends to grow very large, so it is not easy to import it into database systems for analytics. Once the data has been converted into Apache Arrow, there is no need to import the captured data again. Just map the files prior to analytics.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei