[R] Working with necessary columns in R (CSV)

2010-11-17 Thread arturs . onzuls
Hi all. It would be great if someone could help me with my homework task.
I have a .pcap file, which I convert to CSV using tcpdump (tcpdump -tt
-n -r x.pcap > x.csv).

The CSV file looks like this:

12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP, length 12
12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: TCP, length 12
12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: HTML, length 12
...

10 rows.

Now I need to open the CSV in R and solve 5 problems, but I need to work only
with UDP packets (not TCP, HTML, ...). For example, I need to count how many
UDP packets there are, find the max and min time among them, and so on. I see
only two options: either scan for UDP (but how?), or split this CSV, keep only
the needed rows, and work with them. Please help.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Working with necessary columns in R (CSV)

2010-11-17 Thread Petr Savicky
On Wed, Nov 17, 2010 at 08:19:53PM +0200, arturs.onz...@gmail.com wrote:
> Hi all. It would be great if someone could help me with my homework task.
> I have a .pcap file, which I convert to CSV using tcpdump (tcpdump -tt
> -n -r x.pcap > x.csv).
>
> The CSV file looks like this:
>
> 12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP, length 12
> 12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: TCP, length 12
> 12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: HTML, length 12
> ...
>
> 10 rows.
>
> Now I need to open the CSV in R and solve 5 problems, but I need to work only
> with UDP packets (not TCP, HTML, ...). For example, I need to count how many
> UDP packets there are, find the max and min time among them, and so on. I see
> only two options: either scan for UDP (but how?), or split this CSV, keep only
> the needed rows, and work with them. Please help.

You can read the file into R and extract only the UDP rows, for example:

  # assuming there is no header
  all <- read.csv("x.csv", stringsAsFactors=FALSE, header=FALSE)
  udp <- all[grep(" UDP$", all[, 2]), ]

Using concatenation of three copies of your 3 rows, we get

  all
          V1                                                     V2         V3
  1 12890084  761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP  length 12
  2 12890084  761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: TCP  length 12
  3 12890084 761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: HTML  length 12
  4 12890084  761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP  length 12
  5 12890084  761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: TCP  length 12
  6 12890084 761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: HTML  length 12
  7 12890084  761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP  length 12
  8 12890084  761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: TCP  length 12
  9 12890084 761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: HTML  length 12

  udp
          V1                                                    V2         V3
  1 12890084 761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP  length 12
  4 12890084 761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP  length 12
  7 12890084 761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP  length 12
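
For anyone who wants to try this without the file, here is a minimal sketch that builds the same nine rows in memory. It assumes the sample lines look as in the question, with the usual tcpdump ">" between the two addresses. The name `rows` is only for illustration.

```r
# Sample lines from the question; the ">" between the two addresses is
# assumed here, as in usual tcpdump output.
rows <- c(
  "12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP, length 12",
  "12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: TCP, length 12",
  "12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: HTML, length 12"
)

# Read three copies of the sample from memory instead of from x.csv.
all <- read.csv(text = rep(rows, 3), header = FALSE, stringsAsFactors = FALSE)

# Keep the rows whose second field ends in " UDP".
udp <- all[grep(" UDP$", all[, 2]), ]

nrow(udp)  # number of UDP rows
```

Note that the name `all` masks nothing that matters here, but it is also the name of a base R function, so a different name may be less confusing in a longer script.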

Note that there are only three columns, since your input has only three
comma-separated fields per line. If you change the export to .csv so that,
for example, column 2 contains only the protocol name, you could use

  table(all[, 2])

to get the number of occurrences of each protocol or

  sum(all[, 2] == "UDP")

to get the number of UDP rows or

  udp <- all[all[, 2] == "UDP", ]

to extract only UDP rows.

If you cannot change the export to .csv, you can use the function strsplit().
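
A rough sketch of the strsplit() route, assuming every line has exactly the layout of the sample lines in the question (real tcpdump output can vary, so the field positions below are an assumption to check against your file). The names `lines`, `parts`, `proto` and `time` are only for illustration. Joining the first two fields into one number with a decimal point is also a guess about what the timestamp means.

```r
# Sample lines; for the real file this could be lines <- readLines("x.csv").
lines <- c(
  "12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP, length 12",
  "12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: TCP, length 12",
  "12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: HTML, length 12"
)

# Split each line on runs of spaces and commas.
parts <- strsplit(lines, "[ ,]+")

# With this layout, field 7 is the protocol name (assumption).
proto <- sapply(parts, function(p) p[7])

table(proto)          # occurrences of each protocol
sum(proto == "UDP")   # number of UDP lines

# Assumption: fields 1 and 2 are the two halves of one timestamp.
time <- as.numeric(sapply(parts, function(p) paste(p[1], p[2], sep = ".")))
range(time[proto == "UDP"])  # min and max time among UDP lines
```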

Petr Savicky.
