2¢:

 

I'd try a hash table, created from the 9k DNS entries.

 

Then loop through the 260k file, checking each IP address against the hash
table keys.
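A minimal sketch of that approach. The file names and the IP column header are assumptions carried over from the original post; inline sample data stands in for the two files so the snippet is self-contained:

```powershell
# Inline sample data stands in for the two CSV files (headers assumed from the post).
$dns = @"
IP,Name
10.0.0.1,host-a
10.0.0.2,host-b
"@ | ConvertFrom-Csv

$db = @"
IP,Name,FirstSeen,LastSeen
10.0.0.1,host-a,2016-01-01,2016-05-01
10.0.0.9,host-x,2016-02-01,2016-05-20
"@ | ConvertFrom-Csv

# Build the lookup from the smaller (9k) list, keyed on the IP string.
$lookup = @{}
foreach ($row in $dns) { $lookup[$row.IP] = $true }

# One pass over the larger (260k) set; ContainsKey is O(1) per row,
# so the whole comparison is O(n + m) instead of O(n * m).
$matched = $db | Where-Object { $lookup.ContainsKey($_.IP) }
```

With the real files you would replace the here-strings with Import-Csv, and could finish with `$matched | Export-Csv -NoTypeInformation` to keep the full rows.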

 

 

 

From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com]
On Behalf Of Michael B. Smith
Sent: Tuesday, May 24, 2016 4:03 pm
To: powershell@lists.myitforum.com
Subject: [powershell] RE: CSV Comparisons:

 

Add a field in both files that holds the uint32 version of the IP. Sort both
files ascending by the new field.

 

Once the files are sorted, Compare-Object will be very fast. If you want it
even faster, you can write a couple of while() loops yourself to do the
comparisons.
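The hand-rolled version might look like this; a sketch, assuming both inputs have already been converted to uint32 and sorted ascending (the sample arrays here are stand-ins for the converted files):

```powershell
# Two pre-sorted arrays of uint32 IPs (stand-ins for the converted files).
[uint32[]]$small = 1, 5, 9, 12
[uint32[]]$large = 2, 5, 7, 9, 10, 12, 15

$i = 0; $j = 0
$hits = New-Object System.Collections.Generic.List[uint32]
while ($i -lt $small.Count -and $j -lt $large.Count) {
    if     ($small[$i] -lt $large[$j]) { $i++ }       # small is behind: advance it
    elseif ($small[$i] -gt $large[$j]) { $j++ }       # large is behind: advance it
    else   { $hits.Add($small[$i]); $i++; $j++ }      # equal: record the match
}
# Each array is walked at most once, so this is O(n + m).
```

This is the classic merge-style comparison: because both sides are sorted, a mismatch tells you exactly which pointer to advance, and no element is ever revisited.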

 

There are a variety of mechanisms for converting the text IP to a uint32 IP.
The easiest is not the fastest. :) Whether that matters depends on how often
you will need to do this.

 

Probably the easiest:

 

               ('192.168.231.10' -as [IpAddress]).Address
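One caveat worth flagging (my reading of the .NET behavior, not something from the original message): the Address property packs the octets low-byte-first, e.g. 127.0.0.1 comes back as 16777343, so sorting on that value does not match dotted-quad order. A big-endian conversion along these lines keeps numeric order and dotted order in agreement (the function name here is made up):

```powershell
# Hypothetical helper: pack the four octets big-endian so that numeric
# order matches dotted-quad order.
function ConvertTo-UInt32Ip {
    param([string]$Ip)
    $b = ([System.Net.IPAddress]$Ip).GetAddressBytes()   # octets in written order
    ([uint32]$b[0] -shl 24) -bor ([uint32]$b[1] -shl 16) -bor ([uint32]$b[2] -shl 8) -bor [uint32]$b[3]
}
```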

 

From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com]
On Behalf Of Orlebeck, Geoffrey
Sent: Tuesday, May 24, 2016 6:03 PM
To: 'powershell@lists.myitforum.com'
Subject: [powershell] CSV Comparisons:

 

I have two CSV files. One is a list of IP addresses and DNS hostnames. The
other has IP and DNS name (if it resolves), as well as “firstseen” and
“lastseen” dates on the network. I am attempting to compare the list of IP
addresses to the second list that contains the additional fields.

 

The larger dataset has the following format: IP,Name,FirstSeen,LastSeen



 

The goal is to loop through the list of IP addresses and check for matches
in the CSV file. If found, I want to note the entire row with all fields
(IP,Name,FirstSeen,LastSeen). I’m not sure of the most efficient way to
process this. The smaller list is ~9k entries. The larger list has over 260k
entries. On a small scale I am able to see this logic will at least give me
a list of IP addresses found in both files:

 

$List = Import-Csv C:\IP_Addresses.csv

$CSV = Import-Csv C:\DB_Export.csv

# $Matches is a reserved automatic variable in PowerShell, so use another name
$Found = $List | ForEach-Object { $CSV.ip -like $_.ip }

 

The efficiency of this method is where I’m concerned. I’m currently working
with a limited subset of the data (about 500 entries each), and it takes a
few minutes to parse. However, with the full list being 9,000 IP addresses
against 260,000+ rows in $CSV, I have a feeling it will take quite some time
to process.

 

I’m wondering if anyone has examples of such data comparisons. My end goal is
to parse the 9k IP addresses, compare them against our 260k entries, and
eventually filter based on the ‘LastSeen’ field. My gut says using
Compare-Object or the above example is a bad approach, but I’m not sure how
else to go about processing such large files. I read about the differences
between Get-Content and StreamReader here
<https://foxdeploy.com/2016/03/23/coding-for-speed/> and thought there is
probably a better way than the built-in cmdlets I am attempting to use.

 

Thank you.

 

-Geoff

Confidentiality Notice: This is a transmission from Community Hospital of
the Monterey Peninsula. This message and any attached documents may be
confidential and contain information protected by state and federal medical
privacy statutes. They are intended only for the use of the addressee. If
you are not the intended recipient, any disclosure, copying, or distribution
of this information is strictly prohibited. If you received this
transmission in error, please accept our apologies and notify the sender.
Thank you. 

 

 



