On 02/07/2014 02:40 PM, Christopher Wood wrote:
> On Fri, Feb 07, 2014 at 10:49:29AM -0700, Brent Bice wrote:
>> (SNIP)
>> I've got a few OpenLDAP instances that I use for writing log data
>> to, so write performance is critical, but since I'm building it from
>> log data, absitively, posolutely, guaranteed perfect DB consistency
>> isn't. I can always replay log data to rebuild the DB if, say, I had
>> a power outage, the UPS failed, the RAID write-cache failed, the
>> planets aligned, and I lost data. :-)
> Out of interest, what are you using this log data for, and have you tested
> how many reads you are getting?
In the recent past, I've set up a Java program to log
postfix/sendmail/cuda logs to OpenLDAP and some simple PHP scripts to
query it. 'Makes it easier for junior admins and managerial types to be
able to track how an email got from point A to point B. Say, an Exchange
user sent an email to an internal list server - so it went from exchange
to a postfix relay to the list server, then back to the postfix relays
then to some recips on Exchange, some recips on other lists, some on
departmental mail servers using sendmail, etc. I can search by to/from
and/or date/time, find the email, then click on the message-ID to search
by that and show the email every hop along the way as well as all the
recipients who got it. Makes it faster to sort out those "I sent an
email to list ABC and user XYZ didn't get it! Why not!" problems. The
answer usually is "user XYZ did get it and here's the log showing it". :-)
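The to/from/date search and the click-through on a Message-ID both come down to LDAP search filters. Here is a minimal Java sketch (Java rather than PHP, to match the logging side) of building those filters with the escaping RFC 4515 requires. The attribute names `mailFrom`, `mailTo`, `msgTime`, and `messageId` are hypothetical, and the date range assumes `msgTime` holds GeneralizedTime values with an ordering matching rule defined in the schema.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/**
 * Sketch: RFC 4515 search filters for a mail-log directory.
 * Attribute names (mailFrom, mailTo, msgTime, messageId) are hypothetical.
 */
public class MailLogFilters {

    // GeneralizedTime in UTC, e.g. 20140207104929Z.
    private static final DateTimeFormatter GENERALIZED_TIME =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss'Z'").withZone(ZoneOffset.UTC);

    /** Escape the characters RFC 4515 reserves in assertion values. */
    public static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Messages from one sender to one recipient within [start, end]. */
    public static String byParties(String from, String to, Instant start, Instant end) {
        return "(&(mailFrom=" + escape(from) + ")"
                + "(mailTo=" + escape(to) + ")"
                + "(msgTime>=" + GENERALIZED_TIME.format(start) + ")"
                + "(msgTime<=" + GENERALIZED_TIME.format(end) + "))";
    }

    /** Every hop (log entry) that recorded a given Message-ID. */
    public static String byMessageId(String messageId) {
        return "(messageId=" + escape(messageId) + ")";
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2014-02-07T00:00:00Z");
        Instant end = Instant.parse("2014-02-07T23:59:59Z");
        System.out.println(byParties("alice@example.com", "abc-list@example.com", start, end));
        System.out.println(byMessageId("<52F4E1A2.1234@example.com>"));
    }
}
```

Escaping matters here because user-typed addresses or Message-IDs containing `*` or parentheses would otherwise change the meaning of the filter.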
I also recently started logging DHCP client hostnames, IPs, MAC
addresses, and (if the DHCP request came from our VPN hardware)
username. That way when I'm sifting through snort/FireEye/PaloAlto logs
and I see some IP with a DHCP hostname of "MyPC" I can quickly tell
which user's home machine is infected with malware-du-jour. I can see
who was on which IPs when.
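"Who was on which IPs when" boils down to finding the lease whose interval covers a given instant. Against a directory that would be a filter roughly like `(&(ip=<ip>)(start<=<T>)(end>=<T>))`; the sketch below shows the same lookup in memory with a `TreeMap`, purely as an illustration. The `LeaseRecord` type and its fields (start/end instants, MAC, hostname, optional VPN username) are hypothetical stand-ins, not the actual log schema.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Sketch: answer "who had IP X at time T?" from DHCP lease records. */
public class LeaseIndex {

    /** Hypothetical lease record; vpnUser is null for non-VPN clients. */
    public record LeaseRecord(String ip, String mac, String hostname, String vpnUser,
                              Instant start, Instant end) {}

    // ip -> (lease start -> record); the TreeMap finds the latest lease
    // that began at or before a given instant.
    private final Map<String, TreeMap<Instant, LeaseRecord>> byIp = new HashMap<>();

    public void add(LeaseRecord r) {
        byIp.computeIfAbsent(r.ip(), k -> new TreeMap<>()).put(r.start(), r);
    }

    /** The lease covering ip at instant t, if any. */
    public Optional<LeaseRecord> holderAt(String ip, Instant t) {
        TreeMap<Instant, LeaseRecord> leases = byIp.get(ip);
        if (leases == null) {
            return Optional.empty();
        }
        Map.Entry<Instant, LeaseRecord> e = leases.floorEntry(t);
        // The latest lease starting at or before t only counts if it had not yet ended.
        if (e == null || e.getValue().end().isBefore(t)) {
            return Optional.empty();
        }
        return Optional.of(e.getValue());
    }
}
```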
Yeah, I coulda used MySQL or Postgres or something else. The first
one (the relay logs) started off as a weekend project to edjimicate
myself on the LDAP API in Java (or one of 'em). It proved useful enough
we just kept it. And since I had that in place, adding on the VPN/DHCP
stuff later was easy. I use the dds overlay to automagically throw away
records older than X days.
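The dds overlay implements RFC 2589 dynamic objects, so only entries carrying the `dynamicObject` auxiliary object class get a TTL and are expired. A minimal JNDI sketch of assembling such an entry on the client side follows; the structural class `dhcpLogRecord` is hypothetical and the attribute choice is only illustrative. The resulting `Attributes` would be passed to `DirContext.createSubcontext(dn, attrs)`. How long entries live is governed by the overlay's TTL settings in the slapd configuration, which this sketch does not show.

```java
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.BasicAttributes;

/** Sketch: attributes for a log entry that the dds overlay is allowed to expire. */
public class DynamicLogEntry {

    public static Attributes build(String ip, String mac, String hostname) {
        Attributes attrs = new BasicAttributes(true); // attribute names are case-insensitive
        Attribute oc = new BasicAttribute("objectClass");
        oc.add("top");
        oc.add("dhcpLogRecord");  // hypothetical structural class
        oc.add("dynamicObject");  // RFC 2589 auxiliary class; marks the entry as expirable
        attrs.put(oc);
        attrs.put("ipHostNumber", ip);
        attrs.put("macAddress", mac);
        attrs.put("cn", hostname);
        return attrs;
    }
}
```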
For both of those, the number of writes per second we do is low -
around 4 or 5 per second last I checked.
However, we have a lot of DNS servers in a lot of different
geographies and I've thought about trying to centralize their logs. But
the query logs can be substantial - a terabyte per region per day - more
than I really want to shove over the WAN to a central spot. So it
occurred to me one morning that I could leave the log data distributed,
but centralize how I query it. I could have one LDAP server that had
referrals to other LDAP servers, one per region, and have all the DNS
servers in a given region log their queries to their local LDAP server.
Then a simple PHP script can do one query against the root server and
find any query handled by any DNS server in any region. (That's useful when
handling an intrusion event, for instance, when you want to know every
DNS query made by some system between certain dates/times.)
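The one-query-against-the-root idea can be tried from Java as well. The plan here uses PHP's LDAP API; this sketch swaps in the JDK's JNDI LDAP provider, where setting `Context.REFERRAL` to `"follow"` asks the provider to chase referrals from the root server to the regional servers. The root URL, base DN, and the `dnsClientIp`/`queryTime` attribute names are hypothetical.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

/** Sketch: one subtree search at a root server that refers to per-region servers. */
public class DnsLogSearch {

    /** JNDI environment that makes the LDAP provider follow referrals. */
    public static Hashtable<String, String> environment(String rootUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, rootUrl);
        env.put(Context.REFERRAL, "follow"); // the provider's default is to ignore referrals
        return env;
    }

    /** Queries from one client between two GeneralizedTime values; assumes a plain dotted-quad IP. */
    public static String filter(String clientIp, String startGt, String endGt) {
        return "(&(dnsClientIp=" + clientIp + ")(queryTime>=" + startGt + ")(queryTime<=" + endGt + "))";
    }

    /** Prints the DN of every matching entry, whichever regional server holds it. */
    public static void search(String rootUrl, String baseDn, String filter) throws NamingException {
        DirContext ctx = new InitialDirContext(environment(rootUrl));
        try {
            SearchControls sc = new SearchControls();
            sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
            NamingEnumeration<SearchResult> results = ctx.search(baseDn, filter, sc);
            while (results.hasMore()) {
                System.out.println(results.next().getNameInNamespace());
            }
        } finally {
            ctx.close();
        }
    }
}
```

On the server side, each region would appear under the root server as a referral entry (the RFC 3296 `referral` object class with a `ref` URL pointing at the regional server).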
But SGI sells HPC equipment (big storage too, btw - grin). So it's
not unheard of for someone to spin up a big cluster in one location and
generate thousands of DNS queries per second. So any sort of logging I
do has to scale well or it just won't work. There's likely a better
way, but this gave me a good excuse to try out OpenLDAP + mdb on XFS and
to see if PHP's LDAP API would chase referrals. :-) I'll probably wind
up using some tool to index the textual query logs and some way to
search all the indexes on all the regional log servers with regex
patterns instead or somethin'...
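Some back-of-the-envelope arithmetic shows why both the WAN and the write rate matter. It assumes 1 TB = 10^12 bytes for the per-region query-log volume mentioned earlier, and an illustrative 2,000 queries per second for one busy cluster (the message only says "thousands"). Each logged query is one LDAP add and, once dds starts expiring entries, eventually one delete too.

```latex
\begin{aligned}
\text{WAN, per region:}\quad & \frac{10^{12}\ \text{B/day}}{86\,400\ \text{s/day}} \approx 1.16\times10^{7}\ \text{B/s} \approx 11.6\ \text{MB/s} \approx 92.6\ \text{Mbit/s} \\
\text{LDAP adds, one cluster:}\quad & 2\,000\ \text{queries/s} \times 86\,400\ \text{s/day} = 1.728\times10^{8}\ \text{entries/day}
\end{aligned}
```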
(scrolls back) Oh yeah... Reads... I haven't been paying close
attention to the number of reads per second I've been getting as writes
and deletes were the bottleneck I was curious about. But the last time
I checked, I was getting something like 30k+ queries per second with 8
threads on one client. But this is with zero tuning of the filesystem
options and with a really simple-minded bit of Java - this shouldn't be
taken as any sort of serious benchmark. I've learned that proper
benchmarking is HARD and I only use the java tool for rough guesstimates
(and comparing how different config options may improve performance - or
not - in a relative sort of way).
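A rough multi-threaded throughput probe of this kind takes only a few lines of Java. This is not the tool described above; it is a minimal sketch that runs an arbitrary operation (in practice, one LDAP search) on N threads for a fixed time and reports operations per second, with all the same caveats about it not being a real benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: rough N-thread throughput probe; nothing here is LDAP-specific. */
public class RoughThroughput {

    /** Runs op on the given number of threads for roughly millis ms; returns ops per second. */
    public static double measure(Runnable op, int threads, long millis) {
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(millis);
        try {
            List<Future<?>> workers = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                workers.add(pool.submit(() -> {
                    // Each worker loops until the shared deadline passes.
                    while (System.nanoTime() < deadline) {
                        op.run();
                        completed.incrementAndGet();
                    }
                }));
            }
            for (Future<?> f : workers) {
                f.get(); // rethrows anything the operation threw
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("operation failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
        // Divide by the actual elapsed time, which is slightly longer than millis.
        double elapsedSeconds = (System.nanoTime() - start) / 1e9;
        return completed.get() / elapsedSeconds;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with a real LDAP search to probe a server.
        double rate = measure(() -> Math.sqrt(Math.random()), 8, 500);
        System.out.printf("~%.0f ops/s%n", rate);
    }
}
```

As noted, a tool like this is most useful for comparing runs under different server or filesystem settings; the absolute numbers mean little on their own.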
Brent