Re: [users@httpd] LogFormat Combined - many logfile lines with no Referer or User-agent

2011-07-28 Thread Marcin 'Rambo' Roguski
   There doesn't seem to be any pattern to client IP address, browser, etc.

I know for a fact that certain browsers (most versions of IE, for example)
don't send a Referer when the request is initiated via JavaScript. Several
firewalls strip these headers by default, too.

-
The official User-To-User support forum of the Apache HTTP Server Project.
See URL:http://httpd.apache.org/userslist.html for more info.
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
  from the digest: users-digest-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] LogFormat Combined - many logfile lines with no Referer or User-agent

2011-07-28 Thread Rich Bowen

On Jul 28, 2011, at 3:20 AM, Terry Kennedy wrote:

  I'm using the default LogFormat combined directive in my httpd.conf
 file. That should generate logfile lines using this pattern:
 
 %h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\"
 
  There have always been occasional entries which don't contain the last
 2 fields, for some reason.

These are optional fields which *may* be passed by a user agent. When they are 
passed, they are not reliable - that is, they may be spoofed, trivially.

 
  However, I have observed a HUGE increase in the number of logfile lines
 missing these two fields, starting early in June, 2011.

It would be interesting to see what version of what browser released in the 
last 30 days.


  Even if the Referer and User-agent data was missing for some reason, I
 would have expected httpd to log lines ending in "" "" or possibly "-" "-",
 since there are escaped literal quotes on either side of both of those
 fields in the LogFormat config line. This makes me think that it is
 something going on inside Apache (perhaps triggered by some external change).

Oh. Hmm. That's interesting. What I would look for, in that case, is more than 
one LogFormat directive logging to the same location.

--
Rich Bowen
rbo...@rcbowen.com
rbo...@apache.org










Re: [users@httpd] LogFormat Combined - many logfile lines with no Referer or User-agent

2011-07-28 Thread Stormy

At 09:40 AM 7/28/2011 -0400, Rich Bowen wrote:
[snip]

  However, I have observed a HUGE increase in the number of logfile lines
 missing these two fields, starting early in June, 2011.

It would be interesting to see what version of what browser released in 
the last 30 days.


Firefox 5 ... ???


Paul
Tired old sys-admin 






Re: [users@httpd] LogFormat Combined - many logfile lines with no Referer or User-agent

2011-07-28 Thread Terry Kennedy
Rich Bowen wrote:

 These are optional fields which *may* be passed by a user agent. When they
 are passed, they are not reliable - that is, they may be spoofed, trivially.

  Understood. I'm not depending on them for any decision-making.

  The issue is that Analog discards those lines, so (for example) requests
logged for a particular file (which are missing those two fields) are
discarded and not counted for purposes of things like top 25 requested files.

  Also, they're completely absent, despite the escaped "s in the LogFormat
directive which should generate either "" "" or "-" "-" when the fields are
missing.

 It would be interesting to see what version of what browser released in the 
 last 30 days.

  Most of the clients accessing the site in question are using ancient
browsers - in one case where I investigated fully, the client PC is running
Windows 2000 and IE 6. Some of its accesses had the Referer and User-Agent
logged, while others had them missing.

  One system where I have logs going back 2+ years shows a number of entries
with missing fields at a reasonably constant rate (200 to 5000 per month),
with no big jump. Oddly, that's the system where I'd expect new client
versions (like Firefox 5) to show up, yet the number of logged lines where the
fields are missing remains relatively constant.

  It seems that either both fields are properly present, or both are missing.
I was unable to locate any log lines which had either a Referer or "-" but
which were missing the User-Agent field.

 Oh. Hmm. That's interesting. What I would look for, in that case, is more
 than one LogFormat directive logging to the same location.

  I thought of that and checked it previously. However, I just checked it
again (Apache 2.0.63 system):

(0:12) www:/usr/local/etc/apache2# grep CustomLog *
httpd.conf:# a CustomLog directive (see below).
httpd.conf:#CustomLog /var/log/httpd-access.log common
httpd.conf:#CustomLog /var/log/httpd-referer.log referer
httpd.conf:#CustomLog /var/log/httpd-agent.log agent
httpd.conf:CustomLog /var/log/httpd-access.log combined
httpd.conf:#CustomLog /var/log/dummy-host.example.com-access_log common
httpd.conf:CustomLog /var/log/httpd-deflate.log deflate
ssl.conf:CustomLog /var/log/httpd-ssl_request.log \
ssl.conf_orig:CustomLog /var/log/httpd-ssl_request.log \

  I only see 3 uncommented CustomLog directives, one for a combined log,
a separate one that logs deflate info, and a third one for SSL requests.
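
The check above can also be automated. The following is only a sketch, assuming one directive per line, the directive spelled exactly "CustomLog", and the target path as the second word; the function name duplicate_log_targets is made up for illustration. It prints any log file that is named by more than one active CustomLog directive across the config files it is given:

```shell
# Print each log target that appears in more than one uncommented
# CustomLog directive. Commented lines start with "#CustomLog", so
# their first word never equals "CustomLog" and they are skipped.
duplicate_log_targets() {
    # "$@" is the list of config files to scan, e.g. httpd.conf ssl.conf
    awk '$1 == "CustomLog" { print $2 }' "$@" | sort | uniq -d
}

# Usage (file names as seen in the grep output above):
# duplicate_log_targets httpd.conf ssl.conf
```

No output means no two active directives share a target, at least among the files listed.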

  There also isn't any discernible pattern to the entries with the missing
fields - some CGI requests are logged with them, some without. Same for PHP.
Some are for 404's, some are for successful file access.

  I'm baffled. I wonder if anyone else is having the same issue, but didn't
notice it. For example, Analog will only complain about "Large number of
corrupt lines in logfile" if they exceed a certain percentage threshold of
the total number of lines in the log file.

  The following (disgusting, I really should use awk) command string should
report the total number of lines missing the Referer and User-Agent fields
in a combined-format logfile, at least if the default timestamp format is
used:

cut -d \] -f 2-99 /var/log/httpd-access.log | cut -d \" -f 3-99 | cut -d " " -f 4-99 | grep ^$ | wc -l

  Anybody want to try it? (Of course, satisfy yourself that it can't do
anything evil first).
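
For anyone who prefers awk, here is a rough equivalent of the cut pipeline. It is a sketch, not a drop-in replacement: it assumes the request line contains no embedded double quotes, and the function name count_missing is invented for this example. It splits each line on double quotes and counts the lines that stop right after the status and byte count:

```shell
# Count combined-format lines that lack the trailing Referer and
# User-Agent fields. Splitting on '"' gives 7 pieces for a complete
# line (prefix, request, status/bytes, referer, space, agent, empty)
# and exactly 3 pieces when both quoted fields are absent.
count_missing() {
    awk -F'"' 'NF == 3 { n++ } END { print n + 0 }' "$1"
}

# Usage (path as used in this thread):
# count_missing /var/log/httpd-access.log
```

Like the cut version, this only reads the log file and prints a single number.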

  On two of my production systems running 2.0.63:

(0:23) www:/tmp# cut -d \] -f 2-99 /var/log/httpd-access.log | cut -d \" -f 3-99 | cut -d " " -f 4-99 | grep ^$ | wc -l
  743308
(0:24) www:/tmp# wc -l /var/log/httpd-access.log
 4802394 /var/log/httpd-access.log

(0:175) gate:/tmp# cut -d \] -f 2-99 /var/log/httpd-access.log | cut -d \" -f 3-99 | cut -d " " -f 4-99 | grep ^$ | wc -l
   99583
(0:176) gate:/tmp# wc -l /var/log/httpd-access.log
 3658733 /var/log/httpd-access.log

  On a 2.2.19 test system I just brought up:

(0:36) test:/tmp# cut -d \] -f 2-99 /var/log/httpd-access.log | cut -d \" -f 3-99 | cut -d " " -f 4-99 | grep ^$ | wc -l
 433
(0:37) test:/tmp# wc -l /var/log/httpd-access.log 
1321 /var/log/httpd-access.log

  The test system is particularly interesting as I did NOT copy the Apache
configuration files from a production system - I configured it by editing
the default config files. So this shouldn't be a cut-and-paste error.
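
To put the counts above in proportion, the share of affected lines on each system can be worked out with a small helper. This is just an arithmetic sketch; the function name pct_missing is hypothetical, and the numbers are the ones reported above:

```shell
# Print the percentage of log lines missing the two fields,
# rounded to two decimals.
pct_missing() {
    # $1 = lines missing the fields, $2 = total lines in the log
    awk -v m="$1" -v t="$2" 'BEGIN {
        if (t > 0) printf "%.2f\n", 100 * m / t
        else print "n/a"    # avoid dividing by zero on an empty log
    }'
}

pct_missing 743308 4802394   # www
pct_missing 99583 3658733    # gate
pct_missing 433 1321         # test
```

That works out to roughly 15.5% on www, 2.7% on gate and 32.8% on the test system.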

Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA
