I have written a parser for the Apache log format. So far it handles the two
parts I am looking at right now: the IP address and the date. The problem is
the transform.

My transform looks like this:


class WebLogTransform < Parslet::Transform
 rule(:wordmonth => simple(:month)) {
   ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct',
    'Nov', 'Dec'].index(month) + 1
 }
 rule(:rawdate => simple(:date)) { 
   DateTime.new(Integer(Date.year), Integer(Date.month), Integer(Date.day))
 }
end

(Note: I have only just added the conversion for wordmonth. The rawdate rule
depends on its result, and I am not sure of the best way to handle that
dependency.)

The transform has no effect on the parser output. I would like to think that I
have a handle on everything else.

Any suggestions?

The output is like this:

=> {:IP=>"137.207.74.55"@0, :rawdate=>{:day=>"08"@19, 
:month=>{:month=>"Feb"@22}, :year=>"2013"@26, :hour=>"19"@31, :minute=>"28"@34, 
:second=>"10"@37, :timezone=>{:tzpm=>"-"@40, :tz=>"0500"@41}}}

The full parser file is below. I am amazed at how clean this is, much easier to 
read than Boost::Spirit.

With thanks,

Jeffrey Drake.



#!/usr/bin/env ruby

require 'parslet' 
require 'date'

class WebLog < Parslet::Parser
 rule(:integer)    { match('[0-9]').repeat(1) }
 rule(:space)      { match('\s').repeat(1) }
 rule(:space?)     { space.maybe }
 rule(:dot)        { str('.') }   # str matches a literal dot; match('.') is a regex matching any character

 rule(:month)      { (str('Jan') | str('Feb') |
                      str('Mar') | str('Apr') |
                      str('May') | str('Jun') | 
                      str('Jul') | str('Aug') |
                      str('Sep') | str('Oct') | 
                      str('Nov') | str('Dec')).as(:wordmonth) >> space? 
                    }

 rule(:timezone)   { match('[+-]').as(:tzpm) >> integer.as(:tz) >> space? }

 rule(:date)       { str('[') >> integer.as(:day) >> 
                     str('/') >> month.as(:month) >>
                     str('/') >> integer.as(:year) >>
                     str(':') >> integer.as(:hour) >>
                     str(':') >> integer.as(:minute) >>
                     str(':') >> integer.as(:second) >>
                     space? >> timezone.as(:timezone) >>
                     str(']') 
                   }


 rule(:ipaddr)     { integer >> dot >> 
                     integer >> dot >> 
                     integer >> dot >> 
                     integer }

 rule(:weblog)     { ipaddr.as(:IP) >> space? >> 
                     str('-') >> space? >> str('-') >> space? >> 
                     date.as(:rawdate) 
                   }
 root :weblog
end



class WebLogTransform < Parslet::Transform
 rule(:wordmonth => simple(:month)) {
   ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct',
    'Nov', 'Dec'].index(month) + 1
 }
 rule(:rawdate => simple(:date)) { 
   DateTime.new(Integer(Date.year), Integer(Date.month), Integer(Date.day))
 }
end



def parse(str)
 log = WebLog.new
 trans = WebLogTransform.new

 puts trans.apply(log.parse(str))
rescue Parslet::ParseFailed => failure
 puts failure.cause.ascii_tree
end

parse "137.207.74.55 - - [08/Feb/2013:19:28:10 -0500]"
