I am trying to parse a build log for errors. I figure I can do this one of three ways: - find the absolute platonic form of an error and search for that item - create definitions of what patterns describe errors for each tool which is used (ant, MSDEV, etc). - rework the build such that all the return codes for all 3rd party are captured and logged using my own error description
The return code method has a high level of effort attached and spreads the responsibility for the task quite widely unless I can write a little command pattern like wrapper script (which is a possibility). Still it would mean all of the 100s of calls to tools would have to wrapped and if any place a developer changed this, the build would leak errors. The relativism/definition based approach means that I must be sure to capture all error cases which may prove difficult and false negatives are a really dangerous problem. In the Platonic/absolutist camp, I could define an error as an instance of a word or phrase from the "bad list" that is not in a filename or path. Bad List: [error, fatal, killed, not found]. Were I to go this way, I'd be faced with a major problem: in a world where symbols and whitespace can be included in a path how can I extract a path from a line of text? Ugly Valid Paths: C:\Program Files\A File Named Error .txt /usr/#a file named error #.txt This means that determining the boundaries of a path is non trivial. FileNames: Many build tools list filenames without their full path. All of my product's files are <text>.<ext>, so that is a pattern that I might be able to locate .+\..+ perhaps Paths: on windows all of my paths will start with [A-Z]:\ or \\ on unix the will tend to start with ./ or /. Finding the starting point is not too difficult, but its that ending that's hard I could generate a substring for each of the starting types and then look at what came before. for sep in [letterStart, uncStart, unixrootedStart, unixpwdStart]: # create sub string prePath = line.split(elem)[0] checkForBadWords(prePath) I could then split the postPath segment on the os.sep and the check for unlikely cases in the list elements - double spaces within a path element - symbol characters within an element (although this is a little dicey) Since I am parsing the log on the system on which it was generated, for each path I could do an os.path.exists on the potential path. If someone happens to know of a good method of extracting weird paths out of logs, I'd be interested in hearing about it. -- http://mail.python.org/mailman/listinfo/python-list