On Nov 12, 2007 4:49 AM, danil osipchuk <[EMAIL PROTECTED]> wrote:
> XML has a tree-like structure and I would say . Passing all necessary
> structures between the invocations of the tacit verb creates a lot of
> programming/runtime overhead.

Sure, so J programmers tend to reach for verbs which process as much
data in one bite as they can find.

> > How about
> >    get=: ". bind ]
> >    set=: 4 :'(x)=:y'
>
> Perfect, but is not it slow? Am I right that each invocation of the explicit
> verb requires parsing?

It takes some time, but that's slow only if you are "taking tiny
bites, and not chewing your food".

> I have huge files with log events (one event per line) that I would like to
> analyze in the following way:
> Get timestamp for each line. Determine the type of the message by looking
> at the message text. Group events by this type. Then comes the analysis part
> at which J excels (and I don't have questions in this area yet).

Ok. (And I like how you have described this -- you did not say "break
the log into lines" as your first step. That gets into micromanaging
the process, which can really slow things down.)

I would build a verb which extracts time stamps for each line from the
raw text. I would build another verb which extracts message type for
each line from the raw text. Grouping probably becomes /. (key) on
line offset,length pairs, with message text as a global variable.

> Problem:
> I can not predict beforehand what will be the message string that identifies
> each message type and what is its place in the log string. Therefore I would
> like to build a dictionary that would hold already known types (from the
> beginning of the parsing). Each next string is compared against the
> dictionary. If there is a string containing the same substring as current -
> they are of the same class and this substring identifies the type. Length of
> substring is predefined. So my idea was to go line-by-line and maintain the
> corresponding structure.

Ok, you don't know what you are doing when you start log processing,
and plan to figure it out as you go along. This plays away from J's
strengths (which tend to leverage the programmer's knowledge of the
problem domain).

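As an aside, the whole-log approach outlined above (time stamps and
message types extracted from the raw text in one pass each, then
grouping with key) might be sketched roughly as follows. This is only
an illustration under assumptions the thread does not state: the
sample log, the fixed layout (an 8-character time stamp, one space,
then the message) and the choice of a 9-character type substring
starting at offset 9 are all made up, and the names LOGTEXT, ends,
starts, lens, stamps and types are hypothetical.

```j
NB. Hypothetical sample log held as one global character list.
NB. Every line is LF-terminated and at least 18 characters long.
LOGTEXT=: 0 : 0
10:00:01 disk full on /var
10:00:02 login ok user=a
10:00:05 disk full on /tmp
10:00:09 login ok user=b
)

NB. Line offsets and lengths, computed from the raw text in one pass.
ends=:   I. LOGTEXT = LF       NB. index of each line terminator
starts=: 0 , 1 + }: ends       NB. index of each line's first character
lens=:   ends - starts         NB. line lengths, excluding the LF

NB. Time stamps for all lines at once: one 8-character row per line.
NB. (Assumes the stamp occupies the first 8 characters of each line.)
stamps=: LOGTEXT {~ starts +/ i. 8

NB. Message type for all lines at once: one 9-character row per line.
NB. (Offset 9 and length 9 are assumptions for this sample only.)
types=: LOGTEXT {~ starts +/ 9 + i. 9

NB. Grouping with key: one box of offset,length pairs per distinct type.
groups=: types </. starts ,. lens

NB. Distinct types and how many lines fall under each.
kinds=:  ~. types
counts=: types #/. starts
```

Nothing here cuts the log into boxed lines first; each step is a
single array operation over the whole text, and only the final key
step produces boxes.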
But you have to have some idea of how you would figure out what it is
that you want to do, and you have not described that.

That said, I would not hesitate to do multiple quick passes over the
raw text. In other words, whatever your rules are for finding message
types, I would simply apply them to the whole log, and then proceed
from there. [And, by "simply apply" I do mean that I would focus on
the simple aspects of these rules.]

-- Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
