Definitely rely on dictionaries and sets more than lists. A membership
test on a list is O(n), while for dicts and sets it is O(1) on average,
which will speed up your code for very little effort.
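
For example (a tiny sketch, with made-up data):

names_list = ["msg-%d" % i for i in range(100000)]
names_set = set(names_list)      # build the set once up front

"msg-99999" in names_list        # O(n): scans the list
"msg-99999" in names_set         # O(1) on average: a single hash lookup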

--Rory

On Fri, Jan 9, 2009 at 3:24 PM, Daniel Kersten <[email protected]> wrote:
>
> Hi Rory,
>
> The main slowdown is that messages depend on previous messages, so I
> need to match them together. This becomes slow if I have to search
> through too many messages.
>
> It is somewhat inefficient and I am working on optimising the
> algorithm as we speak, especially in these areas:
>  1) replacing lists with dictionaries, since searching a list gets
> horribly slow as it grows.
>  2) either processing messages concurrently or doing multiple checks
> on a single message concurrently.
>  3) Diarmuid suggested using a separate process to prefetch from the
> log files; that may work well, since then I wouldn't waste time
> waiting on disk IO (rough sketch below).
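>
> A rough sketch of what I have in mind for (3), assuming a
> line-oriented log file (the file name and handle() are placeholders,
> not my real code):
>
> from multiprocessing import Process, Queue
>
> def prefetch(path, queue):
>     # runs in a separate process: read ahead of the parser so the
>     # main process isn't stuck waiting on disk IO
>     log_file = open(path)
>     for entry in log_file:
>         queue.put(entry)
>     log_file.close()
>     queue.put(None)                  # sentinel: no more entries
>
> def handle(entry):
>     pass                             # stand-in for the parse/match step below
>
> if __name__ == "__main__":
>     queue = Queue(maxsize=10000)     # bound the memory the prefetcher can use
>     reader = Process(target=prefetch, args=("test.log", queue))
>     reader.start()
>     while True:
>         entry = queue.get()
>         if entry is None:
>             break
>         handle(entry)
>     reader.join()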
>
> There is one more area which should improve the speed drastically:
> luckily, it has been decided to change the tests slightly, and as a
> side effect I can now make assumptions I couldn't before, which will
> (I think) remove one of the lists which I wasn't able to
> dictionary-ify.
>
> Here is what currently happens, in pseudocode:
>
> for entry in log_file:
>     current_message = parse_using_regex(entry)
>     expected_messages_for_type = list_of_expected_messages[current_message.message_type]
>     expected_messages = expected_messages_for_type[current_message.id_dependant_on_message_type]
>     for message in expected_messages:
>         if message == current_message:
>             remove_from_expected_messages(message)
>             new_messages = get_next_expected_messages(current_message)
>             if new_messages is not None:
>                 add_to_expected_messages(new_messages)
>                 message_from_middle_operation()
>             else:
>                 operation_completed()
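>
> For comparison, here is roughly the shape I'm aiming for with (1) - a
> standalone sketch with made-up message values, not my real data:
>
> expected = {}                        # (msg_type, dep_id) -> set of expected payloads
> expected[("LOGIN", 42)] = set(["ack", "nack"])
>
> current = ("LOGIN", 42, "ack")       # a parsed message
> msg_type, dep_id, payload = current
> candidates = expected.get((msg_type, dep_id), set())
> if payload in candidates:            # O(1) membership test instead of scanning a list
>     candidates.discard(payload)      # O(1) removal as well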
>
> One thing to note is that the add_to_expected_messages() line may add
> more than one message (and the remove should then remove them all),
> because some messages can expect one of a selection of alternatives.
> With the change to the tests, I believe this can now go away.
>
> There's more going on than that. In fact, there's quite a lot of
> added complexity, because messages from different "operations" are
> almost identical, so determining the next expected message(s) requires
> some convoluted logic. I'm going to try to figure out a better way,
> especially as I don't think I'll need to deal with alternative
> messages within one operation anymore.
>
> Hope that explains it somewhat. If you have any obvious suggestions
> for speeding up the code further, please share! Unfortunately, I
> can't share the code: the code itself is shareable really, but the
> data it operates on (the messages and operations) is not, and could
> be inferred from the code.
>
>
>
> Still, the point of my email was to show that even a naive use of
> Cython (compiling Python code without modification) can give quite a
> good speed increase! (I would simply have written "hey, I used Cython
> and my code is faster", but I figured some background would be nice.)
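>
> For anyone who wants to try it, the build is just the standard
> distutils recipe; something along these lines (the module name here
> is made up):
>
> # setup.py
> from distutils.core import setup
> from distutils.extension import Extension
> from Cython.Distutils import build_ext
>
> setup(
>     cmdclass={'build_ext': build_ext},
>     # report.pyx can simply be a copy of the original report.py,
>     # since the code is unmodified Python
>     ext_modules=[Extension("report", ["report.pyx"])],
> )
>
> # build with: python setup.py build_ext --inplace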
>
> 2009/1/9 Rory Geoghegan <[email protected]>:
>>
>> Not to pick at you, but an hour and a half sounds a bit long for 171k
>> entries. If you did not write the code for corporate interests, would
>> you mind sharing it on something like GitHub or Bitbucket?
>>
>> I mean, I know nothing of a) the problem you are trying to solve or
>> b) the constraints you are facing in programming a solution, so I
>> will obstinately hold to my view that there must be a problem with
>> either the algorithm you employ or the way you've coded it, even in
>> the face of precise and clear evidence.
>>
>> --Rory
>>
>> On Fri, Jan 9, 2009 at 2:01 PM, Daniel Kersten <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> I wrote a Python program to generate a report from log files. The log
>>> files are generated by a test suite used to test a Java program. The
>>> report gives me details about how many messages were processed (it's
>>> a message-based program being tested), how long each operation (which
>>> may consist of processing one or more messages) took, counts of which
>>> operations passed or didn't pass, messages processed per second, etc.
>>> The idea is that the main program can be run over a weekend or a week
>>> or whatever, and the log files from the test suite are then checked
>>> by my Python program.
>>>
>>> The log files can be huge.
>>>
>>> Yesterday, I ran my program on a log file with 171K entries - it took
>>> an hour and a half! (This is why I'm interested in the speedup patch.)
>>> There are some algorithmic changes which would be beneficial, but they
>>> would require significant code restructuring which, right now, I don't
>>> have time for. So I'm looking for simpler ways.
>>>
>>> I decided to give Cython (cython.org) a shot, since it compiles Python
>>> code to C. It supports almost all of Python's constructs; the only
>>> major limitation (IMHO - that is, the only feature I really use which
>>> Cython does not support) is nested functions and lambdas. Removing
>>> them from my code slowed it down a small bit, because one of my
>>> functions accessed a variable from the outer scope, so I couldn't
>>> simply move it into the global scope - and I couldn't pass the
>>> variable as an argument because I was storing the function as a
>>> callback. Besides that, I made NO other changes to my Python code.
>>>
>>> The code that took 1 hour and 32 minutes to execute as pure Python
>>> completed in 48 minutes!!
>>>
>>> This can be improved further still by strategically declaring
>>> functions and variables as C types.
>>>
>>> Just thought I'd share, in case someone else needs more performance
>>> out of their Python and doesn't know where to turn.
>>>
>>> --
>>> Daniel Kersten.
>>> Leveraging dynamic paradigms since the synergies of 1985.
>>>
>>>
>>
>>
>
>
>
> --
> Daniel Kersten.
> Leveraging dynamic paradigms since the synergies of 1985.
>
>
