I don't think we should consider open/close events when calculating these relations. If we do, it won't work for contacts and other non-file-like items.
The initial step of the algorithm, "Fetch the last 7 events for this subject uri", seems good. The next step, where you create a time-range neighbourhood around each of these events, is a bit unclear to me. You create the neighbourhood as (event.timestamp, <next_event_timestamp>), which seems odd at first glance. Why not (event.timestamp - delta, event.timestamp + delta)?

Next, I think you can do the last two steps of the algorithm in one SQL query, i.e. the part where you create the k_tuples and the part where you calculate their support. Possibly:

    SELECT subj_uri, COUNT(subj_uri) AS support
    FROM event_view
    WHERE (timestamp > ? AND timestamp < ?)
       OR (timestamp > ? AND timestamp < ?)
       OR (...)
    GROUP BY subj_uri
    ORDER BY support DESC
    LIMIT 5

I am sure Siegfried can do this even better though :-D

--
"apriori": get most used (websites/notes/documents/etc...)
https://bugs.launchpad.net/bugs/494288

Status in Zeitgeist Framework: New

Bug description:
We have a branch with the 1-step apriori algorithm built. Right now it returns the items most used together with another item. We should make it configurable, so that we can ask for the most used interpretations of items together with other items. This way we can, for example, ask for the most used "websites" with document X, etc. What do you think?
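A rough sketch of the two suggestions above (symmetric +/- delta neighbourhoods, and computing the support in one grouped query), using Python's sqlite3 module. Assumptions, not the actual Zeitgeist code: event_view here is a throwaway table with only timestamp and subj_uri columns standing in for the real view; neighbourhood_windows and most_used_with are hypothetical helper names; delta is in whatever unit the timestamps use; excluding the subject itself from the results is my own addition.

```python
import sqlite3


def neighbourhood_windows(timestamps, delta):
    """Hypothetical helper: symmetric window (t - delta, t + delta) per event."""
    return [(t - delta, t + delta) for t in timestamps]


def most_used_with(conn, subject_uri, delta, n_events=7, limit=5):
    """Hypothetical helper: URIs most often seen near the last n_events of subject_uri."""
    # Step 1: fetch the last n_events events for this subject URI.
    rows = conn.execute(
        "SELECT timestamp FROM event_view WHERE subj_uri = ? "
        "ORDER BY timestamp DESC LIMIT ?",
        (subject_uri, n_events),
    ).fetchall()
    if not rows:
        return []
    windows = neighbourhood_windows([row[0] for row in rows], delta)

    # Steps 2 and 3 in one query: one OR'ed range clause per window,
    # grouped by URI and ordered by support (ties broken by URI).
    clauses = " OR ".join("(timestamp > ? AND timestamp < ?)" for _ in windows)
    bounds = [bound for window in windows for bound in window]
    query = (
        "SELECT subj_uri, COUNT(subj_uri) AS support FROM event_view "
        f"WHERE subj_uri != ? AND ({clauses}) "
        "GROUP BY subj_uri ORDER BY support DESC, subj_uri ASC LIMIT ?"
    )
    return conn.execute(query, [subject_uri, *bounds, limit]).fetchall()


if __name__ == "__main__":
    # Stand-in table with made-up demo events.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE event_view (timestamp INTEGER, subj_uri TEXT)")
    conn.executemany(
        "INSERT INTO event_view VALUES (?, ?)",
        [(100, "file:///doc"), (200, "file:///doc"), (105, "http://a"),
         (110, "http://a"), (195, "http://a"), (150, "http://b"),
         (205, "note:c"), (500, "http://b")],
    )
    print(most_used_with(conn, "file:///doc", delta=10))
    # prints [('http://a', 2), ('note:c', 1)]
```

Note that this counts events, not neighbourhoods: an event inside two overlapping windows is counted once, and three visits to the same site inside one window count three times. If support should mean "number of neighbourhoods containing the item", as in textbook apriori, you would need something like COUNT(DISTINCT window_id) over a join instead of the OR'ed ranges.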