[Zeitgeist] [Bug 778140] Re: Time warp problem in MOVE_EVENT handling

2012-03-18 Thread Siegfried Gevatter
** Description changed:

  MOVE EVENTS
  
  
  PRESENTATION
  
  By definition, Zeitgeist's events are immutable, and the subject meta-data
  they contain is a snapshot of how a given resource was back when the event
  happened.
  
  To be useful, some way of linking event subjects to their physical
  representation is needed. The primary identifier for doing this is the
  subject's URI.
  
  However, URIs, especially local ones, are transient and may change. To solve
  this problem, a new field was added to subjects which, unlike the rest of the
  subject meta-data, is not considered immutable. This is the `current_uri' field.
  
  INITIAL IDEA
  
  When a subject is inserted, its `current_uri' field is initially set to the
  same value as its `uri' field. When Zeitgeist receives a MOVE_EVENT for that
  file (with a coherent timestamp), the value of `current_uri' is updated to
  its new file name.
  
  The idea here is that this should be done in such a way that, if we cleared
  the `current_uri' of all subjects and rebuilt them by replaying all
  MOVE_EVENTs in the database, the result would be the same as before.
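A minimal Python sketch of the replay idea, assuming the semantics from the
IRC discussion quoted in the bug's original description: a MOVE_EVENT from
`src` to `dst` at time T applies to events that referred to `src` before T,
and each such event then follows any later moves of the same file. The record
layouts and `rebuild_current_uris` are hypothetical, not Zeitgeist code.
Because moves are replayed in timestamp order, the result does not depend on
the order in which they were inserted.

```python
from typing import Dict, List, Tuple

# Hypothetical record layouts, for illustration only:
#   event: (event_id, timestamp, uri)
#   move:  (timestamp, src_uri, dst_uri)
Event = Tuple[int, float, str]
Move = Tuple[float, str, str]


def rebuild_current_uris(events: List[Event], moves: List[Move]) -> Dict[int, str]:
    """Recompute current_uri for every event from scratch by replaying moves.

    An event at time t on `uri` refers to whatever file lived at `uri` at t.
    Its current URI is found by following every later move of that file,
    in chronological order.
    """
    ordered = sorted(moves)  # chronological, independent of arrival order
    result: Dict[int, str] = {}
    for event_id, ts, uri in events:
        current = uri
        for move_ts, src, dst in ordered:
            if move_ts > ts and src == current:
                current = dst
        result[event_id] = current
    return result


# Scenario from the IRC log: events 0..5 on A, A, A, B, C, A; a move A->T
# between events 3 and 4, and a move A->L between events 1 and 2.
events = [(0, 0, "A"), (1, 1, "A"), (2, 2, "A"), (3, 3, "B"), (4, 4, "C"), (5, 5, "A")]
moves = [(3.5, "A", "T"), (1.5, "A", "L")]  # the earlier move arrives last
currents = rebuild_current_uris(events, moves)
print([currents[i] for i in range(6)])  # ['L', 'L', 'T', 'B', 'C', 'A']
```

Deleting a MOVE_EVENT then amounts to rerunning the rebuild without it, at
the cost of touching every event, which is the performance concern raised
below.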
  
  CURRENT IMPLEMENTATION
  
  As of now, `current_uri' is initially set to the same value as `uri'.
  Once a MOVE_EVENT is inserted, all events with a timestamp before that of the
  move are updated.
  
  However, once the MOVE_EVENT has been inserted, it is never considered
  again. This is done for performance reasons, since the initial plan would
  require rebuilding pretty much the whole database.
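A minimal sketch of this incremental update, using Python's built-in sqlite3
module. The one-table schema and the helper names are hypothetical
simplifications, not Zeitgeist's real schema. As described here and in the
IRC discussion quoted in the original bug description, the UPDATE runs once
when the MOVE_EVENT is inserted, and only touches events older than the move
whose `current_uri' still equals the source URI.

```python
import sqlite3


def create_db() -> sqlite3.Connection:
    # Hypothetical, heavily simplified schema: one row per event subject.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event (id INTEGER PRIMARY KEY, timestamp INTEGER,"
        " uri TEXT, current_uri TEXT)"
    )
    return conn


def insert_event(conn: sqlite3.Connection, event_id: int, ts: int, uri: str) -> None:
    # current_uri starts out equal to uri.
    conn.execute(
        "INSERT INTO event (id, timestamp, uri, current_uri) VALUES (?, ?, ?, ?)",
        (event_id, ts, uri, uri),
    )


def apply_move(conn: sqlite3.Connection, ts: int, src: str, dst: str) -> int:
    # One-shot update at MOVE_EVENT insertion time: events older than the
    # move whose current_uri is still `src` now point to `dst`.
    cur = conn.execute(
        "UPDATE event SET current_uri = ? WHERE current_uri = ? AND timestamp < ?",
        (dst, src, ts),
    )
    return cur.rowcount  # number of events rewritten


def current_uris(conn: sqlite3.Connection) -> list:
    return [row[0] for row in conn.execute("SELECT current_uri FROM event ORDER BY id")]


conn = create_db()
for event_id, uri in enumerate(["A", "A", "A", "B", "C", "A"]):
    insert_event(conn, event_id, event_id * 10, uri)
apply_move(conn, 35, "A", "T")  # move between events 3 and 4
print(current_uris(conn))  # ['T', 'T', 'T', 'B', 'C', 'A']
```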
  
  PROBLEMS
  
  There are numerous problems with this implementation, at least in theory.
  
  One problem is that of events that are inserted after the MOVE_EVENT even
  though they happened before it (maybe because the application is batching
  them). In this case they won't be updated.
  
  We also have the opposite problem: a MOVE_EVENT arriving late, after another,
  conflicting MOVE_EVENT has already been inserted. For instance, take the
  following events:
-   T5 a.txt, T10 a.txt, T15 a.txt
+   T5 a.txt, T10 a.txt, T15 a.txt
  We receive a first MOVE_EVENT from a.txt to b.txt with timestamp T7. Now we
  have (time / current_uri):
-   T5 a.txt, T10 b.txt, T15 b.txt
+   T5 a.txt, T10 b.txt, T15 b.txt
  Finally, we receive a further MOVE_EVENT from a.txt to c.txt with timestamp T0.
  The result is:
-   T5 c.txt, T10 b.txt, T15 b.txt
+   T5 c.txt, T10 b.txt, T15 b.txt
  This is totally inconsistent; the correct result would have been:
-   T5 c.txt, T10 c.txt, T15 b.txt
+   T5 c.txt, T10 c.txt, T15 b.txt
  
  Further, even if implemented as described in the INITIAL IDEA section, the
  concept is flawed: events may be inserted retrospectively already using
  their updated URI, which could give rise to further inconsistencies.
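The late-MOVE_EVENT problem can be reproduced with the scenario from the IRC
discussion quoted in the original bug description, and patched the way RainCT
suggests there: match on the original `uri' instead of `current_uri', and only
touch events newer than the previous logged move of the same URI. This is a
hypothetical pure-Python sketch (the `Event` class and the helper names are
made up for illustration), not Zeitgeist code.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical layout for illustration: (timestamp, src_uri, dst_uri).
Move = Tuple[int, str, str]


@dataclass
class Event:
    timestamp: int
    uri: str
    current_uri: str = ""

    def __post_init__(self) -> None:
        # current_uri starts out equal to the immutable uri.
        if not self.current_uri:
            self.current_uri = self.uri


def naive_move(events: List[Event], move: Move) -> None:
    # Behavior described in the bug: match on current_uri, events before the move.
    ts, src, dst = move
    for e in events:
        if e.current_uri == src and e.timestamp < ts:
            e.current_uri = dst


def bounded_move(events: List[Event], logged: List[Move], move: Move) -> None:
    # RainCT's proposal: match on the original uri, and leave alone events older
    # than the latest already-logged move of the same URI that precedes this one.
    ts, src, dst = move
    earlier = [m_ts for m_ts, m_src, _ in logged if m_src == src and m_ts < ts]
    lower = max(earlier, default=float("-inf"))
    for e in events:
        if e.uri == src and lower < e.timestamp < ts:
            e.current_uri = dst
    logged.append(move)


def make_events() -> List[Event]:
    # IRC scenario: events 0..5 on A, A, A, B, C, A (timestamps 0, 10, ..., 50).
    return [Event(i * 10, uri) for i, uri in enumerate("AAABCA")]


def run_naive() -> List[str]:
    events = make_events()
    naive_move(events, (35, "A", "T"))  # between events 3 and 4
    naive_move(events, (15, "A", "L"))  # between events 1 and 2, arrives late
    return [e.current_uri for e in events]


def run_bounded() -> List[str]:
    events = make_events()
    logged: List[Move] = []
    bounded_move(events, logged, (35, "A", "T"))
    bounded_move(events, logged, (15, "A", "L"))  # arrives late
    return [e.current_uri for e in events]


print(run_naive())    # ['T', 'T', 'T', 'B', 'C', 'A']: the A->L move is lost
print(run_bounded())  # ['L', 'L', 'T', 'B', 'C', 'A']
```

The bounded version still writes `dst` directly, so if the destination had
itself been moved by an already-logged later MOVE_EVENT, that chain would not
be followed.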
  
  PROPOSAL
  
  There is no evident way to avoid this problem. Maybe the best option is to
  formalize the current behavior by documenting it and requesting that MOVE
  and DELETE events be inserted in near real time (for local files).
  
- ADDITIONAL PROPOSAL
+ OUTSTANDING ISSUES
  
- So far we haven't taken resource deletions into account at all. However,
- those also affect the URI of a resource, in that it ceases to exist (and
- may be subsequently reused for an unrelated resource).
+ a) Deletion of MOVE_EVENT
+ What happens upon deletion of a MOVE_EVENT? Should the current_uri changes
+ be reverted?
  
- For this reason, I propose that DELETE_EVENTs also update `current_uri'. In
- particular, they should change said URI to  (empty).
+ b) Insertion of other events
+ When inserting an event, should Zeitgeist check whether a MOVE_EVENT happened
+ for that URI after the event's timestamp, and update it accordingly?
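A possible answer to b), sketched with Python's sqlite3 module under the same
assumed semantics as the IRC discussion: before storing a late event, follow
every already-logged move that happened after its timestamp, oldest first.
The `move` and `event` tables and the helper names are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: logged moves plus events with a mutable current_uri.
    conn.execute("CREATE TABLE move (timestamp INTEGER, src TEXT, dst TEXT)")
    conn.execute("CREATE TABLE event (timestamp INTEGER, uri TEXT, current_uri TEXT)")
    return conn


def resolve_current_uri(conn: sqlite3.Connection, ts: int, uri: str) -> str:
    # Walk the moves that happened after the event, oldest first, and follow
    # the file each time its current location is the source of a move.
    current = uri
    rows = conn.execute(
        "SELECT src, dst FROM move WHERE timestamp > ? ORDER BY timestamp", (ts,)
    )
    for src, dst in rows:
        if src == current:
            current = dst
    return current


def insert_event_late(conn: sqlite3.Connection, ts: int, uri: str) -> str:
    current = resolve_current_uri(conn, ts, uri)
    conn.execute(
        "INSERT INTO event (timestamp, uri, current_uri) VALUES (?, ?, ?)",
        (ts, uri, current),
    )
    return current


conn = open_db()
conn.executemany(
    "INSERT INTO move VALUES (?, ?, ?)",
    [(15, "a.txt", "b.txt"), (25, "b.txt", "c.txt")],
)
print(insert_event_late(conn, 10, "a.txt"))  # c.txt (followed a -> b -> c)
print(insert_event_late(conn, 20, "a.txt"))  # a.txt (a new file, created after the move)
```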
+ 
+ c) Directory
+ Should inserting a MOVE_EVENT that renames file:///home/user/dir1 to
+ file:///home/user/dir2 also update all events with URI
+ file:///home/user/dir1/* to file:///home/user/dir2/*? I think so.
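A minimal sketch of the rewrite proposed in c), assuming URIs are compared as
plain strings; the helper name is hypothetical. The one subtlety is matching
on the directory plus a trailing slash, so that moving dir1 does not also
rewrite a sibling such as dir10.

```python
def rewrite_moved_dir(uri: str, old_dir: str, new_dir: str) -> str:
    """Return `uri` with the `old_dir` prefix replaced by `new_dir`.

    Only the directory itself and URIs strictly inside it are rewritten;
    the trailing "/" check keeps sibling paths like ".../dir10" untouched.
    """
    old_dir = old_dir.rstrip("/")
    new_dir = new_dir.rstrip("/")
    if uri == old_dir:
        return new_dir
    if uri.startswith(old_dir + "/"):
        return new_dir + uri[len(old_dir):]
    return uri


old, new = "file:///home/user/dir1", "file:///home/user/dir2"
for uri in [
    "file:///home/user/dir1",
    "file:///home/user/dir1/notes.txt",
    "file:///home/user/dir1/sub/a.png",
    "file:///home/user/dir10/other.txt",
]:
    print(rewrite_moved_dir(uri, old, new))
```

The first three URIs come out under dir2; the dir10 one is left unchanged.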
+ 
+ SEE ALSO
+ 
+ Related to this, please also check my proposal for improved DELETE_EVENT
+ handling in bug #954206.


[Zeitgeist] [Bug 778140] Re: Time warp problem in MOVE_EVENT handling

2012-02-18 Thread Siegfried Gevatter
** Description changed:

+ MOVE EVENTS
+ 
  
- RainCT seiflotfy: the query updating current_uri on MOVE_EVENT should only change stuff with timestamp<move_event_timestamp
- seiflotfy RainCT, true
- seiflotfy good catch
- RainCT seiflotfy: there's also another ugly case
- seiflotfy RainCT, do tell
- RainCT seiflotfy: Imagine you insert event: 0. A, 1. A, 2. A, 3. B, 4. C, 5. A. Then with timestamp between events 3 and 4 you get A->T, so now you have T, T, T, B, C, A
- [...]
- seiflotfy i see the problem
- seiflotfy the last A should be a T
- RainCT no, the last A is fine
- RainCT because it is a new file with the same name
- [...]
- seiflotfy yeah ok
- RainCT seiflotfy: now you get A->L with timestamp between 1 and 2, so it should have L, L, T, B, C, A, but since the current_uri of the first is already L you won't see it. for this it'd need to check the original URI instead of the current_uri
- RainCT are you with me so far?
- seiflotfy trying to
- seiflotfy RainCT, ok i cont get your last point
- seiflotfy A-L wont change the T
- seiflotfy because A has been move to T
- seiflotfy you can not move A again
- seiflotfy you need to move T
- seiflotfy thus its does not work
- RainCT yeah, but it should, because you're being told that it was moved before that
- seiflotfy unless you really want to you will have to look for all MOVE_EVENTS with A and figure out what A is now
- seiflotfy its doable
- RainCT so the move that happened later in time didn't affect those events, only the later ones
- seiflotfy RainCT, true
- RainCT the easy way to solve this is checking subj_id instead of subj_id_current
- seiflotfy RainCT, i think we should raise an exception
- RainCT but now when it gets really messed up is if there was even another move event before that
- RainCT which was already logged
- seiflotfy You tried to move and event after it was used in a new location
- seiflotfy RainCT, actually we also have the MOVE_EVENT logged
- seiflotfy you can then try to figrue out the patch of A
- RainCT yes, that's the solution
- seiflotfy the path
- seiflotfy RainCT, but i highly discourage that
- RainCT you can find the previos move event and set timestamp>previous_move_event.timestamp
- seiflotfy RainCT, exactly
- seiflotfy i am +- 0 on that tbh
- seiflotfy not sure
- RainCT ok, I don't dislike finding the previous timestamp
- RainCT i'll open a bug
+ PRESENTATION
+ 
+ By definition, Zeitgeist's events are immutable, and the subject meta-data
+ they contain is a snapshot of how a given resource was back when the event
+ happened.
+ 
+ To be useful, some way of linking event subjects to their physical
+ representation is needed. The primary identifier for doing this is the
+ subject's URI.
+ 
+ However, URIs, especially local ones, are transient and may change. To solve
+ this problem, a new field was added to subjects which, unlike the rest of the
+ subject meta-data, is not considered immutable. This is the `current_uri' field.
+ 
+ INITIAL IDEA
+ 
+ When a subject is inserted, its `current_uri' field is initially set to the
+ same value as its `uri' field. When Zeitgeist receives a MOVE_EVENT for that
+ file (with a coherent timestamp), the value of `current_uri' is updated to
+ its new file name.
+ 
+ The idea here is that this should be done in such a way that, if we cleared
+ the `current_uri' of all subjects and rebuilt them by replaying all
+ MOVE_EVENTs in the database, the result would be the same as before.
+ 
+ CURRENT IMPLEMENTATION
+ 
+ As of now, `current_uri' is initially set to the same value as `uri'.
+ Once a MOVE_EVENT is inserted, all events with a timestamp before that of the
+ move are updated.
+ 
+ However, once the MOVE_EVENT has been inserted, it is never considered
+ again. This is done for performance reasons, since the initial plan would
+ require rebuilding pretty much the whole database.
+ 
+ PROBLEMS
+ 
+ There are numerous problems with this implementation, at least in theory.
+ 
+ One problem is that of events that are inserted after the MOVE_EVENT even
+ though they happened before it (maybe because the application is batching
+ them). In this case they won't be updated.
+ 
+ We also have the opposite problem: a MOVE_EVENT arriving late, after another,
+ conflicting MOVE_EVENT has already been inserted. For instance, take the
+ following events:
+   T5 a.txt, T10 a.txt, T15 a.txt
+ We receive a first MOVE_EVENT from a.txt to b.txt with timestamp T7. Now we
+ have (time / current_uri):
+   T5 a.txt, T10 b.txt, T15 b.txt
+ Finally, we receive a further MOVE_EVENT from a.txt to c.txt with timestamp T0.
+ The result is:
+   T5 c.txt, T10 b.txt, T15 b.txt
+ This is totally inconsistent; the correct result would have been:
+   T5 c.txt, T10 c.txt, T15 b.txt
+ 
+ Further, even if implemented as described in the INITIAL IDEA section, the
+ concept is flawed: events may be inserted retrospectively already using
+ their updated URI, which could give rise to further inconsistencies.
+ 
+ 

[Zeitgeist] [Bug 778140] Re: Time warp problem in MOVE_EVENT handling

2011-05-15 Thread Seif Lotfy
I prefer the first solution :)

-- 
You received this bug notification because you are a member of Zeitgeist
Framework Team, which is subscribed to Zeitgeist Framework.
https://bugs.launchpad.net/bugs/778140

Title:
  Time warp problem in MOVE_EVENT handling

Status in Zeitgeist Framework:
  Triaged

Bug description:
  
  RainCT seiflotfy: the query updating current_uri on MOVE_EVENT should only change stuff with timestamp<move_event_timestamp
  seiflotfy RainCT, true
  seiflotfy good catch
  RainCT seiflotfy: there's also another ugly case
  seiflotfy RainCT, do tell
  RainCT seiflotfy: Imagine you insert event: 0. A, 1. A, 2. A, 3. B, 4. C, 5. A. Then with timestamp between events 3 and 4 you get A->T, so now you have T, T, T, B, C, A
  [...]
  seiflotfy i see the problem
  seiflotfy the last A should be a T
  RainCT no, the last A is fine
  RainCT because it is a new file with the same name
  [...]
  seiflotfy yeah ok
  RainCT seiflotfy: now you get A->L with timestamp between 1 and 2, so it should have L, L, T, B, C, A, but since the current_uri of the first is already L you won't see it. for this it'd need to check the original URI instead of the current_uri
  RainCT are you with me so far?
  seiflotfy trying to
  seiflotfy RainCT, ok i cont get your last point
  seiflotfy A->L wont change the T
  seiflotfy because A has been move to T
  seiflotfy you can not move A again
  seiflotfy you need to move T
  seiflotfy thus its does not work
  RainCT yeah, but it should, because you're being told that it was moved before that
  seiflotfy unless you really want to you will have to look for all MOVE_EVENTS with A and figure out what A is now
  seiflotfy its doable
  RainCT so the move that happened later in time didn't affect those events, only the later ones
  seiflotfy RainCT, true
  RainCT the easy way to solve this is checking subj_id instead of subj_id_current
  seiflotfy RainCT, i think we should raise an exception
  RainCT but now when it gets really messed up is if there was even another move event before that
  RainCT which was already logged
  seiflotfy You tried to move and event after it was used in a new location
  seiflotfy RainCT, actually we also have the MOVE_EVENT logged
  seiflotfy you can then try to figrue out the patch of A
  RainCT yes, that's the solution
  seiflotfy the path
  seiflotfy RainCT, but i highly discourage that
  RainCT you can find the previos move event and set timestamp>previous_move_event.timestamp
  seiflotfy RainCT, exactly
  seiflotfy i am +- 0 on that tbh
  seiflotfy not sure
  RainCT ok, I don't dislike finding the previous timestamp
  RainCT i'll open a bug

___
Mailing list: https://launchpad.net/~zeitgeist
Post to : zeitgeist@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zeitgeist
More help   : https://help.launchpad.net/ListHelp


[Zeitgeist] [Bug 778140] Re: Time warp problem in MOVE_EVENT handling

2011-05-15 Thread Seif Lotfy
We have 2 options here:
1) Not inserting a move event dated before another move event
2) Extracting the history of the subject, then rebuilding and updating the DB
   according to the subject_current_uri
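Option 1) could be enforced with a guard at insertion time that raises an
exception, as seiflotfy suggests in the IRC discussion. This is a hypothetical
sketch: it assumes "another move event" means an already-logged move of the
same source URI with a later timestamp, and `check_move_order` and
`LateMoveError` are made-up names.

```python
from typing import List, Tuple

# Hypothetical layout for illustration: (timestamp, src_uri, dst_uri).
Move = Tuple[int, str, str]


class LateMoveError(ValueError):
    """Raised when a MOVE_EVENT predates an already-logged move of the same URI."""


def check_move_order(logged: List[Move], new: Move) -> None:
    # Reject the new move if the same source URI already has a later logged move.
    ts, src, _ = new
    for m_ts, m_src, _ in logged:
        if m_src == src and m_ts > ts:
            raise LateMoveError(f"move of {src} at {ts} predates logged move at {m_ts}")


logged: List[Move] = [(35, "A", "T")]
check_move_order(logged, (40, "A", "X"))  # accepted: it comes after the logged move
try:
    check_move_order(logged, (15, "A", "L"))  # the late A->L from the IRC log
except LateMoveError as err:
    print("rejected:", err)  # rejected: move of A at 15 predates logged move at 35
```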


[Zeitgeist] [Bug 778140] Re: Time warp problem in MOVE_EVENT handling

2011-05-07 Thread Launchpad Bug Tracker
** Branch linked: lp:zeitgeist
