Re: [Zim-wiki] Time Stamped Text (TST) plugin

2013-06-10 Thread Jaap Karssenberg
On Mon, Jun 10, 2013 at 12:38 AM, NorfCran norfc...@gmail.com wrote:

 Jaap,
 I am trying to figure out what the best storage solution for small
 changes is; so far there are two suggestions:

1. single file with separated patches and integrated timestamps (in
progress)
2. zip archive, which contains patches / files (name of the file
represents timestamps)

 solution nr. 1:

- index is not possible in this solution, only a chain of patches
- special characters and following timestamps are used to separate the
patches
- it is easier to append new patches to the end of the file
- the file could be tracked by the main VCS (since it is text)

 solution nr. 2:

- possibility to keep an index file, which speeds up lookup of patches
- timestamps are stored as the names of single files in the zip file
- the timeline is compressed
- it is easier to list all patches, even in a file browser
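A minimal sketch of solution 2, assuming one archive member per patch with a zero-padded Unix timestamp as its name, so that sorting the names gives chronological order. The `.patch` suffix and the helper names `add_patch` and `list_patches` are hypothetical; only the standard `zipfile` module is used.

```python
import zipfile


def add_patch(archive, timestamp, patch_text):
    """Store one patch as a zip member named after its timestamp."""
    # Zero-padding keeps lexical order equal to chronological order.
    archive.writestr("%017.6f.patch" % timestamp, patch_text)


def list_patches(archive):
    """Return all patch names, oldest first."""
    return sorted(archive.namelist())
```

Opening the archive with `zipfile.ZipFile(path, "a", zipfile.ZIP_DEFLATED)` would append to an existing timeline and compress each patch.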

 Probably the best approach is the first version.


Probably the most intensive calculation is to track the history of each
piece of the current version. If needed we can cache that info in a
separate file.

Let's go with solution 1. for now and make sure the code is flexible enough
to change the storage format later if need be.
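A minimal sketch of solution 1 behind a small storage class, so the format could be swapped later. The record layout is an assumption, not the format of the attached page.timeline: each record starts with the ASCII record-separator character followed by the timestamp on a header line, then the patch body. The class name `SingleFileTimeline` is hypothetical.

```python
import io
import time

RECORD_MARK = "\x1e"  # ASCII record separator; an assumed choice of "special character"


class SingleFileTimeline:
    """Append-only patch store: a header line per patch, then the patch body."""

    def __init__(self, stream):
        # Any seekable text stream, e.g. open(path, "a+") or io.StringIO().
        self.stream = stream

    def append(self, patch_text, timestamp=None):
        """Add one patch at the end of the file."""
        stamp = time.time() if timestamp is None else timestamp
        self.stream.seek(0, io.SEEK_END)
        self.stream.write("%s%r\n%s\n" % (RECORD_MARK, stamp, patch_text))

    def read_all(self):
        """Return the whole chain as [(timestamp, patch_text), ...]."""
        self.stream.seek(0)
        records = []
        for chunk in self.stream.read().split(RECORD_MARK)[1:]:
            header, _, body = chunk.partition("\n")
            records.append((float(header), body[:-1]))  # drop the trailing newline
        return records
```

The sketch assumes a patch never contains the separator character itself. A zip-based store offering the same `append` and `read_all` methods could replace it without touching the callers.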


 Additionally I do have a question, whether it is fine to use other code,
 which is not licenced under GPL. Particularly it concerns suitable python
 module licensed under Apache License, Version 2.0:
 http://code.google.com/p/google-diff-match-patch/


No hard objection to use code with this license, although we should take
care to keep licenses per module clear and not mix source files with
different licenses.

However, it might be easier to stick to the standard library diff module and
not add additional dependencies if it can be avoided. Unless of course this
module has some complex logic that is beyond the standard library?
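As a rough illustration of what the standard library offers, here is a sketch of a word-level delta built on `difflib.SequenceMatcher`. The tuple format `(start, end, replacement_words)` and the function names are assumptions made for this example. Note that splitting on whitespace normalizes spacing, which a real plugin would need to handle.

```python
import difflib


def word_delta(old_text, new_text):
    """Return a compact list of (start, end, replacement_words) edits."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only the changed spans
            delta.append((i1, i2, new_words[j1:j2]))
    return delta


def apply_delta(old_text, delta):
    """Rebuild the new text from the old text and a delta."""
    words = old_text.split()
    # Apply from the end so earlier indices stay valid.
    for start, end, replacement in reversed(delta):
        words[start:end] = replacement
    return " ".join(words)
```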


 The attachments contain:

1. diff_match_patch.py (python code, which generates patches and
reconstructs text)
2. page.timeline (proposed storage of changes with timestamps)
3. testing of concept.py (code, which utilizes diff_match_patch.py and
serves as a proof of concept)

 JK


Regards,

Jaap
___
Mailing list: https://launchpad.net/~zim-wiki
Post to : zim-wiki@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zim-wiki
More help   : https://help.launchpad.net/ListHelp


Re: [Zim-wiki] Time Stamped Text (TST) plugin

2013-06-06 Thread Jaap Karssenberg
On Tue, Jun 4, 2013 at 11:41 PM, NorfCran norfc...@gmail.com wrote:

 Yes, I do not want to replace the text files with XML (the concept based
 on TXT files is the most flexible in my opinion). You are right, attaching
 additional meta information in a shadow file is the only way.

 Since you have asked for use cases, I am going to model some of them
 separately for a notebook and for a single page, in order to illustrate
 utilization of Time Stamped Text (TST):

 *notebook*

1. search the most recently modified pages by a time range
   - hierarchical structures like a wiki are dynamic and changes often
   happen on many pages, so why not preserve this flow, determined by
   time, in its natural form?
   - an ordinary search with many matching pages may be filtered by a
   time range, which is currently possible only very roughly, based on
   ctime and mtime in zim's database

  *page (main utilization of TST)*

1. highlighting up to date changes by smooth versions stored in TST
data structure
   - most up to date changes are highlighted for instance by a red
   color and it fades to black (so it is easier to see, in case of
   modifications and revisions)
2. provides possibility to revert changes by performing undo and redo
any time even though the text buffer is no longer available
3. time is a natural binder for any other activity performed in
parallel to the note taking process → for instance searching information on
the web (traceable from browser's history)
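The fading highlight in item 1 could be computed along these lines. This is a sketch only: the linear fade, the one-hour default and the function name `fade_color` are assumptions.

```python
def fade_color(age_seconds, fade_seconds=3600.0):
    """Map the age of a change to a hex colour from red (new) to black (old)."""
    freshness = 1.0 - age_seconds / fade_seconds
    freshness = min(1.0, max(0.0, freshness))  # clamp to the range 0..1
    return "#%02x0000" % round(255 * freshness)
```

A brand-new change gets `#ff0000`, and anything older than `fade_seconds` gets `#000000`.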

  One of my projects attached to the email researches a graph
 data-structure among other solutions capable of storing timestamps per word
 chunks (as a lowest granularity) separated by spaces. The data-structure is
 based on graphs and it has been implemented, but it is still not robust
 enough for all cases. Maybe it can bring some additional understanding and
 further direction of our discussion.


OK, I would plan something like that as follows:

1/ Come up with a compact patch / diff like format that we can use to store
small changes (in a journal file next to the source file)
2/ Hook up a plugin to write such a patch file and update on each auto-save
action
3/ Add an API to the plugin such that we can
   a/ query timestamp for a specific piece of text in a specific file
   b/ can request timestamps for each part of the current version of the
file
   c/ request previous / next change for a given file
4/ Extend the search function to use API part a and add a column to the
search dialog
-- fulfills first use case

5/ Extend the page view to use API part b to highlight recent changes
6/ Hook up the undo/redo-manager
   a/ to use API part c to extend undo in the past
   b/ send data to the plugin per change as they happen, instead of waiting
for auto-save
-- fulfills 2nd use case

Probably the quickest way to a first result would be if you look into
steps 1-3a and I hack in step 4. If that is working I'm willing to help with
steps 5 & 6 to integrate into the editor widget.
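API parts a and b from step 3 could be sketched by replaying the journal over the first version. The delta format here is an assumption: a list of `(start, end, replacement_words)` edits on a word list, each journal entry paired with its timestamp. The function names are hypothetical.

```python
def annotate(initial_words, initial_stamp, timeline):
    """API part b: return [(word, timestamp), ...] for the current version.

    timeline is a list of (timestamp, delta) in chronological order; each
    delta is a list of (start, end, replacement_words) against the version
    that preceded it.
    """
    annotated = [(word, initial_stamp) for word in initial_words]
    for stamp, delta in timeline:
        # Apply from the end so earlier indices stay valid.
        for start, end, replacement in reversed(delta):
            annotated[start:end] = [(word, stamp) for word in replacement]
    return annotated


def timestamp_of(annotated, word):
    """API part a: newest timestamp for a word, or None if it is absent."""
    stamps = [stamp for w, stamp in annotated if w == word]
    return max(stamps) if stamps else None
```

Replaying the whole journal on every query is the expensive part, so the result is a natural candidate for caching.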

Regards,

Jaap


Re: [Zim-wiki] Time Stamped Text (TST) plugin

2013-06-03 Thread Jaap Karssenberg
On Sat, Jun 1, 2013 at 6:47 PM, NorfCran norfc...@gmail.com wrote:

 Dear Jaap,
 thank you for your suggestion, I already started experimenting with the
 difflib library, which is capable of generating deltas.

 Concerning the data structure with timestamps, it may be worth
 considering the following protocol, used by Gobby, which provides a way
 to collaborate over the network on one or many text files. On top of this
 feature it also defines a data structure, which may accommodate timestamps.
 Do you think that the protocol could be integrated into ZIM, since it uses
 GTK? Possibly it may elevate ZIM from a personal wiki to real-time
 collaborative writing? The API
 <http://gobby.0x539.de/trac/wiki/APIReference> offers libinftextgtk, but I
 am not certain about the complexity resulting from the
 intended integration.
 Even though libinftextgtk is implemented in C, it seems to be possible to
 wrap the C implementation and use it in Python code according to the
 following link:
 http://stackoverflow.com/questions/1942298/wrapping-a-c-library-in-python-c-cython-or-ctypes

 The infinote protocol uses storage in the following form:

 <?xml version="1.0"?>
 <inf-text-session>
   <user id="1" name="norfcran_apple" hue="0.203069"/>
   <user id="2" name="norfcran" hue="0.628897996"/>
   <buffer>
 <segment author="1">asdfasdfasd
 fa
 sdf
 as
 tell
 df
 as
 df
 as
 df
 </segment>
 <segment author="2">asdfa
 sdf
 as</segment>
 <segment author="1"> this may be wrong</segment>
 <segment author="2">
 df
 as
 d
 f</segment>
   </buffer>
 </inf-text-session>


 The segment may be extended by timestamps. This results in timestamped
 text, which does not preserve the history of changes, but on the other hand
 brings real-time collaboration on a single file. Additionally the
 timestamps could be used for tracking changes over many pages, since
 time is a natural binder of flow when more than one page is edited
 simultaneously. I hope these suggestions do not turn it into something
 impossible; so far at least I can see a potentially feasible shortcut to
 bring in another organizational tool in the form of timestamps.

 Thank you in advance for your opinion, best regards, JK



Assuming you do not propose to replace the wiki text files with xml files,
the only way I see to use this is as a shadow file that sits next to the
actual source file. But whether or not that is useful and desirable depends
highly on what you want to do with the data. Depending on the use case, a
different representation may be more efficient.

The thing to realize is that the API you refer to has its own document
management, so it may not be compatible with how zim stores documents in
the notebook. We would need to dive in much deeper to understand the technical
implications.

So what is the use case / user functionality that you want to build? Is
it about synchronisation, about timestamping each and every change to the
sources, or both? What user interface would you want to support? Then we can
answer what technology is needed to support that kind of feature.

Regards,

Jaap


Re: [Zim-wiki] Time Stamped Text (TST) plugin

2013-06-01 Thread NorfCran
Dear Jaap,
thank you for your suggestion, I already started experimenting with the
difflib library, which is capable of generating deltas.

Concerning the data structure with timestamps, it may be worth
considering the following protocol, used by Gobby, which provides a way
to collaborate over the network on one or many text files. On top of this
feature it also defines a data structure, which may accommodate timestamps.
Do you think that the protocol could be integrated into ZIM, since it uses
GTK? Possibly it may elevate ZIM from a personal wiki to real-time
collaborative writing? The API
<http://gobby.0x539.de/trac/wiki/APIReference> offers libinftextgtk,
but I am not certain about the complexity resulting from the
intended integration.
Even though libinftextgtk is implemented in C, it seems to be possible to
wrap the C implementation and use it in Python code according to the
following link:
http://stackoverflow.com/questions/1942298/wrapping-a-c-library-in-python-c-cython-or-ctypes

The infinote protocol uses storage in the following form:

 <?xml version="1.0"?>
 <inf-text-session>
   <user id="1" name="norfcran_apple" hue="0.203069"/>
   <user id="2" name="norfcran" hue="0.628897996"/>
   <buffer>
 <segment author="1">asdfasdfasd
 fa
 sdf
 as
 tell
 df
 as
 df
 as
 df
 </segment>
 <segment author="2">asdfa
 sdf
 as</segment>
 <segment author="1"> this may be wrong</segment>
 <segment author="2">
 df
 as
 d
 f</segment>
   </buffer>
 </inf-text-session>


The segment may be extended by timestamps. This results in timestamped
text, which does not preserve the history of changes, but on the other hand
brings real-time collaboration on a single file. Additionally the
timestamps could be used for tracking changes over many pages, since
time is a natural binder of flow when more than one page is edited
simultaneously. I hope these suggestions do not turn it into something
impossible; so far at least I can see a potentially feasible shortcut to
bring in another organizational tool in the form of timestamps.
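To make the idea of extending segments concrete, here is a sketch using the standard `xml.etree.ElementTree` module. The `timestamp` attribute is hypothetical and not part of the infinote format; the sample document is a shortened version of the session above, and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET

SESSION = """<?xml version="1.0"?>
<inf-text-session>
  <user id="1" name="norfcran_apple" hue="0.203069"/>
  <user id="2" name="norfcran" hue="0.628897996"/>
  <buffer>
    <segment author="1">first draft</segment>
    <segment author="2"> added later</segment>
  </buffer>
</inf-text-session>"""


def stamp_segments(xml_text, stamps):
    """Attach an assumed 'timestamp' attribute to each segment, in order."""
    root = ET.fromstring(xml_text)
    for segment, stamp in zip(root.iter("segment"), stamps):
        segment.set("timestamp", str(stamp))
    return root


def segments_since(root, since):
    """Return the text of all segments stamped at or after 'since'."""
    return [
        segment.text
        for segment in root.iter("segment")
        if float(segment.get("timestamp")) >= since
    ]
```

As noted above, this only records when each surviving segment was written; deleted text leaves no trace.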

Thank you in advance for your opinion, best regards, JK


On 31 May 2013 13:04, Jaap Karssenberg jaap.karssenb...@gmail.com wrote:

 JK,

 Main problem I see is how you are going to store all that meta-data in a wiki
 format. If you really want to timestamp a change of e.g. 2 words half way a
 paragraph you end up with timestamps every other word in your source text.

 So you would have to keep a file next to the actual source to track
 changes as they happen. Kind of keeping a permanent record of the undo
 stack. Not too hard to hack together if you trigger it to update on each
 auto-save. Bonus is that you would also get permanent undo. Only technical
 tid-bit is that our real undo-stack is in terms of positions in the text
 buffer, which does not match positions in the source text, so some glue is
 needed there.

 Alternative would be to store patches and figure out history from that.
 Most version control systems have an annotate mode to show the history of
 text, but those are usually per line, not per word. You could still figure
 out history per word from the version history. You would have to commit
 for every other change though, so probably not for your purpose.

 So in conclusion:
 1/ Write a plugin that takes a diff of the text in the source file on each
 auto-save and stores the deltas timestamped in a record next to the actual
 source file.
 2/ Connect it to the undo stack, so even after closing a page, you can
 still undo/redo each delta
 3/ Figure out what representation of this data you would want in the user
 interface - e.g. text annotation, change log, ...
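Steps 1 and 2 could be sketched with a delta record that stores both the removed and the inserted text, so each stored change can be replayed in either direction. The `Delta` layout and the function names are assumptions. Positions here are offsets in the source text, so the glue to text buffer positions mentioned above is not covered.

```python
from collections import namedtuple

# One timestamped change to the source text (hypothetical record layout).
Delta = namedtuple("Delta", "timestamp position removed inserted")


def redo(text, delta):
    """Apply a delta: replace 'removed' with 'inserted' at 'position'."""
    end = delta.position + len(delta.removed)
    if text[delta.position:end] != delta.removed:
        raise ValueError("delta does not match the text")
    return text[:delta.position] + delta.inserted + text[end:]


def undo(text, delta):
    """Revert a delta: replace 'inserted' with 'removed' at 'position'."""
    end = delta.position + len(delta.inserted)
    if text[delta.position:end] != delta.inserted:
        raise ValueError("delta does not match the text")
    return text[:delta.position] + delta.removed + text[end:]
```

Because the record is kept on disk next to the source file, undo stays possible after the page is closed, provided the deltas are reverted newest first.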

 Regards,

 Jaap



 On Fri, May 31, 2013 at 10:53 AM, NorfCran norfc...@gmail.com wrote:

  Hi Jaap and other contributors,
 it has been some time since I worked on a project that researched ways
 of synchronizing text with time (down to the level of timestamped
 words). Actually I wanted to bring this feature to ZIM, but could not get
 there (the project in my case tries to use a tree data structure algorithm to
 solve this issue with respect to timestamps). Anyhow, recently I came across
 the following application, which inspired me to write this email:
 https://itunes.apple.com/de/app/armadillo-audio-notes/id532223938?mt=12
 It basically provides what I would personally love to see in ZIM as well
 (apart from the audio, that is another level). Can you see any possibility to
 target this feature? I am personally very much into the idea of time-based text
 and possibly other users may start to see the advantages of it (so there would
 be a chance to track changes based on time through the hierarchy of pages;
 especially when some words in long paragraphs change, it is difficult to
 use a VCS). This is actually the main inspiration of this feature:
 http://etherpad.org/
 I would be interested in your opinions on whether you see some possibility
 of implementing a time-based text plugin (basically directions of further
 focus)?
 Thank you for your feedback, best regards, JK