Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-06-10 Thread rghetta

Joel Crisp wrote:

> I'm not against programming, just against making everyone do it. If
> you can provide a framework which allows a registry of common file
> types against the way of handling them and a library of shipped code
> fragments which can be incorporated without the end user having to do
> any coding, then that would be fine.
>
> Maybe something like:
> monotone types filetype --match=\*.xml --type=text/xml
>     --- Set up initial default mappings

I don't much like having a specialized monotone command just for that.
Besides, this way you can't control the order of matching.

IMHO, a configuration file is a better solution.

> or
> monotone types filetype --file=foo.xml --type=x-rational-xmi
>     --- Change the type of the file

As I see it, there are three distinct issues to handle:
- mapping file extensions (and/or content) to mime-types
- mapping mime-types to merge/diff tools
- assigning mime-types to files, and handling them in monotone.

The first two tasks can be accomplished by using a bit of Lua glue to
read mappings from configuration files.
These files could be in pure tabular form or, better, use the syntax
proposed by graydon, i.e. something like

file_mapping(".xml", "text/xml")
and/or
content_mapping(offset, bytestring, mime-type)
The same goes for merging:
merge_tool(mime-type, difftool, mergetool, automerge_allowed)

While the user sees only a collection of mapping directives, these lines
effectively translate to Lua function calls, making customization both
powerful and easy.
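
As a rough sketch of how such directives could translate to Lua function
calls: each directive below is an ordinary function that records its
arguments in a table. The directive names follow the proposal above, but
the function bodies, the tables file_mappings and merge_tools, and the
"xxdiff" tool name are illustrative assumptions, not existing monotone
code.

```lua
-- Hypothetical registries filled in by the mapping directives.
file_mappings = {}   -- ordered list of { suffix = ..., mime = ... }
merge_tools = {}     -- mime-type -> tool description

-- file_mapping(".xml", "text/xml"): remember a suffix and its mime-type.
function file_mapping(suffix, mime)
   table.insert(file_mappings, { suffix = suffix, mime = mime })
end

-- merge_tool(mime, difftool, mergetool, automerge_allowed):
-- remember which tools handle a given mime-type.
function merge_tool(mime, difftool, mergetool, automerge_allowed)
   merge_tools[mime] = {
      difftool = difftool,
      mergetool = mergetool,
      automerge_allowed = automerge_allowed,
   }
end

-- What a user's configuration file would contain: plain directives.
file_mapping(".xml", "text/xml")
merge_tool("text/xml", "xxdiff", "xxdiff", false)
```

Because the configuration file is itself Lua, the order of the
directives is preserved in file_mappings, which is what makes the order
of matching controllable.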


Storing mime-types in monotone should be done with file attributes, but 
currently this is a bit tricky, because you need a way to resolve 
conflicting mime-types *before* merging.
This could be accomplished by merging .mt-attrs before other files, but 
introducing ordering into merges could be dangerous.

Anyway, there are ongoing developments that should make these things easier.

In the meantime, we could _partially_ resolve the issue by using only 
the mapping tables at merge/diff time, without explicitly assigning a 
mime-type to files.
Per-file mapping is still possible by using the full filename instead of
the extension:

file_mapping("model.xml", "application/xmi")
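
A lookup over such a table might try the rules in the order they were
declared, so that a full-filename rule placed first wins over a more
general extension rule. This is only a sketch: mappings and mime_for are
hypothetical names, and treating any rule that starts with a dot as an
extension rule is an assumption about how the directive would behave.

```lua
-- Hypothetical mapping table; order matters, first match wins.
mappings = {
   { name = "model.xml", mime = "application/xmi" },  -- per-file rule
   { name = ".xml",      mime = "text/xml" },         -- extension rule
}

-- Return the mime-type of the first matching rule, or nil.
function mime_for(filename)
   local lowname = string.lower(filename)
   -- keep only the part after the last "/"
   local base = string.gsub(lowname, "^.*/", "")
   for _, rule in ipairs(mappings) do
      if string.sub(rule.name, 1, 1) == "." then
         -- extension rule: compare the end of the name
         if string.sub(base, -string.len(rule.name)) == rule.name then
            return rule.mime
         end
      elseif base == rule.name then
         return rule.mime -- full-filename rule
      end
   end
   return nil -- no mapping: fall back to the default behaviour
end
```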

Cheers,
Riccardo





___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-06-10 Thread rghetta

Hi Riccardo

This sounds much better.

The criteria which I'm concerned about are:

1) ease of use - end users should not have to (knowingly) use Lua to
configure 'pre-defined' file types
2) flexibility - the type of each file should be able to be set
independently and new file types defined (may use Lua for this)
3) power - all file operations should be customizable
4) reliability - it should work reliably and consistently

Your proposal sounds like it would address all of these.

BTW, you didn't copy monotone-devel - feel free to forward this mail if 
that is what you intended


Joel






Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-06-01 Thread rghetta
Glen Ditchfield wrote:
> Why can't there be one function that examines the files and decides to
> run the internal merge algorithm on some kinds of files, and to exec
> external tools on other kinds of files?

Sorry if I'm stating the obvious, but perhaps not everyone is aware that
monotone embeds a complete Lua interpreter, and you aren't limited to
just reimplementing the predefined hooks in your monotonerc files.
You can also add other functions, tables, etc.
For example, you could create a single function to categorize your
files, and use it in both the add-time and merge hooks.
Something like this:

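function choose_merge(filename)
   -- read_contents_of_file() comes from std_hooks.lua
   local filedata = read_contents_of_file(filename)
   if filedata ~= nil then
      if is_word(filedata) then   -- is_word(): your own content test
         return "msword"
      else
         -- other categorizing code
      end
   end
   return nil -- filetype unknown
end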

attr_init_functions["manual_merge"] =
   function(filename)
      if choose_merge(filename) ~= nil then
         return true -- files with an associated tool are merged manually
      else
         return nil
      end
   end


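function merge3(anc_path, left_path, right_path, merged_path, ancestor,
                left, right)
   -- common code to set up files (see std_hooks.lua)

   -- left_path is used here as the name to inspect (an assumption;
   -- the content in 'left' could be classified directly instead)
   local ftype = choose_merge(left_path)
   if ftype == "msword" then
      -- call word
   else
      -- other tools
   end

   -- ...

end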
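
The is_word() test is left undefined in the example above. A minimal
sketch, assuming the six-byte signature that Glen Ditchfield quotes in
this thread (\320\317\021\340\241\261 in octal) is enough to recognize a
Word file. WORD_MAGIC is a hypothetical name. Note that these bytes open
the OLE2 container format, which other applications use too, so a real
classifier would probably need a further check.

```lua
-- First six bytes of the signature, written as Lua decimal escapes;
-- the same bytes as octal \320\317\021\340\241\261.
WORD_MAGIC = "\208\207\017\224\161\177"

-- Hypothetical helper: true if the content starts with the signature.
function is_word(filedata)
   return string.sub(filedata, 1, string.len(WORD_MAGIC)) == WORD_MAGIC
end
```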








Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-06-01 Thread Joel Crisp

I'm not against programming, just against making everyone do it. If you can 
provide a framework which allows a registry
of common file types against the way of handling them and a library of shipped code fragments which can be incorporated without the 
end user having to do any coding, then that would be fine.


Maybe something like:

monotone types filetype --match=\*.xml --type=text/xml
    --- Set up initial default mappings

or

monotone types filetype --file=foo.xml --type=x-rational-xmi
    --- Change the type of the file

Then have an object interface like (pseudocode!):

type_handlers {
   string[] getSupportedTypePatterns()
   void merge3(...args...)
   boolean isBinary()
   void copyIn(...stream..., ...database..., ..other args)
   void copyOut(..destination.., ..database..., ..other args)
   etc.
}

A library of type handlers which implemented this type of interface could
then be selected at run-time simply by looking up the file type associated
with the file, then looking up the handler for that type. Note that there
is no reason why these should not be lua if they are shipped as a standard
library.
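
If the handlers were written in lua, the lookup could be as small as a
table keyed by file type. The following is a sketch only:
type_handlers, register_handler and handler_for are hypothetical names,
and the merge3 body is a placeholder rather than a working merge.

```lua
-- Hypothetical registry: file type -> handler table.
type_handlers = {}

-- Register one handler under every type it claims to support.
function register_handler(handler)
   for _, ftype in ipairs(handler.supported_types) do
      type_handlers[ftype] = handler
   end
end

-- A handler that could be shipped for XML files.
register_handler({
   supported_types = { "text/xml" },
   is_binary = function() return false end,
   merge3 = function(ancestor, left, right)
      return nil -- would call an XML-aware merge tool here
   end,
})

-- Look up the handler for a file's type; nil means "use the default".
function handler_for(ftype)
   return type_handlers[ftype]
end
```

An end user would then only pick a type for a file; the code behind the
handler stays in the shipped library.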


Whilst I take your point about user preference in merge tools, for many of
the 'exotic' types there will be a much more limited set of merge tools,
and supplying type variants which are specific to each tool should be
feasible.


Thoughts?

Joel



rghetta wrote:

On Wed, 2005-06-01 at 20:07 +0100, Joel Crisp wrote:


I just don't think that it is fair to expect everyone to program what should be 
standard functionality in hooks.

Hooks should be there for functionality which is non-standard, e.g. integration with my software process rather than 
yours...mailing when checkins are done, or enforcing lifecycle constraints.


Choosing how to handle common file types hardly fits into that category, and I think the average user would prefer that to be 
supported via a less obscure mechanism.


To give you some comparison: in a recent government job I worked in we
weren't allowed to use triggers _at all_ (in clearcase, which uses perl)
on the grounds that no-one else would be able to maintain them, let alone
in a language with the limited uptake of lua (note, I personally think it
is ok as a language, but the perception in the industry as a whole is
that it is a game programmer's language, not a 'commercial' one)



Could you provide some example of an acceptable syntax?
How would you like to specify merging behaviour, how to identify a
filetype, etc.?
I really don't see how to implement what you want without resorting to
some hook programming, unless we add a built-in filetype identifier.
And even with something like that you need to handle the uncommon
filetype ...

Riccardo










Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-05-30 Thread rghetta
On Sun, 2005-05-29 at 12:20 +0100, Joel Crisp wrote:
> Hi
>
> My concern about this approach is that if you have lots of different
> types of files to handle, XML, Word, Rational XMI (which is XML but
> has a specific merge tool), etc then you would end up having to do
> lots of jiggering in the merge hooks. Also, the order in which you
> tried to identify the files at the point of merge would become
> significant, for example the case of the XMI file actually being an
> XML file means that you would have to ensure that you checked for XMI
> before XML. This could get very complex.

But you still need to *automatically* categorize files in the first
place, and to do that you still need to look for XMI before XML (but if
your XMI files consistently have a .xmi extension, looking at that is
order independent ;-)  ).
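
One way to keep the ordering manageable is to hold the checks in a
single ordered list, most specific first, and stop at the first match.
A sketch under stated assumptions: content_rules and classify are
hypothetical names, and the two patterns are illustrative only, not a
robust way to tell XMI from other XML.

```lua
-- Ordered rules: the most specific test comes first, first match wins.
content_rules = {
   { ftype = "xmi", pattern = "<XMI" },    -- XMI is XML, so test it first
   { ftype = "xml", pattern = "<%?xml" },  -- any other XML document
}

-- Return the type of the first rule whose pattern occurs in the data.
function classify(filedata)
   for _, rule in ipairs(content_rules) do
      if string.find(filedata, rule.pattern) then
         return rule.ftype
      end
   end
   return nil -- unknown type
end
```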

Riccardo





Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-05-29 Thread rghetta
On Fri, 2005-05-27 at 20:13 -0700, Nathaniel Smith wrote:
> On Fri, May 27, 2005 at 09:44:23PM +0200, rghetta wrote:
> > Ok, I'll try to summarize the requests (and possible answers) so far:
> >
> > Both Nathaniel Smith and Emile Snyder advocated the use of .mt-attrs,
> > perhaps coupled with the attr_init hook to automagically mark the
> > files at add time.
> > However, the attr_init hooks receive only the filename, while the
> > hook also needs the file content to guess the file type if the name
> > doesn't match.
>
> But, the file is sitting right there on the filesystem, and the hook
> can run arbitrary code.  For instance, it could peek at the file to
> see whether it looks like it's binary.

I'm a bit worried about efficiency here. Does add already read the file?
If yes, then monotone will read the file twice, and this could have a
noticeable impact on add performance.

> > Attributes seems also just not available at merge time.
> > Both of these issues need to be resolved before using attributes to
> > decide on merging.  Is a rewrite of the attribute system needed ?
>
> What would this rewrite do?  (It's entirely possible we do need one,
> the .mt-attr concept doesn't seem fully developed yet to me, but I
> don't see what you're getting at here.)
>
> At merge time we know the file names, and we know what revision they
> come from; in principle there is no reason we can't grab the .mt-attrs
> file from that revision and see what it says.

I was wrong. Attributes *are* available at merge time. Looking more
closely at the merging code, I found we already do that to get the file
encoding. It parses the attribute file(s) every time, and this could have
an impact on merging performance for large trees, but handling the binary
attribute is trivial.

Riccardo





Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-05-29 Thread rghetta
From the feedback to this patch, it appears that in naming the hook
binary_file() I made a big mistake.
Since the hook's only effect is to disable the internal merging algorithm
of monotone, perhaps a better name would be manual_merge, and that
could also be used for the .mt-attr property.

Riccardo





Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-05-29 Thread rghetta
On Sat, 2005-05-28 at 09:44 -0500, Glen Ditchfield wrote:
> I worry that, when monotone checks for control characters, it is not
> always good enough, and too late for a hook to fix things.  I would
> like to have a hook that sees that the first six bytes of the file are
> \320\317\021\340\241\261 and concludes this is an MS Word file, instead
> of a hook that checks the file suffix for eight different
> case-sensitive variations of .doc and still guesses wrong sometimes.
>
> This is related to Joel Crisp's point in an earlier posting.  What is
> the root problem?  Does monotone just have to spot the binary files, or
> does it have to get a more exact idea of each file's type so that it
> can invoke a type-specific merge function?  (This is an MS Word file,
> so merge the revisions with Word.)

The binary file flag just disables monotone's internal merging, so the
Lua hooks merge2()/merge3() are invoked every time.  Like the
binary_file() hook, you can override them in a monotonerc file.
These hooks get both the name and the full file content of all
to-be-merged files. If you want to choose the merge program based on
file content, you do it there.
In short, the steps to use MS Word to handle .doc files are:

1. redefine the binary_file() hook to mark .doc files as binary (btw,
current hook comparisons *aren't* case sensitive. The hook takes the
filename, converts it to lowercase, and matches on the converted name).
If you're worried that this can still miss some ill-named word files,
make binary the default and match only on known text files.

2. redefine merge2()/merge3() to invoke word when the first bytes of
the content match.

Note: if we implement the add-time hook, you will also have access to
file content at step 1.
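
The "make binary the default" variant of step 1 might look like the
following sketch. It assumes the binary_file(name) hook from the patch,
with true meaning binary and false meaning text; the list
known_text_suffixes is a hypothetical name and its contents are
illustrative, to be adjusted per project.

```lua
-- Suffix patterns known to be text; everything else counts as binary.
known_text_suffixes = { "%.txt$", "%.c$", "%.h$", "%.lua$", "%.xml$" }

function binary_file(name)
   -- match on the lowercased name, as the current hook does
   local lowname = string.lower(name)
   for _, pattern in ipairs(known_text_suffixes) do
      if string.find(lowname, pattern) then
         return false -- known text file: allow internal merging
      end
   end
   return true -- default: binary, so merge2()/merge3() are invoked
end
```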

Riccardo






Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-05-29 Thread Nathaniel Smith
On Sun, May 29, 2005 at 09:29:45AM +0200, rghetta wrote:
> On Fri, 2005-05-27 at 20:13 -0700, Nathaniel Smith wrote:
> > But, the file is sitting right there on the filesystem, and the hook
> > can run arbitrary code.  For instance, it could peek at the file to
> > see whether it looks like it's binary.
> I'm a bit worried about efficiency, here. Add already reads the file ?
> If yes, then monotone will read the file twice, and this could have a
> noticeable impact on add performance.

No, add doesn't read the file, so there's no duplicated work.

-- Nathaniel

-- 
...these, like all words, have single, decontextualized meanings: everyone
knows what each of these words means, everyone knows what constitutes an
instance of each of their referents.  Language is fixed.  Meaning is
certain.  Santa Claus comes down the chimney at midnight on December 24.
  -- The Language War, Robin Lakoff




Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-05-29 Thread Joel Crisp

Hi

My concern about this approach is that if you have lots of different
types of files to handle, XML, Word, Rational XMI (which is XML but has a
specific merge tool), etc then you would end up having to do lots of
jiggering in the merge hooks. Also, the order in which you tried to
identify the files at the point of merge would become significant, for
example the case of the XMI file actually being an XML file means that
you would have to ensure that you checked for XMI before XML. This could
get very complex.


I don't see that as particularly clean for a wide uptake system. Would it
be possible to provide a default lua hook for merge which used a lookup
table to map the file type to the correct merge facility and an easy way
of setting up that merge? Or replace the lua hook with one which takes an
object implementing merge2, merge3 and any other relevant functions for
the particular file type?


Joel







Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-05-28 Thread Glen Ditchfield
Glen Ditchfield wrote:
> You base the text/binary decision on the name of the file. How hard
> would it be to base it on the contents of the file instead, the way the
> Unix 'file' command does?

On Friday 27 May 2005 14:44, rghetta replied:
> The hook uses only the filespec, true, but if it returns nil, monotone
> will check the file content for ASCII NULs and some other control chars.

I worry that, when monotone checks for control characters, it is not always 
good enough, and too late for a hook to fix things.  I would like to have a 
hook that sees that the first six bytes of the file are 
\320\317\021\340\241\261 and concludes this is an MS Word file, instead of 
a hook that checks the file suffix for eight different case-sensitive 
variations of .doc and still guesses wrong sometimes.

This is related to Joel Crisp's point in an earlier posting.  What is the root 
problem?  Does monotone just have to spot the binary files, or does it have 
to get a more exact idea of each file's type so that it can invoke a 
type-specific merge function?  (This is an MS Word file, so merge the 
revisions with Word.)

On Friday 27 May 2005 14:44, rghetta wrote:
> Unless adding .mt-attrs support is more or less trivial, my proposal is
> to merge the current patch to resolve the merging bug.

This may be one of those good enough solutions -- the kind where nobody ever
gets around to coding up the right thing (in a way that would be 
backwards-compatible with the established good enough thing), and future 
generations just live with a small, nagging annoyance ...




Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-05-27 Thread rghetta
Ok, I'll try to summarize the requests (and possible answers) so far:

Both Nathaniel Smith and Emile Snyder advocated the use of .mt-attrs,
perhaps coupled with the attr_init hook to automagically mark the files
at add time.
However, the attr_init hooks receive only the filename, while the hook
also needs the file content to guess the file type if the name doesn't
match.
IMHO, the guessing part is a necessity: requiring the user to manually
specify every binary file seems too harsh to me, especially because every
project has its share of non-mergeable files, even monotone ;-)
The hook is here just to handle the corner cases (like file-specific
merging tools), but I think monotone should make the right choice
automatically.
Attributes also seem not to be available at merge time.
Both of these issues need to be resolved before using attributes to
decide on merging.  Is a rewrite of the attribute system needed?

Joel Crisp wrote:
> I'd prefer a 'file type' attribute rather than a binary file attribute -
> there are many types of files which may require specialist merging
> (e.g. XML) or storage (e.g. super big video files which are stored
> external to the scm).

The binary_file hook is used only to inhibit algorithmic merging
(perhaps a better name for the hook would be disable_auto_merge).
Essentially, a binary file is treated as a text file with a conflict,
i.e. monotone will invoke the merge2() or merge3() Lua hooks.
The merge2/merge3 hooks receive both filenames and full file content,
thus by redefining them you can choose a specialized merge tool
based on the file type (I gave the example of gimp for merging images).
AFAIK, monotone doesn't directly support storing files externally, but
you can simulate it by storing only a pointer file and redefining the
mentioned hooks.

Glen Ditchfield wrote:
> You base the text/binary decision on the name of the file.  How hard
> would it be to base it on the contents of the file instead, the way the
> Unix 'file' command does?

The hook uses only the filespec, true, but if it returns nil, monotone will
check the file content for ASCII NULs and some other control chars.
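
To make that content check concrete, here is a sketch of the same kind
of heuristic written as a Lua function. It is not monotone's actual
guess_binary(), which lives in monotone itself; looks_binary is a
hypothetical name, and only NUL plus the STX and ETX characters
mentioned elsewhere in this thread are tested.

```lua
-- Sketch only: flag data containing NUL, STX or ETX as binary.
function looks_binary(data)
   for i = 1, string.len(data) do
      local b = string.byte(data, i)
      -- NUL (0), STX (2) and ETX (3) are treated as signs of binary data
      if b == 0 or b == 2 or b == 3 then
         return true
      end
   end
   return false
end
```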


Unless adding .mt-attrs support is more or less trivial, my proposal is
to merge the current patch to resolve the merging bug. Then perhaps we
could rethink a bit the attributes to have them available everywhere,
not just when dealing with the working copy or the manifest. 
After that, adding a binary or disable_auto_merge attribute should 
be easy.
What do you (collectively :-) think ?  

Riccardo


 







Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-05-26 Thread Emile Snyder
I like the idea of an .mt-attrs approach because the binary-ness of a
file is a property of the file, not something that different people
should have different ideas about (à la hooks).

I don't have particularly strong feelings about the right way to help
monotone automatically figure it out for you, but I do feel strongly
that there should be some way to explicitly tell monotone to treat a
file as binary and have it do the right thing from then on.

thanks,
-emile

On Wed, 2005-05-25 at 21:09 -0700, Nathaniel Smith wrote:
> On Wed, May 25, 2005 at 12:33:04AM +0200, rghetta wrote:
> > If the hook returns nil, the file will be treated as binary if the
> > monotone function guess_binary() returns true, i.e. if the files
> > contains NUL bytes or a selection of other ASCII control chars (for
> > example, STX and ETX).
>
> Another possible way to do binary support, for discussion:
>   -- have the merger peek at .mt-attrs, and if a binary attribute is
>      set on a file, consider it binary.  (Currently nothing in .mt-attrs
>      has hard-coded behavior, so this would be a change.)
>   -- use the cool new attr_init hooks to automatically guess at add
>      time whether each file is binary.
>   -- never again automatically touch this attribute; let people set it
>      to what they want, if they want
>
> Another possible way to do binary support, for discussion:
>   -- just use guess_binary() on the data at merge time
>
> I don't tend to store binary files under VCS, so I don't have as much
> of an intuition about what the nicest way to do so would be; it'd be
> good to hear opinions from those actually affected by this :-)
>
> -- Nathaniel
 




Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-05-25 Thread Glen Ditchfield
On Tuesday 24 May 2005 17:33, rghetta wrote:
> function binary_file(name)
>   lowname=string.lower(name)
>   -- some known binaries, return true
>   if (string.find(lowname, "%.gif$")) then return true end
 

You base the text/binary decision on the name of the file.  How hard would it 
be to base it on the contents of the file instead, the way the Unix 'file' 
command does?  




Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook

2005-05-25 Thread Nathaniel Smith
On Wed, May 25, 2005 at 12:33:04AM +0200, rghetta wrote:
> If the hook returns nil, the file will be treated as binary if the
> monotone function guess_binary() returns true, i.e. if the files
> contains NUL bytes or a selection of other ASCII control chars (for
> example, STX and ETX).

Another possible way to do binary support, for discussion:
  -- have the merger peek at .mt-attrs, and if a binary attribute is
 set on a file, consider it binary.  (Currently nothing in .mt-attrs
 has hard-coded behavior, so this would be a change.)
  -- use the cool new attr_init hooks to automatically guess at add
 time whether each file is binary.
  -- never again automatically touch this attribute; let people set it
 to what they want, if they want

Another possible way to do binary support, for discussion:
  -- just use guess_binary() on the data at merge time

I don't tend to store binary files under VCS, so I don't have as much
of an intuition about what the nicest way to do so would be; it'd be
good to hear opinions from those actually affected by this :-)

-- Nathaniel

-- 
  /* Tell the world that we're going to be the grim
   * reaper of innocent orphaned children.
   */
-- Linux kernel 2.4.5, main.c

This email may be read aloud.

