Re: [git-users] Use git on microsoft words documents

2015-05-16 Thread Magnus Therning
On 15 May 2015 1:48 pm, John McKown john.archie.mck...@gmail.com wrote:

 On Thu, May 14, 2015 at 10:15 PM, Dale R. Worley wor...@alum.mit.edu
wrote:

 From the original poster's point of view:  Yes, you can use Git to store
 various versions of MS Word documents, but you probably don't get much
 benefit from doing so, since Git can't see into the different versions
 of documents to see how they differ; to Git they're just blobs.  OTOH,
 it may be that collections of blobs is all that you need the storage
 system to provide.


 Personally, this is (yet another) reason not to use a word processor
such as MS Word or Libre/OpenOffice, and to use a structured document
processor such as LyX instead. I am not an expert in using it. It is not
fully WYSIWYG like the usual word processors, though it is somewhat. Of
course, IMO WYSIWYG is for people who like the sizzle more than the steak,
i.e. the content doesn't matter as long as it looks impressive! Anyway, off
that soap box: the content of a file maintained by LyX is textual markup
(more or less), and so it is more amenable to tracking changes with
Git.

Hear, hear! It is also more future proof.

/M

-- 
You received this message because you are subscribed to the Google Groups Git 
for human beings group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to git-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [git-users] Use git on microsoft words documents

2015-05-15 Thread Philip Oakley

From: Dale R. Worley wor...@alum.mit.edu
From the original poster's point of view:  Yes, you can use Git to store
various versions of MS Word documents, but you probably don't get much
benefit from doing so, since Git can't see into the different versions
of documents to see how they differ; to Git they're just blobs.  OTOH,
it may be that collections of blobs is all that you need the storage
system to provide.

Konstantin Khomoutov flatw...@users.sourceforge.net writes:

Steve (Gadget) Barnes gadgetst...@hotmail.com wrote:

At the risk of getting flamed for mentioning a different dVCS, the
Mercurial (hg) project has a very sneaky extension called zipdoc
that stores the content of the zip files (docx are actually zips
containing XML), and the fact that they belong in a specific .docx
(or whatever) file.  On committing such a file it is actually
unzipped and the constituents either stored or, for an update, diffed,
and then on a pull they are pulled as constituent parts and then
zipped to reconstitute the original file.

You could either consider using Mercurial or trying to find or
develop a similar extension.


I wonder what this actually buys: you'll end up with a bunch of XML
files (and picture files, if any, and the Manifest file, and so on),
and the problem is that the XML file representing the content is as
readable as the original .docx.  As they say, XML combines the
efficiency of text files with the readability of binary files [1].
I mean, diffing machine-produced XML files, where a tiny
logical change in a document could result in hefty parts of that XML
swath being rewritten, is just marginally better than the original problem.


The question is this:  If you make a small change to the document (as a
human sees it), does this cause a small change to the XML files within
the Zip?  If the answer is Yes, then many revisions of a document can be
stored densely in a repository.  And it might be possible to merge small
differences in documents using a standard merging approach.

But the only way to know would be to talk to someone who has
considerable experience with this.

While not having personal experience, I've seen a number of reports that 
the 'expanded XML' approach to docx style documents (including 
LibreOffice I understand), which are zips of XMLs, often fails because 
the main package presumes that the internal XML files are in a 
particular order. Once the zip has been expanded, that order of file 
components is lost, so when the VCS repackages the zip, the components 
are not in the right order, and the main program can't read it properly.
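
If member order really is the culprit, one way around it would be to record
the order when unpacking and reuse it when repacking. The following is only
a minimal sketch using Python's standard zipfile module; the function names
are made up for illustration, and whether a given office suite is satisfied
by preserved order alone is an assumption, not something verified here.

```python
import io
import zipfile


def unpack(archive_bytes):
    """Return the members as a list of (ZipInfo, data) pairs in archive order."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as src:
        # infolist() yields members in the order they appear in the archive.
        return [(info, src.read(info)) for info in src.infolist()]


def repack(members):
    """Build a new archive, writing the members in the recorded order."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as dst:
        for info, data in members:
            # Reuse each member's name and compression method.
            dst.writestr(info.filename, data, compress_type=info.compress_type)
    return out.getvalue()
```

A tool built this way would have to store the recorded order somewhere next
to the extracted parts, for example in a small manifest file of its own.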


The key to all this (doing version differencing) is to locate a method
[program] which can be fed the old and new versions, and which presents
the diff to you in a meaningful fashion. Often 'Word' style documents
don't have a good way that is both meaningful and compact at the same
time (a human factors problem, not a coding problem ;-) ).


If the OP's originating program has a 'compare documents' mode then a 
small bit of coding should allow Git to feed the old version and new 
version to it, as long as it has an external API (rather than it all 
being via GUI/menu selection).
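
As a rough idea of what that "small bit of coding" might look like: Git can
hand both versions to an external program named in the GIT_EXTERNAL_DIFF
environment variable (or in a per-driver diff.<driver>.command setting),
calling it with seven arguments. The sketch below is a wrapper that picks
out the two temporary files and passes them on. The tool name
"compare-docs" is purely hypothetical; it stands in for whatever
command-line entry point the word processor offers, if any.

```python
import subprocess
import sys

# Hypothetical command-line front end of the word processor's compare mode.
COMPARE_TOOL = "compare-docs"


def build_command(argv):
    """Map Git's external-diff arguments onto the compare tool's command line."""
    # Git passes: path old-file old-hex old-mode new-file new-hex new-mode
    if len(argv) != 7:
        raise ValueError("expected the 7 arguments Git passes to an external diff")
    _path, old_file, _old_hex, _old_mode, new_file, _new_hex, _new_mode = argv
    return [COMPARE_TOOL, old_file, new_file]


if __name__ == "__main__" and len(sys.argv) == 8:
    sys.exit(subprocess.call(build_command(sys.argv[1:])))
```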


--

Philip 




Re: [git-users] Use git on microsoft words documents

2015-05-15 Thread Rainer M Krug
Philip Oakley philipoak...@iee.org writes:

 From: Dale R. Worley wor...@alum.mit.edu
 From the original poster's point of view:  Yes, you can use Git to
 store
 various versions of MS Word documents, but you probably don't get much
 benefit from doing so, since Git can't see into the different versions
 of documents to see how they differ; to Git they're just blobs.  OTOH,
 it may be that collections of blobs is all that you need the storage
 system to provide.

 Konstantin Khomoutov flatw...@users.sourceforge.net writes:
 Steve (Gadget) Barnes gadgetst...@hotmail.com wrote:
 At the risk of getting flamed for mentioning a different dVCS, the
 Mercurial, (hg), project has a very sneaky extension called zipdoc
 that stores the content of the zip files, (docx are actually zips
 containing XML), and the fact that they belong in a specific .docx,
 (or whatever), file.  On committing such a file it is actually
 unzipped and the constituents either stored, or for an update,
 diffed
 and then on a pull they are pulled as constituent parts and then
 zipped to reconstitute the original file.

 You could either consider using Mercurial or trying to find or
 develop a similar extension.

 I wonder what this actually buys: you'll end up with a bunch of XML
 files (and picture files, if any, and the Manifest file, and so on),
 and the problem is that that XML file representing the content is
 as
 readable as the original .docx.  As they say, XML combines the
 efficiency of text files with the readability of binary files [1].
 I mean, diffing machine-produced XML files, where a tiny
 logical change in a document could result in hefty parts of that XML
 swath being rewritten, is just marginally better than the original problem.

 The question is this:  If you make a small change to the document
 (as a
 human sees it), does this cause a small change to the XML files within
 the Zip?  If the answer is Yes, then many revisions of a document
 can be
 stored densely in a repository.  And it might be possible to merge
 small
 differences in documents using a standard merging approach.

 But the only way to know would be to talk to someone who has
 considerable experience with this.

 While not having personal experience, I've seen a number of reports
 that the 'expanded XML' approach to docx style documents (including
 LibreOffice I understand), which are zips of XMLs, often fails because
 the main package presumes that the internal XML files are in a
 particular order. Once the zip has been expanded, that order of file
 components is lost, so when the VCS repackages the zip, the components
 are not in the right order, and the main program can't read it
 properly.

 The key to all this (doing version differencing) is to locate a method
 [program] which can be fed the old and new versions, and have the diff
 presented to you in a meaningful fashion. Often 'Word' style documents
 don't have a good way that is both meaningful and compact at the same
 time (a human factors problem, not a coding problem ;-) ).

 If the OP's originating program has a 'compare documents' mode then a
 small bit of coding should allow Git to feed the old version and new
 version to it, as long as it has an external API (rather than it all
 being via Gui/menu selection).

I might be completely off track here (and probably dreaming), but
can't you define diff tools depending on file type? You could then use
MS Word to compare the old and the new version. I seem to remember
that I set this up some years ago.
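
For what it's worth, Git does let you choose a diff driver per path pattern
through .gitattributes. One variant that needs no Word installation is a
"textconv" filter: a line such as "*.docx diff=docx" in .gitattributes
plus "git config diff.docx.textconv 'python docx2txt.py'" makes Git run
the script on each version and diff the text it prints. The driver name
"docx" and the script name docx2txt.py are assumptions chosen for this
sketch. The script below is a minimal example that only extracts paragraph
text from word/document.xml and ignores formatting, headers, footnotes and
the like.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the paragraph text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(W + "p"):
        # A paragraph's visible text is spread over its w:t elements.
        lines.append("".join(t.text or "" for t in paragraph.iter(W + "t")))
    return "\n".join(lines)


if __name__ == "__main__":
    # Git's textconv mechanism passes the file to convert as the only argument.
    if len(sys.argv) == 2:
        print(docx_to_text(sys.argv[1]))
```

This only changes what "git diff" shows; the .docx itself is still stored
as a blob and remains unmergeable.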

Rainer



 --

 Philip 

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, 
UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug

PGP: 0x0F52F982



Re: [git-users] Use git on microsoft words documents

2015-05-15 Thread John McKown
On Thu, May 14, 2015 at 10:15 PM, Dale R. Worley wor...@alum.mit.edu
wrote:

 From the original poster's point of view:  Yes, you can use Git to store
 various versions of MS Word documents, but you probably don't get much
 benefit from doing so, since Git can't see into the different versions
 of documents to see how they differ; to Git they're just blobs.  OTOH,
 it may be that collections of blobs is all that you need the storage
 system to provide.


Personally, this is (yet another) reason not to use a word processor
such as MS Word or Libre/OpenOffice, and to use a structured document
processor such as LyX instead. I am not an expert in using it. It is not
fully WYSIWYG like the usual word processors, though it is somewhat. Of
course, IMO WYSIWYG is for people who like the sizzle more than the steak,
i.e. the content doesn't matter as long as it looks impressive! Anyway, off
that soap box: the content of a file maintained by LyX is textual markup
(more or less), and so it is more amenable to tracking changes with
Git.


-- 
If someone tell you that nothing is impossible:
Ask him to dribble a football.

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! 
John McKown



Re: [git-users] Use git on microsoft words documents

2015-05-14 Thread Massoud Yeganeh
The original document will be reviewed and edited by a few people.
Later it will be branched into different variations.
Also, the root document will be translated by different people into their
own languages.
The root document, the translated documents and the branched documents will
be updated (root changes or better translations).

How to manage this? 

On Thursday, May 14, 2015 at 6:12:32 PM UTC+8, Konstantin Khomoutov wrote:

 On Wed, 13 May 2015 21:29:18 -0700 (PDT) 
 Massoud Yeganeh massoud...@gmail.com wrote:

  Is it possible to use git to manage microsoft documents? 
  We have so many files and we need to manage the version, change, 
  languages, etc management. 

 What do you mean by managing languages? 

  Can we use git? 

 You can, but note that MSO documents are essentially binary, and it's 
 impossible to diff them using only built-in Git facilities. 

 You might not need diffing at all (say, you're fine with just recording 
 versions and put some information about them into commit messages, and 
 are not interested in physical changes done to documents between 
 revisions), and then you should be fine using Git as is. 

 If you still need diffing, I think you might be better off with 
 Subversion as there are tools available around it to help dealing with 
 MSO-produced documents: 
 * The diff viewer program shipped with TortoiseSVN -- the go-to solution
   for working with Subversion on Windows -- has limited support for
   diffing MSO documents.
 * There is [1] which integrates support for Subversion right into
   MSO editors and claims to support diffing as well.

 I'd note that both products seem to rely on COM components made
 available by an installed MSO suite, so you'll need to have it installed
 on machines which would need that diffing functionality.

 Otherwise Git (like any other VCS) will just be used as a tool
 to manage opaque changes to opaque blobs -- which may be just what you
 need, but probably isn't.

 1. https://code.google.com/p/msofficesvn/ 




Re: [git-users] Use git on microsoft words documents

2015-05-14 Thread Konstantin Khomoutov
On Thu, 14 May 2015 05:44:05 -0700 (PDT)
Massoud Yeganeh massoud.yega...@gmail.com wrote:

 The original document will be reviewed and edited by a few people.
 Later it will be branched into different variations.
 Also, the root document will be translated by different people into their
 own languages.
 The root document, the translated documents and the branched documents
 will be updated (root changes or better translations).
 
 How to manage this? 

Mark, Massoud, I've started to question whether you're actually on the
right track to finding a solution to your problem.  In my eyes, the
problem with your approach is that you might not need a VCS in the
first place, or at least not a tool as sharp as Git.  Please don't be
too swayed by the fact that Git is currently hyped and is the
de-facto VCS most new software projects pick (to the point that some
people asking for VCS-related help on non-VCS support forums do not
mention which VCS they are talking about, as they imply Git).  Git is
wonderful, but it's tailored to a specific task: managing the source code
of a software project by people with an advanced skill set and
correspondingly high demands on their tools.  To me, it seems that
your use case doesn't fall into this category (yes, I know that
lots of inexperienced folks use Git, but the question is whether they
should use it in the first place).

So, I'd like to ask you both: did you try to explore whether one of the
so-called document management systems (DMS) is actually a suitable
fit for your use case?  For instance, the Alfresco project is a mature
and free DMS.  A DMS allows you to inject documents, set up their
workflow (approval, submission to other persons etc), manage their
versions, receive notifications about edits etc.  And all this using
a simple (typically web-based) interface.

Honestly, after reading your questions, I can picture someone in your
enterprise pulling from a shared Git repository, getting a merge conflict
and... I'm just not sure that will play well, especially given the blob
nature of those MSO documents (IOW, they are unmergeable in the normal
sense).  Do you really want to learn about remote vs local branches in
Git?  Suitable merge strategies to deal with blobs?  I'm not so sure.

Hence I'd suggest first looking at a DMS and, if all else fails, looking
at a centralized VCS (Subversion is the typical go-to solution) or at least
a VCS which mimics a centralized workflow as much as possible --
with Fossil being a good fit.



Re: [git-users] Use git on microsoft words documents

2015-05-14 Thread Konstantin Khomoutov
On Thu, 14 May 2015 18:39:41 +0100
Steve (Gadget) Barnes gadgetst...@hotmail.com wrote:

  Is it possible to use git to manage microsoft documents?
  We have so many files and we need to manage the version, change, 
  languages, etc management.
 
  Can we use git?
[...]
 At the risk of getting flamed for mentioning a different dVCS, the
 Mercurial, (hg), project has a very sneaky extension called zipdoc
 that stores the content of the zip files, (docx are actually zips
 containing XML), and the fact that they belong in a specific .docx,
 (or whatever), file.  On committing such a file it is actually
 unzipped and the constituents either stored, or for an update, diffed
 and then on a pull they are pulled as constituent parts and then
 zipped to reconstitute the original file.
 
 You could either consider using Mercurial or trying to find or
 develop a similar extension.

I wonder what this actually buys: you'll end up with a bunch of XML
files (and picture files, if any, and the Manifest file, and so on),
and the problem is that the XML file representing the content is as
readable as the original .docx.  As they say, “XML combines the
efficiency of text files with the readability of binary files” [1].
I mean, diffing machine-produced XML files, where a tiny
logical change in a document could result in hefty parts of that XML
swath being rewritten, is just marginally better than the original problem.

To put it differently, IMO the only way to properly diff MSO documents
is to use tools relying on MSO libs to actually extract content
sensible to humans from these containers, and somehow use it for
diffing.  I don't know how TortoiseSVN et al manage to use MSO-shipped
COM objects to carry out this task, but they do.

On the other hand, good tools for diffing XML *should* exist given the
ubiquity of this crap in the enterprise sector.  I don't know of any,
but it is worth googling, or someone might chime in with a solution. ;-)
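
Short of a dedicated XML diff tool, a crude approximation is to re-indent
both versions so that every element sits on its own line and then run an
ordinary line diff. The sketch below uses only Python's standard library;
the function names are invented for illustration. It makes small edits show
up as small hunks, but it does nothing about an editor that rewrites large
parts of the XML on every save.

```python
import difflib
import xml.dom.minidom


def pretty_lines(xml_text):
    """Re-indent XML so that each element starts on its own line."""
    pretty = xml.dom.minidom.parseString(xml_text).toprettyxml(indent="  ")
    return [line for line in pretty.splitlines() if line.strip()]


def xml_diff(old_xml, new_xml):
    """Return a unified diff of two XML strings after re-indenting both."""
    return list(difflib.unified_diff(
        pretty_lines(old_xml), pretty_lines(new_xml),
        fromfile="old", tofile="new", lineterm=""))
```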

In either case, I'm afraid both people who asked questions in this
thread are looking for a document management system, not a VCS.
And I'm afraid, setting up diff tools in Git wouldn't be an easily
solvable task for them (please take no offence, guys!).

1. http://harmful.cat-v.org/software/xml/



Re: [git-users] Use git on microsoft words documents

2015-05-14 Thread Steve (Gadget) Barnes



On 14/05/2015 05:29, Massoud Yeganeh wrote:

Hi,

Is it possible to use git to manage microsoft documents?
We have so many files and we need to manage the version, change, 
languages, etc management.


Can we use git?

Thanks.


At the risk of getting flamed for mentioning a different dVCS, the
Mercurial (hg) project has a very sneaky extension called zipdoc that
stores the content of the zip files (docx are actually zips containing
XML), and the fact that they belong in a specific .docx (or whatever)
file.  On committing such a file it is actually unzipped and the
constituents either stored or, for an update, diffed, and then on a pull
they are pulled as constituent parts and then zipped to reconstitute the
original file.


You could either consider using Mercurial or trying to find or develop a 
similar extension.


Steve (Gadget) Barnes



Re: [git-users] Use git on microsoft words documents

2015-05-14 Thread Dale R. Worley
From the original poster's point of view:  Yes, you can use Git to store
various versions of MS Word documents, but you probably don't get much
benefit from doing so, since Git can't see into the different versions
of documents to see how they differ; to Git they're just blobs.  OTOH,
it may be that collections of blobs is all that you need the storage
system to provide.

Konstantin Khomoutov flatw...@users.sourceforge.net writes:
 Steve (Gadget) Barnes gadgetst...@hotmail.com wrote:
 At the risk of getting flamed for mentioning a different dVCS, the
 Mercurial, (hg), project has a very sneaky extension called zipdoc
 that stores the content of the zip files, (docx are actually zips
 containing XML), and the fact that they belong in a specific .docx,
 (or whatever), file.  On committing such a file it is actually
 unzipped and the constituents either stored, or for an update, diffed
 and then on a pull they are pulled as constituent parts and then
 zipped to reconstitute the original file.
 
 You could either consider using Mercurial or trying to find or
 develop a similar extension.

 I wonder what this actually buys: you'll end up with a bunch of XML
 files (and picture files, if any, and the Manifest file, and so on),
 and the problem is that that XML file representing the content is as
 readable as the original .docx.  As they say, XML combines the
 efficiency of text files with the readability of binary files [1].
 I mean, diffing machine-produced XML files, where a tiny
 logical change in a document could result in hefty parts of that XML
 swath being rewritten, is just marginally better than the original problem.

The question is this:  If you make a small change to the document (as a
human sees it), does this cause a small change to the XML files within
the Zip?  If the answer is Yes, then many revisions of a document can be
stored densely in a repository.  And it might be possible to merge small
differences in documents using a standard merging approach.

But the only way to know would be to talk to someone who has
considerable experience with this.
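
Absent such experience, the question can at least be measured on one's own
documents: save a document, make a small edit, save again, and compare the
word/document.xml parts extracted from both versions. The sketch below
estimates how much of the XML changed; it splits the XML into tags and text
runs rather than lines, on the assumption that machine-written XML is often
one very long line. The function names are made up for this example.

```python
import difflib
import re


def tokenize(xml_text):
    """Split XML into a flat list of tags and text runs."""
    return re.findall(r"<[^>]*>|[^<]+", xml_text)


def changed_fraction(old_xml, new_xml):
    """Return the share of tokens that differ between two versions (0.0 to 1.0)."""
    matcher = difflib.SequenceMatcher(
        None, tokenize(old_xml), tokenize(new_xml), autojunk=False)
    return 1.0 - matcher.ratio()
```

A value close to 0.0 after a one-word edit would suggest that the answer is
Yes for that editor; a large value would suggest the opposite.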

Dale



Re: [git-users] Use git on microsoft words documents

2015-05-14 Thread Konstantin Khomoutov
On Wed, 13 May 2015 21:29:18 -0700 (PDT)
Massoud Yeganeh massoud.yega...@gmail.com wrote:

 Is it possible to use git to manage microsoft documents?
 We have so many files and we need to manage the version, change,
 languages, etc management.

What do you mean by managing languages?

 Can we use git?

You can, but note that MSO documents are essentially binary, and it's
impossible to diff them using only built-in Git facilities.

You might not need diffing at all (say, you're fine with just recording
versions and put some information about them into commit messages, and
are not interested in physical changes done to documents between
revisions), and then you should be fine using Git as is.

If you still need diffing, I think you might be better off with
Subversion as there are tools available around it to help dealing with
MSO-produced documents:
* The diff viewer program shipped with TortoiseSVN -- the go-to solution
  for working with Subversion on Windows -- has limited support for
  diffing MSO documents.
* There is [1] which integrates support for Subversion right into
  MSO editors and claims to support diffing as well.

I'd note that both products seem to rely on COM components made
available by an installed MSO suite, so you'll need to have it installed
on machines which would need that diffing functionality.

Otherwise Git (like any other VCS) will just be used as a tool
to manage opaque changes to opaque blobs -- which may be just what you
need, but probably isn't.

1. https://code.google.com/p/msofficesvn/
