In Node, I would probably use... cheerio to do quick one-off structural
adjustments, and maybe a sax parser like sax-js to do streaming
adjustments of tag names and stuff, then pipe it all through a
(hopefully fixed) formatting thing like the wrap-xml script to do the
line-wrapping and pretty printing. When we're ready for that, I hope I
can help out! It will be difficult to do it in one pass until all the
files are in one place, though.
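
For the tag-renaming step, something along these lines might be a
starting point. It is only a rough sketch: it uses a plain regex rather
than sax-js so that it runs with no dependencies, and the `renameTags`
helper and the names in the `RENAMES` table are made up for
illustration, not the real SPDX names.

```javascript
'use strict';

// Hypothetical old-name -> new-name table. These are placeholders,
// not the actual SPDX element names.
const RENAMES = new Map([
  ['oldTag', 'newTag'],
  ['oldItem', 'newItem'],
]);

// Rename element names in opening, closing, and self-closing tags.
// This is a regex stand-in for a real SAX pass: it does not understand
// comments or CDATA, so anything tag-like inside those is rewritten too.
function renameTags(xml, renames) {
  return xml.replace(/<(\/?)([A-Za-z_][\w.:-]*)/g, (match, slash, name) => {
    const next = renames.get(name);
    // Leave names that are not in the table exactly as they were.
    return next === undefined ? match : `<${slash}${next}`;
  });
}

// Attributes and text content are left untouched; only names change.
console.log(renameTags('<oldTag id="a"><oldItem/>text</oldTag>', RENAMES));
```

A real pass would want a proper parser such as sax-js, since the regex
above cannot tell markup from tag-like text inside comments or CDATA.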

Kris

On Tue, Feb 7, 2017, at 10:58, [email protected] wrote:
> Thanks Kris for the update and pointers to the code.
> 
> I'll give it a try. It looks like there is a good amount of detail, and
> with access to the source I don't expect any problems. I'm also warming
> up to Node as a decent infrastructure for the tooling.
> 
> Do you have any tools or ideas for transforming the current XML
> attributes and property names to the new names? If not, I can write
> something to do the conversion (although it will be in Java ;)
> 
> Gary
> 
> 
> > -----Original Message-----
> > From: [email protected] [mailto:spdx-tech-
> > [email protected]] On Behalf Of J Lovejoy
> > Sent: Monday, February 6, 2017 11:09 PM
> > To: Kris Reeves
> > Cc: [email protected]; SPDX-legal
> > Subject: Re: Update
> > 
> > Thanks Kris!
> > 
> > The legal team will get cracking on the exceptions and new licenses - good
> > to know everything left to review is all in there.
> > 
> > I’ve copied the tech team, as I’m hoping someone more code-savvy than I can
> > parse the instructions on the tool below and then see how we can use that
> > going forward.
> > 
> > We’ll miss you in Tahoe this year, in any case!
> > 
> > Jilayne
> > 
> > SPDX Legal Team co-lead
> > [email protected]
> > 
> > 
> > > On Feb 5, 2017, at 3:13 PM, Kris Reeves <[email protected]> wrote:
> > >
> > > Hi, folks!
> > >
> > > I've got some updates for you, though I imagine those of you
> > > subscribed to notifications on the spdx/license-list-XML repo have
> > > probably got a bunch of notifications, for which I apologize!
> > >
> > > First up, I've sent PRs for all the exceptions and the new licenses.
> > > Some of these may still have the kinds of problems we had before, but
> > > I hope not too many. Perfectionism has been getting in the way of me
> > > getting things done, so I figure something is better than nothing here.
> > >
> > > Next, the conversion tool I've been using, which has been updated to
> > > deal with exceptions from the XLS:
> > > https://github.com/myndzi/license-tool
> > >
> > > I'm sure if I did the wrong thing license wise with that repo, someone
> > > will tell me ;)
> > >
> > > A number of notes are required for explaining how to use this, which
> > > I'll enumerate here:
> > >
> > > Installing:
> > > - You need a recent-ish version of Node (at least one that supports
> > > arrow functions), which I believe is >=4. Various package managers
> > > include Node, but it's generally considered best by the Node community
> > > to install the latest package from the website here:
> > > https://nodejs.org/en/download/current/ (I typically build from source).
> > > For convenience, you might take advantage of `n`:
> > > > https://github.com/mklement0/n-install (I recommend auditing any shell
> > > > scripts rather than just blindly running them!) -- `n` can be installed
> > > this way without having Node, and then you can simply execute `n
> > > latest` to get the latest build.
> > > - Clone the repository
> > > - From inside the cloned folder, `npm install`
> > >
> > > Using:
> > > `node convert` or `node convert exceptions` in the project directory
> > >
> > > Since this tool was written to batch-process a bunch of files, I never
> > > really gave it a one-off mode. It looks for an SPDX spreadsheet in
> > > ./license-list and attempts to run the process for every license (or
> > > exception) it finds that *does not exist* in ./src/licenses or
> > > ./src/exceptions
> > >
> > > There is a branch (`git checkout current`) on the license-tool
> > > repository that has all the XML files I have previously converted
> > > checked in, so for future batches one should be able to update the
> > > license-list subrepo and pull the new files, then run the batch
> > > converter (`node convert`)
> > >
> > > (For this latest batch, I copied the XML files from my previous work
> > > into ./src/licenses and ran the script; then, I checked out master,
> > > which left the un-added files dangling, copied them to my
> > > license-list-XML fork, and ran a little bash script to check each one
> > > into its own branch individually and push it up to github. I created
> > > the PRs manually this time.)
> > >
> > > The "user interface":
> > > The conversion tool presents you with a UI for each file. You are able
> > > to mark sections of the text in one of four modes, and optionally
> > > toggle the "review" flag.
> > >
> > > Keys:
> > > 1 - title mode
> > > 2 - copyright mode
> > > 3 - license mode
> > > 4 - optional mode
> > > "`", "~" - toggle 'review'
> > > esc, q - abort/quit
> > > > enter, tab - write file, proceed to next
> > > > up, down - extend/reduce current block by one chunk
> > > > page up, page down - extend/reduce current block by one page
> > >
> > > You *must* have marked *all* the license body before continuing,
> > > otherwise the program will just crash when you hit enter (low priority
> > > bug for me). I usually hit pgdn a few times at the end to make sure of
> > > this and gobble up any blank trailing lines.
> > >
> > > If it crashes, don't worry -- it'll pick up where it left off when you
> > > run it again.
> > >
> > > You'll notice that SPDX markup and bullet points are highlighted in
> > > the license body when using the conversion tool; you can't change
> > > this, it's only there to display to you what it has identified and
> > > will perform special actions on.
> > >
> > > There is one other utility included in here, `wrap-xml`; this can be
> > > used to reformat an XML file by wrapping it to a given width.
> > > Recommended for heavily-edited XML files to keep them nice. It will
> > > rewrite the indentation and so on. It is, I think, the culprit of the
> > > over-escaped problem in some of the existing licenses (all those
> > > unnecessary &quot; entities and stuff). I'll reserve fixing this for
> > > future work (or anyone who wants to send a PR!). With some changes, it
> > > should be usable to fix all those instances in batch. This script in
> > > particular operates on stdin and stdout, so to reformat an xml file
> > > you would do something like `cat file.xml | node wrap-xml >
> > > file-new.xml`
> > >
> > > > One last caveat: the list-detection (and part of the reason why it's
> > > > broken) is based on the assumption that the input text files have been
> > > > formatted in a very specific way (it counts spaces). This will probably
> > > > need to be adjusted before it's suitable for use on arbitrary input text
> > > > files (new license candidates).
> > >
> > > Sorry for dragging my feet for so long, and I hope this gets us caught
> > > up!
> > >
> > > Kris
> > > _______________________________________________
> > > Spdx-legal mailing list
> > > [email protected]
> > > https://lists.spdx.org/mailman/listinfo/spdx-legal
> > 
> > _______________________________________________
> > Spdx-tech mailing list
> > [email protected]
> > https://lists.spdx.org/mailman/listinfo/spdx-tech
> 
