(Also, I'm sorry for the mess that is this code... it started off as a one-off hack of its own, so I didn't do much to make it maintainable or clean)
On Tue, Feb 7, 2017, at 18:34, Kris Reeves wrote:
> In Node, I would probably use... cheerio to do quick one-off structural
> adjustments, and maybe a sax parser like sax-js to do streaming
> adjustments of tag names and stuff, then pipe it all through a
> (hopefully fixed) formatting thing like the wrap-xml script to do the
> line-wrapping and pretty printing. When we're ready for that, I hope I
> can help out! It will be difficult until all the files are in one place
> to do it in one pass, though.
>
> Kris
>
> On Tue, Feb 7, 2017, at 10:58, [email protected] wrote:
> > Thanks Kris for the update and pointers to the code.
> >
> > I'll give it a try - but it looks like there is a good amount of detail
> > with access to the source, I don't expect any problems. I'm also warming
> > up to Node as a decent infrastructure for the tooling.
> >
> > Do you have any tools or ideas for transforming the current XML
> > attributes and property names to the new names? If not, I can write
> > something to do the conversion (although it will be in Java ;)
> >
> > Gary
> >
> >
> > > -----Original Message-----
> > > From: [email protected] [mailto:spdx-tech-
> > > [email protected]] On Behalf Of J Lovejoy
> > > Sent: Monday, February 6, 2017 11:09 PM
> > > To: Kris Reeves
> > > Cc: [email protected]; SPDX-legal
> > > Subject: Re: Update
> > >
> > > Thanks Kris!
> > >
> > > The legal team will get cracking on the exceptions and new licenses -
> > > good to know everything left to review is all in there.
> > >
> > > I’ve copied the tech team, as I’m hoping someone more code-savvy than
> > > I can parse the instructions on the tool below and then see how we can
> > > use that going forward.
> > >
> > > We’ll miss you in Tahoe this year, in any case!
> > >
> > > Jilayne
> > >
> > > SPDX Legal Team co-lead
> > > [email protected]
> > >
> > >
> > > > On Feb 5, 2017, at 3:13 PM, Kris Reeves <[email protected]> wrote:
> > > >
> > > > Hi, folks!
> > > >
> > > > I've got some updates for you, though I imagine those of you
> > > > subscribed to notifications on the spdx/license-list-XML repo have
> > > > probably got a bunch of notifications, for which I apologize!
> > > >
> > > > First up, I've sent PRs for all the exceptions and the new licenses.
> > > > Some of these may still have the kinds of problems we had before, but
> > > > I hope not too many. Perfectionism has been getting in the way of me
> > > > getting things done, so I figure something is better than nothing here.
> > > >
> > > > Next, the conversion tool I've been using, which has been updated to
> > > > deal with exceptions from the XLS:
> > > > https://github.com/myndzi/license-tool
> > > >
> > > > I'm sure if I did the wrong thing license wise with that repo, someone
> > > > will tell me ;)
> > > >
> > > > A number of notes are required for explaining how to use this, which
> > > > I'll enumerate here:
> > > >
> > > > Installing:
> > > > - You need a recent-ish version of Node (at least one that supports
> > > >   arrow functions), which I believe is >=4. Various package managers
> > > >   include Node, but it's generally considered best by the Node community
> > > >   to install the latest package from the website here:
> > > >   https://nodejs.org/en/download/current/ (I typically build from source).
> > > >   For convenience, you might take advantage of `n`:
> > > >   https://github.com/mklement0/n-install (I recommend auditing any shell
> > > >   scripts rather than just blindly run them!) -- `n` can be installed
> > > >   this way without having Node, and then you can simply execute `n
> > > >   latest` to get the latest build.
> > > > - Clone the repository
> > > > - From inside the cloned folder, `npm install`
> > > >
> > > > Using:
> > > > `node convert` or `node convert exceptions` in the project directory
> > > >
> > > > Since this tool was written to batch-process a bunch of files, I never
> > > > really gave it a one-off mode.
> > > > It looks for an SPDX spreadsheet in
> > > > ./license-list and attempts to run the process for every license (or
> > > > exception) it finds that *does not exist* in ./src/licenses or
> > > > ./src/exceptions
> > > >
> > > > There is a branch (`git checkout current`) on the license-tool
> > > > repository that has all the XML files I have previously converted
> > > > checked in, so for future batches one should be able to update the
> > > > license-list subrepo and pull the new files, then run the batch
> > > > converter (`node convert`)
> > > >
> > > > (For this latest batch, I copied the XML files from my previous work
> > > > into ./src/licenses and ran the script; then, I checked out master,
> > > > which left the un-added files dangling, copied them to my
> > > > license-list-XML fork, and ran a little bash script to check each one
> > > > into its own branch individually and push it up to github. I created
> > > > the PRs manually this time.)
> > > >
> > > > The "user interface":
> > > > The conversion tool presents you with a UI for each file. You are able
> > > > to mark sections of the text in one of four modes, and optionally
> > > > toggle the "review" flag.
> > > >
> > > > Keys:
> > > > 1 - title mode
> > > > 2 - copyright mode
> > > > 3 - license mode
> > > > 4 - optional mode
> > > > "`", "~" - toggle 'review'
> > > > esc, q - abort/quit
> > > > enter, tab - write file, proceed to next
> > > > up, down - extend/reduce current block by one chunk
> > > > page up, page down - extend/reduce current block by one page
> > > >
> > > > You *must* have marked *all* the license body before continuing,
> > > > otherwise the program will just crash when you hit enter (low priority
> > > > bug for me). I usually hit pgdn a few times at the end to make sure of
> > > > this and gobble up any blank trailing lines.
> > > >
> > > > If it crashes, don't worry -- it'll pick up where it left off when you
> > > > run it again.
> > > >
> > > > You'll notice that SPDX markup and bullet points are highlighted in
> > > > the license body when using the conversion tool; you can't change
> > > > this, it's only there to display to you what it has identified and
> > > > will perform special actions on.
> > > >
> > > > There is one other utility included in here, `wrap-xml`; this can be
> > > > used to reformat an XML file by wrapping it to a given width.
> > > > Recommended for heavily-edited XML files to keep them nice. It will
> > > > rewrite the indentation and so on. It is, I think, the culprit of the
> > > > over-escaped problem in some of the existing licenses (all those
> > > > unnecessary &quot; entities and stuff). I'll reserve fixing this for
> > > > future work (or anyone who wants to send a PR!). With some changes, it
> > > > should be usable to fix all those instances in batch. This script in
> > > > particular operates on stdin and stdout, so to reformat an xml file
> > > > you would do something like
> > > > `cat file.xml | node wrap-xml > file-new.xml`
> > > >
> > > > One last caveat: the list-detection (and part of the reason why it's
> > > > broken) is based on the assertion that the input text files have been
> > > > formatted in a very specific way (it counts spaces). This will probably
> > > > need to be adjusted before it's suitable for use on arbitrary input
> > > > text files (new license candidates)
> > > >
> > > > Sorry for dragging my feet for so long, and I hope this gets us caught
> > > > up!
> > > >
> > > > Kris
> > > > _______________________________________________
> > > > Spdx-legal mailing list
> > > > [email protected]
> > > > https://lists.spdx.org/mailman/listinfo/spdx-legal
> > >
> > > _______________________________________________
> > > Spdx-tech mailing list
> > > [email protected]
> > > https://lists.spdx.org/mailman/listinfo/spdx-tech
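
The "only process what does not exist yet" behaviour described in the thread is what lets the converter pick up where it left off after a crash. A rough sketch of that check follows. It is not code from license-tool: the function name `pendingIdentifiers` and the `<identifier>.xml` file naming are assumptions made purely for illustration.

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of license-tool): given the identifiers read
// from the spreadsheet and an output directory such as ./src/licenses, return
// the identifiers that have no converted XML file yet.
// Assumption: each converted file is named `<identifier>.xml`.
function pendingIdentifiers(identifiers, outputDir) {
  const done = new Set(
    fs.existsSync(outputDir)
      ? fs.readdirSync(outputDir)
          .filter((name) => path.extname(name) === '.xml')
          .map((name) => path.basename(name, '.xml'))
      : []
  );
  // Keep spreadsheet order so the batch is processed predictably.
  return identifiers.filter((id) => !done.has(id));
}
```

Because the check is made against the files on disk rather than against a separate progress log, re-running the batch after an abort only revisits entries that were never written.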

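For the over-escaping mentioned in the thread (quote entities in license text that do not need to be escaped), a batch clean-up might look something like the sketch below. This is an assumption about one possible approach, not the actual wrap-xml fix: `unescapeTextQuotes` is a hypothetical name, and the tag matching is deliberately naive.

```javascript
'use strict';

// Quote entities never need escaping in element text; they only matter inside
// attribute values delimited by the same quote character.
const TEXT_SAFE = { '&quot;': '"', '&apos;': "'" };

// Hypothetical helper: split the document into markup and text, and unescape
// only the text parts so attribute values are left untouched.
// Assumption: tags are matched with a simple regex, so comments, CDATA
// sections and attribute values containing a literal ">" are NOT handled.
function unescapeTextQuotes(xml) {
  return xml
    .split(/(<[^>]*>)/) // the capture group keeps the tags, at odd indexes
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(/&(?:quot|apos);/g, (e) => TEXT_SAFE[e])
    )
    .join('');
}
```

A function like this could be wired to stdin and stdout in the same way as wrap-xml. A real fix would more likely sit on top of a proper parser (the sax-js idea from the reply above) than on a regex.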