Re: Help Wrapping HTMLTidy in LCB
On Mon, Dec 9, 2019 at 12:28 PM Trevor DeVore wrote:

> UPDATE:
>
> I made some progress on the HTMLTidy project and this morning Mark
> Waddingham and Brian Milby helped me over the last hurdle. The code base
> now has a tidyHTMLToXHTML() function which works on macOS. You can try it
> out using the test stack included in the repo. The code may also be of
> interest to those trying to wrap other libraries.
>
> https://github.com/trevordevore/lc-htmltidy
>
> I will be adding the Windows DLL so that the extension works on Windows
> and then trying to create a sensible API around HTMLTidy for my current
> needs. I don't plan on making it feature complete at the moment as I just
> need it for my own work. If someone else wanted to take that up they are
> welcome to.

I've added Windows support and the ability to pass boolean options so that you can control the behavior of htmltidy when it cleans up the HTML input. I've added some sample settings to the test stack for testing.

If anybody would like to add Linux support, it should just be a matter of following the build instructions provided by htmltidy and adding the resulting library to an `x86-linux` folder in the code folder:

https://github.com/trevordevore/lc-htmltidy/tree/master/code

You can find a link to the htmltidy build instructions in the project README:

https://github.com/trevordevore/lc-htmltidy

--
Trevor DeVore
ScreenSteps
www.screensteps.com
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Help Wrapping HTMLTidy in LCB
On Fri, Nov 22, 2019 at 10:30 AM Trevor DeVore wrote:

> While looking at solutions for converting HTML into XHTML that can be
> parsed by revXML I decided to test HTMLTidy which has an option to output
> the input as XHTML. While I could bundle up the tidy command line tool and
> include it with my app, I prefer to wrap things up in LCB if possible.
>
> Unfortunately I haven't gotten very far with HTMLTidy and I'm
> hoping someone else might be able to figure out what I'm doing wrong. If
> you are up for loading up an LCB project in LC 9 on macOS and looking at
> some C files then please read on:

UPDATE:

I made some progress on the HTMLTidy project and this morning Mark Waddingham and Brian Milby helped me over the last hurdle. The code base now has a tidyHTMLToXHTML() function which works on macOS. You can try it out using the test stack included in the repo. The code may also be of interest to those trying to wrap other libraries.

https://github.com/trevordevore/lc-htmltidy

I will be adding the Windows DLL so that the extension works on Windows and then trying to create a sensible API around HTMLTidy for my current needs. I don't plan on making it feature complete at the moment as I just need it for my own work. If someone else wants to take that up, they are welcome to.

--
Trevor DeVore
ScreenSteps
www.screensteps.com
Re: Help Wrapping HTMLTidy in LCB
On Sat, Nov 23, 2019 at 10:52 AM hh via use-livecode <use-livecode@lists.runrev.com> wrote:

> Is it really worth the work to do that from LCB?

In my opinion, yes. If for no other reason than that with each library that is wrapped in LCB I learn what the limitations are in LCB or I learn how to do something that I didn't know how to do before.

There is a lot of code out in the world that we could benefit from in LiveCode. Not all of it has as nice a command line tool as HTMLTidy. Some libraries don't have a command line tool at all. Having lots and lots of examples of wrapping C, Objective-C, etc. will help more people wrap libraries and contribute to the community in the future.

--
Trevor DeVore
ScreenSteps
Re: Help Wrapping HTMLTidy in LCB
Is it really worth the work to do that from LCB?

A while ago I installed HTML Tidy 5.6.0 from http://binaries.html-tidy.org (the Mac .dmg).

Then I copied the binary "tidy" from /usr/local/bin, compressed, into my stack (= 231 KByte). Now I use it from there, running it in the temporary folder via shell().

Works fine (on any 64-bit Mac).
Re: Help Wrapping HTMLTidy in LCB
On Fri, Nov 22, 2019 at 5:31 PM Richard Gaskin via use-livecode <use-livecode@lists.runrev.com> wrote:

> Trevor DeVore wrote:
>
> > HTML may be placed on the clipboard when copying text and images
> > from web browsers or by our good friend Microsoft Word. Microsoft
> > Word places some very "interesting" HTML on the clipboard that
> > needs to be massaged quite a bit before running it through revXML.
>
> Are you suggesting Microsoft has trouble reading open and
> well-documented standards? Why, I never! ;)

It's not them, it's me. Clearly I'm expecting too much.

--
Trevor DeVore
ScreenSteps
Re: Help Wrapping HTMLTidy in LCB
Trevor DeVore wrote:

> HTML may be placed on the clipboard when copying text and images
> from web browsers or by our good friend Microsoft Word. Microsoft
> Word places some very "interesting" HTML on the clipboard that
> needs to be massaged quite a bit before running it through revXML.

Are you suggesting Microsoft has trouble reading open and well-documented standards? Why, I never! ;)

--
Richard Gaskin
Fourth World Systems
Software Design and Development for the Desktop, Mobile, and the Web
ambassa...@fourthworld.com
http://www.FourthWorld.com
Re: Help Wrapping HTMLTidy in LCB
On Fri, Nov 22, 2019 at 2:25 PM Richard Gaskin via use-livecode <use-livecode@lists.runrev.com> wrote:

> Trevor DeVore wrote:
>
> > While looking at solutions for converting HTML into XHTML that can be
> > parsed by revXML I decided to test HTMLTidy which has an option to
> > output the input as XHTML. While I could bundle up the tidy command
> > line tool and include it with my app, I prefer to wrap things up in
> > LCB if possible.
>
> Is conversion to XHTML the way to go?
>
> I've tried using the XML external to parse even RSS files -- ostensibly
> pure XML -- only to find it choke on some of them. I've gone back to
> hand-crafted pull-parsers.

There are definitely other ways to approach the problem I'm trying to solve. In fact, in other areas of my app I extract parts of HTML without relying on revXML.

In this particular case I already have some LC code that parses HTML placed on the clipboard and converts it into a data structure used by the application. This was originally implemented using the revXML callback feature (no tree is created in memory) and that API has worked well for the conversions I need to make.

HTML may be placed on the clipboard when copying text and images from web browsers or by our good friend Microsoft Word. Microsoft Word places some very "interesting" HTML on the clipboard that needs to be massaged quite a bit before running it through revXML. There is a speed hit when running some of the regex patterns on the Word HTML that strip out markup and do things such as add quotes around attributes.

Given the code that I have in place already, I would prefer to leverage HTMLTidy rather than fix every potential "gotcha" or spend time trying to optimize the code. I'm betting that HTMLTidy can do it better and faster given how mature it is.

--
Trevor DeVore
ScreenSteps
www.screensteps.com
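For readers wondering what the attribute-quoting step mentioned above involves, here is a minimal sketch in C. It is not taken from the application or from HTMLTidy; `quote_tag_attributes` is a hypothetical helper that handles a single tag, and it assumes values end at whitespace or `>`. Real Word HTML has many more edge cases (`/>` endings, spaces around `=`, `=` inside text), which is part of the argument for handing the job to HTMLTidy.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies one tag from `in` to `out`, wrapping unquoted
   attribute values in double quotes. Returns 0 on success, -1 if `out` is
   too small. Assumes `in` holds a single tag with no spaces around '='. */
int quote_tag_attributes(const char *in, char *out, size_t out_size)
{
    size_t o = 0;
    const char *p = in;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return -1;

/* Append one byte, always leaving room for the terminating NUL. */
#define PUT(ch) do { if (o + 1 >= out_size) return -1; out[o++] = (ch); } while (0)

    while (*p) {
        PUT(*p);
        if (*p++ != '=')
            continue;
        if (*p == '"' || *p == '\'') {
            /* Already quoted: copy through the closing quote unchanged. */
            char q = *p;
            PUT(*p++);
            while (*p && *p != q) PUT(*p++);
            if (*p) PUT(*p++);
        } else {
            /* Unquoted: the value runs to whitespace or the end of the tag. */
            PUT('"');
            while (*p && !isspace((unsigned char)*p) && *p != '>') PUT(*p++);
            PUT('"');
        }
    }
#undef PUT

    out[o] = '\0';
    return 0;
}
```

Given `<p class=MsoNormal align="left">` and a large enough buffer, the helper leaves the already-quoted `align` value alone and wraps `MsoNormal` in double quotes.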
Re: Help Wrapping HTMLTidy in LCB
Trevor DeVore wrote:

> While looking at solutions for converting HTML into XHTML that can be
> parsed by revXML I decided to test HTMLTidy which has an option to
> output the input as XHTML. While I could bundle up the tidy command
> line tool and include it with my app, I prefer to wrap things up in
> LCB if possible.

Is conversion to XHTML the way to go?

I've tried using the XML external to parse even RSS files -- ostensibly pure XML -- only to find it choke on some of them. I've gone back to hand-crafted pull-parsers.

--
Richard Gaskin
Fourth World Systems
Software Design and Development for the Desktop, Mobile, and the Web
ambassa...@fourthworld.com
http://www.FourthWorld.com
Help Wrapping HTMLTidy in LCB
Hello,

While looking at solutions for converting HTML into XHTML that can be parsed by revXML I decided to test HTMLTidy which has an option to output the input as XHTML. While I could bundle up the tidy command line tool and include it with my app, I prefer to wrap things up in LCB if possible.

Unfortunately I haven't gotten very far with HTMLTidy and I'm hoping someone else might be able to figure out what I'm doing wrong. If you are up for loading up an LCB project in LC 9 on macOS and looking at some C files then please read on:

RESOURCES

- Github repo with LCB file, a test stack, and compiled HTMLTidy dylib for testing on macOS: https://github.com/trevordevore/lc-htmltidy
- HTMLTidy github repo where source files are located: https://github.com/htacg/tidy-html5

WHAT WORKS

In the htmltidy.lcb file I've wrapped some of the simple APIs that return strings: tidyReleaseDate(), tidyLibraryVersion(), and tidyPlatform(). Those all work.

WHAT DOESN'T WORK?

tidyHTMLToXHTML() in the htmltidy.lcb file has some test code in it that isn't working. As a test I want to call `tidyOptGetIdForName()` from the htmltidy C library and get a valid value returned. I expect the following code to log `0` but it is logging `104`. I don't think I am creating the Ctmbstr pointer properly but I don't really know.

Here is code from the htmltidy.lcb file along with links to the ctmbstr definition in the HTMLTidy source code:

```
variable tCStr as Pointer

-- Attempting to create a Ctmbstr from a LiveCode string
-- ctmbstr: https://github.com/htacg/tidy-html5/blob/next/include/tidyplatform.h#L607
MCStringConvertToCString("TidyUnknownOption", tCStr)

-- The next handler is logging `104` which is N_TIDY_OPTIONS (error)
-- Appears that tCStr is not the right format.
log c_tidyOptGetIdForName(tCStr)
```

MCStringConvertToCString is defined as follows in the htmltidy.lcb file:

```
foreign handler MCStringConvertToCString(in pString as String, out rCString as Pointer) returns CBool binds to ""
```

If anyone can provide some pointers or a PR I would really appreciate it.

--
Trevor DeVore
ScreenSteps
www.screensteps.com
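A note for readers following along: `ctmbstr` in the HTMLTidy headers is a typedef for a NUL-terminated `const char *`. A result equal to the option count could therefore mean a malformed pointer, but it could also mean a well-formed string that matches no option name. The sketch below is not libtidy and does not claim to be the fix that was eventually found; `OptionId`, `lookup_option_id`, the two-entry table, and `N_OPTIONS` are all hypothetical stand-ins that show how a name-based lookup behaves when it is handed an enum identifier instead of an option name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in option IDs; NOT the real TidyOptionId enum. */
typedef enum {
    OPT_UNKNOWN = 0,
    OPT_INDENT,
    OPT_OUTPUT_XHTML,
    N_OPTIONS              /* sentinel: option count, returned on a miss */
} OptionId;

/* Same shape as ctmbstr: a pointer to NUL-terminated, read-only bytes. */
typedef const char *cstr;

/* The lookup table maps user-facing option names, not enum identifiers. */
static const struct { OptionId id; cstr name; } kOptions[] = {
    { OPT_INDENT,       "indent" },
    { OPT_OUTPUT_XHTML, "output-xhtml" },
};

/* Returns the ID for `name`, or N_OPTIONS when no entry matches. */
OptionId lookup_option_id(cstr name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof kOptions / sizeof kOptions[0]; i++) {
        if (strcmp(kOptions[i].name, name) == 0)
            return kOptions[i].id;
    }
    return N_OPTIONS;
}
```

In this stand-in, `lookup_option_id("output-xhtml")` returns `OPT_OUTPUT_XHTML`, while `lookup_option_id("OPT_UNKNOWN")` returns `N_OPTIONS` even though the string itself is perfectly valid. Testing the wrapper with a known option name is one way to separate a string-conversion problem from a lookup miss.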