Excuse me for being pure Java and not knowing SWORD C++ at all, but can I add (perhaps obviously) that an XSLT framework will perform noticeably slower than a SAX-like framework?
Here are some performance comparisons: <http://java.sun.com/developer/technicalArticles/xml/JavaTechandXML_part2/>. They are old and Java-centric, so XSLT performance may have improved, but these tests show that in the worst case XSLT was 3 times slower than SAX, and a good SAX processor was twice as fast as a good XSLT processor. If pages are parsed at the chapter level, then users may notice a delay turning pages on smaller machines like mobile phones.

Martin

On 1 December 2010 12:20, Troy A. Griffitts <[email protected]> wrote:

> The logic to get from any Publisher Source Document to rendered HTML is
> a very complex task to solve.
>
> We conceptually create Plato's Form of, say, a Bible, and try to fit
> imperfect Publisher markup into this concept. A Bible has verses,
> headings between verses, chapter intros, footnotes, crossrefs, lemma
> information, etc.
>
> If we do not do this, then we become a PDF reader-- there are already
> PDF readers and we lose the ability to do Bible-specific things with our
> software. For example, if we didn't normalize the concept of crossref
> across all Books, then we couldn't turn them on and off; we couldn't
> provide a crossref panel in the reader which fills according to which
> crossref is hovered over, etc. Same with notes, strongs, headings, etc.
>
> This causes us to impose our Form onto a publisher's text. I understand
> why some people may not like this, but it is very much to our end users'
> benefit that we do this. Without this, we become a web-browser or a PDF
> reader. Which are fine for their purpose, but we intend to provide
> common, familiar, and sometimes novel Bible study aids to our reader.
>
> The current processing model is dark magic and I apologize for this. It
> should be well documented and easy to modify.
> I will attempt to improve
> the dissemination of knowledge of exactly WHAT our Forms are, how we
> impose those Forms on publishers' texts and improve the documentation
> and code to help others understand and have the ability to improve the
> code.
>
> I'll attempt to post a few easy-to-swallow SWORD 101 classes in email,
> which will help us gather our thoughts and documents on how all this works.
>
> Troy
>
> On 12/01/2010 12:09 AM, Greg Hellings wrote:
> > On Tue, Nov 30, 2010 at 1:08 PM, Troy A. Griffitts <[email protected]> wrote:
> >> Having finally returned from a hectic 2 weeks of conferences, and lots
> >> to do before leaving for Christmas, I'm not sure I'm up for a heated,
> >> passionate debate about technologies right now, but by all means, please
> >> commence the public discussion.
> >>
> >> Let me start by saying that everyone (I believe) agrees that we would
> >> like to have an HTML output from the engine which is more generic and
> >> would allow CSS to be applied if a frontend would like to do this.
> >> Currently HTMLHREF output from the engine is used by the widest number
> >> of frontends (to my knowledge) and would benefit everyone involved by
> >> becoming much more generic. e.g.,
> >>
> >> <title> -> <h1>
> >> rather than
> >> <title> -> <b><br />
> >>
> >> <transChange type="added"> -> <span class="tcAdded">
> >> rather than
> >> <transChange type="added"> -> <i>
> >>
> >> etc.
> >>
> >> I believe this will solve a number of issues and possibly get the BT and
> >> MacSword teams onboard to using the same HTML output filters as the
> >> other projects involved (or at least subclassing them and using the
> >> majority of their functionality).
> >
> > I think this is our pretty well accepted premise. The current filters
> > stink to various degrees and currently no one is willing to step up
> > and tackle them.
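
The generic mappings proposed above (<title> -> <h1>, <transChange type="added"> -> <span class="tcAdded">) could be kept in a small lookup table. The following is only a sketch under assumptions, not SWORD code: the names OsisKey, genericHtmlTable and openTagFor are hypothetical, and only the two mappings named in the thread are filled in.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical lookup key: OSIS element name plus its "type" attribute
// (empty string when the element carries no type).
using OsisKey = std::pair<std::string, std::string>;

// Hypothetical table of generic, CSS-friendly HTML opening tags.
// Only the two mappings mentioned in the thread are listed.
inline const std::map<OsisKey, std::string>& genericHtmlTable() {
    static const std::map<OsisKey, std::string> table = {
        {{"title", ""}, "<h1>"},
        {{"transChange", "added"}, "<span class=\"tcAdded\">"},
    };
    return table;
}

// Returns the HTML opening tag for an OSIS element, or an empty string
// when the table has no entry, so the caller can fall back to other handling.
inline std::string openTagFor(const std::string& name, const std::string& type) {
    const std::map<OsisKey, std::string>& table = genericHtmlTable();
    const auto it = table.find(OsisKey(name, type));
    return it == table.end() ? std::string() : it->second;
}
```

With output like this, a frontend that wants italics for added words could style span.tcAdded in its own CSS instead of having <i> hard-coded by the engine.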
> >
> >>
> >> Now, as to the other issue of using XSLT internally in the engine to
> >> process OSIS -> HTML
> >>
> >> I will throw a few melons into the air for target practice, and let the
> >> shooting commence.
> >>
> >> _____________________________
> >> *Multiple Language*
> >>
> >> XSLT is a programming language in the same sense that C++ is a
> >> programming language.
> >>
> >> The SWORD Project C++ engine is written in C++. It is not a Python
> >> engine; it is not a Perl engine; it is not a Java engine; it is C++.
> >>
> >> One might say, "Well, you can use XSLT from C++. Doesn't JSword do this
> >> from Java?" Well, yes, of course you can, and DM can comment, if he
> >> feels the desire to recommend his decision to incorporate an XSLT engine
> >> into the JSword logic flow. But simply because one CAN doesn't mean one
> >> SHOULD. We COULD incorporate a Perl text processing engine in our C++
> >> code, or an Awk processing engine... that doesn't mean we SHOULD. I'm
> >> sure some would say we SHOULD. And obviously DM has thought he SHOULD
> >> incorporate XSLT processing for JSword, so I'm not intending to say it
> >> is a BAD decision, just that it is not a decision I would make; in the
> >> same way as our projects each chose C++ vs. Java to implement our
> >> objective.
> >
> > If a developer is going to develop OSIS -> HTML filters, for instance,
> > we are already assuming they know OSIS and HTML. OSIS is XML and HTML
> > is SGML (though most of our work is probably targeting a more
> > XML-dialect of HTML). XSLT is also XML. Formally, it is not even a
> > programming language, but just a set of formatting/processing
> > instructions in XML.
> >
> > Any developer using XML who is worth their salt should at least be
> > familiar with the basics of XSL - they may not be a guru of XPath
> > expressions or have every attribute of XSL memorized - and would
> > probably expect a library which handles XML as its preferred input
> > method to utilize one of the standard XML processing methods. I know
> > I'm not the only person who was surprised to look in the library
> > filters and see neither DOM, SAX nor XSLT technologies in use. That
> > was when I first ran and hid.
> >
> > Of course, this portion of the discussion is only relevant for the
> > from-OSIS filters.
> >
> >>
> >> _______________________
> >> *XSLT better than C++*
> >>
> >> One might say, "well, XSLT is better suited to process XML than C++."
> >> That's a loaded and unquantified statement.
> >>
> >> Certainly the C++ language specification doesn't include facilities to
> >> easily process XML, but that doesn't mean a plethora of C++ libraries
> >> don't exist for assisting in this task.
> >>
> >> The SWORD engine includes classes like XMLTag and SWBasicFilter which
> >> implement a SAX processing model.
> >>
> >> The current filters do not all use SWBasicFilter, nor XMLTag. They've
> >> been written over 15 years and many before these classes existed. Some
> >> are ugly and need to be rewritten for readability, certainly. But not
> >> necessarily in a different programming language.
> >
> > XSLT being "better" is, yes, a matter of complete subjectivity. And,
> > as I mentioned above, is only useful when our source is XML to begin
> > with. For GBF or Plaintext sources, XSLT is clearly not even
> > applicable.
> >
> > But the current C++ is so good that you seem to be the only person
> > willing to touch it. Peter just mentioned he tried once and couldn't
> > get it. I have gone into the filters before with a singular goal in
> > mind and was able to produce my desired changes, but it was long,
> > drawn-out and painful.
> > Doing the same tasks in XSL would have taken me mere
> > seconds. I know a few other people, at least, have said they would
> > know how to do a task if XSLT was used instead of C++. Of course,
> > that is a hypothetical - I can't know that they would have done so,
> > but that was their claim at the time.
> >
> > Our recent discussion about the use of the "n" attribute for footnotes
> > in ThML is a perfect example. Maintaining the attribute in XSL would
> > have been a trivial task I could have handled in seconds. Instead, it
> > required you, myself and Karl and took about 10 days to get fixed.
> > You had to alert Karl and me to the presence of the attributes, I
> > provided him a preliminary patch to incorporate the values, then he had
> > to heavily modify the patch to operate correctly in non-ThML source and
> > a few other corner cases. And, in the end, the fix is only in Xiphos'
> > code base - I would have to go through 2 of those three steps again in
> > Bibletime, BPBible, MacSword and any other applications I wanted to
> > see proper behavior in. Alternatively I could tackle the filters -
> > but I'm not really inclined to do so.
> >
> > Is XSLT "better"? For me, it would be better because I could more
> > easily modify its behavior based on the fact that I know XML and could
> > easily locate the necessary processing directive. For you, maybe not.
> > Are there things you simply cannot do in XSL that C++ can? Yes. IMO
> > the benefits of XSL outweigh the benefits of C++ for this task, but
> > you clearly disagree. :) I would also say that DOM or SAX processing
> > would be better for all the same reasons - it shields the user from
> > having to see the XML parsing and handle inconsistencies in
> > whitespace, validation, etc. and is still a decently well-known
> > technology among XML users (even if it's slightly less well-known than
> > XSL). And with a DOM or SAX parser, you could still happily employ
> > the full power of C++.
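
The point that a SAX-style parser hides the parsing while leaving plain C++ for the transformation can be sketched roughly as below. This is a minimal illustration, not SWORD's XMLTag or SWBasicFilter: OsisHandler, HtmlRenderer, Attributes and the helper functions are hypothetical names, and the XML parser that would fire the callbacks is assumed rather than shown.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical attribute list as a SAX-like parser might deliver it.
using Attributes = std::map<std::string, std::string>;

// Hypothetical SAX-style callback interface. A real parser (not shown)
// would call these methods as it walks the OSIS markup.
class OsisHandler {
public:
    virtual ~OsisHandler() = default;
    virtual void startElement(const std::string& name, const Attributes& attrs) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Returns the attribute value, or an empty string when it is absent.
inline std::string attribute(const Attributes& attrs, const std::string& key) {
    const auto it = attrs.find(key);
    return it == attrs.end() ? std::string() : it->second;
}

// Escapes the characters that are unsafe in HTML text content.
inline std::string escapeHtml(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c == '&') out += "&amp;";
        else if (c == '<') out += "&lt;";
        else if (c == '>') out += "&gt;";
        else out += c;
    }
    return out;
}

// Renders a tiny subset of OSIS as generic HTML. Only transformation logic
// lives here; locating tags and attributes is left to the parser.
class HtmlRenderer : public OsisHandler {
public:
    void startElement(const std::string& name, const Attributes& attrs) override {
        std::string open, close;  // both stay empty for unhandled elements
        if (name == "title") {
            open = "<h1>";
            close = "</h1>";
        } else if (name == "transChange" && attribute(attrs, "type") == "added") {
            open = "<span class=\"tcAdded\">";
            close = "</span>";
        }
        out_ += open;
        closers_.push_back(close);  // remembered so endElement can match it
    }
    void endElement(const std::string& /*name*/) override {
        if (closers_.empty()) return;  // ignore unbalanced end events
        out_ += closers_.back();
        closers_.pop_back();
    }
    void characters(const std::string& text) override { out_ += escapeHtml(text); }
    const std::string& html() const { return out_; }

private:
    std::string out_;
    std::vector<std::string> closers_;
};
```

A frontend that wanted different output for one element could subclass HtmlRenderer and override startElement for that case only, which is the kind of extension point discussed in this thread.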
> >
> >>
> >> ________________________
> >> *COMPLEXITY*
> >>
> >> The task of enumerating all types of OSIS <title> tags, and deciding
> >> what to do with each, and how to classify all <title> tags from all
> >> possible OSIS documents into our enumeration is still going to be a
> >> complex task using XSLT. <title> is a complex example, but certainly
> >> not the most complex.
> >>
> >> It is a tall task to generalize all elements of all documents from all
> >> publishers into one conceptual model with one chosen output for a
> >> frontend-- whether that be for an audience on the Desktop, web-based, or
> >> a handheld.
> >>
> >> The complex processing required by the engine will require long, complex
> >> XSLT-- which likely will incorporate callbacks to C++. It will not be
> >> more simple-- only mixed language.
> >
> > I could also argue that the XSL would not require a developer to
> > mentally separate the code that just identifies and locates XML
> > elements and attributes and parses them from the code that transforms
> > them and generates the output. Thus yes, it might include some
> > extension functions into C++ but it would be simpler. And it would
> > also be more expressive.
> >
> > The enumeration of every OSIS <title> tag is a moot point for the
> > decision. You need to enumerate them all in C++ as well and decide
> > what to do with them. That doesn't change in the XSL - just the
> > method used. An XSL match along the lines of
> > <xsl:template match="title[@type='psalm']"> still has to be done in C++
> > with some sort of
> > if (tag.name() == "title" && tag.attr("type") == "psalm")
> > or whatever the syntax is. And that is assuming the current filter is
> > using XMLTag and isn't comparing character strings directly.
> >
> >> _______________________
> >> *Semantic vs. Display*
> >>
> >> Some will say (and have), "well, let everything be display oriented and
> >> let the publisher decide".
> >> Fine, then you lose 2 things: the ability to
> >> display differently per user preference, per display device; and you
> >> also give up the promise to actually do any interesting research on the
> >> text. When you lose semantic markup, then you lose all interesting
> >> information about WHAT is being marked up.
> >
> > I just want to be clear that I'm not advocating the use of display
> > over semantics as a general choice. My statements are strictly based
> > around my specific task and the fact that OSIS support in SWORD and
> > the front ends is not as good as the support of ThML. Largely this is
> > because most applications display in HTML and my required task is
> > framed entirely in terms of the presentation and display - not the
> > semantics. I would love and prefer to use OSIS for this task, but I
> > simply cannot accomplish it with the state of SWORD at this time.
> >
> >>
> >> _______________________
> >> *More than a Rendering Engine*
> >>
> >> The SWORD C++ Engine is more than simply a text rendering engine-- it is
> >> a Biblical text research engine.
> >>
> >> If I'd like to know the morphology of word 3 in 2Thes 2.13 of the WHNU
> >> Greek text, the entire program to do such is:
> >>
> >> SWMgr library;
> >> SWModule *whnu = library.getModule("WHNU");
> >> whnu->setKey("2th.2.13");
> >> whnu->RenderText();
> >>
> >> cout << "The morphology of word three is: " <<
> >> whnu->getEntryAttributes()["Word"]["003"]["Morph"] << endl;
> >>
> >>
> >> That reads nicely (at least in my opinion). I don't need to know about
> >> XML, XSLT, care what markup the WHNU module uses, I don't even have to
> >> know how to make a SWORD filter. The current filters do all the work of
> >> breaking out these attributes and making them available in a nice and
> >> interesting map.
> >
> > I'd like to be clear again, that XSL would only be useful for material
> > already in OSIS formats (or in valid ThML - I think TEI is also an XML
> > format?).
> > I doubt many modules in ThML are strictly valid at their
> > import times, so XSL wouldn't be very useful, and GBF is a monster
> > unto itself. Doing the above in XSL from an OSIS source would not be
> > much different in complexity than what you have listed there.
> >
> > <xsl:template match="verse[@osisid='2thes.2.13']/w[@n=3]">
> > The morphology of word three is: <xsl:value-of select="@morph" />
> > </xsl:template>
> >
> > Or something similar (my knowledge of exact OSIS attribute names and
> > values wanes and it's been two or three weeks since I wrote an XPath
> > expression).
> >
> > Of course, the string processing portion of SWORD would continue to be
> > of great importance for any modules in GBF format or similar to bring
> > them into a useful form. In that way, SWORD would continue to be more
> > than just a text rendering engine. It would continue to offer all of
> > its features, its buffering from the system and from the format, its
> > indexing, its module fetching and storing, etc.
> >
> >> ______________________
> >>
> >>
> >> And finally, if bullets aren't flying already, I'll stir the heat up
> >> with...
> >>
> >> XSLT sucks. A good C++ programmer can do anything in C++ better than
> >> any XSLT programmer.
> >>
> >>
> >> :)
> >
> > A C++ programmer can definitely do more, since C++ is actually a
> > programming language and XSLT is a set of processing instructions.
> > Better? That depends on what the criteria are. For me, in my current
> > role as a module creator, the use of C++ is not currently better
> > because it is less flexible and extensible. For you, as the library
> > maintainer, perhaps C++ is better because it's what you are already
> > comfortable with and because it has largely been your hand in the
> > filters.
> >
> >>
> >> *duck*
> >> Have fun.
> >>
> >> Troy
> >>
> >> PS. In summary, I understand the current filters are sometimes overly
> >> complex and need cleanup, standardization, etc.
> >> It comes down to the
> >> fact that they mostly work, and other things which don't work get
> >> priority, so they don't get much attention. But honestly, I think one
> >> might be oversimplifying the problem at hand without realizing it, if
> >> one simply thinks switching to XSLT will make things easier.
> >
> > I think one is also oversimplifying the options. My dreamlist is that
> > SWORD produce a well-formed, valid, complete OSIS document for an
> > arbitrary KeyList that I pass it with FMT_OSIS set. That basically
> > boils down to getting the *OSIS filters up to snuff and standardized.
> > The second item on the list is a readily extensible mechanism for
> > SWORD outputting HTML from that OSIS. Whether that choice is providing
> > an XSL stylesheet with the library, a C++ SAX processor that a
> > front-end can readily extend, or a DOM interface that can be easily
> > customized is immaterial to me. I like all three of those, and can
> > easily understand and extend all of them.
> >
> > I think any of those technologies would be an improvement over all
> > in-house C++ for the second half of any such processing. If we are
> > using XML in Open Source Software, let's leverage the work of others
> > who have happily given us permission to use their libraries!
> >
> > --Greg
> >
> > _______________________________________________
> > sword-devel mailing list: [email protected]
> > http://www.crosswire.org/mailman/listinfo/sword-devel
> > Instructions to unsubscribe/change your settings at above page
