Hi Lou,

Yes, it does imply that for each paragraph you'd first be tokenising (or grouping) it into sentences. That might cost a little efficiency (though I doubt much), but it would make choosing the number of sentences that fit under $maxWords easier. I had assumed that you wanted the output to have the sentences marked up as <s>; my mistake.
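A minimal sketch of the recursive approach follows. It assumes XSLT 2.0 and that `t:` is bound to the TEI namespace. The function name `f:take`, its namespace URI and the default of 50 for `$maxWords` are hypothetical. It splits each paragraph with `tokenize()` rather than `xsl:for-each-group`, and it does so inside the same template that matches `t:p`, so the document is not processed in a separate pass.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:t="http://www.tei-c.org/ns/1.0"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs t f">

  <!-- Assumed default limit; override when invoking the transform -->
  <xsl:param name="maxWords" as="xs:integer" select="50"/>

  <!-- Hypothetical helper: returns the leading sentences of $rest whose
       cumulative word count stays below $maxWords. $total carries the
       number of words already accepted. -->
  <xsl:function name="f:take" as="xs:string*">
    <xsl:param name="rest" as="xs:string*"/>
    <xsl:param name="total" as="xs:integer"/>
    <xsl:if test="exists($rest)">
      <!-- Words (space-delimited strings) in the next sentence -->
      <xsl:variable name="next" as="xs:integer"
          select="count(tokenize(normalize-space($rest[1]), ' '))"/>
      <!-- Keep it and recurse only while the running total is under the limit -->
      <xsl:if test="$total + $next lt $maxWords">
        <xsl:sequence
            select="$rest[1], f:take(subsequence($rest, 2), $total + $next)"/>
      </xsl:if>
    </xsl:if>
  </xsl:function>

  <xsl:template match="t:p">
    <!-- Same sentence separator as the original: a full stop plus whitespace -->
    <xsl:for-each select="f:take(tokenize(normalize-space(.), '\.\s'), 0)">
      <s n="{position()}"><xsl:value-of select="."/></s>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because `f:take` stops at the first sentence that does not fit, the words of later sentences are never counted. As in the original regex, the separator consumes the full stop, so only a paragraph's final sentence keeps its `.`.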
-James

On Mon, 5 Nov 2018 at 12:31, Lou Burnard <[email protected]> wrote:

> Thanks for the very quick reply, James, but doesn't your approach imply that
> the tokenisation into sentences has already been done? I'm trying to avoid a
> two-pass solution, as I expect to be doing this hundreds of times.
>
> reluctantly using Outlook for Android <https://aka.ms/ghei36>
>
> ------------------------------
> *From:* James Cummings <[email protected]>
> *Sent:* Monday, November 5, 2018 1:10:02 PM
> *To:* Lou Burnard
> *Cc:* [email protected]
> *Subject:* Re: [oXygen-user] an xslt challenge
>
> Hi Lou,
>
> Would it make sense to use xsl:for-each-group to group the sentences into
> <s> units to make this easier? Then I'd probably recursively call a
> template or function, passing the current collection of <s> units as an
> item()* value and testing whether its tokenised word count is above or
> below $maxWords.
>
> Not got time to write that out as a solution at the moment, and I'm sure it
> can be done without recursion as well, but that is the approach that would
> have occurred to me, at least.
>
> -James
>
>
> On Mon, 5 Nov 2018 at 12:03, Lou Burnard <[email protected]>
> wrote:
>
>> I hope I am not abusing this list in asking occasionally for advice on
>> the best way to hack something in XSLT.
>>
>> Today's problem is to output only the first x sentences (strings
>> terminated by a full stop) of a paragraph such that the total number of
>> words (space-delimited strings) is less than some limit (call it
>> $maxWords). Since the sentences are of variable length, obviously I don't
>> know what x is.
>>
>> Here's where I got to so far:
>>
>> <xsl:template match="t:p">
>>   <xsl:variable name="pString">
>>     <xsl:value-of select="."/>
>>   </xsl:variable>
>>   <xsl:for-each select="tokenize($pString, '\.\s')">
>>     <xsl:variable name="seq">
>>       <xsl:value-of select="string(position())"/>
>>     </xsl:variable>
>>     <xsl:variable name="wordsSoFar">
>>       <xsl:value-of
>>         select="string-length(translate(normalize-space
>>           (preceding-sibling::text()), ' ', '')) + 1"/>
>>     </xsl:variable>
>>     <xsl:if test="$wordsSoFar &lt; $maxWords">
>>       <s n="{$seq}">
>>         <xsl:value-of select="."/>
>>       </s>
>>     </xsl:if>
>>   </xsl:for-each>
>> </xsl:template>
>>
>> But this is not valid because preceding-sibling:: wants a node(), not a
>> string (even though "text()" *is* a node, imho).
>>
>> Am I going about this entirely the wrong way?
>>
>> _______________________________________________
>> oXygen-user mailing list
>> [email protected]
>> https://www.oxygenxml.com/mailman/listinfo/oxygen-user
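In the quoted template, the context item inside `xsl:for-each` is an `xs:string` returned by `tokenize()`, not a node. No axis step, including `preceding-sibling::`, can be applied to it, because a string has no position in the tree. Separately, `string-length(translate(..., ' ', '')) + 1` counts non-space characters rather than words. A minimal non-recursive sketch keeps the sentences and their word counts in sequence variables and sums a prefix by position instead. It assumes XSLT 2.0 and the TEI namespace for `t:`. The helper `w:count`, its namespace URI and the default `$maxWords` of 50 are hypothetical.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:t="http://www.tei-c.org/ns/1.0"
    xmlns:w="urn:example:wordcount"
    exclude-result-prefixes="xs t w">

  <!-- Assumed default limit; override when invoking the transform -->
  <xsl:param name="maxWords" as="xs:integer" select="50"/>

  <!-- Hypothetical helper: number of space-delimited words in a string -->
  <xsl:function name="w:count" as="xs:integer">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="count(tokenize(normalize-space($s), ' '))"/>
  </xsl:function>

  <xsl:template match="t:p">
    <!-- Sentences as a sequence of strings (same separator as the original) -->
    <xsl:variable name="sentences" as="xs:string*"
        select="tokenize(normalize-space(.), '\.\s')"/>
    <!-- Word count of each sentence, in the same order -->
    <xsl:variable name="counts" as="xs:integer*"
        select="for $s in $sentences return w:count($s)"/>
    <xsl:for-each select="$sentences">
      <xsl:variable name="i" select="position()"/>
      <!-- Words in this sentence plus all the sentences before it -->
      <xsl:if test="sum(subsequence($counts, 1, $i)) lt $maxWords">
        <s n="{$i}"><xsl:value-of select="."/></s>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This avoids recursion but still counts every sentence after the limit has been reached. Since word counts are never negative, the running total only grows, so the sentences kept are always a leading run of the paragraph.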
