I should have explained that the "Action" is "Send non-matching text to
sub-filter".
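
For anyone without TextPipe, the filter quoted below could be approximated in Python along these lines. This is only a rough sketch, not the filter itself: the function name `nfc_except_hebrew` is made up for illustration, and it assumes (as the filter does) that all the Hebrew sits in the main Hebrew block U+0590..U+05FF, with no alphabetic presentation forms.

```python
import re
import unicodedata

# Runs of text that are NOT in the main Hebrew block (U+0590..U+05FF).
# These play the role of the "non-matching text" sent to the NFC sub-filter.
NON_HEBREW_RUN = re.compile(r"[^\u0590-\u05FF]+")


def nfc_except_hebrew(text: str) -> str:
    """Normalize non-Hebrew runs to NFC and leave Hebrew runs untouched.

    Hypothetical helper for illustration; assumes no Hebrew characters
    outside U+0590..U+05FF (e.g. no presentation forms in U+FB1D..U+FB4F).
    """
    return NON_HEBREW_RUN.sub(
        lambda m: unicodedata.normalize("NFC", m.group(0)), text
    )


if __name__ == "__main__":
    # Latin "e" + combining acute, then bet + dagesh + sheva in an order
    # that plain NFC would rearrange.
    sample = "e\u0301 \u05D1\u05BC\u05B0"
    result = nfc_except_hebrew(sample)
    print([f"U+{ord(c):04X}" for c in result])
```

Each non-Hebrew run is normalized on its own, so a Latin combining mark placed directly after a Hebrew letter would not be handled specially. The output would then go to osis2mod with the -N switch so that no further normalization is applied during the build.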
Best regards,
David
Sent with [ProtonMail](https://protonmail.com) Secure Email.
> -------- Original Message --------
> Subject: Re: [sword-devel] Module .conf files, Unicode Normalization
> Local Time: 6 January 2018 7:41 PM
> UTC Time: 6 January 2018 19:41
> From: dfh...@protonmail.com
> To: sword-devel mailing list <sword-devel@crosswire.org>
>
> "Here's one I made earlier."
>
> Comment...
> | Normalize to NFC excluding any Hebrew text
> |
> | NB. Does not expect any alphabetical presentation forms!
> |
> +--Perl pattern [[\x{0590}-\x{05FF}]+] with []
> | [X] Match case
> | [ ] Whole words only
> | [ ] Case sensitive replace
> | [ ] Prompt on replace
> | [ ] Skip prompt if identical
> | [ ] First only
> | [ ] Extract matches
> | Maximum text buffer size 4096
> | [X] Maximum match (greedy)
> | [ ] Allow comments
> | [ ] '.' matches newline
> | [X] UTF-8 Support
> |
> +--NFC - Canonical Decomposition, followed by Canonical Composition
>
> NB. That's merely the clipboard copy of the filter for illustration purposes.
>
> Best regards,
>
> David
>
>> -------- Original Message --------
>> Subject: Re: [sword-devel] Module .conf files, Unicode Normalization
>> Local Time: 6 January 2018 7:26 PM
>> UTC Time: 6 January 2018 19:26
>> From: dfh...@protonmail.com
>> To: SWORD Developers' Collaboration Forum <sword-devel@crosswire.org>
>>
>> Good question, Tom.
>>
>> Assuming that the Latin script part of the source text actually required
>> normalization to NFC, and that at least some of the Biblical Hebrew should
>> not be converted to NFC, you'd build the module using the -N switch of
>> osis2mod, after first applying a script to the source text to ensure that
>> both requirements were met.
>>
>> It would be a very simple task for a bespoke TextPipe filter, with a
>> restrict filter designed to limit the Convert to NFC subfilter to the text
>> that was not Hebrew.
>>
>> Ignoring alphabetical presentation forms, all the Hebrew characters are in
>> one Unicode block.
>> A PCRE to exclude the Hebrew would be very simple.
>> I could almost do it in my sleep after 17 years using TextPipe.
>> No doubt other programmers could do likewise with Perl or Python, etc.
>>
>> Best regards,
>>
>> David
>>
>> Sent from ProtonMail Mobile
>>
>> On Sat, Jan 6, 2018 at 19:14, Tom Sullivan <i...@beforgiven.info> wrote:
>>
>>> Y'all:
>>>
>>> For text, such as in a commentary, which includes both Hebrew and
>>> English (or another modern Latin script using language), what do you put
>>> for the normalization?
>>>
>>> Tom
>>>
>>> Tom Sullivan
>>> i...@beforgiven.info
>>> FAX: 815-301-2835
>>> ---------------------
>>> Great News!
>>> God created you, owns you and gave you commands to obey.
>>> You have disobeyed God - as your conscience very well attests to you.
>>> God's holiness and justice compel Him to punish you in Hell.
>>> Jesus Christ became Man, was crucified, buried and rose from the dead
>>> as a substitute for all who trust in Him, redeeming them from Hell.
>>> If you repent (turn from your sin) and believe (trust) in Jesus Christ,
>>> you will go to Heaven. Otherwise you will go to Hell.
>>> Warning! Good works are a result, not cause, of saving trust.
>>> More info is at www.esig.beforgiven.info
>>> Do you believe this? Copy this signature into your email program and
>>> use the Internet to spread the Great News every time you email.
>>>
>>> On 01/06/2018 12:32 PM, David Haslam wrote:
>>> > Hi Greg,
>>> >
>>> > One area where it might turn out to be useful is for the search features
>>> > of front-end apps.
>>> > It could be important to know that the underlying module text is _not_
>>> > *NFC*.
>>> >
>>> > That's not to lay down a requirement as to how search features should be
>>> > designed,
>>> > but at least to provide the information in case it does matter for some
>>> > types of search option.
>>> >
>>> > Like other things in .conf files, a key can also be _educational_.
>>> > It may prompt developers and users to ask, /*Why did they do this?*/
>>> >
>>> > cf. It was _almost by accident_ that in 2014, I first came across this
>>> > aspect of using Unicode for Biblical Hebrew.
>>> > /It applies only to texts with _both_ vowel accents and cantillation./
>>> >
>>> > Even though it's mentioned in our developers' wiki, it's all too easily
>>> > missed by other CrossWire volunteers.
>>> >
>>> > Best regards,
>>> >
>>> > David
>>> >
>>> > Sent with ProtonMail Secure Email.
>>> >
>>> >> -------- Original Message --------
>>> >> Subject: Re: [sword-devel] Module .conf files, Unicode Normalization
>>> >> Local Time: 6 January 2018 5:19 PM
>>> >> UTC Time: 6 January 2018 17:19
>>> >> From: greg.helli...@gmail.com
>>> >> To: David Haslam, SWORD Developers' Collaboration Forum
>>> >>
>>> >> Why would the front end or engine need to know this information? Would
>>> >> it help the front end developers or users to know it? What do we gain
>>> >> by adding this? (I'm not implying it wouldn't be beneficial. But the
>>> >> only thing I know about Unicode is how the different UTF encodings
>>> >> work, so I have no idea what use this information could be. I also
>>> >> think changes to formats and information standards should be
>>> >> conservative instead of liberal)
>>> >>
>>> >> --Greg
>>> >>
>>> >> On Jan 6, 2018 11:01, "David Haslam" wrote:
>>> >>
>>> >> Dear all,
>>> >>
>>> >> We've known for quite a few years that there are aspects of
>>> >> *Biblical Hebrew* that mean we should _avoid_ converting the
>>> >> Unicode source text to *NFC* when we build a module.
>>> >>
>>> >> This prompts me to suggest that we ought to define a new *key* for
>>> >> .conf files.
>>> >>
>>> >> *Normalization=NFC* (this would be the default, and may be
>>> >> _omitted_ for the vast majority of modules)
>>> >> *Normalization=Custom* (we should include this in certain Biblical
>>> >> Hebrew modules)
>>> >>
>>> >> This would make it clear to front-end developers and users alike
>>> >> that the source text was _not_ converted to NFC during module build.
>>> >> i.e. *osis2mod* was used intentionally with the *-N* switch, in
>>> >> _accordance with the requirements of the source text provider_.
>>> >>
>>> >> The Unicode source text may already be encoded in *UTF-8*; this
>>> >> memo is /only/ about normalization.
>>> >>
>>> >> In the rare eventuality that there could arise a requirement for
>>> >> any of the other three normalization forms (*NFD*, *NFKC*, *NFKD*)
>>> >> defined by the Unicode Consortium,
>>> >> these would also be permitted values for the conf file key.
>>> >>
>>> >> A further benefit arises when a module needs to be updated.
>>> >> If the modules team sees that the .conf file includes the line
>>> >> *Normalization=Custom*
>>> >> they would be forewarned against converting to NFC through
>>> >> /inadvertently/ omitting the *-N* switch during module build.
>>> >>
>>> >> _Aside_: Another language with a need for non-standard
>>> >> normalization is *Tibetan*. We don't yet have a module in that script.
>>> >>
>>> >> Best regards,
>>> >>
>>> >> David
>>> >>
>>> >> Sent with ProtonMail Secure Email.
>>> >>
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page