I should have explained that the "Action" of the Perl pattern filter is "Send
non-matching text to sub-filter", so only the non-Hebrew text reaches the NFC
sub-filter.
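
For anyone who would rather do this in Python than in TextPipe, here is a
minimal sketch of the same logic. It assumes the source text has already been
decoded from UTF-8 into a str. The function name nfc_except_hebrew is
hypothetical. The character range U+0590 to U+05FF is the one in the Perl
pattern quoted below, and, like that filter, the sketch ignores the
Alphabetic Presentation Forms (U+FB1D to U+FB4F).

```python
import re
import unicodedata

# Hebrew block only (U+0590..U+05FF), as in the TextPipe pattern.
# Assumption: no Alphabetic Presentation Forms (U+FB1D..U+FB4F) in the input.
HEBREW_RUN = re.compile(r"[\u0590-\u05FF]+")


def nfc_except_hebrew(text: str) -> str:
    """Normalize non-Hebrew runs to NFC; copy Hebrew runs through unchanged."""
    out = []
    pos = 0
    for m in HEBREW_RUN.finditer(text):
        # Non-matching text before this Hebrew run goes to the NFC "sub-filter".
        out.append(unicodedata.normalize("NFC", text[pos:m.start()]))
        # The Hebrew run itself is left exactly as the source provider encoded it.
        out.append(m.group(0))
        pos = m.end()
    # Whatever follows the last Hebrew run is also non-Hebrew text.
    out.append(unicodedata.normalize("NFC", text[pos:]))
    return "".join(out)
```

Why bother: lamed + patah + hiriq (U+05DC U+05B7 U+05B4) comes out of NFC as
lamed + hiriq + patah, because normalization sorts adjacent combining marks
by canonical combining class (patah 17, hiriq 14). The sketch leaves that
sequence alone while still composing, say, "e" + U+0301 into U+00E9. The
module would then be built with osis2mod -N so that the Hebrew is not
normalized afterwards.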

Best regards,

David

Sent with [ProtonMail](https://protonmail.com) Secure Email.

> -------- Original Message --------
> Subject: Re: [sword-devel] Module .conf files, Unicode Normalization
> Local Time: 6 January 2018 7:41 PM
> UTC Time: 6 January 2018 19:41
> From: dfh...@protonmail.com
> To: sword-devel mailing list <sword-devel@crosswire.org>
>
> "Here's one I made earlier."
>
> Comment...
> |  Normalize to NFC excluding any Hebrew text
> |
> |  NB. Does not expect any alphabetical presentation forms!
> |
> +--Perl pattern [[\x{0590}-\x{05FF}]+] with []
>    |  [X] Match case
>    |  [ ] Whole words only
>    |  [ ] Case sensitive replace
>    |  [ ] Prompt on replace
>    |  [ ] Skip prompt if identical
>    |  [ ] First only
>    |  [ ] Extract matches
>    |      Maximum text buffer size 4096
>    |  [X] Maximum match (greedy)
>    |  [ ] Allow comments
>    |  [ ] '.' matches newline
>    |  [X] UTF-8 Support
>    |
>    +--NFC - Canonical Decomposition, followed by Canonical Composition
>
> NB. That's merely the clipboard copy of the filter for illustration purposes.
>
> Best regards,
>
> David
>
>> -------- Original Message --------
>> Subject: Re: [sword-devel] Module .conf files, Unicode Normalization
>> Local Time: 6 January 2018 7:26 PM
>> UTC Time: 6 January 2018 19:26
>> From: dfh...@protonmail.com
>> To: SWORD Developers' Collaboration Forum <sword-devel@crosswire.org>
>>
>> Good question, Tom.
>>
>> Assuming that the Latin-script part of the source text actually required
>> normalization to NFC, and that at least some of the Biblical Hebrew should
>> not be converted to NFC, you'd build the module using the -N switch of
>> osis2mod, after first running a script over the source text to ensure that
>> both requirements were met.
>>
>> It would be a very simple task for a bespoke TextPipe filter with a restrict
>> filter designed to limit the Convert to NFC subfilter to the text that was
>> not Hebrew.
>>
>> Ignoring alphabetical presentation forms, all the Hebrew characters are in
>> one Unicode block (U+0590 to U+05FF).
>> A PCRE to exclude the Hebrew would be very simple.
>> I could almost do it in my sleep after 17 years using TextPipe.
>> No doubt other programmers could do likewise with Perl or Python, etc.
>>
>> Best regards,
>>
>> David
>>
>> On Sat, Jan 6, 2018 at 19:14, Tom Sullivan <i...@beforgiven.info> wrote:
>>
>>> Y'all: For text, such as in a commentary, which includes both Hebrew and
>>> English (or another modern language written in Latin script), what do you
>>> put for the normalization?
>>>
>>> Tom
>>>
>>> Tom Sullivan
>>> i...@beforgiven.info
>>> FAX: 815-301-2835
>>> ---------------------
>>> Great News! God created you, owns you and gave you commands to obey. You
>>> have disobeyed God - as your conscience very well attests to you. God's
>>> holiness and justice compel Him to punish you in Hell. Jesus Christ became
>>> Man, was crucified, buried and rose from the dead as a substitute for all
>>> who trust in Him, redeeming them from Hell. If you repent (turn from your
>>> sin) and believe (trust) in Jesus Christ, you will go to Heaven. Otherwise
>>> you will go to Hell. Warning! Good works are a result, not cause, of saving
>>> trust. More info is at www.esig.beforgiven.info Do you believe this? Copy
>>> this signature into your email program and use the Internet to spread the
>>> Great News every time you email.
>>>
>>> On 01/06/2018 12:32 PM, David Haslam wrote:
>>> > Hi Greg,
>>> >
>>> > One area where it might turn out to be useful is for the search features
>>> > of front-end apps.
>>> > It could be important to know that the underlying module text is _not_
>>> > *NFC*.
>>> >
>>> > That's not to lay down a requirement as to how search features should be
>>> > designed, but at least to provide the information in case it does matter
>>> > for some types of search option.
>>> >
>>> > Like other things in .conf files, a key can also be _educational_.
>>> > It may prompt developers and users to ask, /*Why did they do this?*/
>>> >
>>> > cf. It was _almost by accident_ that in 2014 I first came across this
>>> > aspect of using Unicode for Biblical Hebrew.
>>> > /It applies only to texts with _both_ vowel accents and cantillation./
>>> >
>>> > Even though it's mentioned in our developers' wiki, it's all too easily
>>> > missed by other CrossWire volunteers.
>>> >
>>> > Best regards,
>>> >
>>> > David
>>> >
>>> >> -------- Original Message --------
>>> >> Subject: Re: [sword-devel] Module .conf files, Unicode Normalization
>>> >> Local Time: 6 January 2018 5:19 PM
>>> >> UTC Time: 6 January 2018 17:19
>>> >> From: greg.helli...@gmail.com
>>> >> To: David Haslam, SWORD Developers' Collaboration Forum
>>> >>
>>> >> Why would the front end or engine need to know this information? Would
>>> >> it help the front-end developers or users to know it? What do we gain
>>> >> by adding this? (I'm not implying it wouldn't be beneficial. But the
>>> >> only thing I know about Unicode is how the different UTF encodings
>>> >> work, so I have no idea what use this information could be. I also
>>> >> think changes to formats and information standards should be
>>> >> conservative instead of liberal.)
>>> >>
>>> >> --Greg
>>> >>
>>> >> On Jan 6, 2018 11:01, "David Haslam" wrote:
>>> >>
>>> >> Dear all,
>>> >>
>>> >> We've known for quite a few years that there are aspects of
>>> >> *Biblical Hebrew* that mean we should _avoid_ converting the
>>> >> Unicode source text to *NFC* when we build a module.
>>> >>
>>> >> This prompts me to suggest that we ought to define a new *key* for
>>> >> .conf files.
>>> >>
>>> >> *Normalization=NFC* (this would be the default, and may be
>>> >> _omitted_ for the vast majority of modules)
>>> >> *Normalization=Custom* (we should include this in certain Biblical
>>> >> Hebrew modules)
>>> >>
>>> >> This would make it clear to front-end developers and users alike
>>> >> that the source text was _not_ converted to NFC during module build,
>>> >> i.e. *osis2mod* was used intentionally with the *-N* switch, in
>>> >> _accordance with the requirements of the source text provider_.
>>> >>
>>> >> The Unicode source text may already be encoded in *UTF-8*; this
>>> >> memo is /only/ about normalization.
>>> >>
>>> >> In the rare eventuality that a requirement should arise for any of
>>> >> the other three normalization forms (*NFD*, *NFKC*, *NFKD*) defined
>>> >> by the Unicode Consortium, these would also be permitted values for
>>> >> the conf file key.
>>> >>
>>> >> A further benefit arises when a module needs to be updated.
>>> >> If the modules team sees that the .conf file includes the line
>>> >> *Normalization=Custom*, they would be forewarned against converting
>>> >> to NFC through /inadvertently/ omitting the *-N* switch during module
>>> >> build.
>>> >>
>>> >> _Aside_: Another language with a need for non-standard
>>> >> normalization is *Tibetan*. We don't yet have a module in that script.
>>> >>
>>> >> Best regards,
>>> >>
>>> >> David
>>> >>
>>> >> _______________________________________________
>>> >> sword-devel mailing list: sword-devel@crosswire.org
>>> >> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> >> Instructions to unsubscribe/change your settings at above page