Re: Parsing (was: New way to do UCB lookups)
In a6b9336cdb62bb46b9f8708e686a7ea0116565f...@nrhmms8p02.uicnrh.dom, on 11/19/2012 at 04:06 PM, McKown, John john.mck...@healthmarkets.com said:

> I was told that many old languages had no interword spacing mainly because that wasted precious writing material. I was also told that old Hebrew omitted the vowels to save space,

No; there were no vowel markings to omit; those were invented later[1], as were the cantillation marks[2] (trop). AFAIK the interword spacing came before the vowel marks.

> which is why some words in the Torah are uncertain as to which word was meant.

There are also issues with consonants.

[1] Wikipedia, http://en.wikipedia.org/wiki/Niqqud, claims Early Middle Ages.
[2] http://en.wikipedia.org/wiki/Cantillation

--
Shmuel (Seymour J.) Metz, SysProg and JOAT
Atid/2 http://patriot.net/~shmuel
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Parsing (was: New way to do UCB lookups)
Gee, that sort of thing is why I was scared off of taking German in college. Unfortunately, in my ignorance, I took Russian instead. Long words with really funny looking characters. <grin/>

--
John McKown
Systems Engineer IV
IT Administrative Services Group
HealthMarkets®
9151 Boulevard 26 . N. Richland Hills . TX 76010
(817) 255-3225 phone . john.mck...@healthmarkets.com . www.HealthMarkets.com

Confidentiality Notice: This e-mail message may contain confidential or proprietary information. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. HealthMarkets® is the brand name for products underwritten and issued by the insurance subsidiaries of HealthMarkets, Inc. - The Chesapeake Life Insurance Company®, Mid-West National Life Insurance Company of Tennessee℠ and The MEGA Life and Health Insurance Company.℠

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Lindy Mayfield
Sent: Monday, November 19, 2012 4:53 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Parsing (was: New way to do UCB lookups)

This isn't a complete illustrative example of what you refer to, but in some languages this is still true to a certain extent today. Some Finnish words have all sorts of grammar built into them, yet are still considered one word:

ikä = age
ikävä = miss (you), too bad
ikävystyä = to miss someone, be bored
ikävystyneisyys = boredom
ikävystyneisyydessä = in boredom
ikävystyneisyydessäänkään = not even in his boredom
...

That is for me a funny example, but not at all extreme. German has a lot of compound words that have no spaces. Finnish, too. My example was a single word but I could have made it longer by compounding it.

Lindy
Re: Parsing (was: New way to do UCB lookups)
Except that Cyrillic is an easy alphabet to learn - at least upper case is. :-) And it's particularly sensible with its single characters for dzh, ch, sh, ts, ya etc. And German is sensible too, apart from the boot verb. :-)

Cheers, Martin

Martin Packer, zChampion, Principal Systems Investigator, Worldwide Banking Center of Excellence, IBM
+44-7802-245-584
email: martin_pac...@uk.ibm.com
Twitter / Facebook IDs: MartinPacker
Blog: https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker

From: McKown, John john.mck...@healthmarkets.com
To: IBM-MAIN@listserv.ua.edu
Date: 11/20/2012 12:53 PM
Subject: Re: Parsing (was: New way to do UCB lookups)
Sent by: IBM Mainframe Discussion List IBM-MAIN@listserv.ua.edu

Gee, that sort of thing is why I was scared off of taking German in college. Unfortunately, in my ignorance, I took Russian instead. Long words with really funny looking characters. <grin/>
Parsing (was: New way to do UCB lookups)
On Mon, 19 Nov 2012 21:39:57 +, Lindy Mayfield wrote:

> It gets me all Lewis Carroll just thinking about it. I cannot even imagine how to create something like that SQL in Finnish. Something so simple as that, I cannot even think how a computer could parse it written in an agglutinative language. Though I am a bear of very little brain, so I'm sure it could be done. :-)

Wouldn't this be somewhat like FORTRAN, where the lexical analyzer first removes _all_[1] blanks, rendering the source code maximally agglutinative, then attempts to parse the mess so created?

[1] Well, except in quoted or counted text strings.

> So to bring it a bit back on to topic, English can be weird, but sometimes quite useful in its own way.

Classic Latin was written with no interword separators.

-- gil
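For readers who have not met the FORTRAN behaviour described above, here is a minimal Python sketch of the idea. It is an illustration under stated assumptions, not how any real compiler is written: the function names `strip_blanks` and `classify` are made up for this example, only apostrophe-quoted strings are protected, and counted (Hollerith) strings are not handled at all.

```python
def strip_blanks(statement: str) -> str:
    """Remove blanks outside apostrophe-quoted strings (simplified).

    Assumption: counted (Hollerith) strings are ignored in this sketch.
    """
    out = []
    in_quote = False
    for ch in statement:
        if ch == "'":
            in_quote = not in_quote  # a doubled '' toggles twice, which is harmless here
            out.append(ch)
        elif ch == " " and not in_quote:
            continue  # blanks outside quotes carry no meaning
        else:
            out.append(ch)
    return "".join(out)


def classify(statement: str) -> str:
    """Guess what a blank-free statement is.

    Deliberately naive: it does not look inside quotes or parentheses,
    so it only behaves sensibly on simple statements like the ones below.
    """
    s = strip_blanks(statement).upper()
    if s.startswith("DO") and "=" in s:
        head, tail = s.split("=", 1)
        # A DO loop has a statement label after DO and a comma after '='.
        if "," in tail and head[2:3].isdigit():
            return "do-loop"
    if "=" in s:
        return "assignment"
    return "other"


print(strip_blanks("DO 10 I = 1, 5"))   # prints: DO10I=1,5
print(strip_blanks("DO 10 I = 1.5"))    # prints: DO10I=1.5
print(strip_blanks("PRINT *, 'A B'"))   # prints: PRINT*,'A B'
print(classify("DO 10 I = 1, 5"))       # prints: do-loop
print(classify("DO 10 I = 1.5"))        # prints: assignment
```

The two DO lines differ by a single character, yet once the blanks are gone the second one reads as an assignment to a variable named DO10I. The parser cannot tell which it has until it reaches the comma or the period, which is the "mess" the message refers to.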
Re: Parsing (was: New way to do UCB lookups)
On 11/19/2012 2:56 PM, Paul Gilmartin wrote:

> Classic Latin was written with no interword separators.

Interesting. I didn't know that. Japanese is written with no interword separators also.

--
Kind regards,
-Steve Comstock
The Trainer's Friend, Inc.
303-355-2752
http://www.trainersfriend.com

* To get a good Return on your Investment, first make an investment!
  + Training your people is an excellent investment
* Try our tool for calculating your Return On Investment for training dollars at http://www.trainersfriend.com/ROI/roi.html
Re: Parsing (was: New way to do UCB lookups)
I was told that many old languages had no interword spacing, mainly because that wasted precious writing material. I was also told that old Hebrew omitted the vowels to save space, which is why some words in the Torah are uncertain as to which word was meant.

mgnsntncwthnvwlsndnspcs (Imagine a sentence with no vowels and no spaces).

--
John McKown
Systems Engineer IV
IT Administrative Services Group
HealthMarkets®

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Steve Comstock
Sent: Monday, November 19, 2012 4:00 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Parsing (was: New way to do UCB lookups)

On 11/19/2012 2:56 PM, Paul Gilmartin wrote:
> Classic Latin was written with no interword separators.

Interesting. I didn't know that. Japanese is written with no interword separators also.
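The vowel-less, space-less sentence in the message above can be reproduced mechanically. The following is a small Python sketch of that transformation; the function name `squeeze` is invented for this example, and it treats only a, e, i, o and u as vowels, which models the English joke rather than Hebrew orthography.

```python
VOWELS = set("aeiou")  # assumption: "y" counts as a consonant here


def squeeze(sentence: str) -> str:
    """Lower-case the sentence and keep only its consonant letters."""
    return "".join(
        ch for ch in sentence.lower()
        if ch.isalpha() and ch not in VOWELS
    )


print(squeeze("Imagine a sentence with no vowels and no spaces"))
# prints: mgnsntncwthnvwlsndnspcs
```

The transformation is lossy, which is the point about uncertain readings: `squeeze("no")` and `squeeze("in")` both give `"n"`, so the reader has to recover the intended word from context.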
Re: Parsing (was: New way to do UCB lookups)
On Nov 19, 2012, at 3:56 PM, Paul Gilmartin paulgboul...@aim.com wrote:

> Classic Latin was written with no interword separators.

My Greek professor once told us all punctuation (and spaces between words are punctuation) is a crutch for poor readers. I'll keep my crutch, thank you very much.

--
Curtis Pew (c@its.utexas.edu)
ITS Systems Core
The University of Texas at Austin
Re: Parsing (was: New way to do UCB lookups)
Typically in modern languages the vowel points, diacritic markings, syllabic stress markers, etc., are only used in printed works that are used by beginning learners of those languages. Being a beginning learner in Greek once again (and this time no drop-out), I have happily discovered that modern Greek texts atypically have syllabic stress markers in each word.

My Latin teacher told me the same thing 50+ years ago - that punctuation, inter-word spacing, capitalization, etc., were never necessary until people stopped thinking.

Delving into other languages is a good way to expand one's horizons and diminish one's provinciality. Like anything else we learn to do, I would wager that reading and writing in any language without punctuation, capitalization, and spacing would get much easier after the first few thousand hours of practice. :-)

Bill Fairchild
Programmer
Rocket Software
408 Chamberlain Park Lane * Franklin, TN 37069-2526 * USA
t: +1.617.614.4503 * e: bfairch...@rocketsoftware.com * w: www.rocketsoftware.com

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Pew, Curtis G
Sent: Monday, November 19, 2012 4:25 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Parsing (was: New way to do UCB lookups)

My Greek professor once told us all punctuation (and spaces between words are punctuation) is a crutch for poor readers. I'll keep my crutch, thank you very much.
Re: Parsing (was: New way to do UCB lookups)
This isn't a complete illustrative example of what you refer to, but in some languages this is still true to a certain extent today. Some Finnish words have all sorts of grammar built into them, yet are still considered one word:

ikä = age
ikävä = miss (you), too bad
ikävystyä = to miss someone, be bored
ikävystyneisyys = boredom
ikävystyneisyydessä = in boredom
ikävystyneisyydessäänkään = not even in his boredom
...

That is for me a funny example, but not at all extreme. German has a lot of compound words that have no spaces. Finnish, too. My example was a single word but I could have made it longer by compounding it.

Lindy

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Bill Fairchild
Sent: Tuesday, November 20, 2012 12:44 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Parsing (was: New way to do UCB lookups)

Typically in modern languages the vowel points, diacritic markings, syllabic stress markers, etc., are only used in printed works that are used by beginning learners of those languages. Being a beginning learner in Greek once again (and this time no drop-out), I have happily discovered that modern Greek texts atypically have syllabic stress markers in each word. My Latin teacher told me the same thing 50+ years ago - that punctuation, inter-word spacing, capitalization, etc., were never necessary until people stopped thinking. Delving into other languages is a good way to expand one's horizons and diminish one's provinciality. Like anything else we learn to do, I would wager that reading and writing in any language without punctuation, capitalization, and spacing would get much easier after the first few thousand hours of practice. :-)
Re: Parsing (was: New way to do UCB lookups)
:: -----Original Message-----
:: From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Steve Comstock
:: Sent: Monday, November 19, 2012 2:00 PM
:: To: IBM-MAIN@LISTSERV.UA.EDU
:: Subject: Re: Parsing (was: New way to do UCB lookups)
::
:: On 11/19/2012 2:56 PM, Paul Gilmartin wrote:
:: > Classic Latin was written with no interword separators.
::
:: Interesting. I didn't know that. Japanese is written with no
:: interword separators also.

According to one of the folks I worked with over there, on the rare occasion when the character sequence is not sufficient to determine where the word break is, they will use a dot (think period or decimal point) to separate the characters.
Re: Parsing (was: New way to do UCB lookups)
:: -----Original Message-----
:: From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of McKown, John
:: Sent: Monday, November 19, 2012 2:07 PM
:: To: IBM-MAIN@LISTSERV.UA.EDU
:: Subject: Re: Parsing (was: New way to do UCB lookups)
::
:: I was told that many old languages had no interword spacing mainly
:: because that wasted precious writing material. I was also told that old
:: Hebrew omitted the vowels to save space, which is why some words in
:: the Torah are uncertain as to which word was meant.
:: mgnsntncwthnvwlsndnspcs (Imagine a sentence with no vowels and no
:: spaces).

It is not just old Hebrew; many letter combinations have an implied vowel, which can make the explicit vowel superfluous for the fluent.