Re: Parsing (was: New way to do UCB lookups)

2012-11-20 Thread Shmuel Metz (Seymour J.)
In <a6b9336cdb62bb46b9f8708e686a7ea0116565f...@nrhmms8p02.uicnrh.dom>,
on 11/19/2012
   at 04:06 PM, McKown, John <john.mck...@healthmarkets.com> said:

> I was told that many old languages had no interword spacing
> mainly because that wasted precious writing material. I was also
> told that old Hebrew omitted the vowels to save space,

No; there were no vowel markings to omit; those were invented later[1],
as were the cantillation marks[2] (trop). AFAIK the interword spacing
came before the vowel marks.

> which is why some words in the Torah are uncertain as to which word
> was meant.

There are also issues with consonants.

[1] Wikipedia http://en.wikipedia.org/wiki/Niqqud claims
Early Middle Ages.

[2] http://en.wikipedia.org/wiki/Cantillation

-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 Atid/2        <http://patriot.net/~shmuel>
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Parsing (was: New way to do UCB lookups)

2012-11-20 Thread McKown, John
Gee, that sort of thing is why I was scared off of taking German in college. 
Unfortunately, in my ignorance, I took Russian instead. Long words with really 
funny looking characters. <grin/>

-- 
John McKown
Systems Engineer IV
IT

Administrative Services Group

HealthMarkets®

9151 Boulevard 26 . N. Richland Hills . TX 76010
(817) 255-3225 phone .
john.mck...@healthmarkets.com . www.HealthMarkets.com

Confidentiality Notice: This e-mail message may contain confidential or 
proprietary information. If you are not the intended recipient, please contact 
the sender by reply e-mail and destroy all copies of the original message. 
HealthMarkets® is the brand name for products underwritten and issued by the 
insurance subsidiaries of HealthMarkets, Inc. -The Chesapeake Life Insurance 
Company®, Mid-West National Life Insurance Company of TennesseeSM and The MEGA 
Life and Health Insurance Company.SM




Re: Parsing (was: New way to do UCB lookups)

2012-11-20 Thread Martin Packer
Except that Cyrillic is an easy alphabet to learn - at least upper case 
is. :-) And it's particularly sensible with its single characters for dzh, 
ch, sh, ts, ya etc.

And German is sensible too, apart from the boot verb. :-)

Cheers, Martin

Martin Packer,
zChampion, Principal Systems Investigator,
Worldwide Banking Center of Excellence, IBM

+44-7802-245-584

email: martin_pac...@uk.ibm.com

Twitter / Facebook IDs: MartinPacker
Blog: 
https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker



Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU








Parsing (was: New way to do UCB lookups)

2012-11-19 Thread Paul Gilmartin
On Mon, 19 Nov 2012 21:39:57 +, Lindy Mayfield wrote:

> It gets me all Lewis Carroll just thinking about it.  I cannot even imagine
> how to create something like that SQL in Finnish.  Something so simple as
> that, I cannot even think how a computer could parse it written in an
> agglutinative language.  Though I am a bear of very little brain, so I'm sure
> it could be done.  :-)

Wouldn't this be somewhat like FORTRAN, where the lexical analyzer first removes
_all_[1] blanks, rendering the source code maximally agglutinative, then attempts
to parse the mess so created?

[1] Well, except in quoted or counted text strings.
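
To make the FORTRAN comparison concrete, here is a minimal Python sketch of blank-stripping followed by parsing. It is an illustration under stated assumptions, not how any real compiler does it: the names strip_blanks and classify are hypothetical, only single-quoted strings are protected (counted/Hollerith strings are not handled), and a comma inside parentheses would fool the DO-loop pattern.

```python
import re


def strip_blanks(stmt: str) -> str:
    """Drop every blank that is not inside a single-quoted string."""
    out = []
    in_quote = False
    for ch in stmt:
        if ch == "'":
            in_quote = not in_quote
        if ch == " " and not in_quote:
            continue
        out.append(ch)
    return "".join(out)


# After packing, a DO loop is only recognizable by the comma after '='.
DO_LOOP = re.compile(r"^DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)$")
ASSIGN = re.compile(r"^([A-Z][A-Z0-9]*)=(.+)$")


def classify(stmt: str) -> str:
    """Classify an upper-case statement as a DO loop or an assignment."""
    packed = strip_blanks(stmt)
    m = DO_LOOP.match(packed)
    if m:
        return f"DO loop: label={m.group(1)} var={m.group(2)}"
    m = ASSIGN.match(packed)
    if m:
        return f"assignment: {m.group(1)} = {m.group(2)}"
    return "unknown"


print(strip_blanks("X = 'A B'"))    # X='A B'
print(classify("DO 10 I = 1, 10"))  # DO loop: label=10 var=I
print(classify("DO 10 I = 1.10"))   # assignment: DO10I = 1.10
```

The last two lines differ by one character, yet once the blanks are gone the second one reads as an assignment to a variable named DO10I. The parser cannot decide what it is looking at until it has scanned past the equals sign.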

> So to bring it a bit back on to topic, English can be weird, but sometimes
> quite useful in its own way.

Classic Latin was written with no interword separators.

-- gil



Re: Parsing (was: New way to do UCB lookups)

2012-11-19 Thread Steve Comstock

On 11/19/2012 2:56 PM, Paul Gilmartin wrote:

> Classic Latin was written with no interword separators.


Interesting. I didn't know that. Japanese is written with no
interword separators also.



--

Kind regards,

-Steve Comstock
The Trainer's Friend, Inc.

303-355-2752
http://www.trainersfriend.com

* To get a good Return on your Investment, first make an investment!
  + Training your people is an excellent investment

* Try our tool for calculating your Return On Investment
for training dollars at
  http://www.trainersfriend.com/ROI/roi.html



Re: Parsing (was: New way to do UCB lookups)

2012-11-19 Thread McKown, John
I was told that many old languages had no interword spacing mainly because that 
wasted precious writing material. I was also told that old Hebrew omitted the 
vowels to save space, which is why some words in the Torah are uncertain as to 
which word was meant. mgnsntncwthnvwlsndnspcs (Imagine a sentence with no 
vowels and no spaces).
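
A small Python sketch of that squeezing, and of why it makes words ambiguous. The function name squeeze is made up for illustration, and it treats only a, e, i, o, u as vowels (an assumption; "y" is kept).

```python
VOWELS = set("aeiou")  # assumption: "y" counts as a consonant here


def squeeze(text: str) -> str:
    """Lower-case the text, then drop vowels, spaces and punctuation."""
    return "".join(
        ch for ch in text.lower() if ch.isalpha() and ch not in VOWELS
    )


print(squeeze("Imagine a sentence with no vowels and no spaces"))
# mgnsntncwthnvwlsndnspcs

# Several different words collapse to the same consonant skeleton.
print({word: squeeze(word) for word in ["bad", "bed", "bid", "bud"]})
# {'bad': 'bd', 'bed': 'bd', 'bid': 'bd', 'bud': 'bd'}
```

The mapping only goes one way: getting from "bd" back to the intended word takes context, which is the uncertainty described above.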



Re: Parsing (was: New way to do UCB lookups)

2012-11-19 Thread Pew, Curtis G
On Nov 19, 2012, at 3:56 PM, Paul Gilmartin <paulgboul...@aim.com> wrote:

> Classic Latin was written with no interword separators.

My Greek professor once told us all punctuation (and spaces between words are 
punctuation) is a crutch for poor readers. I'll keep my crutch, thank you 
very much.
 
-- 
Curtis Pew (c@its.utexas.edu)
ITS Systems Core
The University of Texas at Austin



Re: Parsing (was: New way to do UCB lookups)

2012-11-19 Thread Bill Fairchild
Typically in modern languages the vowel points, diacritic markings, syllabic 
stress markers, etc., are only used in printed works that are used by beginning 
learners of those languages.  Being a beginning learner in Greek once again 
(and this time no drop-out), I have happily discovered that modern Greek texts 
atypically have syllabic stress markers in each word.

My Latin teacher told me the same thing 50+ years ago - that punctuation, 
inter-word spacing, capitalization, etc., were never necessary until people 
stopped thinking.  Delving into other languages is a good way to expand one's 
horizons and diminish one's provinciality.  Like anything else we learn to do, 
I would wager that reading and writing in any language without punctuation, 
capitalization, and spacing would get much easier after the first few thousand 
hours of practice.  :-)

Bill Fairchild
Programmer
Rocket Software
408 Chamberlain Park Lane * Franklin, TN 37069-2526 * USA
t: +1.617.614.4503 *  e: bfairch...@rocketsoftware.com * w: 
www.rocketsoftware.com




Re: Parsing (was: New way to do UCB lookups)

2012-11-19 Thread Lindy Mayfield
this isn't a complete illustrative example of what you refer to, but even 
today this is still true to a certain extent in some languages.  some Finnish 
words have all sorts of grammar built into them, yet are still considered one 
word:
 
ikä = age
ikävä = miss (you), too bad
ikävystyä = to miss someone, be bored
ikävystyneisyys = boredom
ikävystyneisyydessä = in boredom
ikävystyneisyydessäänkään = not even in his boredom ...

that is for me a funny example, but not at all extreme.  German has a lot of 
compound words that have no spaces.  Finnish, too.  My example was a single 
word but I could have made it longer by compounding it.  

Lindy



Re: Parsing (was: New way to do UCB lookups)

2012-11-19 Thread retired mainframer
:: -Original Message-
:: From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
:: Behalf Of Steve Comstock
:: Sent: Monday, November 19, 2012 2:00 PM
:: To: IBM-MAIN@LISTSERV.UA.EDU
:: Subject: Re: Parsing (was: New way to do UCB lookups)
::
:: Interesting. I didn't know that. Japanese is written with no
:: interword separators also.

According to one of the folks I worked with over there, on the rare occasion
when the character sequence is not sufficient to determine where the word
break is, they will use a dot (think period or decimal point) to separate
the characters.



Re: Parsing (was: New way to do UCB lookups)

2012-11-19 Thread retired mainframer
:: -Original Message-
:: From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
:: Behalf Of McKown, John
:: Sent: Monday, November 19, 2012 2:07 PM
:: To: IBM-MAIN@LISTSERV.UA.EDU
:: Subject: Re: Parsing (was: New way to do UCB lookups)
::
:: I was told that many old languages had no interword spacing mainly
:: because that wasted precious writing material. I was also told that old
:: Hebrew omitted the vowels to save space, which is why some words in
:: the Torah are uncertain as to which word was meant.
:: mgnsntncwthnvwlsndnspcs (Imagine a sentence with no vowels and no
:: spaces).

It's not just old Hebrew; many letter combinations have an implied vowel,
which can make the explicit vowel superfluous for the fluent.
