Re: [silk] The weirdest languages
Black Swans [1] like the Mule [2]. Curse you, Udhay, for that image of the Mule in a tutu en pointe.
Re: [silk] The weirdest languages
On Sat, Jul 6, 2013 at 10:52 AM, Srini RamaKrishnan che...@gmail.com wrote:

It won't be the most technically efficient language, but the one geopolitics elects as the winner. English and Mandarin are the only two real contestants in this world view, and their present hegemony is thanks mainly to a violent imperial past, and has nothing to do with technical brilliance.

Yup. Similarly, I like to tell startups that a technology edge is not a sustainable business advantage, since technology can be acquired (bought/copied/whatever) rather more easily than other kinds of entry barriers. This is not a message that a technologist likes to hear, though. :)

Udhay
-- ((Udhay Shankar N)) ((udhay @ pobox.com)) ((www.digeratus.com))
Re: [silk] The weirdest languages
Of course you are right. That is precisely why we speak Latin and Greek to this day.

Indrajit Gupta

On Jul 6, 2013, at 10:52 AM, Srini RamaKrishnan che...@gmail.com wrote:

On Fri, Jul 5, 2013 at 5:49 PM, Udhay Shankar N ud...@pobox.com wrote:

Much more, including the full spreadsheet with all 21 'weirdness features' for all the languages, at the URL below. Also, it amuses me that this list says the most 'normal' language is Hindi. :-)

It depresses me a little to say this, but market share matters more than features in the end. The way we are headed in a hundred years or less we will all speak the same language out of practicality for the most part. It won't be the most technically efficient language, but the one geopolitics elects as the winner. English and Mandarin are the only two real contestants in this world view, and their present hegemony is thanks mainly to a violent imperial past, and has nothing to do with technical brilliance.
Re: [silk] The weirdest languages
On Sat, Jul 6, 2013 at 11:55 AM, Bonobashi bonoba...@yahoo.co.in wrote:

Of course you are right. That is precisely why we speak Latin and Greek to this day.

[I claim that] Cheeni is right, but he oversimplified to exclude Black Swans [1] like the Mule [2].

Udhay, maniacally mixing and mangling metaphors

[1] https://en.wikipedia.org/wiki/Black_swan_theory
[2] https://en.wikipedia.org/wiki/Mule_%28Foundation%29

-- ((Udhay Shankar N)) ((udhay @ pobox.com)) ((www.digeratus.com))
Re: [silk] The weirdest languages
I have a headache. I think I shall retire from the fray, preferably to a Sri Lankan beach free of nerds who know too much.

Indrajit Gupta

On Jul 6, 2013, at 12:09 PM, Udhay Shankar N ud...@pobox.com wrote:

On Sat, Jul 6, 2013 at 11:55 AM, Bonobashi bonoba...@yahoo.co.in wrote:

Of course you are right. That is precisely why we speak Latin and Greek to this day.

[I claim that] Cheeni is right, but he oversimplified to exclude Black Swans [1] like the Mule [2].

Udhay, maniacally mixing and mangling metaphors

[1] https://en.wikipedia.org/wiki/Black_swan_theory
[2] https://en.wikipedia.org/wiki/Mule_%28Foundation%29

-- ((Udhay Shankar N)) ((udhay @ pobox.com)) ((www.digeratus.com))
Re: [silk] The weirdest languages
On Fri, 2013-07-05 at 17:49 +0530, Udhay Shankar N wrote:

Thoughts?

I would say watch out! Don't take this stuff too seriously. Linguists have done a lot of bullshitting in the past and will continue to do so for the foreseeable future. University language departments don't get funding easily and they are quite capable of coming up with theories that attract the attention of some sucker who will fund them to come up with more crap.

shiv
Re: [silk] The weirdest languages
On Fri, 2013-07-05 at 17:49 +0530, Udhay Shankar N wrote:

http://idibon.com/the-weirdest-languages/ We’re in the business of

We? In business

We includes these people:

Robert Munro, Chief Executive Officer
Rob drives the company’s vision, decisions, and overall strategic direction. He is a world leader in applying big data analytics to human communications, having worked in many diverse environments, from Sierra Leone, Haiti and the Amazon to London, Sydney and San Francisco. He completed a PhD in Computational Linguistics

Tyler Schnoebelen, Senior Data Scientist
Tyler finds the patterns in data that make it meaningful. He has ten years of experience in UX design/research in Silicon Valley and a PhD from Stanford. His work there included experimental psycholinguistics, fieldwork on endangered languages, and a dissertation on emotion (he got his BA at Yale studying playwriting and poetry)

Believe at your own risk.
Re: [silk] The weirdest languages
Hmmm. Would out of India and Aryan invasion theory have anything to do with this attack of jaundice, Shiv? Anything at all? No, nothing? Oh, all right, then. Just asked.

Indrajit Gupta

On Jul 6, 2013, at 2:41 PM, SS cybers...@gmail.com wrote:

On Fri, 2013-07-05 at 17:49 +0530, Udhay Shankar N wrote:

Thoughts?

I would say watch out! Don't take this stuff too seriously. Linguists have done a lot of bullshitting in the past and will continue to do so for the foreseeable future. University language departments don't get funding easily and they are quite capable of coming up with theories that attract the attention of some sucker who will fund them to come up with more crap.

shiv
Re: [silk] The weirdest languages
On Sat, 2013-07-06 at 15:23 +0530, Bonobashi wrote:

Would out of India and Aryan invasion theory have anything to do with this attack of jaundice, Shiv? Anything at all? No, nothing? Oh, all right, then. Just asked.

My late father, after acquiring the degrees BS, MS and PhD used to say that they stand for Bullshit, More of Same and Piled higher and deeper. I thought it was a joke until I started digging into the work of linguistics departments. The bullshitting started decades, if not over a century ago. What we see today is stuff that is built upon the original stuff - piled higher and deeper.

The big take away lesson that I got from that is when it comes to new language theories
1. Be rigid
2. Be aggressive
3. Misquote, misinterpret and mislead to your heart's content because no one else will understand it and linguists are all doing the same thing anyway.
4. Accuse others of bigotry and less than honourable motives.

Incidentally if you understood what was in that blog please post a summary in five words or less.

shiv
Re: [silk] The weirdest languages
Summary in five words: Linguistics earns less than Medicine.

On Jul 6, 2013, at 7:08 PM, SS cybers...@gmail.com wrote:

On Sat, 2013-07-06 at 15:23 +0530, Bonobashi wrote:

Would out of India and Aryan invasion theory have anything to do with this attack of jaundice, Shiv? Anything at all? No, nothing? Oh, all right, then. Just asked.

My late father, after acquiring the degrees BS, MS and PhD used to say that they stand for Bullshit, More of Same and Piled higher and deeper. I thought it was a joke until I started digging into the work of linguistics departments. The bullshitting started decades, if not over a century ago. What we see today is stuff that is built upon the original stuff - piled higher and deeper.

The big take away lesson that I got from that is when it comes to new language theories
1. Be rigid
2. Be aggressive
3. Misquote, misinterpret and mislead to your heart's content because no one else will understand it and linguists are all doing the same thing anyway.
4. Accuse others of bigotry and less than honourable motives.

Incidentally if you understood what was in that blog please post a summary in five words or less.

shiv
Re: [silk] The weirdest languages
On Sat, Jul 06, 2013 at 07:54:53PM +0530, Bonobashi wrote:

Summary in five words: Linguistics earns less than Medicine.

Not so sure about that anymore.
Re: [silk] The weirdest languages
That was a five word summary of an existing text. I have been informed that a useful adage for the circumstances is GIGO.

Indrajit Gupta

On Jul 6, 2013, at 10:11 PM, Eugen Leitl eu...@leitl.org wrote:

On Sat, Jul 06, 2013 at 07:54:53PM +0530, Bonobashi wrote:

Summary in five words: Linguistics earns less than Medicine.

Not so sure about that anymore.
Re: [silk] The weirdest languages
On 07/06/2013 02:39 AM, Udhay Shankar N wrote:

On Sat, Jul 6, 2013 at 11:55 AM, Bonobashi bonoba...@yahoo.co.in wrote:

Of course you are right. That is precisely why we speak Latin and Greek to this day.

[I claim that] Cheeni is right, but he oversimplified to exclude Black Swans [1] like the Mule [2].

Udhay, maniacally mixing and mangling metaphors

[1] https://en.wikipedia.org/wiki/Black_swan_theory
[2] https://en.wikipedia.org/wiki/Mule_%28Foundation%29

As an avowed Foundation fan, I approve of this message. (hopefully, I've been pegged by now for my preference.) And my apologies to Deepak for not responding to his excellent link to youtube. I've been travelling, and without my normal internet access.

-- Violence is the last refuge of the incompetent.
[silk] The weirdest languages
Much more, including the full spreadsheet with all 21 'weirdness features' for all the languages, at the URL below. Also, it amuses me that this list says the most 'normal' language is Hindi. :-)

Thoughts?

Udhay

http://idibon.com/the-weirdest-languages/

We’re in the business of natural language processing with lots of different languages. In the last six months, we’ve worked on (big breath): English, Portuguese (Brazilian and from Portugal), Spanish, Italian, French, Russian, German, Turkish, Arabic, Japanese, Greek, Mandarin Chinese, Persian, Polish, Dutch, Swedish, Serbian, Romanian, Korean, Hungarian, Bulgarian, Hindi, Croatian, Czech, Ukrainian, Finnish, Hebrew, Urdu, Catalan, Slovak, Indonesian, Malay, Vietnamese, Bengali, Thai, and a bit on Latvian, Estonian, Lithuanian, Kurdish, Yoruba, Amharic, Zulu, Hausa, Kazakh, Sindhi, Punjabi, Tagalog, Cebuano, Danish, and Navajo.

Natural language processing (NLP) is about finding patterns in language: for example, taking heaps of unstructured text and automatically pulling out its structure. The open secret about NLP is that it’s very English-centric. English is far and away the language that linguists have worked on the most and it’s also the language that has the most available resources for computer science projects (and more data is almost always better in computer science). So one of the best ways to test an NLP system is to try languages other than English. The better that a system can deal with diverse data, the more confident that you can be in its ability to handle unseen data.

To this end, we might choose to define “weirdness” in terms of English. But that’s a pretty irritating definition. Let’s try to do something different.

A global method for linguistic outliers

The World Atlas of Language Structures evaluates 2,676 different languages in terms of a bunch of different language features.
These features include word order, types of sounds, ways of doing negation, and a lot of other things: 192 different language features in total. So rather than take an English-centric view of the world, WALS allows us to take a worldwide view. That is, we evaluate each language in terms of how unusual it is for each feature. For example, English word order is subject-verb-object; there are 1,377 languages that are coded for word order in WALS and 35.5% of them have SVO word order. Meanwhile only 8.7% of languages start with a verb (like Welsh, Hawaiian and Majang), so cross-linguistically, starting with a verb is unusual. For what it’s worth, 41.0% of the world’s languages are actually SOV order. (Aside: I’ve done some work with Hawaiian and Majang and that’s how I learned that verbs are a big commitment for me. I’m just not ready for verbs when I open my mouth.)

The data in WALS is fairly sparse, so we restrict ourselves to the 165 features that have at least 100 languages in them (at this stage we also knock out languages that have fewer than 10 of these, dropping us down to 1,693 languages).

Now, one problem is that if you just stop there you have a huge amount of collinearity. Part of this is just the nature of the features listed in WALS: there’s one for overall subject/object/verb order and then separate ones for object/verb and subject/verb. Ideally, we’d like to judge weirdness based on unrelated features. We can focus in on features that aren’t strongly correlated with each other (between two correlated features, we pick the one that has more languages coded for it). We end up with 21 features in total.

For each value that a language has, we calculate the relative frequency of that value for all the other languages that are coded for it. So if we had included the overall subject/object/verb order feature then English, being SVO, would’ve gotten a value of 0.355 (we actually normalized these values according to the overall entropy for each feature, so it wasn’t exactly 0.355, but you get the idea).
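A minimal sketch of the sparsity filter and the relative-frequency step described above, in Python. The data layout (a dict mapping each feature name to a {language: value} dict), the function names filter_sparse and relative_frequency, and the toy values are assumptions made for illustration; none of them come from the blog post or from WALS. The entropy normalization the post mentions is left out, since the post does not spell it out.

```python
from collections import Counter


def filter_sparse(data, min_langs_per_feature=100, min_features_per_lang=10):
    """Drop thinly coded features, then languages coded for too few of the rest.

    `data` maps feature name -> {language: value}. The default thresholds
    follow the numbers quoted in the post; the layout itself is an assumption.
    """
    kept = {f: vals for f, vals in data.items()
            if len(vals) >= min_langs_per_feature}
    # Count how many of the surviving features each language is coded for.
    coded = Counter(lang for vals in kept.values() for lang in vals)
    return {f: {lang: v for lang, v in vals.items()
                if coded[lang] >= min_features_per_lang}
            for f, vals in kept.items()}


def relative_frequency(feature_values, language):
    """Share of the *other* coded languages that have the same value."""
    value = feature_values[language]
    others = [v for lang, v in feature_values.items() if lang != language]
    if not others:
        raise ValueError("no other languages are coded for this feature")
    return Counter(others)[value] / len(others)


# Toy data, invented for illustration (not real WALS counts).
toy = {
    "word_order": {
        "English": "SVO", "Mandarin": "SVO", "Hindi": "SOV",
        "Japanese": "SOV", "Turkish": "SOV", "Welsh": "VSO",
    },
    "tone": {"Mandarin": "tonal", "English": "none"},
}

# With a toy threshold of 3 languages per feature, "tone" is dropped.
kept = filter_sparse(toy, min_langs_per_feature=3, min_features_per_lang=1)
print(sorted(kept))                                          # ['word_order']
print(relative_frequency(kept["word_order"], "English"))     # 0.2
```

On the toy data English scores 0.2 because exactly one of the five other languages shares its SVO value; a rarer value scores lower, which is the sense in which it is more unusual.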
The Weirdness Index is then an average across the 21 unique structural features. But because different features have different numbers of values and we want to reduce skewing, we actually take the harmonic mean (and because we want bigger numbers = more weird, we actually subtract the mean from one). In this blog post, I’ll only report languages that have a value filled in for at least two-thirds of features (239 languages).

The outlier (weirdest) languages

The language that is most different from the majority of all other languages in the world is a verb-initial tonal language spoken by 6,000 people in Oaxaca, Mexico, known as Chalcatongo Mixtec (aka San Miguel el Grande Mixtec). Number two is spoken in Siberia by 22,000 people: Nenets (that’s where we get the word parka from). Number three is Choctaw, spoken by about 10,000 people, mostly in Oklahoma.

But here’s the rub: some of the weirdest languages in the world are ones you’ve heard of: German, Dutch, Norwegian, Czech, Spanish, and Mandarin. And actually English is #33 in the Language Weirdness Index. The weirdest languages in the
Re: [silk] The weirdest languages
On 5 July 2013 17:49, Udhay Shankar N ud...@pobox.com wrote:

Also, it amuses me that this list says the most 'normal' language is Hindi

Abey yaar, isn't that what people north of the Hebbal Flyover have been trying to say all this time.

-gabin
-- They pay me to think... As long as I keep my mouth shut.
Re: [silk] The weirdest languages
On Fri, Jul 5, 2013 at 5:49 PM, Udhay Shankar N ud...@pobox.com wrote:

Much more, including the full spreadsheet with all 21 'weirdness features' for all the languages, at the URL below. Also, it amuses me that this list says the most 'normal' language is Hindi. :-)

It depresses me a little to say this, but market share matters more than features in the end. The way we are headed in a hundred years or less we will all speak the same language out of practicality for the most part. It won't be the most technically efficient language, but the one geopolitics elects as the winner. English and Mandarin are the only two real contestants in this world view, and their present hegemony is thanks mainly to a violent imperial past, and has nothing to do with technical brilliance.