Re: [silk] The weirdest languages

2013-07-08 Thread Ramakrishnan Sundaram
 Black
 Swans [1] like the Mule [2].

Curse you, Udhay, for that image of the Mule in a tutu en pointe.


Re: [silk] The weirdest languages

2013-07-06 Thread Udhay Shankar N
On Sat, Jul 6, 2013 at 10:52 AM, Srini RamaKrishnan che...@gmail.com wrote:

 It won't be the most technically efficient language, but the one
 geopolitics elects as the winner. English and Mandarin are the only
 two real contestants in this world view, and their present hegemony is
 thanks mainly to a violent imperial past, and has nothing to do with
 technical brilliance.

Yup. Similarly, I like to tell startups that a technology edge is not
a sustainable business advantage, since technology can be acquired
(bought/copied/whatever) rather more easily than other kinds of entry
barriers. This is not a message that a technologist likes to hear,
though. :)

Udhay
-- 
((Udhay Shankar N)) ((udhay @ pobox.com)) ((www.digeratus.com))



Re: [silk] The weirdest languages

2013-07-06 Thread Bonobashi
Of course you are right. That is precisely why we speak Latin and Greek to this 
day.

Indrajit Gupta

On Jul 6, 2013, at 10:52 AM, Srini RamaKrishnan che...@gmail.com wrote:

 On Fri, Jul 5, 2013 at 5:49 PM, Udhay Shankar N ud...@pobox.com wrote:
 Much more, including the full spreadsheet with all 21 'weirdness
 features' for all the languages, at the URL below.
 
 Also, it amuses me that this list says the most 'normal' language is
 Hindi. :-)
 
 It depresses me a little to say this, but market share matters more
 than features in the end. The way we are headed in a hundred years or
 less we will all speak the same language out of practicality for the
 most part.
 
 It won't be the most technically efficient language, but the one
 geopolitics elects as the winner. English and Mandarin are the only
 two real contestants in this world view, and their present hegemony is
 thanks mainly to a violent imperial past, and has nothing to do with
 technical brilliance.
 



Re: [silk] The weirdest languages

2013-07-06 Thread Udhay Shankar N
On Sat, Jul 6, 2013 at 11:55 AM, Bonobashi bonoba...@yahoo.co.in wrote:

 Of course you are right. That is precisely why we speak Latin and Greek to 
 this day.

[I claim that] Cheeni is right, but he oversimplified to exclude Black
Swans [1] like the Mule [2].

Udhay, maniacally mixing and mangling metaphors

[1] https://en.wikipedia.org/wiki/Black_swan_theory
[2] https://en.wikipedia.org/wiki/Mule_%28Foundation%29
-- 
((Udhay Shankar N)) ((udhay @ pobox.com)) ((www.digeratus.com))



Re: [silk] The weirdest languages

2013-07-06 Thread Bonobashi
I have a headache. I think I shall retire from the fray, preferably to a Sri 
Lankan beach free of nerds who know too much.

Indrajit Gupta

On Jul 6, 2013, at 12:09 PM, Udhay Shankar N ud...@pobox.com wrote:

 On Sat, Jul 6, 2013 at 11:55 AM, Bonobashi bonoba...@yahoo.co.in wrote:
 
 Of course you are right. That is precisely why we speak Latin and Greek to 
 this day.
 
 [I claim that] Cheeni is right, but he oversimplified to exclude Black
 Swans [1] like the Mule [2].
 
 Udhay, maniacally mixing and mangling metaphors
 
 [1] https://en.wikipedia.org/wiki/Black_swan_theory
 [2] https://en.wikipedia.org/wiki/Mule_%28Foundation%29
 -- 
 ((Udhay Shankar N)) ((udhay @ pobox.com)) ((www.digeratus.com))
 



Re: [silk] The weirdest languages

2013-07-06 Thread SS
On Fri, 2013-07-05 at 17:49 +0530, Udhay Shankar N wrote:
 Thoughts?

I would say watch out! Don't take this stuff too seriously. Linguists
have done a lot of bullshitting in the past and will continue to do so
for the foreseeable future. University language departments don't get
funding easily and they are quite capable of coming up with theories
that attract the attention of some sucker who will fund them to come up
with more crap.

shiv




Re: [silk] The weirdest languages

2013-07-06 Thread SS
On Fri, 2013-07-05 at 17:49 +0530, Udhay Shankar N wrote:
 
 
 http://idibon.com/the-weirdest-languages/
 
 We’re in the business of 

We? In business

We includes these people:

Robert Munro
Chief Executive Officer
Rob drives the company’s vision, decisions, and overall strategic
direction. He is a world leader in applying big data analytics to human
communications, having worked in many diverse environments, from Sierra
Leone, Haiti and the Amazon to London, Sydney and San Francisco. He
completed a PhD in Computational Linguistics


Tyler Schnoebelen
Senior Data Scientist
Tyler finds the patterns in data that make it meaningful. He has ten
years of experience in UX design/research in Silicon Valley and a PhD
from Stanford. His work there included experimental psycholinguistics,
fieldwork on endangered languages, and a dissertation on emotion (he got
his BA at Yale studying playwriting and poetry)

Believe at your own risk.








Re: [silk] The weirdest languages

2013-07-06 Thread Bonobashi
Hmmm.

Would out of India and Aryan invasion theory have anything to do with this 
attack of jaundice, Shiv? Anything at all? No, nothing? Oh, all right, then. 
Just asked.

Indrajit Gupta

On Jul 6, 2013, at 2:41 PM, SS cybers...@gmail.com wrote:

 On Fri, 2013-07-05 at 17:49 +0530, Udhay Shankar N wrote:
 Thoughts?
 
 I would say watch out! Don't take this stuff too seriously. Linguists
 have done a lot of bullshitting in the past and will continue to do so
 for the foreseeable future. University language departments don't get
 funding easily and they are quite capable of coming up with theories
 that attract the attention of some sucker who will fund them to come up
 with more crap.
 
 shiv
 
 



Re: [silk] The weirdest languages

2013-07-06 Thread SS
On Sat, 2013-07-06 at 15:23 +0530, Bonobashi wrote:
 Would out of India and Aryan invasion theory have anything to do with
 this attack of jaundice, Shiv? Anything at all? No, nothing? Oh, all
 right, then. Just asked.

My late father, after acquiring the degrees BS, MS and PhD used to say
that they stand for Bullshit, More of Same and Piled higher and deeper. 

I thought it was a joke until I started digging into the work of
linguistics departments. The bullshitting started decades, if not over a
century ago. What we see today is stuff that is built upon the original
stuff - piled higher and deeper.

The big take away lesson that I got from that is when it comes to new
language theories
1. Be rigid
2. Be aggressive 
3. Misquote, misinterpret and mislead to your heart's content because no
one else will understand it and linguists are all doing the same thing
anyway.
4. Accuse others of bigotry and less than honourable motives.

Incidentally if you understood what was in that blog please post a
summary in five words or less. 

shiv







Re: [silk] The weirdest languages

2013-07-06 Thread Bonobashi
Summary in five words: 

Linguistics earns less than Medicine.

On Jul 6, 2013, at 7:08 PM, SS cybers...@gmail.com wrote:

 On Sat, 2013-07-06 at 15:23 +0530, Bonobashi wrote:
 Would out of India and Aryan invasion theory have anything to do with
 this attack of jaundice, Shiv? Anything at all? No, nothing? Oh, all
 right, then. Just asked.
 
 My late father, after acquiring the degrees BS, MS and PhD used to say
 that they stand for Bullshit, More of Same and Piled higher and deeper. 
 
 I thought it was a joke until I started digging into the work of
 linguistics departments. The bullshitting started decades, if not over a
 century ago. What we see today is stuff that is built upon the original
 stuff - piled higher and deeper.
 
 The big take away lesson that I got from that is when it comes to new
 language theories
 1. Be rigid
 2. Be aggressive 
 3. Misquote, misinterpret and mislead to your heart's content because no
 one else will understand it and linguists are all doing the same thing
 anyway.
 4. Accuse others of bigotry and less than honourable motives.
 
 Incidentally if you understood what was in that blog please post a
 summary in five words or less. 
 
 shiv
 
 
 
 
 



Re: [silk] The weirdest languages

2013-07-06 Thread Eugen Leitl
On Sat, Jul 06, 2013 at 07:54:53PM +0530, Bonobashi wrote:
 Summary in five words: 
 
 Linguistics earns less than Medicine.

Not so sure about that anymore.



Re: [silk] The weirdest languages

2013-07-06 Thread Bonobashi
That was a five word summary of an existing text. I have been informed that a 
useful adage for the circumstances is GIGO.

Indrajit Gupta

On Jul 6, 2013, at 10:11 PM, Eugen Leitl eu...@leitl.org wrote:

 On Sat, Jul 06, 2013 at 07:54:53PM +0530, Bonobashi wrote:
 Summary in five words: 
 
 Linguistics earns less than Medicine.
 
 Not so sure about that anymore.
 



Re: [silk] The weirdest languages

2013-07-06 Thread Landon Hurley
On 07/06/2013 02:39 AM, Udhay Shankar N wrote:
 On Sat, Jul 6, 2013 at 11:55 AM, Bonobashi bonoba...@yahoo.co.in wrote:
 
 Of course you are right. That is precisely why we speak Latin and Greek to 
 this day.
 
 [I claim that] Cheeni is right, but he oversimplified to exclude Black
 Swans [1] like the Mule [2].
 
 Udhay, maniacally mixing and mangling metaphors
 
 [1] https://en.wikipedia.org/wiki/Black_swan_theory
 [2] https://en.wikipedia.org/wiki/Mule_%28Foundation%29
 

As an avowed Foundation fan, I approve of this message. (hopefully, I've
been pegged by now for my preference.) And my apologies to Deepak for
not responding to his excellent link to youtube. I've been travelling,
and without my normal internet access.

-- 
Violence is the last refuge of the incompetent.



[silk] The weirdest languages

2013-07-05 Thread Udhay Shankar N
Much more, including the full spreadsheet with all 21 'weirdness
features' for all the languages, at the URL below.

Also, it amuses me that this list says the most 'normal' language is
Hindi. :-)

Thoughts?

Udhay

http://idibon.com/the-weirdest-languages/

We’re in the business of natural language processing with lots of
different languages. In the last six months, we’ve worked on (big
breath): English, Portuguese (Brazilian and from Portugal), Spanish,
Italian, French, Russian, German, Turkish, Arabic, Japanese, Greek,
Mandarin Chinese, Persian, Polish, Dutch, Swedish, Serbian, Romanian,
Korean, Hungarian, Bulgarian, Hindi, Croatian, Czech, Ukrainian,
Finnish, Hebrew, Urdu, Catalan, Slovak, Indonesian, Malay, Vietnamese,
Bengali, Thai, and a bit on Latvian, Estonian, Lithuanian, Kurdish,
Yoruba, Amharic, Zulu, Hausa, Kazakh, Sindhi, Punjabi, Tagalog, Cebuano,
Danish, and Navajo.

Natural language processing (NLP) is about finding patterns in
language—for example, taking heaps of unstructured text and
automatically pulling out its structure. The open secret about NLP is
that it’s very English-centric. English is far and away the language
that linguists have worked on the most and it’s also the language that
has the most available resources for computer science projects (and more
data is almost always better in computer science). So one of the best
ways to test an NLP system is to try languages other than English. The
better a system can deal with diverse data, the more confident you can
be in its ability to handle unseen data.

To this end, we might choose to define “weirdness” in terms of English.
But that’s a pretty irritating definition. Let’s try to do something
different.

A global method for linguistic outliers

The World Atlas of Language Structures evaluates 2,676 different
languages in terms of a bunch of different language features. These
features include word order, types of sounds, ways of doing negation,
and a lot of other things—192 different language features in total.

So rather than take an English-centric view of the world, WALS allows us
to take a worldwide view. That is, we evaluate each language in terms of
how unusual it is for each feature. For example, English word order is
subject-verb-object—there are 1,377 languages that are coded for word
order in WALS and 35.5% of them have SVO word order. Meanwhile only 8.7%
of languages start with a verb—like Welsh, Hawaiian and Majang—so
cross-linguistically, starting with a verb is unusual. For what it’s
worth, 41.0% of the world’s languages are actually SOV order. (Aside:
I’ve done some work with Hawaiian and Majang and that’s how I learned
that verbs are a big commitment for me. I’m just not ready for verbs
when I open my mouth.)
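
A minimal sketch of the per-feature frequency calculation described above. The function name and the six-language toy table are invented for illustration; the proportions it produces are not the WALS figures quoted in the post.

```python
from collections import Counter

def value_frequencies(codings):
    """Return each feature value's share of the languages coded for the feature.

    `codings` maps language name -> value. A language that is absent from
    the mapping is treated as not coded and is left out of the denominator.
    """
    counts = Counter(codings.values())
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

# Toy sample (hypothetical selection, far smaller than WALS).
word_order = {
    "English": "SVO", "Mandarin": "SVO", "Hindi": "SOV",
    "Japanese": "SOV", "Turkish": "SOV", "Welsh": "VSO",
}
freqs = value_frequencies(word_order)
```

In this toy sample the verb-initial value is the rarest, which is the sense in which the post calls a value unusual.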

The data in WALS is fairly sparse, so we restrict ourselves to the 165
features that have at least 100 languages in them (at this stage we also
knock out languages that have fewer than 10 of these—dropping us down to
1,693 languages).
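
The two-stage filter above could be sketched as follows. The post does not show its code, so the data layout (a dict of dicts) and the function name are assumptions; only the default thresholds of 100 and 10 come from the text.

```python
def filter_sparse(table, min_langs_per_feature=100, min_features_per_lang=10):
    """Drop thinly coded features, then drop thinly coded languages.

    `table` maps language -> {feature: value}; a missing key means the
    language is not coded for that feature.
    """
    # Stage 1: count how many languages are coded for each feature.
    feature_counts = {}
    for feats in table.values():
        for feature in feats:
            feature_counts[feature] = feature_counts.get(feature, 0) + 1
    kept_features = {
        f for f, n in feature_counts.items() if n >= min_langs_per_feature
    }

    # Stage 2: keep languages that still have enough of the kept features.
    filtered = {}
    for lang, feats in table.items():
        kept = {f: v for f, v in feats.items() if f in kept_features}
        if len(kept) >= min_features_per_lang:
            filtered[lang] = kept
    return filtered

# Tiny invented table, with thresholds lowered so the effect is visible.
toy = {
    "A": {"f1": "x", "f2": "y"},
    "B": {"f1": "x", "f2": "z", "f3": "q"},
    "C": {"f1": "w"},
}
result = filter_sparse(toy, min_langs_per_feature=2, min_features_per_lang=2)
```

Here feature f3 is dropped because only one language is coded for it, and language C is dropped because it has only one surviving feature.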

Now, one problem is that if you just stop there you have a huge amount
of collinearity. Part of this is just the nature of the features listed
in WALS—there’s one for overall subject/object/verb order and then
separate ones for object/verb and subject/verb. Ideally, we’d like to
judge weirdness based on unrelated features. We can focus in on features
that aren’t strongly correlated with each other (between two correlated
features, we pick the one that has more languages coded for it). We end
up with 21 features in total.
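
One way the pruning step might look, as a sketch only. The post does not say which association measure or cutoff it used, so the pairwise scores are taken as a precomputed input and the 0.5 threshold is an assumption; the feature names and numbers below are invented.

```python
def prune_correlated(coverage, association, threshold=0.5):
    """Greedily keep features that are not strongly associated with a kept one.

    `coverage` maps feature -> number of languages coded for it.
    `association` maps frozenset({a, b}) -> strength in [0, 1]; pairs that
    are missing are treated as unassociated. Visiting features in order of
    decreasing coverage means that, of two correlated features, the one
    with more languages coded is the one that survives.
    """
    kept = []
    for feat in sorted(coverage, key=lambda f: (-coverage[f], f)):
        if all(association.get(frozenset((feat, k)), 0.0) < threshold for k in kept):
            kept.append(feat)
    return kept

# Hypothetical coverage counts and association scores.
coverage = {"order_SOV": 300, "order_OV": 320, "order_SV": 310, "tone": 200}
association = {
    frozenset(("order_SOV", "order_OV")): 0.9,
    frozenset(("order_SOV", "order_SV")): 0.8,
    frozenset(("order_OV", "order_SV")): 0.3,
}
kept = prune_correlated(coverage, association)
```

With these invented numbers the combined word-order feature is discarded in favour of the two better-covered features it correlates with.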

For each value that a language has, we calculate the relative frequency
of that value for all the other languages that are coded for it. So if
we had included the subject/object/verb order feature then English,
being SVO, would’ve gotten a value of 0.355 (we actually normalized
these values according to the overall entropy for each feature, so it
wasn’t exactly 0.355, but you get the idea). The Weirdness Index is then
an average across the 21 unique structural features. But because
different features have different numbers of values and we want to
reduce skewing, we actually take the harmonic mean (and because we want
bigger numbers = more weird, we actually subtract the mean from one). In
this blog post, I’ll only report languages that have a value filled in
for at least two-thirds of features (239 languages).
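
A rough sketch of the index itself: one minus the harmonic mean of a language's per-feature relative frequencies, with the two-thirds coverage rule applied first. It deliberately omits the entropy normalization the post mentions, since the post does not spell that step out; the function name and sample frequencies are hypothetical.

```python
from statistics import harmonic_mean

def weirdness_index(freq_by_feature, n_features, min_coverage=2 / 3):
    """Return 1 - harmonic mean of the frequencies, or None if coverage is low.

    `freq_by_feature` maps feature -> relative frequency of this language's
    value for that feature; features the language is not coded for are
    simply absent. `n_features` is the total number of features in use.
    """
    if len(freq_by_feature) < min_coverage * n_features:
        return None  # too sparsely coded to report
    return 1.0 - harmonic_mean(list(freq_by_feature.values()))

# Invented frequencies for two imaginary languages over three features.
common = weirdness_index({"f1": 0.9, "f2": 0.9, "f3": 0.9}, n_features=3)
mixed = weirdness_index({"f1": 0.9, "f2": 0.9, "f3": 0.1}, n_features=3)
sparse = weirdness_index({"f1": 0.1}, n_features=3)
```

The harmonic mean is pulled down sharply by a single rare value, so the second language scores as much weirder than the first even though two of its three values are common.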

The outlier (weirdest) languages

The language that is most different from the majority of all other
languages in the world is a verb-initial tonal language spoken by 6,000
people in Oaxaca, Mexico, known as Chalcatongo Mixtec (aka San Miguel el
Grande Mixtec). Number two is spoken in Siberia by 22,000 people: Nenets
(that’s where we get the word parka from). Number three is Choctaw,
spoken by about 10,000 people, mostly in Oklahoma.

But here’s the rub—some of the weirdest languages in the world are ones
you’ve heard of: German, Dutch, Norwegian, Czech, Spanish, and Mandarin.
And actually English is #33 in the Language Weirdness Index.

The weirdest languages in the 

Re: [silk] The weirdest languages

2013-07-05 Thread gabin kattukaran
On 5 July 2013 17:49, Udhay Shankar N ud...@pobox.com wrote:
 Also, it amuses me that this list says the most 'normal' language is
 Hindi

Abey yaar, isn't that what people north of the Hebbal Flyover have
been trying to say all this time.

-gabin


--

They pay me to think... As long as I keep my mouth shut.



Re: [silk] The weirdest languages

2013-07-05 Thread Srini RamaKrishnan
On Fri, Jul 5, 2013 at 5:49 PM, Udhay Shankar N ud...@pobox.com wrote:
 Much more, including the full spreadsheet with all 21 'weirdness
 features' for all the languages, at the URL below.

 Also, it amuses me that this list says the most 'normal' language is
 Hindi. :-)

It depresses me a little to say this, but market share matters more
than features in the end. The way we are headed in a hundred years or
less we will all speak the same language out of practicality for the
most part.

It won't be the most technically efficient language, but the one
geopolitics elects as the winner. English and Mandarin are the only
two real contestants in this world view, and their present hegemony is
thanks mainly to a violent imperial past, and has nothing to do with
technical brilliance.