There are some good ideas here.

I guess that Nemo figured this out already, but the displayed languages are
based on languages spoken naturally in a country according to CLDR data. CLDR,
in turn, bases its data on information received from the UN and possibly
national statistics agencies. It definitely has mistakes: for example, Israel
doesn't include Yiddish, and last time I checked, the USA didn't have any
variant of Chinese (I might be wrong about this last point).
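
To make the per-country lookup concrete, here is a rough sketch of how such a
selection could work. It is only an illustration, not the actual code: the
table, the shares in it and the function name are all made up, and the real
CLDR data has a different shape. The cap of 16 is the limit mentioned later in
this message.

```python
# Hypothetical sample data: country code -> {language code: population share}.
# The shares are invented for illustration and are NOT taken from CLDR.
TERRITORY_LANGUAGES = {
    "IT": {"it": 0.95, "nap": 0.10, "sl": 0.002, "hr": 0.001, "ca": 0.0003},
}

# Maximum number of languages shown (the limit of 16 discussed in this thread).
MAX_LANGUAGES = 16


def languages_for_country(country, data=TERRITORY_LANGUAGES, limit=MAX_LANGUAGES):
    """Return up to `limit` language codes for a country, most spoken first.

    The lookup is per country only, so regional differences inside the
    country are invisible to it. Unknown countries yield an empty list.
    """
    shares = data.get(country.upper(), {})
    ranked = sorted(shares, key=lambda code: shares[code], reverse=True)
    return ranked[:limit]


print(languages_for_country("IT"))
```

Because the table is keyed by country alone, every user in Italy gets the same
list, wherever in the country they are.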

And yes, it gets data for the whole country and not for a region. Neapolitan
would, of course, be more relevant in Southern Italy, and Catalan would be more
relevant in Sardinia. Similarly, it would be much smarter to show Tatar,
Adyghe, Sakha, etc. only in the relevant regions in Russia. In a country as
massively multilingual as India this problem is even more acute. Unfortunately,
the geo info we have is only per country. I'd love to have it more

Slovenian and Croatian are not completely out of place, because there are
people who speak these languages in Italy.

Adding languages that are widely studied in a country is a valid use case.
Because the number of languages is limited to 16, it may sometimes be
preferable to show a widely studied language rather than a local language
spoken by a few people (as tragic as it is). We just need to figure out how to
collect and
store this data. Maybe getting relevant feedback from the communities is
acceptable as long as we don't have a comprehensive data source like CLDR for

