On 2020-02-23 16:32, Guido van Rossum wrote:
> Assuming that the reader is familiar with the example `Lottery ~
> Literacy + Wealth + Region` is *not* going to work. I have literally no
> idea from what field that is taken or what the purpose of the example
> is. Please don't expect that I can just Google it: I did, found
> https://www.statsmodels.org/stable/example_formulas.html, and I still
> have no idea what it's about.
Sorry, perhaps I should have given a bit more explanation. As I said,
"~" means "depends on". So in R, you do something like:
    model = some_statistical_model_function(Lottery ~ Literacy + Wealth + Region,
                                            some_data_table)
This means "make a model that predicts the value of Lottery based on
the values of Literacy, Wealth and Region", where the names Lottery,
Literacy etc. refer to columns in some_data_table, which is a tabular
data structure akin to a pandas DataFrame. So, again, `Lottery ~
Literacy + Wealth + Region` means "Lottery depends on Literacy, Wealth,
and Region". It doesn't really matter what names we use; "A ~ B + C"
would do just as well. The point is that it defines a relationship
between variables whose measurements we have as columns in a tabular
structure, and it means that we want a model where the variables on the
right of the tilde are the independent variables and the one on the left
is the dependent variable. "Y ~ X" means "predict Y using X".
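
To make the reading of the formula concrete, here is a rough Python sketch. The helpers `parse_formula` and `select_columns` are names I made up for illustration, the table is a plain dict standing in for a DataFrame, and the numbers are invented. It only handles the simple "Y ~ A + B" form and is not how R or any real library implements formulas.

```python
def parse_formula(formula):
    """Split 'Y ~ A + B' into ('Y', ['A', 'B']).

    Hypothetical helper: handles only this simple form.
    """
    left, right = formula.split("~")
    dependent = left.strip()
    independents = [name.strip() for name in right.split("+")]
    return dependent, independents


def select_columns(formula, table):
    """Look up the columns the formula names in a dict-of-lists table."""
    dependent, independents = parse_formula(formula)
    y = table[dependent]                               # the variable to predict
    X = {name: table[name] for name in independents}   # the predictors
    return y, X


# Made-up values, purely for illustration.
some_data_table = {
    "Lottery": [41, 38, 66],
    "Literacy": [37, 51, 13],
    "Wealth": [73, 22, 61],
    "Region": ["E", "N", "C"],
}

y, X = select_columns("Lottery ~ Literacy + Wealth + Region", some_data_table)
print("dependent column:", y)
print("independent columns:", list(X))
```

A real modelling function would go on to fit something to `y` and `X`; the sketch stops at the point where the formula has been turned into "which column is predicted from which".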
As you mentioned (in a part of your response I snipped) the precedence
of the operator is important. In this case we would want the operator
to have very low precedence, because we want it to mean `Lottery ~
(Literacy + Wealth + Region)`; that is, the dependent variable may
depend on some complicated expression involving combinations of the
independent variables.
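
The effect of precedence can be checked with Python's own parser. Python has no binary tilde, so in this sketch `<` is only a stand-in for an operator that binds more loosely than `+`, and `*` is a stand-in for one that binds more tightly. Note that `ast.unparse` needs Python 3.9 or later.

```python
import ast

# "<" stands in for a low-precedence binary operator.
low = ast.parse("Lottery < Literacy + Wealth + Region", mode="eval").body

# "*" stands in for an operator that binds tighter than "+".
high = ast.parse("Lottery * Literacy + Wealth + Region", mode="eval").body

# Low precedence: the whole sum ends up on the right-hand side.
low_right = ast.unparse(low.comparators[0])

# High precedence: the operator grabs only its nearest neighbour,
# and the remaining terms are added to that result afterwards.
high_innermost = ast.unparse(high.left.left)

print(type(low).__name__, "with right-hand side:", low_right)
print(type(high).__name__, "with innermost operand:", high_innermost)
```

With the low-precedence stand-in the top-level node is the comparison itself, which is the grouping a formula operator would want. With the high-precedence stand-in the top-level node is an addition, so the formula would have to be written with explicit parentheses.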
It's also worth noting that the tilde here isn't notation for any of
the work that the statistical model does. It's just a way of writing a
"formula" that relates the independent and dependent variables, but you
still have to pass that formula to some function that actually runs the
model.
All that said, given that we can already achieve the desired precedence
with parentheses, I'll reiterate that I don't think the lack of a binary
tilde operator is a real blocker to doing this kind of model
specification with Python expressions, so I don't think I'm in favor of
this proposal as it stands.
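
As a rough illustration of what I mean by using parentheses with existing operators, here is a minimal sketch. The classes `Terms` and `Formula` and the helper `var` are hypothetical names invented for this example, and the choice of `%` as the stand-in for the tilde is arbitrary; no existing library is implied to work this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formula:
    """Hypothetical record of 'dependent depends on independents'."""
    dependent: tuple
    independents: tuple


@dataclass(frozen=True)
class Terms:
    """Hypothetical container for one or more column names joined with +."""
    names: tuple

    def __add__(self, other):
        # A + B collects the column names of both sides.
        return Terms(self.names + other.names)

    def __mod__(self, other):
        # "%" is an arbitrary stand-in for the tilde: left depends on right.
        return Formula(dependent=self.names, independents=other.names)


def var(name):
    """Wrap a single column name so it can take part in expressions."""
    return Terms((name,))


Lottery, Literacy, Wealth, Region = map(
    var, ["Lottery", "Literacy", "Wealth", "Region"]
)

# "%" binds tighter than "+", so the parentheses are required here.
formula = Lottery % (Literacy + Wealth + Region)
print(formula)
```

In this sketch, leaving the parentheses out raises a TypeError, because `Lottery % Literacy` is evaluated first and the resulting `Formula` does not support `+`. So the cost of not having a dedicated low-precedence operator is one pair of parentheses.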
--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no
path, and leave a trail."
--author unknown