Hi
I'm a computational biologist in Montreal, and I've been working on
genomics and metabolomics at McGill University.
I've been working on Cobrapy, which uses Python's ast module to parse and
handle gene-protein-reaction rules (GPRs). This email is a bit long, because
I need to explain the context.
Each reaction (chemical transformation) has a GPR that uses boolean AND/OR to
describe which genes are necessary for it to happen. Genes that can substitute
for one another (any one of them is enough) have an OR relationship, and genes
that are all needed together (like the subunits of a protein complex) have an
AND relationship. In files, these relationships are stored as a string.
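To make the string form concrete, here is a minimal sketch of how such a rule looks once Python's ast module has parsed it. It assumes the rule is written with lowercase and/or, and the gene ids are made up for illustration:

```python
import ast

# A made-up rule: a complex of two subunits, each encoded by one of two genes.
rule = "(A1 or A2) and (B1 or B2)"

# mode="eval" yields an ast.Expression whose body is the boolean expression.
tree = ast.parse(rule, mode="eval")

# The outermost operator of the rule (And or Or).
top_op = type(tree.body.op).__name__

# Every gene id that appears in the rule, collected from the Name nodes.
genes = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})

print(top_op, genes)
```

The top-level node is a BoolOp with an And operator, and each parenthesised group is a nested BoolOp with an Or operator.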
I've been shifting the code from AST to sympy, so we can do more logical
comparisons and extend the capability.
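As an example of the kind of logical comparison I mean, this sketch checks whether two differently written rules are logically equivalent. The gene symbols are made up, and the XOR-plus-satisfiable test is just one way to do the check, not what cobrapy currently does:

```python
from sympy import symbols
from sympy.logic.boolalg import And, Or, Xor
from sympy.logic.inference import satisfiable

A1, A2, B1, B2 = symbols("A1 A2 B1 B2")

# The same rule written compactly and fully expanded.
compact = And(Or(A1, A2), Or(B1, B2))
expanded = Or(And(A1, B1), And(A1, B2), And(A2, B1), And(A2, B2))

# Two rules are equivalent when no gene assignment makes them differ,
# i.e. when their XOR has no satisfying assignment.
same_rule = satisfiable(Xor(compact, expanded)) is False

# A rule that drops the alternative genes is not equivalent.
different_rule = satisfiable(Xor(compact, And(A1, B1))) is False
```

A plain string comparison would call these two rules different, which is the main reason for wanting sympy here.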
It seems that there are two options:
1) Going from string directly to sympy using sympy_parse
This generally works well, but some GPRs are too long, since the convention
is to store the GPRs in files in fully expanded form (an OR of ANDs, i.e.
disjunctive normal form).
This means that if you have a complex that needs three units, and each unit
has two potential genes, the GPR can be written compactly as
(A1 OR A2) AND (B1 OR B2) AND (C1 OR C2)
But in the files it will be stored as
(A1 AND B1 AND C1) OR (A2 AND B1 AND C1) OR (A1 AND B2 AND C1), etc.,
for all eight combinations.
When you have 40 proteins, each having two genes or more, you get to
ridiculous numbers (11164) that crash Python.
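To illustrate how fast the expanded form grows, here is a small sketch using only the standard library. It assumes the simplest case, where every term takes exactly one gene from each unit; the helper names are made up:

```python
from itertools import product
from math import prod


def expand_rule(units):
    """List the AND terms of the expanded rule, one gene per unit."""
    return [" and ".join(combo) for combo in product(*units)]


def expanded_term_count(genes_per_unit):
    """Number of AND terms without building them: product of the unit sizes."""
    return prod(genes_per_unit)


units = [["A1", "A2"], ["B1", "B2"], ["C1", "C2"]]
terms = expand_rule(units)
expanded = " or ".join(f"({term})" for term in terms)

print(len(terms))                      # 8 terms for three units of two genes
print(expanded_term_count([2] * 40))   # 2**40 terms for forty units of two genes
```

The number of terms is the product of the number of genes per unit, so it doubles with every extra two-gene unit.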
2) String --> AST --> sympy
The ast module can deal with these very large GPRs, so one way is to parse a
GPR expression with ast and then run this function:
from ast import And, BoolOp, Expression, Name, Or

import sympy
import sympy.logic.boolalg as sp_boolalg


def ast2sympy(expr, level=0, names=None):
    """Convert a compiled ast to a gene_reaction_rule sympy expression.

    Parameters
    ----------
    expr : ast.Expression or other ast node
        Parsed gene reaction rule, e.g. ast.parse("a and b", mode="eval").
    level : int
        Internal use only.
    names : dict
        Dict where each key is a gene identifier and the value is the
        gene name. Use this to get a rule which uses names instead. This
        should be done for display purposes only. All gene_reaction_rule
        expressions which are computed with should use the id.

    Returns
    -------
    sympy expression
        The gene reaction rule (an empty string if there is no rule).
    """
    if isinstance(expr, Expression):
        return ast2sympy(expr.body, 0, names) if hasattr(expr, "body") else ""
    elif isinstance(expr, Name):
        # Use the display name if a mapping was given, otherwise the gene id.
        gene = names.get(expr.id, expr.id) if names else expr.id
        return sympy.symbols(gene)
    elif isinstance(expr, BoolOp):
        op = expr.op
        # Convert the operands recursively, then combine them.
        args = [ast2sympy(i, level + 1, names) for i in expr.values]
        if isinstance(op, Or):
            return sp_boolalg.Or(*args)
        elif isinstance(op, And):
            return sp_boolalg.And(*args)
        else:
            raise TypeError("unsupported operation " + op.__class__.__name__)
    elif expr is None:
        return ""
The rest of my code can be seen at
https://github.com/akaviaLab/cobrapy/blob/gpr-sympy/src/cobra/core/gene.py
(the relevant functions have sympy in their names).
My questions:
1) Is there another (better) way to do this? Is there some function that goes
from AST to sympy directly?
2) Which way do you think is better?
I'm leaning towards parse_sympy(), but having the cobrapy functions fall back
to string --> AST --> sympy if the string seems to have a large number of
entities.
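A rough sketch of what that dispatch could look like. The cutoff value and the function name are made up, and the right cutoff would have to be found by testing. It counts gene mentions with a regular expression rather than with ast, since parsing with ast just to count would already do most of the work of the second route:

```python
import re

# Hypothetical cutoff; not a measured value.
MAX_DIRECT_GENES = 500

_OPERATORS = {"and", "or"}


def choose_route(rule, max_genes=MAX_DIRECT_GENES):
    """Return "direct" for small rules and "via_ast" for very large ones."""
    # Split on whitespace and parentheses, then drop the boolean operators.
    tokens = re.findall(r"[^\s()]+", rule)
    n_genes = sum(1 for tok in tokens if tok.lower() not in _OPERATORS)
    return "direct" if n_genes <= max_genes else "via_ast"


small = choose_route("(A1 or A2) and (B1 or B2)")
large = choose_route(" or ".join(f"g{i}" for i in range(1000)))
```

Here small comes out as "direct" and large as "via_ast". The count is of gene mentions, not unique genes, which seems like the right measure because the length of the expanded string is what causes the trouble.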
Thank you,
Uri David