Hi Aaron,
Sorry, I am using sympy.parsing.sympy_parser.parse_expr() after replacing &
and | and doing a few more string modifications. That's what I described as
(1). That generally works, but crashes on very very large expressions.
I can attach the string that crashes it if you're interested.
The function that parses using parse_expr() is
def parse_gpr_sympy(str_expr):
"""parse gpr into SYMPY
Parameters
----------
str_expr : string
string with the gene reaction rule to parse
Returns
-------
tuple
elements SYMPY expression and gene_ids as a set
"""
str_expr = str_expr.strip()
if len(str_expr) == 0:
return None, set()
for char, escaped in replacements:
if char in str_expr:
str_expr = str_expr.replace(char, escaped)
escaped_str = keyword_re.sub("__cobra_escape__", str_expr)
escaped_str = number_start_re.sub("__cobra_escape__", escaped_str)
try:
escaped_str = re.sub(r"\b(or)", " |", escaped_str)
escaped_str = re.sub(r"\b(and)", " &", escaped_str)
sympy_exp = parse_expr(escaped_str,
transformations=standard_transformations,
evaluate=False)
except SympifyError as exc:
raise ValueError("unsupported operation " + repr(escaped_str), exc)
gene_set = set()
for node in list(sympy_exp.atoms()):
if node.name.startswith("__cobra_escape__"):
node.name = node.name[16:]
for char, escaped in replacements:
if escaped in node.name:
node.name = node.name.replace(escaped, char)
gene_set.add(node.name)
return sympy_exp, gene_set
(2) is using AST, and I'll see about converting it to a visitor.
Lark and Antir seem overkill, so I'll see about converting to a visitor
first.
Uri David
On Wednesday, October 28, 2020 at 6:00:17 PM UTC-4 [email protected] wrote:
> Hi Uri David.
>
> What do you mean by sympy_parse? Do you mean the functions in
> sympy.parsing, or something else? I think it should be possible to
> parse strings like that directly into SymPy expressions, though I'm
> not sure if anything built-in to SymPy can do it. Whatever you are
> using that is crashing on large inputs is probably implemented in an
> inefficient way, and can be improved. The types of strings you are
> dealing with seem relatively simple, but even so, you may benefit from
> using a real parser with a grammar file using something like Lark or
> Antlr.
>
> I don't think we have a direct ast to sympy converter. There is some
> ast related code in sympy.parsing, but it deals with evaluate=False,
> so it isn't quite what you are looking for. The one you have looks
> fine, although it may be cleaner to implement it using the ast visitor
> pattern (see https://docs.python.org/3/library/ast.html#ast.NodeVisitor).
>
> Another idea: if A1, B1, and so on are what you want to use as your
> symbol names, you can replace "AND" with "&" and "OR" with "|" and the
> string will parse directly as a SymPy expression (using parse_expr()
> or sympify()).
>
> Aaron Meurer
>
> On Wed, Oct 28, 2020 at 3:49 PM Uri David Akavia
> <[email protected]> wrote:
> >
> > Hi
> > I'm a computational biologist in Montreal, and I've been working on
> genomics and metaoblomics at McGill University.
> > I've been working on Cobrapy, which uses AST to parse and deal with Gene
> Rules (GPR). This email is going to be a bit long to explain context
> >
> > Each reaction (chemical transformation) has GPR that uses boolean AND/OR
> to describe which genes are necessary for it to happen. The idea is that
> genes where either is necessary have an OR relationship, and genes where
> you need multiple (like a protein complex) have AND. In files, these
> relationships are organized as a string.
> >
> > I've been shifting the code from AST to sympy, so we can do more logical
> comparisons and extend the capability.
> >
> > It seems that there are two options:
> > 1) Going from string directly to sympy using sympy_parse
> > This works generally well, but there are some GPRs that are too long,
> since the convention is to store the GPRs in files in maximal canonical
> form.
> > It means if you have a complex that needs three units, and each unit has
> two potential genes, the GPR can be logically
> > (A1 OR A2) AND (B1 OR B2) AND (C1 OR C2)
> > But in the files it will be as
> > (A1 AND B1 AND C1) OR (A2 AND B1 AND C1) OR (A1 AND B2 AND C1), etc.
> >
> > When you have 40 proteins, each having two genes or more, you get to
> ridiculous numbers (11164) that crash python.
> >
> > 2) String ---> AST --> sympy
> > AST can deal with these very very large GPRs, so one way is to parse a
> GPR extension and then run this function
> >
> > def ast2sympy(expr, level=0, names=None):
> > """convert compiled ast to gene_reaction_rule sympy expression
> >
> > Parameters
> > ----------
> > expr : str
> > string for a gene reaction rule, e.g "a and b"
> > level : int
> > internal use only
> > names : dict
> > Dict where each element id a gene identifier and the value is the
> > gene name. Use this to get a rule str which uses names instead. This
> > should be done for display purposes only. All gene_reaction_rule
> > strings which are computed with should use the id.
> >
> > Returns
> > ------
> > string
> > The gene reaction rule
> > """
> > if isinstance(expr, Expression):
> > return ast2sympy(expr.body, 0, names) if hasattr(expr, "body") else ""
> > elif isinstance(expr, Name):
> > return sympy.symbols(names.get(expr.id, expr.id)) if names else
> sympy.symbols(expr.id)
> > elif isinstance(expr, BoolOp):
> > op = expr.op
> > if isinstance(op, Or):
> > sympy_exp = sp_boolalg.Or(*[ast2sympy(i, level + 1, names) for i in
> expr.values])
> > elif isinstance(op, And):
> > sympy_exp = sp_boolalg.And(*[ast2sympy(i, level + 1, names) for i in
> expr.values])
> > else:
> > raise TypeError("unsupported operation " + op.__class__.__name)
> > return sympy_exp
> > elif expr is None:
> > return ""
> >
> > The rest of my code can be seen in
> https://github.com/akaviaLab/cobrapy/blob/gpr-sympy/src/cobra/core/gene.py
> (and has sympy in the function name)
> >
> > My questions
> > 1) Is there another (better) way to do this? Is there some AST --> sympy
> function directly?
> > 2) Which way do you think is better?
> > I'm leaning towards parse_sympy(), but telling the cobrapy functions to
> go string ---> AST --> sympy if the string seems to have a large number of
> entities.
> >
> > Thank you,
> >
> > Uri David
> >
> > --
> > You received this message because you are subscribed to the Google
> Groups "sympy" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected].
> > To view this discussion on the web visit
> https://groups.google.com/d/msgid/sympy/2dfb954b-7370-4fc5-86cb-7599a4b6a3f2n%40googlegroups.com
> .
>
--
You received this message because you are subscribed to the Google Groups
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/sympy/18abee55-5c8b-4adc-8387-d4959813ff5bn%40googlegroups.com.