Hi Aaron,

Sorry, to clarify: I am using sympy.parsing.sympy_parser.parse_expr() after
replacing "and" and "or" with & and | and doing a few more string
modifications. That's what I described as (1). That generally works, but it
crashes on very large expressions.
I can attach the string that crashes it if you're interested.
The function that parses using parse_expr() is:
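import re

from sympy import SympifyError
from sympy.parsing.sympy_parser import parse_expr, standard_transformations

# replacements, keyword_re and number_start_re are helpers defined
# elsewhere in the module.
def parse_gpr_sympy(str_expr):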
    """parse gpr into SYMPY

    Parameters
    ----------
    str_expr : string
        string with the gene reaction rule to parse

    Returns
    -------
    tuple
        elements SYMPY expression and gene_ids as a set
    """
    str_expr = str_expr.strip()
    if len(str_expr) == 0:
        return None, set()
    for char, escaped in replacements:
        if char in str_expr:
            str_expr = str_expr.replace(char, escaped)
    escaped_str = keyword_re.sub("__cobra_escape__", str_expr)
    escaped_str = number_start_re.sub("__cobra_escape__", escaped_str)
    try:
        # The trailing \b leaves gene ids that merely start with "or"/"and" alone.
        escaped_str = re.sub(r"\bor\b", " | ", escaped_str)
        escaped_str = re.sub(r"\band\b", " & ", escaped_str)
        sympy_exp = parse_expr(escaped_str,
                               transformations=standard_transformations,
                               evaluate=False)
    except SympifyError as exc:
        raise ValueError("unsupported operation " + repr(escaped_str)) from exc
    gene_set = set()
    for node in list(sympy_exp.atoms()):
        if node.name.startswith("__cobra_escape__"):
            node.name = node.name[16:]
        for char, escaped in replacements:
            if escaped in node.name:
                node.name = node.name.replace(escaped, char)
        gene_set.add(node.name)

    return sympy_exp, gene_set
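
One plausible explanation for the crash on large inputs (an assumption, not
something confirmed in this thread): Python parses a chain of | operators into
left-nested BinOp nodes, so the tree gets deeper with every term, while a chain
of "or" becomes a single flat BoolOp. The sketch below only measures that depth
with the standard ast module; ast_depth and the gene names are made up for
illustration.

```python
import ast


def ast_depth(node):
    """Return the nesting depth of an AST, computed without recursion."""
    deepest = 0
    stack = [(node, 1)]
    while stack:
        current, depth = stack.pop()
        deepest = max(deepest, depth)
        for child in ast.iter_child_nodes(current):
            stack.append((child, depth + 1))
    return deepest


# Hypothetical gene ids, just to build a long rule.
genes = ["g%d" % i for i in range(50)]

# "or" chains become one BoolOp holding all operands in a flat list.
flat = ast.parse(" or ".join(genes), mode="eval")
# "|" chains become BinOp nodes nested once per operator.
nested = ast.parse(" | ".join(genes), mode="eval")

print("depth with or:", ast_depth(flat))
print("depth with | :", ast_depth(nested))
```

If this is what happens, the depth of the "or" form stays constant as terms are
added, which would fit with the AST route coping with the very large GPRs.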

(2) is using AST.

Lark and Antlr seem like overkill, so I'll see about converting the AST code
to a visitor first.
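
A minimal sketch of what the visitor version might look like, assuming the rule
already uses lowercase "and"/"or" and valid Python identifiers. GPRVisitor and
parse_gpr are hypothetical names, not cobrapy or SymPy functions. The
constructors are passed in so the sketch runs with the standard library alone;
the stand-ins here build plain tuples.

```python
import ast


class GPRVisitor(ast.NodeVisitor):
    """Hypothetical visitor that rebuilds a parsed rule with given constructors."""

    def __init__(self, symbol, and_, or_):
        self.symbol = symbol
        self.and_ = and_
        self.or_ = or_

    def visit_Expression(self, node):
        # ast.parse(..., mode="eval") wraps the rule in an Expression node.
        return self.visit(node.body)

    def visit_Name(self, node):
        return self.symbol(node.id)

    def visit_BoolOp(self, node):
        args = [self.visit(value) for value in node.values]
        if isinstance(node.op, ast.And):
            return self.and_(*args)
        return self.or_(*args)  # a BoolOp holds either ast.And or ast.Or

    def generic_visit(self, node):
        # Anything other than names and and/or is rejected.
        raise TypeError("unsupported operation " + type(node).__name__)


def parse_gpr(rule, symbol, and_, or_):
    """Parse a rule string and convert it with the visitor."""
    tree = ast.parse(rule.strip(), mode="eval")
    return GPRVisitor(symbol, and_, or_).visit(tree)


# Stand-in constructors that build tuples instead of SymPy objects.
result = parse_gpr(
    "(a1 or a2) and b1",
    symbol=str,
    and_=lambda *args: ("and",) + args,
    or_=lambda *args: ("or",) + args,
)
print(result)
```

Passing sympy.Symbol, sympy.And and sympy.Or as the three constructors should
give SymPy expressions instead of tuples.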

Uri David

On Wednesday, October 28, 2020 at 6:00:17 PM UTC-4 [email protected] wrote:

> Hi Uri David.
>
> What do you mean by sympy_parse? Do you mean the functions in
> sympy.parsing, or something else? I think it should be possible to
> parse strings like that directly into SymPy expressions, though I'm
> not sure if anything built-in to SymPy can do it. Whatever you are
> using that is crashing on large inputs is probably implemented in an
> inefficient way, and can be improved. The types of strings you are
> dealing with seem relatively simple, but even so, you may benefit from
> using a real parser with a grammar file using something like Lark or
> Antlr.
>
> I don't think we have a direct ast to sympy converter. There is some
> ast related code in sympy.parsing, but it deals with evaluate=False,
> so it isn't quite what you are looking for. The one you have looks
> fine, although it may be cleaner to implement it using the ast visitor
> pattern (see https://docs.python.org/3/library/ast.html#ast.NodeVisitor).
>
> Another idea: if A1, B1, and so on are what you want to use as your
> symbol names, you can replace "AND" with "&" and "OR" with "|" and the
> string will parse directly as a SymPy expression (using parse_expr()
> or sympify()).
>
> Aaron Meurer
>
> On Wed, Oct 28, 2020 at 3:49 PM Uri David Akavia
> <[email protected]> wrote:
> >
> > Hi
> > I'm a computational biologist in Montreal, and I've been working on
> > genomics and metabolomics at McGill University.
> > I've been working on Cobrapy, which uses AST to parse and deal with Gene
> > Rules (GPR). This email is going to be a bit long, to explain the context.
> >
> > Each reaction (chemical transformation) has a GPR that uses boolean AND/OR
> > to describe which genes are necessary for it to happen. The idea is that
> > genes where either one is sufficient have an OR relationship, and genes
> > where you need multiple (like a protein complex) have AND. In files, these
> > relationships are stored as a string.
> >
> > I've been shifting the code from AST to sympy, so we can do more logical
> > comparisons and extend the capability.
> >
> > It seems that there are two options:
> > 1) Going from string directly to sympy using sympy_parse
> > This works generally well, but there are some GPRs that are too long,
> > since the convention is to store the GPRs in files in maximal canonical
> > form.
> > It means if you have a complex that needs three units, and each unit has
> > two potential genes, the GPR can be logically
> > (A1 OR A2) AND (B1 OR B2) AND (C1 OR C2)
> > But in the files it will be as
> > (A1 AND B1 AND C1) OR (A2 AND B1 AND C1) OR (A1 AND B2 AND C1), etc.
> >
> > When you have 40 proteins, each having two or more genes, you get to
> > ridiculous numbers (11164) that crash Python.
> >
> > 2) String ---> AST --> sympy
> > AST can deal with these very large GPRs, so one way is to parse a
> > GPR extension and then run this function
> >
> > # Assumed imports; sp_boolalg is taken to be sympy.logic.boolalg.
> > from ast import And, BoolOp, Expression, Name, Or
> > import sympy
> > import sympy.logic.boolalg as sp_boolalg
> >
> > def ast2sympy(expr, level=0, names=None):
> >     """convert compiled ast to gene_reaction_rule sympy expression
> >
> >     Parameters
> >     ----------
> >     expr : ast.AST
> >         compiled ast of a gene reaction rule, e.g. of "a and b"
> >     level : int
> >         internal use only
> >     names : dict
> >         Dict where each key is a gene identifier and the value is the
> >         gene name. Use this to get a rule which uses names instead. This
> >         should be done for display purposes only. All gene_reaction_rule
> >         strings which are computed with should use the id.
> >
> >     Returns
> >     -------
> >     sympy expression
> >         The gene reaction rule
> >     """
> >     if isinstance(expr, Expression):
> >         return ast2sympy(expr.body, 0, names) if hasattr(expr, "body") else ""
> >     elif isinstance(expr, Name):
> >         name = names.get(expr.id, expr.id) if names else expr.id
> >         return sympy.symbols(name)
> >     elif isinstance(expr, BoolOp):
> >         op = expr.op
> >         args = [ast2sympy(i, level + 1, names) for i in expr.values]
> >         if isinstance(op, Or):
> >             sympy_exp = sp_boolalg.Or(*args)
> >         elif isinstance(op, And):
> >             sympy_exp = sp_boolalg.And(*args)
> >         else:
> >             raise TypeError("unsupported operation " + op.__class__.__name__)
> >         return sympy_exp
> >     elif expr is None:
> >         return ""
> >
> > The rest of my code can be seen in
> > https://github.com/akaviaLab/cobrapy/blob/gpr-sympy/src/cobra/core/gene.py
> > (and has sympy in the function name)
> >
> > My questions
> > 1) Is there another (better) way to do this? Is there some AST --> sympy
> > function directly?
> > 2) Which way do you think is better?
> > I'm leaning towards parse_sympy(), but telling the cobrapy functions to
> > go string ---> AST --> sympy if the string seems to have a large number of
> > entities.
> >
> > Thank you,
> >
> > Uri David
> >
> > --
> > You received this message because you are subscribed to the Google 
> Groups "sympy" group.
> > To unsubscribe from this group and stop receiving emails from it, send 
> an email to [email protected].
> > To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sympy/2dfb954b-7370-4fc5-86cb-7599a4b6a3f2n%40googlegroups.com
> .
>

