Hi Uri David. What do you mean by sympy_parse? Do you mean the functions in sympy.parsing, or something else? I think it should be possible to parse strings like that directly into SymPy expressions, though I'm not sure if anything built-in to SymPy can do it. Whatever you are using that is crashing on large inputs is probably implemented in an inefficient way, and can be improved. The types of strings you are dealing with seem relatively simple, but even so, you may benefit from using a real parser with a grammar file using something like Lark or Antlr.
I don't think we have a direct ast to sympy converter. There is some ast related code in sympy.parsing, but it deals with evaluate=False, so it isn't quite what you are looking for. The one you have looks fine, although it may be cleaner to implement it using the ast visitor pattern (see https://docs.python.org/3/library/ast.html#ast.NodeVisitor). Another idea: if A1, B1, and so on are what you want to use as your symbol names, you can replace "AND" with "&" and "OR" with "|" and the string will parse directly as a SymPy expression (using parse_expr() or sympify()). Aaron Meurer On Wed, Oct 28, 2020 at 3:49 PM Uri David Akavia <[email protected]> wrote: > > Hi > I'm a computational biologist in Montreal, and I've been working on genomics > and metaoblomics at McGill University. > I've been working on Cobrapy, which uses AST to parse and deal with Gene > Rules (GPR). This email is going to be a bit long to explain context > > Each reaction (chemical transformation) has GPR that uses boolean AND/OR to > describe which genes are necessary for it to happen. The idea is that genes > where either is necessary have an OR relationship, and genes where you need > multiple (like a protein complex) have AND. In files, these relationships are > organized as a string. > > I've been shifting the code from AST to sympy, so we can do more logical > comparisons and extend the capability. > > It seems that there are two options: > 1) Going from string directly to sympy using sympy_parse > This works generally well, but there are some GPRs that are too long, since > the convention is to store the GPRs in files in maximal canonical form. > It means if you have a complex that needs three units, and each unit has two > potential genes, the GPR can be logically > (A1 OR A2) AND (B1 OR B2) AND (C1 OR C2) > But in the files it will be as > (A1 AND B1 AND C1) OR (A2 AND B1 AND C1) OR (A1 AND B2 AND C1), etc. > > When you have 40 proteins, each having two genes or more, you get to > ridiculous numbers (11164) that crash python. > > 2) String ---> AST --> sympy > AST can deal with these very very large GPRs, so one way is to parse a GPR > extension and then run this function > > def ast2sympy(expr, level=0, names=None): > """convert compiled ast to gene_reaction_rule sympy expression > > Parameters > ---------- > expr : str > string for a gene reaction rule, e.g "a and b" > level : int > internal use only > names : dict > Dict where each element id a gene identifier and the value is the > gene name. Use this to get a rule str which uses names instead. This > should be done for display purposes only. All gene_reaction_rule > strings which are computed with should use the id. > > Returns > ------ > string > The gene reaction rule > """ > if isinstance(expr, Expression): > return ast2sympy(expr.body, 0, names) if hasattr(expr, "body") else "" > elif isinstance(expr, Name): > return sympy.symbols(names.get(expr.id, expr.id)) if names else > sympy.symbols(expr.id) > elif isinstance(expr, BoolOp): > op = expr.op > if isinstance(op, Or): > sympy_exp = sp_boolalg.Or(*[ast2sympy(i, level + 1, names) for i > in expr.values]) > elif isinstance(op, And): > sympy_exp = sp_boolalg.And(*[ast2sympy(i, level + 1, names) for i > in expr.values]) > else: > raise TypeError("unsupported operation " + op.__class__.__name) > return sympy_exp > elif expr is None: > return "" > > The rest of my code can be seen in > https://github.com/akaviaLab/cobrapy/blob/gpr-sympy/src/cobra/core/gene.py > (and has sympy in the function name) > > My questions > 1) Is there another (better) way to do this? Is there some AST --> sympy > function directly? > 2) Which way do you think is better? > I'm leaning towards parse_sympy(), but telling the cobrapy functions to go > string ---> AST --> sympy if the string seems to have a large number of > entities. > > Thank you, > > Uri David > > -- > You received this message because you are subscribed to the Google Groups > "sympy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/sympy/2dfb954b-7370-4fc5-86cb-7599a4b6a3f2n%40googlegroups.com. -- You received this message because you are subscribed to the Google Groups "sympy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/CAKgW%3D6%2BSp96o0A--Ep_mpYkzqsj1%3DS6-z9ud_O7LtaUhFC-Zaw%40mail.gmail.com.
