I see. How exactly does it crash? Do you get a traceback? Using & and | would not be efficient, because it has to create each subexpression pairwise, but it should still be able to work without crashing, so this may be a bug in SymPy. It's also possible that the crash is something you would see regardless of how you create the expression, so it's worth looking into.
Aaron Meurer On Thu, Oct 29, 2020 at 4:51 AM Uri David Akavia <[email protected]> wrote: > > Hi Aaron, > > Sorry, I am using sympy.parsing.sympy_parser.parse_expr() after replacing & > and | and doing a few more string modifications. That's what I described as > (1). That generally works, but crashes on very very large expressions. > I can attach the string that crashes it if you're interested. > The function that parses using parse_expr() is > def parse_gpr_sympy(str_expr): > """parse gpr into SYMPY > > Parameters > ---------- > str_expr : string > string with the gene reaction rule to parse > > Returns > ------- > tuple > elements SYMPY expression and gene_ids as a set > """ > str_expr = str_expr.strip() > if len(str_expr) == 0: > return None, set() > for char, escaped in replacements: > if char in str_expr: > str_expr = str_expr.replace(char, escaped) > escaped_str = keyword_re.sub("__cobra_escape__", str_expr) > escaped_str = number_start_re.sub("__cobra_escape__", escaped_str) > try: > escaped_str = re.sub(r"\b(or)", " |", escaped_str) > escaped_str = re.sub(r"\b(and)", " &", escaped_str) > sympy_exp = parse_expr(escaped_str, > transformations=standard_transformations, > evaluate=False) > except SympifyError as exc: > raise ValueError("unsupported operation " + repr(escaped_str), exc) > gene_set = set() > for node in list(sympy_exp.atoms()): > if node.name.startswith("__cobra_escape__"): > node.name = node.name[16:] > for char, escaped in replacements: > if escaped in node.name: > node.name = node.name.replace(escaped, char) > gene_set.add(node.name) > > return sympy_exp, gene_set > > (2) is using AST, and I'll see about converting it to a visitor. > > Lark and Antir seem overkill, so I'll see about converting to a visitor first. > > Uri David > > On Wednesday, October 28, 2020 at 6:00:17 PM UTC-4 [email protected] wrote: >> >> Hi Uri David. >> >> What do you mean by sympy_parse? Do you mean the functions in >> sympy.parsing, or something else? I think it should be possible to >> parse strings like that directly into SymPy expressions, though I'm >> not sure if anything built-in to SymPy can do it. Whatever you are >> using that is crashing on large inputs is probably implemented in an >> inefficient way, and can be improved. The types of strings you are >> dealing with seem relatively simple, but even so, you may benefit from >> using a real parser with a grammar file using something like Lark or >> Antlr. >> >> I don't think we have a direct ast to sympy converter. There is some >> ast related code in sympy.parsing, but it deals with evaluate=False, >> so it isn't quite what you are looking for. The one you have looks >> fine, although it may be cleaner to implement it using the ast visitor >> pattern (see https://docs.python.org/3/library/ast.html#ast.NodeVisitor). >> >> Another idea: if A1, B1, and so on are what you want to use as your >> symbol names, you can replace "AND" with "&" and "OR" with "|" and the >> string will parse directly as a SymPy expression (using parse_expr() >> or sympify()). >> >> Aaron Meurer >> >> On Wed, Oct 28, 2020 at 3:49 PM Uri David Akavia >> <[email protected]> wrote: >> > >> > Hi >> > I'm a computational biologist in Montreal, and I've been working on >> > genomics and metaoblomics at McGill University. >> > I've been working on Cobrapy, which uses AST to parse and deal with Gene >> > Rules (GPR). This email is going to be a bit long to explain context >> > >> > Each reaction (chemical transformation) has GPR that uses boolean AND/OR >> > to describe which genes are necessary for it to happen. The idea is that >> > genes where either is necessary have an OR relationship, and genes where >> > you need multiple (like a protein complex) have AND. In files, these >> > relationships are organized as a string. >> > >> > I've been shifting the code from AST to sympy, so we can do more logical >> > comparisons and extend the capability. >> > >> > It seems that there are two options: >> > 1) Going from string directly to sympy using sympy_parse >> > This works generally well, but there are some GPRs that are too long, >> > since the convention is to store the GPRs in files in maximal canonical >> > form. >> > It means if you have a complex that needs three units, and each unit has >> > two potential genes, the GPR can be logically >> > (A1 OR A2) AND (B1 OR B2) AND (C1 OR C2) >> > But in the files it will be as >> > (A1 AND B1 AND C1) OR (A2 AND B1 AND C1) OR (A1 AND B2 AND C1), etc. >> > >> > When you have 40 proteins, each having two genes or more, you get to >> > ridiculous numbers (11164) that crash python. >> > >> > 2) String ---> AST --> sympy >> > AST can deal with these very very large GPRs, so one way is to parse a GPR >> > extension and then run this function >> > >> > def ast2sympy(expr, level=0, names=None): >> > """convert compiled ast to gene_reaction_rule sympy expression >> > >> > Parameters >> > ---------- >> > expr : str >> > string for a gene reaction rule, e.g "a and b" >> > level : int >> > internal use only >> > names : dict >> > Dict where each element id a gene identifier and the value is the >> > gene name. Use this to get a rule str which uses names instead. This >> > should be done for display purposes only. All gene_reaction_rule >> > strings which are computed with should use the id. >> > >> > Returns >> > ------ >> > string >> > The gene reaction rule >> > """ >> > if isinstance(expr, Expression): >> > return ast2sympy(expr.body, 0, names) if hasattr(expr, "body") else "" >> > elif isinstance(expr, Name): >> > return sympy.symbols(names.get(expr.id, expr.id)) if names else >> > sympy.symbols(expr.id) >> > elif isinstance(expr, BoolOp): >> > op = expr.op >> > if isinstance(op, Or): >> > sympy_exp = sp_boolalg.Or(*[ast2sympy(i, level + 1, names) for i in >> > expr.values]) >> > elif isinstance(op, And): >> > sympy_exp = sp_boolalg.And(*[ast2sympy(i, level + 1, names) for i in >> > expr.values]) >> > else: >> > raise TypeError("unsupported operation " + op.__class__.__name) >> > return sympy_exp >> > elif expr is None: >> > return "" >> > >> > The rest of my code can be seen in >> > https://github.com/akaviaLab/cobrapy/blob/gpr-sympy/src/cobra/core/gene.py >> > (and has sympy in the function name) >> > >> > My questions >> > 1) Is there another (better) way to do this? Is there some AST --> sympy >> > function directly? >> > 2) Which way do you think is better? >> > I'm leaning towards parse_sympy(), but telling the cobrapy functions to go >> > string ---> AST --> sympy if the string seems to have a large number of >> > entities. >> > >> > Thank you, >> > >> > Uri David >> > >> > -- >> > You received this message because you are subscribed to the Google Groups >> > "sympy" group. >> > To unsubscribe from this group and stop receiving emails from it, send an >> > email to [email protected]. >> > To view this discussion on the web visit >> > https://groups.google.com/d/msgid/sympy/2dfb954b-7370-4fc5-86cb-7599a4b6a3f2n%40googlegroups.com. > > -- > You received this message because you are subscribed to the Google Groups > "sympy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/sympy/18abee55-5c8b-4adc-8387-d4959813ff5bn%40googlegroups.com. -- You received this message because you are subscribed to the Google Groups "sympy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/CAKgW%3D6LxP8miSZAJR7QmbvHzo6RBXvNVk0WRuut620ManaLGqA%40mail.gmail.com.
