Hi,
I'm a computational biologist in Montreal, working on genomics and 
metabolomics at McGill University.
I've been working on Cobrapy, which uses Python's ast module to parse and 
handle gene-protein-reaction rules (GPRs). This email is a bit long because 
I need to explain the context.

Each reaction (chemical transformation) has a GPR that uses boolean AND/OR 
to describe which genes are necessary for it to happen. Genes that can 
substitute for one another (either one is sufficient) have an OR 
relationship, and genes that are all required together (like the subunits 
of a protein complex) have an AND relationship. In files, these 
relationships are stored as a string.
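
To make the string form concrete, here is a minimal sketch of parsing one 
such rule with Python's ast module. The gene identifiers are made up, and I 
am assuming the rule is written with lowercase and/or (as in "a and b"), so 
that it is a valid Python expression.

```python
import ast

# Hypothetical gene identifiers. "and"/"or" are Python keywords, so a rule
# written this way can be parsed as an ordinary Python expression.
rule = "(A1 or A2) and (B1 or B2)"

tree = ast.parse(rule, mode="eval")  # an ast.Expression wrapping the rule
top = tree.body                      # the outermost boolean operation

# The top-level node is an AND whose two operands are the OR groups.
print(type(top.op).__name__, len(top.values))

# Collect every gene mentioned anywhere in the rule.
genes = sorted(n.id for n in ast.walk(tree) if isinstance(n, ast.Name))
print(genes)
```

Nothing is evaluated here; ast only builds the tree, which is why it copes 
with very long rules.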

I've been shifting the code from AST to sympy, so we can do more logical 
comparisons and extend the capability.
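
As an illustration of the kind of logical comparison I mean, the sketch 
below checks whether two differently written rules are logically 
equivalent. The gene symbols are made up, and the XOR-plus-satisfiable 
check is just one way to do it, not necessarily what cobrapy will end up 
using.

```python
from sympy import symbols
from sympy.logic.boolalg import And, Or, Xor
from sympy.logic.inference import satisfiable

# Hypothetical genes: two units of a complex, two candidate genes each.
A1, A2, B1, B2 = symbols("A1 A2 B1 B2")

factored = And(Or(A1, A2), Or(B1, B2))
expanded = Or(And(A1, B1), And(A1, B2), And(A2, B1), And(A2, B2))
different = And(A1, B1)

def same_rule(f, g):
    """True if no assignment of genes makes f and g disagree."""
    # satisfiable() returns False when the formula has no model.
    return not satisfiable(Xor(f, g))

print(same_rule(factored, expanded))   # the same rule, written two ways
print(same_rule(factored, different))  # a genuinely different rule
```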

It seems that there are two options:
1) Going from string directly to sympy using sympy_parse
This generally works well, but some GPRs are too long, since the 
convention is to store GPRs in files in maximal canonical form (fully 
expanded into an OR of AND clauses).
This means that if you have a complex that needs three units, and each 
unit has two potential genes, the GPR is logically
(A1 OR A2) AND (B1 OR B2) AND (C1 OR C2)
but in the files it is written as
(A1 AND B1 AND C1) OR (A2 AND B1 AND C1) OR (A1 AND B2 AND C1), etc.

When you have 40 proteins, each having two or more genes, you get to 
ridiculous numbers (11164) that crash Python.
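
A small sketch of why the expanded form grows so fast: with n units of two 
interchangeable genes each, expanding the factored rule produces one AND 
clause per combination of genes. The helper name and the gene names are 
made up for illustration, and I only go up to n = 4 here.

```python
from sympy import symbols
from sympy.logic.boolalg import And, Or, to_dnf

def factored_rule(n_units):
    """Hypothetical helper: AND of n_units groups, each an OR of two genes."""
    groups = []
    for i in range(n_units):
        g1, g2 = symbols(f"g{i}_1 g{i}_2")
        groups.append(Or(g1, g2))
    return And(*groups)

# Count the AND clauses in the expanded (disjunctive normal) form.
clause_counts = {}
for n in (2, 3, 4):
    dnf = to_dnf(factored_rule(n))
    clause_counts[n] = len(dnf.args)

print(clause_counts)
```

The count doubles with every added unit, so the factored form stays short 
while the stored form does not.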

2) String --> AST --> sympy
AST can deal with these very large GPRs, so one way is to parse a GPR 
expression into an AST and then run this function:

import sympy
from ast import And, BoolOp, Expression, Name, Or
# Assumption: sp_boolalg refers to sympy's boolean algebra module.
import sympy.logic.boolalg as sp_boolalg


def ast2sympy(expr, level=0, names=None):
    """Convert a compiled ast to a gene_reaction_rule sympy expression.

    Parameters
    ----------
    expr : ast.Expression or other ast node
        parsed gene reaction rule, e.g. ast.parse("a and b", mode="eval")
    level : int
        internal use only
    names : dict
        Dict where each key is a gene identifier and the value is the
        gene name. Use this to get a rule which uses names instead. This
        should be done for display purposes only. All gene_reaction_rule
        expressions which are computed with should use the id.

    Returns
    -------
    sympy expression
        The gene reaction rule ("" if the rule is empty)
    """
    if isinstance(expr, Expression):
        return ast2sympy(expr.body, 0, names) if hasattr(expr, "body") else ""
    elif isinstance(expr, Name):
        # Use the display name when a mapping is given, otherwise the gene id.
        return sympy.symbols(names.get(expr.id, expr.id) if names else expr.id)
    elif isinstance(expr, BoolOp):
        op = expr.op
        if isinstance(op, Or):
            return sp_boolalg.Or(
                *[ast2sympy(i, level + 1, names) for i in expr.values])
        elif isinstance(op, And):
            return sp_boolalg.And(
                *[ast2sympy(i, level + 1, names) for i in expr.values])
        else:
            # Fixed: the attribute is __name__, not __name.
            raise TypeError("unsupported operation " + op.__class__.__name__)
    elif expr is None:
        return ""

The rest of my code can be seen in 
https://github.com/akaviaLab/cobrapy/blob/gpr-sympy/src/cobra/core/gene.py 
(the relevant functions have sympy in their names).

My questions:
1) Is there another (better) way to do this? Is there some function that 
goes directly from AST --> sympy?
2) Which way do you think is better?
I'm leaning towards using parse_sympy() by default, but having the cobrapy 
functions go string --> AST --> sympy when the string seems to contain a 
large number of entities.

Thank you,

Uri David
