On Tuesday, May 26, 2020 at 7:23:42 AM UTC-4, David Bailey wrote:
> On 25/05/2020 23:42, Ben wrote:
>> You're totally correct -- Latex is ambiguous. I don't find your
>> observation discouraging since it is perfectly reasonable.
>>
>> The issue I'm interested in tackling is the conversion of math presented
>> in Physics papers (e.g., .tex files on arxiv.org) to a semantically
>> meaningful and unambiguous representation (e.g., Sympy).
>>
>> This issue would be moot if Physics papers were written in Sympy. I don't
>> have insight on how to construct incentives that would lead to use of Sympy
>> in Physics papers, so I'm working on the Latex-to-Sympy approach.
>
> Right - well in that case, maybe a system of hints that the user could add
> to your parser, would be really useful. For example if a user could tell
> your parser that superscripts were usually tensor subscripts rather than
> exponents (or alternatively that certain symbols used as superscripts would
> never mean exponents) you could come out with a better translation. Another
> useful hint, might be a list of the multi-letter symbols in use - sin, cos,
> exp, ln etc. so that you could resolve your ambiguity of what ab means - I
> mean sometimes sin(x) might mean s*i*n(x) and that could be handled by user
> specifying that only certain multi-letter symbols were in use.
>
> David

Yeah, after talking this over with a collaborator, we think there are various sources of information that can help with parsing:
- Within the LaTeX math string being parsed, what can be deduced about the expected context?
- Given other math expressions in the same paper, what would be consistent?
- Given the text surrounding the math expressions in the paper, what would be expected based on keywords?
- Given other papers in the same domain, or papers linked by citations, what would be likely?
- What is statistically likely given the corpus of all articles?

This is, in some sense, the same process a human goes through to decode the intended meaning of any given math expression in a scientific paper. We are looking to encode that process as a Python program. (That's beyond the scope of SymPy but is context for the issue.)
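To make David's hint idea a bit more concrete, here is a rough sketch of how user-supplied hints could resolve two of the ambiguities mentioned in this thread. This is only an illustration under my own assumptions, not existing SymPy functionality: the function names `split_letters` and `interpret_superscript` and the hint parameters `known_names` and `index_symbols` are made up for the example, and a real parser would work on a token stream rather than bare strings.

```python
# Hypothetical default hint: multi-letter names to treat as single symbols.
DEFAULT_KNOWN_NAMES = ("sin", "cos", "exp", "ln")


def split_letters(run, known_names=DEFAULT_KNOWN_NAMES):
    """Split a run of letters into factors, honoring the known-name hint.

    Greedy longest match: at each position take the longest known name
    that starts there, otherwise fall back to a single letter.
    """
    names = sorted(known_names, key=len, reverse=True)
    factors = []
    i = 0
    while i < len(run):
        for name in names:
            if run.startswith(name, i):
                factors.append(name)
                i += len(name)
                break
        else:
            # No known name starts here, so treat the letter as its own symbol.
            factors.append(run[i])
            i += 1
    return factors


def interpret_superscript(base, sup, index_symbols=frozenset()):
    """Classify base^sup as a power or an upper tensor index.

    `index_symbols` is a user hint: superscripts listed there are
    never read as exponents.
    """
    if sup in index_symbols:
        return ("index", base, sup)
    return ("power", base, sup)


print(split_letters("ab"))                   # ['a', 'b']
print(split_letters("sinx"))                 # ['sin', 'x']
print(split_letters("sinx", known_names=())) # ['s', 'i', 'n', 'x']
print(interpret_superscript("x", "2"))       # ('power', 'x', '2')
print(interpret_superscript("T", "\\mu", index_symbols={"\\mu", "\\nu"}))
# ('index', 'T', '\\mu')
```

Hints like these would be one more source of evidence alongside the ones listed above; in practice the hint sets might themselves be guessed from the rest of the paper rather than typed in by the user.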
