Re: [sympy] Re: tests for parsing Latex input to Sympy

Francesco Bonazzi Wed, 27 May 2020 01:47:10 -0700

Parsing a LaTeX expression should ideally return candidate SymPy 
expressions with a matching probability. In case of unambiguous matching, 
only one expression should have a high matching probability. In case of 
ambiguous matching, two or more SymPy expressions should have high 
probability.

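A minimal sketch of what such a return value could look like, assuming raw scores are already available from some parser. The `Candidate` class, the `rank` and `likely` helpers, the threshold, and all scores below are hypothetical and not part of SymPy; the SymPy expressions are kept as plain strings so the example runs without SymPy installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible reading of a LaTeX string (hypothetical structure)."""
    sympy_source: str   # SymPy code for this reading, kept as text here
    probability: float  # matching probability in [0, 1]


def rank(scores):
    """Turn raw non-negative scores into candidates sorted by probability."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    candidates = [Candidate(src, s / total) for src, s in scores.items()]
    return sorted(candidates, key=lambda c: c.probability, reverse=True)


def likely(candidates, threshold=0.25):
    """Return the readings whose probability reaches the threshold."""
    return [c for c in candidates if c.probability >= threshold]


# Made-up scores for the LaTeX string "f(x+1)": function call or product?
ambiguous = rank({"Function('f')(x + 1)": 5.0, "f*(x + 1)": 4.0, "f*x + 1": 1.0})
# Made-up scores for the LaTeX string "\frac{a}{b}".
unambiguous = rank({"a/b": 19.0, "Function('frac')(a, b)": 1.0})

for name, cands in (("f(x+1)", ambiguous), ("\\frac{a}{b}", unambiguous)):
    print(name, [(c.sympy_source, round(c.probability, 2)) for c in likely(cands)])
```

With these invented numbers, two readings of "f(x+1)" stay above the threshold while only one reading of "\frac{a}{b}" does, which is the distinction between ambiguous and unambiguous matching described above.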

The topic also matters. For a physics paper, you would probably want the 
parser to favor the subset of expressions typical of that field.
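
One way to picture the effect of topic is as a prior that reweights the parser's probabilities. The sketch below is only an illustration: the `reweight` function, the topic names, and every number in it are assumptions made up for the example.

```python
def reweight(parse_probs, topic_prior):
    """Multiply each reading's probability by a topic prior and renormalise."""
    weighted = {r: p * topic_prior.get(r, 1.0) for r, p in parse_probs.items()}
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("all readings were ruled out")
    return {r: w / total for r, w in weighted.items()}


# Hypothetical topic-independent probabilities for the LaTeX string "x^i".
parse_probs = {"power": 0.7, "tensor_index": 0.3}

# Hypothetical priors: how plausible each reading is within a given topic.
TOPIC_PRIORS = {
    "general_relativity": {"power": 0.2, "tensor_index": 0.8},
    "algebra": {"power": 0.9, "tensor_index": 0.1},
}

for topic, prior in TOPIC_PRIORS.items():
    print(topic, reweight(parse_probs, prior))
```

With these made-up values, the tensor-index reading wins for the relativity topic and the power reading wins for the algebra topic, even though the parser's own probabilities are the same in both cases.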

On Tuesday, 26 May 2020 13:33:14 UTC+2, Ben wrote:
>
>
>
> On Tuesday, May 26, 2020 at 7:23:42 AM UTC-4, David Bailey wrote:
>>
>> On 25/05/2020 23:42, Ben wrote:
>>
>>  You're totally correct -- LaTeX is ambiguous. I don't find your 
>>> observation discouraging since it is perfectly reasonable. 
>>>
>>
>> The issue I'm interested in tackling is the conversion of math presented 
>> in Physics papers (e.g., .tex files on arxiv.org) to a semantically 
>> meaningful and unambiguous representation (e.g., SymPy). 
>>
>> This issue would be moot if Physics papers were written in SymPy.  I 
>> don't have insight on how to construct incentives that would lead to use of 
>> SymPy in Physics papers, so I'm working on the LaTeX-to-SymPy approach. 
>>
>> Right - well in that case, maybe a system of hints that the user could 
>> add to your parser would be really useful. For example, if a user could 
>> tell your parser that superscripts were usually tensor indices rather 
>> than exponents (or alternatively that certain symbols used as superscripts 
>> would never mean exponents), you could come out with a better translation. 
>> Another useful hint might be a list of the multi-letter symbols in use 
>> (sin, cos, exp, ln, etc.) so that you could resolve the ambiguity of what ab 
>> means. Sometimes sin(x) might mean s*i*n(x), and that could be 
>> handled by the user specifying that only certain multi-letter symbols are in 
>> use.
>>
>> David
>>
>>
>>
> Yeah, in talking this over with a collaborator, we think there are 
> various sources of information to help with parsing. 
>
>    - within the math latex string to parse, what can be deduced about the 
>    expected context?
>    - given other math expressions in the same paper, what would be 
>    consistent?
>    - given the text in a paper surrounding the math expressions, what 
>    would be expected based on keywords?
>    - given other papers in the same domain or based on citations, what 
>    would be likely?
>    - what is statistically likely given the corpus of all articles?
>
> This is, in some sense, the same process a human goes through to decode 
> the intended meaning of any given math expression in a scientific paper. We 
> are looking to encode that process as a Python program. (That's beyond the 
> scope of SymPy but is context for the issue.)
>  
>

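As a rough illustration of the hint idea in the quoted message (a user-supplied list of multi-letter symbols), here is a small sketch that splits a run of letters into symbols. The `split_symbols` function and its greedy longest-match rule are assumptions made for this example, not an existing SymPy API.

```python
def split_symbols(letters, known_names=()):
    """Split a run of letters into symbols, using hinted multi-letter names.

    Names in `known_names` are matched greedily, longest first; any other
    letter is treated as a single-letter symbol (implicit multiplication).
    """
    # Ignore empty names so the scan always advances.
    names = sorted((n for n in known_names if n), key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(letters):
        for name in names:
            if letters.startswith(name, i):
                tokens.append(name)
                i += len(name)
                break
        else:
            # No hinted name starts here: take one letter as its own symbol.
            tokens.append(letters[i])
            i += 1
    return tokens


print(split_symbols("sinx", {"sin", "cos", "exp", "ln"}))  # hint: sin is a name
print(split_symbols("sinx"))                               # no hints given
print(split_symbols("ab", {"sin", "cos"}))                 # ab stays a product
```

With the hint, "sinx" is read as the name sin followed by x; without it, the same letters fall back to the product s*i*n*x.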
-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sympy/aa518861-6ca7-4edb-be2e-e05c4f1fdf7d%40googlegroups.com.
