#10637: Implement sage -sws2rst
------------------------------------------------------------+---------------
       Reporter:  nthiery                                   |         Owner:  jason, mpatel, was
           Type:  enhancement                               |        Status:  needs_review
       Priority:  major                                     |     Milestone:  sage-5.2
      Component:  notebook                                  |    Resolution:
       Keywords:  ReST, worksheet                           |   Work issues:
Report Upstream:  Workaround found; Bug reported upstream.  |     Reviewers:  Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow
        Authors:  Pablo Angulo, Karl-Dieter Crisman         |     Merged in:
   Dependencies:  #11080, #11459                            |      Stopgaps:
------------------------------------------------------------+---------------

Comment (by kcrisman):

 Okay, I've read through the code, and here is everything I have to say
 without actually trying it out on a variety of worksheets.  Anyone have
 any comments on my patch to the actual script?

 ----

 To me, the main issue is that the code needs its input to be fairly
 well-formed.  Is worksheet.html really always such well-formed HTML?  I
 just don't know.

 The reasons this doesn't concern me too much are
  1. The worst that could happen is that the rst doesn't look good, but
 nothing gets destroyed
  2. Presumably someone wouldn't bother to use this functionality in the
 first place without checking that the worksheet at least looked nice
  3. Presumably !BeautifulSoup makes the html more well-formed
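
 To make the well-formedness worry concrete, here is a minimal sketch of a
 tag-pairing check using only Python's standard html.parser.  The names
 PairingChecker and unpaired_tags are hypothetical and not part of the
 patch, and the list of void tags is an assumption that is deliberately
 incomplete.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (assumed, incomplete list).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PairingChecker(HTMLParser):
    """Hypothetical helper: records end tags that do not match the most
    recently opened tag, and tags still open at the end of the input."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.problems = []   # human-readable mismatch reports

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # e.g. <br/> triggers both handlers; nothing to pair
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append("unexpected </%s>" % tag)


def unpaired_tags(html):
    """Return a list of pairing problems found in the given HTML string."""
    checker = PairingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + ["unclosed <%s>" % t for t in checker.open_tags]
```

 Running something like this over a few saved worksheets would at least
 show how often unpaired tags (such as a stray <p>) actually occur.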

 Places this could go wrong, maybe, with weird input, below.  Keep in mind
 I'm not at all a regex wizard, so that could be part of my questions.  I'd
 appreciate any responses to whether these could be problems; although most
 of them wouldn't be a big issue, I still feel like especially the first
 several really require (for me) explanation.
 * $$ that are intended to be empty LaTeX for later filling in - the code
 definitely counts on $$ always existing in pairs
 * I'm not sure about the replace_courier thing.  Are you saying that you,
 personally, use Courier font to indicate code in TinyMCE, and this is how
 you replace it with <code></code> tags?  It would be nice for this to be
 customizable; otherwise, what happens to the poor sap who happens to like
 Courier and then finds all their text replaced by code?
 * It looks like, in visiting ordered lists, all sublists will
 automatically become numbered.  But couldn't one have an unordered list
 inside an ordered list?
 * in the replace_latex thing, is it conceivable that the re's would match
 something by mistake?  It looks like e.g. latex_beginning is matching
 anything that starts and ends with a dollar sign, as long as it really
 does end with a dollar sign and not \$, where that would include any
 character (possibly zero) before that.  Again, in a well-formed worksheet
 that wouldn't be a problem, but maybe sometimes people would do things
 like "$\$$".
 * Or what if the TinyMCE content was (completely) "h$x+y=z$", then wouldn't
 this get replaced by ":math:`x+y=z`" with the h gone?  It seems like you're
 assuming \\1 is a whitespace character?
 * How could the branch of visit_strong with _inside_code_tag be reached?
 The only place this flag seems to be True is in visit_code, but in that
 case one just gets plain text and wouldn't reach visit_strong, it seems to
 me
 * What if there was an anchor tag that was NOT an "href"?  There are other
 uses for anchors.
 * There is potential for a malformed {{{///}}}, a }}} without the others,
 missing cell ids, or a missing | to cause trouble
 * <p> could be appended to no purpose; I guess this counts heavily on all
 tags being properly paired
 * In results sections, is all html being removed?  Is there a reason for
 this?  I assume that this is to take care of a specific type of result
 that can occur that I can't think of right now.  After all, a Sage cell is
 a Python cell, and html certainly could be legitimate output.
 * Is it conceivable that the ReST output will do something wrong with the
 escape_chars + and friends when they're not replaced? I assume not since
 they're turned into :math: mode
 * Am I correct in saying that once inside a code tag, all other tags are
 ignored?  That appears to be what is going on here.
 * In the table, I suppose it's at least possible there could be a tfoot
 element.  Does this not come out of TinyMCE?
 * What other preformatted text <pre> do we expect?  Should they all be
 ::'ed?
 * I know this might not happen a lot, but in theory a cell could do sage:
 sage: sage: and it should all be removed... not that this is a very big
 deal.
 * I do like that most of the traceback is removed - am I right in that?
 Why is there not a continue traceback thing?
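
 To illustrate the "h$x+y=z$" and "$\$$" questions above, here is a minimal
 sketch of an inline-math replacement that neither consumes the character
 before the opening dollar sign nor treats \$ as a delimiter.  The function
 name replace_inline_math and the pattern are my own assumptions, not what
 the patch does; display math in $$...$$ is deliberately left untouched.

```python
import re

# Opening "$" must not be preceded by a backslash or another "$".
# The body is one or more escaped characters (such as \$) or characters
# that are neither "$" nor a backslash.  The closing "$" must not be
# followed by another "$", so $$...$$ display math is skipped.
INLINE_MATH = re.compile(r"(?<![\\$])\$((?:\\.|[^$\\])+?)\$(?!\$)")


def replace_inline_math(text):
    """Hypothetical helper: turn $...$ into a ReST :math: role."""
    # A function is used as the replacement so that backslashes in the
    # matched LaTeX are copied through verbatim.
    return INLINE_MATH.sub(lambda m: ":math:`%s`" % m.group(1), text)
```

 With this pattern "h$x+y=z$" keeps its leading h.  ReST generally wants
 whitespace or punctuation before an inline role, though, so that case may
 still need a "\ " separator inserted before the role in the output.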

 Well, that's a lot of dumb little questions.  On the ''very'' positive
 side, it looks like this is at least as good as, if not much better
 than, other html2rst things on the web - one even advertises "no support
 for tables, don't even try".  I like the infrastructure for extending this
 should there be call for more to go in, and the use of ints referred to
 by name. Great work!

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/10637#comment:59>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB
