Re: [NTG-context] Ruby 1.9.1 and non-ascii char parsing in .tui file
Hi all,

I think I solved the problem, at least for the errors I am actually getting.

I read the following article about string encoding in Ruby 1.9 and up:

http://blog.grayproductions.net/articles/ruby_19s_string

With that info at hand, I made two brute-force trial patches (read the above article to see why I call them brute force :-) ) in the two ConTeXt Ruby files where problems were arising (original line numbers are shown):

### .../scripts/context/ruby/base/tex.rb ###

(946)   case str.chomp
===>
        str = str.force_encoding("ISO-8859-1") if RUBY_VERSION >= "1.9"
        case str.chomp

### .../scripts/context/ruby/base/texutil.rb ###

(1033)  case line.chomp
===>
        line = line.force_encoding("ISO-8859-1") if RUBY_VERSION >= "1.9"
        case line.chomp

The error arises because, on my setup, ruby 1.9 treats the strings read from these files as US-ASCII and complains when it finds chars that are not valid in that encoding.

I don't know how to solve the problem for people writing in an encoding other than ISO-8859-1. I tried the above with UTF-8 instead of ISO-8859-1 and it didn't work.

Finally, I don't know whether there are other places (at least other case expressions) in the ConTeXt Ruby scripts where the problem might also show up.

Kind regards,
J. Augusto

On Mon, Aug 10, 2009 at 4:15 AM, Jose Augusto jasaugu...@gmail.com wrote:
> Hi all,
>
> Ok, here it goes. Attached are the files used in the test.
>
> The problem as reported in the previous email used the file with the offending chars wrapped in a main file, which was just:
>
> \starttext
> \input zzz.tex
> \stoptext
>
> That is, the offending chars were in zzz.tex. In that example I noticed the error because the cross-refs in the equation numbering were not working. The parsing of the .tui file by ruby 1.9.1 failed. Then I saw the errors.
>
> But then I made a single ConTeXt file, which is attached together with the correct results (tui, tuo, pdf) obtained with ruby 1.8.7. However, when I run ruby 1.9.1 with patch 129 (the latest one) on this single tex file (attached), now the first pass doesn't work!
> Here is the result (on Windows XP), which shows that ruby 1.9.1 doesn't like the non-US-ASCII chars :-)
>
> F:\ANOS\ano09-10-pen\NotasProcSinal>texexec test1.tex
> TeXExec | processing document 'test1.tex'
> TeXExec | no ctx file found
> D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:946:in `===': invalid byte sequence in US-ASCII (ArgumentError)
>     from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:946:in `scantexcontent'
>     from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:1907:in `processfile'
>     from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:1143:in `block (2 levels) in processtex'
>     from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:1133:in `timedrun'
>     from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:1142:in `block in processtex'
>     from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:1139:in `each'
>     from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:1139:in `processtex'
>     from D:/Context/tex/texmf-context/scripts/context/ruby/texexec.rb:63:in `process'
>     from D:/Context/tex/texmf-context/scripts/context/ruby/texexec.rb:53:in `main'
>     from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/switch.rb:133:in `execute'
>     from D:/Context/tex/texmf-context/scripts/context/ruby/texexec.rb:787:in `main'
>
> Thanks all for your interest,
> Kind regards
> J. Augusto
>
> On Sun, Aug 9, 2009 at 8:57 PM, Hans Hagen pra...@wxs.nl wrote:
> > Jose Augusto wrote:
> > > when /^c (.*)$/o then @plugins.reader('MyCommands', [$1])
> >
> > what if you remove the o (/o)
> >
> > can you find out what changed between 1.8 and 1.9 ... actually 1.9 is the stepping stone to 2.0 and 2 versions can be incompatible to 1 versions
> >
> > also, can you make a test file so that we can see if there's a platform dependency?
> > -
> > Hans Hagen | PRAGMA ADE
> > Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
> > tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
> > -

___
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___
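For anyone who wants to try the force_encoding() idea outside ConTeXt, here is a minimal sketch. The helper name classify_line and the sample line are made up for illustration; only the guarded force_encoding call and the /^c (.*)$/ pattern come from the thread. ISO-8859-1 is assumed, as in the patch, so input in another encoding would need a different label.

```ruby
# Bytes C3 E7 EA are "Ãçê" in ISO-8859-1. Tagging them as US-ASCII mimics
# what the thread reports ruby 1.9.1 doing with lines read from the file.
raw_line = "c \\listentry{subsection}{Title: \xC3\xE7\xEA}\n".dup.force_encoding("US-ASCII")

# Hypothetical helper, not part of tex.rb or texutil.rb.
def classify_line(line)
  # Ruby 1.8 strings have no force_encoding, hence the version guard.
  line = line.dup.force_encoding("ISO-8859-1") if RUBY_VERSION >= "1.9"
  case line.chomp
  when /^c (.*)$/ then [:command, $1]
  else [:other, nil]
  end
end

kind, payload = classify_line(raw_line)
puts "#{kind}: #{payload.encoding}"
```

Note that force_encoding only changes the label on the bytes; it does not convert them. That is enough here because every byte is a valid ISO-8859-1 character, so the regexp match no longer sees an invalid sequence.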
Re: [NTG-context] Ruby 1.9.1 and non-ascii char parsing in .tui file
Hi Hans,

I just sent a mail with a possible patch before I read this answer from you :-)

As I say there, the patches work (at least for me). I updated ConTeXt MkII a few hours ago, so I don't know whether the betas you mentioned have already been installed...

I hope the proposed patches are helpful. Thanks very much for your answer.

J. Augusto

On Mon, Aug 10, 2009 at 2:10 PM, Hans Hagen pra...@wxs.nl wrote:
> Jose Augusto wrote:
> > Hi all,
> >
> > Ok, here it goes. Attached are the files used in the test.
> >
> > The problem as reported in the previous email used the file with the offending chars wrapped in a main file, which was just:
> >
> > \starttext
> > \input zzz.tex
> > \stoptext
> >
> > That is, the offending chars were in zzz.tex. In that example I noticed the error because the cross-refs in the equation numbering were not working. The parsing of the .tui file by ruby 1.9.1 failed. Then I saw the errors.
>
> ruby 1.9 internally is no longer 8 bit clean i.e. there is always an encoding (file as well as internal); there is no way to enforce this (there is the -E option but that is useless for 1.8)
>
> i now open some files explicitly in binary mode; maybe that helps; i have no clue what happens with string manipulations later on
>
> i always liked ruby but such fundamental changes (encoding, dropping functions etc) without renaming the program are a showstopper for me as one cannot predict what will be on the user's system
>
> it looks like i have to convert the texutil part to lua (takes a few days and since i mostly use luatex it has a low priority)
>
> i uploaded a beta for testing
>
> Hans
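The binary-mode approach Hans describes can be sketched as follows. This is only an illustration under assumptions: the file name sample.tui and its two lines are invented, and the real texexec.rb code is not shown in the thread. It relies on the documented behaviour that a file opened with mode "rb" yields strings tagged ASCII-8BIT, against which an ASCII-only regexp can match any byte.

```ruby
require "tmpdir"

# Two made-up .tui lines; the second holds the Latin-1 bytes for "Ãçê".
sample = "c \\thisisutilityversion{2008.10.14}\nc \\listentry{Title: \xC3\xE7\xEA}\n"
encodings = []
payloads = []

Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.tui") # hypothetical file name
  File.open(path, "wb") { |f| f.write(sample) }

  # "rb" = read, binary: no newline conversion, strings tagged ASCII-8BIT.
  File.open(path, "rb") do |f|
    f.each_line do |line|
      encodings << line.encoding
      payloads << $1 if line.chomp =~ /^c (.*)$/
    end
  end
end

puts encodings.uniq.inspect
puts payloads.size
```

The trade-off is that the captured strings stay binary, so anything later that needs real characters (case mapping, sorting by letter) would still have to pick an encoding.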
Re: [NTG-context] Ruby 1.9.1 and non-ascii char parsing in .tui file
Hi Hans,

The patch I proposed also works with ruby older than 1.9 (e.g. ruby 1.8.7)!

The force_encoding() method is used only if RUBY_VERSION >= "1.9". If the scripts are executed by ruby 1.8 or an older version, the current line of code (e.g. 'case line.chomp') is left unchanged.

Also, I verified the patch with ruby 1.8.7 and with 1.9.1, and it worked in both cases.

The patch does have the drawback of slowing down processing (the if is executed when parsing each line of the files, and this could probably be optimized...)

Meanwhile, I don't think that the magic string

# encoding: ASCII-8BIT

solves the problem. This string indicates that the script is written in ASCII-8BIT, but when it is reading the strings from the .tex or .tui files, ruby 1.9.1 considers them as US-ASCII regardless of the encoding declared in # encoding: ...

I introduced # encoding: ASCII-8BIT in texmfstart.rb, tex.rb and texutil.rb and the problem didn't disappear :-(

Of course I may be wrong. But the experiments I did make me think this way. Also, I don't have Linux at my disposal (I mean, with ConTeXt installed), and the behavior there is perhaps different...

Kind regards and thank you very much.
J. Augusto

On Mon, Aug 10, 2009 at 5:27 PM, Hans Hagen pra...@wxs.nl wrote:
> Jose Augusto wrote:
> > Hi Hans,
> >
> > I just sent a mail with a possible patch before I read this answer from you :-)
> >
> > As I say there, the patches work (at least for me). I updated ConTeXt MkII a few hours ago, so I don't know whether the betas you mentioned have already been installed...
> >
> > I hope the proposed patches are helpful.
> your patch will not work with ruby < 1.9 so if my patch (opening files in rb mode) works ok that's more robust; another option is to patch texmfstart.rb
>
>     #!/usr/bin/env ruby
>     #encoding: ASCII-8BIT
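On the per-line cost of the version check mentioned above: one possible way to avoid it is to decide once, when the script loads, which variant of a helper to define. This sketch swaps the RUBY_VERSION comparison for feature detection with String.method_defined?, which is my substitution and not something proposed in the thread. The helper name relabel and the sample lines are hypothetical, and ISO-8859-1 is assumed as in the patch.

```ruby
# Pick the implementation once, at load time, instead of testing per line.
if String.method_defined?(:force_encoding)
  # Ruby 1.9 and later: relabel the bytes as ISO-8859-1 (assumed encoding).
  def relabel(str)
    str.dup.force_encoding("ISO-8859-1")
  end
else
  # Ruby 1.8: strings are plain byte sequences, nothing to do.
  def relabel(str)
    str
  end
end

# Made-up sample lines; the second holds the Latin-1 bytes for "Ãçê".
lines = ["c \\mainreference{}{a}\n", "c \\listentry{Title: \xC3\xE7\xEA}\n"]
commands = lines.map { |l| relabel(l).chomp[/^c (.*)$/, 1] }
puts commands.size
```

Whether this is measurably faster than the inline if would need to be timed on real .tui files; the main gain is that the same source loads on both Ruby generations without a version string comparison.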
Re: [NTG-context] Ruby 1.9.1 and non-ascii char parsing in .tui file
Hi Hans,

I just ran it with ruby 1.8.6 and the force_encoding() patch worked well.

I have also just updated with --context=current. The banner in texexec.rb is

banner = ['TeXExec', 'version 6.2.1', '1997-2009', 'PRAGMA ADE/POD']

and the date of this script (after updating) is 10-04-2009 (that is April...)

I'm running MkII. How do I get the MkII beta scripts, such as the texexec.rb you mention?

All my Rubys are compiled locally with mingw on Windows (2000 or XP, on 3 different machines). Of course the encoding thing is different in Linux, Windows (and DOS prompts, for that matter), so there is probably different behavior in the ruby/context/tex interaction with chars on Linux and Windows boxes...

Thx
Jose

On Mon, Aug 10, 2009 at 6:39 PM, Hans Hagen pra...@wxs.nl wrote:
> Jose Augusto wrote:
> > Meanwhile I don't think that the magic string # encoding: ASCII-8BIT solves the problem. This string indicates that the script is written in ASCII-8BIT, but when it is reading the strings from the .tex or .tui files ruby 1.9.1 considers them as US-ASCII regardless of the encoding declared in # encoding: ...
>
> not when opened as 'rb' (which i do in the latest texexec.rb) so i wonder why that does not work at your place (http://blog.nuclearsquid.com/writings/ruby-1-9-encodings)
>
> i run ruby 1.8.6 (and on a couple of servers even older versions and i'm not going to touch ruby on these machines (i don't want to patch scripts that are supposed to run another 5-10 years) but i might update context and texexec)
>
> > I introduced # encoding: ASCII-8BIT in texmfstart.rb, tex.rb and texutil.rb and the problem didn't disappear :-(
>
> hm, it worked here
>
> > Of course I may be wrong. But the experiments I did make me think this way. Also, I don't have Linux at my disposal (I mean, with ConTeXt installed) and there the behavior perhaps is different...
>
> that's my biggest fear ... introducing more problems
>
> Hans
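As background to the magic-comment discussion: in Ruby 1.9 and later the "# encoding:" comment sets the encoding of string literals in that source file, while strings read from a file get the file's external encoding, which can be given per file in the open mode. The sketch below shows the per-file form. The file name and sample line are invented, and ISO-8859-1 is assumed because that is the encoding used in the thread.

```ruby
require "tmpdir"

decoded = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.tui") # hypothetical file name
  # Write the Latin-1 bytes for "Ãçê" without any conversion.
  File.open(path, "wb") { |f| f.write("c \\listentry{Title: \xC3\xE7\xEA}\n") }

  # "r:ISO-8859-1" sets the external encoding for this file only,
  # independently of any magic comment in the script.
  File.open(path, "r:ISO-8859-1") do |f|
    line = f.gets
    decoded = line.chomp[/^c (.*)$/, 1]
  end
end

puts Encoding.default_external # what plain open(path) would have used
puts decoded.encode("UTF-8")
```

The "r:..." mode suffix is not understood by ruby 1.8, so a script that must run on both would still need some kind of guard around it.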
[NTG-context] Ruby 1.9.1 and non-ascii char parsing in .tui file
Hi all,

A few weeks ago I reported a problem with ruby 1.9.1, which was solved by removing the offending .tui line (Mojca and Hans, AFAIR). The problem was related to the existence of non-ASCII chars in the .tui file.

Sadly it strikes again, now when chars with accents appear in titles (sections, subsections, etc.). The parsing of the .tui line shown at the end of this message fails in ruby 1.9.1, but not in ruby 1.8.7. The error that is returned is also shown. If I remove the chars with accents from the section title, all goes well.

I'm using MkII (--context=current). One of the advantages of ruby 1.9 over 1.8 is that it is 3 times faster... However, Ruby made lots of changes in string manipulation and storage when moving from 1.8 to 1.9, and that must be the source of the problem.

I tracked the error to texutil.rb, line 1035:

when /^c (.*)$/o then @plugins.reader('MyCommands', [$1])

but then I got lost in the Classes/Modules jungle :-) in that script. Perhaps it is this procedure, in line 403 of texutil.rb, which triggers the error?

def MyCommands::reader(logger,data)
    @@commands.push(data.shift+data.collect do |d| "\{#{d}\}" end.join)
end

Thanks in advance for your support. If I can help in solving the problem, please point me to the task. I have some experience with Ruby (I started using it when the 1st edition of the pickaxe book was published, around 2001) and with Perl. But not with Lua :-)... Although I have already read quite a lot of Roberto's Lua book, I haven't started coding in Lua yet :-)

Kind Regards
J. Augusto

## TUI file and triggered error ###

## .tui snippet

c \mainreference{}{a}{2--0-1-1-0-0-0-0--1}{1}{1.1}
c \listentry{subsection}{3}{1.1.1}{Title with accents: Ãçê}{2--0-1-1-1-0-0-0--1}{1}
c \mainreference{}{b}{2--0-1-1-1-0-0-0--1}{1}{1.2}

### error

pdfTeX warning: pdftex.exe: no GlyphToUnicode entry has been inserted yet!
Output written on test1.pdf (1 page, 72793 bytes).
Transcript written on test1.log.
TeXUtil | parsing file test1.tui
TeXUtil | fatal error in parsing test1.tui
TeXUtil | check loading of file 'test1', begin/end problem
TeXUtil | shortcuts : 169
TeXUtil | expansions: 308
#
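The failure can be reproduced in isolation with a few lines. This is only a sketch: the string below imitates the accented .tui entry, with the US-ASCII label applied by hand rather than by reading a file, and the variable names are made up. It catches ArgumentError explicitly so that the exception class and message become visible instead of a generic "fatal error in parsing".

```ruby
# The accented title as Latin-1 bytes (C3 E7 EA), labelled US-ASCII by hand
# to imitate what ruby 1.9.1 reportedly does when reading the .tui file.
line = "c \\listentry{subsection}{3}{1.1.1}{Title with accents: \xC3\xE7\xEA}\n".dup.force_encoding("US-ASCII")

valid = line.valid_encoding? # high bytes are not valid US-ASCII

outcome =
  begin
    case line.chomp
    when /^c (.*)$/ then "matched"
    else "no match"
    end
  rescue ArgumentError => e
    # Regexp#=== is expected to raise here, as in the thread's traceback.
    "#{e.class}: #{e.message}"
  end

puts valid
puts outcome
```

Checking valid_encoding? before the case expression is a cheap way to find the offending line without waiting for the match to fail.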
Re: [NTG-context] ConTeXt Minimals parsing of .tui broken with ruby 1.9.1 in Windows
Hi all,

Thanks for the patch. I just updated ConTeXt Minimals and re-tried. Here is the GOOD result, now it's working:

-
F:\ANOS\TeXes>ruby -v
ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-mingw32]

F:\ANOS\TeXes>texexec con-hello1.tex
TeXExec | processing document 'con-hello1.tex'
Output written on con-hello1.pdf (1 page, 21759 bytes).
Transcript written on con-hello1.log.
TeXUtil | parsing file con-hello1.tui
TeXUtil | shortcuts : 169
TeXUtil | expansions: 308
TeXUtil | reductions: 0
TeXUtil | divisions : 0
TeXUtil | loaded files: 1
TeXUtil | temporary files: 0
TeXUtil | commands: 20
TeXUtil | programs: 0
TeXUtil | tuo file saved
TeXExec | runtime: 4.578125
-

Meanwhile, the evil line with control chars is no longer in the .tui file.

I want to thank Hans and Mojca for the patching and the kindness.

Jose.

On Tue, Jul 14, 2009 at 1:56 PM, Mojca Miklavec mojca.miklavec.li...@gmail.com wrote:
> After installing ConTeXt Minimals (the devel version) yesterday, I ran the above example with ruby 1.9.1-p129 in Windows (both Win 2000 and XP show the problem).
>
> (maybe mojca can patch this in core-uti.mkii: ):
>
> % \appendtoks
> %     \immediatewriteutilitycommand{\thisisbytesequence{\testbytesequence}}%
> % \to \everyopenutilities
>
> \let\testbytesequence  \empty             % keep this
> \let\thisisbytesequence\gobbleoneargument % keep this
>
> Done, but untested.
>
> Mojca
[NTG-context] ConTeXt Minimals parsing of .tui broken with ruby 1.9.1 in Windows
Hello all,

I want to report a problem that is either in ConTeXt or in ruby 1.9.1 (the latest version of Ruby). More probably, the problem has to do with Ruby's handling of non-ASCII characters. I have no means of trying Linux, Solaris, etc. Anyone using ConTeXt with ruby 1.9.1 will probably face it (at least on Windows :-)

The problem happens with all files, even with the simple Hello:

\starttext
Hello World
\stoptext

After installing ConTeXt Minimals (the devel version) yesterday, I ran the above example with ruby 1.9.1-p129 in Windows (both Win 2000 and XP show the problem).

Meanwhile I compiled and tried several versions of Ruby, and found the following pattern of problems:

ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-mingw32]    PROBLEM
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mingw32]      PROBLEM
ruby 1.9.0 (2008-10-04 revision 19669) [i386-mingw32]        No problem
ruby 1.8.7 (2009-06-12 patchlevel 174) [i386-mingw32]        No problem
ruby 1.8.6 (2009-06-08 patchlevel 369) [i386-mingw32]        No problem

So, whatever it is, it is broken with ruby 1.9.1. All the versions of Ruby were compiled in Windows using the mingw toolchain, with GCC 3.4.5.

Here is the description of what happens. When texutil parses the .tui file, I get the following (see comments after this text output):

-
...
Output written on con-hello1.pdf (1 page, 21759 bytes).
Transcript written on con-hello1.log.
TeXUtil | parsing file con-hello1.tui
TeXUtil | debug 1 jasa #<File:0x1271d18>
xxx c \thisissectionseparator{-}
xxx c \thisisutilityversion{2008.10.14}
xxx c \thisisbytesequence{?+Ç}
TeXUtil | fatal error in parsing con-hello1.tui
TeXUtil | shortcuts : 0
TeXUtil | expansions: 0
TeXUtil | reductions: 0
TeXUtil | divisions : 0
TeXUtil | loaded files: 0
TeXUtil | temporary files: 0
TeXUtil | commands: 2
TeXUtil | programs: 0
TeXUtil | tuo file saved
TeXExec | runtime: 2.703125

The lines with "debug 1 jasa" and those starting with "xxx" result from the simple debug code I inserted in the file texutil.rb to find the problematic line.

The error happens when the following Ruby code is executed (the extra debug lines are marked with "# jasa"):

-- texutil.rb (snippet, around line 1025) --

def loaded(filename)
    begin
        tuifile = File.suffixed(filename,'tui')
        if FileTest.file?(tuifile) then
            report("parsing file #{tuifile}")
            if f = open(tuifile) then
                report("debug 1 jasa #{f}") # jasa
                f.each do |line|
                    print "xxx #{line}" # jasa
                    case line.chomp
                        when /^f (.*)$/o then @plugins.reader('MyFiles',$1.splitdata)
                        when /^c (.*)$/o then @plugins.reader('MyCommands', [$1])
                        when /^e (.*)$/o then @plugins.reader('MyExtras', $1.splitdata)
                        when /^s (.*)$/o then @plugins.reader('MySynonyms', $1.splitdata)
                        when /^r (.*)$/o then @plugins.reader('MyRegisters',$1.splitdata)
                        when /^p (.*)$/o then @plugins.reader('MyPlugins', $1.splitdata)
                        when /^x (.*)$/o then @plugins.reader('MyKeys', $1.splitdata)
                        when /^r (.*)$/o then # nothing, not handled here
                        else
                            # report("unknown entry #{line[0,1]} in line #{line.chomp}")
                    end
                end
                f.close
            end
        else
            report("unable to locate #{tuifile}")
        end
    rescue
        report("fatal error in parsing #{tuifile}")
        @filename = 'texutil'
    else
        @filename = filename
    end
end

---

From the debugging lines that are printed, it is clear that the line in the .tui file that triggers the problem is:

c \thisisbytesequence{ ...non-ASCII codes... }

and, precisely, it is the second line of the 'case':

when /^c (.*)$/o then @plugins.reader('MyCommands', [$1])

which processes the .tui line and triggers the 'rescue' clause.

So I think the problem lies in the digestion of non-ASCII characters by the latest version of Ruby. I don't know the meaning of the \thisisbytesequence line in ConTeXt, nor the meaning of those non-ASCII chars.

I followed the @plugins.reader('MyCommands', [$1]) and figured out that what raises the exception happens before the @plugins.reader method, since it is never reached when the \thisisbytesequence line is