PDF itself is largely a plain-text format, so you don't need a special tool to write or read it. The plot or publish addon provides examples of how to write a pdf file in pure J.
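
To make this concrete, here is a minimal sketch in Python rather than J. Everything in it is an assumption made for illustration (the names build_pdf and inflate_streams, the page size, the Helvetica font) and none of it is taken from the plot or publish addon. It assembles a one-page pdf from plain text, stores the page content as a zlib-compressed stream the way most pdf writers do, and then recovers that content by inflating whatever follows each "stream" keyword.

```python
import re
import zlib


def build_pdf(text: str) -> bytes:
    """Assemble a one-page PDF by hand; only the content stream is binary.

    Assumes `text` is plain ASCII without parentheses or backslashes,
    since this sketch does not escape PDF string literals.
    """
    # Page content in PDF operator syntax: select font, move, show text.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("ascii")
    packed = zlib.compress(content)  # the format /FlateDecode expects

    bodies = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
        + packed + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


def inflate_streams(pdf: bytes) -> list:
    """Return the decompressed body of every zlib-compressed stream found."""
    results = []
    # Match the "stream" keyword but not the tail of "endstream".
    for match in re.finditer(rb"(?<!end)stream\r?\n", pdf):
        inflater = zlib.decompressobj()
        try:
            # zlib stops at the end of its own data; the rest of the file
            # is left in inflater.unused_data.
            data = inflater.decompress(pdf[match.end():])
        except zlib.error:
            continue  # not zlib data, e.g. an image or an uncompressed stream
        if inflater.eof:
            results.append(data)
    return results


if __name__ == "__main__":
    pdf = build_pdf("rho = m / V")
    for stream in inflate_streams(pdf):
        print(stream.decode("ascii"))
```

The decoder deliberately ignores the declared /Length and lets zlib find the end of each stream itself, which keeps the sketch short. Streams that use other filters, or that sit inside an encrypted file, are skipped or would need more work.
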
If you open a pdf with a text editor and see binary data other than pictures, it is usually stream data compressed with zlib. The dictionary just before each stream gives its length (/Length) and the filter used. You can try decoding one of them to see the real content that produces the visual display of a formula, but I am skeptical that it is useful for machine understanding of a formula. Also, there is a utility `pdftotext' to convert pdf to text.

On Sat, 20 Jun 2009, [email protected] wrote:
> I use an linux box (ubuntu) and I downloaded an application
> by the name of:
>
> PDF Editor which goes by the name of PDFedit.
>
> I ran the program against the same portion of the .pdf file
> and extracted the text for the formula that was so mangled.
>
> I got:
> m
> (
> ρ = unit : kg / m 3
> V
> )
>
> still kind of mangled, but it is better.
>
> In the original formula, the had a "m%V" term that was a
> horizontal and slanted divide sign.
>
> I got the ρ - rho this time, which was an improvement.
>
> I think a lot strangeness will go away when I get the
> correct font support. Maybe I should
> install the APL font sets and some of the Mathematica font
> sets.
>
> ----- Original Message Follows -----
> From: [email protected]
> To: Programming forum <[email protected]>
> Subject: Re: [Jprogramming] math.pdf -> J Server -> math.ijs file
> Date: Sat, 20 Jun 2009 11:40:31 -0800
>
> >
> >----- Original Message Follows -----
> >From: Devon McCormick <[email protected]>
> >To: Programming forum <[email protected]>
> >Subject: Re: [Jprogramming] math.pdf -> J Server -> math.ijs file
> >Date: Fri, 19 Jun 2009 21:06:48 -0400
> >
> >> I searched for "physics formula" in Google, grabbed the first PDF I found
> >> (http://faculty.trinityvalleyschool.org/hoseltom/handouts/Formula%20Sheet-2003-05-07-8pg.pdf),
> >> opened it in Acrobat, highlighted a formula for "Density = mass/volume"
> >> which was written as something like:
> >> {rho} = m _ (unit:kg/m{exponent} 3) v
> >> and grabbed this formula using the Acrobat selection tool. In a text window,
> >> this pasted as
> >> (unit : kg /m3)Vñ = m
> >> So that's one problem.
> >
> >I downloaded the same .pdf file and ran a free-pay-if-you-like-it
> >program that extracts text from a .pdf file.
> >
> >My results for the same density function:
> >
> >#4 Weight = m<8f>g
> > g = 9.81m/sec² near the surface of the Earth
> >= 9.795 m/sec ² in Fort Worth, TX
> > Density = mass / volume
> >
> >If I take this apart, #4 Weitht = m<8f>g is not correct because I don't have
> >a definition for the round dot for multiplication
> >
> >the next line is correct
> >the next line is correct
> >
> >The next line looks like a big mess.
> >
> >() 3 /: mkgunitV
> >m= ?
> >
> >as you say, it should be:
> >
> >rho = m/V(unit:kg/m^3)
> >
> >But looking at it, the problem is the way the text was processed.
> >
> >every letter is there, it is the order that is bad.
> >The "(" and ")" are there,
> >the "/" is there,
> >the "kg" is there,
> >the "m" is there,
> >the "3" is there the "V" is there.
> >
> >the reason the 3 is not displayed as an exponent is because
> >of font support
> >
> >I am not sure where the rho went, but I think it has to do with not
> >having the proper font support.
> >
> >If you have the proper fonts installed, everything will be
> >displayed, and if one were to
> >make the code process the text properly, your formula would
> >be correct.
> >
> >
> >>However, the deeper problem is that _there is no such thing
> >>as a standard mathematical notation_. Even on the single
> >>physics formula page I looked at here, multiplication is
> >>represented both implicitly by adjacent letters and
> >>explicitly by a big, vertically-centered dot. Even on this
> >>one page, equality is parsed in different, inconsistent
> >>ways. The intended meanings are clear from context and a
> >>familiarity with physics but are ambiguous taken by
> >>themselves. I could go on and on - take a look at
> >>http://www.jsoftware.com/jwiki/NYCJUG/MathematicalNotation
> >>for a little more on this - but I won't.
> >
> >I totally agree with you, this is pain in the ass.
> >
> >>This inconsistency of notation is, in fact, part of the reason
> >>Iverson created APL in the first place.
> >
> >great idea from a great man.
> >
> >>The upshot is that
> >>an idea like Dan's is probably more fruitful than
> >>this notion of grabbing things off a PDF. Even then,
> >>you'll need to spend a fair amount of time interpreting
> >>what you get.
> >
> >I respectfully disagree. The program I am using does a fair job
> >proving to me that it is fisable.
> >
> >>For a look at how someone handles a lot of formulas and
> >>translates them into J, see Tom Allen's pages starting at
> >>http://www.jsoftware.com/jwiki/Essays/SpaceTime2D/SpaceTime2D01.
> >>Good luck,
> >>Devon
> >>
> >>On Fri, Jun 19, 2009 at 8:24 PM, <[email protected]> wrote:
> >>> I do not plan to use OCR.
> >>>
> >>> I am thinking more along the lines of cutting and pasting a
> >>> section out of a Portable Document Format (pdf) file that
> >>> represents in normal mathematical notation a formula.
> >>>
> >>> Acter doing the copy, use cut/paste buffer to generate
> >>> equivalent j code.
> >>>
> >>> As I understand it ( probably wrong ) what is in the
> >>> cut/paste buffer is a sequence of bytes which represents in pdf the
> >>> formula. I am thinking that different formulas
> >>> ( no matter how little or how big the difference ) have
> >>> different bytes. So, no matter how difficult, one should
> >>> be able to transcribe from pdf representation to j
> >>> representation.
> >>>
> >>> I think it would be way cool (1960s euphemism) to go to a
> >>> web page containing formula for Physics and copy a pdf version
> >>> of a formula and then turn it into the j representation
> >>> automatically.
> >>>
> >>> ----- Original Message Follows -----
> >>> From: bill lam <[email protected]>
> >>> To: [email protected]
> >>> Subject: Re: [Jprogramming] math.pdf -> J Server -> math.ijs file
> >>> Date: Fri, 19 Jun 2009 10:09:30 +0800
> >>>
> >>> >Except for the ocr part, looks similar to mathematica.
> >>> >
> >>> >btw the 'Quality' Web Email you used breaks every thread it
> >>> >replies.
> >>> >
> >>> >--
> >>> >regards,
> >>> >====================================================
> >>> >GPG key 1024D/4434BAB3 2008-08-24
> >>> >gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> >>> >----------------------------------------------------------------------
> >>> >For information about J forums see http://www.jsoftware.com/forums.htm
> >>
> >>--
> >>Devon McCormick, CFA
> >>^me^ at acm.org is my
> >>preferred e-mail

--
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
