Re: Regexp
On Tue, Jul 08, 2008 at 09:33:02AM +1000, konrad Zielinski wrote:
> "Flesch Index: 93.1/100"
> what I want to get out is the 93.1

If it is just for parsing (scraping) the input stream, it is very simple:

   (from "Flesch Index:")
   (format (till "/" T) 1)

This will return a scaled number (931 for the line above, i.e. 93.1 with
one decimal place).

> and assign it to a local variable
> called Flesch.

Here is an example that reads the value, doubles it, and formats it again:

   (in "file"
      (from "Flesch Index:")
      (let Flesch (format (till "/" T) 1)
         (format (* 2 Flesch) 1) ) )

Or should the variable 'Flesch' also be read from the file? Then it is
better to use 'intern', or simply 'read':

   : (in "file"
      (let (Var (read)  Val (prog (from "Index:") (format (till "/" T) 1)))
         (set Var Val)
         (println Var (format Val 1) (val Var)) ) )
   Flesch "93.1" 931
   -> 931

> later lines like to do things in a completely different way such as:
>
> "9 words, average length 3.89 characters = 1.00 syllables"

It depends on what you want to do, and what assumptions you can make
about the input data. Let's assume you know the positions and scale of
all numbers:

   : (in "file"
      (skip)  # Skip leading white space
      (let Line (split (line) " ")
         (list
            (format (pack (car Line)))
            (format (pack (get Line 5)) 2)
            (format (pack (get Line 8)) 2) ) ) )
   -> (9 389 100)

The same can be achieved with 'from' and 'read' instead of 'format':

   : (in "file"
      (make
         (link (read))
         (from "length")
         (link (scl 2 (read)))
         (from "characters =")
         (link (scl 2 (read))) ) )
   -> (9 389 100)

There are many other possible solutions, though.

Cheers,
- Alex
--
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: Regexp
The particular line I gave up on was

"Flesch Index: 93.1/100"

What I want to get out is the 93.1, and assign it to a local variable
called Flesch.

Later lines do things in a completely different way, such as:

"9 words, average length 3.89 characters = 1.00 syllables"
Re: Regexp
Hi Konrad,

thanks for the long explanation.

> this. Whenever it prints a list it checks to see if it might be a
> string by inspecting the first n bytes (I don't know how many); if they
> are all printable ASCII characters then it prints the list as a

That's quite a kludge, IMHO.

> delimiter is already taken, and there is no alternative, so I can't
> easily write an unpacked string in source.

This is not difficult when you use the backquote read macro:

   : (de foo ()
      (let S '`(chop "abcdefghijklmnopqrstuvwxyz")
         (prinl (reverse S))
         (length S) ) )
   -> foo

   : (foo)
   zyxwvutsrqponmlkjihgfedcba
   -> 26

> What I'm saying is that there may be a case to say that treating
> strings as lists by default may be better than treating them as

Perhaps. But this is also a matter of efficiency: A list of single
characters takes up four times the space (and even eight times on
64 bits) compared to the packed symbol representation.

PicoLisp tries not to be overly clever. A list should be a list, and a
symbol should be a symbol, under all circumstances. What you see is what
you get. When you start to print things differently depending on the
context, you'll just create confusion.

> transient strings by default, especially as the transient symbol names
> are stored as lists under the hood anyway.

Well, they are stored in cells, but not as lists of individual
characters.

> the documentation states somewhere that the decision to treat strings as
> transient symbols may have been misguided.

If I recall correctly, the opposite was meant: That transient symbols
(which are an essential feature of the language) look syntactically like
strings in other languages.

In earlier versions of PicoLisp, transient symbols had another syntax
(i.e. :Var instead of "Var"). But then you cannot easily write transient
symbols with white space in the name etc., and you effectively lose the
ability to use transient symbols *like* strings in other languages.

> Once strings are just lists all of the normal list processing
> functions can be applied to them, and special characters can then be

Yes, this is an important feature, and also used frequently in PicoLisp.
It is easily achieved with (line), (chop) etc. But more important is
that real transient symbols also have a value cell and a property list.
This makes it possible to do things with "strings" which are not
possible in other languages.

For example, the whole locale translation mechanism depends on the fact
that "strings" contain their current translation in the value cell. With
that, (prinl "house") will result in, for example, "Haus" if the locale
is German.

> given standard representations. The need to find out what the control
> sequences for things like tab and new line were was also a stumbling
> block, as they are not as intuitive as the '\' escapes used by most
> other programming languages.

You are free to define, e.g.,

   : (setq "\n" "^J"  "\r" "^M"  "\t" "^I")

(in the same transient scope, of course)

> the precision of numbers varies between lines. This last part of
> actually parsing the numbers and getting them treated as numbers is
> where I got up to before giving up.

How did you try this? It should be no problem with the 'format'
function, where you can pass precision and separator values.

Cheers,
- Alex
Re: Regexp
Probably the main reason why my initial code continually packed and
unpacked is the lack of syntax support for unpacked strings.

Take Erlang as an example. In Erlang a string is just a list of bytes
(yes, by default they only support ASCII). The interpreter top level,
however, is built to support this: whenever it prints a list it checks
to see if it might be a string by inspecting the first n bytes (I don't
know how many); if they are all printable ASCII characters then it
prints the list as a string. Likewise, when I enter "foobar" in the
source it knows that this is a string.

At present Pico does not have any such support. The " delimiter is
already taken, and there is no alternative, so I can't easily write an
unpacked string in source. And unless I remember to use princ the top
level will print a string as (f o o b a r). Now, I was dealing with
strings that ran to thousands of characters, so this got annoying fast.

What I'm saying is that there may be a case to say that treating strings
as lists by default may be better than treating them as transient
strings by default, especially as the transient symbol names are stored
as lists under the hood anyway. Granted, it would be a backwards
incompatible change, but as I recall the documentation states somewhere
that the decision to treat strings as transient symbols may have been
misguided.

Once strings are just lists, all of the normal list processing functions
can be applied to them, and special characters can then be given
standard representations. The need to find out what the control
sequences for things like tab and new line were was also a stumbling
block, as they are not as intuitive as the '\' escapes used by most
other programming languages.

In the end things are harder than they need to be.

On the specific things I was doing:

1) Read a TeXmacs file. Every so often there is a tag. Note this is not
   XML; it happens to use angle brackets to mark tags, but that is it.

   In the above there is a significant |; as far as I can see both / and
   \ are also significant and mean different things.

2) Submit the body of each section (after removing quotes) to the style
   command.

3) Parse the output of style to extract interesting numbers. Note this
   is particularly problematic, as style produces hard-to-parse output
   and the precision of numbers varies between lines. This last part of
   actually parsing the numbers and getting them treated as numbers is
   where I got up to before giving up.
Re: Regexp
On Mon, Jul 07, 2008 at 10:22:03AM +1000, konrad Zielinski wrote:
> I found this very frustrating to do in PicoLisp due to conflicting
> requirements, basically boiling down to how the language implements
> strings.

It would be interesting to discuss the details here.

> packing and unpacking my strings, depending on whether I wanted to
> print them or do something else to them.
>
> I really think that the strings should have simply been implemented as
> lists of chars.

Yes, this is the way to go. What stops you from doing it this way?

Reading (with 'line' and 'till') will return lists of characters, and
the 'prin' functions will print them. Packing and unpacking might be
unnecessary.

Cheers,
- Alex
Re: Regexp
It all depends on what you are doing.

I recently tried porting something I had working in Python to PicoLisp.
This is a program which needs to do three things:

1) Read a file and extract blocks of text from it.
2) Submit the blocks of text (one at a time) to another command-line
   program.
3) Parse the output of that program and produce a summary report.

I found this very frustrating to do in PicoLisp due to conflicting
requirements, basically boiling down to how the language implements
strings. I have come to the conclusion that implementing strings as
symbols was not a good design choice. I found myself frequently packing
and unpacking my strings, depending on whether I wanted to print them or
do something else to them.

I really think that the strings should have simply been implemented as
lists of chars.
Re: Regexp
I wrote (and Alex cleaned it up) a small script that is a simple
extension to the match function. It's not real regular expressions, but
it will work OK for validating form input, which is what I will use it
for.

It's documented here:
http://www.prodevtips.com/2008/07/01/regular-expressions-in-pico-lisp/
Re: Regexp
Hi Andrei,

> Now I'm seeking a regexp library for picolisp.

Speaking for myself, I never felt the need for a full-blown regexp
library, because I find the built-in 'match' function quite flexible,
especially in combination with other list functions.

To employ true regular expressions in PicoLisp, however, I would simply
write a glue function to the GNU regexp library.

I haven't investigated that library yet, but once wrote something
similar (though probably more primitive) for the 'fnmatch' function
(file name matching). To try it, put the following into a file
"fnMatch.l":

   (load "lib/gcc.l")

   (gcc "util" NIL 'fnMatch)

   #include <fnmatch.h>

   // (fnMatch 'pat 'str) -> flg
   any fnMatch(any ex) {
      any x;

      x = evSym(cdr(ex));  // First argument: the pattern
      {
         char pat[bufSize(x)];

         bufString(x, pat);
         x = evSym(cddr(ex));  // Second argument: the string to test
         {
            char str[bufSize(x)];

            bufString(x, str);
            // fnmatch() returns 0 on a match
            return fnmatch(pat, str, 0)? Nil : T;
         }
      }
   }
   /**/

You can test it as:

   $ ./p dbg.l fnMatch.l
   : (fnMatch "foo*.c" "foobar.c")
   -> T
   : (fnMatch "foo*.c" "bar.c")
   -> NIL

In an analogous way it should be possible to interface to the GNU regexp
library.

Cheers,
- Alex
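As a rough illustration of what such a glue function might wrap, here is
a minimal plain-C sketch. It assumes the POSIX <regex.h> interface
(regcomp, regexec, regfree) as provided by the GNU C library, and it
deliberately leaves out the PicoLisp side (evSym, bufString and the
'gcc' loader). The helper name first_group and the pattern used in the
note below are hypothetical and not taken from the thread.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PicoLisp or of the fnMatch glue):
 * match the POSIX extended regular expression `pattern` against `text`
 * and copy the first parenthesised group into `out`.
 * Returns 0 on success, -1 if the pattern is invalid, nothing matched,
 * or the group does not fit into `out`. */
int first_group(const char *pattern, const char *text,
                char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: first group */
    int rc = -1;

    /* Compile the pattern; a non-zero result means it is invalid */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* regexec() returns 0 on a match; rm_so is -1 for an unused group */
    if (regexec(&re, text, 2, m, 0) == 0 && m[1].rm_so >= 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);

        if (len < outsize) {
            memcpy(out, text + m[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);      /* Release the compiled pattern */
    return rc;
}
```

Called with the pattern "Flesch Index: ([0-9]+\\.[0-9]+)/" and the text
"Flesch Index: 93.1/100", the helper leaves "93.1" in the output buffer.
A real glue function would additionally have to convert the PicoLisp
arguments to C strings and the result back to a symbol, in the way
fnMatch does above.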
Regexp
Thanks to all who answered my question about picoserver. Your help is
very valuable.

Now I'm looking for a regexp library for PicoLisp. At first glance there
are some solutions for Common Lisp:
http://www.ccs.neu.edu/home/dorai/pregexp/pregexp.html
But it would need to be rewritten completely for our dialect. Maybe
there are other libraries or bindings?