Re: Regexp

2008-07-07 Thread Alexander Burger

On Tue, Jul 08, 2008 at 09:33:02AM +1000, konrad Zielinski wrote:
> "Flesch Index: 93.1/100"
> what I want to get out is the 93.1

If it is just for parsing (scraping) the input stream, it is very simple:

   (from "Flesch Index:")
   (format (till "/" T) 1)

This will return a scaled number.
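
To make "scaled number" concrete for readers coming from other languages: with a scale of 1, the text 93.1 is held as the integer 931. Below is a minimal C sketch of the same scraping step, under stated assumptions. The helper name scaled_after is hypothetical (it is not part of PicoLisp or of any library), it handles only non-negative numbers, and it drops surplus fraction digits instead of rounding.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (illustration only): find `label` in `line`, read the
 * decimal number that follows it, and return that number as an integer
 * scaled by 10^scale, e.g. "93.1" with scale 1 gives 931.
 * Returns -1 if the label or a number after it is not found. */
long scaled_after(const char *line, const char *label, int scale) {
    assert(line != NULL && label != NULL && scale >= 0);

    const char *p = strstr(line, label);
    if (p == NULL)
        return -1;
    p += strlen(label);
    while (*p == ' ' || *p == '\t')      /* skip white space after the label */
        p++;
    if (!isdigit((unsigned char)*p))
        return -1;

    long n = 0;
    while (isdigit((unsigned char)*p))   /* integer part */
        n = n * 10 + (*p++ - '0');

    int frac = 0;
    if (*p == '.') {
        p++;
        while (isdigit((unsigned char)*p)) {
            if (frac < scale) {          /* keep at most `scale` digits */
                n = n * 10 + (*p - '0');
                frac++;
            }
            p++;
        }
    }
    while (frac++ < scale)               /* pad missing fraction digits */
        n *= 10;
    return n;
}
```

Calling scaled_after("Flesch Index: 93.1/100", "Flesch Index:", 1) yields 931, the same scaled value that the 'format' call with a scale of 1 produces.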


> and assign it to a local variable
> called Flesch.

Here is an example that reads the value, doubles it, and formats it again:

   (in "file"
      (from "Flesch Index:")
      (let Flesch (format (till "/" T) 1)
         (format (* 2 Flesch) 1) ) )


Or should the variable 'Flesch' also be read from the file? Then
it is better to use 'intern', or simply 'read':

   : (in "file"
      (let (Var (read)  Val (prog (from "Index:") (format (till "/" T) 1)))
         (set Var Val)
         (println Var (format Val 1) (val Var)) ) )
   Flesch "93.1" 931
   -> 931


> later lines like to do things in a completely different way such as:
> 
> "9 words, average length 3.89 characters = 1.00 syllables"

It depends what you want to do, and what assumptions you can make for
the input data. Let's assume you know the positions and scale of all
numbers:

   : (in "file"
      (skip)  # Skip leading white space
      (let Line (split (line) " ")
         (list
            (format (pack (car Line)))
            (format (pack (get Line 5)) 2)
            (format (pack (get Line 8)) 2) ) ) )
   -> (9 389 100)


The same can be achieved with 'from' and 'read' instead of 'format':

   : (in "file"
      (make
         (link (read))
         (from "length")
         (link (scl 2 (read)))
         (from "characters =")
         (link (scl 2 (read))) ) )
   -> (9 389 100)

There are many other possible solutions, though.
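
One of those other solutions, sketched in C rather than PicoLisp, is to let sscanf match the fixed wording of the line. This is only an illustration: parse_summary is a hypothetical name, the line layout is assumed to be exactly as quoted above, and the two decimal values are scaled by 100 with simple rounding so that the result mirrors (9 389 100).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (illustration only): parse a line such as
 *   "9 words, average length 3.89 characters = 1.00 syllables"
 * into three integers: the word count, and the two decimal values
 * scaled by 100. Returns 1 on success, 0 if the line does not match. */
int parse_summary(const char *line, long out[3]) {
    assert(line != NULL && out != NULL);

    long words;
    double avg_len, syllables;

    /* The leading blank in the format skips leading white space. */
    if (sscanf(line, " %ld words, average length %lf characters = %lf",
               &words, &avg_len, &syllables) != 3)
        return 0;

    out[0] = words;
    out[1] = (long)(avg_len * 100.0 + 0.5);    /* scale 2, non-negative only */
    out[2] = (long)(syllables * 100.0 + 0.5);
    return 1;
}
```

For the sample line this fills out with 9, 389 and 100. Unlike the 'split' version, it does not depend on counting word positions, but it does depend on the literal words "words, average length" and "characters =" being present.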

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]

Re: Regexp

2008-07-07 Thread konrad Zielinski

The particular line I gave up on was

"Flesch Index: 93.1/100"

what I want to get out is the 93.1  and assign it to a local variable
called Flesch.

later lines like to do things in a completely different way such as:

"9 words, average length 3.89 characters = 1.00 syllables"




Re: Regexp

2008-07-07 Thread Alexander Burger

Hi Konrad,

thanks for the long explanation.


> this. Whenever it prints a list it checks to see if it might be a
> string by inspecting the first n bytes (I don't know how many); if they
> are all printable ASCII characters then it prints the list as a

That's quite a kludge, IMHO.


> delimiter is already taken, and there is no alternative, so I can't
> easily write an unpacked string in source.

This is not difficult when you use the backquote read macro:

   : (de foo ()
      (let S '`(chop "abcdefghijklmnopqrstuvwxyz")
         (prinl (reverse S))
         (length S) ) )
   -> foo

   : (foo)  
   zyxwvutsrqponmlkjihgfedcba
   -> 26


> What I'm saying is that there may be a case to say that treating
> strings as lists by default may be better than treating them as

Perhaps. But this is also a matter of efficiency: A list of single
characters takes up four times the space (and even eight times on
64bits) compared to the packed symbol representation.

PicoLisp tries not to be overly clever. A list should be a list, and a
symbol should be a symbol, under all circumstances. What you see is what
you get. When you start to print things differently depending on the
context, you'll just create confusion.


> transient strings by default, especially as the transient symbol names
> are stored as lists under the hood anyway.

Well, they are stored in cells, but not as lists of individual
characters.


> the documentation states somewhere that the decision to treat strings as
> transient symbols may have been misguided.

If I recall correctly, the opposite was meant: That transient symbols
(which are an essential feature of the language) look syntactically like
strings in other languages.

In earlier versions of PicoLisp, transient symbols had another syntax
(i.e. :Var instead of "Var"). But then you cannot easily write transient
symbols with white space in the name etc., and you effectively lose the
ability to use transient symbols *like* strings in other languages.


> Once strings are just lists, all of the normal list processing
> functions can be applied to them, and special characters can then be

Yes, this is an important feature, and also used frequently in PicoLisp.
It is easily achieved with (line), (chop) etc. But more important is
that real transient symbols also have a value cell and a property list.
This makes it possible to do things with "strings" which are not
possible in other languages.

For example, the whole locale translation mechanism depends on the fact
that "strings" contain their current translation in the value cell. With
that, (prinl "house") will print, for example, "Haus" if the locale
is German.


> given standard representations. The need to find out what the control
> sequences for things like tab and newline are was also a stumbling block,
> as they are not as intuitive as the '\' escapes used by most other
> programming languages.

You are free to define e.g.

   : (setq "\n" "^J"  "\r" "^M"  "\t" "^I")

(in the same transient scope, of course)


> the precision of numbers varies between lines.  This last part of
> actually parsing the numbers and getting them treated as numbers is
> where I got up to before giving up.

How did you try this? Should be no problem with the 'format' function,
where you can pass precision and separator values.

Cheers,
- Alex

Re: Regexp

2008-07-06 Thread konrad Zielinski

Probably the main reason why my initial code continually packed and
unpacked is the lack of syntax support for packed strings. Take Erlang
as an example.

In Erlang a string is just a list of bytes (yes, by default they only
support ASCII). The interpreter toplevel, however, is built to support
this. Whenever it prints a list it checks to see if it might be a
string by inspecting the first n bytes (I don't know how many); if they
are all printable ASCII characters then it prints the list as a
string.

Likewise, when I enter "foobar" in the source it knows that this is a
string. At present pico does not have any such support. The "
delimiter is already taken, and there is no alternative, so I can't
easily write an unpacked string in source.
And unless I remember to use princ the top level will print a string as
(f o o b a r).
Now I was dealing with strings that ran to thousands of characters, so
this got annoying fast.

What I'm saying is that there may be a case to say that treating
strings as lists by default may be better than treating them as
transient strings by default, especially as the transient symbol names
are stored as lists under the hood anyway.

Granted, it would be a backwards-incompatible change, but as I recall
the documentation states somewhere that the decision to treat strings as
transient symbols may have been misguided.

Once strings are just lists, all of the normal list processing
functions can be applied to them, and special characters can then be
given standard representations. The need to find out what the control
sequences for things like tab and newline are was also a stumbling block,
as they are not as intuitive as the '\' escapes used by most other
programming languages.

In the end things are harder than they need to be.

On the specific things I was doing:

1) Read a TeXmacs file. Every so often there is a  tag. Note this is not XML; it happens to use angle brackets to
mark tags but that is it. In the above there is a significant |, and as
far as I can see both / and \ are also significant and mean different
things.

2) Submit the body of each section (after removing quotes) to the style command.

3) Parse the output of style to extract interesting numbers. Note this
is particularly problematic, as style produces hard-to-parse output and
the precision of numbers varies between lines.  This last part of
actually parsing the numbers and getting them treated as numbers is
where I got up to before giving up.


Re: Regexp

2008-07-06 Thread Alexander Burger

On Mon, Jul 07, 2008 at 10:22:03AM +1000, konrad Zielinski wrote:
> I found this very frustrating to do in picolisp due to conflicting
> requirements, basically boiling down to how the language implements
> strings.

It would be interesting to discuss the details here.

> packing and unpacking my strings, depending on whether I wanted to
> print them or do something else to them.
> 
> I really think that the strings should have simply been implemented as
> lists of chars.

Yes, this is the way to go. What stops you from doing it this way?

Reading (with 'line' and 'till') will return lists of characters, and
the 'prin' functions will print them. Packing and unpacking might be
unnecessary.

Cheers,
- Alex

Re: Regexp

2008-07-06 Thread konrad Zielinski

It all depends on what you are doing.

I recently tried implementing something I had working in Python in
Pico Lisp. This is a program which needs to do three things:

1) Read a file and extract blocks of text from it.
2) Submit the blocks of text (one at a time) to another command-line program.
3) Parse the output of the program and produce a summary report.

I found this very frustrating to do in picolisp due to conflicting
requirements, basically boiling down to how the language implements
strings.

I have come to the conclusion that implementing strings as
symbols was not a good design choice. I found myself frequently
packing and unpacking my strings, depending on whether I wanted to
print them or do something else to them.

I really think that the strings should have simply been implemented as
lists of chars.


Re: Regexp

2008-07-04 Thread Henrik Sarvell

I wrote (and Alex cleaned up) a small script that is a simple extension
to the match function. It's not real regular expressions, but it will work OK
for validating form input, which is what I will use it for. It's documented
here: http://www.prodevtips.com/2008/07/01/regular-expressions-in-pico-lisp/


Re: Regexp

2008-07-04 Thread Alexander Burger

Hi Andrei,

> Now I'm seeking a regexp library for picolisp.

Speaking of myself, I never felt the need for a full-blown regexp
library, because I find the built-in 'match' function quite flexible,
especially in combination with other list functions.

To employ true regular expressions in PicoLisp, however, I would simply
write a glue function to the GNU regexp library. I haven't investigated
that library yet, but once wrote something similar (though probably more
primitive) for the 'fnmatch' function (file name matching).

To try it, put the following into a file "fnMatch.l":


(load "lib/gcc.l")
(gcc "util" NIL 'fnMatch)

#include <fnmatch.h>

any fnMatch(any ex) {
   any x;

   x = evSym(cdr(ex));  // First argument: the pattern
   {
      char pat[bufSize(x)];

      bufString(x, pat);
      x = evSym(cddr(ex));  // Second argument: the string to test
      {
         char str[bufSize(x)];

         bufString(x, str);
         // fnmatch() returns 0 on a match
         return fnmatch(pat, str, 0)? Nil : T;
      }
   }
}
/**/


You can test it as:

   $ ./p dbg.l fnMatch.l
   : (fnMatch "foo*.c" "foobar.c")
   -> T
   : (fnMatch "foo*.c" "bar.c")   
   -> NIL


In an analogous way it should be possible to interface to the GNU regexp
library.
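
As a rough starting point, the C side of such a binding could call the POSIX regcomp/regexec interface from <regex.h>, which the GNU C library provides. The sketch below is a plain C helper, not tested PicoLisp glue: re_match is a hypothetical name, and wiring it into PicoLisp would presumably follow the fnMatch pattern above (evSym and bufString for the two arguments).

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): test `str` against the POSIX
 * extended regular expression `pat`.
 * Returns 1 on a match, 0 on no match, -1 if `pat` does not compile. */
int re_match(const char *pat, const char *str) {
    assert(pat != NULL && str != NULL);

    regex_t re;
    /* REG_NOSUB: only a yes/no answer is wanted, no submatch positions. */
    if (regcomp(&re, pat, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, str, 0, NULL, 0);  /* 0 means "matched" */
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, re_match("^foo.*\\.c$", "foobar.c") gives 1 and re_match("^foo.*\\.c$", "bar.c") gives 0, mirroring the two fnMatch tests. Compiling the pattern on every call keeps the sketch short; a real binding would probably want to cache compiled patterns.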

Cheers,
- Alex

Regexp

2008-07-04 Thread Andrei Ivushkin

Thanks to all who answered my question about picoserver. Your help is very
precious.
Now I'm seeking a regexp library for picolisp. At first glance there are some
solutions for Common Lisp:
http://www.ccs.neu.edu/home/dorai/pregexp/pregexp.html

But it would need to be rewritten completely for our dialect.
Maybe there are other libraries or bindings?