
Re: Parallel command execution

2012-06-02 Thread Alexander Burger
Hi Jorge,

  An important question is: Does this parallel processing of database
  objects also involve modifications of these objects? If so, the
  necessary synchronization between the processes will produce additional
  costs.
 
 Yes, I need to update the database with results from the executed commands.

Then I suspect that parallelizing the task may make things worse.

Parallelizing something makes sense when CPU load is the bottleneck.
For database updates, however, the bottleneck is disk I/O and the
system's disk buffer cache. What makes sense in such cases is to
parallelize the work on separate databases, running on separate
machines.


 So, you suggest limiting the updates to happen only in the main process, 
 right?

Yes. To be precise, the recommended way is to have all database
operations run in child processes which are all children of a common
parent process. If we call this parent process the main process, then
the updates don't happen in that process but in one of the children.
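
A minimal sketch of that layout, assuming a hypothetical helper 'inChild'
(the dbSync/commit pattern mirrors the fork'ed-child code further down in
this thread): the parent only dispatches, each piece of DB work runs in a
fork'ed child.

   (de inChild Prg             # Run a body of DB work in a child process
      (unless (fork)           # In the child ...
         (dbSync)              # ... synchronize with the other processes,
         (run Prg)             # do the actual modifications,
         (commit 'upd)         # commit and notify parent and siblings,
         (bye) ) )             # then terminate the child

   (inChild (put> Obj 'done T))   # hypothetical call, 'Obj' is some DB object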


Having said this, I see that your test program doesn't operate on a
database at all. Calling the C compiler in parallel tasks _does_ make
sense. Then we talk about something completely different.

 In that case I would need something like the following to be able to
 invoke the shell commands and update the database with the results.

This would indeed make sense, if the shell commands induce a heavy load.

Then the CPU load is the bottleneck again, and the way to go is to fork
several processes to do the work (i.e. call the shell commands), but
still have a _single_ process operating on the database. This would be
optimal.


 I don't like it, too convoluted for my taste. Any suggestion on how to
 improve the performance/style? Perhaps a different approach would be
 better?

Yes, I think so too. If I understand you right, you want to call a
number of shell commands (in a batch list), and then store the results
in a database. If so, you could use something like that:

   (de processJobs (CPUs Batch)
      (let Slots (need CPUs "free")
         (for Exe Batch
            (let Pos
               (wait NIL
                  (seek
                     '((Pos)
                        (cond
                           ((== "free" (car Pos))  # Found a free slot
                              (set Pos "busy") )
                           ((n== "busy" (car Pos)) # Found a result
                              (msg (car Pos))      # Instead of 'msg': Store result in DB
                              (set Pos "busy") ) ) )
                     Slots ) )
               (later Pos (eval Exe)) ) ) ) )

You pass the number of CPUs and a list of executable expressions which
may do arbitrary work, including calls to a shell. The 'msg' call can be
replaced with something more useful, e.g. storing the result in a
database.

Example call:

   (processJobs 4
  (make
 (do 20
            (link '(in '(sh -c "sleep 1; echo $RANDOM") (read))) ) ) )

This outputs 20 random numbers via 'msg', in at most 4 parallel
processes.


I didn't completely analyze your code. Just a warning about an error:

 (de completeJob (Pos)
(let (cdar Pos)
   (let RESULT (caar Pos)
  (eval (caddar Pos)) ) )

The second line redefines the function 'cdar'. Is that intended?
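
For clarity, a tiny illustration of that effect: in a 'let', the first item
of the binding list is just a symbol to be (re)bound, so 'cdar' temporarily
loses its function definition inside the body.

   : (let (cdar 7) cdar)    # 'cdar' is bound to 7 for the duration of the body
   -> 7
   : (cdar '((1 2 3)))      # Outside the 'let' the built-in is intact again
   -> (2 3)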

Cheers,
- Alex


Re: Parallel command execution

2012-06-02 Thread Jorge Acereda
Hi,

On Jun 2, 2012, at 10:10 AM, Alexander Burger wrote:
 
 
 Having said this, I see that your test program doesn't operate on a
 database at all. Calling the C compiler in parallel tasks _does_ make
 sense. Then we talk about something completely different.

It was just an example. I'll have a database recording all the dependencies for 
each target.

 
 In that case I would need something like the following to be able to
 invoke the shell commands and update the database with the results.
 
 This would indeed make sense, if the shell commands induce a heavy load.

It does, say we have to invoke in the order of 1 gcc commands. My toy 
application is a 'make' replacement.


 
 I don't like it, too convoluted for my taste. Any suggestion on how to
 improve the performance/style? Perhaps a different approach would be
 better?
 
 Yes, I think so too. If I understand you right, you want to call a
 number of shell commands (in a batch list), and then store the results
 in a database. If so, you could use something like that:

Not exactly. I would prefer not to compose that list in advance; it would be
good to be able to keep queueing commands while previous commands are running.
My guess is that deciding what to execute next will be time-consuming, so as
soon as a target is determined to be ready for execution, its command should
be queued.


 
    (de processJobs (CPUs Batch)
       (let Slots (need CPUs "free")
          (for Exe Batch
             (let Pos
                (wait NIL
                   (seek
                      '((Pos)
                         (cond
                            ((== "free" (car Pos))  # Found a free slot
                               (set Pos "busy") )
                            ((n== "busy" (car Pos)) # Found a result
                               (msg (car Pos))      # Instead of 'msg': Store result in DB
                               (set Pos "busy") ) ) )
                      Slots ) )
                (later Pos (eval Exe)) ) ) ) )


I guess an additional step is needed that performs the equivalent of my 
waitJobs:


(bench
   (processJobs 2 (mapcar '((X) (list '* X X)) (range 1 10))) )
1
4
9
16
25
36
49
64
0.006 sec

No result for 8 and 9.

 
 I didn't completely analyze your code. Just a warning about an error:
 
 (de completeJob (Pos)
   (let (cdar Pos)
  (let RESULT (caar Pos)
 (eval (caddar Pos)) ) )
 
 The second line redefines the function 'cdar'. Is that intended?

Of course not :-)
It should read something like:
(bind (cadar Pos) ...

Each slot contains (result environment expression), and that line tries
to restore the environment from the moment the command was queued, so that
all the bound variables are restored before evaluating the expression.
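
Roughly along these lines, I suppose (a sketch of the idea only): 'env'
captures a list of symbol-value pairs, and 'bind' re-establishes them around
the evaluation:

   (setq E (let (X 3  Y 4) (env '(X Y))))  # Capture X and Y with their values
   (bind E (+ X Y))                        # -> 7, evaluated with X and Y restored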

Time to experiment a bit more. Thanks.



Re: Parallel command execution

2012-06-02 Thread Henrik Sarvell
Correct, and I should have added that in my case only trivial stuff was
actually executed straight in the parent process. The more hairy stuff was
forked in the main process: a child calls the boss, which creates new
children, in order to avoid children of children (afaik the db syncing only
works with a single child depth).

In main.l (the boss):

# workers
(de forkWorkHorse Args
   (unless (fork)
  (eval Args)
  (bye)))

(de updatePresentFeed (Feed)
   (dbSync)
   (aImportCommon '+Rss Feed)
   (commit 'upd)
   (bye))

(de updatePresentFeeds (Feeds)
   (for Feed Feeds
      (let Cycles 30
         (if (fork)
            (let Pid @
               (while (and (n0 Cycles) (kill Pid 0))
                  (dec 'Cycles)
                  (wait 1000) )
               (kill Pid) )
            (updatePresentFeed Feed) ) ) ) )

The last updatePresentFeeds is actually an ugly (but robust) hack to
make up for the fact that some external sites would provide RSS input
that would crash the child. At that point I didn't have the time or
energy to figure out why; I simply wanted the import process to move
along and forget about it. I.e. if something takes too long to finish,
we kill it and go for the next in line.

In the child:

(dm aImportFleshLater> (Feed)
   (put!> Feed 'lastFetch (stamp '+Gh))
   (let As (createArticles This Feed T)
      (boss 'forkWorkHorse 'fleshArticles (lit As))
      As ) )

Here we import the articles so that, for instance, the headlines display ASAP
for the user; other intensive meta data generation (data the user can wait
for) is handled by calling forkWorkHorse in main.l with the appropriate data.




On Sat, Jun 2, 2012 at 3:23 PM, Alexander Burger a...@software-lab.de wrote:
 Hi Henrik,

 I don't know if the boss function might be of help to you? It helped
 me in order for the current forked instance of the web server to be

 That's right.

 But I'd like to point out (again) that 'boss' must be used with absolute
 care. It executes expressions in the parent process, but the parent
 process is the central coordinator for child synchronization and IPC. It
 should always be as responsive as possible (because otherwise all
 children might be slowed down or even blocked), and it should never!
 crash.

 A crashing or blocking child process, on the other hand, doesn't do any
 harm to the rest of the system. Therefore, I strongly recommend to do
 any database modifications in a child process, as is the standard setup
 in all PicoLisp examples.

 Another reason for not putting any load on the parent process is: If the
 parent starts to do non-trivial work, it will increase its heap size. In
 the following, all fork'ed child processes will inherit this heap,
 including possible temporary data and cached objects. It is difficult
 then to get predictable behavior.

 In a nutshell: Keep your hands off 'boss', unless you are absolutely sure
 what you are doing!

 Cheers,
 - Alex


Re: Parallel command execution

2012-06-02 Thread Alexander Burger
Hi Jorge,

 It was just an example. I'll have a database recording all the dependencies 
 for each target.
 
  
  In that case I would need something like the following to be able to
  invoke the shell commands and update the database with the results.
  
  This would indeed make sense, if the shell commands induce a heavy load.
 
 It does, say we have to invoke in the order of 1 gcc commands. My
 toy application is a 'make' replacement.

OK



  Yes, I think so too. If I understand you right, you want to call a
  number of shell commands (in a batch list), and then store the results
  in a database. If so, you could use something like that:
 
 Not exactly, I would prefer not to compose that list in advance. Would
 be good to be able to keep queueing commands while previous commands are
 running.
 My guess is that deciding what to execute next will be time-consuming,
 as soon as it can be determined that a target is ready for execution its
 command should be queued.

I think you can do that easily with 'fifo', using a global '*Batch'
variable instead of the 'Batch' parameter in

(de processJobs (CPUs Batch)


 I guess an additional step is needed that performs the equivalent of
 my waitJobs:

Right. This was missing. The remaining running jobs must be waited for.
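
Just as an illustration, such a final step could walk the slots once more,
right after the 'for' loop in 'processJobs' above, and collect whatever is
still pending ('msg' again only stands in for the real handling):

   (map
      '((Pos)
         (wait NIL (n== "busy" (car Pos)))  # Wait until this slot is done
         (unless (== "free" (car Pos))      # Still holds an uncollected result?
            (msg (car Pos))                 # Handle it like the others
            (set Pos "free") ) )
      Slots )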


Taking all that into account, I propose the following solution: We
install a background 'task', which runs, say, every 2 seconds, and
handles all that. We use a global '*Batch' to hold the queue:

   # *Batch

   (task -2000 0                        # Run once every 2 sec
      Slots (need 4 "free")             # QuadCore
      (map
         '((Pos)
            (cond
               ((== "free" (car Pos))   # Found a free slot
                  (when (fifo '*Batch)  # Any jobs?
                     (set Pos "busy")   # Yes
                     (later Pos (eval @)) ) )
               ((n== "busy" (car Pos))  # Found a result
                  (msg (car Pos))       # Handle result
                  (ifn (fifo '*Batch)   # More jobs?
                     (set Pos "free")   # No
                     (set Pos "busy")   # Yes
                     (later Pos (eval @)) ) ) ) )
         Slots ) )

It checks the slots every 2 seconds, starts new processes when a free
slot is found, and handles the results if any are available.


We can use 'fifo' to batch a new job:

   (fifo '*Batch '(call "cc" ..))

   : (for X 10 (fifo '*Batch (list '* X X)))
   -> (* 10 10)
   : 1
   4
   9
   16
   25
   36
   49
   64
   81
   100
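
And when the whole batch is through, the background task can presumably be
removed again by calling 'task' with just the key:

   (task -2000)   # De-install the background task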

Cheers,
- Alex


Re: Parallel command execution

2012-06-02 Thread Alexander Burger
Hi Henrik,

 Correct, and I should have added that in my case only trivial
 stuff was actually executed straight in the parent process, the more

OK, all right.

 hairy stuff was forked in the main, so a child calls the boss which
 creates new children in order to avoid children of children (afaik the
 db syncing only works with a single child depth).

Right. It uses 'tell' internally, which sends messages only to all
sister processes, and to all direct child processes.
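
A rough example of such a message (the note text is made up): the first
argument is the function to be executed in the receiving processes, the
rest are its arguments.

   (tell 'msg 'reload)   # Ask all sister and child processes to call (msg 'reload)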

Cheers,
- Alex


Re: Parallel command execution

2012-06-02 Thread Jorge Acereda
Hi Alexander,

On Jun 2, 2012, at 1:01 PM, Alexander Burger wrote:
 
    (task -2000 0                        # Run once every 2 sec
       Slots (need 4 "free")             # QuadCore
       (map
          '((Pos)
             (cond
                ((== "free" (car Pos))   # Found a free slot
                   (when (fifo '*Batch)  # Any jobs?
                      (set Pos "busy")   # Yes
                      (later Pos (eval @)) ) )
                ((n== "busy" (car Pos))  # Found a result
                   (msg (car Pos))       # Handle result
                   (ifn (fifo '*Batch)   # More jobs?
                      (set Pos "free")   # No
                      (set Pos "busy")   # Yes
                      (later Pos (eval @)) ) ) ) )
          Slots ) )

OK, if I understood 'task' correctly this runs entirely in the 'main' process
(the one that will be accessing the database and queueing the commands).
It looks like the main process, if it is busy in a very CPU-intensive task
(no I/O at all), could miss some chances to launch additional processes, but
I guess I can work around that by inserting a (wait 0) at some point in the
expensive calculation to give the task a chance to run, right?
This is perfect, nice and simple.
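
For instance, purely illustrative ('targets' and 'decideNext' are made-up
stand-ins for the expensive dependency calculation):

   (for Tgt (targets)      # hypothetical outer loop
      (decideNext Tgt)     # CPU-intensive, no I/O
      (wait 0) )           # Let pending events, and thus the task, run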

Thanks,
  Jorge




Re: efficiently building a large list

2012-06-02 Thread Joe Bogner
Hi Henrik -

Thanks for sharing. I used your approach and it ran quickly after I built
the index using balance.

(bench (setq SL (by '((X) (get X 'CustNum)) sort L)) T)
(bench (setq SLC (mapcar '((This) (: CustNum)) SL)) T)
(off A) (bench (balance 'A SLC T))

I'm stumped on one piece. If I run the code below multiple times, my total
increases:

: (Sum)
4.466 sec
-> 495029119
: (Sum)
4.497 sec
-> 990058238
: (Sum)
4.507 sec
-> 1485087357

(de Sum ()
   (zero Amount)
   (bench
      (for This SL
         (let (Key (: CustNum)  Amt (: Amount)  Idx (idx 'A Key))
            (setq Amt (if Amt Amt 0))
            (inc 'Amount Amt)  # Check figure to make sure it sums up

            # The val of the cell is by default a customer number;
            # set it to 0 if it's non-numeric
            (ifn (num? (val (car Idx))) (set (car Idx) 0))

            (set (car Idx) (+ (val (car Idx)) Amt)) ) ) )
   (sum '((X) (car X)) (idx 'A)) )


I don't know exactly how to phrase the question. I'm storing the total in
the val of the cell (I think). I would have thought it was in the val of
the cell stored in the index. However, if I

(off A) (bench (balance 'A SLA T))

it still duplicates.

If I run this first, it clears it out: (for X (idx 'A) (set (car (idx 'A X)) 0))

Where is the value being stored such that I need to set each value of the
cell to 0 regardless of rebuilding  the index?


Here's a simple example that I used to understand the concept:

: (setq Z "abc")
-> "abc"
: (val Z)
-> "abc"
: (set Z 0)
-> 0
: (val Z)
-> 0
: (set Z (+ (val Z) 1))
-> 1
: (val Z)
-> 1
: Z
-> "abc"

Like your example, I think I'm storing the number in the val of the symbol
(cell).
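
If I read it right, the idx tree in 'A' only holds the customer-number
symbols themselves; the running total lives in each symbol's value, which
would explain why rebuilding the index from the same symbols doesn't reset
anything. A tiny sketch of that effect (the customer number is made up):

   (off A)
   (idx 'A "C42" T)   # Insert a customer-number symbol into the tree
   (set "C42" 100)    # The total is stored as the symbol's value ...
   (off A)            # ... so dropping the index
   (idx 'A "C42" T)   # and rebuilding it from the same symbol
   (val "C42")        # -> 100, the old total is still there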

I apologize for the long-winded question.

Thanks
Joe




On Fri, Jun 1, 2012 at 1:38 AM, Henrik Sarvell hsarv...@gmail.com wrote:

 I noticed you were talking about idx.

 The below code is from vizreader and was part of a system that counted
 and stored all the non-common words in every article:

 # We extract all words from the article without special characters
 # and count them
 (dm words> (L)
    (let Words NIL
       (for W L
          (and
             (setq W (lowc (pack W)))
             (not (common?> This W))
             (if (idx 'Words W T)
                (inc (car @))
                (set W 1) ) ) )
       (idx 'Words) ) )

 It is using idx and summing up the occurrences of each word; it turned
 out to be the fastest way of solving that problem anyway. Maybe it's
 helpful to you.




 On Fri, Jun 1, 2012 at 10:33 AM, Joe Bogner joebog...@gmail.com wrote:
  Thanks Tomas, I've started using nil now.
 
   This is what I came up with to aggregate the data. It actually runs
  reasonably well. I'm sharing because I always enjoy reading other
 people's
  picoLisp code so I figure others may as well.
 
  My source file has 4 million rows
 
  : (bench (pivot L 'CustNum))
  35.226 sec
 
  # outputs 31,000 rows.
 
  My approach is to load it in as follows:
 
  (class +Invoice)
  (rel CustNum (+String))
  (rel ProdNum (+String))
  (rel Amount (+Number))
  (rel Quantity (+Number))
 
  (de Load ()
    (zero N)
    (setq L (make
      (in "invoices.txt"
        (until (eof)
          (setq Line (line))
          (setq D (mapcar pack (split Line "^I")))
          (link (new
            '(+Invoice)
            'CustNum (car (nth D 1))
            'ProdNum (car (nth D 2))
            'Amount (format (car (nth D 3)))
            'Quantity (format (car (nth D 4))) )) ) ) ) )
    T )
 
 
  I can probably clean this up.  I tinkered around with various approaches
 and
  this was the best I could come up with in a few hours. At first I was
 using
  something like the group from lib.l but found it to be too slow. I think
 it
  was due to the fact that I optimize for a sorted list instead of scanning
  for a match in the made list
 
  (de sortedGroup (List Fld)
(make
  (let (Last NIL LastSym NIL)
   (for This List
(let Key (get This Fld)
  (if (<> Last Key)
  (prog
  (if LastSym (link LastSym))
  (off LastSym)
  (push 'LastSym Key)) )
   (push 'LastSym This)
   (setq Last Key) ) )
   (link LastSym)) ) )
 
  And here's the piece that ties it all together:
 
  (de pivot (L Fld)
(let (SL (by '((X) (get X Fld)) sort L) SG (sortedGroup SL Fld))
    (out "pivot.txt"
(for X SG
  (let (Amt 0)
(mapc '((This) (inc 'Amt (: Amount))) (cdr (reverse X)))
(setq Key (get (car X) Fld))
        (prinl Key "^I" Amt) ) ) ) ) )
 
 
  (Load)
 
  : (bench (pivot L 'CustNum))
  35.226 sec
 
  : (bench (pivot L 'ProdNum))
  40.945 sec
 
  It seems the best performance was by sorting, then splitting and then
  summing the individual parts. It also makes for a nice report.
 
  Sidenote: At first I thought I was getting better performance by using a
  modified version of quicksort off rosetta code, but then I switched it to
  the built-in sort and saw considerably better speed.
 
  Thanks for the help everyone
 
  On Thu, May 31, 2012 at 3:37 PM,