Unsubscribe
Good bye Bastien bastiengue...@googlemail.com :-(

You are now unsubscribed

-- Bastien

-- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: Parallel command execution
Hi Jorge,

>> An important question is: Does this parallel processing of database
>> objects also involve modifications of these objects? If so, the
>> necessary synchronization between the processes will produce
>> additional costs.
>
> Yes, I need to update the database with results from the executed
> commands.

Then I suspect that parallelizing the task may make things worse. Parallelizing something makes sense when CPU load is the bottleneck. For database updates, however, the bottleneck is disk I/O and the system's disk buffer cache. What makes sense in such cases is to parallelize the work on separate databases, running on separate machines.

> So, you suggest limiting the updates to happen only in the main
> process, right?

Yes. To be precise, the recommended way is to have all database operations run in child processes which are all children of a common parent process. If we call this parent process the main process, then the updates don't happen in that process but in one of the children.

Having said this, I see that your test program doesn't operate on a database at all. Calling the C compiler in parallel tasks _does_ make sense. Then we are talking about something completely different.

> In that case I would need something like the following to be able to
> invoke the shell commands and update the database with the results.

This would indeed make sense if the shell commands induce a heavy load. Then the CPU load is the bottleneck again, and the way to go is to fork several processes to do the work (i.e. call the shell commands), but still have a _single_ process operating on the database. This would be optimal.

> I don't like it, too convoluted for my taste. Any suggestion on how to
> improve the performance/style? Perhaps a different approach would be
> better?

Yes, I think so too. If I understand you right, you want to call a number of shell commands (in a batch list), and then store the results in a database.
If so, you could use something like that:

   (de processJobs (CPUs Batch)
      (let Slots (need CPUs free)
         (for Exe Batch
            (let Pos
               (wait NIL
                  (seek
                     '((Pos)
                        (cond
                           ((== free (car Pos))    # Found a free slot
                              (set Pos busy) )
                           ((n== busy (car Pos))   # Found a result
                              (msg (car Pos))      # Instead of 'msg': Store result in DB
                              (set Pos busy) ) ) )
                     Slots ) )
               (later Pos (eval Exe)) ) ) ) )

You pass the number of CPUs and a list of executable expressions which may do arbitrary work, including calls to a shell. The 'msg' call can be replaced with something more useful, e.g. storing the result in a database.

Example call:

   (processJobs 4
      (make
         (do 20
            (link '(in '(sh "-c" "sleep 1; echo $RANDOM") (read))) ) ) )

This outputs 20 random numbers via 'msg', in maximally 4 parallel processes.

I didn't completely analyze your code. Just a warning about an error:

   (de completeJob (Pos)
      (let (cdar Pos)
         (let RESULT (caar Pos)
            (eval (caddar Pos)) ) ) )

The second line redefines the function 'cdar'. Is that intended?

Cheers,
- Alex
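For comparison, the slot-scanning pattern above (at most N jobs in flight, each result handled as soon as it becomes available) corresponds to a worker pool in mainstream languages. A minimal Python sketch of the same idea, with squaring standing in for the shell commands; the function name `process_jobs` is illustrative and not part of the thread:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_jobs(cpus, batch):
    """Run the callables in `batch` with at most `cpus` in flight,
    collecting each result as soon as it is ready (the analogue of
    the free/busy slot scan in the PicoLisp version)."""
    results = []
    with ThreadPoolExecutor(max_workers=cpus) as pool:
        futures = [pool.submit(job) for job in batch]
        for fut in as_completed(futures):
            results.append(fut.result())  # here: collect; in the mail: store in DB
    return results

# 20 jobs, at most 4 running in parallel
out = process_jobs(4, [lambda i=i: i * i for i in range(1, 21)])
```

Results arrive in completion order, not submission order, just as the PicoLisp slots report whichever job finishes first.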
Re: Parallel command execution
Hi,

On Jun 2, 2012, at 10:10 AM, Alexander Burger wrote:

> Having said this, I see that your test program doesn't operate on a
> database at all. Calling the C compiler in parallel tasks _does_ make
> sense. Then we talk about something completely different.

It was just an example. I'll have a database recording all the dependencies for each target.

>> In that case I would need something like the following to be able to
>> invoke the shell commands and update the database with the results.
>
> This would indeed make sense, if the shell commands induce a heavy
> load.

It does, say we have to invoke in the order of 1 gcc commands. My toy application is a 'make' replacement.

>> I don't like it, too convoluted for my taste. Any suggestion on how
>> to improve the performance/style? Perhaps a different approach would
>> be better?
>
> Yes, I think so too. If I understand you right, you want to call a
> number of shell commands (in a batch list), and then store the results
> in a database. If so, you could use something like that:

Not exactly, I would prefer not to compose that list in advance. It would be good to be able to keep queueing commands while previous commands are running. My guess is that deciding what to execute next will be time-consuming; as soon as it can be determined that a target is ready for execution, its command should be queued.

> (de processJobs (CPUs Batch)
>    (let Slots (need CPUs free)
>       (for Exe Batch
>          (let Pos
>             (wait NIL
>                (seek
>                   '((Pos)
>                      (cond
>                         ((== free (car Pos))    # Found a free slot
>                            (set Pos busy) )
>                         ((n== busy (car Pos))   # Found a result
>                            (msg (car Pos))      # Instead of 'msg': Store result in DB
>                            (set Pos busy) ) ) )
>                   Slots ) )
>             (later Pos (eval Exe)) ) ) ) )

I guess an additional step is needed that performs the equivalent of my waitJobs:

   : (bench (processJobs 2 (mapcar '((X) (list '* X X)) (range 1 10))))
   1
   4
   9
   16
   25
   36
   49
   64
   0.006 sec

No result for 8 and 9.

> I didn't completely analyze your code.
> Just a warning about an error:
>
> (de completeJob (Pos)
>    (let (cdar Pos)
>       (let RESULT (caar Pos)
>          (eval (caddar Pos)) ) ) )
>
> The second line redefines the function 'cdar'. Is that intended?

Of course not :-) It should read something like:

   (bind (cadar Pos)

Each slot contains (result environment expression), and that line tries to restore the environment of the instant when the command was queued, so that all the bound variables are restored before evaluating the expression.

Time to experiment a bit more. Thanks.
Re: Parallel command execution
Correct, and I should have added that in my case only trivial stuff was actually executed straight in the parent process; the more hairy stuff was forked in the main, so a child calls the boss which creates new children, in order to avoid children of children (afaik the db syncing only works with a single child depth).

In main.l (the boss):

   # workers
   (de forkWorkHorse Args
      (unless (fork)
         (eval Args)
         (bye) ) )

   (de updatePresentFeed (Feed)
      (dbSync)
      (aImportCommon '+Rss Feed)
      (commit 'upd)
      (bye) )

   (de updatePresentFeeds (Feeds)
      (for Feed Feeds
         (let Cycles 30
            (if (fork)
               (let Pid @
                  (while (and (n0 Cycles) (kill Pid 0))
                     (dec 'Cycles)
                     (wait 1000) )
                  (kill Pid) )
               (updatePresentFeed Feed) ) ) ) )

The last updatePresentFeeds is actually an ugly (but robust) hack to make up for the fact that some external sites would provide RSS input that would crash the child. At that point I didn't have the time or energy to figure out why; I simply wanted the import process to move along and forget about it. I.e. if something takes too long to finish, we kill it and go for the next in line.

In the child:

   (dm aImportFleshLater (Feed)
      (put! Feed 'lastFetch (stamp '+Gh))
      (let As (createArticles This Feed T)
         (boss 'forkWorkHorse 'fleshArticles (lit As))
         As ) )

Here we import the articles so that, for instance, the headlines display ASAP for the user; other intensive metadata generation (data the user can wait for) is handled by calling forkWorkHorse in main.l with the appropriate data.

On Sat, Jun 2, 2012 at 3:23 PM, Alexander Burger a...@software-lab.de wrote:

> Hi Henrik,
>
>> I don't know if the boss function might be of help to you? It helped
>> me in order for the current forked instance of the web server to be
>
> That's right. But I'd like to point out (again) that 'boss' must be
> used with absolute care. It executes expressions in the parent
> process, but the parent process is the central coordinator for child
> synchronization and IPC.
> It should always be as responsive as possible (because otherwise all
> children might be slowed down or even blocked), and it should _never_
> crash. A crashing or blocking child process, on the other hand,
> doesn't do any harm to the rest of the system. Therefore, I strongly
> recommend doing any database modifications in a child process, as is
> the standard setup in all PicoLisp examples.
>
> Another reason for not putting any load on the parent process is: If
> the parent starts to do non-trivial work, it will increase its heap
> size. In the following, all fork'ed child processes will inherit this
> heap, including possible temporary data and cached objects. It is then
> difficult to get predictable behavior.
>
> In a nutshell: Keep your hands off 'boss', unless you are absolutely
> sure what you are doing!
>
> Cheers,
> - Alex
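Henrik's Cycles/kill loop above is a watchdog pattern: fork a worker, poll it for up to 30 seconds, and kill it if it has not finished, so one hung feed cannot stall the whole import. A Python sketch of the same policy using a subprocess timeout; the function name `run_with_watchdog` is illustrative, not from the thread:

```python
import subprocess

def run_with_watchdog(cmd, timeout=30):
    """Run `cmd`, but kill it if it exceeds `timeout` seconds,
    the same 'if it takes too long, kill it and move on' policy
    as the Cycles/kill loop in the mail."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return None, ""  # job was killed; the caller moves on to the next item

rc, out = run_with_watchdog(["echo", "hello"], timeout=5)
```

A `None` return code tells the caller the job was abandoned rather than completed, matching the "forget about it" behavior described above.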
Re: Parallel command execution
Hi Jorge,

> It was just an example. I'll have a database recording all the
> dependencies for each target.
>
>>> In that case I would need something like the following to be able
>>> to invoke the shell commands and update the database with the
>>> results.
>>
>> This would indeed make sense, if the shell commands induce a heavy
>> load.
>
> It does, say we have to invoke in the order of 1 gcc commands. My toy
> application is a 'make' replacement.

OK

>> Yes, I think so too. If I understand you right, you want to call a
>> number of shell commands (in a batch list), and then store the
>> results in a database. If so, you could use something like that:
>
> Not exactly, I would prefer not to compose that list in advance. Would
> be good to be able to keep queueing commands while previous commands
> are running. My guess is that deciding what to execute next will be
> time-consuming, as soon as it can be determined that a target is ready
> for execution its command should be queued.

I think you can do that easily with 'fifo', using a global '*Batch' variable instead of the 'Batch' parameter in

   (de processJobs (CPUs Batch)

> I guess an additional step is needed that performs the equivalent of
> my waitJobs:

Right. This was missing. The remaining running jobs must be waited for.

Taking all that into account, I propose the following solution: We install a background 'task', which runs, say, every 2 seconds, and handles all that. We use a global '*Batch' to hold the queue:

   # *Batch
   (task -2000 0                          # Run once every 2 sec
      Slots (need 4 free)                 # QuadCore
      (map
         '((Pos)
            (cond
               ((== free (car Pos))       # Found a free slot
                  (when (fifo '*Batch)    # Any jobs?
                     (set Pos busy)       # Yes
                     (later Pos (eval @)) ) )
               ((n== busy (car Pos))      # Found a result
                  (msg (car Pos))         # Handle result
                  (ifn (fifo '*Batch)     # More jobs?
                     (set Pos free)       # No
                     (set Pos busy)       # Yes
                     (later Pos (eval @)) ) ) ) )
         Slots ) )

It checks the slots every 2 seconds, starts new processes when a free slot is found, and handles the results if any are available.
We can use 'fifo' to queue a new job:

   (fifo '*Batch '(call cc ..))

   : (for X 10 (fifo '*Batch (list '* X X)))
   -> (* 10 10)
   : 1
   4
   9
   16
   25
   36
   49
   64
   81
   100

Cheers,
- Alex
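The key property of this design is that jobs can keep being queued while earlier ones run, with a fixed number of slots draining the queue. A Python sketch of that producer/worker-pool shape using a thread-safe queue; `start_workers` and the sentinel shutdown are illustrative choices, not from the thread:

```python
import queue
import threading

def start_workers(cpus, handle_result):
    """Long-running pool of `cpus` workers fed by a queue; jobs can
    keep being enqueued while earlier ones run, like (fifo '*Batch ...)
    feeding the 2-second 'task' in the mail."""
    jobs = queue.Queue()

    def worker():
        while True:
            job = jobs.get()
            if job is None:              # sentinel: shut this worker down
                break
            handle_result(job())         # in the mail: 'msg' / store in DB

    threads = [threading.Thread(target=worker) for _ in range(cpus)]
    for t in threads:
        t.start()
    return jobs, threads

results = []
jobs, threads = start_workers(4, results.append)
for x in range(1, 11):
    jobs.put(lambda x=x: x * x)          # queue jobs while others run
for _ in threads:
    jobs.put(None)                       # one sentinel per worker
for t in threads:
    t.join()
```

The periodic 'task' polls slots every 2 seconds; blocking `jobs.get()` reacts immediately instead, but the queue-fed, fixed-slot structure is the same.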
Re: Parallel command execution
Hi Henrik,

> Correct, and I should have added that in my case that only trivial
> stuff was actually executed straight in the parent process, the more

OK, all right.

> hairy stuff was forked in the main, so a child calls the boss which
> creates new children in order to avoid children of children (afaik the
> db syncing only works with a single child depth).

Right. It uses 'tell' internally, which sends messages only to all sister processes, and to all direct child processes.

Cheers,
- Alex
Re: Parallel command execution
Hi Alexander,

On Jun 2, 2012, at 1:01 PM, Alexander Burger wrote:

> (task -2000 0                          # Run once every 2 sec
>    Slots (need 4 free)                 # QuadCore
>    (map
>       '((Pos)
>          (cond
>             ((== free (car Pos))       # Found a free slot
>                (when (fifo '*Batch)    # Any jobs?
>                   (set Pos busy)       # Yes
>                   (later Pos (eval @)) ) )
>             ((n== busy (car Pos))      # Found a result
>                (msg (car Pos))         # Handle result
>                (ifn (fifo '*Batch)     # More jobs?
>                   (set Pos free)       # No
>                   (set Pos busy)       # Yes
>                   (later Pos (eval @)) ) ) ) )
>       Slots ) )

OK, if I understood 'task' correctly, this runs entirely in the 'main' process (the one that will be accessing the database and queueing the commands). It looks like, if the main process is busy in a very CPU-intensive task (no I/O at all), it could miss some chances to launch additional processes, but I guess I can work around that by inserting a (wait 0) at some point in the expensive calculation to give the task a chance to run, right?

This is perfect, nice and simple.

Thanks,
Jorge
Re: efficiently building a large list
Hi Henrik - Thanks for sharing. I used your approach and it ran quickly after I built the index using balance.

   (bench (setq SL (by '((X) (get X 'CustNum)) sort L)) T)
   (bench (setq SLC (mapcar '((This) (: CustNum)) SL)) T)
   (off A)
   (bench (balance 'A SLC T))

I'm stumped on one piece. If I run the code below multiple times, my total increases:

   : (Sum)
   4.466 sec
   -> 495029119
   : (Sum)
   4.497 sec
   -> 990058238
   : (Sum)
   4.507 sec
   -> 1485087357

   (de Sum ()
      (zero Amount)
      (bench
         (for This SL
            (let (Key (: CustNum)  Amt (: Amount)  Idx (idx 'A Key))
               (setq Amt (if Amt Amt 0))
               (inc 'Amount Amt)  # Check figure to make sure it sums up
               # The val of the cell is by default a customer number; set it to 0 if it's non-numeric
               (ifn (num? (val (car Idx)))
                  (set (car Idx) 0) )
               (set (car Idx) (+ (val (car Idx)) Amt)) ) ) )
      (sum '((X) (car X)) (idx 'A)) )

I don't know exactly how to phrase the question. I'm storing the total in the val of the cell (I think). I would have thought it was in the val of the cell stored in the index. However, if I do

   (off A)
   (bench (balance 'A SLA T))

it still duplicates. If I run this first, it clears it out:

   (for X (idx 'A) (set (car (idx 'A X)) 0))

Where is the value being stored, such that I need to set each value of the cell to 0 regardless of rebuilding the index?

Here's a simple example that I used to understand the concept:

   : (setq Z 'abc)
   -> abc
   : (val Z)
   -> abc
   : (set Z 0)
   -> 0
   : (val Z)
   -> 0
   : (set Z (+ (val Z) 1))
   -> 1
   : (val Z)
   -> 1
   : Z
   -> abc

Like your example, I think I'm storing the number in the val of the symbol (cell). I apologize for the long-winded question.

Thanks
Joe

On Fri, Jun 1, 2012 at 1:38 AM, Henrik Sarvell hsarv...@gmail.com wrote:

> I noticed you were talking about idx. The below code is from vizreader
> and was part of a system that counted and stored all the non-common
> words in every article:
>
> # We extract all words from the article without special characters
> # and count them
> (dm words (L)
>    (let Words NIL
>       (for W L
>          (and
>             (setq W (lowc (pack W)))
>             (not (common? This W))
>             (if (idx 'Words W T)
>                (inc (car @))
>                (set W 1) ) ) )
>       (idx 'Words) ) )
>
> It is using idx and summing up the occurrences of each word, and it
> turned out to be the fastest way of solving that problem anyway; maybe
> it's helpful to you.
>
> On Fri, Jun 1, 2012 at 10:33 AM, Joe Bogner joebog...@gmail.com wrote:
>
>> Thanks Tomas, I've started using nil now.
>>
>> This is what I came up with to aggregate the data. It actually runs
>> reasonably well. I'm sharing because I always enjoy reading other
>> people's picoLisp code, so I figure others may as well. My source
>> file has 4 million rows:
>>
>> : (bench (pivot L 'CustNum))
>> 35.226 sec
>> # outputs 31,000 rows
>>
>> My approach is to load it in as follows:
>>
>> (class +Invoice)
>> (rel CustNum (+String))
>> (rel ProdNum (+String))
>> (rel Amount (+Number))
>> (rel Quantity (+Number))
>>
>> (de Load ()
>>    (zero N)
>>    (setq L
>>       (make
>>          (in "invoices.txt"
>>             (until (eof)
>>                (setq Line (line))
>>                (setq D (mapcar pack (split Line "^I")))
>>                (link
>>                   (new '(+Invoice)
>>                      'CustNum (car (nth D 1))
>>                      'ProdNum (car (nth D 2))
>>                      'Amount (format (car (nth D 3)))
>>                      'Quantity (format (car (nth D 4))) ) ) ) ) ) )
>>    T )
>>
>> I can probably clean this up. I tinkered around with various
>> approaches and this was the best I could come up with in a few hours.
>> At first I was using something like the group from lib.l but found it
>> to be too slow.
>> I think it was due to the fact that I optimize for a sorted list
>> instead of scanning for a match in the made list:
>>
>> (de sortedGroup (List Fld)
>>    (make
>>       (let (Last NIL  LastSym NIL)
>>          (for This List
>>             (let Key (get This Fld)
>>                (if (<> Last Key)
>>                   (prog
>>                      (if LastSym (link LastSym))
>>                      (off LastSym)
>>                      (push 'LastSym Key) ) )
>>                (push 'LastSym This)
>>                (setq Last Key) ) )
>>          (link LastSym) ) ) )
>>
>> And here's the piece that ties it all together:
>>
>> (de pivot (L Fld)
>>    (let (SL (by '((X) (get X Fld)) sort L)  SG (sortedGroup SL Fld))
>>       (out "pivot.txt"
>>          (for X SG
>>             (let (Amt 0)
>>                (mapc '((This) (inc 'Amt (: Amount))) (cdr (reverse X)))
>>                (setq Key (get (car X) Fld))
>>                (prinl Key "^I" Amt) ) ) ) ) )
>>
>> (Load)
>> : (bench (pivot L 'CustNum))
>> 35.226 sec
>> : (bench (pivot L 'ProdNum))
>> 40.945 sec
>>
>> It seems the best performance was by sorting, then splitting, and
>> then summing the individual parts. It also makes for a nice report.
>>
>> Sidenote: At first I thought I was getting better performance by
>> using a modified version of quicksort off Rosetta Code, but then I
>> switched to the built-in sort and saw considerably better speed.
>>
>> Thanks for the help everyone
>>
>> On Thu, May 31, 2012 at 3:37 PM,
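The sort/split/sum strategy Joe describes (sort by the key, walk the sorted list, flush a group whenever the key changes, then sum each group) is the classic sorted group-by. A Python sketch of the same pivot, with a tiny hypothetical invoice list for illustration; the dict rows and field names only mirror the +Invoice relations from the mail:

```python
from itertools import groupby
from operator import itemgetter

def pivot(rows, key):
    """Sort by `key`, then sum Amount per group, the same
    sort/split/sum strategy as sortedGroup + pivot in the mail."""
    rows = sorted(rows, key=itemgetter(key))
    return {k: sum(r["Amount"] for r in grp)
            for k, grp in groupby(rows, key=itemgetter(key))}

invoices = [
    {"CustNum": "C1", "ProdNum": "P1", "Amount": 10},
    {"CustNum": "C2", "ProdNum": "P1", "Amount": 5},
    {"CustNum": "C1", "ProdNum": "P2", "Amount": 7},
]
totals = pivot(invoices, "CustNum")
# totals == {"C1": 17, "C2": 5}
```

`groupby` only merges adjacent equal keys, which is exactly why the list must be sorted first; this matches the Last/Key comparison that flushes a group in sortedGroup.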