I noticed you were talking about idx.

The code below is from vizreader, where it was part of a system that
counted and stored all the non-common words in every article:

# We extract all words from the article without special characters and
# count them
(dm words> (L)
   (let Words NIL
      (for W L
         (and
            (setq W (lowc (pack W)))
            (not (common?> This W))
            (if (idx 'Words W T)
               (inc (car @))
               (set W 1))))
      (idx 'Words)))

It uses idx to sum up the occurrences of each word, and it turned out
to be the fastest way of solving that problem. Maybe it's helpful to
you.
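
For your aggregation, the same idx trick might look roughly like the
sketch below. The name sumBy and the input shape, a list of
(Key . Amount) pairs, are my assumptions for illustration and not
something from vizreader or your code. Each key is packed into a fresh
symbol, so the running total can live in that symbol's value, just as
the word counts do above:

```picolisp
# Hypothetical helper: sum the amounts per key using an 'idx' tree.
# Lst is assumed to be a list of (Key . Amount) pairs.
(de sumBy (Lst)
   (let Tree NIL
      (for X Lst
         # 'pack' yields a fresh symbol, so setting its value is safe
         (let K (pack (car X))
            (if (idx 'Tree K T)
               # Key already in the tree: add to the stored symbol's value
               (inc (car @) (cdr X))
               # Key was newly inserted: start its total
               (set K (cdr X)) ) ) )
      # Walk the tree in sorted order and return (Key . Total) pairs
      (mapcar '((K) (cons K (val K))) (idx 'Tree)) ) )
```

Calling (sumBy '(("c1" . 10) ("c2" . 5) ("c1" . 7))) should return one
total per key, sorted by key. It skips the separate sort and group
passes, though I have not measured it against your data.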




On Fri, Jun 1, 2012 at 10:33 AM, Joe Bogner <joebog...@gmail.com> wrote:
> Thanks Tomas, I've started using nil now.
>
>  This is what I came up with to aggregate the data. It actually runs
> reasonably well. I'm sharing because I always enjoy reading other people's
> picoLisp code so I figure others may as well.
>
> My source file has 4 million rows
>
> : (bench (pivot L 'CustNum))
> 35.226 sec
>
> # outputs 31,000 rows.
>
> My approach is to load it in as follows:
>
> (class +Invoice)
> (rel CustNum (+String))
> (rel ProdNum (+String))
> (rel Amount (+Number))
> (rel Quantity (+Number))
>
> (de Load ()
>   (zero N)
>   (setq L
>     (make
>       (in "invoices.txt"
>         (until (eof)
>           (setq Line (line))
>           (setq D (mapcar pack (split Line "^I")))
>           (link
>             (new '(+Invoice)
>               'CustNum (car (nth D 1))
>               'ProdNum (car (nth D 2))
>               'Amount (format (car (nth D 3)))
>               'Quantity (format (car (nth D 4))) ) ) ) ) ) )
>   T )
>
>
> I can probably clean this up.  I tinkered around with various approaches and
> this was the best I could come up with in a few hours. At first I was using
> something like the group from lib.l but found it to be too slow. I think the
> version below is faster because it is optimized for a sorted list instead of
> scanning for a match in the made list.
>
> (de sortedGroup (List Fld)
>   (make
>     (let (Last NIL LastSym NIL)
>       (for This List
>         (let Key (get This Fld)
>           (if (<> Last Key)
>             (prog
>               (if LastSym (link LastSym))
>               (off LastSym)
>               (push 'LastSym Key) ) )
>           (push 'LastSym This)
>           (setq Last Key) ) )
>       (link LastSym) ) ) )
>
> And here's the piece that ties it all together:
>
> (de pivot (L Fld)
>   (let (SL (by '((X) (get X Fld)) sort L) SG (sortedGroup SL Fld))
>     (out "pivot.txt"
>       (for X SG
>         (let (Amt 0)
>           (mapc '((This) (inc 'Amt (: Amount))) (cdr (reverse X)))
>           (setq Key (get (car X) Fld))
>           (prinl Key "^I" Amt) ) ) ) ) )
>
>
> (Load)
>
> : (bench (pivot L 'CustNum))
> 35.226 sec
>
> : (bench (pivot L 'ProdNum))
> 40.945 sec
>
> It seems the best performance was by sorting, then splitting and then
> summing the individual parts. It also makes for a nice report.
>
> Sidenote: At first I thought I was getting better performance by using a
> modified version of quicksort off rosetta code, but then I switched it to
> the built-in sort and saw considerably better speed.
>
> Thanks for the help everyone
>
> On Thu, May 31, 2012 at 3:37 PM, Tomas Hlavaty <t...@logand.com> wrote:
>>
>> Hi Joe,
>>
>> > Sidebar: Is there a way to disable the interactive session from
>> > printing the return of a statement? For example, if I do a (setq ABC
>> > L) where L is a million items, I'd prefer the option of not having all
>> > million items print on my console. I've worked around this by wrapping
>> > it in a prog and returning NIL. Is there an easier way?
>>
>> you could also use http://software-lab.de/doc/refN.html#nil or
>> http://software-lab.de/doc/refT.html#t
>>
>> Cheers,
>>
>> Tomas
>> --
>> UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
>
>