On Thu, 12 Jan 2017 01:54:28 +0000
Roman Fleysher <roman.fleys...@einstein.yu.edu> wrote:

> From bash script, make list of rows, run commands, load results to
> sqlite. Or
> 
> From sqlite, use extension to run commands for each row.
> 
> Both have issues.

You will find your shell script faster.  The more files, the more it
matters.  That's because (presumably) your script invokes wc(1) once,
and that process iterates over the filenames.  The longer your
pipeline, the more the process is parallelized, too.  

You could also use make(1) to maintain aggregate counts.  Imagine
roughly a rule like this:

        .txt.N:
                wc $^ > $@

Re-run make at convenient intervals, and let it determine which counts
are out of date.  The more files, the bigger, and the more static they
are, the more make saves you.  

If you use an extension, every row gets a process.  If you use 
system(3) instead of exec(2), every row gets two processes.  Each row's
process (or two) has to run sequentially.  That will add up.  

The advantage of the extension is that it's a "view"; the data are as
good as the time the query is run.  (We're giving up atomicity, because
nothing prevents the files from being changed as the query runs.)  

To make the extension faster -- as fast as the script would be -- you
can't run one process per row.  Instead, implement wc(1) as a C
function, and create a wc extension.  Instead of writing the code
yourself, I'd steal an implementation.  NetBSD's wc.c is 354 lines,
half of which is copyright & UI.  

A "wc" extension would be simpler than a "system" extension.  Once
you've written your first one, you'll find doing others is easy.
Before you know it, you'll have a little collection of /usr/bin-like
extensions that do what you need.  They'll be faster as fast as
anything the shell does (modulo parallel execution) and more convenient
because you'll do everything in SQL.  

In case it's any help, I implemented an extension a while back that
calls stat(2), cf. http://www.schemamania.org/sql/sqlite/udf/.  Look
for "File Metadata" halfway down the page.  

HTH.  

--jkl


_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to