On Thu, 12 Jan 2017 01:54:28 +0000 Roman Fleysher <roman.fleys...@einstein.yu.edu> wrote:
> From bash script, make list of rows, run commands, load results to > sqlite. Or > > From sqlite, use extension to run commands for each row. > > Both have issues. You will find your shell script faster. The more files, the more it matters. That's because (presumably) your script invokes wc(1) once, and that process iterates over the filenames. The longer your pipeline, the more the process is parallelized, too. You could also use make(1) to maintain aggregate counts. Imagine roughly a rule like this: .txt.N: wc $^ > $@ Re-run make at convenient intervals, and let it determine which counts are out of date. The more files, the bigger, and the more static they are, the more make saves you. If you use an extension, every row gets a process. If you use system(3) instead of exec(2), every row gets two processes. Each row's process (or two) has to run sequentially. That will add up. The advantage of the extension is that it's a "view"; the data are as good as the time the query is run. (We're giving up atomicity, because nothing prevents the files from being changed as the query runs.) To make the extension faster -- as fast as the script would be -- you can't run one process per row. Instead, implement wc(1) as a C function, and create a wc extension. Instead of writing the code yourself, I'd steal an implementation. NetBSD's wc.c is 354 lines, half of which is copyright & UI. A "wc" extension would be simpler than a "system" extension. Once you've written your first one, you'll find doing others is easy. Before you know it, you'll have a little collection of /usr/bin-like extensions that do what you need. They'll be faster as fast as anything the shell does (modulo parallel execution) and more convenient because you'll do everything in SQL. In case it's any help, I implemented an extension a while back that calls stat(2), cf. http://www.schemamania.org/sql/sqlite/udf/. Look for "File Metadata" halfway down the page. HTH. --jkl _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users