On Sun, Dec 3, 2017 at 8:49 AM, Keith Medcalf <kmedc...@dessus.com> wrote:

>
> On Sunday, 3 December, 2017 08:24, Richard Rousselot <
> richard.rousse...@gmail.com> wrote:
>
> >Had a similar issue a few years ago; we were using the SQLite3.exe.  We
> >recompiled the exe as 64-bit, which allowed it to use more than 4 GB
> >of RAM, and loaded the machine with as much memory as we could.  Voila,
> >super fast processing.
>
> >Can the Python libraries be made 64 bit some how?
>
> Yes.  You must be using a 64-bit version of Python.  The procedure to
> replace the sqlite3.dll / sqlite3.so it uses is the same as for the 32-bit
> version, and likewise the procedure to compile and use a 64-bit build of the
> apsw extension is unchanged.
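
A quick way to confirm what is actually being run, from inside Python itself (this sketch is not from the original post; the exact values will vary with your build):

import struct, sys, sqlite3
print("interpreter:", 8 * struct.calcsize("P"), "bit")   # 64 on a 64-bit Python
print("python:", sys.version.split()[0])
print("sqlite3 library:", sqlite3.sqlite_version)        # version of the sqlite3.dll / .so actually loaded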
>
> Neither Windows nor Linux can thunk a dynamic load module such that the
> one used is a different model than the running process (it was proprietary
> IBM technology that no one else seems smart enough to duplicate), so you
> have to update Python to the 64-bit model as well.
>
> On Windows 10 16299.98 I have both a 32-bit (Python 2.7.14) and 64-bit
> (Python 3.6.4) installed and build 32-bit DLLs for the former and 64-bit
> for the latter from the same source (just selecting -m32 or -m64 as
> appropriate).  I use the MinGW64/GCC compiler because (a) it can compile in
> either model depending on the switch you use without requiring any code
> changes, (b) supports long long and long double in 32-bit; and, (c) does
> not require the use of the Microsoft C Runtime "moving target" libraries --
> it can compile to the subsystem runtime (MSVCRT) that has been stable
> since, oh, the first OS/2 New Technology (which later became Windows NT)
> way back when.  Oh, and MinGW/GCC does "true" position-independent code, and
> when you do a static link of a module into either an executable or a dynamic
> load library, it is truly static with no strange external dependencies.
>
> Since SQLite3 is heavily I/O bound (or at least syscall/kernel-call bound
> for mutexes, etc.) in practically everything it does, the 64-bit version is
> much faster (about 20%) than the 32-bit version when running on a 64-bit
> OS, since the OS does not have to thunk the call stack when calling into and
> returning from the kernel.
>
> >On Thu, Nov 30, 2017 at 7:01 PM Keith Medcalf <kmedc...@dessus.com>
> >wrote:
> >
> >>
> >> Is there an index on pos where ppos is the left-most field (or the only
> >> field) in the index?
> >> What is the column affinity of ppos?  Of the field you are passing as a
> >> parameter?
> >> Is ppos unique?
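
For reference, one way to answer those three questions from Python -- a sketch, with the database path as a placeholder:

import sqlite3
conn = sqlite3.connect("qubic.db")                        # placeholder path
print(conn.execute("PRAGMA index_list(pos)").fetchall())  # indexes on pos, and whether each is UNIQUE
print(conn.execute("PRAGMA table_info(pos)").fetchall())  # declared column types, i.e. the affinity of ppos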
> >>
> >> If you CREATE [UNIQUE] INDEX searchindex ON pos (ppos, pnum);
> >>
> >> then your query can be satisfied using only the searchindex covering index.
> >>
> >> If there is not an index on ppos, then you will be wasting time recreating
> >> the index for each query.
> >>
> >> You will probably need to increase the cache size beyond the paltry
> >> default in order for the btree structures to be cached entirely in RAM --
> >> you probably want to make it as big as you can.
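
Putting the two suggestions together might look roughly like this -- a sketch, not from the original message; the path and the ~1 GiB cache figure are placeholders:

import sqlite3
conn = sqlite3.connect("qubic.db")            # placeholder path
conn.execute("PRAGMA cache_size = -1048576")  # negative = size in KiB, so roughly 1 GiB of page cache
conn.execute("CREATE INDEX IF NOT EXISTS searchindex ON pos (ppos, pnum)")  # add UNIQUE if (ppos, pnum) is unique
plan = conn.execute("EXPLAIN QUERY PLAN SELECT pnum FROM pos WHERE ppos=?", ("0" * 64,)).fetchall()
print(plan)                                   # should mention "USING COVERING INDEX searchindex"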
> >>
> >> ---
> >> The fact that there's a Highway to Hell but only a Stairway to Heaven
> >> says a lot about anticipated traffic volume.
> >>
> >>
> >> >-----Original Message-----
> >> >From: sqlite-users [mailto:sqlite-users-
> >> >boun...@mailinglists.sqlite.org] On Behalf Of Kevin O'Gorman
> >> >Sent: Saturday, 25 November, 2017 20:14
> >> >To: sqlite-users
> >> >Subject: [sqlite] Simple read-only program very slow
> >> >
> >> >I'm pretty new at SQLite, so this may seem obvious to you.  Be kind.
> >> >I'm using Python on Ubuntu Linux 16.04 LTS, and the sqlite that is built
> >> >into Python.  The database is using WAL.
> >> >
> >> >I've got a database of some 100 million records, and a file of just over
> >> >300 thousand that I want represented in it.  I wanted to check how much
> >> >difference it was going to make, so I wrote a super simple program to read
> >> >the file and count how many records are already there.  I got impatient
> >> >waiting for it, so I killed the process and added an output of one dot
> >> >(".") per 1000 records.  It went very fast for what I estimate was around
> >> >200 dots and then hit a wall.  It made progress, but very very slowly.
> >> >
> >> >So I killed it again and added a commit() call every time it output a dot.
> >> >It didn't hit a wall, just some rough road (that is, it slowed down at
> >> >about the same spot but not nearly so drastically).
> >> >
> >> >The code makes no changes to the database at all.  Why does commit() make
> >> >a difference?  What else should I learn from this?
> >> >
> >> >The field being used for the lookup has an index.
> >> >
> >> >++ kevin
> >> >
> >> >Code follows:
> >> >#!/usr/bin/env python3
> >> >"""Count the number of records that represent rows in the database 'pos' table.
> >> >The database is not modified.
> >> >
> >> > Last Modified: Sat Nov 25 18:56:49 PST 2017
> >> >"""
> >> >
> >> >import os.path   # https://docs.python.org/3.5/library/os.path.html
> >> >import sys       # https://docs.python.org/3.5/library/sys.html
> >> >import argparse  # https://docs.python.org/3.5/library/argparse.html
> >> >import sqlite3   # https://docs.python.org/3.5/library/sqlite3.html
> >> >import re        # https://docs.python.org/3.5/library/re.html
> >> >
> >> ># from /usr/local/lib/python3.5/dist-packages
> >> >import qcreate
> >> >from qerror import *
> >> >import myparser
> >> >
> >> >if __name__ == '__main__':
> >> >    parser = argparse.ArgumentParser(
> >> >        description="""A program to read positions and count how many are
> >> >            in the database""")
> >> >    parser.add_argument("--dbname", default=None,
> >> >        help="name of the database to work on (overrides qubic.ini file)")
> >> >    parser.add_argument("file", nargs='?', type=argparse.FileType('r'),
> >> >        default=sys.stdin,
> >> >        help="file containing the qsearch results (default stdin)")
> >> >    args = parser.parse_args()
> >> >    infile = args.file
> >> >
> >> >    if args.dbname is None:
> >> >        here = os.path.split(os.path.realpath('.'))[1]
> >> >        for confdir in ".", os.path.join(os.environ["HOME"], ".config"), "/etc/xdg/":
> >> >            f = os.path.join(confdir, "qubic.ini")
> >> >            if os.path.exists(f):
> >> >                args.dbname = myparser.parse(f, here, "dbname")
> >> >                if args.dbname is not None:
> >> >                    break
> >> >        if args.dbname is None:
> >> >            print(" *** ERROR: no database name provided and none found in qubic.ini files")
> >> >            sys.exit(1)
> >> >
> >> >    present = missing = lines = 0
> >> >    with sqlite3.connect(args.dbname) as conn:
> >> >        for line in infile:
> >> >            fields = line.split()
> >> >            pos = fields[0]
> >> >            if len(pos) != 64:  # Important test to catch grep without --no-filename
> >> >                raise InputError(" ERROR: input line has wrong-sized position: " + line)
> >> >
> >> >            pnum = None
> >> >            for row in conn.execute("""
> >> >                        SELECT pnum
> >> >                        FROM pos
> >> >                        WHERE ppos=?
> >> >                    """, (pos,)):
> >> >                pnum = row[0]
> >> >                break
> >> >            if pnum is None:
> >> >                missing += 1
> >> >            else:
> >> >                present += 1
> >> >            lines += 1
> >> >            if lines % 1000 == 0:
> >> >                print(".", flush=True, end="")
> >> >                conn.commit()
> >> >    print("there were", present, "records on file and", missing, "were missing")
> >> >    print("out of a total of", lines, "records.")
> >> >
> >> >
> >> >
> >> >--
> >> >word of the year: *kakistocracy*
>
>
All of my machines are 64-bit, and run 64-bit Linux kernels.  Python3 is a
64-bit executable according to 'file'.

I re-ran the whole thing (slightly modified to also do a test on pmain) on
my big dual-Xeon (32 hyperthreads, 256 GB! RAM) from System76.  In spite of it
having about half the CPU clock speed, and the program being single-threaded,
it ran about 5 times faster.  I take this to mean that it really helps if the
entire database fits in the kernel memory buffers.  Kinda like putting it on
an SSD, only better.  I also take it to mean two other things:
1. I should have been more careful to make sure the system was in the same
state on every trial.  It makes a difference if part of the DB is already
in the buffers.
2. Most of the slowdown was buffer thrashing, caused by an access pattern
that had to read some parts of the database several times.

I no longer think this is any kind of a problem in SQLite.
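
One crude way to get the whole database into the kernel buffers before a timed run -- a sketch, only worthwhile when the file fits in free RAM, and the path is a placeholder:

def warm_cache(path, chunk=1 << 20):
    """Read the file once so the OS page cache holds it before the queries start."""
    with open(path, "rb") as f:
        while f.read(chunk):
            pass

warm_cache("qubic.db")   # placeholder path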

-- 
Dictionary.com's word of the year: *complicit*