Where is the resource bottleneck that causes a simple query to never
complete after 40 minutes

I am using sqlite to synchronize files on my home PC and my work PC.  I
created a file listing of each computer and imported into separate tables.
Then I used the following query to locate missing files.  The ?ftype? is
simply ?F? for file or ?D? for directory.

SELECT home.fpath
FROM home
LEFT JOIN work ON work.fpath = home.fpath
WHERE work.fpath IS NULL
AND home.ftype = 'f?;


I started with a small dataset as proof of concept and it worked as
expected.  I have used MySQL in the past for various projects but this is
my first time using Sqlite.  I decided to stress test the query and see how
it performs with large datasets.

I create a file listing of my archive drive that has over 2 million files.
When I ran the query on my desktop at home it completed in about 8
seconds.  When I tried the same query on my sever at work, I aborted after
40 minutes.  I tried it on a second machine with the same results.  I tried
a much smaller dataset and it worked as expected, just much, much slower.

To be fair, I just replaced my computer at home with a custom built
machine.  It has lots of RAM and a modern, fast quad core processor.
However, the server I tried at work has the same amount of RAM and an older
8 core processor.  I expected there might be a noticeable difference but 8
seconds compared to 40+ minutes was a surprise.

Other caveats are my PC at home is Windows 8.1 with Cygwin + Sqlite.  My
server at work is Redhat Linux + Sqlite via yum.

My question is where is the bottleneck?  What is about my machine at home
that allows Sqlite to query 2 million files so quickly?

I am happy to provide specific hardware and software specifications if that
is helpful.

Reply via email to