Where is the resource bottleneck that causes a simple query to never complete after 40 minutes
I am using sqlite to synchronize files on my home PC and my work PC. I created a file listing of each computer and imported into separate tables. Then I used the following query to locate missing files. The ?ftype? is simply ?F? for file or ?D? for directory. SELECT home.fpath FROM home LEFT JOIN work ON work.fpath = home.fpath WHERE work.fpath IS NULL AND home.ftype = 'f?; I started with a small dataset as proof of concept and it worked as expected. I have used MySQL in the past for various projects but this is my first time using Sqlite. I decided to stress test the query and see how it performs with large datasets. I create a file listing of my archive drive that has over 2 million files. When I ran the query on my desktop at home it completed in about 8 seconds. When I tried the same query on my sever at work, I aborted after 40 minutes. I tried it on a second machine with the same results. I tried a much smaller dataset and it worked as expected, just much, much slower. To be fair, I just replaced my computer at home with a custom built machine. It has lots of RAM and a modern, fast quad core processor. However, the server I tried at work has the same amount of RAM and an older 8 core processor. I expected there might be a noticeable difference but 8 seconds compared to 40+ minutes was a surprise. Other caveats are my PC at home is Windows 8.1 with Cygwin + Sqlite. My server at work is Redhat Linux + Sqlite via yum. My question is where is the bottleneck? What is about my machine at home that allows Sqlite to query 2 million files so quickly? I am happy to provide specific hardware and software specifications if that is helpful.