[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On Tue, 28 Jul 2015 at 20:55, rotaiv wrote: > I upgraded to the latest version and it decreased to 16 seconds. > With indexes, 5 seconds. :-D Is index creation time included in those 5 seconds? If your database gets created from scratch and only used once every time you do the syncing (as it appears to be the case, from what you wrote before), index creation time needs to be included for a fair comparison. cu Reinhard
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On 2015-07-28 09:25 PM, rotaiv wrote: > Not quite. The older version brought it down to 8 seconds (as compared to > 5 seconds) but still a whole lot better than 40+ minutes. > > It is very interesting to see an index can make such a remarkable > difference in that particular scenario. This is my first time creating > indexes but I will definitely be doing that more in the future. > > From what I read, it was recommended to create indexes for any field used > in the SELECT statement. Is that correct? If you realize how an Index makes things faster, the question above will become easy to answer. You should learn more from web sources, but to get you rolling, here is the very condensed "Indexing 101" course: TLDR: Some very basic Index info - please skip if you know Indexing well - or - if you are well versed, please note any ideas or caveats I may have missed. I hope you have used one of the old type telephone books - the ones the phone company would send out with a long list with everyone's number in an area. It's rather easy to use if you know the person's full name and family name/surname because it is listed alphabetically by surname and shows the first name, address etc. Imagine for a moment though, your boss for some nefarious reason tasks you with getting him a list of all the people in the phone book with the name "Jeremy". There is no real quick way to do this, you have to start at the start and read every line in the book seeing if perhaps that person is a Jeremy. It will take ages. (More than 40 minutes!) Now imagine some friendly person in the office sneaks you a copy of a little black book they have made before, in this book is a list of all the first names in alphabetical order along with on which page and line in the phone-book they appear. You look into it, go straight to J, look for Jeremy, you see 12 Jeremy's and can quickly find them in the large phone-book and note their full details. You finish the task in under 5 minutes and impress the boss. This little black book is basically an Index and it makes every search faster IF you were looking for first names (The thing you were "looking for" I will refer to as a "search-term"). So now you have the Surname-ordered phone-book (we say the Surname is a Key of the phone-book, and the little black book is an Index on Name). Note that Surname cannot be a Primary Key because it won't be unique. The only value that will probably be completely Unique for every entry in the book is the Phone-Number itself, so this might be used in the Primary-Key, but isn't important to us when planning search-able Index creation, because it will rarely be a search-term). A typical Schema might look something like this: CREATE TABLE PhoneBook( Surname, Name, City, Suburb, Street, ApartmentNo, PhoneNo, PRIMARY KEY (Surname, PhoneNo) ) and for the little black book: INDEX ON PhoneBook(Name) Your boss is however not satisfied that you have suffered enough and so tasks you with getting a list of people whom all live in "Church" street - and warns you that he will from now on need street-name related lists often. This puts you back to square one, but you immediately realize that another little black book which listed all the street-names in alphabetical order and then linked to where in the phone-book they appear, would make your life much easier. You need an Index on Street, compiles the little black book list (which takes a long time) but afterward your life is much easier. INDEX ON PhoneBook(Street) One more complication happens - your boss reads your first result set, and says "no no, I don't want people from Church street in Downtown, I only want the people from Church street in Chinatown." The next step is to note in your little black book not just the street name, but also the Suburb - a combined Index. Now you are faced with a choice, you can list the names as Suburb,Street - or - Street,Suburb. Which to use? Well, turns out this depends on how you will do the most searching. If you will always have a Street-name to search for and sometimes need to limit it to a Suburb or two, then this is better: INDEX ON PhoneBook(Street, Suburb) The Index above is also better if you will always have both a Street-name and a Suburb to search for because of the fact that there are many more Streets than Suburbs and the same Suburb will repeat a lot more than the Street - this is called Cardinality. We say the Suburb has low Cardinality because it repeats a lot and so there are very few Unique Suburb names. Street names have a higher Cardinality (more Unique names) and full addresses have very high Cardinality (Almost every person lives in a unique home/apartment). Notice however that if ALL you get to search on is a Suburb, the Index above is pretty useless and will result in a full Index-scan which the query optimizer will probably forgo and cause a full Table-scan. (reading through the en
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On 28 Jul 2015, at 8:25pm, rotaiv wrote: > From what I read, it was recommended to create indexes for any field used > in the SELECT statement. Is that correct? Give my original query: > > SELECT home.fpath > FROM home > LEFT JOIN work ON work.fpath = home.fpath > WHERE work.fpath IS NULL > AND home.ftype = 'f'; I really must get around to writing a web page on this subject. It's almost a decade since I decided to. Ignore the idea of creating indexes for certain fields. It will speed things up slightly but nowhere near as much as is possible. A better approach is to look at each SELECT command (or WHERE clause) and to create an index ideally suited to speeding up that one thing. Something to bear in mind is that each search or sort operation can only usefully use one index. So for this SELECT: SELECT * FROM myTable WHERE a = '123' ORDER BY b there's no point in creating both these indexes CREATE INDEX m_a ON myTable (a); CREATE INDEX m_b ON myTable (b); Because once SQL has used m_a to pick the right rows it is no longer looking at the whole table so it can't use index m_b to do the sorting. However, this index CREATE INDEX m_a_b ON myTable (a,b); can be used to do both operations: it allows SQL to find the right rows in the right order, just by selecting a section of the index. It is the ideal index for that SELECT. Simon.
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On 28 Jul 2015, at 7:55pm, rotaiv wrote: > That is the machine. Originally. it was taking 40+ minutes. I upgraded to > the latest version and it decreased to 16 seconds. With indexes, 5 > seconds. :-D Ah, okay I get it. In that case the index would probably have given you a time of 5 seconds even without the update. Creating good indexes is key to fast searching times in SQL. Simon.
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On 28 Jul 2015, at 7:43pm, rotaiv wrote: > I create indexes and my original query time dropped from 16 seconds to this: > > real 0m5.928s > user 0m5.361s > sys 0m0.565s Is it worth checking on the machine you originally reported 40+ minutes for ? That one should be considerably faster. Simon.
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
Not quite. The older version brought it down to 8 seconds (as compared to 5 seconds) but still a whole lot better than 40+ minutes. It is very interesting to see an index can make such a remarkable difference in that particular scenario. This is my first time creating indexes but I will definitely be doing that more in the future. >From what I read, it was recommended to create indexes for any field used in the SELECT statement. Is that correct? Give my original query: SELECT home.fpath FROM home LEFT JOIN work ON work.fpath = home.fpath WHERE work.fpath IS NULL AND home.ftype = 'f'; What indexes should I create? Seems like I only need an index on home.fpath and work.fpath but I wanted to make sure. On Tue, Jul 28, 2015 at 3:13 PM, Simon Slavin wrote: > > On 28 Jul 2015, at 7:55pm, rotaiv wrote: > > > That is the machine. Originally. it was taking 40+ minutes. I upgraded > to > > the latest version and it decreased to 16 seconds. With indexes, 5 > > seconds. :-D > > Ah, okay I get it. In that case the index would probably have given you a > time of 5 seconds even without the update. Creating good indexes is key to > fast searching times in SQL. > > Simon. > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
That is the machine. Originally. it was taking 40+ minutes. I upgraded to the latest version and it decreased to 16 seconds. With indexes, 5 seconds. :-D On Tue, Jul 28, 2015 at 2:53 PM, Simon Slavin wrote: > > On 28 Jul 2015, at 7:43pm, rotaiv wrote: > > > I create indexes and my original query time dropped from 16 seconds to > this: > > > > real 0m5.928s > > user 0m5.361s > > sys 0m0.565s > > Is it worth checking on the machine you originally reported 40+ minutes > for ? That one should be considerably faster. > > Simon. > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
I create indexes and my original query time dropped from 16 seconds to this: real 0m5.928s user 0m5.361s sys 0m0.565s However, when I use the query you suggested, it was a little slower: real 0m9.827s user 0m8.952s sys 0m0.873s On Mon, Jul 27, 2015 at 5:38 PM, Keith Medcalf wrote: > > Or even better: > > select fpath > from home > where not exists (select 1 from work where work.fpath=home.fpath and > work.ftype=home.ftype) >and ftype = 'f' > > with a unique index on home (ftype, fpath) and a unique index on work > (ftype, fpath) of course. > > > -Original Message- > > From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users- > > bounces at mailinglists.sqlite.org] On Behalf Of Igor Tandetnik > > Sent: Monday, 27 July, 2015 10:04 > > To: sqlite-users at mailinglists.sqlite.org > > Subject: Re: [sqlite] Query takes 8 seconds on one machine but I abort > > after 40+ minutes on other machines > > > > On 7/27/2015 9:58 AM, Simon Slavin wrote: > > > So you're checking both to see that [work.fpath = home.fpath ] and to > > see that [work.fpath IS NULL]. This looks weird to me. > > > > That's a common technique with LEFT JOIN - it's selecting home records > > that lack a corresponding work record. In other words, it's equivalent to > > > > SELECT fpath > > FROM home > > WHERE fpath NOT IN (SELECT fpath FROM work) > > AND home.ftype = 'f?; > > > > -- > > Igor Tandetnik > > > > ___ > > sqlite-users mailing list > > sqlite-users at mailinglists.sqlite.org > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On 27 Jul 2015, at 7:32pm, rotaiv wrote: > Originally, I copied the database file from my PC to my Linux server. I > thought it might be a carriage return or line feed issue so I recreated the > database from scratch on my Linux server. Didn't change anything. SQLite is completely platform agnostic. A database file created on any platform works on any platform. It doesn't care about endline sequences or where the high and low bytes go. Simon.
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On 2015-07-27 05:48 PM, Marc L. Allen wrote: > When would that specific LEFT JOIN ever do anything except return NULLs for > the right table? It only accepts rows from work where fpath is null, and > only joins those rows where fpath = home.fpath. Since fpath must be null, > home.fpath should never be equal. TLDR: Just a quick simple explain of the left joining phenomenon in case anyone needed it (Sometimes the basics elude me), feel free to skip this post if you know how it works - thanks. Left join is an outer join, meaning simply it iterates all rows in the joined table and adds all positive matches, duplicating as needed but never culling any rows from the originating tuple set (the LEFT table of the Join). So if I ask an SQL engine (any of them) to list me the numbers from 1 to 10 and LEFT JOIN onto that (ON) the word 'Odd' for numbers that are odd, it won't remove any of the original listing, it will simply add the fields to the lines where matches exist, and add NULL fields where no matches are found. A normal (Inner) join will only show positive matches found in both tables. Some SQLite-flavoured SQL to point out the above: (Note: this math works in SQLite because of how INT division is handled, it may not work exactly like this in other engines, but the idea is universal) CREATE TABLE Nums(x INT); CREATE TABLE Kinds(r INT, k TEXT); WITH CN(x) AS ( SELECT 1 UNION ALL SELECT x+1 FROM CN LIMIT 10 ) INSERT INTO Nums SELECT x from CN; INSERT INTO Kinds VALUES(1,'Odd'); -- Example 1: This Query uses a standard inner join to match rows in -- two tables and culls the rows from the first (LEFT) table -- that doesn't have an entry in the other (RIGHT) table. SELECT Nums.x, Kinds.k FROM Nums JOIN Kinds ON Kinds.r=(Nums.x-((Nums.x / 2) * 2)); -- x | k -- | - -- 1 | Odd -- 3 | Odd -- 5 | Odd -- 7 | Odd -- 9 | Odd -- Example 2: This Query is word-for-word the exact same but uses a -- LEFT join to match rows in two tables and add the rows -- from the second (RIGHT) table to the first (LEFT) table -- if a match exists, if it doesn't, then it adds NULL, but -- it never culls the LEFT table in any way: SELECT Nums.x, Kinds.k FROM Nums LEFT JOIN Kinds ON Kinds.r=(Nums.x-((Nums.x / 2) * 2)); -- x | k -- | -- -- 1 | Odd -- 2 | Null -- 3 | Odd -- 4 | Null -- 5 | Odd -- 6 | Null -- 7 | Odd -- 8 | Null -- 9 | Odd -- 10 | Null -- Example 3: So, if we do not really care about whether to DISPLAY the -- oddness of the values, but only wish to list the EVEN numbers -- (aka the non-Odd numbers), I could use that LEFT JOIN and -- check for those NULLS in ANY of the joined columns, even the -- same that were used in the ON clause: -- (this kind of represents what the OP's query did): SELECT Nums.x FROM Nums LEFT JOIN Kinds ON Kinds.r=(Nums.x-((Nums.x / 2) * 2)) WHERE Kinds.r IS NULL; -- x -- -- 2 -- 4 -- 6 -- 8 -- 10 Hope that helps someone - Cheers, Ryan.
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On 2015-07-27 04:29 PM, rotaiv wrote: >> ?WOW! That is an amazing difference. Makes me glad that I'm OCD about >> staying relatively current. And please ignore my previous post. I somehow >> managed to not register that you would doing a LEFT join, an equi-join.? >> > I am also OCD about updates. I installed Sqlite using yum right before I > tried this test. I guess CentOS release 6.6 is not using the latest > version just yet. I would never have believed a new version would have had > such a dramatic difference. It's not the older SQLite taking so much longer, it's simply that the file created with a newer SQLite probably contains a SCHEMA or objects not readily understood or able to be optimized by the older engine. if the file was made with the old engine it would perform perfectly well in both the old and new engines. Either that, or a bug that's been fixed since then. Have a great day, Ryan
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On Mon, 27 Jul 2015 at 17:48, Marc L. Allen wrote: > When would that specific LEFT JOIN ever do anything except return > NULLs for the right table? It only accepts rows from work where > fpath is null, and only joins those rows where fpath = home.fpath. > Since fpath must be null, home.fpath should never be equal. You're assuming that "LEFT JOIN ... ON ..." behaves the same as "JOIN ... WHERE ...", which is not the case as it would defeat the purpose of a left join. sqlite> create table home (fpath text); sqlite> create table work (fpath text); sqlite> insert into home values ('path1'); sqlite> insert into home values ('path2'); sqlite> insert into work values ('path2'); sqlite> insert into work values ('path3'); sqlite> SELECT home.fpath ...> FROM home ...> LEFT JOIN work ON work.fpath = home.fpath ...> WHERE work.fpath IS NULL; path1 See, it returns the rows of home.fpath that don't have a match in work.fpath. cu Reinhard
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On Mon, 27 Jul 2015 at 17:34, Marc L. Allen wrote: > Sorry.. confused does sqlite allow comparison between NULLs? No. > LEFT JOIN work ON work.fpath = home.fpath WHERE work.fpath IS NULL > > Should never match anything... it checks to see if work.fpath IS > NULL and if the same work.fpath is equal to something. It is a LEFT (OUTER) join, which NULLs the columns of the right thable in the result set for rows in the left table that don't have a match. cu Reinhard
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
Or even better: select fpath from home where not exists (select 1 from work where work.fpath=home.fpath and work.ftype=home.ftype) and ftype = 'f' with a unique index on home (ftype, fpath) and a unique index on work (ftype, fpath) of course. > -Original Message- > From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users- > bounces at mailinglists.sqlite.org] On Behalf Of Igor Tandetnik > Sent: Monday, 27 July, 2015 10:04 > To: sqlite-users at mailinglists.sqlite.org > Subject: Re: [sqlite] Query takes 8 seconds on one machine but I abort > after 40+ minutes on other machines > > On 7/27/2015 9:58 AM, Simon Slavin wrote: > > So you're checking both to see that [work.fpath = home.fpath ] and to > see that [work.fpath IS NULL]. This looks weird to me. > > That's a common technique with LEFT JOIN - it's selecting home records > that lack a corresponding work record. In other words, it's equivalent to > > SELECT fpath > FROM home > WHERE fpath NOT IN (SELECT fpath FROM work) > AND home.ftype = 'f?; > > -- > Igor Tandetnik > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On Mon, 27 Jul 2015 at 16:24, rotaiv wrote: > I am more than open to a more logical and/or more efficient query if > anyone has suggestions. One alternative was posted by Igor, but it might end up as the same query plan after the optimizer has done its thing (EXPLAIN will tell you). Adding indexes to the columns you are JOINing on might also help, if you haven't done already, but you'll have to test whether index creation time pays off in saved query time. cu Reinhard
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On 27 Jul 2015, at 3:04pm, Igor Tandetnik wrote: > That's a common technique with LEFT JOIN Thanks, Igor. On 27 Jul 2015, at 1:44pm, rotaiv wrote: > SELECT home.fpath > FROM home > LEFT JOIN work ON work.fpath = home.fpath > WHERE work.fpath IS NULL > AND home.ftype = 'f?; Can you verify that you have an index on work.fpath ? 16 seconds is still a little slower than it should be. Simon.
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On 27 Jul 2015 at 13:44, rotaiv wrote: > Where is the resource bottleneck that causes a simple query to never > complete after 40 minutes > > I am using sqlite to synchronize files on my home PC and my work PC. I > created a file listing of each computer and imported into separate tables. And how are you doing this? > Then I used the following query to locate missing files. The ?ftype? is > simply ?F? for file or ?D? for directory. Have you written a program to do this or are you using the sqlite3 shell program? You're presumably working with one database that you move between machines so that you can import the listings created on each machine. Is this correct? What is the schema for your database? -- Cheers -- Tim
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
> > That's a common technique with LEFT JOIN > > Thanks, Igor. > A bit MySQL specific, but shows some options http://explainextended.com/2010/05/27/left-join-is-null-vs-not-in-vs-not-exists-nullable-columns/ Regards Andy Ling --- This email has been scanned for email related threats and delivered safely by Mimecast. For more information please visit http://www.mimecast.com ---
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On 27 Jul 2015, at 1:44pm, rotaiv wrote: > SELECT home.fpath > FROM home > LEFT JOIN work ON work.fpath = home.fpath > WHERE work.fpath IS NULL > AND home.ftype = 'f?; Are you sure you got this right ? It uses work.fpath twice: > LEFT JOIN work ON work.fpath = home.fpath > WHERE work.fpath IS NULL So you're checking both to see that [work.fpath = home.fpath ] and to see that [work.fpath IS NULL]. This looks weird to me. Simon.
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
> It's not the older SQLite taking so much longer, it's simply that the file > created with a newer SQLite probably contains a SCHEMA or objects not > readily understood or able to be optimized by the older engine. if the file > was made with the old engine it would perform perfectly well in both the > old and new engines. > Originally, I copied the database file from my PC to my Linux server. I thought it might be a carriage return or line feed issue so I recreated the database from scratch on my Linux server. Didn't change anything. > Either that, or a bug that's been fixed since then. > I'm thinking a bug or they significantly improved the selection statement code!
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On Mon, Jul 27, 2015 at 1:00 PM, Simon Slavin wrote: > > On 27 Jul 2015, at 7:32pm, rotaiv wrote: > > > Originally, I copied the database file from my PC to my Linux server. I > > thought it might be a carriage return or line feed issue so I recreated > the > > database from scratch on my Linux server. Didn't change anything. > > SQLite is completely platform agnostic. A database file created on any > platform works on any platform. It doesn't care about endline sequences or > where the high and low bytes go. > This assumes the means of copying the file didn't result in CRLF translation of some sort. Still, if the copy process did modify the file in some way, I suspect the problem would have been a corrupt file error, not a slow running query. -- Scott Robison
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
Oh, jeeze. Wow. And I know all that... I cleverly managed to ignore the "WHERE" and think it was an AND because it was one the same line. Duh. Thanks. -Original Message- From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Reinhard Max Sent: Monday, July 27, 2015 12:04 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines On Mon, 27 Jul 2015 at 17:48, Marc L. Allen wrote: > When would that specific LEFT JOIN ever do anything except return > NULLs for the right table? It only accepts rows from work where fpath > is null, and only joins those rows where fpath = home.fpath. > Since fpath must be null, home.fpath should never be equal. You're assuming that "LEFT JOIN ... ON ..." behaves the same as "JOIN ... WHERE ...", which is not the case as it would defeat the purpose of a left join. sqlite> create table home (fpath text); sqlite> create table work (fpath text); sqlite> insert into home values ('path1'); insert into home values sqlite> ('path2'); insert into work values ('path2'); insert into work sqlite> values ('path3'); SELECT home.fpath ...> FROM home ...> LEFT JOIN work ON work.fpath = home.fpath ...> WHERE work.fpath IS NULL; path1 See, it returns the rows of home.fpath that don't have a match in work.fpath. cu Reinhard ___ sqlite-users mailing list sqlite-users at mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users This email and any attachments are only for use by the intended recipient(s) and may contain legally privileged, confidential, proprietary or otherwise private information. Any unauthorized use, reproduction, dissemination, distribution or other disclosure of the contents of this e-mail or its attachments is strictly prohibited. If you have received this email in error, please notify the sender immediately and delete the original.
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
When would that specific LEFT JOIN ever do anything except return NULLs for the right table? It only accepts rows from work where fpath is null, and only joins those rows where fpath = home.fpath. Since fpath must be null, home.fpath should never be equal. -Original Message- From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Reinhard Max Sent: Monday, July 27, 2015 11:40 AM To: General Discussion of SQLite Database Subject: Re: [sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines On Mon, 27 Jul 2015 at 17:34, Marc L. Allen wrote: > Sorry.. confused does sqlite allow comparison between NULLs? No. > LEFT JOIN work ON work.fpath = home.fpath WHERE work.fpath IS NULL > > Should never match anything... it checks to see if work.fpath IS NULL > and if the same work.fpath is equal to something. It is a LEFT (OUTER) join, which NULLs the columns of the right thable in the result set for rows in the left table that don't have a match. cu Reinhard ___ sqlite-users mailing list sqlite-users at mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users This email and any attachments are only for use by the intended recipient(s) and may contain legally privileged, confidential, proprietary or otherwise private information. Any unauthorized use, reproduction, dissemination, distribution or other disclosure of the contents of this e-mail or its attachments is strictly prohibited. If you have received this email in error, please notify the sender immediately and delete the original.
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
Sorry.. confused does sqlite allow comparison between NULLs? I mean... LEFT JOIN work ON work.fpath = home.fpath WHERE work.fpath IS NULL Should never match anything... it checks to see if work.fpath IS NULL and if the same work.fpath is equal to something. I get the weird feeling that I'm missing something stupid, so massive apologies in advance. Maybe I came in late and this is some clever way of using an index? Marc -Original Message- From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin Sent: Monday, July 27, 2015 10:47 AM To: General Discussion of SQLite Database Subject: Re: [sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines On 27 Jul 2015, at 3:04pm, Igor Tandetnik wrote: > That's a common technique with LEFT JOIN Thanks, Igor. On 27 Jul 2015, at 1:44pm, rotaiv wrote: > SELECT home.fpath > FROM home > LEFT JOIN work ON work.fpath = home.fpath WHERE work.fpath IS NULL AND > home.ftype = 'f?; Can you verify that you have an index on work.fpath ? 16 seconds is still a little slower than it should be. Simon. ___ sqlite-users mailing list sqlite-users at mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users This email and any attachments are only for use by the intended recipient(s) and may contain legally privileged, confidential, proprietary or otherwise private information. Any unauthorized use, reproduction, dissemination, distribution or other disclosure of the contents of this e-mail or its attachments is strictly prohibited. If you have received this email in error, please notify the sender immediately and delete the original.
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
> ?WOW! That is an amazing difference. Makes me glad that I'm OCD about > staying relatively current. And please ignore my previous post. I somehow > managed to not register that you would doing a LEFT join, an equi-join.? > I am also OCD about updates. I installed Sqlite using yum right before I tried this test. I guess CentOS release 6.6 is not using the latest version just yet. I would never have believed a new version would have had such a dramatic difference.
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
> Also, I agree with Simon that the query itself, and it's use of NULL, looks > weird. And I don't see where it could find any matches because the "ON > work.fpath = home.fpath" should never match if either fpath is NULL. > > I confess, I understand basic SQL but it is not one of my strengths at all. I found an example of this query via Google, tried it, and it worked. I manually manipulated the data source to generate matches based on the criteria I was looking for. The results were completely as expected. I am more than open to a more logical and/or more efficient query if anyone has suggestions.
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
Problem solved. I updated the version on the server and it completes in 16 seconds. To answer your question, I am using the find command to created a delimited text file. I copy that file to my home PC then importing that file into a sqlite database table. I use the same process to create a text file on my home PC then import that as well. The tables are very simple. Just filename, type, date, time and epoch seconds (easy comparison). On Mon, Jul 27, 2015 at 10:16 AM, Tim Streater wrote: > On 27 Jul 2015 at 13:44, rotaiv wrote: > > > Where is the resource bottleneck that causes a simple query to never > > complete after 40 minutes > > > > I am using sqlite to synchronize files on my home PC and my work PC. I > > created a file listing of each computer and imported into separate > tables. > > And how are you doing this? > > > Then I used the following query to locate missing files. The ?ftype? is > > simply ?F? for file or ?D? for directory. > > Have you written a program to do this or are you using the sqlite3 shell > program? You're presumably working with one database that you move between > machines so that you can import the listings created on each machine. Is > this correct? What is the schema for your database? > > -- > Cheers -- Tim > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > >
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
Thank for reminding me to check the obvious. The server was using version 3.6.20. I manually upgraded to version 3.8.10.2 and the query completed in 16 seconds. Still not as fast as my home PC but given the age of the hardware, this is understandable. I will take 16 seconds as opposed to 40+ minutes any day!
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
That is correct - I am looking for records that are empty. I tried this version of the SQL query as well and during my testing, the first example seemed to be more efficient. Also, I have version 3.8.10.2 on my home PC and version 3.6.20 on my server at work. I am going to update the server version and see if that makes a difference. On Mon, Jul 27, 2015 at 10:04 AM, Igor Tandetnik wrote: > On 7/27/2015 9:58 AM, Simon Slavin wrote: > >> So you're checking both to see that [work.fpath = home.fpath ] and to see >> that [work.fpath IS NULL]. This looks weird to me. >> > > That's a common technique with LEFT JOIN - it's selecting home records > that lack a corresponding work record. In other words, it's equivalent to > > SELECT fpath > FROM home > WHERE fpath NOT IN (SELECT fpath FROM work) > AND home.ftype = 'f?; > > -- > Igor Tandetnik > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On 7/27/2015 9:58 AM, Simon Slavin wrote: > So you're checking both to see that [work.fpath = home.fpath ] and to see > that [work.fpath IS NULL]. This looks weird to me. That's a common technique with LEFT JOIN - it's selecting home records that lack a corresponding work record. In other words, it's equivalent to SELECT fpath FROM home WHERE fpath NOT IN (SELECT fpath FROM work) AND home.ftype = 'f?; -- Igor Tandetnik
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On 7/27/15, rotaiv wrote: > > My question is ... What is about my machine at home > that allows Sqlite to query 2 million files so quickly? > Probably a newer version of SQLite. You didn't tell us the version numbers for SQLite on the two machines. -- D. Richard Hipp drh at sqlite.org
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On Mon, Jul 27, 2015 at 9:21 AM, rotaiv wrote: > Problem solved. I updated the version on the server and it completes in 16 > seconds. > ?WOW! That is an amazing difference. Makes me glad that I'm OCD about staying relatively current. And please ignore my previous post. I somehow managed to not register that you would doing a LEFT join, an equi-join.? > > To answer your question, I am using the find command to created a delimited > text file. I copy that file to my home PC then importing that file into a > sqlite database table. I use the same process to create a text file on my > home PC then import that as well. The tables are very simple. Just > filename, type, date, time and epoch seconds (easy comparison). > > -- Schrodinger's backup: The condition of any backup is unknown until a restore is attempted. Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be. He's about as useful as a wax frying pan. 10 to the 12th power microphones = 1 Megaphone Maranatha! <>< John McKown
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
On Mon, Jul 27, 2015 at 8:42 AM, Richard Hipp wrote: > On 7/27/15, rotaiv wrote: > > > > My question is ... What is about my machine at home > > that allows Sqlite to query 2 million files so quickly? > > > ?I missed the initial post, somehow.? ?Could there be an index on something on the home machine which does not exist on the work machine? Can you do an EXPLAIN on both to see what the query plan is?? ?Perhaps another major reason would be if the home machine hardware is faster than the other machine?. An extreme example: at work my machine is a very old 2 GiB Pentium system running at 1.8 GiHz with a normal 5400 RPM hard drive. My home machine is a 16 GiB, 3.7 GiHz Xeon with an 320 GiB SSD + 2 TiB 7200 RPM HD. Need I mention how much faster my home machine is? The software configuration is almost identical: Fedora 22 x86_64. I also wonder if the work PC is running other "server" type applications which use up CPU and RAM. Also, is the work PC running 64 bit or 32 bit Linux? How about Windows at home (I assume 64 bit here)? But, in any case, 40 minutes seems really poor performance, unless there is some hardware issue. Also, I agree with Simon that the query itself, and it's use of NULL, looks weird. And I don't see where it could find any matches because the "ON work.fpath = home.fpath" should never match if either fpath is NULL. > > Probably a newer version of SQLite. You didn't tell us the version > numbers for SQLite on the two machines. > -- > D. Richard Hipp > drh at sqlite.org > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > -- Schrodinger's backup: The condition of any backup is unknown until a restore is attempted. Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be. He's about as useful as a wax frying pan. 10 to the 12th power microphones = 1 Megaphone Maranatha! <>< John McKown
[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines
Where is the resource bottleneck that causes a simple query to never complete after 40 minutes I am using sqlite to synchronize files on my home PC and my work PC. I created a file listing of each computer and imported into separate tables. Then I used the following query to locate missing files. The ?ftype? is simply ?F? for file or ?D? for directory. SELECT home.fpath FROM home LEFT JOIN work ON work.fpath = home.fpath WHERE work.fpath IS NULL AND home.ftype = 'f?; I started with a small dataset as proof of concept and it worked as expected. I have used MySQL in the past for various projects but this is my first time using Sqlite. I decided to stress test the query and see how it performs with large datasets. I create a file listing of my archive drive that has over 2 million files. When I ran the query on my desktop at home it completed in about 8 seconds. When I tried the same query on my sever at work, I aborted after 40 minutes. I tried it on a second machine with the same results. I tried a much smaller dataset and it worked as expected, just much, much slower. To be fair, I just replaced my computer at home with a custom built machine. It has lots of RAM and a modern, fast quad core processor. However, the server I tried at work has the same amount of RAM and an older 8 core processor. I expected there might be a noticeable difference but 8 seconds compared to 40+ minutes was a surprise. Other caveats are my PC at home is Windows 8.1 with Cygwin + Sqlite. My server at work is Redhat Linux + Sqlite via yum. My question is where is the bottleneck? What is about my machine at home that allows Sqlite to query 2 million files so quickly? I am happy to provide specific hardware and software specifications if that is helpful.