[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-29 Thread Reinhard Max

On Tue, 28 Jul 2015 at 20:55, rotaiv wrote:

> I upgraded to the latest version and it decreased to 16 seconds. 
> With indexes, 5 seconds. :-D

Is index creation time included in those 5 seconds?

If your database gets created from scratch and only used once every 
time you do the syncing (as it appears to be the case, from what you 
wrote before), index creation time needs to be included for a fair 
comparison.


cu
Reinhard


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-29 Thread R.Smith

On 2015-07-28 09:25 PM, rotaiv wrote:
> Not quite.  The older version brought it down to 8 seconds (as compared to
> 5 seconds) but still a whole lot better than 40+ minutes.
>
> It is very interesting to see an index can make such a remarkable
> difference in that particular scenario.  This is my first time creating
> indexes but I will definitely be doing that more in the future.
>
>  From what I read, it was recommended to create indexes for any field used
> in the SELECT statement.  Is that correct?

If you realize how an Index makes things faster, the question above will 
become easy to answer. You should learn more from web sources, but to 
get you rolling, here is the very condensed "Indexing 101" course:

TLDR: Some very basic Index info - please skip if you know Indexing well 
- or - if you are well versed, please note any ideas or caveats I may 
have missed.


I hope you have used one of the old type telephone books - the ones the 
phone company would send out with a long list with everyone's number in 
an area. It's rather easy to use if you know the person's full name and 
family name/surname because it is listed alphabetically by surname and 
shows the first name, address etc.

Imagine for a moment though, your boss for some nefarious reason tasks 
you with getting him a list of all the people in the phone book with the 
name "Jeremy". There is no real quick way to do this, you have to start 
at the start and read every line in the book seeing if perhaps that 
person is a Jeremy. It will take ages. (More than 40 minutes!)

Now imagine some friendly person in the office sneaks you a copy of a 
little black book they have made before, in this book is a list of all 
the first names in alphabetical order along with on which page and line 
in the phone-book they appear. You look into it, go straight to J, look 
for Jeremy, you see 12 Jeremy's and can quickly find them in the large 
phone-book and note their full details. You finish the task in under 5 
minutes and impress the boss.

This little black book is basically an Index and it makes every search 
faster IF you were looking for first names (The thing you were "looking 
for" I will refer to as a "search-term"). So now you have the 
Surname-ordered phone-book (we say the Surname is a Key of the 
phone-book, and the little black book is an Index on Name). Note that 
Surname cannot be a Primary Key because it won't be unique. The only 
value that will probably be completely Unique for every entry in the 
book is the Phone-Number itself, so this might be used in the 
Primary-Key, but isn't important to us when planning search-able Index 
creation, because it will rarely be a search-term).

A typical Schema might look something like this:

CREATE TABLE PhoneBook(
   Surname,  Name,
   City,  Suburb, Street, ApartmentNo,
   PhoneNo,
   PRIMARY KEY (Surname, PhoneNo)
)

and for the little black book:

INDEX ON PhoneBook(Name)

Your boss is however not satisfied that you have suffered enough and so 
tasks you with getting a list of people whom all live in "Church" street 
- and warns you that he will from now on need street-name related lists 
often.  This puts you back to square one, but you immediately realize 
that another little black book which listed all the street-names in 
alphabetical order and then linked to where in the phone-book they 
appear, would make your life much easier. You need an Index on Street, 
compiles the little black book list (which takes a long time) but 
afterward your life is much easier.

INDEX ON PhoneBook(Street)

One more complication happens - your boss reads your first result set, 
and says "no no, I don't want people from Church street in Downtown, I 
only want the people from Church street in Chinatown." The next step is 
to note in your little black book not just the street name, but also the 
Suburb - a combined Index.  Now you are faced with a choice, you can 
list the names as Suburb,Street - or - Street,Suburb. Which to use?

Well, turns out this depends on how you will do the most searching. If 
you will always have a Street-name to search for and sometimes need to 
limit it to a Suburb or two, then  this is better:

INDEX ON PhoneBook(Street, Suburb)

The Index above is also better if you will always have both a 
Street-name and a Suburb to search for because of the fact that there 
are many more Streets than Suburbs and the same Suburb will repeat a lot 
more than the Street - this is called Cardinality. We say the Suburb has 
low Cardinality because it repeats a lot and so there are very few 
Unique Suburb names. Street names have a higher Cardinality (more Unique 
names) and full addresses have very high Cardinality (Almost every 
person lives in a unique home/apartment).

Notice however that if ALL you get to search on is a Suburb, the Index 
above is pretty useless and will result in a full Index-scan which the 
query optimizer will probably forgo and cause a full Table-scan. 
(reading through the 

[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-28 Thread Simon Slavin

On 28 Jul 2015, at 8:25pm, rotaiv  wrote:

> From what I read, it was recommended to create indexes for any field used
> in the SELECT statement.  Is that correct?  Give my original query:
> 
> SELECT home.fpath
> FROM home
> LEFT JOIN work ON work.fpath = home.fpath
> WHERE work.fpath IS NULL
> AND home.ftype = 'f';

I really must get around to writing a web page on this subject.  It's almost a 
decade since I decided to.

Ignore the idea of creating indexes for certain fields.  It will speed things 
up slightly but nowhere near as much as is possible.  A better approach is to 
look at each SELECT command (or WHERE clause) and to create an index ideally 
suited to speeding up that one thing.  Something to bear in mind is that each 
search or sort operation can only usefully use one index.  So for this SELECT:

SELECT * FROM myTable WHERE a = '123' ORDER BY b

there's no point in creating both these indexes

CREATE INDEX m_a ON myTable (a);
CREATE INDEX m_b ON myTable (b);

Because once SQL has used m_a to pick the right rows it is no longer looking at 
the whole table so it can't use index m_b to do the sorting.  However, this 
index

CREATE INDEX m_a_b ON myTable (a,b);

can be used to do both operations: it allows SQL to find the right rows in the 
right order, just by selecting a section of the index.  It is the ideal index 
for that SELECT.

Simon.


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-28 Thread Simon Slavin

On 28 Jul 2015, at 7:55pm, rotaiv  wrote:

> That is the machine.  Originally. it was taking 40+ minutes.  I upgraded to
> the latest version and it decreased to 16 seconds.  With indexes, 5
> seconds. :-D

Ah, okay I get it.  In that case the index would probably have given you a time 
of 5 seconds even without the update.  Creating good indexes is key to fast 
searching times in SQL.

Simon.


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-28 Thread Simon Slavin

On 28 Jul 2015, at 7:43pm, rotaiv  wrote:

> I create indexes and my original query time dropped from 16 seconds to this:
> 
> real  0m5.928s
> user  0m5.361s
> sys   0m0.565s

Is it worth checking on the machine you originally reported 40+ minutes for ?  
That one should be considerably faster.

Simon.


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-28 Thread rotaiv
Not quite.  The older version brought it down to 8 seconds (as compared to
5 seconds) but still a whole lot better than 40+ minutes.

It is very interesting to see an index can make such a remarkable
difference in that particular scenario.  This is my first time creating
indexes but I will definitely be doing that more in the future.

>From what I read, it was recommended to create indexes for any field used
in the SELECT statement.  Is that correct?  Give my original query:

SELECT home.fpath
FROM home
LEFT JOIN work ON work.fpath = home.fpath
WHERE work.fpath IS NULL
AND home.ftype = 'f';

What indexes should I create?  Seems like I only need an index on
home.fpath  and work.fpath but I wanted to make sure.

On Tue, Jul 28, 2015 at 3:13 PM, Simon Slavin  wrote:

>
> On 28 Jul 2015, at 7:55pm, rotaiv  wrote:
>
> > That is the machine.  Originally. it was taking 40+ minutes.  I upgraded
> to
> > the latest version and it decreased to 16 seconds.  With indexes, 5
> > seconds. :-D
>
> Ah, okay I get it.  In that case the index would probably have given you a
> time of 5 seconds even without the update.  Creating good indexes is key to
> fast searching times in SQL.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-28 Thread rotaiv
That is the machine.  Originally. it was taking 40+ minutes.  I upgraded to
the latest version and it decreased to 16 seconds.  With indexes, 5
seconds. :-D

On Tue, Jul 28, 2015 at 2:53 PM, Simon Slavin  wrote:

>
> On 28 Jul 2015, at 7:43pm, rotaiv  wrote:
>
> > I create indexes and my original query time dropped from 16 seconds to
> this:
> >
> > real  0m5.928s
> > user  0m5.361s
> > sys   0m0.565s
>
> Is it worth checking on the machine you originally reported 40+ minutes
> for ?  That one should be considerably faster.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-28 Thread rotaiv
I create indexes and my original query time dropped from 16 seconds to this:

real  0m5.928s
user  0m5.361s
sys   0m0.565s

However, when I use the query you suggested, it was a little slower:

real  0m9.827s
user  0m8.952s
sys   0m0.873s


On Mon, Jul 27, 2015 at 5:38 PM, Keith Medcalf  wrote:

>
> Or even better:
>
> select fpath
>   from home
>  where not exists (select 1 from work where work.fpath=home.fpath and
> work.ftype=home.ftype)
>and ftype = 'f'
>
> with a unique index on home (ftype, fpath) and a unique index on work
> (ftype, fpath) of course.
>
> > -Original Message-
> > From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
> > bounces at mailinglists.sqlite.org] On Behalf Of Igor Tandetnik
> > Sent: Monday, 27 July, 2015 10:04
> > To: sqlite-users at mailinglists.sqlite.org
> > Subject: Re: [sqlite] Query takes 8 seconds on one machine but I abort
> > after 40+ minutes on other machines
> >
> > On 7/27/2015 9:58 AM, Simon Slavin wrote:
> > > So you're checking both to see that [work.fpath = home.fpath ] and to
> > see that [work.fpath IS NULL].  This looks weird to me.
> >
> > That's a common technique with LEFT JOIN - it's selecting home records
> > that lack a corresponding work record. In other words, it's equivalent to
> >
> > SELECT fpath
> > FROM home
> > WHERE fpath NOT IN (SELECT fpath FROM work)
> > AND home.ftype = 'f?;
> >
> > --
> > Igor Tandetnik
> >
> > ___
> > sqlite-users mailing list
> > sqlite-users at mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Simon Slavin

On 27 Jul 2015, at 7:32pm, rotaiv  wrote:

> Originally, I copied the database file from my PC to my Linux server.  I
> thought it might be a carriage return or line feed issue so I recreated the
> database from scratch on my Linux server.  Didn't change anything.

SQLite is completely platform agnostic.  A database file created on any 
platform works on any platform.  It doesn't care about endline sequences or 
where the high and low bytes go.

Simon.


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread R.Smith


On 2015-07-27 05:48 PM, Marc L. Allen wrote:
> When would that specific LEFT JOIN ever do anything except return NULLs for 
> the right table?  It only accepts rows from work where fpath is null, and 
> only joins those rows where fpath = home.fpath.  Since fpath must be null, 
> home.fpath should never be equal.

TLDR: Just a quick simple explain of the left joining phenomenon in case 
anyone needed it (Sometimes the basics elude me), feel free to skip this 
post if you know how it works - thanks.


Left join is an outer join, meaning simply it iterates all rows in the 
joined table and adds all positive matches, duplicating as needed but 
never culling any rows from the originating tuple set (the LEFT table of 
the Join).

So if I ask an SQL engine (any of them) to list me the numbers from 1 to 
10 and LEFT JOIN onto that (ON) the word 'Odd' for numbers that are odd, 
it won't remove any of the original listing, it will simply add the 
fields to the lines where matches exist, and add NULL fields where no 
matches are found. A normal (Inner) join will only show positive matches 
found in both tables.

Some SQLite-flavoured SQL to point out the above:
(Note: this math works in SQLite because of how INT division is handled, 
it may not work exactly like this in other engines, but the idea is 
universal)


CREATE TABLE Nums(x INT);
CREATE TABLE Kinds(r INT, k TEXT);

WITH CN(x) AS (
SELECT 1 UNION ALL SELECT x+1 FROM CN LIMIT 10
)
INSERT INTO Nums SELECT x from CN;

INSERT INTO Kinds VALUES(1,'Odd');



-- Example 1:  This Query uses a standard inner join to match rows in
-- two tables and culls the rows from the first (LEFT) table
-- that doesn't have an entry in the other (RIGHT) table.

SELECT Nums.x, Kinds.k
   FROM Nums
   JOIN Kinds ON Kinds.r=(Nums.x-((Nums.x / 2) * 2));

   --   x  |   k
   --  | -
   --   1  |  Odd
   --   3  |  Odd
   --   5  |  Odd
   --   7  |  Odd
   --   9  |  Odd



--  Example 2:  This Query is word-for-word the exact same but uses a
-- LEFT join to match rows in two tables and add the rows
--  from the second (RIGHT) table to the first (LEFT) table
--  if a match exists, if it doesn't, then it adds NULL, but
-- it never culls the LEFT table in any way:

SELECT Nums.x, Kinds.k
FROM Nums
LEFT JOIN Kinds ON Kinds.r=(Nums.x-((Nums.x / 2) * 2));

   --   x  | k
   --  | --
   --   1  | Odd
   --   2  | Null
   --   3  | Odd
   --   4  | Null
   --   5  | Odd
   --   6  | Null
   --   7  | Odd
   --   8  | Null
   --   9  | Odd
   --  10  | Null



-- Example 3:  So, if we do not really care about whether to DISPLAY the
-- oddness of the values, but only wish to list the EVEN 
numbers
-- (aka the non-Odd numbers), I could use that LEFT JOIN and
-- check for those NULLS in ANY of the joined columns, even the
-- same that were used in the ON clause:
-- (this kind of represents what the OP's query did):

SELECT Nums.x
FROM Nums
   LEFT JOIN Kinds ON Kinds.r=(Nums.x-((Nums.x / 2) * 2))
  WHERE Kinds.r IS NULL;

   --   x
   -- 
   --   2
   --   4
   --   6
   --   8
   --  10




Hope that helps someone - Cheers,
Ryan.



[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread R.Smith


On 2015-07-27 04:29 PM, rotaiv wrote:
>> ?WOW! That is an amazing difference. Makes me glad that I'm OCD about
>> staying relatively current. And please ignore my previous post. I somehow
>> managed to not register that you would doing a LEFT join, an equi-join.?
>>
> I am also OCD about updates.  I installed Sqlite using yum right before I
> tried this test.  I guess CentOS release 6.6 is not using the latest
> version just yet.  I would never have believed a new version would have had
> such a dramatic difference.

It's not the older SQLite taking so much longer, it's simply that the 
file created with a newer SQLite probably contains a SCHEMA or objects 
not readily understood or able to be optimized by the older engine. if 
the file was made with the old engine it would perform perfectly well in 
both the old and new engines.

Either that, or a bug that's been fixed since then.

Have a great day,
Ryan



[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Reinhard Max

On Mon, 27 Jul 2015 at 17:48, Marc L. Allen wrote:

> When would that specific LEFT JOIN ever do anything except return 
> NULLs for the right table?  It only accepts rows from work where 
> fpath is null, and only joins those rows where fpath = home.fpath. 
> Since fpath must be null, home.fpath should never be equal.

You're assuming that "LEFT JOIN ... ON ..." behaves the same as "JOIN 
... WHERE ...", which is not the case as it would defeat the purpose 
of a left join.

sqlite> create table home (fpath text);
sqlite> create table work (fpath text);
sqlite> insert into home values ('path1');
sqlite> insert into home values ('path2');
sqlite> insert into work values ('path2');
sqlite> insert into work values ('path3');
sqlite> SELECT home.fpath
...> FROM home
...> LEFT JOIN work ON work.fpath = home.fpath
...> WHERE work.fpath IS NULL;
path1

See, it returns the rows of home.fpath that don't have a match in 
work.fpath.

cu
Reinhard


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Reinhard Max


On Mon, 27 Jul 2015 at 17:34, Marc L. Allen wrote:

> Sorry.. confused does sqlite allow comparison between NULLs?

No.

> LEFT JOIN work ON work.fpath = home.fpath WHERE work.fpath IS NULL
>
> Should never match anything... it checks to see if work.fpath IS 
> NULL and if the same work.fpath is equal to something.

It is a LEFT (OUTER) join, which NULLs the columns of the right thable 
in the result set for rows in the left table that don't have a match.

cu
Reinhard


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Keith Medcalf

Or even better:

select fpath
  from home
 where not exists (select 1 from work where work.fpath=home.fpath and 
work.ftype=home.ftype)
   and ftype = 'f'

with a unique index on home (ftype, fpath) and a unique index on work (ftype, 
fpath) of course.

> -Original Message-
> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
> bounces at mailinglists.sqlite.org] On Behalf Of Igor Tandetnik
> Sent: Monday, 27 July, 2015 10:04
> To: sqlite-users at mailinglists.sqlite.org
> Subject: Re: [sqlite] Query takes 8 seconds on one machine but I abort
> after 40+ minutes on other machines
> 
> On 7/27/2015 9:58 AM, Simon Slavin wrote:
> > So you're checking both to see that [work.fpath = home.fpath ] and to
> see that [work.fpath IS NULL].  This looks weird to me.
> 
> That's a common technique with LEFT JOIN - it's selecting home records
> that lack a corresponding work record. In other words, it's equivalent to
> 
> SELECT fpath
> FROM home
> WHERE fpath NOT IN (SELECT fpath FROM work)
> AND home.ftype = 'f?;
> 
> --
> Igor Tandetnik
> 
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users





[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Reinhard Max

On Mon, 27 Jul 2015 at 16:24, rotaiv wrote:

> I am more than open to a more logical and/or more efficient query if 
> anyone has suggestions.

One alternative was posted by Igor, but it might end up as the same 
query plan after the optimizer has done its thing (EXPLAIN will tell 
you).

Adding indexes to the columns you are JOINing on might also help, if 
you haven't done already, but you'll have to test whether index 
creation time pays off in saved query time.

cu
Reinhard


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Simon Slavin

On 27 Jul 2015, at 3:04pm, Igor Tandetnik  wrote:

> That's a common technique with LEFT JOIN

Thanks, Igor.


On 27 Jul 2015, at 1:44pm, rotaiv  wrote:

> SELECT home.fpath
> FROM home
> LEFT JOIN work ON work.fpath = home.fpath
> WHERE work.fpath IS NULL
> AND home.ftype = 'f?;

Can you verify that you have an index on work.fpath ?  16 seconds is still a 
little slower than it should be.

Simon.


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Tim Streater
On 27 Jul 2015 at 13:44, rotaiv  wrote:

> Where is the resource bottleneck that causes a simple query to never
> complete after 40 minutes
>
> I am using sqlite to synchronize files on my home PC and my work PC.  I
> created a file listing of each computer and imported into separate tables.

And how are you doing this?

> Then I used the following query to locate missing files.  The ?ftype? is
> simply ?F? for file or ?D? for directory.

Have you written a program to do this or are you using the sqlite3 shell 
program? You're presumably working with one database that you move between 
machines so that you can import the listings created on each machine. Is this 
correct? What is the schema for your database?

--
Cheers  --  Tim


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Andy Ling
> > That's a common technique with LEFT JOIN
> 
> Thanks, Igor.
>

A bit MySQL specific, but shows some options

http://explainextended.com/2010/05/27/left-join-is-null-vs-not-in-vs-not-exists-nullable-columns/

Regards

Andy Ling
---
This email has been scanned for email related threats and delivered safely by 
Mimecast.
For more information please visit http://www.mimecast.com
---



[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Simon Slavin

On 27 Jul 2015, at 1:44pm, rotaiv  wrote:

> SELECT home.fpath
> FROM home
> LEFT JOIN work ON work.fpath = home.fpath
> WHERE work.fpath IS NULL
> AND home.ftype = 'f?;

Are you sure you got this right ?  It uses work.fpath twice:

> LEFT JOIN work ON work.fpath = home.fpath
> WHERE work.fpath IS NULL

So you're checking both to see that [work.fpath = home.fpath ] and to see that 
[work.fpath IS NULL].  This looks weird to me.

Simon.


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread rotaiv
> It's not the older SQLite taking so much longer, it's simply that the file
> created with a newer SQLite probably contains a SCHEMA or objects not
> readily understood or able to be optimized by the older engine. if the file
> was made with the old engine it would perform perfectly well in both the
> old and new engines.
>

Originally, I copied the database file from my PC to my Linux server.  I
thought it might be a carriage return or line feed issue so I recreated the
database from scratch on my Linux server.  Didn't change anything.


> Either that, or a bug that's been fixed since then.
>

I'm thinking a bug or they significantly improved the selection statement
code!


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Scott Robison
On Mon, Jul 27, 2015 at 1:00 PM, Simon Slavin  wrote:

>
> On 27 Jul 2015, at 7:32pm, rotaiv  wrote:
>
> > Originally, I copied the database file from my PC to my Linux server.  I
> > thought it might be a carriage return or line feed issue so I recreated
> the
> > database from scratch on my Linux server.  Didn't change anything.
>
> SQLite is completely platform agnostic.  A database file created on any
> platform works on any platform.  It doesn't care about endline sequences or
> where the high and low bytes go.
>

This assumes the means of copying the file didn't result in CRLF
translation of some sort. Still, if the copy process did modify the file in
some way, I suspect the problem would have been a corrupt file error, not a
slow running query.
-- 
Scott Robison


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Marc L. Allen
Oh, jeeze.  Wow.

And I know all that... I cleverly managed to ignore the "WHERE" and think it 
was an AND because it was one the same line.

Duh.

Thanks.


-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Reinhard Max
Sent: Monday, July 27, 2015 12:04 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Query takes 8 seconds on one machine but I abort after 
40+ minutes on other machines


On Mon, 27 Jul 2015 at 17:48, Marc L. Allen wrote:

> When would that specific LEFT JOIN ever do anything except return 
> NULLs for the right table?  It only accepts rows from work where fpath 
> is null, and only joins those rows where fpath = home.fpath.
> Since fpath must be null, home.fpath should never be equal.

You're assuming that "LEFT JOIN ... ON ..." behaves the same as "JOIN ... WHERE 
...", which is not the case as it would defeat the purpose of a left join.

sqlite> create table home (fpath text);
sqlite> create table work (fpath text);
sqlite> insert into home values ('path1'); insert into home values 
sqlite> ('path2'); insert into work values ('path2'); insert into work 
sqlite> values ('path3'); SELECT home.fpath
...> FROM home
...> LEFT JOIN work ON work.fpath = home.fpath
...> WHERE work.fpath IS NULL;
path1

See, it returns the rows of home.fpath that don't have a match in work.fpath.

cu
Reinhard
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


This email and any attachments are only for use by the intended recipient(s) 
and may contain legally privileged, confidential, proprietary or otherwise 
private information. Any unauthorized use, reproduction, dissemination, 
distribution or other disclosure of the contents of this e-mail or its 
attachments is strictly prohibited. If you have received this email in error, 
please notify the sender immediately and delete the original.


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Marc L. Allen

When would that specific LEFT JOIN ever do anything except return NULLs for the 
right table?  It only accepts rows from work where fpath is null, and only 
joins those rows where fpath = home.fpath.  Since fpath must be null, 
home.fpath should never be equal.

-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Reinhard Max
Sent: Monday, July 27, 2015 11:40 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Query takes 8 seconds on one machine but I abort after 
40+ minutes on other machines



On Mon, 27 Jul 2015 at 17:34, Marc L. Allen wrote:

> Sorry.. confused does sqlite allow comparison between NULLs?

No.

> LEFT JOIN work ON work.fpath = home.fpath WHERE work.fpath IS NULL
>
> Should never match anything... it checks to see if work.fpath IS NULL 
> and if the same work.fpath is equal to something.

It is a LEFT (OUTER) join, which NULLs the columns of the right thable in the 
result set for rows in the left table that don't have a match.

cu
Reinhard
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


This email and any attachments are only for use by the intended recipient(s) 
and may contain legally privileged, confidential, proprietary or otherwise 
private information. Any unauthorized use, reproduction, dissemination, 
distribution or other disclosure of the contents of this e-mail or its 
attachments is strictly prohibited. If you have received this email in error, 
please notify the sender immediately and delete the original.


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Marc L. Allen
Sorry.. confused does sqlite allow comparison between NULLs?

I mean...

LEFT JOIN work ON work.fpath = home.fpath WHERE work.fpath IS NULL

Should never match anything... it checks to see if work.fpath IS NULL and if 
the same work.fpath is equal to something.  

I get the weird feeling that I'm missing something stupid, so massive apologies 
in advance.  Maybe I came in late and this is some clever way of using an index?

Marc

-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin
Sent: Monday, July 27, 2015 10:47 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Query takes 8 seconds on one machine but I abort after 
40+ minutes on other machines


On 27 Jul 2015, at 3:04pm, Igor Tandetnik  wrote:

> That's a common technique with LEFT JOIN

Thanks, Igor.


On 27 Jul 2015, at 1:44pm, rotaiv  wrote:

> SELECT home.fpath
> FROM home
> LEFT JOIN work ON work.fpath = home.fpath WHERE work.fpath IS NULL AND 
> home.ftype = 'f?;

Can you verify that you have an index on work.fpath ?  16 seconds is still a 
little slower than it should be.

Simon.
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



This email and any attachments are only for use by the intended recipient(s) 
and may contain legally privileged, confidential, proprietary or otherwise 
private information. Any unauthorized use, reproduction, dissemination, 
distribution or other disclosure of the contents of this e-mail or its 
attachments is strictly prohibited. If you have received this email in error, 
please notify the sender immediately and delete the original.


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread rotaiv
> ?WOW! That is an amazing difference. Makes me glad that I'm OCD about
> staying relatively current. And please ignore my previous post. I somehow
> managed to not register that you would doing a LEFT join, an equi-join.?
>

I am also OCD about updates.  I installed Sqlite using yum right before I
tried this test.  I guess CentOS release 6.6 is not using the latest
version just yet.  I would never have believed a new version would have had
such a dramatic difference.


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread rotaiv
> Also, I agree with Simon that the query itself, and it's use of NULL, looks
> weird. And I don't see where it could find any matches because the "ON
> work.fpath = home.fpath" should never match if either fpath is NULL.
>
>
I confess, I understand basic SQL but it is not one of my strengths at
all.  I found an example of this query via Google, tried it, and it
worked.  I manually manipulated the data source to generate matches based
on the criteria I was looking for.  The results were completely as
expected.

I am more than open to a more logical and/or more efficient query if anyone
has suggestions.


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread rotaiv
Problem solved.  I updated the version on the server and it completes in 16
seconds.

To answer your question, I am using the find command to created a delimited
text file.  I copy that file to my home PC then importing that file into a
sqlite database table.  I use the same process to create a text file on my
home PC then import that as well.   The tables are very simple.  Just
filename, type, date, time and epoch seconds (easy comparison).

On Mon, Jul 27, 2015 at 10:16 AM, Tim Streater  wrote:

> On 27 Jul 2015 at 13:44, rotaiv  wrote:
>
> > Where is the resource bottleneck that causes a simple query to never
> > complete after 40 minutes
> >
> > I am using sqlite to synchronize files on my home PC and my work PC.  I
> > created a file listing of each computer and imported into separate
> tables.
>
> And how are you doing this?
>
> > Then I used the following query to locate missing files.  The ?ftype? is
> > simply ?F? for file or ?D? for directory.
>
> Have you written a program to do this or are you using the sqlite3 shell
> program? You're presumably working with one database that you move between
> machines so that you can import the listings created on each machine. Is
> this correct? What is the schema for your database?
>
> --
> Cheers  --  Tim
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread rotaiv
Thank for reminding me to check the obvious.  The server was using version
3.6.20.  I manually upgraded to version 3.8.10.2 and the query completed in
16 seconds.  Still not as fast as my home PC but given the age of the
hardware, this is understandable.  I will take 16 seconds as opposed to 40+
minutes any day!


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread rotaiv
That is correct - I am looking for records that are empty.  I tried this
version of the SQL query as well and during my testing, the first example
seemed to be more efficient.

Also, I have version 3.8.10.2 on my home PC and version 3.6.20 on my server
at work. I am going to update the server version and see if that makes a
difference.

On Mon, Jul 27, 2015 at 10:04 AM, Igor Tandetnik  wrote:

> On 7/27/2015 9:58 AM, Simon Slavin wrote:
>
>> So you're checking both to see that [work.fpath = home.fpath ] and to see
>> that [work.fpath IS NULL].  This looks weird to me.
>>
>
> That's a common technique with LEFT JOIN - it's selecting home records
> that lack a corresponding work record. In other words, it's equivalent to
>
> SELECT fpath
> FROM home
> WHERE fpath NOT IN (SELECT fpath FROM work)
> AND home.ftype = 'f?;
>
> --
> Igor Tandetnik
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Igor Tandetnik
On 7/27/2015 9:58 AM, Simon Slavin wrote:
> So you're checking both to see that [work.fpath = home.fpath ] and to see 
> that [work.fpath IS NULL].  This looks weird to me.

That's a common technique with LEFT JOIN - it's selecting home records 
that lack a corresponding work record. In other words, it's equivalent to

SELECT fpath
FROM home
WHERE fpath NOT IN (SELECT fpath FROM work)
AND home.ftype = 'f?;

-- 
Igor Tandetnik



[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread Richard Hipp
On 7/27/15, rotaiv  wrote:
>
> My question is ...  What is about my machine at home
> that allows Sqlite to query 2 million files so quickly?
>

Probably a newer version of SQLite.  You didn't tell us the version
numbers for SQLite on the two machines.
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread John McKown
On Mon, Jul 27, 2015 at 9:21 AM, rotaiv  wrote:

> Problem solved.  I updated the version on the server and it completes in 16
> seconds.
>

?WOW! That is an amazing difference. Makes me glad that I'm OCD about
staying relatively current. And please ignore my previous post. I somehow
managed to not register that you would doing a LEFT join, an equi-join.?



>
> To answer your question, I am using the find command to created a delimited
> text file.  I copy that file to my home PC then importing that file into a
> sqlite database table.  I use the same process to create a text file on my
> home PC then import that as well.   The tables are very simple.  Just
> filename, type, date, time and epoch seconds (easy comparison).
>
> --

Schrodinger's backup: The condition of any backup is unknown until a
restore is attempted.

Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread John McKown
On Mon, Jul 27, 2015 at 8:42 AM, Richard Hipp  wrote:

> On 7/27/15, rotaiv  wrote:
> >
> > My question is ...  What is about my machine at home
> > that allows Sqlite to query 2 million files so quickly?
> >
>

?I missed the initial post, somehow.?

?Could there be an index on something on the home machine which does not
exist on the work machine? Can you do an EXPLAIN on both to see what the
query plan is??


?Perhaps another major reason would be if the home machine hardware is
faster than the other machine?. An extreme example: at work my machine is a
very old 2 GiB Pentium system running at 1.8 GiHz with a normal 5400 RPM
hard drive. My home machine is a 16 GiB, 3.7 GiHz Xeon with an 320 GiB SSD
+ 2 TiB 7200 RPM HD. Need I mention how much faster my home machine is? The
software configuration is almost identical: Fedora 22 x86_64.

I also wonder if the work PC is running other "server" type applications
which use up CPU and RAM. Also, is the work PC running 64 bit or 32 bit
Linux? How about Windows at home (I assume 64 bit here)?

But, in any case, 40 minutes seems really poor performance, unless there is
some hardware issue.

Also, I agree with Simon that the query itself, and it's use of NULL, looks
weird. And I don't see where it could find any matches because the "ON
work.fpath = home.fpath" should never match if either fpath is NULL.



>
> Probably a newer version of SQLite.  You didn't tell us the version
> numbers for SQLite on the two machines.
> --
> D. Richard Hipp
> drh at sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 

Schrodinger's backup: The condition of any backup is unknown until a
restore is attempted.

Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown


[sqlite] Query takes 8 seconds on one machine but I abort after 40+ minutes on other machines

2015-07-27 Thread rotaiv
Where is the resource bottleneck that causes a simple query to never
complete after 40 minutes

I am using sqlite to synchronize files on my home PC and my work PC.  I
created a file listing of each computer and imported into separate tables.
Then I used the following query to locate missing files.  The ?ftype? is
simply ?F? for file or ?D? for directory.

SELECT home.fpath
FROM home
LEFT JOIN work ON work.fpath = home.fpath
WHERE work.fpath IS NULL
AND home.ftype = 'f?;


I started with a small dataset as proof of concept and it worked as
expected.  I have used MySQL in the past for various projects but this is
my first time using Sqlite.  I decided to stress test the query and see how
it performs with large datasets.

I create a file listing of my archive drive that has over 2 million files.
When I ran the query on my desktop at home it completed in about 8
seconds.  When I tried the same query on my sever at work, I aborted after
40 minutes.  I tried it on a second machine with the same results.  I tried
a much smaller dataset and it worked as expected, just much, much slower.

To be fair, I just replaced my computer at home with a custom built
machine.  It has lots of RAM and a modern, fast quad core processor.
However, the server I tried at work has the same amount of RAM and an older
8 core processor.  I expected there might be a noticeable difference but 8
seconds compared to 40+ minutes was a surprise.

Other caveats are my PC at home is Windows 8.1 with Cygwin + Sqlite.  My
server at work is Redhat Linux + Sqlite via yum.

My question is where is the bottleneck?  What is about my machine at home
that allows Sqlite to query 2 million files so quickly?

I am happy to provide specific hardware and software specifications if that
is helpful.