Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets?
Hi Ron, It may be that increasing the sqlite cache size will substantially reduce the time for either the CREATE INDEX or the SELECT ... ORDER BY instructions (depending upon which method you choose). https://sqlite.org/pragma.html#pragma_cache_size On Wed, Sep 27, 2017 at 1:58 AM, jungle Boogie wrote: > Hi there, > > Just because I'm interested, I'm wondering if you can identify your > hardware, and how long it takes your system to do your desired operation on > such a large number of records. > > Do let us know which option you performed the query with as well. > > P.S. For best results, I'd recommend using the latest release of SQLite. > > Thanks! > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets?
Hi there, Just because I'm interested, I'm wondering if you can identify your hardware, and how long it takes your system to do your desired operation on such a large number of records. Do let us know which option you performed the query with as well. P.S. For best results, I'd recommend using the latest release of SQLite. Thanks! ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets?
š Than you! -Original Message- From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Stephen Chrzanowski Sent: Tuesday, September 26, 2017 4:31 PM To: SQLite mailing list Subject: Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets? uh.. UNIQUE... DISTINCT... , On Tue, Sep 26, 2017 at 4:30 PM, Stephen Chrzanowski wrote: > > > On Tue, Sep 26, 2017 at 1:36 PM, Simon Slavin > wrote: > >> >> >> My one concern in reading your post is how your dates are formatted. >> When putting your date fields into your SQL table you will have to >> ensure that dates are saved as a day number, or as text which >> naturally sorts into date order, e.g. /DD/MM. You should not >> expect SQL to sort text such as "19 October 16" correctly. >> >> Simon. >> > > @OP, Simon is dead on, however, the only correction and clarification > to that statement is you'll want (if required) to sort by /MM/DD, > not /DD/MM. > > Also, for deduplication, if you're executing one instruction, you can > add a UNIQUE clause after the SELECT, unless you've got other > requirements that make defines what a duplicate entry is, then you'd > have to rely on your software for those decisions. > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets?
uh.. UNIQUE... DISTINCT... , On Tue, Sep 26, 2017 at 4:30 PM, Stephen Chrzanowski wrote: > > > On Tue, Sep 26, 2017 at 1:36 PM, Simon Slavin > wrote: > >> >> >> My one concern in reading your post is how your dates are formatted. >> When putting your date fields into your SQL table you will have to ensure >> that dates are saved as a day number, or as text which naturally sorts into >> date order, e.g. /DD/MM. You should not expect SQL to sort text such >> as "19 October 16" correctly. >> >> Simon. >> > > @OP, Simon is dead on, however, the only correction and clarification to > that statement is you'll want (if required) to sort by /MM/DD, not > /DD/MM. > > Also, for deduplication, if you're executing one instruction, you can add > a UNIQUE clause after the SELECT, unless you've got other requirements that > make defines what a duplicate entry is, then you'd have to rely on your > software for those decisions. > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets?
On Tue, Sep 26, 2017 at 1:36 PM, Simon Slavin wrote: > > > My one concern in reading your post is how your dates are formatted. When > putting your date fields into your SQL table you will have to ensure that > dates are saved as a day number, or as text which naturally sorts into date > order, e.g. /DD/MM. You should not expect SQL to sort text such as "19 > October 16" correctly. > > Simon. > @OP, Simon is dead on, however, the only correction and clarification to that statement is you'll want (if required) to sort by /MM/DD, not /DD/MM. Also, for deduplication, if you're executing one instruction, you can add a UNIQUE clause after the SELECT, unless you've got other requirements that make defines what a duplicate entry is, then you'd have to rely on your software for those decisions. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets?
That is exactly what I want to do -Original Message- From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Keith Medcalf Sent: Tuesday, September 26, 2017 3:04 PM To: SQLite mailing list Subject: Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets? SELECT DISTINCT * FROM table ORDER BY field1,field2,field3,field4 DESC,field5; to do the whole sorting and de-duplication in one step ... assuming you want to report duplicate entire rows only once ... --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin >Sent: Tuesday, 26 September, 2017 11:37 >To: SQLite mailing list >Subject: Re: [sqlite] Is there a way to perform a muti-level sort and >extract of large data sets? > > > >On 26 Sep 2017, at 6:24pm, Ron Barnes wrote: > >> I need to sort them as follows... >> >> Sort Field 1 Ascending >> Sort Field 2 Ascending WITHIN field 1 Sort Field 3 Ascending WITHIN >> field 2 WITHIN field 1 Sort Field 4 Descending WITHIN field 3 WITHIN >> field 2 WITHIN field >1 <== This is a Date field and the most current (Highest) Date to float >up >> Sort Field 5 Ascending WITHIN field 4 WITHIN field 3 WITHIN field 2 >WITHIN field 1 > >Trivial in any SQL engine including SQLite. It looks something like > >SELECT * FROM MyTable ORDER BY field1,field2,field3,field4 >DESC,field5 > >SQL users do that sort of thing all the time. > >However, if you use just the above command, SQLite will have to perform >this sorting of 600M records each time you execute the command, which >could take quite a long time ā minutes or hours depending on your >hardware. So for any flavour of SQL you would probably tell it to >create an index ā¦ > >CREATE INDEX m_12345 ON MyTable (field1,field2,field3,field4 >DESC,field5) > >This tells SQL to perform the sorting and save the resulting order on >disk. Then every time you perform the above SELECT command SQL notices >it already has the sort-order saved and just uses that one. >This can change the amount of time taken to a few seconds. > >My one concern in reading your post is how your dates are formatted. >When putting your date fields into your SQL table you will have to >ensure that dates are saved as a day number, or as text which naturally >sorts into date order, e.g. /DD/MM. You should not expect SQL to >sort text such as "19 October 16" correctly. > >Simon. >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets?
Thank you very much! I will research the two suggestions below. As for your sort assumption, you are correct. A A 1 2 A A 2 1 A B 1 2 A B 1 3 A C 1 1 -Original Message- From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Warren Young Sent: Tuesday, September 26, 2017 2:14 PM To: SQLite mailing list Subject: Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets? On Sep 26, 2017, at 11:24 AM, Ron Barnes wrote: > > I have approximately 600 million records that need to be sorted Where is the data now? > There are 18 table entries. You mean 18 columns per row, right? > I also need to deduplicate the records based upon the sorted output file. You speak of VB.NET, which means you donāt have a uniq tool as on POSIX systems: https://linux.die.net/man/1/uniq If you can install Cygwin or WSL on these Windows boxes, then youād have uniq, as well as a cross-platform solution. SQLite is available for both Cygwin and WSL. > I can take care of the deduplication (I think). The basic functionality of uniq is indeed pretty simple: given sorted input, write as output only lines that donāt repeat the content of the previous input line. The primary reason to mess with Cygwin or WSL on Windows is simply because using pre-built tools, you donāt have to debug and maintain it. Thereās value in ājust run it through uniq.ā Even if you can write it in VB.net in half an hour, youāre vastly over-budget compared to the half second it takes me to type ā | uniqā. > Sort Field 1 Ascending > Sort Field 2 Ascending WITHIN field 1 Iām not sure what you mean by āWITHINā. Are you simply saying that you want the data sorted first by field 2 and then by field 1, so that when two records have the same field 1 content, that the output has that pair of records ordered by field 2? E.g. Field 1Field 2 -- A B A C As opposed to: Field 1Field 2 -- A C A B If so, thatās trivial SQL, well-covered in Simonās reply. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets?
Thank you so much - I will test this as soon as I get home! -Original Message- From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin Sent: Tuesday, September 26, 2017 1:37 PM To: SQLite mailing list Subject: Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets? On 26 Sep 2017, at 6:24pm, Ron Barnes wrote: > I need to sort them as follows... > > Sort Field 1 Ascending > Sort Field 2 Ascending WITHIN field 1 > Sort Field 3 Ascending WITHIN field 2 WITHIN field 1 Sort Field 4 > Descending WITHIN field 3 WITHIN field 2 WITHIN field 1 <== This is a > Date field and the most current (Highest) Date to float up Sort Field > 5 Ascending WITHIN field 4 WITHIN field 3 WITHIN field 2 WITHIN field > 1 Trivial in any SQL engine including SQLite. It looks something like SELECT * FROM MyTable ORDER BY field1,field2,field3,field4 DESC,field5 SQL users do that sort of thing all the time. However, if you use just the above command, SQLite will have to perform this sorting of 600M records each time you execute the command, which could take quite a long time ā minutes or hours depending on your hardware. So for any flavour of SQL you would probably tell it to create an index ā¦ CREATE INDEX m_12345 ON MyTable (field1,field2,field3,field4 DESC,field5) This tells SQL to perform the sorting and save the resulting order on disk. Then every time you perform the above SELECT command SQL notices it already has the sort-order saved and just uses that one. This can change the amount of time taken to a few seconds. My one concern in reading your post is how your dates are formatted. When putting your date fields into your SQL table you will have to ensure that dates are saved as a day number, or as text which naturally sorts into date order, e.g. /DD/MM. You should not expect SQL to sort text such as "19 October 16" correctly. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets?
SELECT DISTINCT * FROM table ORDER BY field1,field2,field3,field4 DESC,field5; to do the whole sorting and de-duplication in one step ... assuming you want to report duplicate entire rows only once ... --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin >Sent: Tuesday, 26 September, 2017 11:37 >To: SQLite mailing list >Subject: Re: [sqlite] Is there a way to perform a muti-level sort and >extract of large data sets? > > > >On 26 Sep 2017, at 6:24pm, Ron Barnes wrote: > >> I need to sort them as follows... >> >> Sort Field 1 Ascending >> Sort Field 2 Ascending WITHIN field 1 >> Sort Field 3 Ascending WITHIN field 2 WITHIN field 1 >> Sort Field 4 Descending WITHIN field 3 WITHIN field 2 WITHIN field >1 <== This is a Date field and the most current (Highest) Date to >float up >> Sort Field 5 Ascending WITHIN field 4 WITHIN field 3 WITHIN field 2 >WITHIN field 1 > >Trivial in any SQL engine including SQLite. It looks something like > >SELECT * FROM MyTable ORDER BY field1,field2,field3,field4 >DESC,field5 > >SQL users do that sort of thing all the time. > >However, if you use just the above command, SQLite will have to >perform this sorting of 600M records each time you execute the >command, which could take quite a long time ā minutes or hours >depending on your hardware. So for any flavour of SQL you would >probably tell it to create an index ā¦ > >CREATE INDEX m_12345 ON MyTable (field1,field2,field3,field4 >DESC,field5) > >This tells SQL to perform the sorting and save the resulting order on >disk. Then every time you perform the above SELECT command SQL >notices it already has the sort-order saved and just uses that one. >This can change the amount of time taken to a few seconds. > >My one concern in reading your post is how your dates are formatted. >When putting your date fields into your SQL table you will have to >ensure that dates are saved as a day number, or as text which >naturally sorts into date order, e.g. /DD/MM. You should not >expect SQL to sort text such as "19 October 16" correctly. > >Simon. >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets?
On Sep 26, 2017, at 11:24 AM, Ron Barnes wrote: > > I have approximately 600 million records that need to be sorted Where is the data now? > There are 18 table entries. You mean 18 columns per row, right? > I also need to deduplicate the records based upon the sorted output file. You speak of VB.NET, which means you donāt have a uniq tool as on POSIX systems: https://linux.die.net/man/1/uniq If you can install Cygwin or WSL on these Windows boxes, then youād have uniq, as well as a cross-platform solution. SQLite is available for both Cygwin and WSL. > I can take care of the deduplication (I think). The basic functionality of uniq is indeed pretty simple: given sorted input, write as output only lines that donāt repeat the content of the previous input line. The primary reason to mess with Cygwin or WSL on Windows is simply because using pre-built tools, you donāt have to debug and maintain it. Thereās value in ājust run it through uniq.ā Even if you can write it in VB.net in half an hour, youāre vastly over-budget compared to the half second it takes me to type ā | uniqā. > Sort Field 1 Ascending > Sort Field 2 Ascending WITHIN field 1 Iām not sure what you mean by āWITHINā. Are you simply saying that you want the data sorted first by field 2 and then by field 1, so that when two records have the same field 1 content, that the output has that pair of records ordered by field 2? E.g. Field 1Field 2 -- A B A C As opposed to: Field 1Field 2 -- A C A B If so, thatās trivial SQL, well-covered in Simonās reply. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to perform a muti-level sort and extract of large data sets?
On 26 Sep 2017, at 6:24pm, Ron Barnes wrote: > I need to sort them as follows... > > Sort Field 1 Ascending > Sort Field 2 Ascending WITHIN field 1 > Sort Field 3 Ascending WITHIN field 2 WITHIN field 1 > Sort Field 4 Descending WITHIN field 3 WITHIN field 2 WITHIN field 1 <== This > is a Date field and the most current (Highest) Date to float up > Sort Field 5 Ascending WITHIN field 4 WITHIN field 3 WITHIN field 2 WITHIN > field 1 Trivial in any SQL engine including SQLite. It looks something like SELECT * FROM MyTable ORDER BY field1,field2,field3,field4 DESC,field5 SQL users do that sort of thing all the time. However, if you use just the above command, SQLite will have to perform this sorting of 600M records each time you execute the command, which could take quite a long time ā minutes or hours depending on your hardware. So for any flavour of SQL you would probably tell it to create an index ā¦ CREATE INDEX m_12345 ON MyTable (field1,field2,field3,field4 DESC,field5) This tells SQL to perform the sorting and save the resulting order on disk. Then every time you perform the above SELECT command SQL notices it already has the sort-order saved and just uses that one. This can change the amount of time taken to a few seconds. My one concern in reading your post is how your dates are formatted. When putting your date fields into your SQL table you will have to ensure that dates are saved as a day number, or as text which naturally sorts into date order, e.g. /DD/MM. You should not expect SQL to sort text such as "19 October 16" correctly. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Is there a way to perform a muti-level sort and extract of large data sets?
Hello All, I have approximately 600 million records that need to be sorted and then extracted to a flat file. I am unable to code a solution using visual Basic .NET. It was suggested to me that a DB engine could perform my task for me. Is there a way to accomplish this using the multi-level sort example below? There are 18 table entries. I need to extract all 18 entries to create individual records but in a certain order. I also need to deduplicate the records based upon the sorted output file. I can take care of the deduplication (I think). I just need to get the records in the right order first. I need to sort them as follows... Sort Field 1 Ascending Sort Field 2 Ascending WITHIN field 1 Sort Field 3 Ascending WITHIN field 2 WITHIN field 1 Sort Field 4 Descending WITHIN field 3 WITHIN field 2 WITHIN field 1 <== This is a Date field and the most current (Highest) Date to float up Sort Field 5 Ascending WITHIN field 4 WITHIN field 3 WITHIN field 2 WITHIN field 1 Thank you in advance! -Ron ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users