[sqlite] comma-separated string data
A data column in a link table contains comma-separated string data, where each value links to a row in another table (a many-to-many relationship). How do you 'parse' a table entry like "4,66,51,3009,2,678, ..." to extract these values and use them in an SQL statement, perhaps in a WHERE id='66'?

Thanks, peter

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
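One common fix is to normalize the comma-separated column into a proper link table, one row per value; after that, `WHERE target_id = 66` is an ordinary indexed query. A minimal sketch using Python's built-in sqlite3 module (the table and column names here are invented for illustration):

```python
import sqlite3

# Hypothetical setup: "targets" holds the comma-separated string
# from the question. All names are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE link_raw (item_id INTEGER, targets TEXT)")
con.execute("INSERT INTO link_raw VALUES (1, '4,66,51,3009,2,678')")

# Normalize: one (item_id, target_id) row per comma-separated value,
# so ordinary WHERE/JOIN clauses replace any string parsing.
con.execute("CREATE TABLE link (item_id INTEGER, target_id INTEGER)")
for item_id, targets in con.execute(
        "SELECT item_id, targets FROM link_raw").fetchall():
    rows = [(item_id, int(t)) for t in targets.split(",") if t.strip()]
    con.executemany("INSERT INTO link VALUES (?, ?)", rows)

# The question's lookup becomes a plain query:
hits = con.execute("SELECT item_id FROM link WHERE target_id = 66").fetchall()
print(hits)  # [(1,)]
```

Once normalized, a JOIN back to the target table does the linking directly.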
[sqlite] Performance HELP
This question regards SQLite (3.6) performance. (Lengthy because I want to describe the environment.)

- Win-7 (64-bit, though I don't know if SQLite uses 64-bit)
- 3-year-old HP laptop with Intel Core Duo CPU P8400 @ 2.27GHz with only 4GB memory
- 286GB HD (50% full) + 1TB 7200rpm external eSATA HD (90% free) - contains the target SQLite DB
- Target DB is single user, read-only, static, as follows:
  o DB size = 24GB in 8 tables. The 2 most-used tables contain 43 million and 86 million rows; the others are small look-up tables with 50-10,000 rows. 1-3 indices per table.
  o Imported from a 44GB CSV file with 45 million rows of 600 columns.
  o Used for SQL query only (CLI or GUI): no updates, no deletes, no inserts, no transactions, no apps.
- Future: double the size. I intend to clone the existing DB and populate it with another year's data. Additional years will likely be added later.

Now to the questions:

1. Is my current DB too large for SQLite to handle efficiently? I just read in the O'Reilly book "Using SQLite": "If you need to store and process several gigabytes or more of data, it might be wise to consider a more performance-oriented product."

2. Adding which resources would most improve performance? (Many queries, like a SELECT COUNT (and GROUP) of an indexed column on a JOIN of the 2 large tables, may take 30-60 or more minutes.)
   a. Add 4GB or 12GB more memory?
   b. Get a faster HD for the target DB - an external eSATA SSD (say 256GB)?
   c. Get a faster external eSATA 10,000rpm HD for the DB?
   d. Make performance changes to current DB settings? (Or learn to write better SQL?)
   e. Convert from SQLite to MySQL or something else? I like the simplicity and low admin of SQLite, so I prefer not to switch DBMS.
   f. Something else?

Thank you so much for your help.

peterK
[sqlite] Performance HELP
Thank you for your responses. I was hoping for an easy hardware solution, like more memory or a faster HD ... but it looks like indices and table design are where I need to focus.

peterK
[sqlite] SQLite Use of Indexes
I have a large DB and want to improve query performance. Context: the DB contains 2 tables representing nationwide claim data:

- a CLAIMS table = 43M rows, with indices on claim_no and stateCounty code; and
- a LINE table = 85M rows, with indices on claim_no and HCPCS (a 5-char text code).
- Have run ANALYZE.
- Using Win7.

I want to count occurrences of given HCPCS codes for given states. Here is the SQL query I use:

SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD
FROM claims, LINE
WHERE CLAIMS.CLAIM_NO == LINE.CLAIM_NO
  AND HCPSCD IN ('78451','78452','78453','78454','78469','78472','78473','78481','78483','78494','78496','78499')
  AND state_cd IN ('21', '45')
GROUP BY STATE_CD;

Query Plan results:

order  from  detail
0      1     TABLE LINE WITH INDEX idx_line_hcpscd
1      0     TABLE claims WITH INDEX idx_claims_claim_no

QUESTIONS:
1. Can SQLite use > 1 index per table per SELECT? Seems like using both indices for each table would be better than the single index per table chosen by the plan.
2. Is using something like "SELECT ... FROM LINE INDEXED BY claim_no AND INDEXED BY hcpscd" possible? What is the correct syntax to include 2 INDEXED BY this way?
3. Is there a better way to write this query? For example, would rearranging the order of AND criteria in the WHERE clause improve performance?
4. How do I interpret the 'order' and 'from' in the query plan results?

Thanks much for your help, peter
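For questions like these, EXPLAIN QUERY PLAN can be run on any candidate SELECT to see which index (at most one per table) the planner picked. A small self-contained sketch with stand-in tables (the real CLAIMS/LINE schemas are far wider):

```python
import sqlite3

# Stand-ins for the CLAIMS/LINE tables, just to show how to ask SQLite
# which indexes it will use for a given SELECT.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE claims (claim_no INTEGER, state_cd TEXT)")
con.execute("CREATE TABLE line (claim_no INTEGER, hcpscd TEXT)")
con.execute("CREATE INDEX idx_claims_claim_no ON claims (claim_no)")
con.execute("CREATE INDEX idx_line_hcpscd ON line (hcpscd)")

# EXPLAIN QUERY PLAN returns one row per table visited, naming the index
# used (if any); output wording varies between SQLite versions.
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT COUNT(hcpscd), hcpscd, state_cd
    FROM claims JOIN line ON claims.claim_no = line.claim_no
    WHERE hcpscd IN ('78451', '78452') AND state_cd IN ('21', '45')
    GROUP BY state_cd
""").fetchall()
for row in plan:
    print(row[-1])  # e.g. "SEARCH line USING INDEX idx_line_hcpscd ..."
```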
Re: [sqlite] SQLite Use of Indexes
Thank you Simon for responding to my questions. Your phonebook (FName/LName) analogy clearly explained why 2 indices per table per select won't work. Let me provide a bit more info and a possible attempt to implement your suggestions for better indices.

My 'bread and butter' query counts the frequency of HCPSCD occurrences for each county of interest in each state of interest. [Occasionally I want counts for an entire state.] I generally have 6 groups of 1-15 related HCPSCD codes and want counts from 1-12 counties of a state.

CLAIMS table columns: claim_no, state_cd, cnty_cd are the columns of interest; 40 other columns rarely used.
LINE table columns: claim_no, hcpscd, plus 25 more columns (these are repeating fields per claim).

[FYI: the DB contains 44M claim numbers with 1-12 HCPSCD codes per claim, and several thousand HCPSCD codes of which I am interested in 44 in 6 groups; 50 states with 9-125 counties per state, of which I am usually interested in ~10.]

I currently do a query like this:

SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD, CNTY_CD
FROM CLAIMS, LINE
WHERE CLAIMS.CLAIM_NO == LINE.CLAIM_NO
  AND HCPSCD IN ('78451','78452','78453','78454','78469','78472','78473','78481','78483','78494','78496','78499')
  -- this is the first group of hcpscd codes, always queried
  AND STATE_CD = '21'
  AND CNTY_CD IN ('220', '345', '570')
GROUP BY CNTY_CD
UNION
{repeat the above SELECT with the second group of hcpscd codes for the same state and counties ... and so on 4 more times}

So, to implement your suggestion of crafting better indices, here's my approach:
1. The first action is joining the 2 tables on claim_no. {Therefore claim_no should be the first column in the index for both tables.}
2. Find state and county in the claims table.
3. Find HCPSCD in the line table.

So my 2 combined indices would be:

CREATE INDEX idx_Claim_State_Cnty ON CLAIMS (CLAIM_NO, STATE_CD, CNTY_CD)
CREATE INDEX idx_Line_hcpscd ON LINE (CLAIM_NO, HCPSCD)

Is this what you were suggesting?
Will these indices produce improved performance over the single-column indices I was using? Thanks so much. Peter

===

> 1. Can SQLite use > 1 index per table per SELECT? Seems like using
> both indices for each table would be better than the single index per table
> as chosen by the plan.

No, you're right: one index per table, per SELECT. Pretend you have a phone book with two indexes: surname and first name. You have to look up someone called Miri Kallas. You can use either of the indexes to look up one name, but once you've done that the other index is useless to you.

You have made up some indexes which are useful but not the most useful. Drop those indexes and try to figure out one compound index on each table which would be best for the SELECT you asked about. Remember that this

CREATE INDEX it1 ON t1 (c1)
CREATE INDEX it2 ON t1 (c2)

does not do the same thing as

CREATE INDEX it12 ON t1 (c1, c2)

Can't do it for you because I can't tell which of your columns are from which table.

> 2. Is using something like "SELECT ... FROM LINE INDEXED BY claim_no
> AND INDEXED BY hcpscd" possible? What is the correct syntax to include 2
> INDEXED BY this way?

If you have to tell SQLite which index to use, you're doing it wrong. Make up a good index and SQLite will decide to use it. Especially if you've done ANALYZE.

> 3. Is there a better way to write this query, for example, would
> rearranging the order of AND criteria in WHERE clause improve performance?

The query optimizer is meant to do all that for you. However, I suspect that you may understand your query better if you get rid of some of your 'IN' clauses. If you imagine doing 24 (= 2 * 12) different SELECTs, one for each State and HCPSCD, what would your SELECT look like then? With good indexes it should be possible to make each of those SELECTs execute ridiculously fast.
Once you've figured out how to do that and make it run fast, /then/ you might want to recombine the query for each State, though perhaps not put both States in the same query.

> 4. How do I interpret the 'order' and 'from' in the query plan
> results?

It's showing you what each of your indexes is being used for. And what it shows is that neither index is being used for both selecting records and arranging the order of results.

Simon.
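Simon's point about compound versus single-column indexes can be checked directly: given both kinds of index, EXPLAIN QUERY PLAN shows which one SQLite prefers when the WHERE clause constrains both columns. A small sketch reusing his t1/it1/it2/it12 names (toy table, invented data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (c1 INTEGER, c2 INTEGER)")
con.executemany("INSERT INTO t1 VALUES (?, ?)",
                [(i % 10, i % 7) for i in range(1000)])

# Two single-column indexes: at most one of them can serve this query.
con.execute("CREATE INDEX it1 ON t1 (c1)")
con.execute("CREATE INDEX it2 ON t1 (c2)")
# One compound index matching both WHERE terms at once.
con.execute("CREATE INDEX it12 ON t1 (c1, c2)")
con.execute("ANALYZE")  # give the planner real statistics

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t1 WHERE c1 = 3 AND c2 = 5"
).fetchall()
detail = " ".join(str(row[-1]) for row in plan)
print(detail)  # the compound index it12 should be the one chosen
```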
[sqlite] SQLite Use of Indices
Please help me understand how the query plan chooses an index given the sqlite_stat1 table.

I originally created 4 single-column indices (L3, L4, C3, C4) on this large 2-table DB. Following Simon's suggestion to create better/combined/reversed indices, I created L2 and C2, but the query plan still selected the original indices - L3 and C4. So I tried creating index C1 (the 'reverse' of C2) and hit the jackpot - the 3 min query runtime was cut to 1 min using C1 and L3. {This is counter-intuitive to me - does this mean you do the join after filtering?} Trying to repeat the success, I created L1 (the 'reverse' of L2) but no joy; the query plan still selects the C1 and L3 indices. Are there any other possible indices I should try?

Testing with this query:

SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD, CNTY_CD
FROM claims, LINE
WHERE CLAIMS.CLAIM_NO == LINE.CLAIM_NO
  AND HCPSCD IN ('78451','78452','78453','78454','78469','78472','78473','78481','78483','78494','78496','78499')
  AND STATE_CD = '21'
  AND CNTY_CD IN ('060','100','210')
GROUP BY CNTY_CD;

Query Plan result:

order  from  detail
0      0     TABLE claims WITH INDEX idx_Cnty_State_ClaimNo
1      1     TABLE LINE WITH INDEX idx_line_claim_no

Here is the sqlite_stat1 table (** = the indices selected by the query plan):

       Table   Index                     detail
L1     LINE    idx_Line_hcpscd_claimNo   86378354 7807 2
L2     LINE    idx_Line_ClaimNo_HCPSCD   86378354 2 2
L3 **  LINE    idx_line_claim_no         86378354 2
L4     LINE    idx_line_hcpscd           86378354 7807
C1 **  CLAIMS  idx_Cnty_State_ClaimNo    43428892 112511 123841
C2     CLAIMS  idx_Claim_State_Cnty      43428892 1 1 1
C3     CLAIMS  idx_claims_claim_no       43428892 1
C4     CLAIMS  idx_claims_stateCounty    43428892 620413 12384

How do I read the detail above for each index? What is the logic here?
Thanks much, peter

From: Simon Slavin <slav...@bigfraud.org>
Subject: Re: [sqlite] SQLite Use of Indexes

On 9 Jul 2013, at 6:06pm, peter korinis <kori...@earthlink.net> wrote:

> So, to implement your suggestion of crafting better indices, here's my approach:
> 1. First action is joining the 2 tables on claim_no. {Therefore claim_no should be the first column in the index for both tables}
> 2. Find state and county in claims table
> 3. Find HCPSCD in line table
> So my 2 combined indices would be:
> CREATE INDEX idx_Claim_State_Cnty ON CLAIMS (CLAIM_NO, STATE_CD, CNTY_CD)
> CREATE INDEX idx_Line_hcpscd ON LINE (CLAIM_NO, HCPSCD)

Those would be good indexes. You can see how good by using EXPLAIN QUERY PLAN for your SELECT. Or better still, actually try them out and time the results.

Might be interesting to then reverse the order of the columns in each index and try those. See if they're better or worse. Or create lots of indexes, then use EXPLAIN QUERY PLAN and find out which indexes SQLite decided to use, then delete the others.

I am deliberately not giving you an absolute answer because you have a good large set of data for testing and you obviously understand the idea now. You'll learn more by trying out several alternatives yourself.

Simon.
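The sqlite_stat1 'detail' format asked about above can be seen on a toy table: after ANALYZE, each stat string is the table's row count followed by, for each prefix of the index columns, the average number of rows matching an equality constraint on that prefix - smaller trailing numbers mean a more selective index. A sketch (invented data: 500 claims, 2 line rows each, distinct codes per claim):

```python
import sqlite3

# Toy LINE table: 500 claim numbers, 2 rows per claim, distinct codes
# within each claim, so the (claim_no, hcpscd) pair is unique.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE line (claim_no INTEGER, hcpscd TEXT)")
con.executemany("INSERT IN" "TO line VALUES (?, ?)",
                [(i // 2, "78%03d" % (i % 50)) for i in range(1000)])
con.execute("CREATE INDEX idx_line_claimno_hcpscd ON line (claim_no, hcpscd)")
con.execute("ANALYZE")

# stat reads: "<table rows> <avg rows per claim_no> <avg rows per pair>"
for tbl, idx, stat in con.execute("SELECT tbl, idx, stat FROM sqlite_stat1"):
    print(tbl, idx, stat)  # here: 1000 rows, ~2 per claim, ~1 per pair
```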
Re: [sqlite] SQLite Use of Indexes
From: "James K. Lowden" <jklow...@schemamania.org>
Subject: Re: [sqlite] SQLite Use of Indexes

/ thanks James, peter

On Mon, 8 Jul 2013 15:32:21 -0400 "peter korinis" <kori...@earthlink.net> wrote:

> - a CLAIMS table = 43M rows with indices on claim_no and stateCounty code; and
> - a LINE table = 85M rows with indices on claim_no and HCPCS (a 5 char text code)
> - Have run ANALYZE
> - Using Win7
> Want to count occurrences of given HCPCS codes for given states.
>
> SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD
> FROM claims, LINE
> WHERE CLAIMS.CLAIM_NO == LINE.CLAIM_NO
>   -- ^^ == is not SQL
/ right, a little python slipped in /
>   AND HCPSCD IN ('78451','78452','78453','78454','78469','78472','78473','78481','78483','78494','78496','78499')
>   AND state_cd IN ('21', '45')
> GROUP BY STATE_CD

From a design perspective, you might want to think about what the IN clause is all about. Whatever the codes have in common probably belongs in a table, so you'd be able to say

SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD
FROM claims, LINE, HPs
WHERE CLAIMS.CLAIM_NO = LINE.CLAIM_NO
  AND HCPSCD = HPs.code
  AND HPs.category = 'foo'
  AND state_cd IN ('21', '45')
GROUP BY STATE_CD

That suggests HCPSCD as the column to index; I suppose you have a lot more different HCPSCDs
/ there are >7000 HCPSCDs; I'm interested in ~100 in 12 groups /
than STATE_CDs. If the HPs table is indexed by (category, code), then the join is on two ordered sets instead of an extended OR.
/// would joining 3 tables give better response ?? //

If HCPSCD is actually an integer, you may save space and time by declaring it as such.
/// HCPSCD is NOT an integer //

HTH.

--jkl
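The lookup-table idea above can be sketched end-to-end with Python's stdlib sqlite3 module. The hp_groups table, the 'cardiac' category name, and the sample rows are all invented for illustration; the point is that a query names a group once instead of repeating a 12-element IN list:

```python
import sqlite3

# Invented miniature schema: codes grouped by a category column, as
# suggested in the message above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE line (claim_no INTEGER, hcpscd TEXT)")
con.execute("CREATE TABLE hp_groups (category TEXT, code TEXT)")
# Index on (category, code) so the join scans an ordered set.
con.execute("CREATE INDEX idx_hp ON hp_groups (category, code)")
con.executemany("INSERT INTO hp_groups VALUES ('cardiac', ?)",
                [("78451",), ("78452",), ("78453",)])
con.executemany("INSERT INTO line VALUES (?, ?)",
                [(1, "78451"), (1, "99213"), (2, "78452")])

# One group name replaces the long IN (...) list.
cnt = con.execute("""
    SELECT COUNT(*) FROM line
    JOIN hp_groups ON line.hcpscd = hp_groups.code
    WHERE hp_groups.category = 'cardiac'
""").fetchone()[0]
print(cnt)  # 2
```

Adding a new code to a group then means one INSERT into hp_groups, with no query edits.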
[sqlite] is SQLite the right tool to analyze a 44GB file
I'm new to SQLite ... not a programmer ... not a DBA ... just an end-user with no dev support for a pilot project (single user, no updates, just queries).

I want to analyze the data contained in a 44GB CSV file with 44M rows x 600 columns (fields all <15 chars). It seems like a DBMS will allow me to query it in a variety of ways to analyze the data.

I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc with 4GB RAM + 200GB free disk space. End-user tools like Excel & Access failed due to lack of memory. I downloaded SQLite ver. 3.6.23.1. I tried to use the Firefox Data Manager add-on but it would not load the CSV files - 'csv worker failed'. So I tried Database Master from Nucleon, but it failed after loading (it took 100 minutes) ~57,000 rows with the error message 'database or disk is full'. I tried to create another table in the same DB but could not, with the same error message. The DB size shows as 10,000KB (that looks suspiciously like a size setting?).

From what I've read, SQLite can handle this size DB. So it seems that either I do not have enough RAM, or there are memory/storage (default) limits or maybe time-out issues that prevent loading this large file ... or the 2 GUI tools I tried have size limits. I do have a fast server (16GB, 12 procs, 64-bit Intel, Win server) and an iMac available.

1. Is SQLite the wrong tool for this project? (I don't want the overkill and admin overhead of a large MySQL or SQL Server, etc.)
2. If SQLite will work, are there configuration settings in SQLite or Win7 that will permit the load ... or is there a better tool for this project?

Thanks much for helping a newbie!

peterK
Re: [sqlite] is SQLite the right tool to analyze a 44GB file
Thank you all. Looks like I'm stuck with the CLI, though I have contacted Nucleon software support ... tried the CLI yesterday but need more practice. Is there a good reference book you would recommend for SQLite?

peter

-Original Message-
From: Black, Michael (IS)
Sent: Tuesday, May 01, 2012 4:22 PM
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

You need to try and do an import from the shell. GUIs seem to have way too many limits.

http://sqlite.org/download.html

Don't do any indexes up front ... do them afterwards if they'll help your queries. Indexes will slow down your import notably.

I don't think you're anywhere near the limits of sqlite since it talks about terabytes.

http://sqlite.org/limits.html

Somebody else can answer for sure, but wrapping your .import inside a transaction may be a good thing. I don't know if that's done by default.

Your queries are liable to be pretty slow depending on what you have to do.

Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems

From: peter korinis [kori...@earthlink.net]
Sent: Tuesday, May 01, 2012 3:06 PM
Subject: EXT :[sqlite] is SQLite the right tool to analyze a 44GB file

> [...]
Re: [sqlite] is SQLite the right tool to analyze a 44GB file
not sure yet ... but I'm working on it (between interruptions). thanks

-Original Message-
>From: "Black, Michael (IS)" <michael.bla...@ngc.com>
>Sent: May 2, 2012 10:15 AM
>Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file
>
>Does that mean using the CLI worked for you?
>
>If so, you may be able to access the database with the other programs AFTER you create it.
>
>Seems that creating a db from csv is challenging to some and not thoroughly tested for large data sources.
>
>Michael D. Black
>Senior Scientist
>Advanced Analytics Directorate
>Advanced GEOINT Solutions Operating Unit
>Northrop Grumman Information Systems
>
> [...]
Re: [sqlite] is SQLite the right tool to analyze a 44GB file
I have R but really haven't used it much. I know it's a great stats package and great for data reduction ... but I want to perform queries against my 44GB of data, filtering records by a variety of attributes, comparing those subsets in a variety of ad hoc ways, perhaps summing/counting other fields, etc. This is the kind of job Excel is good at ... but the data is too big! Seems like a database plus a good query GUI or some BI app would work. Is R a good query tool?

Thanks, peter

-Original Message-
From: Warren Young
Sent: Thursday, May 03, 2012 9:36 AM
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

On 5/1/2012 2:06 PM, peter korinis wrote:
> Is SQLite the wrong tool for this project?

Probably. SQLite is a data storage tool. With enough SQL cleverness, you can turn it into a data *reduction* tool. But a data analysis tool? No, not without marrying it to a real programming language. Granted, that's what almost everyone does do with SQLite, but if you're going to learn a programming language, I'd recommend you learn R, a language and environment made for the sort of problem you find yourself stuck with.

http://r-project.org/

There are several R GUIs out there. I like R Studio best:

http://www.rstudio.org/

You'll still find R Studio a sharp shock compared to Excel. And yes, it will require some programming, and yes, I know you said you aren't a programmer. But in the rest of the thread, it looks like people have convinced you to use SQLite from the command line, typing in raw SQL commands; guess what, that's programming. Not on the level of R code, but R isn't far down the same slippery slope.

It may help you to know that R is most popular in the statistics community, which of course is populated by statisticians, not programmers. R isn't the easiest programming language to pick up, but it's far from the hardest.
It's very similar to JavaScript, though a bit tougher to learn, mostly due to having accumulated some strange syntax over its 36 years. (That's counting R's predecessor, S.)
Re: [sqlite] Data Import Techniques
Forgive me ... but what is SQLiteman? Will "Import Table Data" help me load a CSV file into a SQLite table?

Thanks, peter

-Original Message-
From: Simon
Sent: Wednesday, May 02, 2012 3:03 PM
Subject: Re: [sqlite] Data Import Techniques

> Using the "Import Table Data" function in SQLiteman, the data loads very quickly. However in my application, using either an SQL "insert" command or a resultset, the import is very much slower. Is there another technique I can use to speed things up?

This FAQ entry might interest you...

(19) INSERT is really slow - I can only do few dozen INSERTs per second
-> http://sqlite.org/faq.html#q19
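The fix behind that FAQ entry is to wrap the many INSERTs in one transaction, so the database commits once per batch instead of once per row. A sketch with Python's stdlib sqlite3 module (an in-memory DB here; the dramatic speed difference shows up on an on-disk file, where each commit costs a sync):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b TEXT)")
rows = [(i, "x") for i in range(10000)]

# One transaction around all the INSERTs: the connection context manager
# commits once when the block exits, instead of once per statement.
with con:
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)

n = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(n)  # 10000
```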
Re: [sqlite] is SQLite the right tool to analyze a 44GB file
Sqlitespy looks good ... I will try it. The website says the download contains sqlite itself, which I already have - will there be a problem using ...spy with my existing sqlite?

I was trying to use the sqlite3 CLI command ".import testfile.txt test_tbl;" to load a 999-row x 46-column comma-delimited file into a previously created empty table with 46 columns. (If this works I will load two 22M-row x 46-col CSV files into that table.) Does this command work this way, or must I create INSERT statements to do 999 inserts (later 44M inserts)?

Thanks, peter

-Original Message-
From: Oliver Peters
Sent: Thursday, May 03, 2012 7:23 PM
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

On 03.05.2012 19:59, peter korinis wrote:
> I have R but really haven't used it much. I know it's a great stats
> package and great for data reduction ... but I want to perform queries
> against my 44GB of data, filtering records by a variety of attributes,
> comparing those subsets in a variety of ad hoc ways, perhaps
> summing/counting other fields, etc.

I prefer http://www.yunqa.de/delphi/doku.php/products/sqlitespy/index for creating the database (tables, views) and doing the queries because it's fast and reliable, but I prefer to write(!) SQL code and not to create it through a generator (as is done in Access).

sqlitespy can't do the import job; I always do this with the CLI, by creating INSERT statements with my scripting language in a separate file.

Since sqlite 3.7.11 you don't need a statement like

INSERT INTO table(col01,col02,col03) VALUES(1,2,3);
INSERT INTO table(col01,col02,col03) VALUES(4,5,6);

you can make it shorter:

INSERT INTO table(col01,col02,col03) VALUES(1,2,3),(4,5,6);

This is a great advantage if you need to do many INSERTs, because your file won't become so large.

> This is the kind of job Excel is good at ... but the data is too big!
> Seems like a database plus a good query GUI or some BI app would work. Is R
> a good query tool?

afaik there is no other way than to write(!) SQL code - depending on the problem this can be done in an R script or directly in the database (i.e. as a VIEW) or as a combination

[...]

Oliver
Re: [sqlite] is SQLite the right tool to analyze a 44GB file
I have scaled down the attributes of interest to 46 columns (discarding the other 550). No columns are calculated. No updates to this file ... one user ... only query, sort, etc. type transactions.

So I want to load two 22GB CSV files into an empty 46-column table. (I intend to test the load with a 999-record x 46-col file.) Initially I only have 1 index, on a record #. I am not positive that several other fields I want to index are never missing data in some records (I assume that will error out if I make those an index). After I get the data loaded and inspect for nulls in prospective index attributes, can I add indices?

I was planning to load using the sqlite3 CLI ".import" command. Is there a way I can monitor the progress of the load, with only minimal impact on performance? I've started several loads only to find out hours later that nothing had been loaded.

Thanks for the helpful info.

peter

-Original Message-
From: Valentin Davydov
Sent: Friday, May 04, 2012 9:43 AM
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

On Tue, May 01, 2012 at 04:06:01PM -0400, peter korinis wrote:
> I'm new to SQLite ... not a programmer ... not a DBA ... just an end-user
> with no dev support for a pilot project (single user, no updates, just queries).
>
> I want to analyze the data contained in a 44GB csv file with 44M rows
> x 600 columns (fields all <15 char). Seems like a DBMS will allow me
> to query it in a variety of ways to analyze the data.

Yes, SQLite is quite capable of doing simple analysis of such amounts of data, especially selecting small subsets based on simple criteria. However, before trying to do some real work you have to understand the structure of your data, realize your possible queries and carefully design the database schema (tables and, equally important, indices).
Perhaps putting all data in a single 600-column table is not a good idea (though allowed technically), especially if your columns are equal by their physical nature: it is not so easy to select arbitrarily calculated columns, only rows.

> I have the data files and SQLite on my laptop: a 64-bit Win7 Intel
> dual-proc with 4GB RAM + 200GB free disk space.

A well-indexed database of small data pieces usually takes up several times more disk space than the raw data. Probably 200GB would not be enough, depending mostly on the number of indices. Consider dedicating a separate disk (or even a RAID array) to it.

> End-user tools like Excel & Access failed due to lack of memory. [...]
> The DB size shows as 10,000KB (that looks suspiciously like a size setting?).

Try the bare sqlite shell instead of those external tools. It should take at least several hours to fill up your database.

> From what I've read SQLite can handle this size DB.

Surely it can. In one of my projects the database takes up almost 3 terabytes of disk space, contains more than 10^10 records and still provides small selects of indexed data in real time.

> 1. Is SQLite the wrong tool for this project? (I don't want the
> overkill and admin overhead of a large MySQL or SQL Server, etc.)

It depends on the data structure, semantics and what you are going to find there. SQLite isn't very good at calculating complex aggregate functions, but works fine for simple selecting and sorting.

> 2. If SQLite will work, are there configuration settings in SQLite or
> Win7 that will permit the load ... or is there a better tool for this project?
Increasing PAGE_SIZE to match the filesystem block (cluster) size, and perhaps CACHE_SIZE to fill most of the available RAM, would help a bit. Also, don't forget to turn off journaling and wrap everything in a single transaction when creating the database for the first time.

Valentin Davydov.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
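Valentin's bulk-load settings can be sketched as one shell session. This is a minimal sketch, not the poster's actual setup: the names (mydata.db, flows, year1.csv) are invented, and the tiny printf stands in for the real 22GB CSV. PRAGMA page_size must be issued before anything is written to the new database file; recent CLI versions also batch .import inside a transaction on their own.

```shell
# Tiny stand-in for the real CSV file (invented name: year1.csv).
printf '1,a\n2,b\n3,c\n' > year1.csv

sqlite3 mydata.db <<'EOF'
PRAGMA page_size = 4096;       -- match the filesystem cluster size; must precede first write
PRAGMA cache_size = 100000;    -- pages of cache; raise toward available RAM
PRAGMA journal_mode = OFF;     -- no rollback journal during the initial, rebuildable load
PRAGMA synchronous = OFF;      -- skip fsync; acceptable only while creating the database
CREATE TABLE flows (id INTEGER, val TEXT);
.separator ,
.import year1.csv flows
EOF
```

One cheap way to monitor a long load, as asked above, is to watch the database file grow from another window (for example, run `ls -l mydata.db` every few minutes); a file stuck at a few KB suggests the import never started.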
Re: [sqlite] is SQLite the right tool to analyze a 44GB file
Regarding the sqlite3.exe statement ".import FILE TABLE":

I created a table. My input file is a comma-delimited text file. When I run .import I get the following: "Error: FILE line 1: expected 46 columns of data but found 1". It seems .import is not recognizing the comma delimiter. I suspect this is a simple syntax error, but I can't find any document/explanation.

Thanks,
peter

-Original Message-
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Simon Slavin
Sent: Friday, May 04, 2012 11:16 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

On 4 May 2012, at 4:02pm, peter korinis <kori...@earthlink.net> wrote:

> Sqlitespy looks good ... I will try it.
> website says download contains sqlite itself, which I already have -
> will there be a problem using ...spy with existing sqlite?

SQLite is not a single library which has to live somewhere on your computer. It is C code which each programmer includes in their program. You can have twenty programs on your disk, each using a different version of SQLite, without problems.

> I was trying to use the sqlite3 CLI cmd ".import testfile.txt test_tbl;"
> to load a 999x46 comma-delimited file into a previously created empty
> table with 46 col. (if this works I will load two 22M row x 46 col csv
> files into that table.) does this cmd work this way or must I create
> INSERT statements to do 999 inserts (later 44M inserts)?

Semicolons are needed at the end of SQL commands. You don't want them at the end of commands which start with a dot. Apart from that, you have something worth trying. Why not make a tiny test case with three lines of two columns before you get started with the 46-column monster?

Simon.
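Simon's tiny-test-case suggestion might look like the sketch below (the file and table names are invented). It also illustrates his semicolon rule: the SQL statement ends with one, the dot-command does not. With no separator configured, .import expects the default "|" delimiter, so the test file uses it.

```shell
# Three lines, two columns, "|"-separated (the .import default).
printf 'a|1\nb|2\nc|3\n' > tiny.txt

sqlite3 tiny.db <<'EOF'
CREATE TABLE t (name TEXT, n INTEGER);
.import tiny.txt t
EOF
```

Only once this small case imports cleanly is it worth moving on to the 46-column file.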
Re: [sqlite] is SQLite the right tool to analyze a 44GB file
No, I had not used ".mode" - the wording described ".mode" as "set output mode where ...", and the word "output" made me think it was for the .output statement. I just tried ".mode csv". What do your * mean? Do I put in the file and/or table name, or nothing more? I tried several different ways but still get the same error when I try to import. Without knowing the syntax, I'm forced to ask these dumb questions or give up (not) ... since there is no good documentation - sorry.

Thanks,

Have you run *.mode csv*?

Jonas Malaco Filho

2012/5/7 peter korinis

> Regarding SQLITE3.exe statement ".import FILE TABLE"
> I created a table.
> My input file is a comma-delimited text file
> When I run .import I get the following "Error: FILE line 1: expected 46
> columns of data but found 1"
> It seems .import is not recognizing comma delimiter.
> I suspect this is a simple syntax error, but I don't find any
> document/explanation.

pk
Re: [sqlite] is SQLite the right tool to analyze a 44GB file
On 7 May 2012, at 4:41pm, peter korinis wrote:

> My input file is a comma-delimited text file
> When I run .import I get the following "Error: FILE line 1: expected 46
> columns of data but found 1"
> It seems .import is not recognizing comma delimiter.

<http://www.sqlite.org/sqlite.html>

"The default separator is a pipe symbol ("|")."

Simon.

So, if I use gawk to change my comma-separated file to "|" (pipe) delimited ... will it work? Or do I have to use ".mode csv" as Jonas just advised?

Thanks,
peter
Re: [sqlite] is SQLite the right tool to analyze a 44GB file
Thanks for the advice. http://www.pantz.org/software/sqlite/sqlite_commands_and_general_usage.html had the explanation/example I needed to get the import done successfully. Using ".separator ," was what I was missing.

peter
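The working recipe, reduced to a minimal sketch with invented names (demo.db, vals, data.csv): ".separator ," tells .import to split fields on commas instead of the default "|".

```shell
# Two-row comma-separated sample (invented data echoing the thread).
printf '4,66\n51,3009\n' > data.csv

sqlite3 demo.db <<'EOF'
CREATE TABLE vals (a INTEGER, b INTEGER);
.separator ,
.import data.csv vals
EOF
```

After the import, ordinary SQL works against the loaded rows, e.g. `sqlite3 demo.db "SELECT b FROM vals WHERE a = 51;"`.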
Re: [sqlite] is SQLite the right tool to analyze a 44GB file
I can't find the table I just created and imported data into. With .databases I can see 'main', but with no location on disk, and with .tables I see the table. The instructions say they are saved ... but where? I can't find them with Windows search.

pk
Re: [sqlite] is SQLite the right tool to analyze a 44GB file
Simon, I searched the entire disk for the table name and found no matches.

pk
Re: [sqlite] is SQLite the right tool to analyze a 44GB file
I did NOT specify a file name when I started sqlite3 - so I guess all is lost. I'll have to redo it. So then the first thing is to specify the db (file) name - what is the syntax? I tried typing a file name (test.db) but got an error.

pk
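For the record, the syntax is simply to pass the database filename as the first argument to the sqlite3 program; the file is created if it doesn't exist. Started with no argument, the shell opens a temporary database that is discarded on exit, which is why the imported table left no trace on disk. A minimal sketch (test.db as in the message above):

```shell
# Open (and create, if absent) test.db, then run one statement against it.
sqlite3 test.db 'CREATE TABLE IF NOT EXISTS t (x INTEGER);'
ls -l test.db    # the database file now exists on disk
```

Newer versions of the CLI also offer a .open dot-command for switching files mid-session, but passing the name at startup works in 3.6-era versions too.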