Here is the reality.... distribute your data processing across as many hard drives as you can. The key is how many drives you can have working in parallel; it does NOT matter much which database you use. If you build your broker properly (the server process that talks to SQLite, or any other database, and controls the entire parallel run), you can add or remove hard drives with a simple configuration change. Depending on how much data load you have, you could also subnet banks of servers running your SQLite storage to keep network communication to your SQLite processing nodes optimal.
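To make the broker idea concrete, here is a minimal sketch in Python (the interface language Malcolm mentions below). Everything in it is hypothetical: the nodes.json config, the work table, the dispatch() helper. The point is only that each drive/node is one line of configuration:

    import itertools
    import json
    import sqlite3

    # nodes.json is a hypothetical config file listing one SQLite file per
    # drive/node, e.g. ["/mnt/disk1/shard.db", "/mnt/disk2/shard.db"].
    # Adding or removing a drive is just an edit to this list.
    with open("nodes.json") as f:
        nodes = json.load(f)

    node_cycle = itertools.cycle(nodes)  # round-robin over the nodes

    def dispatch(rows):
        """Write one batch of work to the next node's SQLite shard.

        Assumes each shard already contains a work(payload) table.
        """
        db_path = next(node_cycle)       # pick the next drive in the rotation
        con = sqlite3.connect(db_path)
        with con:                        # commits on success
            con.executemany("INSERT INTO work(payload) VALUES (?)", rows)
        con.close()

A real broker would also have to track node load and failures, but the scaling knob stays the same: the list of files.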
Trick question.... You set up 2 separate machines, SQLite on one and Oracle on the other, both with the exact same make and model of hard drive and the same OS. Which one can write data to the drive faster?

I think you want to look into parallel data processing algorithms. Of course, to me that kind of stuff is common sense, but I find most guys don't truly study their processor specifications the way they should, and therefore miss the low-level details that drive the optimal decisions in parallel algorithm design. For example... most people say "I want to multithread so my software runs faster." That is not properly said. Multithreading does not make the computer faster; it lets your software use more of the processor's bandwidth per time slice. Say your system has 4 threads to schedule and each gets an equal 25% share of processing time over 1 minute: that is 15 seconds of CPU time per thread. If your software runs 2 of those threads during that same minute, it gets 30 seconds of CPU time, double the work, by using 50% of the 4 slots. You did not speed up the computer; you used more processor time per round-robin pass. (A worked version of this arithmetic appears at the end of this message.)

So the bottom line is: when you design your parallel algorithm, the database itself is not all that important, but being able to configure as many instances of it as you need, quickly, IS. Knowing the speed of your drives, knowing the speed of your network, and being set up to scale out to more nodes (networks, disks, computers) IS the key. And of course, your broker architecture (the piece that runs the whole show) must be written and carefully tested so it loses as little efficiency as possible system-wide while it makes the decisions about parceling work out to the various processing nodes.

IF you use SQLite for such an endeavor, I do believe you would be better off. Why? Because SQLite is so simple and has NO server process, you control the software's performance totally, all the way down to the data storage below. In that sense SQLite IS a great choice: you can xcopy the db file to a node and roll! (A sketch of that workflow also appears at the end of this message.) With database "servers" you have to contend with the variability of a closed source code base you have no control over, one that takes up threads and processor time on every machine it runs on. Less IS FOR SURE sometimes more.

I have seen systems in the past that used Oracle, Informatica, etc. to do massive data processing, I mean hundreds of millions of rows per week. The reality is that most teams who build this stuff buy a massively expensive RISC-based system when they could spend that money developing software that efficiently scales a simpler solution like SQLite across many nodes. The money is spent mostly on butt covering.... The technical decision is driven more by an ignorant salesperson than by an in-the-know technology guy.

-A-

On Fri, Feb 27, 2009 at 2:35 PM, <pyt...@bdurham.com> wrote:
> Allan,
>
> Thanks for your reply. I'm new to SQLite, but have used a similar list
> of databases (with an emphasis on server vs. client side databases)
> professionally for years. My background is designing and building
> enterprise BI, ETL, and data warehouse systems using databases like
> Oracle, DB2, SQL Server and ETL tools like Informatica, Ab Initio, BO
> Data Integrator, etc.
>
> My goal is to be able to offer our customers cost effective, high
> performance alternatives to traditional commercial ETL solutions.
> We recently re-implemented a mid-sized Oracle/Informatica based ETL
> system in Python using in-memory data structures (dicts) and improved
> the performance by a factor of 20x. We're now looking to expand this
> approach to SQLite for environments where the volume of data is
> greater than available memory.
>
>> When you talk about performance comparisons, your question really
>> needs to be squared up to what the database is used for. Are you on
>> a server, workstation, or embedded? How large is the database? How
>> are the internal algos of the database engine lined up? What is the
>> memory footprint? How did you configure what is configurable in the
>> database?
>
> I'm interested in exploring whether or not SQLite can be used as an
> ETL tool with large data sets (~80+ GB). In this context, SQLite
> would be run on 64-bit Intel servers with lots of RAM (16-64 GB). The
> data would be stored/processed on server-local SCSI drives vs.
> located on a SAN. File access would be via a single process per
> SQLite database. The interface language would most likely be Python.
>
>> The reality is I BET, and I would love to know myself, SQLite is
>> just as fast on inserts to the same hard drive as Oracle.
>
> I would love to see such a benchmark as well.
>
>> Are you some kind of political Oracle-covering bigot?
>
> LOL. No. I try to be as database agnostic as possible.
>
>> Or, do you want another chance to post something that states you are
>> trying to find the best data storage solution to deliver sincere
>> value to your client in terms of a database choice?
>
> Subtle point here ... I'm not looking for a long-term data storage
> solution - I'm exploring the possibility of using SQLite as a way to
> validate, transform, and pre-aggregate raw data that would in turn be
> exported to raw text files and imported (in final form) to a
> customer's data warehouse.
>
>> Sometimes that choice is SQLite, but in some cases it is Oracle
>> instead.
>
> Agreed.
>
> Regards,
> Malcolm

--
W Allan Edwards
214-289-2959
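P.S. Here is the worked version of the time-slice arithmetic from above, as a tiny Python script. The numbers are the hypothetical ones from my example, not measurements:

    total_slots = 4                         # schedulable slots in the round robin
    wall_time_s = 60                        # one minute of wall-clock time
    per_slot_s = wall_time_s / total_slots  # 15 s of CPU time per slot

    for used in (1, 2):
        cpu_s = used * per_slot_s
        print(f"{used} thread(s): {cpu_s:.0f} s of CPU time "
              f"({cpu_s / wall_time_s:.0%} of the minute)")

It prints 15 s (25%) for one thread and 30 s (50%) for two. The machine never got faster; the second thread just claims another slot.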
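P.P.S. And here is a minimal sketch of the xcopy-and-roll idea applied to the ETL flow Malcolm describes (validate, transform, pre-aggregate, export to text), in Python since that is his likely interface language. Every name in it is hypothetical (the file paths, the raw_sales table, the aggregate); it is only meant to show the shape:

    import csv
    import shutil
    import sqlite3

    # Deployment is a plain file copy: no server install, no service to start.
    shutil.copy("staging.db", "/mnt/node7/staging.db")

    con = sqlite3.connect("/mnt/node7/staging.db")

    # Validate: throw out rows failing a basic sanity check (hypothetical rule).
    con.execute("DELETE FROM raw_sales WHERE amount IS NULL OR amount < 0")

    # Pre-aggregate: roll raw rows up to one row per customer per day.
    rows = con.execute(
        "SELECT customer_id, sale_date, SUM(amount)"
        " FROM raw_sales GROUP BY customer_id, sale_date"
    )

    # Export: final form goes out as pipe-delimited text for the customer's
    # warehouse loader.
    with open("sales_agg.txt", "w", newline="") as f:
        csv.writer(f, delimiter="|").writerows(rows)

    con.commit()
    con.close()

When the node is done, the output is just a text file, and the shard is just a file you can delete or archive.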