Re: [sqlite] create unique index quickly
On 5/22/06, Micha Bieber <[EMAIL PROTECTED]> wrote:
> > Interesting, but it doesn't have any ad hoc query capability:
>
> Take into consideration that this is not the whole truth - simply
> selecting and getting what the HDF5 authors call a 'hyperslab' is a
> great feature and an example of a simple query task. This special task
> alone might become an essential part of your problem.

I wouldn't classify ad hoc queries as 'simple'. If all you need is simple queries this might be a much better choice. Not knowing any details of how the data will be used makes it hard to judge which would be better for the application. Thanks for pointing it out! :)
RE: [sqlite] create unique index quickly
Thank you for answering the question! Like I said in my other post, the index creation is about 20% faster than the unique index creation in my latest tests of an index of two INTEGERs. 20% out of several hours is significant.

So index names are unique for the whole database? I was thinking they just had to be unique for the table. I'll double check my names...

> > Is there some way I can create that unique index without the overhead
> > of uniqueness checking? I know all my triples are unique when used
> > together: I generated them that way. Or is there some other way I can
> > create that index faster?
>
> The following technique is unsupported. It might change or go away in a
> subsequent release. It might not work. It might corrupt your database.
> Use with extreme caution.
>
> (1) CREATE INDEX primkey ON data(x,y,z)
> (2) PRAGMA writable_schema=ON
> (3) UPDATE sqlite_master
>     SET sql='CREATE UNIQUE INDEX primkey ON data(x,y,z)'
>     WHERE name='primkey'
> (4) close and reopen the database connection
>
> The above technique is unsupported. It might not work. It might corrupt
> your database. The overhead of the uniqueness check is not that much so
> it is unlikely to help much anyway. You have been warned.
>
> --
> D. Richard Hipp
RE: [sqlite] create unique index quickly
> I am simply curious. This sounds like an amazing engineering challenge.
> If it is not a secret, can you describe what this data represents and
> how it will be used?

Genomics. Look up "Smith-Waterman" or "Needleman-Wunsch-Sellers" on the web.

> What is the ultimate source of this data?

Custom hardware.

> How many days/weeks/eons of it do you plan to accumulate?

Less than a week.

> How much raw disk space is that?

It varies by thresholding. 1GB to several TB.

> If backups and journaling are not important, then is it safe to assume
> that you can always regenerate that data on demand? Is each "set" of
> data identical, or only statistically similar to prior sets?

I can always regenerate if I want to wait a few days. It's not exactly an exact science anyway.

> Your project sounds like fun though, from what little I've read of this
> thread. Sure beats writing boring financial software ;)

I admit I'm quite enjoying it as well.
Re: [sqlite] create unique index quickly
On 5/22/06, Brannon King <[EMAIL PROTECTED]> wrote:
> > _Personally_ I think, this sounds like a task not quite fitting in
> > sqlite's (probably any 'standard' database's) realm. This is a bit
> > off-topic in this group, but because you mention sub-boxes - did you
> > ever look into more specialized file formats like HDF5:
> >
> > http://hdf.ncsa.uiuc.edu/HDF5/
>
> What a fantastic link and suggestion! I just may end up using it.

Interesting, but it doesn't have any ad hoc query capability:

"Performance-wise, how does HDF5 compare to a relational database? It really depends on your application. HDF5 is tuned to do efficient I/O and storage for "big" data (hundreds of megabytes and more). It will not work well for small reads/writes. It doesn't have indexing capabilities, though we are working on some limited features. See the HDF5_Prototype_Indexing_Requirements for details. HDF5 was designed to complement DBs and not to compete with them."
RE: [sqlite] create unique index quickly
> _Personally_ I think, this sounds like a task not quite fitting in
> sqlite's (probably any 'standard' database's) realm. This is a bit
> off-topic in this group, but because you mention sub-boxes - did you
> ever look into more specialized file formats like HDF5:
>
> http://hdf.ncsa.uiuc.edu/HDF5/
>
> with support for certain (of course limited) queries?
>
> Micha

What a fantastic link and suggestion! I just may end up using it.
Re: [sqlite] create unique index quickly
On 5/22/06, Dennis Jenkins <[EMAIL PROTECTED]> wrote:
> Jay Sprenkle wrote:
> > Me too!
> >
> > The two largest database setups I've worked with: the total of all
> > the call records for a major phone company, and the cumulative
> > records of most of the drugs bought in the United States, don't add
> > up to as much as this. What are you sampling at 1/200th of a
> > microsecond?
>
> Jay, do you work for the NSA or the DEA or both?

LOL! Worked as a consultant for Sprint and Argus Health Systems. They both do huge data warehouses. Of course, if I told you the truth I'd have to kill you. ;)
Re: [sqlite] create unique index quickly
Monday, May 22, 2006, 15:17:21, Jay Sprenkle wrote:

> On 5/22/06, Dennis Jenkins <[EMAIL PROTECTED]> wrote:
>> Brannon King wrote:
>>> The benefits I'm trying to get out of sqlite are the data queries. I
>>> collect a large, sparse 2D array from hardware. The hardware device
>>> is giving me a few GB of data at 200MB/s. Future hardware versions
>>> will be four times that fast and give me terabytes of data. After I
>>> have the data, I then have to go through and make calculations on
>>> sub-boxes of that data. (I'll post some more about that in a
>>> different response.) I was trying to avoid coding my own
>>> sparse-matrix-file-stream-mess that I would have to do if I didn't
>>> have a nice DB engine. I think sqlite will work. I think it will be
>>> fast enough. I'll have some nice RAID controllers on the production
>>> machines with 48-256MB caches.
>>
>> Hello Brannon,
>>
>> I am simply curious. This sounds like an amazing engineering
>> challenge. If it is not a secret, can you describe what this data
>> represents and how it will be used?
>
> Me too!

Me too too :)

_Personally_ I think, this sounds like a task not quite fitting in sqlite's (probably any 'standard' database's) realm. This is a bit off-topic in this group, but because you mention sub-boxes - did you ever look into more specialized file formats like HDF5:

http://hdf.ncsa.uiuc.edu/HDF5/

with support for certain (of course limited) queries?

Micha
--
Re: [sqlite] create unique index quickly
Jay Sprenkle wrote:
> Me too!
>
> The two largest database setups I've worked with: the total of all the
> call records for a major phone company, and the cumulative records of
> most of the drugs bought in the United States, don't add up to as much
> as this. What are you sampling at 1/200th of a microsecond?

Jay, do you work for the NSA or the DEA or both?
Re: [sqlite] create unique index quickly
On 5/22/06, Dennis Jenkins <[EMAIL PROTECTED]> wrote:
> Brannon King wrote:
>> The benefits I'm trying to get out of sqlite are the data queries. I
>> collect a large, sparse 2D array from hardware. The hardware device is
>> giving me a few GB of data at 200MB/s. Future hardware versions will
>> be four times that fast and give me terabytes of data. After I have
>> the data, I then have to go through and make calculations on sub-boxes
>> of that data. (I'll post some more about that in a different
>> response.) I was trying to avoid coding my own
>> sparse-matrix-file-stream-mess that I would have to do if I didn't
>> have a nice DB engine. I think sqlite will work. I think it will be
>> fast enough. I'll have some nice RAID controllers on the production
>> machines with 48-256MB caches.
>
> Hello Brannon,
>
> I am simply curious. This sounds like an amazing engineering challenge.
> If it is not a secret, can you describe what this data represents and
> how it will be used?

Me too!

The two largest database setups I've worked with: the total of all the call records for a major phone company, and the cumulative records of most of the drugs bought in the United States, don't add up to as much as this. What are you sampling at 1/200th of a microsecond?
Re: [sqlite] create unique index quickly
Brannon King wrote:
> The benefits I'm trying to get out of sqlite are the data queries. I
> collect a large, sparse 2D array from hardware. The hardware device is
> giving me a few GB of data at 200MB/s. Future hardware versions will be
> four times that fast and give me terabytes of data. After I have the
> data, I then have to go through and make calculations on sub-boxes of
> that data. (I'll post some more about that in a different response.) I
> was trying to avoid coding my own sparse-matrix-file-stream-mess that I
> would have to do if I didn't have a nice DB engine. I think sqlite will
> work. I think it will be fast enough. I'll have some nice RAID
> controllers on the production machines with 48-256MB caches.

Hello Brannon,

I am simply curious. This sounds like an amazing engineering challenge. If it is not a secret, can you describe what this data represents and how it will be used? What is the ultimate source of this data? How many days/weeks/eons of it do you plan to accumulate? How much raw disk space is that? If backups and journaling are not important, then is it safe to assume that you can always regenerate that data on demand? Is each "set" of data identical, or only statistically similar to prior sets?

Your project sounds like fun though, from what little I've read of this thread. Sure beats writing boring financial software ;)
Re: [sqlite] create unique index quickly
"Brannon King" <[EMAIL PROTECTED]> wrote:
> The statement
>
> CREATE UNIQUE INDEX primkey ON 4dData (x,y,z)
>
> is too slow for me.
>
> Is there some way I can create that unique index without the overhead
> of uniqueness checking? I know all my triples are unique when used
> together: I generated them that way. Or is there some other way I can
> create that index faster?

The following technique is unsupported. It might change or go away in a subsequent release. It might not work. It might corrupt your database. Use with extreme caution.

(1) CREATE INDEX primkey ON data(x,y,z)
(2) PRAGMA writable_schema=ON
(3) UPDATE sqlite_master
    SET sql='CREATE UNIQUE INDEX primkey ON data(x,y,z)'
    WHERE name='primkey'
(4) close and reopen the database connection

The above technique is unsupported. It might not work. It might corrupt your database. The overhead of the uniqueness check is not that much so it is unlikely to help much anyway. You have been warned.

--
D. Richard Hipp <[EMAIL PROTECTED]>
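The four steps above can be sketched with Python's sqlite3 module. This is the same unsupported trick from the post, with the same caveats; the table name and columns here are illustrative:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")

# Build a table whose (x, y, z) triples are already known to be unique.
con = sqlite3.connect(db)
con.execute("CREATE TABLE data (x INTEGER, y INTEGER, z INTEGER)")
con.executemany("INSERT INTO data VALUES (?,?,?)",
                [(i, i * 2, i * 3) for i in range(1000)])

# (1) Create a plain index -- no uniqueness check is performed.
con.execute("CREATE INDEX primkey ON data(x,y,z)")
con.commit()

# (2)-(3) Rewrite the schema text so the index claims to be UNIQUE.
con.execute("PRAGMA writable_schema=ON")
con.execute("UPDATE sqlite_master "
            "SET sql='CREATE UNIQUE INDEX primkey ON data(x,y,z)' "
            "WHERE name='primkey'")
con.commit()

# (4) Close and reopen so the modified schema is re-read.
con.close()
con = sqlite3.connect(db)
sql, = con.execute(
    "SELECT sql FROM sqlite_master WHERE name='primkey'").fetchone()
print(sql)
```

Heed the repeated warning: editing sqlite_master bypasses SQLite's own consistency machinery, and if the triples are not actually unique the "UNIQUE" index is a lie the database will never detect.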
Re: [sqlite] create unique index quickly
On 5/20/06, Brannon King <[EMAIL PROTECTED]> wrote:
> The benefits I'm trying to get out of sqlite are the data queries. I
> collect a large, sparse 2D array from hardware. The hardware device is
> giving me a few GB of data at 200MB/s. Future hardware versions will be
> four times that fast and give me terabytes of data. After I have the
> data, I then have to go through and make calculations on sub-boxes of
> that data.

Have you thought about constructing a data warehouse for this? Use something close to the metal (hash + binary file I/O) to store the initial data. Then write something to load what data you need into a database. You can then do your queries against that. I think your overall bottleneck is indexing, and indexing 800MB/s sustained will take big iron or a Beowulf cluster.
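The two-stage idea can be sketched like this (names and the tiny data set are illustrative, not from the thread): capture packed triples as raw binary at near disk speed, then bulk-load whatever slice you need into SQLite for the ad hoc queries.

```python
import sqlite3
import struct

REC = struct.Struct("<iii")  # one packed (x, y, z) record

# Stage 1: "close to the metal" capture -- append raw records to a log.
# (A real capture loop would write this bytearray to an append-only file.)
log = bytearray()
for x, y, z in [(1, 2, 3), (4, 5, 6), (7, 8, 9)]:
    log += REC.pack(x, y, z)

# Stage 2: load the captured records into SQLite for querying.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (x INTEGER, y INTEGER, z INTEGER)")
records = (REC.unpack_from(log, off)
           for off in range(0, len(log), REC.size))
con.executemany("INSERT INTO data VALUES (?,?,?)", records)

count, = con.execute("SELECT count(*) FROM data").fetchone()
print(count)  # 3
```

The capture stage does no parsing, indexing, or journaling, which is what lets it track the hardware's data rate; all the database overhead is deferred to the load stage.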
Re: [sqlite] create unique index quickly
On Sat, 20 May 2006, Brannon King wrote:

> John Stanton wrote:
>> You don't seem to need a data manipulation system like Sqlite, more a
>> form of high volume storage. Do you really need elaborate SQL,
>> journalling, ROLLBACK and assured disk storage? Did you consider some
>> form of hashed storage, perhaps linear hashing, to build a compact and
>> high performance associative array for your sparsely keyed data. Do
>> you really need the overhead of B-trees if you are just storing a
>> sparse array?
>>
>> JS
>
> I don't need journaling or rollback. I'd love a way to shut them off.
> But elaborate SQL, that sure is handy. I'm not just storing, I'm
> viewing stored, compressed data. I definitely need some way of querying
> sparse matrix data that is larger than my DRAM. Sqlite sure seems like
> the quickest route to a workable product for that to happen. It has all
> the streaming/caching built in. Because of that, I assume it is faster
> than random file access. It supports complex data queries and indexes,
> both things I would need anyway. In the world of programming, I think
> many will agree you should get a working product, then make it faster.
> I'm just trying to get the most speed out of the easiest tool. If I
> need to rewrite the file storage for the next version, we can consider
> the cost to benefit for that separately.

Is that 800MB/s sustained? Is it constant, 24/7? Or is it bursty data? If it's bursty, you could buffer the inserts in an append-only log. Insertion into the SQLite database can be done asynchronously. Writing to unstructured, append-only log files can be done at pretty much disk I/O speed.

If it's not bursty, but sustained, then I fear SQLite or any other database will not be able to match that rate. You'll probably run out of CPU before running out of disk I/O.

Christian

PS. On a side note, is it wise still buying SGI kit?
Re: [sqlite] create unique index quickly
Brannon King wrote:
> John Stanton wrote:
>> You don't seem to need a data manipulation system like Sqlite, more a
>> form of high volume storage. Do you really need elaborate SQL,
>> journalling, ROLLBACK and assured disk storage? Did you consider some
>> form of hashed storage, perhaps linear hashing, to build a compact and
>> high performance associative array for your sparsely keyed data. Do
>> you really need the overhead of B-trees if you are just storing a
>> sparse array?
>>
>> JS
>
> I don't need journaling or rollback. I'd love a way to shut them off.
> But elaborate SQL, that sure is handy. I'm not just storing, I'm
> viewing stored, compressed data. I definitely need some way of querying
> sparse matrix data that is larger than my DRAM. Sqlite sure seems like
> the quickest route to a workable product for that to happen. It has all
> the streaming/caching built in. Because of that, I assume it is faster
> than random file access. It supports complex data queries and indexes,
> both things I would need anyway. In the world of programming, I think
> many will agree you should get a working product, then make it faster.
> I'm just trying to get the most speed out of the easiest tool. If I
> need to rewrite the file storage for the next version, we can consider
> the cost to benefit for that separately.

I saw your performance requirements and data rate, which look difficult to achieve when you are writing journals and ensuring the integrity of disk records. You will find that Sqlite is much slower than random file access, because Sqlite is built on top of random file access. You get random file access speed less all the overhead of Sqlite's journals and B-trees.

We have an application using storage something like yours and we use memory mapped areas with AVL trees for indexing. If it needed to run as fast as yours we would probably use hashing rather than the binary trees. A sparse index is realized by concatenated keys. This method dynamically uses memory for caching, but is not limited to physical memory size, only virtual memory. It assumes POSIX capabilities from the OS. With hashing you avoid the overhead of B-tree balancing with insertions, but pay for it by not having the keys accessible in an ordered sequence.

Ask yourself and your users/customers whether the easiest solution or the best solution is the most satisfactory.
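The concatenated-key idea can be sketched in a few lines (a toy illustration of the technique, not the poster's actual implementation; the 21-bit field widths are arbitrary assumptions): pack (x, y, z) into one integer and use a hash table instead of a B-tree, trading ordered traversal for constant-time lookups with no rebalancing on insert.

```python
# Pack each coordinate into 21 bits of a single 63-bit key.
# Assumes every coordinate fits in 21 bits (0 .. 2097151).
BITS = 21
MASK = (1 << BITS) - 1

def key(x, y, z):
    """Concatenate three coordinates into one hashable integer key."""
    return (x << (2 * BITS)) | (y << BITS) | z

def unkey(k):
    """Recover (x, y, z) from a concatenated key."""
    return (k >> (2 * BITS)) & MASK, (k >> BITS) & MASK, k & MASK

# A hash table keyed this way acts as a sparse 3D array: only occupied
# cells cost memory, and insertion does no B-tree balancing.
sparse = {}
sparse[key(10, 20, 30)] = 1.5
sparse[key(10, 21, 30)] = 2.5

print(sparse.get(key(10, 20, 30)))   # 1.5
print(unkey(key(10, 20, 30)))        # (10, 20, 30)
```

The trade-off John names shows up immediately: `sparse` answers point lookups in O(1), but iterating a sub-box in key order requires either sorting the keys or keeping a separate ordered structure.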
Re: [sqlite] create unique index quickly
John Stanton wrote:
> You don't seem to need a data manipulation system like Sqlite, more a
> form of high volume storage. Do you really need elaborate SQL,
> journalling, ROLLBACK and assured disk storage? Did you consider some
> form of hashed storage, perhaps linear hashing, to build a compact and
> high performance associative array for your sparsely keyed data. Do you
> really need the overhead of B-trees if you are just storing a sparse
> array?
>
> JS

I don't need journaling or rollback. I'd love a way to shut them off. But elaborate SQL, that sure is handy. I'm not just storing, I'm viewing stored, compressed data. I definitely need some way of querying sparse matrix data that is larger than my DRAM. Sqlite sure seems like the quickest route to a workable product for that to happen. It has all the streaming/caching built in. Because of that, I assume it is faster than random file access. It supports complex data queries and indexes, both things I would need anyway. In the world of programming, I think many will agree you should get a working product, then make it faster. I'm just trying to get the most speed out of the easiest tool. If I need to rewrite the file storage for the next version, we can consider the cost to benefit for that separately.
Re: [sqlite] create unique index quickly
You don't seem to need a data manipulation system like Sqlite, more a form of high volume storage. Do you really need elaborate SQL, journalling, ROLLBACK and assured disk storage? Did you consider some form of hashed storage, perhaps linear hashing, to build a compact and high performance associative array for your sparsely keyed data. Do you really need the overhead of B-trees if you are just storing a sparse array?

JS

Brannon King wrote:
> The benefits I'm trying to get out of sqlite are the data queries. I
> collect a large, sparse 2D array from hardware. The hardware device is
> giving me a few GB of data at 200MB/s. Future hardware versions will be
> four times that fast and give me terabytes of data. After I have the
> data, I then have to go through and make calculations on sub-boxes of
> that data. (I'll post some more about that in a different response.) I
> was trying to avoid coding my own sparse-matrix-file-stream-mess that I
> would have to do if I didn't have a nice DB engine. I think sqlite will
> work. I think it will be fast enough. I'll have some nice RAID
> controllers on the production machines with 48-256MB caches.
>
> From experimentation, the "UNIQUE INDEX" command is only 20% slower
> than the "INDEX" command. That doesn't make sense to me. I would think
> the UNIQUE INDEX creation would take longer because it has to do
> redundancy checks. And I still feel there should be a way to create a
> UNIQUE INDEX without waiting for the redundancy check.
>
> Dennis Cote wrote
>> A more general question is why are you trying to use sqlite? If you
>> need the maximum possible speed you may be better off using in-memory
>> vectors and maps and highly tuned routines like the C++ STL algorithms
>> instead of sqlite. Then you can go back to using binary file I/O.
>> SQLite adds overhead above and beyond binary file I/O; it will always
>> be slower. If the benefits of sqlite don't outweigh the costs, you
>> should stick with binary I/O. What benefit are you hoping to get out
>> of using sqlite?
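The "calculations on sub-boxes" workload is exactly what Brannon gets from elaborate SQL that a raw hash store lacks. As a sketch (table and column names assumed, not from the thread), a composite index lets SQLite answer a rectangular range query over a sparse matrix without scanning every stored cell:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (x INTEGER, y INTEGER, v INTEGER)")
con.execute("CREATE INDEX data_xy ON data(x, y)")

# Populate a sparse-ish grid: only cells where (x + y) % 7 == 0 exist.
con.executemany("INSERT INTO data VALUES (?,?,?)",
                [(x, y, x + y)
                 for x in range(100) for y in range(100)
                 if (x + y) % 7 == 0])

# Aggregate over one sub-box of the sparse matrix; the (x, y) index
# narrows the scan to the requested x range.
total, = con.execute(
    "SELECT coalesce(sum(v), 0) FROM data "
    "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?",
    (10, 19, 10, 19)).fetchone()
print(total)
```

With hashed storage the same query would mean enumerating every cell of the box and probing the hash for each, present or not; the SQL version only touches rows that exist.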
Re: [sqlite] create unique index quickly
> For your mail I think you have everything planned, but just out of
> curiosity, how do you plan to pass data from hardware (800 MB/s +
> database data) to a RAID? A GigEthernet has 1000 Mb/s (so, 128 MB/s), a
> PCI-X based solution has 1.3 to 2.6 Gb/s (so 150-300 MB/s), and similar
> for SATA and PCIe (x16). Even a direct Hypertransport to the RAID has
> 3.2 Gb/s. Note that I'm not asking about the RAID itself, just the
> SQLite hardware to RAID connection.

O ye of little faith. First, the hardware device. PCI64-based solutions run at 64 bits x 66MHz with 50-70% efficiency. You can increase the clock to 133MHz using PCI-X. At 50% efficiency that is still 500MB/s. (My current solution runs the former and uses 50MB/s input.) Each channel in PCIe is 2.5Gb/s, approximately 200MB/s. I assume I can get 60% efficiency on that. 16 channels of that is more than enough for 800MB/s. Hypertransport is way faster than 3.2Gb/s these days. We've looked at HTX (Hypertransport) connections, which are near 1000MB/s (http://www.pathscale.com/infinipath.php). In addition, I actually have an SGI Altix 350 box in the office that uses NUMALink connections running at 6.5GB/s: 200MHz x 128bit at near 100% efficiency. You can buy one of those for a scant $13k ;-)

For the hard disk, you can buy SGI (http://www.sgi.com/products/storage/tech/6700.html) drive arrays that run at 3GB/s. How do they do that? Lots of hard drives connected to their NUMA controllers. At the Super Computer show I've seen 1U 8-disk arrays running at a steady 400MB/s using QLogic controllers and optical links for a scant $3k. Two of those and you've got it.
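The bus arithmetic in this exchange is easy to check. A small sketch (the efficiency factors are the poster's own estimates, not bus-specification values):

```python
# Raw parallel-bus bandwidth in MB/s: (width in bits / 8) * clock in MHz.
def parallel_bus_mb_s(bits, mhz, efficiency=1.0):
    return bits / 8 * mhz * efficiency

pci64 = parallel_bus_mb_s(64, 66)        # classic 64-bit / 66MHz PCI
pcix = parallel_bus_mb_s(64, 133, 0.5)   # PCI-X at the claimed 50%
print(round(pci64))   # 528 MB/s raw
print(round(pcix))    # 532 MB/s -- the "still 500MB/s" figure

# PCIe: 2.5 Gb/s per lane; 8b/10b encoding leaves 250 MB/s per lane.
# At the poster's assumed 60% efficiency, 16 lanes give:
pcie_x16 = 16 * 250 * 0.6
print(pcie_x16)       # 2400.0 MB/s -- well above the 800 MB/s target
```

So the numbers in the reply hold up: a half-efficient PCI-X slot already clears 500MB/s, and a 16-lane PCIe link has headroom to spare.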
Re: [sqlite] create unique index quickly
At 21:28 20/05/2006, you wrote:

> The benefits I'm trying to get out of sqlite are the data queries. I
> collect a large, sparse 2D array from hardware. The hardware device is
> giving me a few GB of data at 200MB/s. Future hardware versions will be
> four times that fast and give me terabytes of data. After I have the
> data, I then have to go through and make calculations on sub-boxes of
> that data. (I'll post some more about that in a different response.) I
> was trying to avoid coding my own sparse-matrix-file-stream-mess that I
> would have to do if I didn't have a nice DB engine. I think sqlite will
> work. I think it will be fast enough. I'll have some nice RAID
> controllers on the production machines with 48-256MB caches.
>
> From experimentation, the "UNIQUE INDEX" command is only 20% slower
> than the "INDEX" command. That doesn't make sense to me. I would think
> the UNIQUE INDEX creation would take longer because it has to do
> redundancy checks. And I still feel there should be a way to create a
> UNIQUE INDEX without waiting for the redundancy check.

Do the inserts in a memory-based temp table inside a transaction, create the index on this table and then pass the table to file. Also, you may check/tweak SQLite settings: page size (use 32768 bytes, default is 1024), page cache (default is 2000 pages; each page takes about 1.5 * page size bytes, so do the maths on how many you can afford), and temp store (use memory). If that is not enough speed, you must either duplicate hardware (two SQLite databases work faster than one) or think about changing to another database; I suggest DB2, Valentina (a fast, low-price database) or any other commercial database.

For your mail I think you have everything planned, but just out of curiosity, how do you plan to pass data from hardware (800 MB/s + database data) to a RAID? A GigEthernet has 1000 Mb/s (so, 128 MB/s), a PCI-X based solution has 1.3 to 2.6 Gb/s (so 150-300 MB/s), and similar for SATA and PCIe (x16). Even a direct Hypertransport to the RAID has 3.2 Gb/s. Note that I'm not asking about the RAID itself, just the SQLite hardware to RAID connection.

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
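The suggested tweaks map directly onto SQLite pragmas. A minimal sketch (the values are the poster's suggestions; note that page_size only takes effect if set before the database's first write):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Larger pages mean fewer, bigger I/Os. Must be set on a fresh database,
# before any table is created.
con.execute("PRAGMA page_size=32768")

# More pages held in the cache. Each cached page costs roughly
# 1.5 * page_size bytes, so size this to your RAM budget.
con.execute("PRAGMA cache_size=4000")

# Keep temporary tables and indices in memory rather than on disk.
con.execute("PRAGMA temp_store=MEMORY")

con.execute("CREATE TABLE data (x INTEGER, y INTEGER, z INTEGER)")

page_size, = con.execute("PRAGMA page_size").fetchone()
print(page_size)  # 32768
```

The same pragmas work identically from C via sqlite3_exec; the point is simply that all three knobs are per-connection settings issued as SQL before the heavy inserts begin.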
Re: [sqlite] create unique index quickly
On 5/18/06, Brannon King <[EMAIL PROTECTED]> wrote:
> The statement
>
> CREATE UNIQUE INDEX primkey ON 4dData (x,y,z)
>
> is too slow for me. It takes 13 sec on 700k records on my Athlon XP
> 2GHz. That's about four seconds longer than it took me to put the data
> in there. The two times together are still faster than inserting into a
> table with a primary key already declared. Is there some way I can
> create that unique index without the overhead of uniqueness checking? I
> know all my triples are unique when used together: I generated them
> that way. Or is there some other way I can create that index faster?
> Thanks.
>
> Brannon King

Brannon,

You are getting almost 54K index entries per second using sqlite. How fast do you need to go? In another post you said:

> I need to do billions (no joke) of consecutive inserts into this table.
> Writing binary out with std::ofstream was using 26% of my code
> execution time according to the profiler. I changed to sqlite3, and now
> the sqlite3_step function uses 50% of my execution time. After reading
> the newsgroups, I removed the primary key. That dropped it down to
> about 41%. That was significant. However, I was still significantly
> slower than binary writes with ofstream.

A more general question is why are you trying to use sqlite? If you need the maximum possible speed you may be better off using in-memory vectors and maps and highly tuned routines like the C++ STL algorithms instead of sqlite. Then you can go back to using binary file I/O. SQLite adds overhead above and beyond binary file I/O; it will always be slower. If the benefits of sqlite don't outweigh the costs, you should stick with binary I/O. What benefit are you hoping to get out of using sqlite?

Dennis Cote
[sqlite] create unique index quickly
The statement

CREATE UNIQUE INDEX primkey ON 4dData (x,y,z)

is too slow for me. It takes 13 sec on 700k records on my Athlon XP 2GHz. That's about four seconds longer than it took me to put the data in there. The two times together are still faster than inserting into a table with a primary key already declared.

Is there some way I can create that unique index without the overhead of uniqueness checking? I know all my triples are unique when used together: I generated them that way. Or is there some other way I can create that index faster? Thanks.

Brannon King
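The trade-off being described (insert-then-index beating an upfront primary key) can be reproduced in miniature; the scale and schema here are illustrative, the real workload being 700k+ rows in C:

```python
import sqlite3
import time

# Triples are unique by construction, as in the original post.
rows = [(i, i % 997, i % 991) for i in range(100_000)]

def bulk_insert(schema_sql, index_sql=None):
    """Time inserting all rows, plus an optional post-hoc index build."""
    con = sqlite3.connect(":memory:")
    con.execute(schema_sql)
    t0 = time.perf_counter()
    with con:  # one transaction for the whole batch
        con.executemany("INSERT INTO d VALUES (?,?,?)", rows)
        if index_sql:
            con.execute(index_sql)
    return time.perf_counter() - t0

# Declaring the key up front: every insert maintains the unique index.
upfront = bulk_insert(
    "CREATE TABLE d (x INTEGER, y INTEGER, z INTEGER, "
    "PRIMARY KEY (x, y, z))")

# Inserting first, then indexing: the index is built in one pass over
# already-loaded data.
deferred = bulk_insert(
    "CREATE TABLE d (x INTEGER, y INTEGER, z INTEGER)",
    "CREATE UNIQUE INDEX primkey ON d(x,y,z)")

print(f"index up front: {upfront:.3f}s, index afterwards: {deferred:.3f}s")
```

Exact timings vary by machine and SQLite version, so no particular ratio is promised here; the sketch just makes the two strategies directly comparable on your own hardware.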