RE: RE: Advice needed please
Lee, I am always reluctant to post something which may look even remotely commercial but Oriole markets this kind of tool. It's, we believe, reasonably priced and you can try it for free, so perhaps it's worth for you to have a look before going into a full-blown development. As far as I have understood what your developer tries to do, he tries to multithread the SELECTs - running several SELECTs at once. I think that it is a bad idea, because you are trying to reinvent the wheel (a wheel also known as Parallel Query). As some have pointed out, the main bottleneck here is likely to be data writing - but also, I should add, waiting for Oracle to return the data and formatting. Our tool, pdqout, is also multithreaded - but instead of having multiple threads querying the database, one thread queries, one thread formats and one thread writes to disk. As a result, CPU utilization is pretty high ... HTH Stephane Faroult how about index organizing the table ? or .. creating an index on all the columns of the table..? this way the select will read only the index blocks..!! -- From: Robertson Lee - lerobe[SMTP:[EMAIL PROTECTED]] Reply To: [EMAIL PROTECTED] Sent: Thursday, May 30, 2002 9:43 PM To: Multiple recipients of list ORACLE-L Subject: RE: Advice needed please Thanks Dennis. Anything whether it be a reaction or an answer, is welcome. Regards Lee -Original Message- Sent: 30 May 2002 15:08 To: Multiple recipients of list ORACLE-L Lee - Just some reactions, few answers. - Generally a process like this will be disk-bound, not CPU-bound, so idle CPU time is to be expected unless your disk is REALLY fast. - Multiple simultaneous full-table scans may not be any faster because the disk heads may need to flit to and fro in order to satisfy each process' request. Sometimes a single full table scan is as fast is it gets for a mechanical device like a disk. RAID will be faster, of course, but ultimately the RAID is composed of disks. - Trying for something faster than select * is a real challenge. To perform a full table scan, Oracle must read each data block. The alternative is index scanning, but this means reading an index block, fetching a data block, etc. Not faster if you're going to eventually read all data blocks anyway. - If select * isn't fast enough, you should consider using table partitioning. That way each process can separately scan a separate partition and separately write to your output files. Hopefully someone else will think of a bright idea I've missed. Dennis Williams DBA Lifetouch, Inc. [EMAIL PROTECTED] -Original Message- Sent: Thursday, May 30, 2002 6:48 AM To: Multiple recipients of list ORACLE-L Oracle 8.0.5 Tru64 4.0f One of our developers here is writing a utility to provide fast unloads of tables (to replace fastunloader as it happens) His problem is as follows. Start from the bottom and work your way up. I would be really grateful if anyone can offer up some alternatives for us. Regards Lee -Original Message- From: Dudley Dave - ddudle Sent: 29 May 2002 16:04 To: Robertson Lee - lerobe Subject:RE: Do you still have that SQL Expert? No, you miss the point. I'm explicitly NOT using PQ (or at least not explicitly using it). Using a parallel hint on huge table unloads - with the single-threaded version of the code (i.e. pipdynsql.v2.0.0) didn't seem to make much difference at all. I didn't do the tests directly though, poeple on the account did. So it may be that the tables already had a degree of parallelism built in, in which case I'd guess the hint would be redundant. What I mean is that even if you use PQ for the server to extract the data in parallel you still have the bottle neck of a single client to send it all back to. That's what I was trying to get around. Assuming that we're not generally using the full network bandwidth, I'd assume that multiple clients ought to be able to dump out separate sections of a table at the same time, at roughly the same speed at a single client would unload a single table - i.e double the throughput. But I can't find anything on the web to tell me the best / most efficient way to actually do this. (By the way, I've tried the NO_PARALLEL hint too, to stop the server setting off too many conflicting slaves on its side. Again no better as far as I could tell.) N.B. Not sure if you'd suggest it, but before you do: most of the tables we'd really want to use this for are massive, and so are already partitioned. So where I say table I mean either that or a partition thereof. Besides, need a generic solution that doesn't rely on having to partition your table to unload it quickly. By the way, I'm specifically testing speed of my original
Advice needed please
Oracle 8.0.5 Tru64 4.0f One of our developers here is writing a utility to provide fast unloads of tables (to replace fastunloader as it happens) His problem is as follows. Start from the bottom and work your way up. I would be really grateful if anyone can offer up some alternatives for us. Regards Lee -Original Message- From: Dudley Dave - ddudle Sent: 29 May 2002 16:04 To: Robertson Lee - lerobe Subject:RE: Do you still have that SQL Expert? No, you miss the point. I'm explicitly NOT using PQ (or at least not explicitly using it). Using a parallel hint on huge table unloads - with the single-threaded version of the code (i.e. pipdynsql.v2.0.0) didn't seem to make much difference at all. I didn't do the tests directly though, poeple on the account did. So it may be that the tables already had a degree of parallelism built in, in which case I'd guess the hint would be redundant. What I mean is that even if you use PQ for the server to extract the data in parallel you still have the bottle neck of a single client to send it all back to. That's what I was trying to get around. Assuming that we're not generally using the full network bandwidth, I'd assume that multiple clients ought to be able to dump out separate sections of a table at the same time, at roughly the same speed at a single client would unload a single table - i.e double the throughput. But I can't find anything on the web to tell me the best / most efficient way to actually do this. (By the way, I've tried the NO_PARALLEL hint too, to stop the server setting off too many conflicting slaves on its side. Again no better as far as I could tell.) N.B. Not sure if you'd suggest it, but before you do: most of the tables we'd really want to use this for are massive, and so are already partitioned. So where I say table I mean either that or a partition thereof. Besides, need a generic solution that doesn't rely on having to partition your table to unload it quickly. By the way, I'm specifically testing speed of my original code (e.g. pipdynsql.v2.0.0 user/pass select * from table) against the new multithreaded development code - i.e. regardless of the machine load at the time, I want to see if multiple simultaneous unloads can be quicker than a single unload client (at the expense of using more machine / Oracle resource obviously). Is this making sense? Dave -Original Message- From: Robertson Lee - lerobe Sent: 29 May 2002 15:14 To: Dudley Dave - ddudle Subject:RE: Do you still have that SQL Expert? How are you using PQ, is it just a hint ??. Which tables are you testing against. -Original Message- From: Dudley Dave - ddudle Sent: 29 May 2002 14:27 To: Robertson Lee - lerobe Subject:RE: Do you still have that SQL Expert? OK Clever-Trousers, As you're so hot on table/index disk striping... I've written the program pipdynsql, which as you may have heard (this lunchtime if not before), people want to use to replace FastUnloader. I've been playing about with a new multithreaded version to try to download a table in sections to multiple client threads which then write back out to a single file (either ordered, or for max speed in random/undefined order). Can you think of any quick ways to do this, or tricks to try? I've tried ranges of rowids (as I'm told that's hold parrallel query works) but the ROWID (tab) hint does not seem to go through the table in rowid order. And it's a massive overhead to order by rowid to work out non-overlapping ranges. And even if you do, you have to say WHERE rowid = xxx AND rowid = xxx (as I say, can't force it to go in ROWID sort order) so this tends to be slower than nect opt... tried assuming there's a unique index and giving start points to each slave thread, which then selects a set number of rows. This is prety quick, but even this seems slower than a simple SELECT * FROM table (for the same number of records). tried loading temp rowid tables with sets of rowids and each slave does a full table scan of its rowid set table, with where clause connection to the data table. Can't find anything better than my original method which selects a unique key from an index with a master thread, for every x'th rownum. Then hands these out to the slave threads to select * from table where unique key = given key for specified number of records. e.g. master pulls out every 100,000th key with a modulus and each slave dumps out 100,000 rows at a time, starting at the key its given. The above uses a temp
RE: Advice needed please
Lee - Just some reactions, few answers. - Generally a process like this will be disk-bound, not CPU-bound, so idle CPU time is to be expected unless your disk is REALLY fast. - Multiple simultaneous full-table scans may not be any faster because the disk heads may need to flit to and fro in order to satisfy each process' request. Sometimes a single full table scan is as fast is it gets for a mechanical device like a disk. RAID will be faster, of course, but ultimately the RAID is composed of disks. - Trying for something faster than select * is a real challenge. To perform a full table scan, Oracle must read each data block. The alternative is index scanning, but this means reading an index block, fetching a data block, etc. Not faster if you're going to eventually read all data blocks anyway. - If select * isn't fast enough, you should consider using table partitioning. That way each process can separately scan a separate partition and separately write to your output files. Hopefully someone else will think of a bright idea I've missed. Dennis Williams DBA Lifetouch, Inc. [EMAIL PROTECTED] -Original Message- Sent: Thursday, May 30, 2002 6:48 AM To: Multiple recipients of list ORACLE-L Oracle 8.0.5 Tru64 4.0f One of our developers here is writing a utility to provide fast unloads of tables (to replace fastunloader as it happens) His problem is as follows. Start from the bottom and work your way up. I would be really grateful if anyone can offer up some alternatives for us. Regards Lee -Original Message- From: Dudley Dave - ddudle Sent: 29 May 2002 16:04 To: Robertson Lee - lerobe Subject:RE: Do you still have that SQL Expert? No, you miss the point. I'm explicitly NOT using PQ (or at least not explicitly using it). Using a parallel hint on huge table unloads - with the single-threaded version of the code (i.e. pipdynsql.v2.0.0) didn't seem to make much difference at all. I didn't do the tests directly though, poeple on the account did. So it may be that the tables already had a degree of parallelism built in, in which case I'd guess the hint would be redundant. What I mean is that even if you use PQ for the server to extract the data in parallel you still have the bottle neck of a single client to send it all back to. That's what I was trying to get around. Assuming that we're not generally using the full network bandwidth, I'd assume that multiple clients ought to be able to dump out separate sections of a table at the same time, at roughly the same speed at a single client would unload a single table - i.e double the throughput. But I can't find anything on the web to tell me the best / most efficient way to actually do this. (By the way, I've tried the NO_PARALLEL hint too, to stop the server setting off too many conflicting slaves on its side. Again no better as far as I could tell.) N.B. Not sure if you'd suggest it, but before you do: most of the tables we'd really want to use this for are massive, and so are already partitioned. So where I say table I mean either that or a partition thereof. Besides, need a generic solution that doesn't rely on having to partition your table to unload it quickly. By the way, I'm specifically testing speed of my original code (e.g. pipdynsql.v2.0.0 user/pass select * from table) against the new multithreaded development code - i.e. regardless of the machine load at the time, I want to see if multiple simultaneous unloads can be quicker than a single unload client (at the expense of using more machine / Oracle resource obviously). Is this making sense? Dave -Original Message- From: Robertson Lee - lerobe Sent: 29 May 2002 15:14 To: Dudley Dave - ddudle Subject:RE: Do you still have that SQL Expert? How are you using PQ, is it just a hint ??. Which tables are you testing against. -Original Message- From: Dudley Dave - ddudle Sent: 29 May 2002 14:27 To: Robertson Lee - lerobe Subject:RE: Do you still have that SQL Expert? OK Clever-Trousers, As you're so hot on table/index disk striping... I've written the program pipdynsql, which as you may have heard (this lunchtime if not before), people want to use to replace FastUnloader. I've been playing about with a new multithreaded version to try to download a table in sections to multiple client threads which then write back out to a single file (either ordered, or for max speed in random/undefined order). Can you think of any quick ways to do this, or tricks to try? I've tried ranges of rowids (as I'm told that's hold parrallel query works) but the ROWID
RE: Advice needed please
Thanks Dennis. Anything whether it be a reaction or an answer, is welcome. Regards Lee -Original Message- Sent: 30 May 2002 15:08 To: Multiple recipients of list ORACLE-L Lee - Just some reactions, few answers. - Generally a process like this will be disk-bound, not CPU-bound, so idle CPU time is to be expected unless your disk is REALLY fast. - Multiple simultaneous full-table scans may not be any faster because the disk heads may need to flit to and fro in order to satisfy each process' request. Sometimes a single full table scan is as fast is it gets for a mechanical device like a disk. RAID will be faster, of course, but ultimately the RAID is composed of disks. - Trying for something faster than select * is a real challenge. To perform a full table scan, Oracle must read each data block. The alternative is index scanning, but this means reading an index block, fetching a data block, etc. Not faster if you're going to eventually read all data blocks anyway. - If select * isn't fast enough, you should consider using table partitioning. That way each process can separately scan a separate partition and separately write to your output files. Hopefully someone else will think of a bright idea I've missed. Dennis Williams DBA Lifetouch, Inc. [EMAIL PROTECTED] -Original Message- Sent: Thursday, May 30, 2002 6:48 AM To: Multiple recipients of list ORACLE-L Oracle 8.0.5 Tru64 4.0f One of our developers here is writing a utility to provide fast unloads of tables (to replace fastunloader as it happens) His problem is as follows. Start from the bottom and work your way up. I would be really grateful if anyone can offer up some alternatives for us. Regards Lee -Original Message- From: Dudley Dave - ddudle Sent: 29 May 2002 16:04 To: Robertson Lee - lerobe Subject:RE: Do you still have that SQL Expert? No, you miss the point. I'm explicitly NOT using PQ (or at least not explicitly using it). Using a parallel hint on huge table unloads - with the single-threaded version of the code (i.e. pipdynsql.v2.0.0) didn't seem to make much difference at all. I didn't do the tests directly though, poeple on the account did. So it may be that the tables already had a degree of parallelism built in, in which case I'd guess the hint would be redundant. What I mean is that even if you use PQ for the server to extract the data in parallel you still have the bottle neck of a single client to send it all back to. That's what I was trying to get around. Assuming that we're not generally using the full network bandwidth, I'd assume that multiple clients ought to be able to dump out separate sections of a table at the same time, at roughly the same speed at a single client would unload a single table - i.e double the throughput. But I can't find anything on the web to tell me the best / most efficient way to actually do this. (By the way, I've tried the NO_PARALLEL hint too, to stop the server setting off too many conflicting slaves on its side. Again no better as far as I could tell.) N.B. Not sure if you'd suggest it, but before you do: most of the tables we'd really want to use this for are massive, and so are already partitioned. So where I say table I mean either that or a partition thereof. Besides, need a generic solution that doesn't rely on having to partition your table to unload it quickly. By the way, I'm specifically testing speed of my original code (e.g. pipdynsql.v2.0.0 user/pass select * from table) against the new multithreaded development code - i.e. regardless of the machine load at the time, I want to see if multiple simultaneous unloads can be quicker than a single unload client (at the expense of using more machine / Oracle resource obviously). Is this making sense? Dave -Original Message- From: Robertson Lee - lerobe Sent: 29 May 2002 15:14 To: Dudley Dave - ddudle Subject:RE: Do you still have that SQL Expert? How are you using PQ, is it just a hint ??. Which tables are you testing against. -Original Message- From: Dudley Dave - ddudle Sent: 29 May 2002 14:27 To: Robertson Lee - lerobe Subject:RE: Do you still have that SQL Expert? OK Clever-Trousers, As you're so hot on table/index disk striping... I've written the program pipdynsql, which as you may have heard (this lunchtime if not before), people want to use to replace FastUnloader. I've been playing about with a new multithreaded version to try to download a table in sections to multiple client threads which then write back out to a single file (either ordered, or for max speed in random/undefined order).
RE: Advice needed please
how about index organizing the table ? or .. creating an index on all the columns of the table..? this way the select will read only the index blocks..!! -- From: Robertson Lee - lerobe[SMTP:[EMAIL PROTECTED]] Reply To: [EMAIL PROTECTED] Sent: Thursday, May 30, 2002 9:43 PM To: Multiple recipients of list ORACLE-L Subject: RE: Advice needed please Thanks Dennis. Anything whether it be a reaction or an answer, is welcome. Regards Lee -Original Message- Sent: 30 May 2002 15:08 To: Multiple recipients of list ORACLE-L Lee - Just some reactions, few answers. - Generally a process like this will be disk-bound, not CPU-bound, so idle CPU time is to be expected unless your disk is REALLY fast. - Multiple simultaneous full-table scans may not be any faster because the disk heads may need to flit to and fro in order to satisfy each process' request. Sometimes a single full table scan is as fast is it gets for a mechanical device like a disk. RAID will be faster, of course, but ultimately the RAID is composed of disks. - Trying for something faster than select * is a real challenge. To perform a full table scan, Oracle must read each data block. The alternative is index scanning, but this means reading an index block, fetching a data block, etc. Not faster if you're going to eventually read all data blocks anyway. - If select * isn't fast enough, you should consider using table partitioning. That way each process can separately scan a separate partition and separately write to your output files. Hopefully someone else will think of a bright idea I've missed. Dennis Williams DBA Lifetouch, Inc. [EMAIL PROTECTED] -Original Message- Sent: Thursday, May 30, 2002 6:48 AM To: Multiple recipients of list ORACLE-L Oracle 8.0.5 Tru64 4.0f One of our developers here is writing a utility to provide fast unloads of tables (to replace fastunloader as it happens) His problem is as follows. Start from the bottom and work your way up. I would be really grateful if anyone can offer up some alternatives for us. Regards Lee -Original Message- From: Dudley Dave - ddudle Sent: 29 May 2002 16:04 To: Robertson Lee - lerobe Subject:RE: Do you still have that SQL Expert? No, you miss the point. I'm explicitly NOT using PQ (or at least not explicitly using it). Using a parallel hint on huge table unloads - with the single-threaded version of the code (i.e. pipdynsql.v2.0.0) didn't seem to make much difference at all. I didn't do the tests directly though, poeple on the account did. So it may be that the tables already had a degree of parallelism built in, in which case I'd guess the hint would be redundant. What I mean is that even if you use PQ for the server to extract the data in parallel you still have the bottle neck of a single client to send it all back to. That's what I was trying to get around. Assuming that we're not generally using the full network bandwidth, I'd assume that multiple clients ought to be able to dump out separate sections of a table at the same time, at roughly the same speed at a single client would unload a single table - i.e double the throughput. But I can't find anything on the web to tell me the best / most efficient way to actually do this. (By the way, I've tried the NO_PARALLEL hint too, to stop the server setting off too many conflicting slaves on its side. Again no better as far as I could tell.) N.B. Not sure if you'd suggest it, but before you do: most of the tables we'd really want to use this for are massive, and so are already partitioned. So where I say table I mean either that or a partition thereof. Besides, need a generic solution that doesn't rely on having to partition your table to unload it quickly. By the way, I'm specifically testing speed of my original code (e.g. pipdynsql.v2.0.0 user/pass select * from table) against the new multithreaded development code - i.e. regardless of the machine load at the time, I want to see if multiple simultaneous unloads can be quicker than a single unload client (at the expense of using more machine / Oracle resource obviously). Is this making sense? Dave -Original Message- From: Robertson Lee - lerobe Sent: 29 May 2002 15:14 To: Dudley Dave - ddudle Subject:RE: Do you still have that SQL Expert? How are you using PQ, is it just a hint ??. Which tables are you testing against. -Original Message- From: Dudley Dave - ddudle Sent: 29 May 2002 14:27 To: Robertson Lee - lerobe Subject:RE: Do you still have that SQL Expert? OK Clever