RE: RE: Advice needed please

2002-05-31 Thread Stephane Faroult

Lee,

   I am always reluctant to post something which may look even remotely commercial but 
Oriole markets this kind of tool. It's, we believe, reasonably priced and you can try 
it for free, so perhaps it's worth for you to have a look before going into a 
full-blown development.

   As far as I have understood what your developer tries to do, he tries to 
multithread the SELECTs - running several SELECTs at once. I think that it is a bad 
idea, because you are trying to reinvent the wheel (a wheel also known as Parallel 
Query). As some have pointed out, the main bottleneck here is likely to be data 
writing - but also, I should add, waiting for Oracle to return the data and 
formatting. Our tool, pdqout, is also multithreaded - but instead of having multiple 
threads querying the database, one thread queries, one thread formats and one thread 
writes to disk. As a result, CPU utilization is pretty high ...
  
HTH

Stephane Faroult

how about index organizing the table ? or ..
creating an index on all the
columns of the table..? 
this way the select will read only the index
blocks..!!

 --
 From: Robertson Lee -
lerobe[SMTP:[EMAIL PROTECTED]]
 Reply To: [EMAIL PROTECTED]
 Sent: Thursday, May 30, 2002 9:43 PM
 To:   Multiple recipients of list ORACLE-L
 Subject:  RE: Advice needed please
 
 Thanks Dennis.
 
 Anything whether it be a reaction or an answer,
is welcome.
 
 Regards
 
 Lee 
 
 
 -Original Message-
 Sent: 30 May 2002 15:08
 To: Multiple recipients of list ORACLE-L
 
 
 Lee - Just some reactions, few answers.
   - Generally a process like this will be
disk-bound, not CPU-bound, so
 idle
 CPU time is to be expected unless your disk is
REALLY fast.
   - Multiple simultaneous full-table scans may
not be any faster because
 the
 disk heads may need to flit to and fro in order
to satisfy each process'
 request. Sometimes a single full table scan is as
fast is it gets for a
 mechanical device like a disk. RAID will be
faster, of course, but
 ultimately the RAID is composed of disks.
   - Trying for something faster than select * is
a real challenge. To
 perform a full table scan, Oracle must read each
data block. The
 alternative
 is index scanning, but this means reading an
index block, fetching a data
 block, etc. Not faster if you're going to
eventually read all data blocks
 anyway. 
   - If select * isn't fast enough, you should
consider using table
 partitioning. That way each process can
separately scan a separate
 partition
 and separately write to your output files.
 Hopefully someone else will think of a bright
idea I've missed.
 Dennis Williams
 DBA
 Lifetouch, Inc.
 [EMAIL PROTECTED]
 
 
 -Original Message-
 Sent: Thursday, May 30, 2002 6:48 AM
 To: Multiple recipients of list ORACLE-L
 
 
 Oracle 8.0.5
 
 Tru64 4.0f
 
 One of our developers here is writing a utility
to provide fast unloads of
 tables (to replace fastunloader as it happens)
 
 His problem is as follows. Start from the bottom
and work your way up. I
 would be really grateful if anyone can offer up
some alternatives for us.
 
 Regards
 
 Lee
 
 
   -Original Message-
  From:   Dudley Dave - ddudle  
  Sent:   29 May 2002 16:04
  To: Robertson Lee - lerobe
  Subject:RE: Do you still have that
SQL Expert?
  
  
  No, you miss the point. I'm explicitly NOT
using PQ (or at least not
  explicitly using it).
  
  Using a parallel hint on huge table unloads
- with the
  single-threaded version of the code (i.e.
pipdynsql.v2.0.0) didn't seem
 to
  make much difference at all. I didn't do the
tests directly though,
 poeple
  on the account did. So it may be that the
tables already had a degree of
  parallelism built in, in which case I'd guess
the hint would be
 redundant.
  
  What I mean is that even if you use PQ for
the server to extract the
  data in parallel you still have the bottle neck
of a single client to
 send
  it all back to. That's what I was trying to get
around. Assuming that
  we're not generally using the full network
bandwidth, I'd assume that
  multiple clients ought to be able to dump out
separate sections of a
 table
  at the same time, at roughly the same speed at
a single client would
  unload a single table - i.e double the
throughput.
  
  But I can't find anything on the web to
tell me the best / most
  efficient way to actually do this. (By the way,
I've tried the
 NO_PARALLEL
  hint too, to stop the server setting off too
many conflicting slaves on
  its side. Again no better as far as I could
tell.)
  
  N.B. Not sure if you'd suggest it, but
before you do: most of the
  tables we'd really want to use this for are
massive, and so are already
  partitioned. So where I say table I mean
either that or a partition
  thereof. Besides, need a generic solution that
doesn't rely on having to
  partition your table to unload it quickly.
  
  By the way, I'm specifically testing speed
of my original

Advice needed please

2002-05-30 Thread Robertson Lee - lerobe

Oracle 8.0.5

Tru64 4.0f

One of our developers here is writing a utility to provide fast unloads of
tables (to replace fastunloader as it happens)

His problem is as follows. Start from the bottom and work your way up. I
would be really grateful if anyone can offer up some alternatives for us.

Regards

Lee


-Original Message-
   From:   Dudley Dave - ddudle  
   Sent:   29 May 2002 16:04
   To: Robertson Lee - lerobe
   Subject:RE: Do you still have that SQL Expert?
 
 
   No, you miss the point. I'm explicitly NOT using PQ (or at least not
 explicitly using it).
 
   Using a parallel hint on huge table unloads - with the
 single-threaded version of the code (i.e. pipdynsql.v2.0.0) didn't seem to
 make much difference at all. I didn't do the tests directly though, poeple
 on the account did. So it may be that the tables already had a degree of
 parallelism built in, in which case I'd guess the hint would be redundant.
 
   What I mean is that even if you use PQ for the server to extract the
 data in parallel you still have the bottle neck of a single client to send
 it all back to. That's what I was trying to get around. Assuming that
 we're not generally using the full network bandwidth, I'd assume that
 multiple clients ought to be able to dump out separate sections of a table
 at the same time, at roughly the same speed at a single client would
 unload a single table - i.e double the throughput.
 
   But I can't find anything on the web to tell me the best / most
 efficient way to actually do this. (By the way, I've tried the NO_PARALLEL
 hint too, to stop the server setting off too many conflicting slaves on
 its side. Again no better as far as I could tell.)
 
   N.B. Not sure if you'd suggest it, but before you do: most of the
 tables we'd really want to use this for are massive, and so are already
 partitioned. So where I say table I mean either that or a partition
 thereof. Besides, need a generic solution that doesn't rely on having to
 partition your table to unload it quickly.
 
   By the way, I'm specifically testing speed of my original code (e.g.
 pipdynsql.v2.0.0 user/pass select * from table) against the new
 multithreaded development code - i.e. regardless of the machine load at
 the time, I want to see if multiple simultaneous unloads can be quicker
 than a single unload client (at the expense of using more machine / Oracle
 resource obviously).
 
   Is this making sense?
 
   Dave
 
-Original Message-
   From:   Robertson Lee - lerobe  
   Sent:   29 May 2002 15:14
   To: Dudley Dave - ddudle
   Subject:RE: Do you still have that SQL Expert?
 
 
   How are you using PQ, is it just a hint ??. Which tables are you
 testing against.
 
 
-Original Message-
   From:   Dudley Dave - ddudle  
   Sent:   29 May 2002 14:27
   To: Robertson Lee - lerobe
   Subject:RE: Do you still have that SQL Expert?
 
   OK Clever-Trousers,
 
   As you're so hot on table/index disk striping...
 
   I've written the program pipdynsql, which as you may have
 heard (this lunchtime if not before), people want to use to replace
 FastUnloader.
 
   I've been playing about with a new multithreaded version to
 try to download a table in sections to multiple client threads which then
 write back out to a single file (either ordered, or for max speed in
 random/undefined order).
 
   Can you think of any quick ways to do this, or tricks to
 try?
 
   I've tried ranges of rowids (as I'm told that's hold
 parrallel query works) but the ROWID (tab) hint does not seem to go
 through the table in rowid order. And it's a massive overhead to order by
 rowid to work out non-overlapping ranges. And even if you do, you have to
 say WHERE rowid = xxx AND rowid = xxx (as I say, can't force it to go in
 ROWID sort order) so this tends to be slower than nect opt...
 
   tried assuming there's a unique index and giving start
 points to each slave thread, which then selects a set number of rows. This
 is prety quick, but even this seems slower than a simple SELECT * FROM
 table (for the same number of records).
 
   tried loading temp rowid tables with sets of rowids and
 each slave does a full table scan of its rowid set table, with where
 clause connection to the data table.
 
 
   Can't find anything better than my original method which
 selects a unique key from an index with a master thread, for every x'th
 rownum. Then hands these out to the slave threads to select * from table
 where unique key = given key for specified number of records. e.g. master
 pulls out every 100,000th key with a modulus and each slave dumps out
 100,000 rows at a time, starting at the key its given.
 
   The above uses a temp 

RE: Advice needed please

2002-05-30 Thread DENNIS WILLIAMS

Lee - Just some reactions, few answers.
  - Generally a process like this will be disk-bound, not CPU-bound, so idle
CPU time is to be expected unless your disk is REALLY fast.
  - Multiple simultaneous full-table scans may not be any faster because the
disk heads may need to flit to and fro in order to satisfy each process'
request. Sometimes a single full table scan is as fast is it gets for a
mechanical device like a disk. RAID will be faster, of course, but
ultimately the RAID is composed of disks.
  - Trying for something faster than select * is a real challenge. To
perform a full table scan, Oracle must read each data block. The alternative
is index scanning, but this means reading an index block, fetching a data
block, etc. Not faster if you're going to eventually read all data blocks
anyway. 
  - If select * isn't fast enough, you should consider using table
partitioning. That way each process can separately scan a separate partition
and separately write to your output files.
Hopefully someone else will think of a bright idea I've missed.
Dennis Williams
DBA
Lifetouch, Inc.
[EMAIL PROTECTED]


-Original Message-
Sent: Thursday, May 30, 2002 6:48 AM
To: Multiple recipients of list ORACLE-L


Oracle 8.0.5

Tru64 4.0f

One of our developers here is writing a utility to provide fast unloads of
tables (to replace fastunloader as it happens)

His problem is as follows. Start from the bottom and work your way up. I
would be really grateful if anyone can offer up some alternatives for us.

Regards

Lee


-Original Message-
   From:   Dudley Dave - ddudle  
   Sent:   29 May 2002 16:04
   To: Robertson Lee - lerobe
   Subject:RE: Do you still have that SQL Expert?
 
 
   No, you miss the point. I'm explicitly NOT using PQ (or at least not
 explicitly using it).
 
   Using a parallel hint on huge table unloads - with the
 single-threaded version of the code (i.e. pipdynsql.v2.0.0) didn't seem to
 make much difference at all. I didn't do the tests directly though, poeple
 on the account did. So it may be that the tables already had a degree of
 parallelism built in, in which case I'd guess the hint would be redundant.
 
   What I mean is that even if you use PQ for the server to extract the
 data in parallel you still have the bottle neck of a single client to send
 it all back to. That's what I was trying to get around. Assuming that
 we're not generally using the full network bandwidth, I'd assume that
 multiple clients ought to be able to dump out separate sections of a table
 at the same time, at roughly the same speed at a single client would
 unload a single table - i.e double the throughput.
 
   But I can't find anything on the web to tell me the best / most
 efficient way to actually do this. (By the way, I've tried the NO_PARALLEL
 hint too, to stop the server setting off too many conflicting slaves on
 its side. Again no better as far as I could tell.)
 
   N.B. Not sure if you'd suggest it, but before you do: most of the
 tables we'd really want to use this for are massive, and so are already
 partitioned. So where I say table I mean either that or a partition
 thereof. Besides, need a generic solution that doesn't rely on having to
 partition your table to unload it quickly.
 
   By the way, I'm specifically testing speed of my original code (e.g.
 pipdynsql.v2.0.0 user/pass select * from table) against the new
 multithreaded development code - i.e. regardless of the machine load at
 the time, I want to see if multiple simultaneous unloads can be quicker
 than a single unload client (at the expense of using more machine / Oracle
 resource obviously).
 
   Is this making sense?
 
   Dave
 
-Original Message-
   From:   Robertson Lee - lerobe  
   Sent:   29 May 2002 15:14
   To: Dudley Dave - ddudle
   Subject:RE: Do you still have that SQL Expert?
 
 
   How are you using PQ, is it just a hint ??. Which tables are you
 testing against.
 
 
-Original Message-
   From:   Dudley Dave - ddudle  
   Sent:   29 May 2002 14:27
   To: Robertson Lee - lerobe
   Subject:RE: Do you still have that SQL Expert?
 
   OK Clever-Trousers,
 
   As you're so hot on table/index disk striping...
 
   I've written the program pipdynsql, which as you may have
 heard (this lunchtime if not before), people want to use to replace
 FastUnloader.
 
   I've been playing about with a new multithreaded version to
 try to download a table in sections to multiple client threads which then
 write back out to a single file (either ordered, or for max speed in
 random/undefined order).
 
   Can you think of any quick ways to do this, or tricks to
 try?
 
   I've tried ranges of rowids (as I'm told that's hold
 parrallel query works) but the ROWID 

RE: Advice needed please

2002-05-30 Thread Robertson Lee - lerobe

Thanks Dennis.

Anything whether it be a reaction or an answer, is welcome.

Regards

Lee 


-Original Message-
Sent: 30 May 2002 15:08
To: Multiple recipients of list ORACLE-L


Lee - Just some reactions, few answers.
  - Generally a process like this will be disk-bound, not CPU-bound, so idle
CPU time is to be expected unless your disk is REALLY fast.
  - Multiple simultaneous full-table scans may not be any faster because the
disk heads may need to flit to and fro in order to satisfy each process'
request. Sometimes a single full table scan is as fast is it gets for a
mechanical device like a disk. RAID will be faster, of course, but
ultimately the RAID is composed of disks.
  - Trying for something faster than select * is a real challenge. To
perform a full table scan, Oracle must read each data block. The alternative
is index scanning, but this means reading an index block, fetching a data
block, etc. Not faster if you're going to eventually read all data blocks
anyway. 
  - If select * isn't fast enough, you should consider using table
partitioning. That way each process can separately scan a separate partition
and separately write to your output files.
Hopefully someone else will think of a bright idea I've missed.
Dennis Williams
DBA
Lifetouch, Inc.
[EMAIL PROTECTED]


-Original Message-
Sent: Thursday, May 30, 2002 6:48 AM
To: Multiple recipients of list ORACLE-L


Oracle 8.0.5

Tru64 4.0f

One of our developers here is writing a utility to provide fast unloads of
tables (to replace fastunloader as it happens)

His problem is as follows. Start from the bottom and work your way up. I
would be really grateful if anyone can offer up some alternatives for us.

Regards

Lee


-Original Message-
   From:   Dudley Dave - ddudle  
   Sent:   29 May 2002 16:04
   To: Robertson Lee - lerobe
   Subject:RE: Do you still have that SQL Expert?
 
 
   No, you miss the point. I'm explicitly NOT using PQ (or at least not
 explicitly using it).
 
   Using a parallel hint on huge table unloads - with the
 single-threaded version of the code (i.e. pipdynsql.v2.0.0) didn't seem to
 make much difference at all. I didn't do the tests directly though, poeple
 on the account did. So it may be that the tables already had a degree of
 parallelism built in, in which case I'd guess the hint would be redundant.
 
   What I mean is that even if you use PQ for the server to extract the
 data in parallel you still have the bottle neck of a single client to send
 it all back to. That's what I was trying to get around. Assuming that
 we're not generally using the full network bandwidth, I'd assume that
 multiple clients ought to be able to dump out separate sections of a table
 at the same time, at roughly the same speed at a single client would
 unload a single table - i.e double the throughput.
 
   But I can't find anything on the web to tell me the best / most
 efficient way to actually do this. (By the way, I've tried the NO_PARALLEL
 hint too, to stop the server setting off too many conflicting slaves on
 its side. Again no better as far as I could tell.)
 
   N.B. Not sure if you'd suggest it, but before you do: most of the
 tables we'd really want to use this for are massive, and so are already
 partitioned. So where I say table I mean either that or a partition
 thereof. Besides, need a generic solution that doesn't rely on having to
 partition your table to unload it quickly.
 
   By the way, I'm specifically testing speed of my original code (e.g.
 pipdynsql.v2.0.0 user/pass select * from table) against the new
 multithreaded development code - i.e. regardless of the machine load at
 the time, I want to see if multiple simultaneous unloads can be quicker
 than a single unload client (at the expense of using more machine / Oracle
 resource obviously).
 
   Is this making sense?
 
   Dave
 
-Original Message-
   From:   Robertson Lee - lerobe  
   Sent:   29 May 2002 15:14
   To: Dudley Dave - ddudle
   Subject:RE: Do you still have that SQL Expert?
 
 
   How are you using PQ, is it just a hint ??. Which tables are you
 testing against.
 
 
-Original Message-
   From:   Dudley Dave - ddudle  
   Sent:   29 May 2002 14:27
   To: Robertson Lee - lerobe
   Subject:RE: Do you still have that SQL Expert?
 
   OK Clever-Trousers,
 
   As you're so hot on table/index disk striping...
 
   I've written the program pipdynsql, which as you may have
 heard (this lunchtime if not before), people want to use to replace
 FastUnloader.
 
   I've been playing about with a new multithreaded version to
 try to download a table in sections to multiple client threads which then
 write back out to a single file (either ordered, or for max speed in
 random/undefined order).
 

RE: Advice needed please

2002-05-30 Thread Rahul

how about index organizing the table ? or .. creating an index on all the
columns of the table..? 
this way the select will read only the index blocks..!!

 --
 From: Robertson Lee - lerobe[SMTP:[EMAIL PROTECTED]]
 Reply To: [EMAIL PROTECTED]
 Sent: Thursday, May 30, 2002 9:43 PM
 To:   Multiple recipients of list ORACLE-L
 Subject:  RE: Advice needed please
 
 Thanks Dennis.
 
 Anything whether it be a reaction or an answer, is welcome.
 
 Regards
 
 Lee 
 
 
 -Original Message-
 Sent: 30 May 2002 15:08
 To: Multiple recipients of list ORACLE-L
 
 
 Lee - Just some reactions, few answers.
   - Generally a process like this will be disk-bound, not CPU-bound, so
 idle
 CPU time is to be expected unless your disk is REALLY fast.
   - Multiple simultaneous full-table scans may not be any faster because
 the
 disk heads may need to flit to and fro in order to satisfy each process'
 request. Sometimes a single full table scan is as fast is it gets for a
 mechanical device like a disk. RAID will be faster, of course, but
 ultimately the RAID is composed of disks.
   - Trying for something faster than select * is a real challenge. To
 perform a full table scan, Oracle must read each data block. The
 alternative
 is index scanning, but this means reading an index block, fetching a data
 block, etc. Not faster if you're going to eventually read all data blocks
 anyway. 
   - If select * isn't fast enough, you should consider using table
 partitioning. That way each process can separately scan a separate
 partition
 and separately write to your output files.
 Hopefully someone else will think of a bright idea I've missed.
 Dennis Williams
 DBA
 Lifetouch, Inc.
 [EMAIL PROTECTED]
 
 
 -Original Message-
 Sent: Thursday, May 30, 2002 6:48 AM
 To: Multiple recipients of list ORACLE-L
 
 
 Oracle 8.0.5
 
 Tru64 4.0f
 
 One of our developers here is writing a utility to provide fast unloads of
 tables (to replace fastunloader as it happens)
 
 His problem is as follows. Start from the bottom and work your way up. I
 would be really grateful if anyone can offer up some alternatives for us.
 
 Regards
 
 Lee
 
 
   -Original Message-
  From:   Dudley Dave - ddudle  
  Sent:   29 May 2002 16:04
  To: Robertson Lee - lerobe
  Subject:RE: Do you still have that SQL Expert?
  
  
  No, you miss the point. I'm explicitly NOT using PQ (or at least not
  explicitly using it).
  
  Using a parallel hint on huge table unloads - with the
  single-threaded version of the code (i.e. pipdynsql.v2.0.0) didn't seem
 to
  make much difference at all. I didn't do the tests directly though,
 poeple
  on the account did. So it may be that the tables already had a degree of
  parallelism built in, in which case I'd guess the hint would be
 redundant.
  
  What I mean is that even if you use PQ for the server to extract the
  data in parallel you still have the bottle neck of a single client to
 send
  it all back to. That's what I was trying to get around. Assuming that
  we're not generally using the full network bandwidth, I'd assume that
  multiple clients ought to be able to dump out separate sections of a
 table
  at the same time, at roughly the same speed at a single client would
  unload a single table - i.e double the throughput.
  
  But I can't find anything on the web to tell me the best / most
  efficient way to actually do this. (By the way, I've tried the
 NO_PARALLEL
  hint too, to stop the server setting off too many conflicting slaves on
  its side. Again no better as far as I could tell.)
  
  N.B. Not sure if you'd suggest it, but before you do: most of the
  tables we'd really want to use this for are massive, and so are already
  partitioned. So where I say table I mean either that or a partition
  thereof. Besides, need a generic solution that doesn't rely on having to
  partition your table to unload it quickly.
  
  By the way, I'm specifically testing speed of my original code (e.g.
  pipdynsql.v2.0.0 user/pass select * from table) against the new
  multithreaded development code - i.e. regardless of the machine load at
  the time, I want to see if multiple simultaneous unloads can be quicker
  than a single unload client (at the expense of using more machine /
 Oracle
  resource obviously).
  
  Is this making sense?
  
  Dave
  
   -Original Message-
  From:   Robertson Lee - lerobe  
  Sent:   29 May 2002 15:14
  To: Dudley Dave - ddudle
  Subject:RE: Do you still have that SQL Expert?
  
  
  How are you using PQ, is it just a hint ??. Which tables are you
  testing against.
  
  
   -Original Message-
  From:   Dudley Dave - ddudle  
  Sent:   29 May 2002 14:27
  To: Robertson Lee - lerobe
  Subject:RE: Do you still have that SQL Expert?
  
  OK Clever