RE: pl/sql question and owa_pattern question

2003-11-21 Thread Stephane Faroult
Guang,

   I agree with your analysis, looping on characters is not the faster you can do, 
simply because there is a significant overhead (compared to C code for instance) in a 
language such as PL/SQL - which might be perfectly acceptable in some circumstances, 
much less so in very repetitive tasks. 'Native compiling', ie turning PL/SQL in C, 
might improve performance. However, in my view the best performance gains you may get 
is by, so to speak, pushing the bulk of the processing deeper into the kernel (which 
isn't by the way exclusive of native compiling). Using a function such as INSTR() will 
be much more efficient than looping on characters.
 I would suggest something such as :
   - First use TRANSLATE() to replace all the characters you want to get rid of by a 
single, well identified character, say # (use CHR() || ... for non printable 
characters - you can build up the string of characters to translate in the 
initialisation section of a package rather than typing it).
   - Start with initializing your string to LTRIM(string, '#')
   - Then as long as pos := INSTR(string, '#') isn't 0,
 get your token as substr(string, 1, pos - 1) then assign ltrim(substr(string, pos 
+ 1), '#') to string (very similar to what you were planning to do with owa).

This will be probably much faster than a character-by-character loop and calls to an 
owa package.

HTH,

Stephane Faroult

- --- Original Message --- -
From: Guang Mei [EMAIL PROTECTED]
To: Multiple recipients of list ORACLE-L
[EMAIL PROTECTED]
Sent: Thu, 20 Nov 2003 19:39:55

Hi:

In my pl/sql program, I want to process each word
in a string. The
string is selected from a varchar2(300) column. The
delimit that separates
the words is not necessary space character. The
definition of the delimit
in this program is set as

1. Any character that is NOT AlphaNumerical (0-9,
A-Z,a-z)
and
2. the character is not one of these:  '-.,/*_'

Now my program is basically checking each
character, find the delimit, and
rebuild each word. After that I process each
word. The code looks like
this:

---
str :=  This will be a long string with length
upto 300 characters, it
may contain some invisible characters';
len := length(str)+1;
  for i in 1..len loop
ch := substr(str,i,1);
if (not strings.isAlnum(ch) and
instr('-.,/*_', ch)1)  then
  if word is not null then
-- do some processing to variable word !
word := null;-- reset it
  end if;
else
  word := word || ch;   -- concat ch to word
end if;
  end loop;

---

I think It's taking too long because it loops
through each characters. I
hope I could find a way to speed it up. I don't
have experiience in
owa_pattern, but I thought there might be a way to
do it here:


str :=  This will be a long string with length
upto 300 characters, it
may contain some invisible characters';
newstr := str;
pos := 1;
while pos != 0 loop
pos := owa_pattern.amatch(newstr, 1, '\W');   
-- how can I mask out
these  '-.,/*_'  ???
word := substr(newstr, 1, pos-1);
-- do some processing to variable word !
if pos != 0 then
  newstr := substr(newstr, pos+1);
end if;
end loop;
--

My simple tests showed that owa_pattern call is
much slower than direct
string manupilation. But I would like to try it in
this case if I could
easily get the wrods from the string. Any
suggestions?

TIA.

Guang

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Stephane Faroult
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).


RE: pl/sql question and owa_pattern question

2003-11-21 Thread Guang Mei
Hi Stephane:

Thanks for your good suggestion. I compared the method you suggested and the orginal 
one and it indeed boosted the performance (in my simple test). However the ONLY 
problem I am having is that by doing TRANSLATE, I lost the original delimits. The new 
method (you suggested) correctly extract the words (and sent for processing), But 
after processing I need to put processed-words back to the orginal string with orginal 
demilters un-changed. I tried to track to position of delimit from the orginal string 
by doing

global_pos := global_pos + pos ;

in my while loop, but  ltrim(substr(string, pos + 1), '#')  will make global_pos 
wrong when ltrim trims '#'. Any work-around?

TIA.

Guang

-Original Message-
Stephane Faroult
Sent: Friday, November 21, 2003 4:19 AM
To: Multiple recipients of list ORACLE-L


Guang,

   I agree with your analysis, looping on characters is not the faster you can do, 
simply because there is a significant overhead (compared to C code for instance) in a 
language such as PL/SQL - which might be perfectly acceptable in some circumstances, 
much less so in very repetitive tasks. 'Native compiling', ie turning PL/SQL in C, 
might improve performance. However, in my view the best performance gains you may get 
is by, so to speak, pushing the bulk of the processing deeper into the kernel (which 
isn't by the way exclusive of native compiling). Using a function such as INSTR() will 
be much more efficient than looping on characters.
 I would suggest something such as :
   - First use TRANSLATE() to replace all the characters you want to get rid of by a 
single, well identified character, say # (use CHR() || ... for non printable 
characters - you can build up the string of characters to translate in the 
initialisation section of a package rather than typing it).
   - Start with initializing your string to LTRIM(string, '#')
   - Then as long as pos := INSTR(string, '#') isn't 0,
 get your token as substr(string, 1, pos - 1) then assign ltrim(substr(string, pos 
+ 1), '#') to string (very similar to what you were planning to do with owa).

This will be probably much faster than a character-by-character loop and calls to an 
owa package.

HTH,

Stephane Faroult

- --- Original Message --- -
From: Guang Mei [EMAIL PROTECTED]
To: Multiple recipients of list ORACLE-L
[EMAIL PROTECTED]
Sent: Thu, 20 Nov 2003 19:39:55

Hi:

In my pl/sql program, I want to process each word
in a string. The
string is selected from a varchar2(300) column. The
delimit that separates
the words is not necessary space character. The
definition of the delimit
in this program is set as

1. Any character that is NOT AlphaNumerical (0-9,
A-Z,a-z)
and
2. the character is not one of these:  '-.,/*_'

Now my program is basically checking each
character, find the delimit, and
rebuild each word. After that I process each
word. The code looks like
this:

---
str :=  This will be a long string with length
upto 300 characters, it
may contain some invisible characters';
len := length(str)+1;
  for i in 1..len loop
ch := substr(str,i,1);
if (not strings.isAlnum(ch) and
instr('-.,/*_', ch)1)  then
  if word is not null then
-- do some processing to variable word !
word := null;-- reset it
  end if;
else
  word := word || ch;   -- concat ch to word
end if;
  end loop;

---

I think It's taking too long because it loops
through each characters. I
hope I could find a way to speed it up. I don't
have experiience in
owa_pattern, but I thought there might be a way to
do it here:


str :=  This will be a long string with length
upto 300 characters, it
may contain some invisible characters';
newstr := str;
pos := 1;
while pos != 0 loop
pos := owa_pattern.amatch(newstr, 1, '\W');   
-- how can I mask out
these  '-.,/*_'  ???
word := substr(newstr, 1, pos-1);
-- do some processing to variable word !
if pos != 0 then
  newstr := substr(newstr, pos+1);
end if;
end loop;
--

My simple tests showed that owa_pattern call is
much slower than direct
string manupilation. But I would like to try it in
this case if I could
easily get the wrods from the string. Any
suggestions?

TIA.

Guang

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Stephane Faroult
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Guang Mei
  

RE: pl/sql question and owa_pattern question

2003-11-21 Thread Guang Mei
Perl is a good tool for text processing. But our program is already written
in pl/sql long time ago and there are intensive db calls in this pl/sql
program. (text processing is only part of it). So I can not change that.

BTW I did a comparison study a while ago for some of our pl/sql packages
(specifically for our application). When there are lots of db calls (select,
insert, update and delete), pl/sql package is faster than correponding perl
program (I made sure sqls are prepared once and used bind variables in perl.
All code were executed on the unix server, no other programs were running,
etc). That's why we stick to pl/sql because our app need the performance.
Others may have different results, it all depends on what the code does.

Guang

-Original Message-
Mladen Gogala
Sent: Thursday, November 20, 2003 11:14 PM
To: Multiple recipients of list ORACLE-L


I don't know about PL/SQL but here is how I would get separate words from a
big string:

#!/usr/bin/perl -w
use strict;
my (@ARR);
while () {
chomp;
@ARR = split(/[^0-9a-zA-Z_\.,]/);
foreach (@ARR) {
print $_\n;
}
}

There is something called DBI and it can be used to insert separated words
into the database, instead
of printing them. The bottom line is that perl is an excellent tool for
parsing strings and  all sorts of string
manipulation.

On 2003.11.20 22:39, Guang Mei wrote:
 Hi:

 In my pl/sql program, I want to process each word in a string. The
 string is selected from a varchar2(300) column. The delimit that separates
 the words is not necessary space character. The definition of the delimit
 in this program is set as

 1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z)
 and
 2. the character is not one of these:  '-.,/*_'

 Now my program is basically checking each character, find the delimit, and
 rebuild each word. After that I process each word. The code looks like
 this:

 ---
 str :=  This will be a long string with length upto 300 characters, it
 may contain some invisible characters';
 len := length(str)+1;
   for i in 1..len loop
 ch := substr(str,i,1);
 if (not strings.isAlnum(ch) and instr('-.,/*_', ch)1)  then
   if word is not null then
 -- do some processing to variable word !
 word := null;-- reset it
   end if;
 else
   word := word || ch;   -- concat ch to word
 end if;
   end loop;

 ---

 I think It's taking too long because it loops through each characters. I
 hope I could find a way to speed it up. I don't have experiience in
 owa_pattern, but I thought there might be a way to do it here:

 
 str :=  This will be a long string with length upto 300 characters, it
 may contain some invisible characters';
 newstr := str;
 pos := 1;
 while pos != 0 loop
 pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out
 these  '-.,/*_'  ???
 word := substr(newstr, 1, pos-1);
 -- do some processing to variable word !
 if pos != 0 then
   newstr := substr(newstr, pos+1);
 end if;
 end loop;
 --

 My simple tests showed that owa_pattern call is much slower than direct
 string manupilation. But I would like to try it in this case if I could
 easily get the wrods from the string. Any suggestions?

 TIA.

 Guang


 --
 Please see the official ORACLE-L FAQ: http://www.orafaq.net
 --
 Author: Guang Mei
   INET: [EMAIL PROTECTED]

 Fat City Network Services-- 858-538-5051 http://www.fatcity.com
 San Diego, California-- Mailing list and web hosting services
 -
 To REMOVE yourself from this mailing list, send an E-Mail message
 to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
 the message BODY, include a line containing: UNSUB ORACLE-L
 (or the name of mailing list you want to be removed from).  You may
 also send the HELP command for other information (like subscribing).


--
Mladen Gogala
Oracle DBA
--
Please see the official ORACLE-L FAQ: http://www.orafaq.net
--
Author: Mladen Gogala
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Guang Mei
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an 

RE: pl/sql question and owa_pattern question

2003-11-21 Thread Jamadagni, Rajendra
Guang,

Well you are almost there ...  you need fifo structure  namely a pl/sql array

1. create a local pl/sql array to store the delimiter (store the ascii value of the 
delimiter to be safe) my_array (varchar2(5))
2. as you find a delimiter insert into the first position in the array and replace the 
delimiting character with #
3. lather.rinse.repeat.

when it is time to put it back
use a loop

nIndex := 0;
nPos   := 0;
loop
  npos := instr(my_str,'#',1);
  exit when npos := 0;
  nIndex := nindex + 1;
  my_str := substr(my_str,1,nPos-1) || chr(my_array(nIndex)) || sybstr(my_str, nPos+1);
end loop;


something like this should help, proof-read though ...

Raj

Rajendra dot Jamadagni at nospamespn dot com
All Views expressed in this email are strictly personal.
QOTD: Any clod can have facts, having an opinion is an art !


-Original Message-
Sent: Friday, November 21, 2003 11:44 AM
To: Multiple recipients of list ORACLE-L


Hi Stephane:

Thanks for your good suggestion. I compared the method you suggested and the orginal 
one and it indeed boosted the performance (in my simple test). However the ONLY 
problem I am having is that by doing TRANSLATE, I lost the original delimits. The new 
method (you suggested) correctly extract the words (and sent for processing), But 
after processing I need to put processed-words back to the orginal string with orginal 
demilters un-changed. I tried to track to position of delimit from the orginal string 
by doing

global_pos := global_pos + pos ;

in my while loop, but  ltrim(substr(string, pos + 1), '#')  will make global_pos 
wrong when ltrim trims '#'. Any work-around?

TIA.

Guang

-Original Message-
Stephane Faroult
Sent: Friday, November 21, 2003 4:19 AM
To: Multiple recipients of list ORACLE-L


Guang,

   I agree with your analysis, looping on characters is not the faster you can do, 
simply because there is a significant overhead (compared to C code for instance) in a 
language such as PL/SQL - which might be perfectly acceptable in some circumstances, 
much less so in very repetitive tasks. 'Native compiling', ie turning PL/SQL in C, 
might improve performance. However, in my view the best performance gains you may get 
is by, so to speak, pushing the bulk of the processing deeper into the kernel (which 
isn't by the way exclusive of native compiling). Using a function such as INSTR() will 
be much more efficient than looping on characters.
 I would suggest something such as :
   - First use TRANSLATE() to replace all the characters you want to get rid of by a 
single, well identified character, say # (use CHR() || ... for non printable 
characters - you can build up the string of characters to translate in the 
initialisation section of a package rather than typing it).
   - Start with initializing your string to LTRIM(string, '#')
   - Then as long as pos := INSTR(string, '#') isn't 0,
 get your token as substr(string, 1, pos - 1) then assign ltrim(substr(string, pos 
+ 1), '#') to string (very similar to what you were planning to do with owa).

This will be probably much faster than a character-by-character loop and calls to an 
owa package.

HTH,

Stephane Faroult

- --- Original Message --- -
From: Guang Mei [EMAIL PROTECTED]
To: Multiple recipients of list ORACLE-L
[EMAIL PROTECTED]
Sent: Thu, 20 Nov 2003 19:39:55

Hi:

In my pl/sql program, I want to process each word
in a string. The
string is selected from a varchar2(300) column. The
delimit that separates
the words is not necessary space character. The
definition of the delimit
in this program is set as

1. Any character that is NOT AlphaNumerical (0-9,
A-Z,a-z)
and
2. the character is not one of these:  '-.,/*_'

Now my program is basically checking each
character, find the delimit, and
rebuild each word. After that I process each
word. The code looks like
this:

---
str :=  This will be a long string with length
upto 300 characters, it
may contain some invisible characters';
len := length(str)+1;
  for i in 1..len loop
ch := substr(str,i,1);
if (not strings.isAlnum(ch) and
instr('-.,/*_', ch)1)  then
  if word is not null then
-- do some processing to variable word !
word := null;-- reset it
  end if;
else
  word := word || ch;   -- concat ch to word
end if;
  end loop;

---

I think It's taking too long because it loops
through each characters. I
hope I could find a way to speed it up. I don't
have experiience in
owa_pattern, but I thought there might be a way to
do it here:


str :=  This will be a long string with length
upto 300 characters, it
may contain some invisible characters';
newstr := str;
pos := 1;
while pos != 0 loop
pos := owa_pattern.amatch(newstr, 1, '\W');   
-- how can I mask out
these  '-.,/*_'  ???
word := substr(newstr, 1, pos-1);
-- do some processing to variable word !
if pos != 0 

Re: pl/sql question and owa_pattern question

2003-11-21 Thread Mladen Gogala
PL/SQL is the fastest thing of them all when it comes to executing 
SQL commands, but there are things which simply aren't practical 
in 9.2 PL/SQL. Regular expression processing is one of those things.
Fortunately, you can mix the two. Without DBI, perl scripts simply
woudn't be very useful. Of course, there are things that are faster
then even the fastest perl script. Lexer written in C is one of them
and you don't need much work to write one, either, but using OCI is
not easy. OCI is a library written to confuse the enemy, not to help
developer. Using plain and simple regex or PCRE within a C program
is the same thing as above, but slightly more complicated then a lexer.
For the specific task of manipulating patterns and resolving regular
expressions, I use perl almost exclusively because I find it an optimal 
tradeoff between ease of use and performance. If performance is a 
paramount, as in real time application processing, then you'll have to 
resort to C and, possibly, write an external procedure and, thus,
enabling oracle to use C regex calls or even pcre. I was toying with the 
idea of enabling oracle to use PCRE but I gave up when I read that 10g 
will have that included.

On 11/21/2003 11:59:31 AM, Guang Mei wrote:
 Perl is a good tool for text processing. But our program is already written
 in pl/sql long time ago and there are intensive db calls in this pl/sql
 program. (text processing is only part of it). So I can not change that.
 
 BTW I did a comparison study a while ago for some of our pl/sql packages
 (specifically for our application). When there are lots of db calls (select,
 insert, update and delete), pl/sql package is faster than correponding perl
 program (I made sure sqls are prepared once and used bind variables in perl.
 All code were executed on the unix server, no other programs were running,
 etc). That's why we stick to pl/sql because our app need the performance.
 Others may have different results, it all depends on what the code does.
 
 Guang
 
 -Original Message-
 Mladen Gogala
 Sent: Thursday, November 20, 2003 11:14 PM
 To: Multiple recipients of list ORACLE-L
 
 
 I don't know about PL/SQL but here is how I would get separate words from a
 big string:
 
 #!/usr/bin/perl -w
 use strict;
 my (@ARR);
 while () {
 chomp;
 @ARR = split(/[^0-9a-zA-Z_\.,]/);
 foreach (@ARR) {
 print $_\n;
 }
 }
 
 There is something called DBI and it can be used to insert separated words
 into the database, instead
 of printing them. The bottom line is that perl is an excellent tool for
 parsing strings and  all sorts of string
 manipulation.
 
 On 2003.11.20 22:39, Guang Mei wrote:
  Hi:
 
  In my pl/sql program, I want to process each word in a string. The
  string is selected from a varchar2(300) column. The delimit that separates
  the words is not necessary space character. The definition of the delimit
  in this program is set as
 
  1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z)
  and
  2. the character is not one of these:  '-.,/*_'
 
  Now my program is basically checking each character, find the delimit, and
  rebuild each word. After that I process each word. The code looks like
  this:
 
  ---
  str :=  This will be a long string with length upto 300 characters, it
  may contain some invisible characters';
  len := length(str)+1;
for i in 1..len loop
  ch := substr(str,i,1);
  if (not strings.isAlnum(ch) and instr('-.,/*_', ch)1)  then
if word is not null then
  -- do some processing to variable word !
  word := null;-- reset it
end if;
  else
word := word || ch;   -- concat ch to word
  end if;
end loop;
 
  ---
 
  I think It's taking too long because it loops through each characters. I
  hope I could find a way to speed it up. I don't have experiience in
  owa_pattern, but I thought there might be a way to do it here:
 
  
  str :=  This will be a long string with length upto 300 characters, it
  may contain some invisible characters';
  newstr := str;
  pos := 1;
  while pos != 0 loop
  pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out
  these  '-.,/*_'  ???
  word := substr(newstr, 1, pos-1);
  -- do some processing to variable word !
  if pos != 0 then
newstr := substr(newstr, pos+1);
  end if;
  end loop;
  --
 
  My simple tests showed that owa_pattern call is much slower than direct
  string manupilation. But I would like to try it in this case if I could
  easily get the wrods from the string. Any suggestions?
 
  TIA.
 
  Guang
 
 
  --
  Please see the official ORACLE-L FAQ: http://www.orafaq.net
  --
  Author: Guang Mei
INET: [EMAIL PROTECTED]
 
  Fat City Network Services-- 858-538-5051 http://www.fatcity.com
  San Diego, California-- Mailing list and web hosting services
  -
  To REMOVE yourself from this 

Re: pl/sql question and owa_pattern question

2003-11-21 Thread Daniel Hanks
Would extproc_perl fit well enough, though, until 10g is here?

On Fri, 21 Nov 2003, Mladen Gogala wrote:

 PL/SQL is the fastest thing of them all when it comes to executing 
 SQL commands, but there are things which simply aren't practical 
 in 9.2 PL/SQL. Regular expression processing is one of those things.
 Fortunately, you can mix the two. Without DBI, perl scripts simply
 woudn't be very useful. Of course, there are things that are faster
 then even the fastest perl script. Lexer written in C is one of them
 and you don't need much work to write one, either, but using OCI is
 not easy. OCI is a library written to confuse the enemy, not to help
 developer. Using plain and simple regex or PCRE within a C program
 is the same thing as above, but slightly more complicated then a lexer.
 For the specific task of manipulating patterns and resolving regular
 expressions, I use perl almost exclusively because I find it an optimal 
 tradeoff between ease of use and performance. If performance is a 
 paramount, as in real time application processing, then you'll have to 
 resort to C and, possibly, write an external procedure and, thus,
 enabling oracle to use C regex calls or even pcre. I was toying with the 
 idea of enabling oracle to use PCRE but I gave up when I read that 10g 
 will have that included.
 
 On 11/21/2003 11:59:31 AM, Guang Mei wrote:
  Perl is a good tool for text processing. But our program is already written
  in pl/sql long time ago and there are intensive db calls in this pl/sql
  program. (text processing is only part of it). So I can not change that.
  
  BTW I did a comparison study a while ago for some of our pl/sql packages
  (specifically for our application). When there are lots of db calls (select,
  insert, update and delete), pl/sql package is faster than correponding perl
  program (I made sure sqls are prepared once and used bind variables in perl.
  All code were executed on the unix server, no other programs were running,
  etc). That's why we stick to pl/sql because our app need the performance.
  Others may have different results, it all depends on what the code does.
  
  Guang
  
  -Original Message-
  Mladen Gogala
  Sent: Thursday, November 20, 2003 11:14 PM
  To: Multiple recipients of list ORACLE-L
  
  
  I don't know about PL/SQL but here is how I would get separate words from a
  big string:
  
  #!/usr/bin/perl -w
  use strict;
  my (@ARR);
  while () {
  chomp;
  @ARR = split(/[^0-9a-zA-Z_\.,]/);
  foreach (@ARR) {
  print $_\n;
  }
  }
  
  There is something called DBI and it can be used to insert separated words
  into the database, instead
  of printing them. The bottom line is that perl is an excellent tool for
  parsing strings and  all sorts of string
  manipulation.
  
  On 2003.11.20 22:39, Guang Mei wrote:
   Hi:
  
   In my pl/sql program, I want to process each word in a string. The
   string is selected from a varchar2(300) column. The delimit that separates
   the words is not necessary space character. The definition of the delimit
   in this program is set as
  
   1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z)
   and
   2. the character is not one of these:  '-.,/*_'
  
   Now my program is basically checking each character, find the delimit, and
   rebuild each word. After that I process each word. The code looks like
   this:
  
   ---
   str :=  This will be a long string with length upto 300 characters, it
   may contain some invisible characters';
   len := length(str)+1;
 for i in 1..len loop
   ch := substr(str,i,1);
   if (not strings.isAlnum(ch) and instr('-.,/*_', ch)1)  then
 if word is not null then
   -- do some processing to variable word !
   word := null;-- reset it
 end if;
   else
 word := word || ch;   -- concat ch to word
   end if;
 end loop;
  
   ---
  
   I think It's taking too long because it loops through each characters. I
   hope I could find a way to speed it up. I don't have experiience in
   owa_pattern, but I thought there might be a way to do it here:
  
   
   str :=  This will be a long string with length upto 300 characters, it
   may contain some invisible characters';
   newstr := str;
   pos := 1;
   while pos != 0 loop
   pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out
   these  '-.,/*_'  ???
   word := substr(newstr, 1, pos-1);
   -- do some processing to variable word !
   if pos != 0 then
 newstr := substr(newstr, pos+1);
   end if;
   end loop;
   --
  
   My simple tests showed that owa_pattern call is much slower than direct
   string manupilation. But I would like to try it in this case if I could
   easily get the wrods from the string. Any suggestions?
  
   TIA.
  
   Guang
  
  
   --
   Please see the official ORACLE-L FAQ: http://www.orafaq.net
   --
   Author: Guang Mei
 INET: [EMAIL PROTECTED]
  
   Fat City 

pl/sql question and owa_pattern question

2003-11-20 Thread Guang Mei
Hi:

In my pl/sql program, I want to process each word in a string. The
string is selected from a varchar2(300) column. The delimit that separates
the words is not necessary space character. The definition of the delimit
in this program is set as

1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z)
and
2. the character is not one of these:  '-.,/*_'

Now my program is basically checking each character, find the delimit, and
rebuild each word. After that I process each word. The code looks like
this:

---
str :=  This will be a long string with length upto 300 characters, it
may contain some invisible characters';
len := length(str)+1;
  for i in 1..len loop
ch := substr(str,i,1);
if (not strings.isAlnum(ch) and instr('-.,/*_', ch)1)  then
  if word is not null then
-- do some processing to variable word !
word := null;-- reset it
  end if;
else
  word := word || ch;   -- concat ch to word
end if;
  end loop;

---

I think It's taking too long because it loops through each characters. I
hope I could find a way to speed it up. I don't have experiience in
owa_pattern, but I thought there might be a way to do it here:


str :=  This will be a long string with length upto 300 characters, it
may contain some invisible characters';
newstr := str;
pos := 1;
while pos != 0 loop
pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out
these  '-.,/*_'  ???
word := substr(newstr, 1, pos-1);
-- do some processing to variable word !
if pos != 0 then
  newstr := substr(newstr, pos+1);
end if;
end loop;
--

My simple tests showed that owa_pattern call is much slower than direct
string manupilation. But I would like to try it in this case if I could
easily get the wrods from the string. Any suggestions?

TIA.

Guang


-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Guang Mei
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).


Re: pl/sql question and owa_pattern question

2003-11-20 Thread Mladen Gogala
I don't know about PL/SQL but here is how I would get separate words from a big string:

#!/usr/bin/perl -w
use strict;
my (@ARR);
while () {
chomp;
@ARR = split(/[^0-9a-zA-Z_\.,]/);
foreach (@ARR) {
print $_\n;
}
}

There is something called DBI and it can be used to insert separated words into the 
database, instead
of printing them. The bottom line is that perl is an excellent tool for parsing 
strings and  all sorts of string
manipulation.

On 2003.11.20 22:39, Guang Mei wrote:
 Hi:
 
 In my pl/sql program, I want to process each word in a string. The
 string is selected from a varchar2(300) column. The delimit that separates
 the words is not necessary space character. The definition of the delimit
 in this program is set as
 
 1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z)
 and
 2. the character is not one of these:  '-.,/*_'
 
 Now my program is basically checking each character, find the delimit, and
 rebuild each word. After that I process each word. The code looks like
 this:
 
 ---
 str :=  This will be a long string with length upto 300 characters, it
 may contain some invisible characters';
 len := length(str)+1;
   for i in 1..len loop
 ch := substr(str,i,1);
 if (not strings.isAlnum(ch) and instr('-.,/*_', ch)1)  then
   if word is not null then
 -- do some processing to variable word !
 word := null;-- reset it
   end if;
 else
   word := word || ch;   -- concat ch to word
 end if;
   end loop;
 
 ---
 
 I think It's taking too long because it loops through each characters. I
 hope I could find a way to speed it up. I don't have experiience in
 owa_pattern, but I thought there might be a way to do it here:
 
 
 str :=  This will be a long string with length upto 300 characters, it
 may contain some invisible characters';
 newstr := str;
 pos := 1;
 while pos != 0 loop
 pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out
 these  '-.,/*_'  ???
 word := substr(newstr, 1, pos-1);
 -- do some processing to variable word !
 if pos != 0 then
   newstr := substr(newstr, pos+1);
 end if;
 end loop;
 --
 
 My simple tests showed that owa_pattern call is much slower than direct
 string manupilation. But I would like to try it in this case if I could
 easily get the wrods from the string. Any suggestions?
 
 TIA.
 
 Guang
 
 
 -- 
 Please see the official ORACLE-L FAQ: http://www.orafaq.net
 -- 
 Author: Guang Mei
   INET: [EMAIL PROTECTED]
 
 Fat City Network Services-- 858-538-5051 http://www.fatcity.com
 San Diego, California-- Mailing list and web hosting services
 -
 To REMOVE yourself from this mailing list, send an E-Mail message
 to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
 the message BODY, include a line containing: UNSUB ORACLE-L
 (or the name of mailing list you want to be removed from).  You may
 also send the HELP command for other information (like subscribing).
 

-- 
Mladen Gogala
Oracle DBA
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Mladen Gogala
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).