RE: looking for faster Ideas...

2004-01-28 Thread Sias, Richard
I tried to compile this and found errors.
1. line 121 missing a ")", I stuck it just before the THEN
2. line 68 truncated. I added "E[N +1,1] = "N" THEN SILENT = 1"

I went back up to the parent URL and found a description in the "C" code to help with the above. The
writers are aware of the truncations etc. The BASIC code was put up UNCHANGED; they then
worked on the "C" code with the described algorithms.

I then switched the variables in the subroutine call statement at line 1 (METAPH, NAME).
I then created an "I" descriptor:  SUBR("MTAPHON", LNAME)
Viewed the items.
Created and built an index on the MTAPHON field.
It seems to work even with my "bad fix".

Then I experimented by changing "4" to "6" in line 23:
--FOR N = 1 TO L WHILE LEN(METAPH) < 6

Rich Sias, DBA
Keystone Mercy Health Plan
215-937-8860


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
Behalf Of Ian McGowan
Sent: Tuesday, January 27, 2004 5:54 PM
To: U2 Users Discussion List
Subject: RE: looking for faster Ideas...


http://aspell.sourceforge.net/metaphone/metaphone.basic

soundex is pathetic - nowadays, metaphone is much better.

if you're feeling perl'ish

http://www.foo.be/docs/tpj/issues/vol5_3/tpj0503-0009.html

has an interesting discussion of using several approximate methods for
identifying records by name.  it even discusses the betty/elizabeth,
jack/john problem...  looks slow so you would probably have to cache the
results. c'mon there must be *something* unique in the file they send!
:-)
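Soundex's weakness is easy to see in a few lines. A sketch of classic American Soundex in Python (illustrative only, not the U2 built-in):

```python
def soundex(name):
    # Classic American Soundex: keep the first letter, code the following
    # consonants as digits, treat H/W as transparent, pad/truncate to 4 chars.
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    result = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":  # H and W do not break a run of equal codes
            prev = code
    return (result + "000")[:4]
```

Note that "Betty" and "Elizabeth" code nowhere near each other, so soundex can't touch the nickname problem above; that needs a synonym table or metaphone-style comparison.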

On Tue, 2004-01-27 at 14:32, George Gallen wrote:
> I thought of that, but soundex only works on the first three letters, if
> I remember correctly.
> or it only encodes the first three letters, then remaining are
> unchanged.
>  

--
CONFIDENTIALITY NOTICE: This electronic mail may contain information that is 
privileged, confidential, and/or otherwise protected from disclosure to anyone other 
than its intended recipient(s). Any dissemination or use of this electronic mail or 
its contents by persons other than the intended recipient(s) is strictly prohibited. 
If you have received this communication in error, please notify the sender immediately 
by reply e-mail so that we may correct our internal records. Please then delete the 
original message. Thank you.


___
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users


RE: looking for faster Ideas...

2004-01-28 Thread Mike Rajkowski
I might not have made myself clear.  If you have 10,000 names that you want
removed, you put them into a hash file, and then process through the csv
file, attempting to read the item from the hash file based on the criteria ( i.e. Name ).

That's a few reads per line, if ordering does not matter.

Otherwise you could potentially have to do 10,000 (multiples more if order
matters) case statements, for each name.
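In Python terms the hash-file idea is a constant-time set lookup per CSV line (a sketch; the field position and the upcase-and-strip key are assumptions):

```python
import csv

def filter_csv(src_path, dst_path, remove_names):
    """Drop rows whose normalized name is on the removal list.

    Assumes the name is the second field, as in the sample line quoted
    later in the thread. The set lookup is O(1) per row, like the single
    hash-file read described above.
    """
    removals = {n.upper().replace(" ", "") for n in remove_names}
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = row[1].upper().replace(" ", "")
            if key not in removals:
                writer.writerow(row)
```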

 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On
Behalf Of George Gallen
Sent: Tuesday, January 27, 2004 1:00 PM
To: 'U2 Users Discussion List'
Subject: RE: looking for faster
Ideas...

 



Mike, doing what you propose would require a massive file to start with, and
would require a crap load of disk reads, which would be far slower than a bunch
of cases, and the project isn't worth that kind of investment anyway. But thanks.

The source line would look something like:

"","jon c smith","1234 anywhere st","","","somecity","SS","12345-1254",""

I'm looking for "smith" & "12345" and sometimes "anywhere".

We may get a call from john smith (john not jon because they didn't spell their
first name), didn't leave their middle init and didn't give us their 9 digit
zip, only 5 digit zip.

So I can't build any indexes. Searching for multiple pieces on the same line
pretty much gives a fairly good matchup considering the source and match data
aren't EXACTLY the same.

And of course, I'm not going to go hog wild in doing this. Creating a temp
file, parsing into dynamic arrays, loops and lookups... way too much; rather
just use PERL to pre-process.
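The CASE-per-person pattern described in this thread can also be written as a data-driven check; a rough Python equivalent (the rule tuples are invented examples):

```python
def should_kick(line, rules):
    """True when every piece of some rule appears in the line.

    Each rule is a tuple of substrings (last name, zip, maybe part of the
    address) -- the same AND-of-INDEX() test as one CASE branch, but held
    as data instead of code.
    """
    line = line.upper()
    return any(all(piece in line for piece in rule) for rule in rules)

# invented example rules
rules = [
    ("SMITH", "12345", "ANYWHERE"),
    ("DOE", "90210"),
]
```

The point of the rewrite is that adding a name becomes a data change rather than a program change.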







-----Original Message-
From: Mike Rajkowski
[mailto:[EMAIL PROTECTED]
Sent: Tuesday,
 January 27, 2004 2:41 PM
To: U2 Users Discussion List
Subject: RE: looking for faster
Ideas...



Create a temp file, and populate it with variations of the name in question
(upcase and remove spaces), storing address information in each record.

Then loop through your list, taking the name, and parsing the various
combinations of the words.
( John David Doe  - JOHNDOE, DOEJOHN, JOHNDAVIDDOE, JOHNDOEDAVID )

And attempt to read the item from the temp file; if it can read an item, then
verify the address information.  Otherwise check the next item.
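The combination step can be sketched with itertools (illustrative; the real version would write these keys to the temp file):

```python
from itertools import permutations

def name_keys(full_name):
    """Every ordered concatenation of 1..n of the name's words, uppercased:
    'John David Doe' yields JOHNDOE, DOEJOHN, JOHNDAVIDDOE, and so on."""
    words = full_name.upper().split()
    keys = set()
    for r in range(1, len(words) + 1):
        for combo in permutations(words, r):
            keys.add("".join(combo))
    return keys
```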

 

 

-Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On
Behalf Of George Gallen
Sent: Tuesday, January 27, 2004 12:13 PM
To: 'U2 Users Discussion List'
Subject: RE: looking for faster
Ideas...

 



In rethinking my take on that: that would still be difficult, since the arrays
would only contain "parts" of the whole fields, making the searching of the
arrays very difficult.

We can't store the exact entry, since sometimes people will call and say stop
sending me things and not give us the name the same way it's in the database
we rent.

Basically it takes the renting company a couple months to remove the name, but
we like to filter it immediately to stop anything from going out before the
renting company removes it, and it also will catch it if the renting company
replaces it a couple months later...

George





 





 -Original Message-
From: George Gallen
[mailto:[EMAIL PROTECTED]
Sent: Tuesday,
 January 27, 2004 2:06 PM
To: 'U2 Users Discussion List'
Subject: RE: looking for faster
Ideas...





I can't just check for names, it has to be a name with a specific zip code,
and if the name is fairly common, we also add in part of the address to make
sure no one else is weeded out that shouldn't be.

I suppose I could keep two or three arrays, do a specific lookup in each
saving the position, and if all three positions are identical (assuming all
three arrays have the name, address, zip in the same order) then that would
be a match...Thanks

George

>-Original Message-
>From: Jeff Schasny [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 1:51 PM
>To: U2 Users Discussion List
>Subject: RE: looking for faster Ideas...
>
>how about keeping a list of excluded names as a record in a file (or as a
>flat file in a directory with each name/item/whatever on a line) and reading
>it into the program as a dynamic array then doing a locate on the string in
>question.  Something like this:
>
>READ ALIST FROM AFILE,SOME-ID ELSE STOP
>X = 0
>LOOP
>   X += 1
>   ASTRING = INLIST<X>
>UNTIL ASTRING = ''
>   LOCATE ASTRING IN ALIST SETTING POS THEN
>      DO
>      OTHER
>      STUFF
>   END ELSE
>      DONT
>   END
>REPEAT
>
>Of course if you really want speed then sort the list and use a "BY"
>clause in the locate

RE: looking for faster Ideas...

2004-01-28 Thread Hamlin, Steve
How about using *nix sort and comm based on a like-structured csv reference
file to produce a sub-file of possible hits, then trawl this output using
D3/UD to refine the list of unwanted rows (building back into a flat file),
and then again using comm to produce your cleaned output file.

Cuts down the file size you'll need to process in mv basic.

Cheers

Steve
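A minimal sketch of that pipeline (file names are placeholders; comm requires both inputs sorted under the same collation):

```shell
# Tiny sample inputs standing in for the real mailing and exclude files
printf '"adams","12345"\n"jones","22222"\n"smith","33333"\n' > mailing.csv
printf '"jones","22222"\n' > exclude.csv

# comm needs both files sorted the same way
sort mailing.csv > mailing.sorted
sort exclude.csv > exclude.sorted

# -23 suppresses lines unique to the exclude file and lines common to both,
# leaving only mailing rows NOT on the exclusion list
comm -23 mailing.sorted exclude.sorted > cleaned.csv
```

The caveat is that comm only drops exact line matches, so the fuzzy-name problem discussed elsewhere in the thread still needs the D3/UD refinement pass.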

  -Original Message-
  From: George Gallen [mailto:[EMAIL PROTECTED]]
  Sent: 27 January 2004 20:04
  To: 'U2 Users Discussion List'
  Subject: RE: looking for faster Ideas...

  keep in mind, it's not the renting company that is giving us the remove
  information, it's the consumer, and of course they never have the mailing
  piece in their hand. Although usually, if they call, we can get the
  specific info we are looking for, which can change the case to one check.

  But when the info is mailed in or emailed in or left on a voice mail,
  that's when we run into not having the best data to go with.
  Calling/emailing/mailing them back usually just increases the annoyance
  level on their end, since we are contacting them Again..

  George

    -Original Message-
    From: George Gallen [mailto:[EMAIL PROTECTED]]
    Sent: Tuesday, January 27, 2004 2:51 PM
    To: 'U2 Users Discussion List'
    Subject: RE: looking for faster Ideas...

    sometimes there is a number, but rarely are we given the number when
    requested to remove, usually just "remove me from your $^&#^$*&$
    mailing" :) some add please.

    I considered PERL as a pre-processor to remove the names then pass that
    file to my program which does other stuff too

    George

    >-Original Message-
    >From: Ian McGowan [mailto:[EMAIL PROTECTED]]
    >Sent: Tuesday, January 27, 2004 2:22 PM
    >To: U2 Users Discussion List
    >Subject: RE: looking for faster Ideas...
    >
    >if speed is the issue, sounds like a job for a compiled language. or
    >semi-compiled like perl or python.
    >
    >is there a unique number sent over by the other system?  it might be
    >quicker to parse the whole thing and keep an exclude file keyed off the
    >unique number.  if it weren't for embedded comma's you could CONVERT ","
    >TO @AM, extract the key and write the record out as-is.  that would be
    >quicker than 852 INDEX's :-)
    >
    >On Tue, 2004-01-27 at 11:05, George Gallen wrote:
    >> I can't just check for names, it has to be a name with a specific zip
    >> code, and if the name is fairly common, we also add in part of the
    >> address to make sure no one else is weeded out that shouldn't be.
    >>
    >> I suppose I could keep two or three arrays, do a specific lookup in
    >> each saving the position, and if all three positions are identical
    >> (assuming all three arrays have the name, address, zip in the same
    >> order) then that would be a match...Thanks
    >>
    >> George
    >>
    >> >-Original Message-
    >> >From: Jeff Schasny [mailto:[EMAIL PROTECTED]]
    >> >Sent: Tuesday, January 27, 2004 1:51 PM
    >> >To: U2 Users Discussion List
    >> >Subject: RE: looking for faster Ideas...
    >> >
    >> >how about keeping a list of excluded names as a record in a
    >> >file (or as a flat file in a directory with each name/item/whatever
    >> >on a line) and reading it into the program as a dynamic array then
    >> >doing a locate on the string in question.  Something like this:
    >> >
    >> >READ ALIST FROM AFILE,SOME-ID ELSE STOP
    >> >X = 0
    >> >LOOP
    >> >   X += 1
    >> >   ASTRING = INLIST<X>
    >> >UNTIL ASTRING = ''
    >> >   LOCATE ASTRING IN ALIST SETTING POS THEN
    >> >      DO
    >> >      OTHER
    >> >      STUFF
    >> >   END ELSE
    >> >      DONT
    >> >   END
    >> >REPEAT
    >> >
    >> >Of course if you really want speed then sort the list and use
    >> >a "BY" clause in the locate
    >> >
    >> >-Original Message-
    >> >From: George Gallen [mailto:[EMAIL PROTECTED]]
    >> >Sent: Tuesday, January 27, 2004 11:33 AM
    >> >To: 'Ardent List'
    >> >Subject: looking for faster Ideas...
    >> >
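Ian's caveat about embedded commas is exactly what a CSV-aware parser exists to handle; in Python terms (illustrative):

```python
import csv
from io import StringIO

line = '"","jon c smith","1234 anywhere st, apt 2","somecity","12345-1254"'

# A naive split (the flat equivalent of CONVERT "," TO @AM) breaks on the
# comma inside the quoted address field; a CSV-aware reader does not.
naive = line.split(",")
fields = next(csv.reader(StringIO(line)))

assert naive[2] != "1234 anywhere st, apt 2"
assert fields[2] == "1234 anywhere st, apt 2"
```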

RE: looking for faster Ideas...

2004-01-28 Thread George Gallen
I like this idea.

Thanks
George

  -Original Message-
  From: Anthony Youngman [mailto:[EMAIL PROTECTED]]
  Sent: Wednesday, January 28, 2004 3:09 AM
  To: U2 Users Discussion List
  Subject: RE: looking for faster Ideas...

  You may find contacting them again isn't the annoyance you expect. People
  tend to get annoyed if they think they're dealing with a computer. Get a
  genuine person to get back to them and say "yes, we're trying to fix this
  for you", and you've just turned someone from being anti into being a
  prospective customer.

  Anyways, my take (to save on all this CASEing ...) - I'd use MATREAD rather
  than Matt's choice of READ, and ... can you preprocess on the basis of,
  say, zip code? Have an MV file containing all the records you want
  excluded, or matchcodes thereof.

  Let's say John Smith of AB12345 contacts you and says "take me off your
  list". You check, and his record has the correct zip code in the CSV. So
  you edit your MV file, and discover that Will Carling also told you to take
  him off some while back.

  ED EXCLUDEFILE AB12345
  -: P
  1: *WILL*CARLING*
  -: I *JOHN*SMITH*
  2: *JOHN*SMITH*
  -: FI

  So now, when you're processing your CSV, from each record you can do:

  extract zip code
  read zip-code-record from EXCLUDEFILE else record is okay
  if record matches LOWER(zip-code-record) else record is okay
  get next record

  Gets rid of reams of case statements, saves you having to rewrite the
  program every time, and is fast because most records will be validated on a
  single (failed) MV read.

  Cheers,
  Wol

    From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of George Gallen
    Sent: 27 January 2004 20:04
    To: 'U2 Users Discussion List'
    Subject: RE: looking for faster Ideas...

    But when the info is mailed in or emailed in or left on a voice mail,
    that's when we run into not having the best data to go with.
    Calling/emailing/mailing them back usually just increases the annoyance
    level on their end, since we are contacting them Again..

    George


RE: looking for faster Ideas...

2004-01-28 Thread George Gallen
interesting.

George

  -Original Message-
  From: Stuart Boydell [mailto:[EMAIL PROTECTED]]
  Sent: Wednesday, January 28, 2004 2:43 AM
  To: U2 Users Discussion List
  Subject: RE: looking for faster Ideas...

  Maybe something like this fuzzy text string searcher might work for you:
  http://www.pmsi.fr/fuzstrng.htm

    -Original Message-
    From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of George Gallen
    Sent: Wednesday, 28 January 2004 09:33
    To: 'U2 Users Discussion List'
    Subject: RE: looking for faster Ideas...

    I thought of that, but soundex only works on the first three letters, if
    I remember correctly. or it only encodes the first three letters, then
    the remaining are unchanged.

    The main problem is I can't isolate a last name from the source, it
    comes in as a full name, and if I use the full name as given to us by
    the consumer, there is a chance it won't be in the same exact format as
    in the file from the rental: it might be missing the middle initial, one
    may have a married hyphenated name, one could be a shortened or
    different first name (ie. betty instead of elizabeth, or jack instead of
    john.. etc).

    Since my original was a list of if/thens, looks like I'm not going to be
    able to gain much in speed any other way with straight programming (that
    is, no temp files, or files to bounce off).

    George


RE: looking for faster Ideas...

2004-01-28 Thread George Gallen
I'm going to have to read this one over a few times.
My brain hurt thinking about it :)

Thanks
George


>-Original Message-
>From: Craig Bennett [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 6:21 PM
>To: U2 Users Discussion List
>Subject: Re: looking for faster Ideas...
>
>
>George,
>
>I don't know if this will help you, but part of the problem with a CASE
>statement is that every statement is tested until you have a match and EVERY
>statement is tested if there is no match. If you don't have a large number
>to remove, this can get very wasteful.
>
>When I need to parse some data and I need to do it fast (and I don't care
>that I may write a very long tedious program (sometimes I even write a
>program to build the final program)) I find that a state machine model with
>computed gosubs based on ASCII character numbers can be quicker.
>
>I started writing the code below, but then remembered I had to work :(
>
>The basic idea is to only test the characters you need and to test them one
>by one, where each letter in a match is another internal subroutine eg:
>
>LOOP WHILE POS LE (DATALEN - MATCHLEN) DO
>   * Just match A-Z and we are only looking for names starting with A and T
>   CHARCODE = SEQ(MYDATA[POS, 1])   ;* Under UV BYTEVAL(MYDATA, POS) is MUCH quicker.
>   ON CHARCODE - 63 GOSUB NOMATCH,
>      FIRSTCHARA,
>      NOMATCH,    ;* B
>      NOMATCH,    ;* C
>      NOMATCH,    ;* D
>      NOMATCH,    ;* E
>      NOMATCH,    ;* F
>      NOMATCH,    ;* G
>      NOMATCH,    ;* H
>      NOMATCH,    ;* I
>      NOMATCH,    ;* J
>      NOMATCH,    ;* K
>      NOMATCH,    ;* L
>      NOMATCH,    ;* M
>      NOMATCH,    ;* N
>      NOMATCH,    ;* O
>      NOMATCH,    ;* P
>      NOMATCH,    ;* Q
>      NOMATCH,    ;* R
>      NOMATCH,    ;* S
>      FIRSTCHART,
>      NOMATCH,    ;* U
>      NOMATCH,    ;* V
>      NOMATCH,    ;* W
>      NOMATCH,    ;* X
>      NOMATCH,    ;* Y
>      NOMATCH,    ;* Z
>      NOMATCH
>
>REPEAT
>
>NOMATCH:
>   * Set a flag to false
>   MATCH.NAME = 0
>RETURN
>
>FIRSTCHARA:
>   POS += 1
>   CHARCODE = SEQ(MYDATA[POS, 1])
>   ON CHARCODE - 63 GOSUB NOMATCH,
>      NOMATCH,    ;* A
>      SECONDCHARB,
>      NOMATCH,    ;* C
>      NOMATCH,    ;* D
>      NOMATCH,    ;* E
>      NOMATCH,    ;* F
>      NOMATCH,    ;* G
>      NOMATCH,    ;* H
>      NOMATCH,    ;* I
>      NOMATCH,    ;* J
>      NOMATCH,    ;* K
>      NOMATCH,    ;* L
>NOMA

RE: looking for faster Ideas...

2004-01-28 Thread George Gallen





True putting the first check in the case, then checking the 2nd and 3rd...
   in the body of the case at first sounds good...but if it fails on the
   2nd or 3rd check in the body of the case, it will no longer check any
   other cases, since it had a positive case found, so I have to have
   all the checks on the case line (does that make any sense?)


As for the 2k blocks. If all this program did was weed out names, you
   are right, that would be a better way to go. However, it also does 
   other things to each line (like put in our own unique mailing code
   for nixie-returns) for all those that aren't supposed to get kicked out.


George


>-Original Message-
>From: Tony Wood [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 5:28 PM
>To: U2 Users Discussion List
>Subject: Re: looking for faster Ideas...
>
>
>Hi George,
>
>We do some processing through files of 5-4Mb in D3 and UniVerse. We found
>one of the quicker ways to process these files was to read about 2k of data
>at a time. You would need to identify the last complete line, work with
>everything before that, keeping the last bit for the next processing chunk.
>
>As far as finding matches, you have one, two or three pieces of data to
>match on. So start with one; if you score a match, then look further. This
>will reduce your processing to quickly find anything that might match,
>rather than having to match on everything for every line. Processing in 2k
>chunks also means you can index for "SMITH" and find none quickly rather
>than processing each line looking for "SMITH" + "" and "SMITH" + "MENERE ST".
>
>I would avoid using index on a line by line basis. I would 
>also look at what
>information you usually get and consider using a record where 
>the item id is
>the key search string. Where you have more than one out-opter 
>you can then
>use either multi-values or attributes to contain the other 
>search criteria.
>
>Sounds a little complicated but it breaks the job into smaller 
>chunks to be
>resolved and will require less processing in the long run I believe.
>
>Good luck
>
>T.
>
>- Original Message - 
>From: "George Gallen" <[EMAIL PROTECTED]>
>To: "'Ardent List'" <[EMAIL PROTECTED]>
>Sent: Wednesday, January 28, 2004 5:33 AM
>Subject: looking for faster Ideas...
>
>
>> I can't setup any indexs to speed this up. Basically I'm 
>scanning a CSV
>file
>> for names to remove
>>    and set the flag of KICK=1 to remove it (creating a new 
>CSV file at the
>> same time).
>>
>> Keep in mind the ".." are people's last names, or zip codes, 
>or part of
>> their address, changed
>> them to ".." to protect the unwanting...
>>
>> Right now, I do a series of CASE's ...
>> Now, it's not a major problem as I'm only checking for 20 or 
>so names, but
>> as more and more people
>>   request to be removed (and we don't have access to the 
>creation of the
>> list). this could get quite
>>   slow over 50 or 60 thousand lines of checking.
>>
>> LIN is one line of the CSV file, the INDEX is checking for a 
>last name & a
>> zip code and sometimes
>>    part of the address line.
>>
>> Any Ideas?
>>
>> Remember, we can't change the source of the file, it will 
>always be a CSV,
>> being read line by line
>>
>>    KICK=0
>>    BEGIN CASE
>>   CASE -1
>>  KICK=1
>> BEGIN CASE
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
>> INDEX(LIN,"..",1)#0
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
>> INDEX(LIN,"..",1)#0
>> CASE IND

RE: looking for faster Ideas...

2004-01-28 Thread Anthony Youngman
You may find contacting them again isn't the annoyance you expect. People
tend to get annoyed if they think they're dealing with a computer. Get a
genuine person to get back to them and say "yes, we're trying to fix this for
you", and you've just turned someone from being anti into being a prospective
customer.

Anyways, my take (to save on all this CASEing ...) - I'd use MATREAD rather
than Matt's choice of READ, and ... can you preprocess on the basis of, say,
zip code? Have an MV file containing all the records you want excluded, or
matchcodes thereof.

Let's say John Smith of AB12345 contacts you and says "take me off your
list". You check, and his record has the correct zip code in the CSV. So you
edit your MV file, and discover that Will Carling also told you to take him
off some while back.

ED EXCLUDEFILE AB12345
-: P
1: *WILL*CARLING*
-: I *JOHN*SMITH*
2: *JOHN*SMITH*
-: FI

So now, when you're processing your CSV, from each record you can do:

extract zip code
read zip-code-record from EXCLUDEFILE else record is okay
if record matches LOWER(zip-code-record) else record is okay
get next record

Gets rid of reams of case statements, saves you having to rewrite the program
every time, and is fast because most records will be validated on a single
(failed) MV read.

Cheers,
Wol
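Wol's scheme, sketched in Python (fnmatchcase stands in for the MV pattern match; the record layout and patterns are assumptions):

```python
from fnmatch import fnmatchcase

# One "record" per zip code, holding a multivalued list of name patterns,
# like the EXCLUDEFILE item edited above.
exclude = {
    "AB12345": ["*WILL*CARLING*", "*JOHN*SMITH*"],
}

def is_excluded(line, zip_code):
    """Most lookups fail immediately -- the single (failed) read Wol describes."""
    patterns = exclude.get(zip_code)
    if not patterns:
        return False          # no record for this zip: line is okay
    line = line.upper()
    return any(fnmatchcase(line, pat) for pat in patterns)
```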


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of George Gallen
Sent: 27 January 2004 20:04
To: 'U2 Users Discussion List'
Subject: RE: looking for faster Ideas...

But when the info is mailed in or emailed in or left on a voice mail, that's
when we run into not having the best data to go with.
Calling/emailing/mailing them back usually just increases the annoyance
level on their end, since we are contacting them Again..

George

***

This transmission is intended for the named recipient only. It may contain private and confidential information. If this has come to you in error you must not act on anything disclosed in it, nor must you copy it, modify it, disseminate it in any way, or show it to anyone. Please e-mail the sender to inform us of the transmission error or telephone ECA International immediately and delete the e-mail from your information system.

Telephone numbers for ECA International offices are: Sydney +61 (0)2 9911 7799, Hong Kong + 852 2121 2388, London +44 (0)20 7351 5000 and New York +1 212 582 2333.

***




RE: looking for faster Ideas...

2004-01-28 Thread Stuart Boydell
Maybe something like this fuzzy text string searcher might work for you:
http://www.pmsi.fr/fuzstrng.htm

  -Original Message-
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of George Gallen
  Sent: Wednesday, 28 January 2004 09:33
  To: 'U2 Users Discussion List'
  Subject: RE: looking for faster Ideas...

  I thought of that, but soundex only works on the first three letters, if I
  remember correctly. or it only encodes the first three letters, then the
  remaining are unchanged.

  The main problem is I can't isolate a last name from the source, it comes
  in as a full name, and if I use the full name as given to us by the
  consumer, there is a chance it won't be in the same exact format as in the
  file from the rental: it might be missing the middle initial, one may have
  a married hyphenated name, one could be a shortened or different first
  name (ie. betty instead of elizabeth, or jack instead of john.. etc).

  Since my original was a list of if/thens, looks like I'm not going to be
  able to gain much in speed any other way with straight programming (that
  is, no temp files, or files to bounce off).

  George
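For comparison, Python's standard library has a simple fuzzy ratio (difflib; illustrative, not the PMSI tool linked above):

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """0.0-1.0 similarity score, order-sensitive; crude but dependency-free."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()
```

A threshold around 0.85-0.9 catches jon/john-style near-misses, but still not betty/elizabeth, which needs a nickname table rather than string distance.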

**
This email message and any files transmitted with it are confidential
and intended solely for the use of addressed recipient(s). If you have 
received this email in error please notify the Spotless IS Support Centre (61 3 9269 7555) immediately who will advise further action.

This footnote also confirms that this email message has been scanned
for the presence of computer viruses.
**




Re: looking for faster Ideas...

2004-01-28 Thread Mats Carlid
George,

my guess is that You use sequential IO on the CSV-file and that is what
eats time. If You have memory enough to read the entire file into memory and
REMOVE lines instead of READSEQ them, You'll see a _dramatic_ performance
increase.

Else split Your case into, say, four cases of roughly the same length, with
all lastnames sorting before, say, 'F' in the first one, those between F and
M in the second, etc. This approach will reduce the time in the CASE
constructs by a factor of four.

Which of course may turn out to be only some percent of total time  :-(

/Mats
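Mats's first point, sketched in Python (one bulk read and one bulk write; `kick` is whatever match test you settle on):

```python
def filter_in_memory(src_path, dst_path, kick):
    """Read the whole CSV once, filter in memory, write the result once --
    instead of line-by-line sequential reads and writes."""
    with open(src_path, encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    kept = [ln for ln in lines if not kick(ln)]
    with open(dst_path, "w", encoding="utf-8") as f:
        for ln in kept:
            f.write(ln + "\n")
```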

George Gallen wrote:

I can't set up any indexes to speed this up. Basically I'm scanning a CSV file
for names to remove
  and set the flag of KICK=1 to remove it (creating a new CSV file at the
same time).
Keep in mind the ".." are people's last names, or zip codes, or part of
their address, changed
them to ".." to protect the unwanting...
Right now, I do a series of CASE's ...
Now, it's not a major problem as I'm only checking for 20 or so names, but
as more and more people
 request to be removed (and we don't have access to the creation of the
list). this could get quite
 slow over 50 or 60 thousand lines of checking.
LIN is one line of the CSV file, the INDEX is checking for a last name & a
zip code and sometimes
  part of the address line.
Any Ideas?

Remember, we can't change the source of the file, it will always be a CSV,
being read line by line
  KICK=0
  BEGIN CASE
 CASE -1
KICK=1
	 BEGIN CASE
   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
INDEX(LIN,"..",1)#0
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
INDEX(LIN,"..",1)#0 
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
	CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
	CASE -1
	   KICK=0
	 END CASE
  END CASE

George Gallen
Senior Programmer/Analyst
Accounting/Data Division
[EMAIL PROTECTED]
ph:856.848.1000 Ext 220
SLACK Incorporated - An innovative information, education and management
company
http://www.slackinc.com


Re: looking for faster Ideas...

2004-01-27 Thread Craig Bennett
George,

I don't know if this will help you, but part of the problem with a CASE
statement is that every statement is tested until you have a match and EVERY
statement is tested if there is no match. If you don't have a large number
to remove, this can get very wasteful.

When I need to parse some data and I need to do it fast (and I don't care
that I may write a very long tedious program (sometimes I even write a
program to build the final program)) I find that a state machine model with
computed gosubs based on ASCII character numbers can be quicker.

I started writing the code below, but then remembered I had to work :(

The basic idea is to only test the characters you need and to test them one
by one, where each letter in a match is another internal subroutine eg:

LOOP WHILE POS LE (DATALEN - MATCHLEN) DO
   * Just match A-Z and we are only looking for names starting with A and T
   CHARCODE = SEQ(MYDATA[POS, 1])   ;* Under UV BYTEVAL(MYDATA, POS) is MUCH quicker.
   ON CHARCODE - 63 GOSUB NOMATCH,
      FIRSTCHARA,
      NOMATCH,    ;* B
      NOMATCH,    ;* C
      NOMATCH,    ;* D
      NOMATCH,    ;* E
      NOMATCH,    ;* F
      NOMATCH,    ;* G
      NOMATCH,    ;* H
      NOMATCH,    ;* I
      NOMATCH,    ;* J
      NOMATCH,    ;* K
      NOMATCH,    ;* L
      NOMATCH,    ;* M
      NOMATCH,    ;* N
      NOMATCH,    ;* O
      NOMATCH,    ;* P
      NOMATCH,    ;* Q
      NOMATCH,    ;* R
      NOMATCH,    ;* S
      FIRSTCHART,
      NOMATCH,    ;* U
      NOMATCH,    ;* V
      NOMATCH,    ;* W
      NOMATCH,    ;* X
      NOMATCH,    ;* Y
      NOMATCH,    ;* Z
      NOMATCH

REPEAT

NOMATCH:
   * Set a flag to false
   MATCH.NAME = 0
RETURN

FIRSTCHARA:
   POS += 1
   CHARCODE = SEQ(MYDATA[POS, 1])
   ON CHARCODE - 63 GOSUB NOMATCH,
      NOMATCH,    ;* A
      SECONDCHARB,
      NOMATCH,    ;* C
      NOMATCH,    ;* D
      NOMATCH,    ;* E
      NOMATCH,    ;* F
      NOMATCH,    ;* G
      NOMATCH,    ;* H
      NOMATCH,    ;* I
      NOMATCH,    ;* J
      NOMATCH,    ;* K
      NOMATCH,    ;* L
      NOMATCH,    ;* M
      NOMATCH,    ;* N
      NOMATCH,    ;* O
      NOMATCH,    ;* P
      NOMATCH,    ;* Q
      NOMATCH,    ;* R
      NOMATCH,    ;* S
      SECONDCHART,
      NOMATCH,    ;* U
      NOMATCH,    ;* V
      NOMATCH,    ;* W
      NOMATCH,    ;* X
      NOMATCH,;* Y
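The computed-GOSUB dispatch above is essentially a hand-built character trie. The same idea in Python (a sketch, with invented sample names): each input character is tested once against a table, instead of once per CASE branch.

```python
def build_trie(words):
    """Nested dicts keyed by character; '$' marks end of a word."""
    root = {}
    for w in words:
        node = root
        for ch in w.upper():
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def contains_any(text, trie):
    """True if any trie word occurs as a substring of text."""
    text = text.upper()
    n = len(text)
    for start in range(n):
        node = trie
        for i in range(start, n):
            node = node.get(text[i])
            if node is None:
                break             # this start position can't match any word
            if "$" in node:
                return True       # a complete word ends here
    return False
```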
 

Re: looking for faster Ideas...

2004-01-27 Thread Tony Wood
Hi George,

We do some processing of files of 4-5 MB in D3 and UniVerse. We found one
of the quicker ways to process these files was to read about 2k of data at a
time. You would need to identify the last complete line, work with everything
before that, and keep the last bit for the next processing chunk.

As far as finding matches, you have one, two or three pieces of data to match
on. So start with one; if you score a match, then look further. This will
reduce your processing to quickly find anything that might match, rather than
having to match on everything for every line. Processing in 2k chunks also
means you can index for "SMITH" and find none quickly, rather than processing
each line looking for "SMITH" + "" and "SMITH" + "MENERE ST".

I would avoid using INDEX on a line-by-line basis. I would also look at what
information you usually get and consider using a record where the item id is
the key search string. Where you have more than one opt-outer you can then
use either multi-values or attributes to contain the other search criteria.

Sounds a little complicated but it breaks the job into smaller chunks to be
resolved and will require less processing in the long run I believe.

Good luck

T.

- Original Message - 
From: "George Gallen" <[EMAIL PROTECTED]>
To: "'Ardent List'" <[EMAIL PROTECTED]>
Sent: Wednesday, January 28, 2004 5:33 AM
Subject: looking for faster Ideas...


> I can't set up any indexes to speed this up. Basically I'm scanning a CSV
> file for names to remove
>and set the flag of KICK=1 to remove it (creating a new CSV file at the
> same time).
>
> Keep in mind the ".." are people's last names, or zip codes, or part of
> their address, changed
> them to ".." to protect the unwanting...
>
> Right now, I do a series of CASE's ...
> Now, it's not a major problem as I'm only checking for 20 or so names, but
> as more and more people
>   request to be removed (and we don't have access to the creation of the
> list). this could get quite
>   slow over 50 or 60 thousand lines of checking.
>
> LIN is one line of the CSV file, the INDEX is checking for a last name & a
> zip code and sometimes
>part of the address line.
>
> Any Ideas?
>
> Remember, we can't change the source of the file, it will always be a CSV,
> being read line by line
>
>KICK=0
>BEGIN CASE
>   CASE -1
>  KICK=1
> BEGIN CASE
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
> INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
> INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> CASE -1
>KICK=0
> END CASE
>END CASE
>
> George Gallen
> Senior Programmer/Analyst
> Accounting/Data Division
> [EMAIL PROTECTED]
> ph:856.848.1000 Ext 220
>
> SLACK Incorporated - An innovative information, education and management
> company
> http://www.slackinc.com
>


___
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users

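George's CASE block boils down to "kick the line if ALL substrings of some rule appear in it". A data-driven sketch in Python (the rule terms below are made-up placeholders, just as the original uses ".."):

```python
# Each rule is a tuple of substrings that must ALL appear somewhere in the
# line for the line to be kicked. Adding an opt-out becomes adding a tuple
# instead of adding a hand-written CASE branch.
RULES = [
    ("smith", "12345", "anywhere"),   # name + zip + street fragment
    ("jones", "08096"),               # name + zip
]

def kick(line, rules=RULES):
    line = line.lower()
    return any(all(term in line for term in rule) for rule in rules)
```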

RE: looking for faster Ideas...

2004-01-27 Thread Ian McGowan
http://aspell.sourceforge.net/metaphone/metaphone.basic

soundex is pathetic - nowadays, metaphone is much better.

if you're feeling perl'ish

http://www.foo.be/docs/tpj/issues/vol5_3/tpj0503-0009.html

has an interesting discussion of using several approximate methods for
identifying records by name.  it even discusses the betty/elizabeth,
jack/john problem...  looks slow so you would probably have to cache the
results. c'mon there must be *something* unique in the file they send!
:-)

On Tue, 2004-01-27 at 14:32, George Gallen wrote:
> I thought of that, but soundex only works on the first three letters, if
> I remember correctly.
> or it only encodes the first three letters, then remaining are
> unchanged.
>  
> The main problem is I can't isolate a last name from the source, it
> comes in as a full name,
> and if I use the full name as given to us by the consumer, there is a
> chance it won't be in
> the same exact format as in the file from the rental: it might be missing
> the middle initial,
> one may have a married hyphenated name, one could be a shortened or
> different first name
> (ie. Betty instead of Elizabeth, or Jack instead of John, etc).
>  
> Since my original was a list of if/thens, looks like I'm not going
> to be able to gain much
> in speed any other way with straight programming (that is no temp files,
> or files to bounce off).
>  
> George
> 
> -Original Message-
> From: Jeff Schasny [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 27, 2004 5:12 PM
> To: U2 Users Discussion List
> Subject: RE: looking for faster Ideas...
> 
> 
> I suppose you could soundex the whole thing
> 
> -Original Message-
> From: Geoffrey Mitchell [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 27, 2004 2:59 PM
> To: U2 Users Discussion List
> Subject: RE: looking for faster Ideas...
> 
> 
> We do something like this, using a "match code" composed of fragments of
> data concatenated together.  I think we use a delimiter, but you
> wouldn't need to.
> 
> So, if you want to match Johnson in zipcode 12345 on Maple street, you
> might have a matchcode of "JOHNSON*12345*MAPLE", so you would extract
> the relevant fields, build the matchcode and check it against a list or
> file.  Actually, we use an I-type dictionary to generate the matchcode,
> and have an index built on it.  For small datasets this may be *slower*
> than your case statement, but I would think that it would be easier to
> maintain, and for large datasets it should be quicker since the time to
> construct the matchcode and do a read, selectindex, or whatever would be
> constant.  Of course, if you have a Jonsson that gets spelled Johnson,
> you're going to have problems no matter how you approach it.
> 
> On Tue, 2004-01-27 at 13:05, George Gallen wrote: 
> 
> I can't just check for names, it has to be a name with a specific zip code
> and if the name is fairly common, we also add in part of the address to
> make sure no one else is weeded out that shouldn't be.
> 
> I suppose I could keep two or three arrays, do a specific lookup in each
> saving the position, and if all three positions are identical (assuming all
> three arrays have the name, address, zip in the same order) then that
> would be a match. Thanks
> 
> George
> 
> >-Original Message-
> >From: Jeff Schasny [  <mailto:[EMAIL PROTECTED]>
> mailto:[EMAIL PROTECTED]
> >Sent: Tuesday, January 27, 2004 1:51 PM
> >To: U2 Users Discussion List
> >Subject: RE: looking for faster Ideas...
> >
> >
> >how about keeping a list of excluded names as a record in a 
> >file (or as a
> >flat file in a directory with each name/item/whatever on a 
> >line) and reading
> >it into the program as a dynamic array then doing a locate on 
> >the string in
> >question.  Something like this:
> >
> >
> >READ ALIST FROM AFILE,SOME-ID ELSE STOP
> >X = 0
> >LOOP
> >   X += 1
> >   ASTRING = INLIST
> >UNTIL ASTRING = ''
> >   LOCATE ASTRING IN ALIST SETTING POS THEN
> >  DO
> >  OTHER
> >  STUFF
> >   END ELSE
> >  DONT
> >   END
> >REPEAT
> >
> >Of course of you really want speed then sort the list and use 
> >a "BY clause
> >in the locate
> >
> >-Original Message-
> >From: George Gallen [  <mailto:[EMAIL PROTECTED]>
> mailto:[EMAIL PROTECTED]
> >Sent: Tuesday, January 27, 2004 11:33 AM
> >To: 'Ardent List'
> >Subject: looking for faster Ideas...
> >
> >
> >I can't setup any indexs to s
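Jeff's "sort the list and use a BY clause in the LOCATE" suggestion, quoted above, corresponds to a binary search over a sorted exclusion list. A Python sketch (names invented):

```python
import bisect

# Membership test against a sorted list: O(log n) per lookup, which is
# what the sorted LOCATE buys over a linear scan.
def in_sorted_list(sorted_names, name):
    i = bisect.bisect_left(sorted_names, name)
    return i < len(sorted_names) and sorted_names[i] == name

excluded = sorted(["DOE", "JONES", "SMITH"])
```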

RE: looking for faster Ideas...

2004-01-27 Thread Jeff Schasny
More than you could have ever possibly wanted to know about soundex:

http://www.avotaynu.com/soundex.html

[snip]
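Soundex comes up repeatedly in this thread; the classic algorithm is short enough to sketch. This is a simplified version in Python (not the list's UniBasic, and not any particular library's implementation), following the common textbook rules:

```python
# Simplified American Soundex: keep the first letter, encode the rest as
# digits, drop vowels/h/w, collapse runs of the same code (h and w do not
# break a run), pad or truncate to 4 characters.
def soundex(name):
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0].upper()
    digits = []
    prev = codes.get(letters[0], "")
    for c in letters[1:]:
        d = codes.get(c, "")
        if d and d != prev:
            digits.append(d)
        if c not in "hw":           # h and w do not reset the previous code
            prev = d
    return (first + "".join(digits) + "000")[:4]
```

This makes George's objection concrete: only the leading letter survives literally; everything after it is reduced to at most three digits.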

RE: looking for faster Ideas...

2004-01-27 Thread George Gallen
I thought of that, but soundex only works on the first three letters, if I
remember correctly. Or it only encodes the first three letters, then the
remaining are unchanged.

The main problem is I can't isolate a last name from the source; it comes
in as a full name, and if I use the full name as given to us by the
consumer, there is a chance it won't be in the same exact format as in the
file from the rental: it might be missing the middle initial, one may have
a married hyphenated name, one could be a shortened or different first name
(ie. Betty instead of Elizabeth, or Jack instead of John, etc).

Since my original was a list of if/thens, it looks like I'm not going to be
able to gain much in speed any other way with straight programming (that
is, no temp files, or files to bounce off).

George

[snip]

RE: looking for faster Ideas...

2004-01-27 Thread Jeff Schasny
I suppose you could soundex the whole thing

[snip]

RE: looking for faster Ideas...

2004-01-27 Thread Geoffrey Mitchell




We do something like this, using a "match code" composed of fragments of data concatenated together.  I think we use a delimiter, but you wouldn't need to.

So, if you want to match Johnson in zipcode 12345 on Maple street, you might have a matchcode of "JOHNSON*12345*MAPLE", so you would extract the relevant fields, build the matchcode and check it against a list or file.  Actually, we use an I-type dictionary to generate the matchcode, and have an index built on it.  For small datasets this may be *slower* than your case statement, but I would think that it would be easier to maintain, and for large datasets it should be quicker since the time to construct the matchcode and do a read, selectindex, or whatever would be constant.  Of course, if you have a Jonsson that gets spelled Johnson, you're going to have problems no matter how you approach it.
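The matchcode scheme can be sketched in Python (standing in for the I-type dictionary plus index; the field choices and sample data are invented for illustration):

```python
# Concatenate the relevant fields with a delimiter and test membership in
# a set, which plays the role of the indexed dictionary lookup.
def matchcode(last_name, zipcode, street):
    return "*".join((last_name.upper(), zipcode[:5], street.upper()))

excluded = {matchcode("Johnson", "12345", "Maple")}

def should_remove(last_name, zipcode, street):
    return matchcode(last_name, zipcode, street) in excluded
```

The cost per line is building one string and one hash lookup, constant regardless of how many opt-outs are on file, which is the scaling argument made above.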

On Tue, 2004-01-27 at 13:05, George Gallen wrote:

I can't just check for names, it has to be a name with a specific zip code
and if the name is fairly common, we also add in part of the address to
make sure no one else is weeded out that shouldn't be.

I suppose I could keep two or three arrays, do a specific lookup in each
saving the position, and if all three positions are identical (assuming all
three arrays have the name, address, zip in the same order) then that would
be a match. Thanks

George

>[snip]

RE: looking for faster Ideas...

2004-01-27 Thread George Gallen
What is it considered if you run the perl program through perl2exe?

Is it compiled then? Or still interpreted with a big library?

George


>-Original Message-
>From: Jeff Schasny [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 3:46 PM
>To: U2 Users Discussion List
>Subject: RE: looking for faster Ideas...
>
>
>What? As opposed to Uni/UV/Pick Basic? Surprise! It compiles 
>to pseudocode
>just like Java. Now if you were to have proposed "C", Fortran, 
>Assembler,
>etc., I could see your point.
>
>-Original Message-
>From: Ian McGowan [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 12:22 PM
>To: U2 Users Discussion List
>Subject: RE: looking for faster Ideas...
>
>
>if speed is the issue, sounds like a job for a compiled language. or
>semi-compiled like perl or python.
>
>[snip]





RE: looking for faster Ideas...

2004-01-27 Thread Jeff Schasny
What? As opposed to Uni/UV/Pick Basic? Surprise! It compiles to pseudocode
just like Java. Now if you were to have proposed "C", Fortran, Assembler,
etc., I could see your point.

-Original Message-
From: Ian McGowan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 27, 2004 12:22 PM
To: U2 Users Discussion List
Subject: RE: looking for faster Ideas...


if speed is the issue, sounds like a job for a compiled language. or
semi-compiled like perl or python.

[snip]


RE: looking for faster Ideas...

2004-01-27 Thread George Gallen
Keep in mind, it's not the renting company that is giving us the remove
information, it's the consumer, and of course they never have the mailing
piece in their hand. Although usually, if they call, we can get the
specific info we are looking for, which can change the case to one check.

But when the info is mailed in or emailed in or left on a voice mail,
that's when we run into not having the best data to go with.
Calling/emailing/mailing them back usually just increases the annoyance
level on their end, since we are contacting them again.

George

  -Original Message-
  From: George Gallen [mailto:[EMAIL PROTECTED]]
  Sent: Tuesday, January 27, 2004 2:51 PM
  To: 'U2 Users Discussion List'
  Subject: RE: looking for faster Ideas...

  Sometimes there is a number, but rarely are we given the number when
  requested to remove; usually it's just "remove me from your $^&#^$*&$
  mailing" :) (some add "please").

  I considered PERL as a pre-processor to remove the names, then pass
  that file to my program, which does other stuff too.

  George

  >-Original Message-
  >From: Ian McGowan [mailto:[EMAIL PROTECTED]]
  >Sent: Tuesday, January 27, 2004 2:22 PM
  >To: U2 Users Discussion List
  >Subject: RE: looking for faster Ideas...
  >
  >if speed is the issue, sounds like a job for a compiled language. or
  >semi-compiled like perl or python.
  >
  >is there a unique number sent over by the other system?  it might be
  >quicker to parse the whole thing and keep an exclude file keyed off the
  >unique number.  if it weren't for embedded comma's you could CONVERT ","
  >TO @AM, extract the key and write the record out as-is.  that would be
  >quicker than 852 INDEX's :-)
  >
  >[snip]

RE: looking for faster Ideas...

2004-01-27 Thread George Gallen
Title: RE: looking for faster Ideas...



Mike,

Doing what you propose would require a massive file to start with, and
would require a crapload of disk reads, which would be far slower than a
bunch of cases, and the project isn't worth that kind of investment
anyway. But thanks.

The source line would look something like:

"","jon c smith","1234 anywhere st","","","somecity","SS","12345-1254",""

I'm looking for "smith" & "12345" and sometimes "anywhere".

We may get a call from john smith (john, not jon, because they didn't
spell their first name), didn't leave their middle init, and didn't
give us their 9 digit zip, only the 5 digit zip.

So I can't build any indexes. Searching for multiple pieces on the same
line pretty much gives a fairly good matchup considering the source and
match data aren't EXACTLY the same.

And of course, I'm not going to go hog wild in doing this. Creating a
temp file, parsing into dynamic arrays, loops and lookups... way too
much; rather just use PERL to pre-process.
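The multi-piece test described above can be sketched in Python for illustration — the field layout and the sample exclusions below are invented, not data from this thread; a row is kicked only when every fragment of some exclusion (last name, 5-digit zip, sometimes a street piece) appears on the same line:

```python
import csv
import io

# Each exclusion is a tuple of fragments that must ALL appear in the row.
# These names/zips are made-up examples for the sketch.
EXCLUSIONS = [
    ("smith", "12345", "anywhere"),
    ("jones", "08096"),
]

def kick(row_fields):
    """Return True if any exclusion's fragments all occur in the row."""
    line = ",".join(row_fields).lower()
    return any(all(frag in line for frag in exc) for exc in EXCLUSIONS)

src = '"","jon c smith","1234 anywhere st","","","somecity","SS","12345-1254",""\n'
rows = list(csv.reader(io.StringIO(src)))
print([kick(r) for r in rows])  # smith + 12345 + anywhere all present -> [True]
```

This mirrors the chain of INDEX() CASE tests: substring containment per fragment, AND'ed together, with partial zips and partial addresses matching naturally.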

  -Original Message-
  From: Mike Rajkowski [mailto:[EMAIL PROTECTED]]
  Sent: Tuesday, January 27, 2004 2:41 PM
  To: U2 Users Discussion List
  Subject: RE: looking for faster Ideas...

  Create a temp file, and populate it with variations of the name in
  question (upcase and remove spaces). (Storing address information in
  each record)

  Then loop through your list, taking the name, and parsing the various
  combinations of the words.

  ( John David Doe - JOHNDOE, DOEJOHN, JOHNDAVIDDOE, JOHNDOEDAVID )

  And attempt to read the item from the temp file; if it can read an
  item, then verify the address information. Otherwise check the next
  item.

  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of George Gallen
  Sent: Tuesday, January 27, 2004 12:13 PM
  To: 'U2 Users Discussion List'
  Subject: RE: looking for faster Ideas...

  in rethinking my take on that. That would still be difficult,
  since the arrays would only contain "parts" of the whole fields,
  making the searching of the arrays very difficult.

  We can't store the exact entry, since sometimes people will call and
  say stop sending me things and not give us the name the same way it's
  in the database we rent.

  Basically it takes the renting company a couple months to remove the
  name, but we like to filter it immediately to stop anything from
  going out before the renting company removes it, and it also will
  catch it if the renting company replaces it a couple months later...

  George

  -Original Message-
  From: George Gallen [mailto:[EMAIL PROTECTED]]
  Sent: Tuesday, January 27, 2004 2:06 PM
  To: 'U2 Users Discussion List'
  Subject: RE: looking for faster Ideas...

I can't just check for names, it has to be a name with
a specific zip code, and if the name is fairly common, we also add in
part of the address to make sure no one else is weeded out that shouldn't
be.

I suppose I could keep two or three arrays, do a
specific lookup in each, saving the position, and if all three positions are
identical (assuming all three arrays have the name, address, zip in the same
order) then that would be a match... Thanks

George

>-Original Message-
>From: Jeff Schasny [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 1:51 PM
>To: U2 Users Discussion List
>Subject: RE: looking for faster Ideas...
>
>how about keeping a list of excluded names as a record in a
>file (or as a
>flat file in a directory with each name/item/whatever on a
>line) and reading
>it into the program as a dynamic array then doing a locate on
>the string in
>question.  Something like this:
>
>READ ALIST FROM AFILE,SOME-ID ELSE STOP
>X = 0
>LOOP
>   X += 1
>   ASTRING = INLIST<X>
>UNTIL ASTRING = ''
>   LOCATE ASTRING IN ALIST SETTING POS THEN
>  DO
>  OTHER
>  STUFF
>   END ELSE
>  DONT
>   END
>REPEAT
>
>Of course, if you really want speed, then sort the list and use
>a "BY" clause
>in the locate
>
>-Original Message-

___
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users


RE: looking for faster Ideas...

2004-01-27 Thread George Gallen





Sometimes there is a number, but rarely are we given
the number when requested to remove; usually it's just
"remove me from your $^&#^$*&$ mailing" :) some add "please".


I considered PERL as a pre-processor to remove the names,
then pass that file to my program, which does other stuff
too.


George


>-Original Message-
>From: Ian McGowan [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 2:22 PM
>To: U2 Users Discussion List
>Subject: RE: looking for faster Ideas...
>
>
>if speed is the issue, sounds like a job for a compiled language. or
>semi-compiled like perl or python.
>
>is there a unique number sent over by the other system?  it might be
>quicker to parse the whole thing and keep an exclude file keyed off the
>unique number.  if it weren't for embedded commas you could 
>CONVERT ","
>TO @AM, extract the key and write the record out as-is.  that would be
>quicker than 852 INDEX's :-)
>
>On Tue, 2004-01-27 at 11:05, George Gallen wrote:
>> I can't just check for names, it has to be a name with a 
>specific zip code 
>> and if the name is fairly common, we also add in part of the 
>address to 
>> make sure no one else is weeded out that shouldn't be. 
>> 
>> I suppose I could keep two or three arrays, do a specific 
>lookup in each
>> 
>> saving the position, and if all three positions are 
>identical (assuming
>> all 
>> three arrays have the name, address, zip in the same order) then that
>> would 
>> be a match... Thanks 
>> 
>> George 
>> 
>> >-----Original Message----- 
>> >From: Jeff Schasny [ mailto:[EMAIL PROTECTED]
>> <mailto:[EMAIL PROTECTED]> ] 
>> >Sent: Tuesday, January 27, 2004 1:51 PM 
>> >To: U2 Users Discussion List 
>> >Subject: RE: looking for faster Ideas... 
>> > 
>> > 
>> >how about keeping a list of excluded names as a record in a 
>> >file (or as a 
>> >flat file in a directory with each name/item/whatever on a 
>> >line) and reading 
>> >it into the program as a dynamic array then doing a locate on 
>> >the string in 
>> >question.  Something like this: 
>> > 
>> > 
>> >READ ALIST FROM AFILE,SOME-ID ELSE STOP 
>> >X = 0 
>> >LOOP 
>> >   X += 1 
>> >   ASTRING = INLIST<X> 
>> >UNTIL ASTRING = '' 
>> >   LOCATE ASTRING IN ALIST SETTING POS THEN 
>> >  DO 
>> >  OTHER 
>> >  STUFF 
>> >   END ELSE 
>> >  DONT 
>> >   END 
>> >REPEAT 
>> > 
>> >Of course, if you really want speed, then sort the list and use 
>> >a "BY" clause 
>> >in the locate 
>> > 
>> >-Original Message- 
>> >From: George Gallen [ mailto:[EMAIL PROTECTED]
>> <mailto:[EMAIL PROTECTED]> ] 
>> >Sent: Tuesday, January 27, 2004 11:33 AM 
>> >To: 'Ardent List' 
>> >Subject: looking for faster Ideas... 
>> > 
>> > 
>> >I can't setup any indexes to speed this up. Basically I'm 
>> >scanning a CSV file 
>> >for names to remove 
>> >   and set the flag of KICK=1 to remove it (creating a new CSV 
>> >file at the 
>> >same time). 
>> > 
>> >Keep in mind the ".." are people's last names, or zip 
>codes, or part of
>> 
>> >their address, changed 
>> >them to ".." to protect the unwanting... 
>> > 
>> >Right now, I do a series of CASE's ... 
>> >Now, it's not a major problem as I'm only checking for 20 or 
>> >so names, but 
>> >as more and more people 
>> >  request to be removed (and we don't have access to the 
>> >creation of the 
>> >list). this could get quite 
>> >  slow over 50 or 60 thousand lines of checking. 
>> > 
>> >LIN is one line of the CSV file, the INDEX is checking for a 
>> >last name & a 
>> >zip code and sometimes 
>> >   part of the address line. 
>> > 
>> >Any Ideas? 
>> > 
>> >Remember, we can't change the source of the file, it will 
>> >always be a CSV, 
>> >being read line by line 
>> > 
>> >   KICK=0 
>> >   BEGIN CASE 
>> >  CASE -1 
>> > KICK=1 
>> >    BEGIN CASE 
>> >    CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND 
>> >INDEX(LIN,"..",1)#0 

RE: looking for faster Ideas...

2004-01-27 Thread Mike Rajkowski









Create a temp file, and populate it with variations of the name in
question (upcase and remove spaces). (Storing address information in
each record)

Then loop through your list, taking the name, and parsing the various
combinations of the words.

( John David Doe - JOHNDOE, DOEJOHN, JOHNDAVIDDOE, JOHNDOEDAVID )

And attempt to read the item from the temp file; if it can read an
item, then verify the address information. Otherwise check the next
item.
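The temp-file scheme above can be sketched in Python — a dict stands in for the keyed temp file, and the sample name and address record are invented for illustration:

```python
from itertools import permutations

def name_keys(full_name):
    """All orderings of the name's words, upcased with spaces removed."""
    words = full_name.upper().split()
    keys = set()
    for r in range(2, len(words) + 1):       # pairs up through all words
        for p in permutations(words, r):
            keys.add("".join(p))
    keys.add("".join(words))                 # single-word names too
    return keys

# The "temp file": name variant -> address record (made-up data)
addr = {"zip": "12345"}
exclude = {k: addr for k in name_keys("John Doe")}

def excluded(caller_name, caller_zip):
    """Try each variant as a key; verify address info on a hit."""
    for k in name_keys(caller_name):
        rec = exclude.get(k)
        if rec and rec["zip"] == caller_zip:
            return True
    return False

print(excluded("Doe John", "12345"))   # True
print(excluded("Doe John", "99999"))   # False
```

The dict lookup plays the role of the hashed-file READ: a miss on every variant means "check the next item."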

 

 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of George Gallen
Sent: Tuesday, January 27, 2004 12:13 PM
To: 'U2 Users Discussion List'
Subject: RE: looking for faster Ideas...

in rethinking my take on that. That would still be difficult,
since the arrays would only contain "parts" of the whole fields,
making the searching of the arrays very difficult.

We can't store the exact entry, since sometimes people will
call and say stop sending me things and not give us the name
the same way it's in the database we rent.

Basically it takes the renting company a couple months to remove
the name, but we like to filter it immediately to stop anything
from going out before the renting company removes it, and it
also will catch it if the renting company replaces it a couple
months later...

George

 -Original Message-
 From: George Gallen [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, January 27, 2004 2:06 PM
 To: 'U2 Users Discussion List'
 Subject: RE: looking for faster Ideas...

I can't just check for names, it has to be a name with a
specific zip code, and if the name is fairly common,
we also add in part of the address to
make sure no one else is weeded out that shouldn't be.

I suppose I could keep two or three arrays, do a
specific lookup in each, saving the position, and if all
three positions are identical (assuming all
three arrays have the name, address, zip in the same order) then that would
be a match... Thanks

George

>-Original Message-
>From: Jeff Schasny [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 1:51 PM
>To: U2 Users Discussion List
>Subject: RE: looking for faster Ideas...
>
>how about keeping a list of excluded names as a record in a
>file (or as a
>flat file in a directory with each name/item/whatever on a
>line) and reading
>it into the program as a dynamic array then doing a locate on
>the string in
>question.  Something like this:
>
>READ ALIST FROM AFILE,SOME-ID ELSE STOP
>X = 0
>LOOP
>   X += 1
>   ASTRING = INLIST<X>
>UNTIL ASTRING = ''
>   LOCATE ASTRING IN ALIST SETTING POS THEN
>  DO
>  OTHER
>  STUFF
>   END ELSE
>  DONT
>   END
>REPEAT
>
>Of course, if you really want speed, then sort the list and use
>a "BY" clause
>in the locate
>
>-Original Message-










___
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users


RE: looking for faster Ideas...

2004-01-27 Thread Ian McGowan
On Tue, 2004-01-27 at 11:12, George Gallen wrote:
> in rethinking my take on that. That would still be difficult
> since the arrays would only contain "parts" of the whole fields.
> making the searching of the arrays very difficult.

ah, then you can't use grep or INDEX on the unparsed line - you have to
parse the line into records first.  some kind of unique key (phone
number?) would be helpful, but you could always have an exclude file
keyed on last name, with an MV list of zip codes: 

MCGOWAN
94111]94598]40210

and exclude them in your program:

LOOP
GOSUB READ.NEXT.LINE
IF DONE THEN EXIT
GOSUB PARSE.LINE
NAME=REC<2>
ZIP=REC<23>
READ EXCLUDE.ZIPS FROM EXCLUDE.FILE, NAME THEN
LOCATE ZIP IN EXCLUDE.ZIPS<1> SETTING POS THEN CONTINUE
END
... MORE PROCESSING ...
REPEAT
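The exclude-record scheme above can be illustrated in Python — a dict plays the exclude file (the `]` in the example record marks multivalues; here it's just a list), and the column positions and sample data are guesses for the sketch, not from the thread:

```python
import csv

# Exclude "file": last name -> list of zip codes (invented sample data)
EXCLUDE = {"MCGOWAN": ["94111", "94598", "40210"]}

def keep(line, name_col=1, zip_col=7):
    """True if the CSV line survives the exclude check.

    Column positions are hypothetical; the zip field may carry a
    zip+4, so compare on the 5-digit prefix.
    """
    fields = next(csv.reader([line]))
    name = fields[name_col].strip().upper()
    zip5 = fields[zip_col][:5] if len(fields) > zip_col else ""
    return zip5 not in EXCLUDE.get(name, [])

print(keep('"","mcgowan","1 main st","","","city","CA","94111-1234",""'))  # False
print(keep('"","mcgowan","1 main st","","","city","CA","60601",""'))       # True
```

The dict `get` is the keyed READ; a missing name means no exclusion applies, and a name hit still requires the zip to match one of the stored multivalues.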

-- 
Ian McGowan <[EMAIL PROTECTED]>

___
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users


RE: looking for faster Ideas...

2004-01-27 Thread Ian McGowan
if speed is the issue, sounds like a job for a compiled language. or
semi-compiled like perl or python.

is there a unique number sent over by the other system?  it might be
quicker to parse the whole thing and keep an exclude file keyed off the
unique number.  if it weren't for embedded commas you could CONVERT ","
TO @AM, extract the key and write the record out as-is.  that would be
quicker than 852 INDEX's :-)
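The embedded-comma caveat is worth seeing concretely: splitting on every comma (what CONVERT "," TO @AM would do) breaks quoted fields apart, while a real CSV parser keeps them whole. A small Python illustration with an invented sample line:

```python
import csv
import io

line = '"","smith, john","1234 any st, apt 2","somecity","12345"'

naive = line.split(",")                        # blind split on every comma
parsed = next(csv.reader(io.StringIO(line)))   # quote-aware parse

print(len(naive))   # 7 -- embedded commas split fields apart
print(len(parsed))  # 5 -- commas inside quotes survive
print(parsed[1])    # smith, john
```

So the key-extraction shortcut only works if the feed guarantees no commas inside quoted fields; otherwise the line has to be properly parsed first.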

On Tue, 2004-01-27 at 11:05, George Gallen wrote:
> I can't just check for names, it has to be a name with a specific zip code 
> and if the name is fairly common, we also add in part of the address to 
> make sure no one else is weeded out that shouldn't be. 
> 
> I suppose I could keep two or three arrays, do a specific lookup in each
> 
> saving the position, and if all three positions are identical (assuming
> all 
> three arrays have the name, address, zip in the same order) then that
> would 
> be a match... Thanks 
> 
> George 
> 
> >-Original Message- 
> >From: Jeff Schasny [ mailto:[EMAIL PROTECTED]
> <mailto:[EMAIL PROTECTED]> ] 
> >Sent: Tuesday, January 27, 2004 1:51 PM 
> >To: U2 Users Discussion List 
> >Subject: RE: looking for faster Ideas... 
> > 
> > 
> >how about keeping a list of excluded names as a record in a 
> >file (or as a 
> >flat file in a directory with each name/item/whatever on a 
> >line) and reading 
> >it into the program as a dynamic array then doing a locate on 
> >the string in 
> >question.  Something like this: 
> > 
> > 
> >READ ALIST FROM AFILE,SOME-ID ELSE STOP 
> >X = 0 
> >LOOP 
> >   X += 1 
> >   ASTRING = INLIST<X> 
> >UNTIL ASTRING = '' 
> >   LOCATE ASTRING IN ALIST SETTING POS THEN 
> >  DO 
> >  OTHER 
> >  STUFF 
> >   END ELSE 
> >  DONT 
> >   END 
> >REPEAT 
> > 
> >Of course, if you really want speed, then sort the list and use 
> >a "BY" clause 
> >in the locate 
> > 
> >-Original Message- 
> >From: George Gallen [ mailto:[EMAIL PROTECTED]
> <mailto:[EMAIL PROTECTED]> ] 
> >Sent: Tuesday, January 27, 2004 11:33 AM 
> >To: 'Ardent List' 
> >Subject: looking for faster Ideas... 
> > 
> > 
> >I can't setup any indexes to speed this up. Basically I'm 
> >scanning a CSV file 
> >for names to remove 
> >   and set the flag of KICK=1 to remove it (creating a new CSV 
> >file at the 
> >same time). 
> > 
> >Keep in mind the ".." are people's last names, or zip codes, or part of
> 
> >their address, changed 
> >them to ".." to protect the unwanting... 
> > 
> >Right now, I do a series of CASE's ... 
> >Now, it's not a major problem as I'm only checking for 20 or 
> >so names, but 
> >as more and more people 
> >  request to be removed (and we don't have access to the 
> >creation of the 
> >list). this could get quite 
> >  slow over 50 or 60 thousand lines of checking. 
> > 
> >LIN is one line of the CSV file, the INDEX is checking for a 
> >last name & a 
> >zip code and sometimes 
> >   part of the address line. 
> > 
> >Any Ideas? 
> > 
> >Remember, we can't change the source of the file, it will 
> >always be a CSV, 
> >being read line by line 
> > 
> >   KICK=0 
> >   BEGIN CASE 
> >  CASE -1 
> > KICK=1 
> >BEGIN CASE 
> >CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND 
> >INDEX(LIN,"..",1)#0 
> >   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> >   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> >   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> >   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> >   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> >   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> >   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> >   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> >   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> >   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> >   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND 
> >INDEX(LIN,"..",1)#0 

RE: looking for faster Ideas...

2004-01-27 Thread George Gallen



in rethinking my take on that. That would still be difficult,
since the arrays would only contain "parts" of the whole fields,
making the searching of the arrays very difficult.

We can't store the exact entry, since sometimes people will
call and say stop sending me things and not give us the name
the same way it's in the database we rent.

Basically it takes the renting company a couple months to remove
the name, but we like to filter it immediately to stop anything
from going out before the renting company removes it, and it
also will catch it if the renting company replaces it a couple
months later...

George

 -Original Message-
 From: George Gallen [mailto:[EMAIL PROTECTED]]
 Sent: Tuesday, January 27, 2004 2:06 PM
 To: 'U2 Users Discussion List'
 Subject: RE: looking for faster Ideas...

  I can't just check for names, it has to be a name with a specific
  zip code, and if the name is fairly common, we also add
  in part of the address to make sure no one else is
  weeded out that shouldn't be.

  I suppose I could keep two or three arrays, do a specific
  lookup in each, saving the position, and if all three
  positions are identical (assuming all three arrays have
  the name, address, zip in the same order) then that would be a
  match... Thanks

  George

  >-Original Message-
  >From: Jeff Schasny [mailto:[EMAIL PROTECTED]]
  >Sent: Tuesday, January 27, 2004 1:51 PM
  >To: U2 Users Discussion List
  >Subject: RE: looking for faster Ideas...
  >
  >how about keeping a list of excluded names as a record in a
  >file (or as a
  >flat file in a directory with each name/item/whatever on a
  >line) and reading
  >it into the program as a dynamic array then doing a locate on
  >the string in
  >question.  Something like this:
  >
  >READ ALIST FROM AFILE,SOME-ID ELSE STOP
  >X = 0
  >LOOP
  >   X += 1
  >   ASTRING = INLIST<X>
  >UNTIL ASTRING = ''
  >   LOCATE ASTRING IN ALIST SETTING POS THEN
  >  DO
  >  OTHER
  >  STUFF
  >   END ELSE
  >  DONT
  >   END
  >REPEAT
  >
  >Of course, if you really want speed, then sort the list and use
  >a "BY" clause
  >in the locate
  >
  >-Original Message-
___
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users


RE: looking for faster Ideas...

2004-01-27 Thread George Gallen





That's why the zip code, and sometimes part of the address,
  is used also; the chances of matching part of the name,
  the zip code, and part of the address and NOT being unique
  are extremely low.


Which is also what complicates this.


George


>-Original Message-
>From: Ian McGowan [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 1:56 PM
>To: U2 Users Discussion List
>Subject: Re: looking for faster Ideas...
>
>
>do it outside basic using
>
>$grep -F -f pattern-file csv-file > remove-file
>
>the pattern file would have the pieces in there.  what if you're
>excluding something that's not unique?  "smith" would exclude
>"smithers", "smithy". "psmith (one for the wodehouse fans :-)" etc.
>
>i do this with some huge syslog files, and fairly big pattern files and
>it's pretty darn quick.  
>
>ian
>
>On Tue, 2004-01-27 at 10:33, George Gallen wrote:
>> I can't setup any indexes to speed this up. Basically I'm 
>scanning a CSV
>> file
>> for names to remove
>>    and set the flag of KICK=1 to remove it (creating a new 
>CSV file at
>> the
>> same time).
>> 
>> Keep in mind the ".." are people's last names, or zip codes, 
>or part of
>> their address, changed
>> them to ".." to protect the unwanting...
>> 
>> Right now, I do a series of CASE's ...
>> Now, it's not a major problem as I'm only checking for 20 or 
>so names,
>> but
>> as more and more people
>>   request to be removed (and we don't have access to the 
>creation of the
>> list). this could get quite
>>   slow over 50 or 60 thousand lines of checking.
>> 
>> LIN is one line of the CSV file, the INDEX is checking for a 
>last name &
>> a
>> zip code and sometimes
>>    part of the address line.
>> 
>> Any Ideas?
>> 
>> Remember, we can't change the source of the file, it will always be a
>> CSV,
>> being read line by line
>> 
>>    KICK=0
>>    BEGIN CASE
>>   CASE -1
>>  KICK=1
>>   BEGIN CASE
>> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
>> INDEX(LIN,"..",1)#0
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
>> INDEX(LIN,"..",1)#0 
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>>      CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>>      CASE -1
>>     KICK=0
>>   END CASE
>>    END CASE
>> 
>> George Gallen
>> Senior Programmer/Analyst
>> Accounting/Data Division
>> [EMAIL PROTECTED]
>> ph:856.848.1000 Ext 220
>> 
>> SLACK Incorporated - An innovative information, education 
>and management
>> company
>> http://www.slackinc.com
>> 
>> ___
>> u2-users mailing list
>> [EMAIL PROTECTED]
>> http://www.oliver.com/mailman/listinfo/u2-users
>-- 
>Ian McGowan <[EMAIL PROTECTED]>
>
>___
>u2-users mailing list
>[EMAIL PROTECTED]
>http://www.oliver.com/mailman/listinfo/u2-users
>



___
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users


Re: looking for faster Ideas...

2004-01-27 Thread Ian McGowan
or actually, since it seems you want to find everyone *except* the
opt-out-er's:

$grep -v -F -f pattern-file csv-file > process-file

and then work thru the process file.  still have a problem with partial
matches, though...

why not get rid of the csv file and keep a record for each user?  then
you could simply add an atb, OPTOUT, and SELECT DIRECTMAILLIST WITH
OPTOUT = ""?  ah, the csv file must be coming from some outside system.

On Tue, 2004-01-27 at 10:56, Ian McGowan wrote:
> do it outside basic using
> 
> $grep -F -f pattern-file csv-file > remove-file
> 
> the pattern file would have the pieces in there.  what if you're
> excluding something that's not unique?  "smith" would exclude
> "smithers", "smithy". "psmith (one for the wodehouse fans :-)" etc.
> 
> i do this with some huge syslog files, and fairly big pattern files and
> it's pretty darn quick.  
> 
> ian
> 
> On Tue, 2004-01-27 at 10:33, George Gallen wrote:
> > I can't setup any indexes to speed this up. Basically I'm scanning a
> CSV
> > file
> > for names to remove
> >and set the flag of KICK=1 to remove it (creating a new CSV file at
> > the
> > same time).
> > 
> > Keep in mind the ".." are people's last names, or zip codes, or part
> of
> > their address, changed
> > them to ".." to protect the unwanting...
> > 
> > Right now, I do a series of CASE's ...
> > Now, it's not a major problem as I'm only checking for 20 or so names,
> > but
> > as more and more people
> >   request to be removed (and we don't have access to the creation of
> the
> > list). this could get quite
> >   slow over 50 or 60 thousand lines of checking.
> > 
> > LIN is one line of the CSV file, the INDEX is checking for a last name
> &
> > a
> > zip code and sometimes
> >part of the address line.
> > 
> > Any Ideas?
> > 
> > Remember, we can't change the source of the file, it will always be a
> > CSV,
> > being read line by line
> > 
> >KICK=0
> >BEGIN CASE
> >   CASE -1
> >  KICK=1
> >  BEGIN CASE
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
> > INDEX(LIN,"..",1)#0
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
> > INDEX(LIN,"..",1)#0 
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> > CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
> > CASE -1
> >KICK=0
> >  END CASE
> >END CASE
> > 
> > George Gallen
> > Senior Programmer/Analyst
> > Accounting/Data Division
> > [EMAIL PROTECTED]
> > ph:856.848.1000 Ext 220
> > 
> > SLACK Incorporated - An innovative information, education and
> management
> > company
> > http://www.slackinc.com
> > 
> > ___
> > u2-users mailing list
> > [EMAIL PROTECTED]
> > http://www.oliver.com/mailman/listinfo/u2-users
-- 
Ian McGowan <[EMAIL PROTECTED]>

___
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users


RE: looking for faster Ideas...

2004-01-27 Thread George Gallen





I can't just check for names, it has to be a name with a specific zip code
and if the name is fairly common, we also add in part of the address to
make sure no one else is weeded out that shouldn't be.


I suppose I could keep two or three arrays, do a specific lookup in each,
saving the position, and if all three positions are identical (assuming all
three arrays have the name, address, zip in the same order) then that would
be a match... Thanks


George


>-Original Message-
>From: Jeff Schasny [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 1:51 PM
>To: U2 Users Discussion List
>Subject: RE: looking for faster Ideas...
>
>
>how about keeping a list of excluded names as a record in a 
>file (or as a
>flat file in a directory with each name/item/whatever on a 
>line) and reading
>it into the program as a dynamic array then doing a locate on 
>the string in
>question.  Something like this:
>
>
>READ ALIST FROM AFILE,SOME-ID ELSE STOP
>X = 0
>LOOP
>   X += 1
>   ASTRING = INLIST<X>
>UNTIL ASTRING = ''
>   LOCATE ASTRING IN ALIST SETTING POS THEN
>  DO
>  OTHER
>  STUFF
>   END ELSE
>  DONT
>   END
>REPEAT
>
>Of course, if you really want speed, then sort the list and use 
>a "BY" clause
>in the locate
>
>-Original Message-
>From: George Gallen [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 11:33 AM
>To: 'Ardent List'
>Subject: looking for faster Ideas...
>
>
>I can't setup any indexes to speed this up. Basically I'm 
>scanning a CSV file
>for names to remove
>   and set the flag of KICK=1 to remove it (creating a new CSV 
>file at the
>same time).
>
>Keep in mind the ".." are people's last names, or zip codes, or part of
>their address, changed
>them to ".." to protect the unwanting...
>
>Right now, I do a series of CASE's ...
>Now, it's not a major problem as I'm only checking for 20 or 
>so names, but
>as more and more people
>  request to be removed (and we don't have access to the 
>creation of the
>list). this could get quite
>  slow over 50 or 60 thousand lines of checking.
>
>LIN is one line of the CSV file, the INDEX is checking for a 
>last name & a
>zip code and sometimes
>   part of the address line.
>
>Any Ideas?
>
>Remember, we can't change the source of the file, it will 
>always be a CSV,
>being read line by line
>
>   KICK=0
>   BEGIN CASE
>  CASE -1
> KICK=1
>    BEGIN CASE
>    CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
>INDEX(LIN,"..",1)#0
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
>INDEX(LIN,"..",1)#0 
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>       CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>       CASE -1
>      KICK=0
>    END CASE
>   END CASE
>
>George Gallen
>Senior Programmer/Analyst
>Accounting/Data Division
>[EMAIL PROTECTED]
>ph:856.848.1000 Ext 220
>
>SLACK Incorporated - An innovative information, education and 
>management
>company
>http://www.slackinc.com
>
>___
>u2-users mailing list
>[EMAIL PROTECTED]
>http://www.oliver.com/mailman/listinfo/u2-users
>___
>u2-users mailing list
>[EMAIL PROTECTED]
>http://www.oliver.com/mailman/listinfo/u2-users
>



___
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users


Re: looking for faster Ideas...

2004-01-27 Thread Ian McGowan
do it outside basic using

$grep -F -f pattern-file csv-file > remove-file

the pattern file would have the pieces in there.  what if you're
excluding something that's not unique?  "smith" would exclude
"smithers", "smithy". "psmith (one for the wodehouse fans :-)" etc.

i do this with some huge syslog files, and fairly big pattern files and
it's pretty darn quick.  

ian

On Tue, 2004-01-27 at 10:33, George Gallen wrote:
> I can't setup any indexes to speed this up. Basically I'm scanning a CSV
> file
> for names to remove
>and set the flag of KICK=1 to remove it (creating a new CSV file at
> the
> same time).
> 
> Keep in mind the ".." are people's last names, or zip codes, or part of
> their address, changed
> them to ".." to protect the unwanting...
> 
> Right now, I do a series of CASE's ...
> Now, it's not a major problem as I'm only checking for 20 or so names,
> but
> as more and more people
>   request to be removed (and we don't have access to the creation of the
> list). this could get quite
>   slow over 50 or 60 thousand lines of checking.
> 
> LIN is one line of the CSV file, the INDEX is checking for a last name &
> a
> zip code and sometimes
>part of the address line.
> 
> Any Ideas?
> 
> Remember, we can't change the source of the file, it will always be a
> CSV,
> being read line by line
> 
>KICK=0
>BEGIN CASE
>   CASE -1
>  KICK=1
>BEGIN CASE
> CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
> INDEX(LIN,"..",1)#0
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
> INDEX(LIN,"..",1)#0 
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>   CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
>   CASE -1
>  KICK=0
>END CASE
>END CASE
> 
> George Gallen
> Senior Programmer/Analyst
> Accounting/Data Division
> [EMAIL PROTECTED]
> ph:856.848.1000 Ext 220
> 
> SLACK Incorporated - An innovative information, education and management
> company
> http://www.slackinc.com
> 
> ___
> u2-users mailing list
> [EMAIL PROTECTED]
> http://www.oliver.com/mailman/listinfo/u2-users
-- 
Ian McGowan <[EMAIL PROTECTED]>

___
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users


RE: looking for faster Ideas...

2004-01-27 Thread Jeff Schasny
how about keeping a list of excluded names as a record in a file (or as a
flat file in a directory with each name/item/whatever on a line) and reading
it into the program as a dynamic array then doing a locate on the string in
question.  Something like this:


READ ALIST FROM AFILE, SOME.ID ELSE STOP  ;* dynamic array of excluded names
X = 0
LOOP
   X += 1
   ASTRING = INLIST<X>  ;* next string to check against the exclusion list
UNTIL ASTRING = ''
   LOCATE ASTRING IN ALIST SETTING POS THEN
      * found in the list - do other stuff (e.g. set KICK = 1)
   END ELSE
      * don't
   END
REPEAT

Of course, if you really want speed, sort the list and use a "BY" clause
in the LOCATE.
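For readers outside Pick BASIC, the same exclusion-list idea can be sketched in Python. This is only an illustration, not anyone's actual code: the names and lines below are invented placeholders, and it assumes the excluded value occupies a whole comma-separated field of the line.

```python
# Sketch of the exclusion-list approach: load the excluded names once
# (here hard-coded; in practice read from a file, one name per line),
# then test each CSV line's fields against the set.
excluded = {"SMITH", "JONES", "DOE"}  # placeholder names

def should_kick(line):
    """True if any field of the CSV line is in the exclusion set."""
    fields = {f.strip().upper() for f in line.split(",")}
    return not excluded.isdisjoint(fields)

lines = ["1,Smith,19145", "2,Brown,08096"]
kept = [ln for ln in lines if not should_kick(ln)]
```

Note this matches whole fields, so "SMITHSON" would not be kicked; if the substring behavior of INDEX is really wanted, test `name in line.upper()` for each name instead. Either way, adding a removal request means adding one entry to the list, not another line of code.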

-Original Message-
From: George Gallen [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 27, 2004 11:33 AM
To: 'Ardent List'
Subject: looking for faster Ideas...


I can't set up any indexes to speed this up. Basically I'm scanning a CSV file
for names to remove, setting the flag KICK=1 to drop a line (and creating a new
CSV file at the same time).

Keep in mind the ".." are people's last names, or zip codes, or part of
their address, changed
them to ".." to protect the unwanting...

Right now, I do a series of CASEs.
It's not a major problem yet, as I'm only checking for 20 or so names, but
as more and more people request to be removed (and we don't have access to
the creation of the list), this could get quite slow over 50 or 60 thousand
lines of checking.

LIN is one line of the CSV file; the INDEX calls check for a last name and a
zip code, and sometimes part of the address line.

Any Ideas?

Remember, we can't change the source of the file, it will always be a CSV,
being read line by line

   KICK=0
   BEGIN CASE
  CASE -1
 KICK=1
 BEGIN CASE
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
INDEX(LIN,"..",1)#0
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND
INDEX(LIN,"..",1)#0 
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 
CASE -1
   KICK=0
 END CASE
   END CASE
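One way to keep a cascade like this from growing with every new request is to make it data-driven: store each rule as a tuple of substrings that must all appear (mirroring the ANDed INDEX tests above) and loop over the rules. A rough Python sketch; the values are invented placeholders standing in for the redacted ".." items:

```python
# Each rule is a tuple of substrings that must ALL appear in the line
# (the AND of the INDEX(LIN,"..",1)#0 tests). A line is kicked when
# ANY one rule matches. All values below are made-up placeholders.
RULES = [
    ("SMITH", "19145"),           # last name AND zip
    ("DOE", "MAIN ST", "08096"),  # last name AND street AND zip
]

def kick(line):
    u = line.upper()
    return 1 if any(all(s in u for s in rule) for rule in RULES) else 0
```

A new removal request then becomes one more entry in RULES (which could live in a file and be read at startup) instead of another hand-written CASE line.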

George Gallen
Senior Programmer/Analyst
Accounting/Data Division
[EMAIL PROTECTED]
ph:856.848.1000 Ext 220

SLACK Incorporated - An innovative information, education and management
company
http://www.slackinc.com

___
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users