Re: Cleaning Text & Replace string speed

2017-06-16 Thread Keith Culotta via 4D_Tech
Even better. With Match regex it takes about a second to remove a small number of characters from this file ($text;32;126;9;13, i.e. keep codes 32-126 plus tab and CR). However, when only a few characters are allowed to remain in the result ($text;65;90;9;13, i.e. A-Z plus tab and CR), the time goes up into minutes: 8 minutes compiled and 11 interpreted. I guess it's all about the payload.
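One plausible reading of "the payload", offered as an untested hypothesis: when only A-Z, tab and CR are kept, the text breaks into many short matches, so the loop makes many Match regex calls and many $out:=$out+... concatenations, each of which may copy the whole result built so far. A minimal sketch under that assumption collects matches in a small buffer and appends the buffer to the large result only occasionally. The method name RegexKeep and the 64K flush size are hypothetical, and the sketch has not been timed against the figures above.

```4d
  // Method: RegexKeep (hypothetical name, not from the thread)
  // $1: Text - text to filter
  // $2: Text - regex matching runs of characters to keep, e.g. "[\\u0041-\\u005A]+"
  // $0: Text - all matched runs, concatenated in order
C_TEXT($0;$1;$2;$in;$pattern;$out;$chunk)
C_LONGINT($start;$pos;$len;$flushAt)

$in:=$1
$pattern:=$2
$flushAt:=65536  // assumed buffer size, not tuned

$start:=1
While (Match regex($pattern;$in;$start;$pos;$len))
	  // append the short match to the small buffer instead of the large result
	$chunk:=$chunk+Substring($in;$pos;$len)
	If (Length($chunk)>=$flushAt)
		$out:=$out+$chunk  // the large result is copied only once per flush
		$chunk:=""
	End if 
	$start:=$pos+$len  // resume searching right after this match
End while 

$0:=$out+$chunk  // flush whatever is left in the buffer
```

For the uppercase case above, the call would be $clean:=RegexKeep($text;"[\\u0041-\\u005A\\u0009\\u000D]+"). The regex work is unchanged; only the way the result is assembled differs.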

Thanks,
Keith - CDI

> On Jun 15, 2017, at 9:08 PM, Keisuke Miyako via 4D_Tech 
> <4d_tech@lists.4d.com> wrote:
> 
> as a match regex exercise, you could do...

**
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**

Re: Cleaning Text & Replace string speed

2017-06-15 Thread Keisuke Miyako via 4D_Tech
as a match regex exercise, you could do...

$test:=StringOmit ("abcdefghijklmnopqrstuvwxyz";\
Character code("k");Character code("m");\
Character code("d");Character code("y"))
  // $test is "dklmy": k-m is the allowed range, d and y are additional allowed characters

  // 
  // Method: StringOmit
  // -   Uses Match regex to collect runs of allowed characters
  // INPUT1: Text - to strip
  // INPUT2: Longint - lowest allowed character code
  // INPUT3: Longint - highest allowed character code
  // INPUT{4}: Longint - additional allowed character codes
  //
  // OUTPUT:  Text - with remaining characters
  // 

C_TEXT($1;$in;$0;$out)
C_LONGINT($2;$3)
C_LONGINT(${4})

C_LONGINT($i;$pos;$len)
C_TEXT($min;$max;$motif)

  // String(code;"&x") is assumed to return a value like "0x006B";
  // dropping the "0x" leaves the 4 hex digits needed for a \uhhhh escape
$min:="\\u"+Substring(String($2;"&x");3)
$max:="\\u"+Substring(String($3;"&x");3)

  // character class: the allowed range first, then each additional allowed code
$motif:="["+$min+"-"+$max

For ($i;4;Count parameters)
$motif:=$motif+"\\u"+Substring(String(${$i};"&x");3)
End for

$motif:=$motif+"]+"  // one or more allowed characters in a row

$in:=$1

$i:=1

  // append each run of allowed characters, then resume searching after it
While (Match regex($motif;$in;$i;$pos;$len))
$out:=$out+Substring($in;$pos;$len)
$i:=$pos+$len
End while

$0:=$out
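To make the loop's mechanics concrete, here is a single Match regex call of the kind it makes, run on the example alphabet with a pattern for the range k-m only (hex 6B-6D). This is an illustration written for this note, not code from the thread. Match regex returns True and fills its last two parameters with the position and length of the first match; the loop then resumes its search at $pos+$len.

```4d
  // One Match regex call, using a pattern for the allowed range k..m only
C_BOOLEAN($found)
C_LONGINT($pos;$len)

$found:=Match regex("[\\u006B-\\u006D]+";"abcdefghijklmnopqrstuvwxyz";1;$pos;$len)
  // $found is True, $pos is 11 (the "k"), $len is 3 ("klm");
  // the While loop would continue searching from $pos+$len, i.e. 14
```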




Cleaning Text & Replace string speed

2017-06-15 Thread Keith Culotta via 4D_Tech
I could not find it in the archives, so I quickly wrote a method to strip two unwanted but unidentified characters from a body of text (6 MB). Each acceptable character was appended to the output text variable. I stopped it after an hour of the compiled code not returning a result.

Since I am always impressed at how fast 4D commands are, I gave Replace string a try.
In interpreted mode, the same text file took 37 seconds to clean out the two offending characters, and 17 seconds to remove everything except uppercase characters. In compiled mode, both tests took about 4 seconds.

Here's the code.

Keith - CDI

  // 
  // Method: StringOmit
  // -   Uses REPLACE STRING* to clear characters
  // INPUT1: Text - to strip
  // INPUT2: Longint - lowest allowed character code
  // INPUT3: Longint - highest allowed character code
  // INPUT{4}: Longint - additional allowed character codes
  //
  // OUTPUT:  Text - with remaining characters
  // 
C_TEXT($inText;$1;$0;$outText)
C_LONGINT($i;$len;$low;$2;$high;$3;$cp;$cc;$start)
C_BOOLEAN($hasAdded;$canAdd)

$inText:=$1
$len:=Length($inText)
$low:=$2
$high:=$3

  // collect any additional allowed codes ($4, $5, ...) into an array
$cp:=Count parameters
If ($cp>3)
$hasAdded:=True
ARRAY LONGINT($added;0)
For ($i;4;$cp)
APPEND TO ARRAY($added;${$i})
End for 
End if 

  // $i starts past the end so the While test passes the first time;
  // a For loop that runs to completion leaves $i at $len+1, which ends the While
$start:=1
$i:=$len+2

If ($hasAdded)  // also test the array of additional characters
While ($start<=$len) & ($i>($len+1))
For ($i;$start;$len)
$cc:=Character code($inText[[$i]])
  // 4D evaluates | and & strictly left to right; the parentheses make that explicit
If ((($cc<$low) | ($cc>$high)) & (Find in array($added;$cc)<1))
  // remove every occurrence of this character in one native call
$inText:=Replace string($inText;Char($cc);"";*)
$start:=$i  // the text has shifted left, so rescan from this position
$i:=$len+2  // leave the For loop
$len:=Length($inText)
End if 
End for 
End while 

Else   // no point in testing an empty array 
While ($start<=$len) & ($i>($len+1))
For ($i;$start;$len)
$cc:=Character code($inText[[$i]])
If ($cc<$low) | ($cc>$high)
  // remove every occurrence of this character in one native call
$inText:=Replace string($inText;Char($cc);"";*)
$start:=$i  // the text has shifted left, so rescan from this position
$i:=$len+2  // leave the For loop
$len:=Length($inText)
End if 
End for 
End while 

End if 

$0:=$inText
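A different single-pass approach, sketched here as an assumption rather than anything tested in the thread: precompute a Boolean lookup table indexed by character code, then compact the text in place by copying each allowed character forward with character-reference assignment ($text[[$write]]:=...), so the result is never rebuilt by concatenation. StringKeep_InPlace is a hypothetical name; it takes the same parameters as StringOmit and assumes every character code falls within 0-65535 (4D text is UTF-16). Whether it beats the Replace string version on a 6 MB file, especially interpreted, is untested.

```4d
  // Method: StringKeep_InPlace (hypothetical name, not from the thread)
  // $1: Text - to strip; $2/$3: lowest/highest allowed character code
  // ${4}: additional allowed character codes; $0: text with only allowed characters
C_TEXT($0;$1;$text)
C_LONGINT($2;$3)
C_LONGINT(${4})
C_LONGINT($i;$read;$write;$len;$cc)

$text:=$1
$len:=Length($text)

  // lookup table indexed by character code (element 0 exists, so code 0 is covered);
  // assumes all codes passed in are within 0..65535
ARRAY BOOLEAN($allowed;65535)
For ($i;$2;$3)
	$allowed{$i}:=True
End for 
For ($i;4;Count parameters)
	$cc:=${$i}
	$allowed{$cc}:=True
End for 

  // two-pointer compaction: copy each allowed character forward inside $text
$write:=0
For ($read;1;$len)
	$cc:=Character code($text[[$read]])
	If ($allowed{$cc})
		$write:=$write+1
		$text[[$write]]:=$text[[$read]]
	End if 
End for 

  // everything after position $write is leftover and is dropped
$0:=Substring($text;1;$write)
```

Called like the thread's example, StringKeep_InPlace("abcdefghijklmnopqrstuvwxyz";Character code("k");Character code("m");Character code("d");Character code("y")) should also yield "dklmy". The table lookup replaces Find in array per character.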


