Re: Counting # of specific characters in string/text

2017-05-08 Thread Alan Chan via 4D_Tech
Hi Chip,

Thanks for your advice. The discussion is about how well optimized Replace string 
is in v16, especially with the * option, even compared with Position(*), which 
is supposed to be very fast. It's not about how to design a 4D app with those 
commands.

Alan Chan

4D iNug Technical <4d_tech@lists.4d.com> writes:
>If all we are doing is talking about "how fast is this" then this 
>discussion has no relation to real world usage. 

**
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**

Re: Counting # of specific characters in string/text

2017-05-08 Thread Chip Scheide via 4D_Tech
The "problem" of using/not using * is knowing whether it is helpful.

If I am looking for all instances of 'e' in the text of a book I can 
NOT use * as e ≠ E ≠ ë etc.
So... either I would need to do repeated loops through the text using 
variations of 'e' (e, E, ë, etc), or I can not use *.

If all we are doing is talking about "how fast is this" then this 
discussion has no relation to real world usage. 

Knowing that it takes nearly 20 times as long for Position to find a 
character without * is important information, as is knowing that your 
test(s) always used *.

In your example, 4 million characters, 100,000 instances without * 
should be (14 * 20) = ~280ms, assuming that Position does not slow down 
even further on multiple instances. While 280 ms is not long, if you 
need to loop over this 100 times it shows in your UI.

14ms * 100 = 1400ms = 1.4 sec  -- user *might* notice
280ms * 100 = 28000ms = 28 sec -- user has powered off the system and 
is restarting thinking the machine froze.  :)

so.. like I said, knowing this is important.



On Sun, 07 May 2017 03:54:22 +0800, Alan Chan via 4D_Tech wrote:
> Hi Chip
> 
> We have been talking about * all along. Without * has never been in my test.
> 
> My test always considers the total length of the text "and" the total count of 
> occurrences. My test is for a 4,300,000-character text and 100,000 
> occurrences. It took 14ms on my machine.
> 
> Alan Chan
> 
> 4D iNug Technical <4d_tech@lists.4d.com> writes:
>> v13 testing
>> position - no * find character (u + umlaut) end of 2,000,000 length text
>> 56ms +/- ms
>> 
>> Position (*) - find same character at end of 2,000,000 length text
>> 3 ms +/- 1ms
>> 
>> so... when you know that diacriticality, or capitalization, is 
>> important Position with * is MUCH faster -- at least in v13
> 
---
Gas is for washing parts
Alcohol is for drinkin'
Nitromethane is for racing 

Re: Counting # of specific characters in string/text

2017-05-07 Thread Arnaud de Montard via 4D_Tech

> On 6 May 2017, at 19:31, Alan Chan via 4D_Tech <4d_tech@lists.4d.com> wrote:
> 
> Hi Arnaud,
> 
> First of all, it seems we have reverse definition of Diacritical. 

Yes, it's quite confusing, I don't like to use it. 

Diacritic: "a glyph added to a letter" (a+`=à, ’+e=é…). 
But "diacritical search" means nothing to me if "sensitive or insensitive" 
isn't mentioned. 
Here they do it and it's not confusing: 


Ligature: "two or more graphemes or letters are joined as a single glyph" 
(o+e=œ, a+e=æ…). 
In French 4D, search correctly compares some of these. 
They are not diacritics, as far as I know. 

I suppose other languages have their own particularities. 

All this to say that I prefer to say "strict": it's short, and it's obvious that it 
means "strict comparison of character codes". And "not strict" takes into account 
various rules depending on the language. 

-- 
Arnaud de Montard 





Re: Counting # of specific characters in string/text

2017-05-06 Thread Alan Chan via 4D_Tech
Hi Chip

We have been talking about * all along. Without * has never been in my test.

My test always considers the total length of the text "and" the total count of 
occurrences. My test is for a 4,300,000-character text and 100,000 occurrences. 
It took 14ms on my machine.

Alan Chan

4D iNug Technical <4d_tech@lists.4d.com> writes:
>v13 testing
>position - no * find character (u + umlaut) end of 2,000,000 length text
>56ms +/- ms
>
>Position (*) - find same character at end of 2,000,000 length text
>3 ms +/- 1ms
>
>so... when you know that diacriticality, or capitalization, is important 
>Position with * is MUCH faster -- at least in v13


Re: Counting # of specific characters in string/text

2017-05-06 Thread Chip Scheide via 4D_Tech
v13 testing
position - no * find character (u + umlaut) end of 2,000,000 length text
56ms +/- ms

Position (*) - find same character at end of 2,000,000 length text
3 ms +/- 1ms

so... when you know that diacriticality, or capitalization, is important 
Position with * is MUCH faster -- at least in v13
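
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of how such a test could be set up. It is not the code that produced the numbers above; the variable names and the generated test text are assumptions made for illustration, and timings will differ by version and machine.

```4d
// Hypothetical timing sketch: Position with and without * on a long text
C_TEXT($text)
C_LONGINT($t0;$pos;$msStrict;$msLoose)

// Assumed test data: 2,000,000 filler characters, target character at the end
$text:=("a"*2000000)+"ü"

// Strict search: * compares character codes
$t0:=Milliseconds
$pos:=Position("ü";$text;1;*)
$msStrict:=Milliseconds-$t0

// Non-strict search: case and diacritics are ignored
$t0:=Milliseconds
$pos:=Position("ü";$text)
$msLoose:=Milliseconds-$t0

ALERT("with *: "+String($msStrict)+" ms, without *: "+String($msLoose)+" ms")
```

A single call is a coarse measurement; repeating each search in a loop and averaging would probably give steadier figures.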


> Oops... my bad. Reading too fast. Please ignore this.
> 
> Alan Chan
> 
> 4D iNug Technical <4d_tech@lists.4d.com> writes:
>> I found out why your result is quite different from mine. In your 
>> code, you evaluate the time once per execution of Position. However, 
>> Replace string was evaluated once per 10,000 replacements (per 
>> execution of Replace string).
> 
> 

Hell is other people 
 Jean-Paul Sartre

Re: Counting # of specific characters in string/text

2017-05-06 Thread Alan Chan via 4D_Tech
Oops... my bad. Reading too fast. Please ignore this.

Alan Chan

4D iNug Technical <4d_tech@lists.4d.com> writes:
>I found out why your result is quite different from mine. In your code, you 
>evaluate the time once per execution of Position. However, Replace string 
>was evaluated once per 10,000 replacements (per execution of Replace string).



Re: Counting # of specific characters in string/text

2017-05-06 Thread Alan Chan via 4D_Tech
Hi Arnaud,

I found out why your result is quite different from mine. In your code, you 
evaluate the time once per execution of Position. However, Replace string 
was evaluated once per 10,000 replacements (per execution of Replace string).

Alan Chan

4D iNug Technical <4d_tech@lists.4d.com> writes:
>since v15R3, using 'Replace string' to count always seems faster to me than 
>'Position', although with [compiled + strict comparison] they are about the 
>same. It may be a detail, but instead of measuring the time to loop $i times, 
>I often prefer to count how many iterations are executed during a given time; 
>it makes switching between interpreted and compiled easier. All results below 
>are compiled, all 32-bit versions except v16 in 64-bit. 
>
>The little v12 test base I've used is here:
>


Re: Counting # of specific characters in string/text

2017-05-06 Thread Arnaud de Montard via 4D_Tech

> On 6 May 2017, at 12:07, Alan Chan via 4D_Tech <4d_tech@lists.4d.com> wrote:
> 
> I believe it's true only for non-diacritical operation. However, for 
> diacritical operation - replace string(*) and position(*), Position(*) is 
> still faster than replace string.

Hi Alan, 
since v15R3, using 'Replace string' to count always seems faster to me than 
'Position', although with [compiled + strict comparison] they are about the 
same. It may be a detail, but instead of measuring the time to loop $i times, 
I often prefer to count how many iterations are executed during a given time; 
it makes switching between interpreted and compiled easier. All results below 
are compiled, all 32-bit versions except v16 in 64-bit. 
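
As an illustration of counting iterations within a fixed time rather than timing a fixed loop, a minimal sketch might look like this. It is not the test base mentioned in this message; the one-second budget, the variable names and the generated text are assumptions.

```4d
// Hypothetical sketch: count how many times a command runs within a time budget
C_TEXT($text;$tmp)
C_LONGINT($start;$budget;$iterations)

$text:="e"*100000  // assumed test data: 100,000 occurrences of "e"
$budget:=1000  // run for roughly one second
$iterations:=0

$start:=Milliseconds
While ((Milliseconds-$start)<$budget)
  $tmp:=Replace string($text;"e";"";*)  // command under test
  $iterations:=$iterations+1
End while

// A higher $iterations means a faster command for this input
ALERT(String($iterations)+" iterations in "+String($budget)+" ms")
```

Running the same loop with a Position-based count in place of Replace string would give a comparable figure, and the budget stays the same whether the code runs interpreted or compiled.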

The little v12 test base I've used is here:


-- 
Arnaud de Montard 




Re: Counting # of specific characters in string/text

2017-05-05 Thread Arnaud de Montard via 4D_Tech

> On 5 May 2017, at 22:48, Jeremy Roussak via 4D_Tech <4d_tech@lists.4d.com> 
> wrote:
> 
> Position allows you to specify where in the text the search is to start. You 
> don’t need to use Replace string: just start searching from the last found 
> position+1.

For those interested I posted a version using Replace string optimisation when 
possible here:

(thank you, Vincent)

-- 
Arnaud de Montard 




Re: Counting # of specific characters in string/text

2017-05-05 Thread Jeremy Roussak via 4D_Tech
Position allows you to specify where in the text the search is to start. You 
don’t need to use Replace string: just start searching from the last found 
position+1.
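
A minimal sketch of that idea, assuming a strict (character code) comparison is wanted; the variable names are only illustrative:

```4d
// Hypothetical sketch: count occurrences using only Position and its start parameter
C_TEXT($text;$find)
C_LONGINT($count;$pos)

$text:="Now is the time for all good men to come to the aid of their country"
$find:="e"
$count:=0

$pos:=Position($find;$text;1;*)  // * compares character codes
While ($pos>0)
  $count:=$count+1
  // Resume right after the match; the text itself is never copied or modified
  $pos:=Position($find;$text;$pos+Length($find);*)
End while

ALERT(String($count))
```

Advancing by Length($find) counts non-overlapping matches; advancing by 1 instead would also count overlapping ones, which only matters when the search string is longer than one character.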

Jeremy


Jeremy Roussak
j...@mac.com



> On 4 May 2017, at 17:58, John Baughman via 4D_Tech <4d_tech@lists.4d.com> 
> wrote:
> 
> Quick and dirty ideas…
> 
> $text:="Now is the time for all good men to come to the aid of their country"
> ARRAY TEXT($aWords;0)
> GET TEXT KEYWORDS($text;$aWords)
> $wordCount:=Count in array($aWords;"The")
> 
> $textTest:=$text  // work on a copy (COPY ARRAY only applies to arrays)
> $characterCount:=0
> While (Position("e";$textTest)>0)
> $characterCount:=$characterCount+1
> $textTest:=Replace string($textTest;"e";"";1)  // remove one occurrence per pass
> End while 
> 
> 
> John

Re: Counting # of specific characters in string/text

2017-05-05 Thread Keith Culotta via 4D_Tech
I haven't tested this for speed or unexpected conditions, but...

$a:="now is the timetime for all good men to come to the aid of their country"
$l1:=Length($a)

$b:="time"
$lb:=Length($b)

$a:=Replace string($a;$b;"";*)
$l2:=Length($a)

$l3:=($l1-$l2)/$lb  // same result: 4D evaluates left to right; parentheses added for clarity

ALERT(String($l3))


Keith - CDI

> On May 4, 2017, at 7:22 AM, Jörg Knebel via 4D_Tech <4d_tech@lists.4d.com> 
> wrote:
> 
> Hi all
> 
> I’m wondering if someone has a clever routine to count the number of 
> appearances of a specific character/word in a string/text and is willing to 
> share it.
> 
> Thanks
> 
> Cheers
> Jörg
> 
> 

Re: Counting # of specific characters in string/text

2017-05-04 Thread John Baughman via 4D_Tech
Quick and dirty ideas…

$text:="Now is the time for all good men to come to the aid of their country"
ARRAY TEXT($aWords;0)
GET TEXT KEYWORDS($text;$aWords)
$wordCount:=Count in array($aWords;"The")

$textTest:=$text  // work on a copy (COPY ARRAY only applies to arrays)
$characterCount:=0
While (Position("e";$textTest)>0)
$characterCount:=$characterCount+1
$textTest:=Replace string($textTest;"e";"";1)  // remove one occurrence per pass
End while 


John

Re: Counting # of specific characters in string/text

2017-05-04 Thread Arnaud de Montard via 4D_Tech

> On 4 May 2017, at 15:11, Vincent de Lachaux via 4D_Tech <4d_tech@lists.4d.com> 
> wrote:
> 
> Something like:
> 
> [using Replace string]

About using Replace string and "big" text (example: count lines in a csv):

 
With 4D v16 compiled, Replace string is now always faster than the loop with 
Position, especially when diacritical (e=é…). But before, it was the wheel 
of death…

-- 
Arnaud de Montard 




Re: Counting # of specific characters in string/text

2017-05-04 Thread Vincent de Lachaux via 4D_Tech
Something like:

  // 
  // Project method : str_Occurences
  // Database: Sandbox
  // ID[3FBA4FA5A1B74C9A87061DA1D97D5E51]
  // Created #2-1-2015 by Vincent de Lachaux
  // 
  // Description:
  //
  // 
  // Declarations
C_LONGINT($0)
C_TEXT($1)
C_TEXT($2)
C_BOOLEAN($3)

C_BOOLEAN($Boo_diacritical)
C_LONGINT($Lon_afterLength;$Lon_beforeLength;$Lon_delta;$Lon_occurences;$Lon_parameters)
C_TEXT($Txt_match;$Txt_target)

If (False)
C_LONGINT(str_Occurences ;$0)
C_TEXT(str_Occurences ;$1)
C_TEXT(str_Occurences ;$2)
C_BOOLEAN(str_Occurences ;$3)
End if

  // unitTest:
If (False)
ASSERT(str_Occurences ("Hello world!";"l")=3)
ASSERT(str_Occurences ("Hello world!";"Hello")=1)
ASSERT(str_Occurences ("Hello world!";"hello")=0)
ASSERT(str_Occurences ("Hello world!";"hello";False)=1)

End if
  // 
  // Initialisations
$Lon_parameters:=Count parameters

If (Asserted($Lon_parameters>=2;"Missing parameter"))

  //Required parameters
$Txt_target:=$1
$Txt_match:=$2

$Boo_diacritical:=True

  //Optional parameters
If ($Lon_parameters>=3)

$Boo_diacritical:=$3

End if

Else

ABORT

End if

  // 

$Lon_beforeLength:=Length($Txt_target)

If ($Boo_diacritical)

  //comparisons will be based on character codes
$Txt_target:=Replace string($Txt_target;$Txt_match;"";*)

Else

$Txt_target:=Replace string($Txt_target;$Txt_match;"")

End if

$Lon_afterLength:=Length($Txt_target)

$Lon_delta:=$Lon_beforeLength-$Lon_afterLength

$Lon_occurences:=Choose(Length($Txt_match)=1;$Lon_delta;$Lon_delta/Length($Txt_match))

  // 
  // Return
$0:=$Lon_occurences

  // 
// End


vincent de lachaux

Bee green - keep it on the screen




Re: Counting # of specific characters in string/text

2017-05-04 Thread Arnaud de Montard via 4D_Tech

> On 4 May 2017, at 14:22, Jörg Knebel via 4D_Tech <4d_tech@lists.4d.com> wrote:
> 
> Hi all
> 
> I’m wondering if someone has a clever routine to count the number of 
> appearances of a specific character/word in a string/text and is willing to 
> share it.

I use this one:

//Str_count (find_t;in_t {;"*"}) -> long
//$3 optional: pass "*" for strict comparison (default not strict)
//example :
//  $text:="eéèEÈÉ"
//  $what:="e"
//  $nb1:=Str_count ($what;$text) -> 6
//  $nb2:=Str_count ($what;$text;"*") -> 1
C_LONGINT($0)
C_TEXT($1)
C_TEXT($2)
C_TEXT($3)

C_BOOLEAN($strict_b)
C_LONGINT($findLen_l)
C_LONGINT($len_l)
C_LONGINT($out_l)
C_LONGINT($params_l)
C_LONGINT($start_l)
C_LONGINT($textLen_l)
C_TEXT($find_t)
C_TEXT($in_t)
C_TEXT($nmc_t)

If (False)
C_LONGINT(Str_count ;$0)
C_TEXT(Str_count ;$1)
C_TEXT(Str_count ;$2)
C_TEXT(Str_count ;$3)
End if

//_
$out_l:=-1  //error
$nmc_t:=Current method name
$params_l:=Count parameters
Case of
: (Not(Asserted($params_l>1;$nmc_t+" 2 parameters required")))
//error
Else
$find_t:=$1
$in_t:=$2
$findLen_l:=Length($find_t)
$textLen_l:=Length($in_t)
$out_l:=0
$strict_b:=($params_l>2)

Case of
: ($findLen_l=0)
: ($findLen_l>$textLen_l)
Else
$start_l:=1
$len_l:=$findLen_l
If ($strict_b)
$start_l:=Position($find_t;$in_t;$start_l;*)
While ($start_l>0)
$out_l:=$out_l+1
$start_l:=$start_l+$len_l
$start_l:=Position($find_t;$in_t;$start_l;*)
End while
Else
$start_l:=Position($find_t;$in_t;$start_l;$len_l)
While ($start_l>0)
$out_l:=$out_l+1
$start_l:=$start_l+$len_l

$start_l:=Position($find_t;$in_t;$start_l;$len_l)
End while
End if
End case

End case
$0:=$out_l
//_

-- 
Arnaud de Montard 





Counting # of specific characters in string/text

2017-05-04 Thread Jörg Knebel via 4D_Tech
Hi all

I’m wondering if someone has a clever routine to count the number of 
appearances of a specific character/word in a string/text and is willing to 
share it.

Thanks

Cheers
Jörg

