RE: OMG text processing performance 6.7 - 9.5 - correction

2020-02-06 Thread Neville via use-livecode
Belay my claim about the offsets found from using an offset search on raw text and on the utf-8 version of that text giving exactly the same offset numbers for corresponding hits - they don’t of course. The offsets reported in the raw text are binary 8-bit character offsets, the offsets
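
[Illustration] The difference between the two kinds of offset can be shown outside LiveCode. What follows is a minimal Python sketch, not LiveCode script and not Neville's test code; the sample sentence and the helper names char_offset and byte_offset are assumptions made for illustration. Offsets are 1-based, with 0 meaning "not found", as with LiveCode's offset function.

```python
def char_offset(needle: str, text: str) -> int:
    """1-based character offset of needle in text, or 0 if absent."""
    return text.find(needle) + 1


def byte_offset(needle: str, text: str) -> int:
    """1-based byte offset of needle in the UTF-8 bytes of text, or 0 if absent."""
    return text.encode("utf-8").find(needle.encode("utf-8")) + 1


# Hypothetical sample: one multi-byte character ("é") precedes the hit.
sample = "Les Misérables: Jean Valjean"
chars = char_offset("Valjean", sample)
raw = byte_offset("Valjean", sample)
print(chars, raw)
```

The byte offset exceeds the character offset by the number of extra bytes taken up by multi-byte characters before the hit, so the two only agree on pure ASCII text.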

RE: OMG text processing performance 6.7 - 9.5

2020-02-05 Thread Neville via use-livecode
One further comment … when talking about long text and not using lineOffset I really do mean long. Using the first 500 KB of Les Miserables as source text, the times for a simple Parse 1 offset search with skip for *both* raw text and utf-8 were 1 ms, and for lineOffset 10 ms and 14 ms

RE: OMG text processing performance 6.7 - 9.5

2020-02-05 Thread Neville via use-livecode
Richard, here is a link to my test stack https://www.dropbox.com/sh/bbpe12p8bf56ofe/AADbhV2LavLP4Y3CZ8Ab8NGia?dl=0 The LesMiserables.txt file is included for convenience; it should be placed in your Documents

Re: OMG text processing performance 6.7 - 9.5

2020-02-04 Thread Richard Gaskin via use-livecode
Super thorough work there, Neville. Thanks. Could I trouble you to post code listings for the various algos? I'd like to try them on my MBOX archives, and they may also be useful for others looking for parsing routines in the archives. -- Richard Gaskin Fourth World Systems Neville

Re: OMG text processing performance 6.7 - 9.5

2020-02-04 Thread Mark Wieder via use-livecode
On 2/4/20 6:43 PM, Colin Holgate via use-livecode wrote: Would have been neat if it took 24601 milliseconds. Chortle -- Mark Wieder ahsoftw...@gmail.com ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to

Re: OMG text processing performance 6.7 - 9.5

2020-02-04 Thread Colin Holgate via use-livecode
Would have been neat if it took 24601 milliseconds.

RE: OMG text processing performance 6.7 - 9.5

2020-02-04 Thread Neville via use-livecode
Just for interest, and to see just how slow lineOffset is, I added a couple more tests to the search for occurrences of “Valjean” in the Gutenberg English translation of Les Miserables. I also wanted to find out how filter performs. The searches were first applied to the raw binary text as read

Re: OMG text processing performance 6.7 - 9.5

2020-02-04 Thread Alex Tweedly via use-livecode
On 04/02/2020 22:12, Richard Gaskin via use-livecode wrote: The code I was using was similar to Alex' itemDel solution, but playing with all three together shows itemDel only slightly faster than delete, and both much faster than traversing in-place with "start at". You know I'm always

RE: OMG text processing performance 6.7 - 9.5

2020-02-04 Thread Neville via use-livecode
I think the recent testing of the Parse1 and Parse2 algorithms must have been on ASCII, not utf-8, text. I tested on the English translation of Les Miserables, to ensure at least a sprinkling of multi-byte characters in the text, and a longish file: 3.4 MB. I tested for the search string

Re: OMG text processing performance 6.7 - 9.5

2020-02-04 Thread Richard Gaskin via use-livecode
Ralph DiMola wrote: > My initial timings were with an earlier v9 version. I will do some > timings on 9.5.1. In the meanwhile I wonder if doing a "delete char > 1 to n of myVar" is more expensive than "put char n to -1 of myVar > into myVar" as I do. I had thought the exercise was to obtain a

RE: OMG text processing performance 6.7 - 9.5

2020-02-04 Thread Ralph DiMola via use-livecode
mation Services rdim...@evergreeninfo.net -Original Message- From: use-livecode [mailto:use-livecode-boun...@lists.runrev.com] On Behalf Of Bob Sneidar via use-livecode Sent: Tuesday, February 04, 2020 2:45 PM To: How to use LiveCode Cc: Bob Sneidar Subject: Re: OMG text processing performance 6.7

Re: OMG text processing performance 6.7 - 9.5

2020-02-04 Thread Bob Sneidar via use-livecode
Heresy! Burn the cretin!!! ;-) Bob S > On Feb 4, 2020, at 10:43 , Richard Gaskin via use-livecode > wrote: > > Hmmm It may be that Mark Waddingham was wrong in the guidance he gave > earlier about Unicode vs memcopy, but I wonder if there may be something else > here.

Re: OMG text processing performance 6.7 - 9.5

2020-02-04 Thread J. Landman Gay via use-livecode
On 2/4/20 12:43 PM, Richard Gaskin via use-livecode wrote: J. Landman Gay wrote: On 2/3/20 2:19 PM, hh via use-livecode wrote: Parse1 is here always at least 30% faster than Parse2. I'm seeing the same thing, only more so. I searched for "the" in a 424K text file: parse1 = 11 ms parse2 =

Re: OMG text processing performance 6.7 - 9.5

2020-02-04 Thread Richard Gaskin via use-livecode
J. Landman Gay wrote: On 2/3/20 2:19 PM, hh via use-livecode wrote: Parse1 is here always at least 30% faster than Parse2. I'm seeing the same thing, only more so. I searched for "the" in a 424K text file: parse1 = 11 ms parse2 = 111 ms Hmmm It may be that Mark Waddingham was wrong

RE: OMG text processing performance 6.7 - 9.5

2020-02-03 Thread J. Landman Gay via use-livecode
Of J. Landman Gay via use-livecode Sent: Monday, February 03, 2020 6:48 PM To: How to use LiveCode Cc: J. Landman Gay Subject: Re: OMG text processing performance 6.7 - 9.5 On 2/3/20 2:19 PM, hh via use-livecode wrote: Parse1 is here always at least 30% faster than Parse2. I'm seeing the same

RE: OMG text processing performance 6.7 - 9.5

2020-02-03 Thread Ralph DiMola via use-livecode
...@lists.runrev.com] On Behalf Of J. Landman Gay via use-livecode Sent: Monday, February 03, 2020 6:48 PM To: How to use LiveCode Cc: J. Landman Gay Subject: Re: OMG text processing performance 6.7 - 9.5 On 2/3/20 2:19 PM, hh via use-livecode wrote: > Parse1 is here always at least 30% faster t

Re: OMG text processing performance 6.7 - 9.5

2020-02-03 Thread J. Landman Gay via use-livecode
On 2/3/20 2:19 PM, hh via use-livecode wrote: Parse1 is here always at least 30% faster than Parse2. I'm seeing the same thing, only more so. I searched for "the" in a 424K text file: parse1 = 11 ms parse2 = 111 ms The text was imported into a field using the property inspector, which I

RE: OMG text processing performance 6.7 - 9.5

2020-02-03 Thread Ralph DiMola via use-livecode
, February 03, 2020 3:19 PM To: use-livecode@lists.runrev.com Cc: hh Subject: Re: OMG text processing performance 6.7 - 9.5 Parse1 is here always at least 30% faster than Parse2. Yet another approach in LC 7/8/9 that I find to be very fast (especially for a lot of hits in large strings, e.g. when

Re: OMG text processing performance 6.7 - 9.5

2020-02-03 Thread hh via use-livecode
Parse1 is here always at least 30% faster than Parse2. Yet another approach in LC 7/8/9 that I find to be very fast (especially for a lot of hits in large strings, e.g. when searching for "and" or "the"): -- Offset per ItemDelimiter -- Searches for pStr in pSrc using pCase function Parse0 pStr,

Re: OMG text processing performance 6.7 - 9.5

2020-02-03 Thread Richard Gaskin via use-livecode
Sannyasin Brahmanathaswami wrote: hhmm…. how do you "truncate the string and search from the beginning" ?? Can you give a code snippet example? BR I found this as well. Another thing, it's faster to truncate the string and search from the beginning than using a "start at" on the entire

Re: OMG text processing performance 6.7 - 9.5

2020-02-03 Thread Sannyasin Brahmanathaswami via use-livecode
hhmm…. how do you "truncate the string and search from the beginning" ?? Can you give a code snippet example? BR I found this as well. Another thing, it's faster to truncate the string and search from the beginning than using a "start at" on the entire string when searching for all

Re: OMG text processing performance 6.7 - 9.5

2020-02-01 Thread Richard Gaskin via use-livecode
Ralph DiMola wrote: I found this as well. Another thing, it's faster to truncate the string and search from the beginning than using a "start at" on the entire string when searching for all occurrences of a string . This was counter intuitive to me until Mark explained that skipping chars
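
[Illustration] The two strategies being compared here can be sketched as follows. This is a Python illustration of the logic only, not LiveCode script and not the Parse1/Parse2 code from the thread; the function names are hypothetical. Python strings behave differently from LiveCode strings, so the sketch says nothing about which approach is faster in LiveCode.

```python
def offsets_with_skip(needle: str, text: str) -> list[int]:
    """Search the full string each time, starting just past the previous hit."""
    hits, skip = [], 0
    while True:
        found = text.find(needle, skip)
        if found == -1:
            return hits
        hits.append(found + 1)            # report 1-based offsets
        skip = found + len(needle)


def offsets_with_truncation(needle: str, text: str) -> list[int]:
    """Cut off everything up to the previous hit, then search from the start."""
    hits, consumed = [], 0
    while True:
        found = text.find(needle)
        if found == -1:
            return hits
        hits.append(consumed + found + 1)  # offset in the original string
        cut = found + len(needle)
        text = text[cut:]                  # truncate the searched string
        consumed += cut


sample = "the cat and the other theme"
print(offsets_with_skip("the", sample))
print(offsets_with_truncation("the", sample))
```

Both functions return the same non-overlapping hits; the only difference is whether the search position or the string itself advances.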

Re: OMG text processing performance 6.7 - 9.5

2020-01-31 Thread Bob Sneidar via use-livecode
That's what I suspect as well. Thanks. Bob S > On Jan 31, 2020, at 09:02 , Mark Waddingham via use-livecode > wrote: > > The code on the LC side is the same (engine and client drivers) so it’s > almost certainly hardware / OS causing the difference. > > Warmest Regards, > > Mark

Re: OMG text processing performance 6.7 - 9.5

2020-01-31 Thread Mark Waddingham via use-livecode
I don’t think you can read too much into differences of as little as 10 ticks - the error in time measurement for a single run would be too high. It seems to make sense that VMs would do the same task slower than the machine that they run on so I don’t think that’s a very interesting comparison.
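
[Illustration] One common way to reduce single-run measurement error is to repeat the task and look at the spread of timings. The following is a minimal Python sketch of that idea, not code from the thread; the helper names time_runs and summarize, and the default of 20 runs, are assumptions. For scale, a LiveCode tick is 1/60 of a second, so 10 ticks is roughly 167 ms.

```python
import statistics
import time


def time_runs(task, runs: int = 20) -> list[float]:
    """Run task repeatedly and return each run's elapsed time in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """Summarize the spread so that one noisy run does not dominate."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Hypothetical workload standing in for the query or parse being measured.
samples = time_runs(lambda: sum(range(100_000)), runs=5)
stats = summarize(samples)
print(stats)
```

If the gap between two machines is smaller than the gap between min and max on one machine, the comparison is probably not telling you much.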

Re: OMG text processing performance 6.7 - 9.5

2020-01-31 Thread Bob Sneidar via use-livecode
I take that back: it's a 32 bit Windows OS (dunno why I even still have this PC). But bitness is not going to affect a single network thread. Also the processor is an i3 running at 3.1. My Mac is an i7 running at 2.3. Also my Mac is clamped to 100mb networking due to the nature of our VoIP

Re: OMG text processing performance 6.7 - 9.5

2020-01-31 Thread Bob Sneidar via use-livecode
Not so fast. On a standalone workstation Windows 7 64bit 16gig memory and an SSD: 64 ticks. Compared with 14 ticks on my Mac OS X. My Parallels VM is outperforming a workstation. Oh, and the Windows workstation? It's the workstation running the mySQL instance!!! That's exactly my point. It is

Re: OMG text processing performance 6.7 - 9.5

2020-01-31 Thread Niggemann, Bernd via use-livecode
Ben, If you have access to a business-license you could use "script profiling" on a small but representative sample of your data and see where the bottlenecks are. If you find any you could try to optimize that part. "script profiling" adds its own overhead to the processing time (roughly

Re: OMG text processing performance 6.7 - 9.5

2020-01-31 Thread Ben Rubinstein via use-livecode
Original Message- From: use-livecode [mailto:use-livecode-boun...@lists.runrev.com] On Behalf Of Neville via use-livecode Sent: Thursday, January 30, 2020 4:49 PM To: use-livecode@lists.runrev.com Cc: Neville Subject: Re: OMG text processing performance 6.7 - 9.5 Are you perchance using lineOff

Re: OMG text processing performance 6.7 - 9.5

2020-01-31 Thread Mark Waddingham via use-livecode
That’s not comparing like-with-like though - you are comparing VMs running Windows on your Mac with your Mac by the sound of it... VMs introduce a fair bit of overhead for all I/O (and also for some code - depending on the age of your CPU and the virtualisation support it has). Mark Sent from

Re: OMG text processing performance 6.7 - 9.5

2020-01-30 Thread Bob Sneidar via use-livecode
Hi Mark. I have to chime in here that the difference between OS X apps accessing a sql database and Windows doing the same thing in the same app is substantial, and I cannot think why, unless it is Windows itself causing the problem. Querying my customer database for all my customer records

RE: OMG text processing performance 6.7 - 9.5

2020-01-30 Thread Ralph DiMola via use-livecode
f Of Neville via use-livecode Sent: Thursday, January 30, 2020 4:49 PM To: use-livecode@lists.runrev.com Cc: Neville Subject: Re: OMG text processing performance 6.7 - 9.5 Are you perchance using lineOffset searches? I have found that lineOffset performance on utf8 text degrades exponentially with th

Re: OMG text processing performance 6.7 - 9.5

2020-01-30 Thread Neville via use-livecode
Are you perchance using lineOffset searches? I have found that lineOffset performance on utf8 text degrades exponentially with the length of the file, presumably as it searches for line breaks. Use offset instead which remains fast (and much faster still if you can search on the raw text before
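
[Illustration] One way to get a line number without a line-based search is to find the character offset first and then count the line breaks before it. This is a Python sketch of that idea under stated assumptions, not LiveCode script and not necessarily what Neville's code does; the function name line_of_first_hit is hypothetical, and the 0-for-not-found convention is borrowed from LiveCode's lineOffset.

```python
def line_of_first_hit(needle: str, text: str) -> int:
    """1-based number of the line containing the first hit, or 0 if absent."""
    idx = text.find(needle)             # plain substring search
    if idx == -1:
        return 0
    return text.count("\n", 0, idx) + 1  # line breaks before the hit, plus one


sample = "one\ntwo Valjean\nthree"
print(line_of_first_hit("Valjean", sample))
```

The sketch assumes "\n" line endings and reports only the first hit.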

Re: OMG text processing performance 6.7 - 9.5

2020-01-30 Thread Matthias Rebbe via use-livecode
Ben, what DB are you connecting to? We are running here a VM with Windows 2019 and MS SQL 2017. On a Windows 10 64bit VM we are using the 32 bit Microsoft ODBC Driver 11 for SQL Server to connect from our 32bit LC standalone to the MSSQL server, although 64bit ODBC Driver 11 is installed.

Re: OMG text processing performance 6.7 - 9.5

2020-01-30 Thread Mark Waddingham via use-livecode
On 2020-01-30 14:38, Ben Rubinstein via use-livecode wrote: Hi Mark, Thanks for taking the time to reply! I'm indeed currently in the process of seeing whether I can persuade the client's IT department to install the 32-bit drivers on the new VM. I'm optimistic that will buy me some time, but

Re: OMG text processing performance 6.7 - 9.5

2020-01-30 Thread Ben Rubinstein via use-livecode
Hi Mark, Thanks for taking the time to reply! I'm indeed currently in the process of seeing whether I can persuade the client's IT department to install the 32-bit drivers on the new VM. I'm optimistic that will buy me some time, but it won't be a complete solution because they outsource

Re: OMG text processing performance 6.7 - 9.5

2020-01-30 Thread Mark Waddingham via use-livecode
On 2020-01-30 13:20, Ben Rubinstein via use-livecode wrote: The context is that I'm finally forced to replace an app that's been processing data for a client for well over a decade. To date the standalone has been built on LC 6.7.11; but now we need to put it on a new platform with 64-bit