Re: Replacing tango.text.Ascii.isearch

2022-10-28 Thread rikki cattermole via Digitalmars-d-learn
On 29/10/2022 11:05 AM, Siarhei Siamashka wrote: And as for the D language and Phobos, should "ß" still uppercase to "SS"? Or can we change it to uppercase "ẞ" and remove German from the list of tricky languages at https://dlang.org/library/std/uni/to_upper.html ? Should Turkish be listed
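
For readers who want to see the behaviour in question, here is a minimal sketch. It assumes the string overload of `std.uni.toUpper` applies the full Unicode case mapping, which is what the quoted question implies; the sample word is only an illustration.

```d
import std.stdio : writeln;
import std.uni : toUpper;

void main()
{
    // The string overload can change the length of the text,
    // because one code point may expand to several.
    string lower = "straße";
    string upper = lower.toUpper;
    writeln(lower, " -> ", upper);
    writeln(lower.length, " code units -> ", upper.length, " code units");
}
```

Switching to "ẞ" (U+1E9E) would make this mapping length-preserving, which is the change the question asks about.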

Re: Replacing tango.text.Ascii.isearch

2022-10-28 Thread Siarhei Siamashka via Digitalmars-d-learn
On Wednesday, 26 October 2022 at 06:05:14 UTC, Ali Çehreli wrote: The problem with Unicode is its main aim of allowing characters of multiple writing systems in the same text. When multiple writing systems are in play, conflicts and ambiguities will appear. I personally don't think that it's

Re: Replacing tango.text.Ascii.isearch

2022-10-26 Thread Ali Çehreli via Digitalmars-d-learn
On 10/25/22 22:49, Siarhei Siamashka wrote:
> Unicode is significantly simpler than a set of various
> incompatible 8-bit encodings

Strongly agreed.

> I'm surely
> able to ignore the peculiarities of modern Turkish Unicode

The problem with Unicode is its main aim of allowing characters of

Re: Replacing tango.text.Ascii.isearch

2022-10-26 Thread rikki cattermole via Digitalmars-d-learn
On 26/10/2022 6:49 PM, Siarhei Siamashka wrote: On Wednesday, 26 October 2022 at 05:17:06 UTC, rikki cattermole wrote: if you are able to ignore that Unicode is a thing, I'd recommend it. It is complicated, as we humans are very complicated ;) I can't ignore Unicode, because I frequently have

Re: Replacing tango.text.Ascii.isearch

2022-10-25 Thread Siarhei Siamashka via Digitalmars-d-learn
On Wednesday, 26 October 2022 at 05:17:06 UTC, rikki cattermole wrote: if you are able to ignore that Unicode is a thing, I'd recommend it. It is complicated, as we humans are very complicated ;) I can't ignore Unicode, because I frequently have to deal with Cyrillic alphabet ;) Also Unicode

Re: Replacing tango.text.Ascii.isearch

2022-10-25 Thread rikki cattermole via Digitalmars-d-learn
On 26/10/2022 6:06 PM, Siarhei Siamashka wrote: Should we ignore the `"D should strive to be correct, rather than fast"` comment from bauss for now? Or some actions can be taken to improve the current situation? Bauss is correct. It should be implemented but it does not need to be fast. But

Re: Replacing tango.text.Ascii.isearch

2022-10-25 Thread Siarhei Siamashka via Digitalmars-d-learn
On Tuesday, 25 October 2022 at 06:32:00 UTC, rikki cattermole wrote: On 25/10/2022 5:17 PM, Siarhei Siamashka wrote: What are the best practices to deal with Turkish text in D language? std.uni doesn't support it. OK, I'm not specifically interested in this personally and I even would be

Re: Replacing tango.text.Ascii.isearch

2022-10-25 Thread rikki cattermole via Digitalmars-d-learn
On 25/10/2022 5:17 PM, Siarhei Siamashka wrote: Wow, I didn't expect anything like this and just thought that the nightmares of handling 8-bit codepages for non-English languages ceased to exist nowadays. Too bad. What are the best practices to deal with Turkish text in D language? std.uni

Re: Replacing tango.text.Ascii.isearch

2022-10-24 Thread Siarhei Siamashka via Digitalmars-d-learn
On Thursday, 13 October 2022 at 08:27:17 UTC, bauss wrote:
```d
bool isearch(S1, S2)(S1 haystack, S2 needle)
{
    import std.uni;
    import std.algorithm;
    return haystack.asLowerCase.canFind(needle.asLowerCase);
}
```
untested. -Steve

This doesn't actually work properly in all

Re: Replacing tango.text.Ascii.isearch

2022-10-13 Thread Patrick Schluter via Digitalmars-d-learn
On Thursday, 13 October 2022 at 08:27:17 UTC, bauss wrote: On Wednesday, 5 October 2022 at 17:29:25 UTC, Steven Schveighoffer wrote: On 10/5/22 12:59 PM, torhu wrote: I need a case-insensitive check to see if a string contains another string for a "quick filter" feature. It should preferably

Re: Replacing tango.text.Ascii.isearch

2022-10-13 Thread rikki cattermole via Digitalmars-d-learn
On 13/10/2022 9:55 PM, bauss wrote: Yeah, text isn't easy :D Indeed! It has me a bit concerned actually, I'm wondering if my string stuff will even work correctly for UI's due to performance issues. My string builder for instance allocates like crazy just to do slicing. But hey, at least

Re: Replacing tango.text.Ascii.isearch

2022-10-13 Thread bauss via Digitalmars-d-learn
On Thursday, 13 October 2022 at 08:48:49 UTC, rikki cattermole wrote: On 13/10/2022 9:42 PM, bauss wrote: Oh and to add onto this, IFF you have to do it the hacky way, then converting to uppercase instead of lowercase should be preferred, because not all lowercase characters can perform round

Re: Replacing tango.text.Ascii.isearch

2022-10-13 Thread rikki cattermole via Digitalmars-d-learn
On 13/10/2022 9:42 PM, bauss wrote: Oh and to add onto this, IFF you have to do it the hacky way, then converting to uppercase instead of lowercase should be preferred, because not all lowercase characters can perform round trip, although a small group of characters, then using uppercase fixes
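
A minimal sketch of what that suggestion would look like when applied to the one-liner posted earlier in the thread. The name `isearchUpper` is made up for this illustration, and the function is still not locale-aware, so it does not fix the Turkish case.

```d
import std.algorithm : canFind;
import std.stdio : writeln;
import std.uni : asUpperCase;

/// Same shape as the earlier isearch, but comparing uppercased ranges.
/// Hypothetical name, not part of Phobos.
bool isearchUpper(S1, S2)(S1 haystack, S2 needle)
{
    return haystack.asUpperCase.canFind(needle.asUpperCase);
}

void main()
{
    writeln(isearchUpper("Hello World", "world")); // true
    writeln(isearchUpper("Hello World", "word"));  // false
}
```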

Re: Replacing tango.text.Ascii.isearch

2022-10-13 Thread bauss via Digitalmars-d-learn
On Thursday, 13 October 2022 at 08:35:50 UTC, bauss wrote: On Thursday, 13 October 2022 at 08:30:04 UTC, rikki cattermole wrote: On 13/10/2022 9:27 PM, bauss wrote: This doesn't actually work properly in all languages. It will probably work in most, but it's not entirely correct. Ex. Turkish

Re: Replacing tango.text.Ascii.isearch

2022-10-13 Thread bauss via Digitalmars-d-learn
On Thursday, 13 October 2022 at 08:30:04 UTC, rikki cattermole wrote: On 13/10/2022 9:27 PM, bauss wrote: This doesn't actually work properly in all languages. It will probably work in most, but it's not entirely correct. Ex. Turkish will not work with it properly. Very interesting article:

Re: Replacing tango.text.Ascii.isearch

2022-10-13 Thread rikki cattermole via Digitalmars-d-learn
On 13/10/2022 9:27 PM, bauss wrote: This doesn't actually work properly in all languages. It will probably work in most, but it's not entirely correct. Ex. Turkish will not work with it properly. Very interesting article: http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html
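
To make the Turkish problem concrete, here is a small sketch. It relies only on `std.uni.toLower` using the default, locale-independent mapping; the sample word is an assumption chosen for illustration.

```d
import std.stdio : writeln;
import std.uni : toLower;

void main()
{
    // Default mapping: 'I' (U+0049) lowercases to 'i' (U+0069).
    string lowered = "TITLE".toLower;
    writeln(lowered); // title

    // Turkish rules would expect the dotless ı (U+0131) instead,
    // so a Turkish user searching for "tıtle" gets no match.
    writeln(lowered == "t\u0131tle"); // false
}
```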

Re: Replacing tango.text.Ascii.isearch

2022-10-13 Thread bauss via Digitalmars-d-learn
On Wednesday, 5 October 2022 at 17:29:25 UTC, Steven Schveighoffer wrote: On 10/5/22 12:59 PM, torhu wrote: I need a case-insensitive check to see if a string contains another string for a "quick filter" feature. It should preferably be perceived as instant by the user, and needs to check a

Re: Replacing tango.text.Ascii.isearch

2022-10-09 Thread rassoc via Digitalmars-d-learn
On 10/9/22 03:08, Siarhei Siamashka via Digitalmars-d-learn wrote: Does the difference really have to be two orders of magnitude for you to acknowledge that there might be a performance problem in Phobos? [...] Except that similar one-liners implemented using other programming languages are

Re: Replacing tango.text.Ascii.isearch

2022-10-08 Thread Siarhei Siamashka via Digitalmars-d-learn
On Saturday, 8 October 2022 at 01:07:46 UTC, rassoc wrote: On 10/8/22 00:50, Siarhei Siamashka via Digitalmars-d-learn wrote: On Friday, 7 October 2022 at 12:19:59 UTC, bachmeier wrote: python -c "print(('a' * 49 + 'b') * 2)" > test.lst That's generating a file with a single line: $> wc

Re: Replacing tango.text.Ascii.isearch

2022-10-07 Thread rassoc via Digitalmars-d-learn
On 10/8/22 00:50, Siarhei Siamashka via Digitalmars-d-learn wrote:
On Friday, 7 October 2022 at 12:19:59 UTC, bachmeier wrote:
python -c "print(('a' * 49 + 'b') * 2)" > test.lst

That's generating a file with a single line:

$> wc -l test.lst
1 test.lst

Going with an appropriate 100k mixed

Re: Replacing tango.text.Ascii.isearch

2022-10-07 Thread Siarhei Siamashka via Digitalmars-d-learn
On Friday, 7 October 2022 at 12:19:59 UTC, bachmeier wrote: https://www.cs.utexas.edu/users/moore/best-ideas/string-searching/ "the longer the pattern is, the faster the algorithm goes" Yes, that's how substring search works in the standard libraries of the other programming languages. Now
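
To illustrate why a longer pattern can make the search faster, here is a sketch of the Horspool simplification of Boyer-Moore, working on UTF-8 code units. It is not what Phobos does internally, and `horspoolIndexOf` is a made-up name. The point is the shift table: a mismatch on a byte that does not occur in the needle lets the search skip a whole needle length at once.

```d
import std.stdio : writeln;

/// Boyer-Moore-Horspool over code units.
/// Returns the index of the first match, or -1 if there is none.
ptrdiff_t horspoolIndexOf(const(char)[] haystack, const(char)[] needle)
{
    if (needle.length == 0)
        return 0;
    if (needle.length > haystack.length)
        return -1;

    // Default shift is the full needle length.
    size_t[256] shift = needle.length;
    // Bytes that occur in the needle (except its last position)
    // get a smaller shift, so no possible match is skipped.
    foreach (i; 0 .. needle.length - 1)
        shift[needle[i]] = needle.length - 1 - i;

    size_t pos = 0;
    while (pos + needle.length <= haystack.length)
    {
        if (haystack[pos .. pos + needle.length] == needle)
            return cast(ptrdiff_t) pos;
        // Shift according to the last byte of the current window.
        pos += shift[haystack[pos + needle.length - 1]];
    }
    return -1;
}

void main()
{
    writeln(horspoolIndexOf("hello world", "world")); // 6
    writeln(horspoolIndexOf("hello world", "xyz"));   // -1
}
```

A case-insensitive variant would need both inputs folded to the same case first, which brings back the questions discussed elsewhere in this thread.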

Re: Replacing tango.text.Ascii.isearch

2022-10-07 Thread bachmeier via Digitalmars-d-learn
On Friday, 7 October 2022 at 07:16:19 UTC, Siarhei Siamashka wrote: On Friday, 7 October 2022 at 06:34:50 UTC, Siarhei Siamashka wrote: Also are we allowed to artificially construct needle and haystack to blow up this test rather than only benchmarking it on typical real data? Such as

Re: Replacing tango.text.Ascii.isearch

2022-10-07 Thread Siarhei Siamashka via Digitalmars-d-learn
On Friday, 7 October 2022 at 06:34:50 UTC, Siarhei Siamashka wrote: Also are we allowed to artificially construct needle and haystack to blow up this test rather than only benchmarking it on typical real data? Such as generating the input data via running: python -c "print(('a' * 49 +

Re: Replacing tango.text.Ascii.isearch

2022-10-07 Thread Siarhei Siamashka via Digitalmars-d-learn
On Friday, 7 October 2022 at 00:57:38 UTC, rassoc wrote: On 10/7/22 01:39, torhu via Digitalmars-d-learn wrote: regex is about ten times faster then. Interesting! Using your code, I'm seeing a 1.5x max difference for ldc, nothing close to 10x. Welp, the woes of superficial benchmarking. :)

Re: Replacing tango.text.Ascii.isearch

2022-10-06 Thread rassoc via Digitalmars-d-learn
On 10/7/22 01:39, torhu via Digitalmars-d-learn wrote: regex is about ten times faster then. Interesting! Using your code, I'm seeing a 1.5x max difference for ldc, nothing close to 10x. Welp, the woes of superficial benchmarking. :)

Re: Replacing tango.text.Ascii.isearch

2022-10-06 Thread torhu via Digitalmars-d-learn
On Thursday, 6 October 2022 at 21:36:48 UTC, rassoc wrote: And what kind of testing was that? Mind to share? Because I did the following real quick and wasn't able to measure a "two orders of magnitude" difference. Sure, the regex version came on top, but they were both faster than the ruby

Re: Replacing tango.text.Ascii.isearch

2022-10-06 Thread rassoc via Digitalmars-d-learn
On 10/5/22 23:50, torhu via Digitalmars-d-learn wrote: I did some basic testing, and regex was two orders of magnitude faster. So now I know, I guess. And what kind of testing was that? Mind to share? Because I did the following real quick and wasn't able to measure a "two orders of

Re: Replacing tango.text.Ascii.isearch

2022-10-06 Thread Sergey via Digitalmars-d-learn
On Thursday, 6 October 2022 at 08:15:10 UTC, Siarhei Siamashka wrote: On Wednesday, 5 October 2022 at 21:50:32 UTC, torhu wrote: Please don’t tell us that D will be slower than Python again?)

Re: Replacing tango.text.Ascii.isearch

2022-10-06 Thread Siarhei Siamashka via Digitalmars-d-learn
On Wednesday, 5 October 2022 at 21:50:32 UTC, torhu wrote: I did some basic testing, and regex was two orders of magnitude faster. So now I know, I guess. Substring search functionality is currently in a very bad shape in Phobos. I discovered this myself a few weeks ago when I was trying to
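
For comparison, a sketch of the regex approach mentioned above, using `std.regex` with the `"i"` flag. The helper name `regexContains` is made up, and the needle is escaped so that user input is treated as literal text. No timings are claimed here.

```d
import std.array : array;
import std.conv : to;
import std.regex : escaper, matchFirst, regex;
import std.stdio : writeln;

/// Case-insensitive "contains" via std.regex (hypothetical helper).
/// A real filter would build the regex once and reuse it for every item.
bool regexContains(string haystack, string needle)
{
    // escaper() backslash-escapes regex metacharacters in the needle.
    auto re = regex(escaper(needle).array.to!string, "i");
    return !matchFirst(haystack, re).empty;
}

void main()
{
    writeln(regexContains("See README first", "readme")); // true
    writeln(regexContains("See README first", "license")); // false
}
```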

Re: Replacing tango.text.Ascii.isearch

2022-10-05 Thread torhu via Digitalmars-d-learn
On Wednesday, 5 October 2022 at 17:29:25 UTC, Steven Schveighoffer wrote:
```d
bool isearch(S1, S2)(S1 haystack, S2 needle)
{
    import std.uni;
    import std.algorithm;
    return haystack.asLowerCase.canFind(needle.asLowerCase);
}
```
untested. -Steve

I did some basic testing, and
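
A short usage sketch for the function quoted above, applied to the kind of "quick filter" the original question describes. The item list is invented for illustration.

```d
import std.algorithm : canFind, filter;
import std.array : array;
import std.stdio : writeln;
import std.uni : asLowerCase;

bool isearch(S1, S2)(S1 haystack, S2 needle)
{
    return haystack.asLowerCase.canFind(needle.asLowerCase);
}

void main()
{
    auto items = ["README.md", "Makefile", "readme.txt", "main.d"];
    // Keep only the entries that contain the filter text, ignoring case.
    auto hits = items.filter!(s => isearch(s, "ReadMe")).array;
    writeln(hits); // ["README.md", "readme.txt"]
}
```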

Re: Replacing tango.text.Ascii.isearch

2022-10-05 Thread Ali Çehreli via Digitalmars-d-learn
On 10/5/22 13:40, torhu wrote:

    auto sw = StopWatch();

Either this:

    auto sw = StopWatch(AutoStart.yes);

or this:

    auto sw = StopWatch();
    sw.start();

Ali
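
A complete sketch of the first variant, for anyone who wants something to paste and run. The loop being timed is only a placeholder workload.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

void main()
{
    // A default-constructed StopWatch is not running yet.
    auto idle = StopWatch();
    writeln("idle.running = ", idle.running); // idle.running = false

    // AutoStart.yes starts timing immediately.
    auto sw = StopWatch(AutoStart.yes);
    long sum = 0;
    foreach (i; 0 .. 1_000_000)
        sum += i;
    sw.stop();

    Duration elapsed = sw.peek();
    writeln("sum = ", sum, ", elapsed = ", elapsed);
}
```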

Re: Replacing tango.text.Ascii.isearch

2022-10-05 Thread torhu via Digitalmars-d-learn
On Wednesday, 5 October 2022 at 20:45:55 UTC, torhu wrote: On Wednesday, 5 October 2022 at 20:40:46 UTC, torhu wrote: Am I doing something wrong here? Right, you can instantiate structs without arguments. It's been ten years since I last used D, I was thinking of structs like if they were

Re: Replacing tango.text.Ascii.isearch

2022-10-05 Thread torhu via Digitalmars-d-learn
On Wednesday, 5 October 2022 at 20:40:46 UTC, torhu wrote: Am I doing something wrong here? Right, you can instantiate structs without arguments. It's been ten years since I last used D, I was thinking of structs like if they were classes.

Re: Replacing tango.text.Ascii.isearch

2022-10-05 Thread torhu via Digitalmars-d-learn
On Wednesday, 5 October 2022 at 17:29:25 UTC, Steven Schveighoffer wrote: [...] I wanted to do some quick benchmarking to figure out what works. When I run this: ```d import std.stdio; import std.datetime.stopwatch; void main() { auto sw = StopWatch(); sw.stop();

Re: Replacing tango.text.Ascii.isearch

2022-10-05 Thread Steven Schveighoffer via Digitalmars-d-learn
On 10/5/22 12:59 PM, torhu wrote: I need a case-insensitive check to see if a string contains another string for a "quick filter" feature. It should preferably be perceived as instant by the user, and needs to check a few thousand strings in typical cases. Is a regex the best option, or what

Replacing tango.text.Ascii.isearch

2022-10-05 Thread torhu via Digitalmars-d-learn
I need a case-insensitive check to see if a string contains another string for a "quick filter" feature. It should preferably be perceived as instant by the user, and needs to check a few thousand strings in typical cases. Is a regex the best option, or what would you suggest?
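
If ASCII-only matching is acceptable, as the name of the Tango function suggests, a hand-written loop is one possible starting point. This is a sketch, not the Tango implementation; `containsIgnoreAsciiCase` is a made-up name, and non-ASCII code units are compared byte for byte.

```d
import std.ascii : toLower;
import std.stdio : writeln;

/// ASCII-only, allocation-free case-insensitive "contains".
/// Hypothetical helper; only A-Z / a-z are folded.
bool containsIgnoreAsciiCase(const(char)[] haystack, const(char)[] needle)
{
    if (needle.length > haystack.length)
        return false;
    foreach (start; 0 .. haystack.length - needle.length + 1)
    {
        size_t i = 0;
        while (i < needle.length
               && toLower(haystack[start + i]) == toLower(needle[i]))
            ++i;
        if (i == needle.length)
            return true;
    }
    return false;
}

void main()
{
    writeln(containsIgnoreAsciiCase("Hello World", "WORLD")); // true
    writeln(containsIgnoreAsciiCase("Hello World", "word"));  // false
}
```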