Re: [swift-evolution] Fast String Comparison From String Arrays

Omar Charif via swift-evolution Tue, 01 Aug 2017 02:10:53 -0700

Hello Jon,

My algorithm prioritizes time complexity and the cost is the space complexity.


Time Complexity
————————————
Time Complexity O(C), where C = number of matching String pyramids
We usually get to compare only the very similar string pyramids together, and 
the comparison is usually float to float comparison which is way quicker than 
character in a string to character in a string comparison, let me give an 
example:
[“make”, “lake”, “fake”, “take”, “cook”, “book”, “took”]

if we were to ask for end matching for “ke” we would get to compare only “ke” 
with [“make, “lake”, “fake”, “take”] simply because each word of these is 
having same blur value, or in other words they have same string pyramid coarse 
value, almost the same

After checking we get to return the result which is  [“make, “lake”, “fake”, 
“take”]

So we almost get O(1) since we don’t waste any time checking for the different 
items and we directly jump into the similar ones.


Space complexity
————————————
Space complexity here is O(C * 2), where C = number of characters of a given 
string
We build a pyramid for each given string with number of levels = C / 2

We can cut the Space complexity down to O(C) if we remove the base level of the 
Pyramid, let me give you an example

if we were to encode the word “California” we get a pyramid like this

“California” <———— coarse
“Cali forn ia__"
“Ca li fo rn ia"
“C a l i f o r n i a” <—— base

if we get to remove the base level we remove half of the space we need, but 
once we do that we will not be able to match any string that has characters 
less than 2 characters minimum (just to reach the pyramid level after the base 
that we will remove)


That’s why the more we feed items in an array the more we benefit from the fact 
that they are already going to be grouped together and we will save time 
checking on wrong items.
Also this is better than a Set because a set is yes O(1) but can’t help 
providing any information more than straight forward equality. 
My algorithm encodes the string with its features and characters chaining 
informations which helps when doing completion or substring matching.


I will leave it here for now, my message is getting larger :D

regards,
Omar







> On Aug 1, 2017, at 2:40 AM, Jonathan Hull <[email protected]> wrote:
> 
> Hi Omar,
> 
> Could you talk in a little more detail about your algorithm, including 
> space/time complexity?  It sounds interesting, but I am having trouble 
> finding the relevant portions in the source code...
> 
> Thanks,
> Jon
> 
>> On Jul 31, 2017, at 1:26 AM, Omar Charif via swift-evolution 
>> <[email protected] <mailto:[email protected]>> wrote:
>> 
>> The main advantage I introduce is actually no just a substring search, my 
>> main goal is to achieve high performance repeated search on the same given 
>> array of strings. I have tested my algorithm against something like 
>> spotlight for example searching for files etc … Actually my code is a bit 
>> faster on execution in a 1 to 1 single search. But if you go further and try 
>> to search for 2, 3, or more substrings in the same array my implementation 
>> will be even faster. My implementation is also simple and easy to use, you 
>> can simply see how easy and intuitive it is once you try it. I think this is 
>> basic String functionality that should already come with Swift Foundation 
>> library and it is additive with zero conflicts with Swift Foundation 
>> library. One last thing I have to say is that I have been testing this for 5 
>> months so it is not new and I am not experimenting here :)
>> 
>>  
>> 
>>> On Jul 30, 2017, at 7:11 PM, Kenny Leung via swift-evolution 
>>> <[email protected] <mailto:[email protected]>> wrote:
>>> 
>>> This is a bit off topic, but does anybody know the data structure that 
>>> supports Xcode’s fabulous case-insensitive-in-order-yet-disjoint-substring 
>>> search?
>>> 
>>> -Kenny
>>> 
>>> 
>>>> On Jul 28, 2017, at 1:57 PM, Huon Wilson via swift-evolution 
>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>> 
>>>> 
>>>>> On Jul 28, 2017, at 05:54, Omar Charif via swift-evolution 
>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I wonder whether there is already a way in Swift to compare a string 
>>>>> against a large string array quickly without using the traditional ways 
>>>>> of comparison. 
>>>>> 
>>>>> Say we have ["a", "b", "c", "d"] and we would like to find whether this 
>>>>> array contains "a", then we decide to check if we have "b" in that same 
>>>>> array. Don't you think there is a way to represent the array in a 
>>>>> different way and make this comparison a lot quicker ?
>>>>> 
>>>>> I know there are recurrent neural networks etc ... I am talking here 
>>>>> about solution without learning anything, just representing the array 
>>>>> differently so we can minimize that O(N).
>>>>> 
>>>>> I have developed an algorithm and it is doing pretty well so far and I 
>>>>> wonder whether it would be accepted so I came to propose and see if this 
>>>>> is interesting from your perspective.
>>>>> 
>>>>> I developed a Javascript version here 
>>>>> https://omarshariffathi.github.io/quickhint/ 
>>>>> <https://omarshariffathi.github.io/quickhint/>
>>>>> 
>>>>> If you think this is welcome in Swift Foundation I am ready for a pull 
>>>>> request.
>>>>> Thanks for reading.
>>>>> _______________________________________________
>>>>> swift-evolution mailing list
>>>>> [email protected] <mailto:[email protected]>
>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>>>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
>>>> 
>>>> If you’re doing only direct containment, the builtin Set will give O(1) 
>>>> lookups.
>>>> 
>>>> If you’re looking for, say, finding all values with a given prefix then a 
>>>> trie might be appropriate, and if you’re trying to do more interesting 
>>>> things (e.g. fuzzy search) there’s techniques like “finite state 
>>>> transducers” http://blog.burntsushi.net/transducers/ 
>>>> <http://blog.burntsushi.net/transducers/> . I don’t believe either of 
>>>> these have anything built-in, and I suspect they (especially FSTs) are too 
>>>> specialized, or have too many possible variations, to be worth including 
>>>> directly in the current standard library, and a SwiftPM package would work 
>>>> almost as well as others have suggested.
>>>> 
>>>> Huon
>>>> _______________________________________________
>>>> swift-evolution mailing list
>>>> [email protected] <mailto:[email protected]>
>>>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
>>> 
>>> _______________________________________________
>>> swift-evolution mailing list
>>> [email protected] <mailto:[email protected]>
>>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
>> 
>> _______________________________________________
>> swift-evolution mailing list
>> [email protected] <mailto:[email protected]>
>> https://lists.swift.org/mailman/listinfo/swift-evolution
>

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Re: [swift-evolution] Fast String Comparison From String Arrays

Reply via email to