Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
There are much faster ways... just a little more intricate.

Nando

-- Original Message ---
From: Fernando Cabral
To: mailing list for gambas users
Sent: Sat, 1 Jul 2017 00:18:42 -0300
Subject: Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array

> On the other hand, I think I am close to the fastest possible solution.
> Basically, the array will be traversed once only, no matter how many terms
> and how many repetitions it may have.
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
I thank you guys for the hints on counting and eliminating duplicates. In
the end, I resorted to something that is very simple and does the trick in
three steps. In the first step I sort the array. In the second step I count
the number of occurrences and prepend it to the word itself (with a
separator). In the third step I sort the array again, so now I have it
sorted by the number of occurrences, from the largest to the smallest.

That is all I need.

Nevertheless, I am concerned with the performance. For 69,725 words, of
which 8,987 were unique, it took 28 seconds for the code below to execute.
I will survive these 28 seconds, if I have to. But I still would like to
find a faster solution.

On the other hand, I think I am close to the fastest possible solution.
Basically, the array will be traversed only once, no matter how many terms
and how many repetitions it may have.

(What do you think about this efficiency, Tobi?)

    MatchedWords.Sort(gb.ascent + gb.language + gb.IgnoreCase)
    For i = 0 To MatchedWords.Max
      n = 1
      For j = i + 1 To MatchedWords.Max
        If (Comp(MatchedWords[i], MatchedWords[j], gb.language + gb.ignorecase) = 0) Then
          n += 1
        Else
          Break
        Endif
      Next
      UniqWords.Push(Format(n, "0###") & "#" & MatchedWords[i])
      i += (n - 1)
    Next
    UniqWords.Sort(gb.descent + gb.language + gb.ignorecase)
    For i = 0 To UniqWords.Max
      Print UniqWords[i]
    Next

2017-06-30 15:10 GMT-03:00 Gianluigi:
> Just for curiosity, on my computer, my function (double) processes 10
> million strings (first and last name) in about 3 seconds.

--
Fernando Cabral
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
Just out of curiosity: on my computer, my function (double) processes 10
million strings (first and last name) in about 3 seconds.
This is a very naive measurement using Timers and a limited number of names
and surnames; e.g. Willy Weber came up 11051 times.

This demonstrates the soundness of Tobias' arguments (about 1 million
strings in 3 tenths of a second). I really understood (I hope) what he
wanted to say.

Sorry for my response times, but today my modem works worse than my brain.

Regards
Gianluigi
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
Sorry Tobias,
no further explanations are necessary.
I would not be able to understand them :-(
I accept what you already explained to me as dogma and I will try to put
it into practice by copying your code :-).

Thanks again.

Gianluigi

2017-06-30 17:44 GMT+02:00 Gianluigi:
> Tobias you are always kind and thank you very much.
> Is possible for you to explain this more elementarily, for me (a poorly
> educated boy :-) )
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
2017-06-30 17:21 GMT+02:00 Tobias Boege:

> I wouldn't say there is anything *wrong* with it, but it also has quadratic
> worst-case running time. You use String[].Push() which is just another name
> for String[].Add(). Adding an element to an array (the straightforward way)
> is done by extending the space of that array by one further element and
> storing the value there. But extending the space of an array could
> potentially require you to copy the whole array somewhere else (where you
> have enough free memory at the end of the array to enlarge it). Doing
> worst-case analysis, we have to assume that this bad case always occurs.
>
> If you fill an array with n values, e.g.
>
>   Dim a As New Integer[]
>   For i = 1 To n
>     a.Add(i)
>   Next
>
> then you loop n times and in the i-th iteration there will be already
> i-many elements in your array. Adding one further element to it will,
> in the worst case, require i copy operations to be performed. 9-year-old
> C.F. Gauss will tell you that the amount of store operations is about n^2.

Tobias, you are always kind and thank you very much.
Is it possible for you to explain this in more elementary terms, for me (a
poorly educated boy :-) )?

> And your function does two jobs simultaneously but only returns the result
> of one of the jobs. The output you get is only worth half the time you
> spent.

I did two functions in one just to save space; this is a simple example.
:-)

Regards
Gianluigi
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
On Fri, 30 Jun 2017, Gianluigi wrote:
> What was wrong in my example which meant this?
>
> Public Sub Main()
>
>   Dim sSort As String[] = ["A", "B", "B", "B", "C", "D", "D", "E", "E", "E", "E", "F"]
>   Dim s As String
>
>   For Each s In ReturnArrays(sSort, 0)
>     Print s
>   Next
>   For Each s In ReturnArrays(sSort, -1)
>     Print s
>   Next
>
> End
>
> Private Function ReturnArrays(SortedArray As String[], withNumber As Boolean) As String[]
>
>   Dim sSingle, sWithNumber As New String[]
>   Dim i, n As Integer
>
>   For i = 0 To SortedArray.Max
>     ' You can avoid with Tobias's trick (For i = 1 To ...)
>     If i < SortedArray.Max Then
>       If SortedArray[i] = SortedArray[i + 1] Then
>         Inc n
>       Else
>         Inc n
>         sSingle.Push(SortedArray[i])
>         sWithNumber.Push(n & SortedArray[i])
>         n = 0
>       Endif
>     Endif
>   Next
>   Inc n
>   sSingle.Push(SortedArray[SortedArray.Max])
>   sWithNumber.Push(n & SortedArray[SortedArray.Max])
>   If withNumber Then
>     Return sWithNumber
>   Else
>     Return sSingle
>   Endif
>
> End
>

I wouldn't say there is anything *wrong* with it, but it also has quadratic
worst-case running time. You use String[].Push() which is just another name
for String[].Add(). Adding an element to an array (the straightforward way)
is done by extending the space of that array by one further element and
storing the value there. But extending the space of an array could
potentially require you to copy the whole array somewhere else (where you
have enough free memory at the end of the array to enlarge it). Doing
worst-case analysis, we have to assume that this bad case always occurs.

If you fill an array with n values, e.g.

  Dim a As New Integer[]
  For i = 1 To n
    a.Add(i)
  Next

then you loop n times and in the i-th iteration there will be already
i-many elements in your array. Adding one further element to it will,
in the worst case, require i copy operations to be performed. 9-year-old
C.F. Gauss will tell you that the amount of store operations is about n^2.

And your function does two jobs simultaneously but only returns the result
of one of the jobs. The output you get is only worth half the time you
spent.

Regards,
Tobi

--
"There's an old saying: Don't change anything... ever!" -- Mr. Monk
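The counting argument above can be written out as a short worked sum. This is only a sketch of the worst case described there, under the assumption that the i-th Add() really has to copy all i elements already stored; how often that happens in practice depends on the allocator.

```latex
\[
  \underbrace{0 + 1 + 2 + \dots + (n-1)}_{\text{copies for } n \text{ calls to Add()}}
  \;=\; \sum_{i=0}^{n-1} i
  \;=\; \frac{n(n-1)}{2}
  \;\approx\; \frac{n^{2}}{2}
\]
```

For n = 4 this gives 0 + 1 + 2 + 3 = 6 copies; for n = 10,000 it gives 10,000 * 9,999 / 2 = 49,995,000. Doubling n roughly quadruples the work, which is what "quadratic" means here.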
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
What was wrong in my example which meant this?

Public Sub Main()

  Dim sSort As String[] = ["A", "B", "B", "B", "C", "D", "D", "E", "E", "E", "E", "F"]
  Dim s As String

  For Each s In ReturnArrays(sSort, 0)
    Print s
  Next
  For Each s In ReturnArrays(sSort, -1)
    Print s
  Next

End

Private Function ReturnArrays(SortedArray As String[], withNumber As Boolean) As String[]

  Dim sSingle, sWithNumber As New String[]
  Dim i, n As Integer

  For i = 0 To SortedArray.Max
    ' You can avoid with Tobias's trick (For i = 1 To ...)
    If i < SortedArray.Max Then
      If SortedArray[i] = SortedArray[i + 1] Then
        Inc n
      Else
        Inc n
        sSingle.Push(SortedArray[i])
        sWithNumber.Push(n & SortedArray[i])
        n = 0
      Endif
    Endif
  Next
  Inc n
  sSingle.Push(SortedArray[SortedArray.Max])
  sWithNumber.Push(n & SortedArray[SortedArray.Max])
  If withNumber Then
    Return sWithNumber
  Else
    Return sSingle
  Endif

End

Regards
Gianluigi

2017-06-30 15:05 GMT+02:00 Tobias Boege:
> What, in my opinion, is at least theoretically better here than the other
> proposed solutions is that it runs in linear time, while nando's is
> quadratic[*]. (Of course, if you sort beforehand, it will become n*log(n),
> which is still better than quadratic.)
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
On Fri, 30 Jun 2017, Fernando Cabral wrote:
> So, my quest is for the fastest way do eliminate the words duplicates while
> I count them.
> For the time being, here is a working solution based on system' s sort |
> uniq:
>
> Here is one of the versions I have been using:
>
> Exec ["/usr/bin/uniq", "Unsorted.txt", "Sorted.srt2"] Wait
> Exec ["/usr/bin/uniq", "-ci", "SortedWords.srt2", SortedWords.srt3"] Wait
> Exec ["/usr/bin/sort", "-bnr", SortedWords.srt3] To UniqWords
>

Are those temporary files? You can avoid those by piping your data into the
processes and reading their output directly. Otherwise the Temp$() function
gives you better temporary files.

> WordArray = split (UniqWords, "\n")
>
> So, I end up with the result I want. It's effective. Now, it would be more
> elegant If I could do the same with Gambas. Of course, the sorting would be
> easy with the builting WordArray.sort ().
> But how about te '"/usr/bin/uniq", "-ci" ...' part?
>

I feel like my other mail answered this, but I can give you another version
of that routine (which I said I would leave as an exercise to you):

  ' Remove duplicates in an array like "uniq -ci". String comparison is
  ' case insensitive. The i-th entry in the returned array counts how many
  ' times aStrings[i] (in the de-duplicated array) was present in the input.
  ' The data in ~aStrings~ is overwritten. Assumes the array is sorted.
  Private Function Uniq(aStrings As String[]) As Integer[]
    Dim iSrc, iLast As Integer
    Dim aCount As New Integer[](aStrings.Count)

    If Not aStrings.Count Then Return []
    iLast = 0
    aCount[iLast] = 1
    For iSrc = 1 To aStrings.Max
      If String.Comp(aStrings[iSrc], aStrings[iLast], gb.IgnoreCase) Then
        Inc iLast
        aStrings[iLast] = aStrings[iSrc]
        aCount[iLast] = 1
      Else
        Inc aCount[iLast]
      Endif
    Next

    ' Now shrink the arrays to the memory they actually need
    aStrings.Resize(iLast + 1)
    aCount.Resize(iLast + 1)
    Return aCount
  End

What, in my opinion, is at least theoretically better here than the other
proposed solutions is that it runs in linear time, while nando's is
quadratic[*]. (Of course, if you sort beforehand, it will become n*log(n),
which is still better than quadratic.)

Attached is a test script with some words. It runs the sort + uniq utilities
first and then Array.Sort() + the Uniq() function above. The program then
prints the *diff* between the two outputs. I get an empty diff, meaning that
my Gambas routines produce exactly the same output as the shell utilities.

Regards,
Tobi

[*] He calls array functions Add() and Find() inside a For loop that runs
    over an array of size n. Adding elements to an array or searching an
    array have themselves worst-case linear complexity, giving quadratic
    overall. My implementation reserves some more space in advance to avoid
    calling Add() in a loop. Since the array is sorted, we can go without
    Find(), too.

    Actually, as you may know, adding an element to the end of an array can
    be implemented in amortized constant time (as C++'s std::vector does),
    by wasting space. AFAICS Gambas doesn't do this, though I could be wrong.

--
"There's an old saying: Don't change anything... ever!" -- Mr. Monk

#!/usr/bin/gbs3

Private Const WORDS As String = ""
  "Are those temporary files You can avoid those by piping your data into the "
  "processes and reading their output directly Otherwise the Temp function "
  "gives you better temporary files "
  "Fabien since this is a onetime only thing I dont think Id be better "
  "off witha database Basically I read a text file an then break it down into "
  "words sentences and paragraphs Next I count the items in each array "
  "words sentences paragraphs Arraycount works wonderfully After that "
  "have to eliminate the duplicate words Arraywords But in doing it al "
  "also have to count how many times each word appeared"

Public Sub Main()
  Dim aStrings
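The footnote in the message above mentions growing an array in amortized constant time by wasting space. A minimal sketch of that idea in Gambas might look like the following. It is not from the thread: the initial capacity of 4, the doubling factor and the variable names are assumptions made for illustration, and whether Gambas already does something similar internally is left open in the thread. It uses only Resize() and indexed assignment, as the Uniq() routine does.

```gambas
' Hypothetical sketch: fill an Integer[] with n values while doubling the
' capacity whenever it is full, instead of calling Add() once per value.
Public Sub Main()

  Dim aData As New Integer[](4)   ' initial capacity (assumed value)
  Dim iUsed As Integer = 0        ' number of slots that hold real data
  Dim n As Integer = 1000
  Dim i As Integer

  For i = 1 To n
    ' Grow geometrically: the array is resized only about log2(n) times.
    If iUsed = aData.Count Then aData.Resize(aData.Count * 2)
    aData[iUsed] = i
    Inc iUsed
  Next

  ' Give back the unused tail, as Uniq() does with its result arrays.
  aData.Resize(iUsed)
  Print aData.Count

End
```

With doubling, each resize may copy everything stored so far, but the copies add up to fewer than 2n in total, so the average cost per element stays constant. The price is that up to half of the reserved space can be unused until the final Resize().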
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
On 30/06/17 08:20, Fernando Cabral wrote:
> So, my quest is for the fastest way do eliminate the words duplicates
> while I count them.
> For the time being, here is a working solution based on system' s sort
> | uniq:
> Here is one of the versions I have been using:
> Exec ["/usr/bin/uniq", "Unsorted.txt", "Sorted.srt2"] Wait
> Exec ["/usr/bin/uniq", "-ci", "SortedWords.srt2", SortedWords.srt3"] Wait
> Exec ["/usr/bin/sort", "-bnr", SortedWords.srt3] To UniqWords
> WordArray = split (UniqWords, "\n")
> So, I end up with the result I want. It's effective. Now, it would be
> more elegant If I could do the same with Gambas. Of course, the
> sorting would be easy with the builting WordArray.sort ().
> But how about te '"/usr/bin/uniq", "-ci" ...' part?
> Regards
> - fernando

Not tried, but for the duplicate count, what about iterating the word array
and copying each word to a keyed collection?
For any new given word, the value (item) added would be 1 (integer), and
the key would be UCase(word$). If an error happens, the handler would just
Inc the keyed item value. So (please note my syntax may be slightly off,
especially in If Error):

  Public Function CountWordsInArray(sortedWordArray As String[]) As Collection
    Dim wordCount As New Collection
    Dim currentWord As String = Null

    For Each currentWord In sortedWordArray
      Try wordCount.Add(1, UCase$(currentWord))
      If Error Then
        Inc wordCount[UCase$(currentWord)]
        Error.Clear 'Is this needed, or even correct?
      Endif
    Next
    Return wordCount
  End

The returned collection should be sorted if the array was, and for each item
you will have a numeric count as the item and the word as the key.

Hope it helps,
zxMarce.
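A variant of the collection idea above that does not depend on how Collection.Add() reacts to an existing key is to test for the key with Exist() first. The sketch below is an illustration, not code from the thread: the function name CountWords and the sample words are made up, and UCase$() is kept from the message above even though String.UCase() may be the better choice for non-ASCII text.

```gambas
' Hypothetical sketch: count words with a Collection, checking Exist()
' explicitly instead of relying on an error handler.
Public Function CountWords(aWords As String[]) As Collection

  Dim cCount As New Collection
  Dim sWord As String
  Dim sKey As String

  For Each sWord In aWords
    sKey = UCase$(sWord)               ' case-insensitive key
    If cCount.Exist(sKey) Then
      cCount[sKey] = cCount[sKey] + 1  ' word seen before: bump its count
    Else
      cCount[sKey] = 1                 ' first occurrence
    Endif
  Next
  Return cCount

End

Public Sub Main()

  Dim cCount As Collection
  Dim vCount As Variant

  cCount = CountWords(["b", "A", "a", "c", "B", "b"])
  ' Enumerating a Collection yields the values; Key is the current key.
  For Each vCount In cCount
    Print vCount; " "; cCount.Key
  Next

End
```

Because the lookup goes by key, this version does not need the input array to be sorted. The result still has to be ordered by count in a separate step if that is what is wanted.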
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
2017-06-30 7:44 GMT-03:00 Fabien Bodard:

> The best way is the nando one ... at least for gambas.
>
> As you have not to matter about what is the index value or the order,
> the walk ahead option is the better.
>
> Then Fernando ... for big, big things... I think you need to use a DB.
> Or a native language maybe a sqlite memory structure can be good.

Fabien, since this is a one-time-only thing, I don't think I'd be better
off with a database. Basically, I read a text file and then break it down
into words, sentences and paragraphs. Next I count the items in each array
(words, sentences, paragraphs). Array.Count works wonderfully.

After that, I have to eliminate the duplicate words (Array.Words). But in
doing so, I also have to count how many times each word appeared.

Finally I sort Array.Sentences and Array.Paragraphs by size
(String.Len()). Array.Words is sorted by count + length. This is all
working well.

So, my quest is for the fastest way to eliminate the duplicate words while
I count them. For the time being, here is a working solution based on the
system's sort | uniq. Here is one of the versions I have been using:

    Exec ["/usr/bin/uniq", "Unsorted.txt", "Sorted.srt2"] Wait
    Exec ["/usr/bin/uniq", "-ci", "SortedWords.srt2", "SortedWords.srt3"] Wait
    Exec ["/usr/bin/sort", "-bnr", "SortedWords.srt3"] To UniqWords
    WordArray = Split(UniqWords, "\n")

So, I end up with the result I want. It's effective. Now, it would be more
elegant if I could do the same with Gambas. Of course, the sorting would be
easy with the built-in WordArray.Sort(). But how about the
'"/usr/bin/uniq", "-ci" ...' part?

Regards

- fernando
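On the '"/usr/bin/uniq", "-ci"' question above, here is a minimal sketch of how the counting step might look in plain Gambas. It assumes the array has already been sorted with the same case-insensitive comparison; CountRuns, aUniq and aCount are hypothetical names made up for this illustration, not anything built into Gambas.

```gambas
' Hypothetical helper (not a Gambas built-in): walks a SORTED array once
' and records each distinct word together with how often it occurred,
' ignoring case, roughly like "uniq -ci" on sorted input.
Public Sub CountRuns(aSorted As String[], aUniq As String[], aCount As Integer[])

  Dim i As Integer

  If aSorted.Count = 0 Then Return

  ' The first word always starts a new run.
  aUniq.Add(aSorted[0])
  aCount.Add(1)

  For i = 1 To aSorted.Max
    If Comp(aSorted[i], aSorted[i - 1], gb.IgnoreCase) = 0 Then
      Inc aCount[aCount.Max]     ' same word as the previous one
    Else
      aUniq.Add(aSorted[i])      ' first word of a new run
      aCount.Add(1)
    Endif
  Next

End
```

A caller would sort first, for example with aWords.Sort(gb.IgnoreCase), then pass two empty arrays (New String[] and New Integer[]) to be filled. After the call, aCount[i] holds the number of occurrences of aUniq[i]. Ordering the result by count, as "sort -bnr" does, is left out of this sketch.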
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
The best way is the nando one ... at least for gambas.

As you don't have to care about the index value or the order, the
walk-ahead option is the better one.

Then Fernando ... for big, big things... I think you need to use a DB.
Or a native language; maybe an SQLite in-memory structure can be good.
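Short of a database, a keyed container may already be enough for a word list of this size. The following is only a sketch under that assumption: it uses the standard Gambas Collection class as a lookup table so that each word is visited once and no sorting is needed. CountWords and the variable names are hypothetical.

```gambas
' Hypothetical helper: counts every word in a single pass.
' The Collection is created with gb.IgnoreCase so that "Word" and "word"
' share one key.
Public Function CountWords(aWords As String[]) As Collection

  Dim cCount As New Collection(gb.IgnoreCase)
  Dim sWord As String

  For Each sWord In aWords
    If cCount.Exist(sWord) Then
      cCount[sWord] = cCount[sWord] + 1   ' seen before: bump the counter
    Else
      cCount[sWord] = 1                   ' first occurrence
    Endif
  Next

  Return cCount

End
```

The result can be walked with "For Each vCount In cCount", reading the word from cCount.Key inside the loop. Whether this beats sort-then-scan on a given machine would have to be measured; the sketch only shows the shape of the approach.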
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
Jussi said:

> As Fernando stated your code is good only for small arrays. But if someone
> is going to use it, here is correct implementation:

No, Jussi, I didn't say it is good only for small arrays. I said some
suggestions apply only to small arrays, because if I have to traverse the
array again and again, advancing one item at a time and coming back to the
next item to repeat it one more time, then the time requirement will grow
quadratically. This makes most suggestions unusable for large arrays. The
arrays I have might grow to thousands and thousands of items.

Regards

- fernando
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
As Fernando stated your code is good only for small arrays. But if someone
is going to use it, here is correct implementation:

    For x = 0 To a.Max
      If z.Find(a[x]) = -1 Then z.Add(a[x])
    Next

z.Exist() might be faster... I don't know.

Jussi
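For anyone who wants to paste the corrected loop as a whole function, a sketch could look like the following. It only wraps the three lines above in the RemoveMultiple signature used elsewhere in this thread. The key points are the zero-based index range (0 To a.Max) and the fact that Find() returns -1 when nothing is found.

```gambas
' Sketch: the corrected loop wrapped in a complete function.
' Still quadratic in the worst case, because Find() scans z each time,
' so it is only suitable for small arrays.
Public Function RemoveMultiple(a As String[]) As String[]

  Dim x As Integer
  Dim z As New String[]

  For x = 0 To a.Max                        ' arrays are zero-based
    If z.Find(a[x]) = -1 Then z.Add(a[x])   ' -1 means "not in z yet"
  Next

  Return z

End
```

The order of first appearance is preserved, and the input array is left untouched.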
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
Nando

The problem with this search-and-destroy method without pre-sorting is the
quadratic growth in the time needed to do the job. If my math is not
wrong, this is how quickly it gets unmanageable:

     Items    Comparisons needed (worst-case scenario)
        10            45
       100         4,950
     1,000       499,500
    10,000    49,995,000

My program has to face a few thousand items, so not sorting does not seem
a good option.

Regards

- fernando
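The figures in that worst-case estimate follow from comparing every item with every other item once, which gives n * (n - 1) / 2 comparisons for n items: 45 for 10 items, 4,950 for 100, 499,500 for 1,000 and 49,995,000 for 10,000. A throwaway sketch to reproduce them, with made-up variable names:

```gambas
' Sketch: prints the worst-case number of pairwise comparisons,
' n * (n - 1) / 2, for n = 10, 100, 1000 and 10000.
Public Sub Main()

  Dim n As Long

  n = 10
  While n <= 10000
    Print n; " items need "; n * (n - 1) \ 2; " comparisons"
    n = n * 10
  Wend

End
```

Every tenfold increase in items costs roughly a hundredfold increase in comparisons, which is why the unsorted approach stops being practical somewhere in the thousands.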
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
On Tue, 27 Jun 2017, Fernando Cabral wrote:
> So, my question is basically if Gambas has some built-in method to
> eliminate duplicates.
> The reason I am asking this is because I am new to Gambas, so I have found
> myself coding things that were not needed. For instance, I coded some
> functions to do quicksort and bubble sort and then I found Array.Sort()
> was available. Therefore, I wasted my time coding those quicksort and
> bubble sort functions :-(
>

Ah, ok. I'm almost sure there is no built-in "uniq" function which gets rid
of consecutive duplicates, so you can go ahead and write your own :-)

--
"There's an old saying: Don't change anything... ever!" -- Mr. Monk

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Gambas-user mailing list
Gambas-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gambas-user
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
2017-06-27 11:29 GMT-03:00 Tobias Boege:

> Your first sentence is a bit confusing. First you say that your array is
> sorted but then you say that duplicates may be scattered across the array.

You are right. My fault. The array is sorted. What I meant by scattered was
that pairs, triplets or a bunch of duplicates may appear all over,
interspersed with non-duplicated items.

My items are either words or sentences (extracted from an ODT file). After
the extraction, the words (or sentences) are sorted with the method
Array.Sort(gb.Descent). After sorting it is much more efficient to search
for the duplicates. And it can be done with some simple code (as some
people have exemplified in this thread).

So, my question is basically if Gambas has some built-in method to
eliminate duplicates. The reason I am asking this is because I am new to
Gambas, so I have found myself coding things that were not needed. For
instance, I coded some functions to do quicksort and bubble sort and then I
found Array.Sort() was available. Therefore, I wasted my time coding those
quicksort and bubble sort functions :-(

Regards

- fernando

> If you have a sorting where duplicates are consecutive, the solution is
> very easy: just go through the array linearly and kick out these
> consecutive duplicates (which is precisely what uniq does), e.g. for
> integers:
>
>   Dim aInts As Integer[] = ...
>   Dim iInd, iLast As Integer
>
>   If Not aInts.Count Then Return
>   iLast = aInts[0]
>   iInd = 1
>   While iInd < aInts.Count
>     If aInts[iInd] = iLast Then ' consecutive duplicate
>       aInts.Remove(iInd, 1)
>     Else
>       iLast = aInts[iInd]
>       Inc iInd
>     Endif
>   Wend
>
> Note that the way I wrote it to get the idea across is not a linear-time
> operation (it depends on the complexity of aInts.Remove()), but you can
> achieve linear performance by writing better code. Think of it as an
> exercise. (Of course, you can't hope to be more efficient than linear
> time in a general situation.)
>
> The counting task is solved with a similar pattern, but while you kick
> an element out, you also increment a dedicated counter:
>
>   Dim aInts As Integer[] = ...
>   Dim aDups As New Integer[]
>   Dim iInd, iLast As Integer
>
>   If Not aInts.Count Then Return
>   iLast = aInts[0]
>   iInd = 1
>   aDups.Add(0)
>   While iInd < aInts.Count
>     If aInts[iInd] = iLast Then ' consecutive duplicate
>       aInts.Remove(iInd, 1)
>       Inc aDups[aDups.Max]
>     Else
>       iLast = aInts[iInd]
>       aDups.Add(0)
>       Inc iInd
>     Endif
>   Wend
>
> After this executed, the array aInts will not contain duplicates (supposing
> it was sorted before) and aDups[i] will contain the number of duplicates of
> the item aInts[i] that were removed.
>
> Regards,
> Tobi
>
> --
> "There's an old saying: Don't change anything... ever!" -- Mr. Monk

--
Fernando Cabral
Blog: http://fernandocabral.org
Twitter: http://twitter.com/fjcabral
e-mail: fernandojosecab...@gmail.com
Facebook: f...@fcabral.com.br
Telegram: +55 (37) 99988-8868
Wickr ID: fernandocabral
WhatsApp: +55 (37) 99988-8868
Skype: fernandojosecabral
Landline: +55 (37) 3521-2183
Mobile: +55 (37) 99988-8868

As long as there is a single person in the world without a home or without
food, no politician or scientist can boast of anything.
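One possible answer to the "think of it as an exercise" remark, offered as a sketch and not as what Tobi had in mind: avoid Remove() altogether and compact the sorted array in place with a read index and a write index, then cut off the tail once at the end. Uniq, iRead and iWrite are hypothetical names. The sketch assumes the array is already sorted and relies on the standard Resize() method of Gambas arrays.

```gambas
' Sketch: removes consecutive duplicates from a SORTED Integer[] in place.
' Each element is read once and written at most once, so the loop itself
' is linear in the number of elements.
Public Sub Uniq(aInts As Integer[])

  Dim iRead, iWrite As Integer

  If aInts.Count = 0 Then Return

  iWrite = 0                           ' last position holding a kept value
  For iRead = 1 To aInts.Max
    If aInts[iRead] <> aInts[iWrite] Then
      Inc iWrite
      aInts[iWrite] = aInts[iRead]     ' keep the first value of each run
    Endif
  Next

  aInts.Resize(iWrite + 1)             ' drop the leftover tail in one step

End
```

For the sorted input [1, 1, 2, 3, 3, 3] the array ends up as [1, 2, 3]. The counting variant follows the same pattern with a second array that is incremented whenever a value repeats.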
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
Well, there is complicated, then there is simplicity:
I tested this. Works for sorted, unsorted.
Can't be any simpler.

    Public Function RemoveMultiple(a As String[]) As String[]

      Dim x as Integer
      Dim z as NEW STRING[]

      For x = 1 to a.count()
        if z.Find(a) = 0 Then z.Add(a[x])
      Next

      'if you want it sorted, do it here
      Return z

    END

' - - - - -
use it this way:

    myArray = RemoveMultiple(myArray)
    'the z array is now myArray.
    'the original array is destroyed because there are no references.

--
Open WebMail Project (http://openwebmail.org)
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
My two cents.

    Public Sub Main()

      Dim sSort As String[] = ["A", "B", "B", "B", "C", "D", "D", "E", "E", "E", "E", "F"]
      Dim sSame As String[] = sSort
      Dim bb As New Byte[]
      Dim sSingle As New String[]
      Dim i, n As Integer

      For i = 0 To sSort.Max
        If i < sSort.Max Then
          If sSort[i] = sSame[i + 1] Then
            Inc n
          Else
            sSingle.Push(sSort[i])
            bb.Push(n + 1)
            n = 0
          Endif
        Endif
      Next
      sSingle.Push(sSort[sSort.Max])
      bb.Push(n + 1)
      For i = 0 To sSingle.Max
        Print sSingle[i]
      Next
      For i = 0 To bb.Max
        Print bb[i] & sSingle[i]
      Next

    End

Regards
Gianluigi
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
Another very effective and simple way would be:

You have your array with data.
You create a new empty array.

Loop through each item in your array with data.
If it's not in the new array, then add it.

Destroy the original array.
Keep the new one.

...something like (syntax may not be correct)

Public Function RemoveMultiple(a As String[]) As String[]

  Dim x As Integer
  Dim z As New String[]

  ' Gambas arrays are zero-based: valid indices are 0 .. a.Max
  For x = 0 To a.Max
    ' Find() returns -1 when the value is not in the array
    If z.Find(a[x]) = -1 Then z.Add(a[x])
  Next

  Return z

End

-Nando (Canada)
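A note on cost: String[].Find() scans the array, so the loop above does roughly one scan of the new array per item. Below is a minimal sketch of the same idea with a Collection used for the membership test instead. The name RemoveMultipleFast is made up for illustration and the code is untested. It assumes the array holds no empty strings (as far as I know a Collection does not accept an empty key) and that case-sensitive comparison, the Collection default, is what you want.

```gambas
' Hypothetical variant of RemoveMultiple, not from the thread.
' Keeps the first occurrence of every string, in the original order.
Public Function RemoveMultipleFast(a As String[]) As String[]

  Dim cSeen As New Collection ' keys are the strings already copied
  Dim z As New String[]
  Dim s As String

  For Each s In a
    ' Assumption: s is never an empty string
    If Not cSeen.Exist(s) Then
      cSeen[s] = True
      z.Add(s)
    Endif
  Next

  Return z

End
```

Unlike the sorted-array approaches in this thread, this does not need the input to be sorted.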
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
On Tue, 27 Jun 2017, Fernando Cabral wrote:
> Hi
>
> I have a sorted array that may contain several repeated items scattered all
> over.
>
> I have to do two different things at different times:
> a) Eliminate the duplicates leaving a single specimen from each repeated
> item;
> b) Eliminate the duplicates but having a count of the original number.
>
> So, if I have, say
>
> A
> B
> B
> C
> D
> D
>
> In the first option, I want to have
> A
> B
> C
> D
> In the second option, I want to have
> 1 A
> 2 B
> 1 C
> 2 D
>
> Any hints on how to do this using some Gambas built-in method?
>
> Note: Presently I have been doing it using external calls to
> the utilities sort and uniq.
>

Your first sentence is a bit confusing. First you say that your array is
sorted but then you say that duplicates may be scattered across the array.
There are notions of order (namely *preorder*) which are so weak that this
could happen, but are you actually dealing with a preorder on your items?
What are your items, anyway?

When I hear "sorted", I think of a partial order and if you have a partial
order, then sorted implies that duplicates are consecutive!

Anyway, I don't want to bore you with elementary concepts of order theory.
There are ways to handle preorders, partial orders and every stronger notion
of order, of course, from within Gambas. You simply have to ask a better
question, by giving more details.

If you have a sorting where duplicates are consecutive, the solution is very
easy: just go through the array linearly and kick out these consecutive
duplicates (which is precisely what uniq does), e.g. for integers:

  Dim aInts As Integer[] = ...
  Dim iInd, iLast As Integer

  If Not aInts.Count Then Return
  iLast = aInts[0]
  iInd = 1
  While iInd < aInts.Count
    If aInts[iInd] = iLast Then ' consecutive duplicate
      aInts.Remove(iInd, 1)
    Else
      iLast = aInts[iInd]
      Inc iInd
    Endif
  Wend

Note that the way I wrote it to get the idea across is not a linear-time
operation (it depends on the complexity of aInts.Remove()), but you can
achieve linear performance by writing better code. Think of it as an
exercise. (Of course, you can't hope to be more efficient than linear time
in a general situation.)

The counting task is solved with a similar pattern, but while you kick an
element out, you also increment a dedicated counter:

  Dim aInts As Integer[] = ...
  Dim aDups As New Integer[]
  Dim iInd, iLast As Integer

  If Not aInts.Count Then Return
  iLast = aInts[0]
  iInd = 1
  aDups.Add(0)
  While iInd < aInts.Count
    If aInts[iInd] = iLast Then ' consecutive duplicate
      aInts.Remove(iInd, 1)
      Inc aDups[aDups.Max]
    Else
      iLast = aInts[iInd]
      aDups.Add(0)
      Inc iInd
    Endif
  Wend

After this has executed, the array aInts will not contain duplicates
(supposing it was sorted before) and aDups[i] will contain the number of
duplicates of the item aInts[i] that were removed. The number of original
occurrences of aInts[i] is therefore aDups[i] + 1.

Regards,
Tobi

--
"There's an old saying: Don't change anything... ever!" -- Mr. Monk
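One possible answer to the exercise mentioned above, as a sketch only: rather than calling Remove() for every duplicate, keep a read index and a write index, copy each new value down to the write position, and shrink the array once at the end. The function name UniqCount and the variable names are invented for this example, the code is untested, and it assumes the array is sorted so that duplicates are consecutive. Note that it returns the number of occurrences of each item (the "1 A, 2 B" form from the original question), not the number of removed duplicates.

```gambas
' Hypothetical helper, not from the thread.
' Compacts a sorted array in place and returns, for each kept item,
' how many times it occurred in the original array.
Public Function UniqCount(aInts As Integer[]) As Integer[]

  Dim aCount As Integer[]
  Dim iRead, iWrite As Integer

  If aInts.Count = 0 Then Return New Integer[]

  ' Allocate once so that no element-by-element growth is needed
  aCount = New Integer[aInts.Count]
  iWrite = 0
  aCount[0] = 1

  For iRead = 1 To aInts.Max
    If aInts[iRead] = aInts[iWrite] Then
      ' consecutive duplicate: only count it
      aCount[iWrite] = aCount[iWrite] + 1
    Else
      ' new value: move it down to the next free slot
      Inc iWrite
      aInts[iWrite] = aInts[iRead]
      aCount[iWrite] = 1
    Endif
  Next

  ' Cut both arrays down to the number of distinct items
  aInts.Resize(iWrite + 1)
  aCount.Resize(iWrite + 1)

  Return aCount

End
```

Called on [1, 2, 2, 3, 4, 4], this should leave [1, 2, 3, 4] in the array and return [1, 2, 1, 2]. Each element is read once and written at most once, so the running time is linear in the array length.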
Re: [Gambas-user] I need a hint on how to deleted duplicate items in a array
Hello,

look here:

8<-
Public Function RemoveMultiple(aStringListe As String[]) As String[]

  Dim iIndex As Integer
  Dim sElement As String

  ' Note: the array passed in is modified in place
  iIndex = 0 ' initialisation NOT necessary
  While iIndex < aStringListe.Count
    sElement = aStringListe[iIndex]
    ' Remove every occurrence of the current element ...
    While aStringListe.Find(sElement) <> -1
      aStringListe.Remove(aStringListe.Find(sElement))
    Wend
    ' ... then put exactly one copy back at its original position.
    ' (Re-adding only when the count is odd would drop every item
    ' that occurs an even number of times.)
    aStringListe.Add(sElement, iIndex)
    Inc iIndex
  Wend

  Return aStringListe

End ' RemoveMultiple(...)
8<-

Hans
gambas-buch.de
[Gambas-user] I need a hint on how to deleted duplicate items in a array
Hi

I have a sorted array that may contain several repeated items scattered all
over.

I have to do two different things at different times:
a) Eliminate the duplicates leaving a single specimen from each repeated
item;
b) Eliminate the duplicates but having a count of the original number.

So, if I have, say

A
B
B
C
D
D

In the first option, I want to have
A
B
C
D
In the second option, I want to have
1 A
2 B
1 C
2 D

Any hints on how to do this using some Gambas built-in method?

Note: Presently I have been doing it using external calls to
the utilities sort and uniq.

Regards

- fernando

--
Fernando Cabral
Blogue: http://fernandocabral.org

As long as there is a single person in the world without a home or without
food, no politician or scientist can boast of anything.