>> For example are the file names in A in B and if so return a new list with >> those items. Long story short, I wrote some functions to do that. They are >> quite simple and fast (due to timsort in no small part). >> Even the plain python code is faster than the built in set functions (afaik). >You’re testing pretty small values in a very narrow use case, and testing them >inaccurately by using datetime instead of timeit (the fact that some of your >results are 0 vs. 0 should be a clue…), and you haven’t even given us the >results, just claimed that they’re faster.
I'm sorry I haven't used the mail list before and didn't send my comments to the list. Please see my response to Chris for some example times. I'm reluctant to use anything less than about a ms for timing purposes. There's too many interactions at the us level that can skew results. I'd prefer to expand the sample size until it's more clear. Here is an example For A not in B: trying A 10000, B 50000 naive test 0:00:10.319032 set test 0:00:00.009001 listex test 0:00:00.005000 For A subset of B: trying A 100000, B 500000 set test 0:00:00.180018 listex test 0:00:00.054006 If I increase a and b by 10 times the execution time goes up by about 100x on the set. This is because its related to a * b. the listex only goes up by about 10x which is about 10a + 10b. >When I run my own test on lists of 10000 strings made up 5 random lowercase >characters, >I get 485us for set(a).intersection(b), and 6609us for list_intersection(a, >b), so it’s >actually more than 10x slower. I can't reproduce this here is what I get For A not in B: trying A 10000, B 10000 naive test 0:00:02.661266 set test 0:00:00.002000 listex test 0:00:00.005001 >>The complexity ends up being about the longer of list a or b. >How can that be true? Sorting a takes nlogn steps, sorting b takes mlogm steps, >step-comparing then takes n+m steps, so that’s O(nlogn + mlogm * n + m) = >O(nlogn), which i>s bigger than O(n). I should have been more specific. The readme.md in the project explains a little more as does my response to Chris. You are right the sort + alg operation takes n log n and m log m just to sort the lists. The intersection,union, other functions though are m + n. The total is O(m+n+nlogn+mlogm). There is no multiplication of the sort time and the relation operation time because you only have to do it once no matter how many items are in a or b. >Why would you want this as a method of list rather than a function that works >just as well on any sortable iterables? It doesn't have to be. I just thought that was the purpose of this list so I should start with that. A stand alone module works for me at the moment. If it were included in the list functions or as some "extra" function that worked on lists I figured the final result would be done in C like the sort() function. That would make it a significant change so my idea was just to present the algorithms and see if there was interest. If it is in a addon package that would be ok with me, but I think think people would use the set method and probably never know there was a better way. >c = [val for val in a if val in filterset] This is the crux I think. The complexity is the number of elements in filterset. Since filterset has to be check for every value in a its O(len(a) * len(b) * filter search time). You wrote O(n+m) but I think that is incorrect. The module (listex) I linked to implements the other operations as well. I didn't mean just intersection I was just using that as an example. If you run the module it will generate some time comparison for that, but its just for comparison not meant to be exhaustive. It's the set of functions I have found useful in the past. It's not all of the ones in the module I regularly use for purely academic purposes actually. _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/PSIALU4ITFPYUMSRU45AQS64XXISZ2V5/ Code of Conduct: http://python.org/psf/codeofconduct/