I was able to get about a 20% speed up over Steve's solution, on some benchmark data I created, by:
* converting LOAD_GLOBAL to LOAD_FAST for __builtins__ * eliminating the conditional in each loop in favour of a conditional on pop only * eliminating string comparison for the operation in favour of testing line[1] against byte values None of these had a significant effect in either direction: * replacing range objects with counts * replacing the string.split()/int() calls with a hand-rolled space-separated integer parser in Python. Oh, in my benchmarking, the print() calls were huge and highly variable... I stubbed them out first. On Thu, 8 Jun 2017, 21:39 Stestagg, <stest...@gmail.com> wrote: > Apologies, In my previous email, I meant 'insert a marker', rather than > 'push a marker' > > On Thu, Jun 8, 2017 at 7:17 PM Stestagg <stest...@gmail.com> wrote: > >> I tracked down the challenge on the site, and have a working solution (I >> won't share for obvious reasons). Basically the timeouts were being caused >> by 'add_to_first_n' being called in horrible ways in the test cases. >> >> Because add_to_first_n alters the bottom of the stack, you can just push >> a marker onto the stack rather than iterating and mutating each entry, >> doing this made those test cases pass >> >> Personally, I think it's not a well-described problem, because it's >> expecting you to tune the algo to specific shapes of data without allowing >> any visibility on the data, or a description of what to code for. An algo >> junkie may jump straight to the optimized version, but a pragmatic >> developer would, in my opinion, hesitate to do that without any actual >> evidence that the problem required it. >> >> Steve >> >> >> >> >> >> On Thu, Jun 8, 2017 at 5:27 PM Jonathan Hartley <tart...@tartley.com> >> wrote: >> >>> Yep, that's a great elimination of the suspicious small overheads. >>> >>> line_profiler is beautiful, I'll definitely be adding it to my toolbox, >>> thanks for that! >>> >>> I tried a variant of accumulating the output and printing it all as a >>> single string, but of course this didn't help, printing is already buffered. >>> >>> Jonathan >>> >>> On 6/8/2017 03:54, Stestagg wrote: >>> >>> I honestly can't see a way to improve this in python. My best solution >>> is: >>> >>> def main(lines): >>> stack = [] >>> sa = stack.append >>> sp = stack.pop >>> si = stack.__getitem__ >>> for line in lines: >>> meth = line[:3] >>> if meth == b'pus': >>> sa(int(line[5:])) >>> elif meth == b'pop': >>> sp() >>> else: >>> parts = line[15:].split() >>> end = len(stack)-1 >>> amount = int(parts[1]) >>> for x in range(int(parts[0])): >>> index = end - x >>> stack[index] += amount >>> print(stack[-1] if stack else None) >>> >>> which comes out about 25% faster than your solution. >>> >>> One tool that's interesting to use here is: line_profiler: >>> https://github.com/rkern/line_profiler >>> >>> putting a @profile decorator on the above main() call, and running with >>> kernprof produces the following output: >>> >>> Line # Hits Time Per Hit % Time Line Contents >>> >>> ============================================================== >>> >>> 12 @profile >>> >>> 13 def main(lines): >>> >>> 14 1 4 4.0 0.0 stack = [] >>> >>> 15 2000001 949599 0.5 11.5 for line in lines: >>> >>> 16 2000000 1126944 0.6 13.7 meth = line[:3] >>> >>> >>> 17 2000000 974635 0.5 11.8 if meth == >>> b'pus': >>> >>> 18 1000000 1002733 1.0 12.2 >>> stack.append(int(line[5:])) >>> >>> 19 1000000 478756 0.5 5.8 elif meth == >>> b'pop': >>> >>> 20 999999 597114 0.6 7.2 stack.pop() >>> >>> 21 else: >>> >>> 22 1 6 6.0 0.0 parts = >>> line[15:].split() >>> >>> 23 1 2 2.0 0.0 end = >>> len(stack)-1 >>> >>> 24 1 1 1.0 0.0 amount = >>> int(parts[1]) >>> >>> 25 500001 241227 0.5 2.9 for x in >>> range(int(parts[0])): >>> >>> 26 500000 273477 0.5 3.3 index >>> = end - x >>> >>> 27 500000 309033 0.6 3.7 >>> stack[index] >>> += amount >>> >>> 28 2000000 2295803 1.1 27.8 >>> print(stack[-1]) >>> >>> which shows that there's no obvious bottleneck (line by line) here (for >>> my sample data). >>> >>> Note the print() overhead dominates the runtime, and that's with me >>> piping the output to /dev/null directly. >>> >>> I had a go at using arrays, deques, and numpy arrays in various ways >>> without luck, but we're getting fairly close to the native python statement >>> execution overhead here (hence folding it all into one function). >>> >>> My only thoughts would be to see if there were some magic that could be >>> done by offloading the work onto a non-python library somehow. >>> >>> Another thing that might help some situations (hence my previous >>> questions) would be to implement the add_to_first_n as a lazy operator >>> (i.e. have a stack of the add_to_first_n values and dynamically add to the >>> results of pop() but that would proabably be much slow in the average case. >>> >>> Steve >>> >>> On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley <tart...@tartley.com> >>> wrote: >>> >>>> Hey. >>>> >>>> Thanks for engaging, but I can't help with the most important of those >>>> questions - the large data sets on which my solution failed due to timeout >>>> are hidden from candidates. Not unreasonable to assume that they do >>>> exercise deep stacks, and large args to add_to_first_n, etc. >>>> >>>> Yes, the input looks exactly like your example. All args are integers. >>>> The question asked for output corresponding to the top of the stack after >>>> every operation. I omitted this print from inside the 'for' loop in 'main', >>>> thinking it irrelevant. >>>> >>>> I converted to integers inside 'dispatch'. 'args' must have actually >>>> been created with: >>>> >>>> args = [int(i) for i in tokens[1:]] >>>> >>>> Where len(tokens) is never going to be bigger than 3. >>>> >>>> Return values (from 'pop') were unused. >>>> >>>> >>>> On 6/7/2017 13:25, Stestagg wrote: >>>> >>>> Do you have any more context? >>>> For example, is the add_to_first_n likely to be called with very large >>>> numbers, or very often? Does the stack get very deep, or stay shallow? >>>> >>>> I'm assuming that lines look like this: >>>> >>>> push 1 >>>> push 2 >>>> add_to_first_n 2 10 >>>> pop >>>> pop >>>> >>>> with all arguments as integers, and the final value being returned from >>>> main()? >>>> How did you convert from string inputs to numeric values? >>>> How did you manage return values? >>>> >>>> :D >>>> >>>> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley <tart...@tartley.com> >>>> wrote: >>>> >>>>> I recently submitted a solution to a coding challenge, in an >>>>> employment context. One of the questions was to model a simple stack. I >>>>> wrote a solution which appended and popped from the end of a list. This >>>>> worked, but failed with timeouts on their last few automated tests with >>>>> large (hidden) data sets. >>>>> >>>>> From memory, I think I had something pretty standard: >>>>> >>>>> class Stack: >>>>> >>>>> def __init__(self): >>>>> self.storage = [] >>>>> >>>>> def push(arg): >>>>> self.storage.append(arg) >>>>> >>>>> def pop(): >>>>> return self.storage.pop() if self.storage else None >>>>> >>>>> def add_to_first_n(n, amount): >>>>> for n in range(n): >>>>> self.storage[n] += amount >>>>> >>>>> def dispatch(self, line) >>>>> tokens = line.split() >>>>> method = getattr(self, tokens[0]) >>>>> args = tokens[1:] >>>>> method(*args) >>>>> >>>>> def main(lines): >>>>> stack = Stack() >>>>> for line in lines: >>>>> stack.dispatch(line) >>>>> >>>>> >>>>> (will that formatting survive? Apologies if not) >>>>> >>>>> Subsequent experiments have confirmed that appending to and popping >>>>> from the end of lists are O(1), amortized. >>>>> So why is my solution too slow? >>>>> >>>>> This question was against the clock, 4th question of 4 in an hour. So >>>>> I wasn't expecting to produce Cython or C optimised code in that timeframe >>>>> (Besides, my submitted .py file runs on their servers, so the environment >>>>> is limited.) >>>>> >>>>> So what am I missing, from a performance perspective? Are there other >>>>> data structures in stdlib which are also O(1) but with a better constant? >>>>> >>>>> Ah. In writing this out, I have begun to suspect that my slicing of >>>>> 'tokens' to produce 'args' in the dispatch is needlessly wasting time. Not >>>>> much, but some. >>>>> >>>>> Thoughts welcome, >>>>> >>>>> Jonathan >>>>> >>>>> -- >>>>> Jonathan Hartley tart...@tartley.com http://tartley.com >>>>> Made out of meat. +1 507-513-1101 <%28507%29%20513-1101> >>>>> twitter/skype: tartley >>>>> >>>>> >>>>> _______________________________________________ >>>>> python-uk mailing list >>>>> python-uk@python.org >>>>> https://mail.python.org/mailman/listinfo/python-uk >>>>> >>>> >>>> >>>> _______________________________________________ >>>> python-uk mailing >>>> listpython-uk@python.orghttps://mail.python.org/mailman/listinfo/python-uk >>>> >>>> >>>> -- >>>> Jonathan Hartley tart...@tartley.com http://tartley.com >>>> Made out of meat. +1 507-513-1101 <%28507%29%20513-1101> >>>> twitter/skype: tartley >>>> >>>> >>>> _______________________________________________ >>>> python-uk mailing list >>>> python-uk@python.org >>>> https://mail.python.org/mailman/listinfo/python-uk >>>> >>> >>> >>> _______________________________________________ >>> python-uk mailing >>> listpython-uk@python.orghttps://mail.python.org/mailman/listinfo/python-uk >>> >>> >>> -- >>> Jonathan Hartley tart...@tartley.com http://tartley.com >>> Made out of meat. +1 507-513-1101 <(507)%20513-1101> >>> twitter/skype: tartley >>> >>> >>> _______________________________________________ >>> python-uk mailing list >>> python-uk@python.org >>> https://mail.python.org/mailman/listinfo/python-uk >>> >> _______________________________________________ > python-uk mailing list > python-uk@python.org > https://mail.python.org/mailman/listinfo/python-uk >
_______________________________________________ python-uk mailing list python-uk@python.org https://mail.python.org/mailman/listinfo/python-uk