On Tuesday, 3 December 2019 06:22:52 UTC+8, DL Neil wrote: > On 3/12/19 6:00 AM, Peter Otten wrote: > > A S wrote: > > I think I've seen this question before ;) > > In addition to 'other reasons' for @Peter's comment, it is a common > ComSc worked-problem or assignment. (in which case, we'd appreciate > being told that you/OP is asking for help with "homework") > > > >> I am trying to extract all strings in nested parentheses (along with the > >> parentheses itself) in my .txt file. Please see the sample .txt file that > >> I have used in this example here: > >> (https://drive.google.com/open?id=1UKc0ZgY9Fsz5O1rSeBCLqt5dwZkMaQgr). > >> > >> I have tried and done up three different codes but none of them seems to > >> be able to extract all the nested parentheses. They can only extract a > >> portion of the nested parentheses. Any advice on what I've done wrong > >> could really help! > > One approach is to research in the hope that there are already existing > tools or techniques which may help/save you from 'reinventing the wheel' > - when you think about it, a re-statement of open-source objectives. > > How does the Python interpreter break-down Python (text) code into its > constituent parts ("tokens") *including* parentheses? Such are made > available in (perhaps) a lesser-known corner of the PSL (Python Standard > Library). Might you be able to use one such tool? > > The ComSc technique which sprang to mind involves "stacks" (a LIFO data > structure) and "RPN" (Reverse Polish Notation). Whereas we like people > to take their turn when it comes to limited resources, eg to form a > "queue" to purchase/pay for goods at the local store, which is "FIFO" > (first-in, first-out); a "stack"/LIFO (last-in, first-out) can be > problematic to put into practical application. There are plenty of > Python implementations or you can 'roll your own' with a list. Again, > I'd likely employ a "deque" from the PSL's Collections library (although > as a "stack" rather than as a "double-ended queue"), because the > optimisation comes "free". (to my laziness, but after some kind soul > sweated-bullets to make it fast (in both senses) for 'the rest of us'!) > > > > It's probably easier to understand and implement when you process the > > complete text at once. Then arbitrary splits don't get in the way of your > > quest for ( and ). You just have to remember the position of the first > > opening ( and number of opening parens that have to be closed before you > > take the complete expression: > > +1 > but as a 'silver surfer', I don't like to be asked to "remember" anything! > > > > level: 00011112222100 > > we need^ > > > Consider: > original_text (the contents of the .txt file - add buffering if volumes > are huge) > current_text (the characters we have processed/"recognised" so-far) > stack (what an original name for such a data-structure! Which will > contain each of the partial parenthetical expressions found - but yet to > be proven/made complete) > > set current_text to NULSTRING > for each current_character in original_text: > if current_character is LEFT_PARENTHESIS: > push current_text to stack > set current_text to LEFT_PARENTHESIS > concatenate current_character with current_text > if current_character is RIGHT_PARENTHESIS: > # current_text is a parenthetical expression > # do with it what you will > pop the stack > set current_text to the ex-stack string \ > concat current_text's p-expn > > Once working: cover 'special cases' (after above loop), eg original_text > which doesn't begin and/or end with parentheses; and error cases, eg > pop-ping a NULSTRING, or thinking things are finished but the stack is > not yet empty - likely events from unbalanced parentheses! > > original text = "abc(def(gh))ij" > > event 1: in-turn, concatenate characters "abc" as current_text > event 2: locate (first) left-parenthesis, push current_text to stack(&) > event 3: concatenate "(def" > event 4: push, likewise > event 5: concatenate "(gh" > event 6: locate (first) right-parenthesis (matches to left-parenthesis > begining the current_string!) > result?: ?print current_text? > event 7: pop stack and redefine current_text as "(def(gh)" > event 8: repeat, per event 6 > event 9: current_text will now become "(def(gh))" > event 10: (special case) run-out of input at "(def(gh)ij" > event 11: (special case) pop (!any) stack and report "abc(def(gh)" > > > NB not sure of need for a "level" number; but if required, you can infer > that at any time, from the depth/len() of the stack! > > NBB being a simple-boy, my preference is for a 'single layer' of code, > cf @Peter's generator. Regardless the processes are "tokenisation" and > "recognition". > > At the back of my mind, was the notion that you may (eventually) be > required to work with more than parentheses, eg pair-wise > square-brackets and/or quotation-marks. In which case, you will need to > also 'push' the token and check for token-pairs when 'pop-ping', as well > as (perhaps) recognising lists of tokens to tokenise instead of the two > parenthesis characters alone. In which case, I'd take a serious look at > the Python Language Services rather than taking a 'roll your own' approach! > > Contrarily, if this spec is 'it', then you might consider optimising the > search processes which 'trigger' the two stack operations, by re-working > the for-loop and utilising string.find() - prioritising whichever > parenthesis is left-most/comes-first - assuming LtR text. (apologies if > you have already tried this in one of your previous approaches) > Unfortunately, such likely results in 'layers' of code, and a generator > might well become the tool-of-choice (I say this before @Peter comes > back and (quite deservedly) flays me alive!). > > > WebRefs: > Python Language Services: https://docs.python.org/3/library/language.html > collections — Container datatypes: > https://docs.python.org/3/library/collections.html > > See also your ComSc text/reference materials. > -- > Regards =dn
Hey DL Neil, this is rather sophisticated for me as I am still learning the basics of Python...But I truly appreciate your help and effort! I did try to read through what you said, but some parts I could not register! -- https://mail.python.org/mailman/listinfo/python-list