On Tue, Jan 06, 2015 at 11:43:01AM +0000, Norman Khine wrote:
> hello,
> i have the following code:
>
> import os
> import sys
> import re
>
> walk_dir = ["app", "email", "views"]
> #t(" ")
> gettext_re = re.compile(r"""[t]\((.*)\)""").findall
>
> for x in walk_dir:
[...]

The first step in effectively asking for help is to focus on the actual
problem you are having. If you were going to the car mechanic to report
some strange noises from the engine, you would tell him about the
noises, not give him a long and tedious blow-by-blow account of where
you were going when the noise started, why you were going there, who you
were planning on seeing, and what clothes you were wearing that day.

At least, for your mechanic's sake, I hope you are not the sort of
person who does that. That reminds me, I need to give my Dad a call...

And so it is with code. Your directory walker code appears to work
correctly, so there is no need to dump it in our laps. It is just a
distraction and an annoyance.

To demonstrate the problem, you need the regex, a description of what
result you want, a small sample of data where the regex fails, and a
description of what you get instead.

So, let's skip all the irrelevant directory-walking code and move on to
the important part:

> which traverses a directory and tries to extract all strings that are
> within
>
> t(" ")
>
> for example:
>
> i have a blade template file, as
>
> replace page
> .row
> .large-8.columns
> form( method="POST", action="/product/saveall/#{style._id}" )
> input( type="hidden" name="_csrf" value=csrf_token )
> h3 #{t("Generate Product for")} #{tt(style.name)}
[...]

I'm not certain that we need to see an entire Blade template file.
Perhaps just an extract would do. Or perhaps not. For now, I will assume
an extract will do, and skip ahead:

> so, gettext_re = re.compile(r"""[t]\((.*)\)""").findall is not correct as
> it includes
>
> results such as input( type="hidden" name="_csrf" value=csrf_token )
>
> what is the correct way to pull all values that are within t(" ") but
> exclude any tt( ) and input( )
>
> any advice much appreciated

My first instinct is to quote Jamie Zawinski:

    Some people, when confronted with a problem, think, "I know, I'll
    use regular expressions."
    Now they have two problems.

If you have nested parentheses or quotes, you *cannot* in general solve
this problem with regular expressions. If Blade templates allow nesting,
then you are in trouble and you will need another solution, perhaps a
proper parser.

But for now, let's assume that no nesting is allowed. Let's look at the
data format again:

    blah blah blah don't care input(type="hidden" ...)
    h3 #{t("blah blah blah")} #{tt(style.name)}

So it looks like the part you care about looks like this:

    ...#{t("spam")}...

where you want to extract the word "spam", nothing else. This suggests a
regex:

    r'#{t\("(.*)"\)}'

and then you want the group (.*), not the whole regex. Here it is in
action:

py> import re
py> text = """blah blah blah don't care input(type="hidden" ...)
... h3 #{t("we want this")} #{tt(style.name)}
... more junk
... #{tt(abcd)} #{t("and this too")}blah blah blah..."""
py> pat = re.compile(r'#{t\("(.*)"\)}')
py> pat.findall(text)
['we want this', 'and this too']

So it appears to be working. Now you can try it on the full Blade file
and see how it goes.

If the regex still gives false matches, you can try:

- using a non-greedy match instead:

      r'#{t\("(.*?)"\)}'

- rather than matching all possible characters with .* can you limit it
  to only alphanumerics? (Note that \w does not match spaces, so this
  only suits single-word strings.)

      r'#{t\("(\w*?)"\)}'

- perhaps you need a two-part filter: first you use a regex to extract
  all the candidate matches, and then you eliminate some of them using
  some other test (not necessarily a regex).

Good luck!

--
Steve
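
P.S. Here is a rough sketch of the first and third suggestions above. The
sample line is made up for illustration (it is not taken from your
template), and the names `call`, `candidates` and `filtered` are just
placeholders. It assumes two t("...") calls can land on the same line,
which is the case where the greedy pattern is most likely to go wrong:

```python
import re

# Hypothetical one-line sample: two t("...") calls and one tt(...) call.
line = 'h3 #{t("first")} #{tt(style.name)} #{t("second")}'

# Greedy: (.*) runs on to the LAST '")}' on the line.
greedy = re.compile(r'#{t\("(.*)"\)}')
# Non-greedy: (.*?) stops at the FIRST '")}' after each opening.
lazy = re.compile(r'#{t\("(.*?)"\)}')

greedy_result = greedy.findall(line)
lazy_result = lazy.findall(line)
print(greedy_result)  # one over-long match spanning both calls
print(lazy_result)    # ['first', 'second']

# Two-part filter: step 1 collects every #{name(arg)} candidate,
# step 2 keeps only calls named exactly "t" with a quoted argument.
call = re.compile(r'#{(\w+)\((.*?)\)}')
candidates = call.findall(line)
filtered = [arg.strip('"') for name, arg in candidates
            if name == 't' and arg.startswith('"')]
print(filtered)       # ['first', 'second']
```

None of this helps if the template allows nested parentheses or quotes
inside t(...); that is still a job for a proper parser.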