Re: [BangPypers] How should I do it?
On Fri, Jan 15, 2010 at 12:01 PM, Eknath Venkataramani <eknath.i...@gmail.com> wrote:
> I have a txt file in the following format:
> [code]
> "confident" => {
>   count => 4,
>   trans => {
>     "ashahvasahta" => 0.74918568,
>     "atahmavaishahvaasa" => 0.09095465,
>     "pahraaram\.nbha" => 0.06990729,
>     "mailatae" => 0.02856427,
>     "utanai" => 0.01929341,
>     "anaa" => 0.01578552,
>     "uthaanae" => 0.01403157,
>     "jaitanae" => 0.01227762,
>   },
> },
> "consumers" => {
>   count => 4,
>   trans => {
>     "upabhaokahtaa" => 0.75144362,
>     "upabhaokahtaaom\.n" => 0.12980166,
>     "sauda\�\�\�dha" => 0.11875471,
>   },
> },
> "a" => {
>   count => 1164,
>   trans => {
>     "eka" => 0.14900491,
>     "kaisai" => 0.08834675,
>     "haai" => 0.06774697,
>     "kaoi" => 0.05394308,
>     "kai" => 0.04981982,
>     "\(none\)" => 0.04400085,
>     "kaa" => 0.03726579,
>     "kae" => 0.03446450,
>   },
> },
> [/code]
>
> and I need to extract "confident" , "ashahvasahta" from the first
> record, "consumers", "upabhaokahtaa" from the second record...
> i.e. "word in english" and the "first word in the probable-translations"
>
> Thanks is advance
> Eknath
> ___
> BangPypers mailing list
> BangPypers@python.org
> http://mail.python.org/mailman/listinfo/bangpypers
>

Since I hadn't yet had a chance to write a recursive descent parser, I took this opportunity to do a bit of an exercise. I have used a parsing library called pyparsing.
-- Begin Code --
# coding=utf-8
from functools import reduce  # reduce is a builtin only in Python 2
from pyparsing import *
import pprint
import sys

# Raw string, so backslashes such as \. and \( stay as literal characters
data = r'''
"confident" => {
    count => 4,
    trans => {
        "ashahvasahta" => 0.74918568,
        "atahmavaishahvaasa" => 0.09095465,
        "pahraaram\.nbha" => 0.06990729,
        "mailatae" => 0.02856427,
        "utanai" => 0.01929341,
        "anaa" => 0.01578552,
        "uthaanae" => 0.01403157,
        "jaitanae" => 0.01227762,
    },
},
"consumers" => {
    count => 4,
    trans => {
        "upabhaokahtaa" => 0.75144362,
        "upabhaokahtaaom\.n" => 0.12980166,
        "sauda\�\�\�dha" => 0.11875471,
    },
},
"a" => {
    count => 1164,
    trans => {
        "eka" => 0.14900491,
        "kaisai" => 0.08834675,
        "haai" => 0.06774697,
        "kaoi" => 0.05394308,
        "kai" => 0.04981982,
        "\(none\)" => 0.04400085,
        "kaa" => 0.03726579,
        "kae" => 0.03446450,
    },
}
'''

# Set up the pyparsing tokens
dct = Forward()                      # a nested { ... } block, defined below
pair_op = Literal("=>")
comma = Literal(",").suppress()
beg_brace = Literal("{").suppress()
end_brace = Literal("}").suppress()
num = Word("0123456789.")
key = (Word(alphas + nums) ^ quotedString).setResultsName("key")
val = (num ^ dct).setResultsName("value")
key_value_pair = Group(key + pair_op.suppress() + val)
key_value_pair_list = delimitedList(key_value_pair)
# A block is a comma-separated list of pairs, optionally ending in a comma
dct << Group(beg_brace + key_value_pair_list + Optional(comma) + end_brace)

# Parse the data
parsed = key_value_pair_list.parseString(data)

# Function to extract, i.e. build a Python data structure from the parse results
def extract(result):
    if 'key' in result.keys():
        if isinstance(result.value, ParseResults):
            # nested block: recurse into it
            return (result.key, extract(result.value))
        else:
            return (result.key, result.value)
    else:
        # a list of pairs becomes a dict
        return dict(extract(elem) for elem in result)

# Extract
extracted = extract(parsed)

# Print the extracted data
pprint.pprint(extracted, sys.stdout)

# Print each English word with its most probable translation
print("\n\n\nTranslations\n\n")
print(dict((english,
            reduce(lambda x, y: (y[0], float(y[1])) if float(y[1]) > x[1] else x,
                   translations['trans'].items(),
                   ('', 0.0))[0])
           for english, translations in extracted.items()))
-- End Code --

Dhananjay

-- 
blog: http://blog.dhananjaynene.com
twitter: http://twitter.com/dnene http://twitter.com/_pythonic
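The last step above packs a max-by-probability search into reduce(). As a sketch of the same step, max() with a key function may read more directly. One detail worth knowing: pyparsing's quotedString keeps the surrounding double quotes, so the English keys come out as '"confident"' rather than 'confident', and numbers stay strings. The extracted dict below is a hypothetical hand-made sample in that assumed shape, and best_translation / strip_quotes are illustrative helper names, not part of the code above.

```python
# Hypothetical sample shaped like the pyparsing output above:
# quoted keys keep their quotes, and probabilities are still strings.
extracted = {
    '"confident"': {'count': '4',
                    'trans': {'"ashahvasahta"': '0.74918568',
                              '"atahmavaishahvaasa"': '0.09095465'}},
    '"consumers"': {'count': '4',
                    'trans': {'"upabhaokahtaa"': '0.75144362',
                              '"upabhaokahtaaom\\.n"': '0.12980166'}},
}


def best_translation(trans):
    """Return the translation whose probability (a numeric string) is highest."""
    return max(trans.items(), key=lambda item: float(item[1]))[0]


def strip_quotes(s):
    """Drop the surrounding double quotes that quotedString keeps."""
    if len(s) >= 2 and s.startswith('"') and s.endswith('"'):
        return s[1:-1]
    return s


best = dict((strip_quotes(english), strip_quotes(best_translation(entry['trans'])))
            for english, entry in extracted.items())
print(best)
```

Alternatively, attaching pyparsing's removeQuotes parse action to quotedString would drop the quotes at parse time instead of afterwards.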
Re: [BangPypers] psyco V2
On Fri, Jan 15, 2010 at 5:27 PM, Vishal wrote:
> Hi,
>
> Came across this post on codespeak.
> Christian Tismer of Stackless fame as taken up pysco and created a V2 of
> it...and seems the effort continues...
>
> http://codespeak.net/pipermail/pypy-dev/2009q3/005288.html

Nice. Thanks for pointing this out.

-- 
~noufal
http://nibrahim.net.in
[BangPypers] psyco V2
Hi,

Came across this post on codespeak. Christian Tismer of Stackless fame has taken up psyco and created a V2 of it, and it seems the effort continues:

http://codespeak.net/pipermail/pypy-dev/2009q3/005288.html

Interesting stuff...

Best regards,
Vishal Sapre
Re: [BangPypers] How should I do it?
On Fri, Jan 15, 2010 at 4:13 PM, Anand Chitipothu wrote:
> On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose
> wrote:
> >> It is a clever hack, taking advantage of the nature of the data. But
> >> it is far more faster than the other approaches posted here.
> >
> > I thought eval was evil :)
>
> The date looks like valid json. You can use simplejson.loads instead of
> eval.
>

Don't the '=>' characters mess things up? One of the nice things about the repr of Python objects is that it's almost valid JSON. The same can't be said for PHP, though.

-- 
~noufal
http://nibrahim.net.in
Re: [BangPypers] How should I do it?
On Fri, Jan 15, 2010 at 4:13 PM, Anand Chitipothu wrote:
> On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose
> wrote:
> >> It is a clever hack, taking advantage of the nature of the data. But
> >> it is far more faster than the other approaches posted here.
> >
> > I thought eval was evil :)
>
> The date looks like valid json. You can use simplejson.loads instead of
> eval.
>

Python 2.6.2 (r262:71600, Aug 21 2009, 12:23:57)
[GCC 4.4.1 20090818 (Red Hat 4.4.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import simplejson
>>> data=open('data.txt').read().replace('[code]','').replace('[/code]','')
>>> data
'\n"confident" => {\n count => 4,\n trans => {\n"ashahvasahta" => 0.74918568,\n "atahmavaishahvaasa" => 0.09095465,\n "pahraaram\\.nbha" => 0.06990729,\n"mailatae" => 0.02856427,\n "utanai" => 0.01929341,\n"anaa" => 0.01578552,\n"uthaanae" => 0.01403157,\n"jaitanae" => 0.01227762,\n },\n},\n"consumers" => {\n count => 4,\n trans => {\n "upabhaokahtaa" => 0.75144362,\n "upabhaokahtaaom\\.n" => 0.12980166,\n "sauda\\\xef\xbf\xbd\\\xef\xbf\xbd\\\xef\xbf\xbddha" => 0.11875471,\n },\n},\n"a" => {\n count => 1164,\n trans => {\n "eka" => 0.14900491,\n "kaisai" => 0.08834675,\n"haai" => 0.06774697,\n"kaoi" => 0.05394308,\n "kai" => 0.04981982,\n"\\(none\\)" => 0.04400085,\n "kaa" => 0.03726579,\n "kae" => 0.03446450,\n },\n},\n\n'
>>> simplejson.loads(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/site-packages/simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.6/site-packages/simplejson/decoder.py", line 338, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 13 - line 37 column 1 (char 13 - 815)

Anand

-- 
--Anand
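The session above shows why simplejson rejects the raw text: '=>' instead of ':', bare keys, trailing commas, and backslash sequences like \. that JSON does not allow. As a rough sketch, a few textual rewrites can turn this particular dump into valid JSON for the stdlib json module (part of the stdlib since Python 2.6). It assumes every backslash in the data is a literal character, that no quoted string contains '=>', ',}' or an escaped quote, and that bare keys are plain identifiers. RAW and to_json are hypothetical names.

```python
import json
import re

# Hypothetical, shortened sample in the same "=>" format as data.txt.
RAW = r'''[code]
"confident" => {
 count => 4,
 trans => {
"ashahvasahta" => 0.74918568,
 "pahraaram\.nbha" => 0.06990729,
 },
},
"a" => {
 count => 1164,
 trans => {
 "eka" => 0.14900491,
"\(none\)" => 0.04400085,
 },
},
[/code]'''


def to_json(text):
    """Rewrite the '=>' dump into JSON text (a sketch under the stated assumptions)."""
    text = text.replace('[code]', '').replace('[/code]', '')
    # Backslashes here are literal, but JSON allows only a few escapes,
    # so double every backslash.
    text = text.replace('\\', '\\\\')
    # '=>' separates keys from values; JSON uses ':'.
    text = text.replace('=>', ':')
    # The records are key/value pairs, so wrap them in one top-level object.
    text = '{' + text + '}'
    # Quote bare keys such as count and trans.
    text = re.sub(r'(?<=[{,\s])([A-Za-z_]\w*)(?=\s*:)', r'"\1"', text)
    # JSON does not allow a comma right before a closing brace.
    text = re.sub(r',(\s*})', r'\1', text)
    return text


records = json.loads(to_json(RAW))
for english, entry in sorted(records.items()):
    # Highest-probability translation for each English word
    print(english, max(entry['trans'], key=entry['trans'].get))
```

This stays a text-level hack like the eval() approach: it depends on the quirks of this one file rather than on a real grammar.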
Re: [BangPypers] How should I do it?
On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose wrote:
>> It is a clever hack, taking advantage of the nature of the data. But
>> it is far more faster than the other approaches posted here.
>
> I thought eval was evil :)

The data looks like valid JSON. You can use simplejson.loads instead of eval.

Anand
Re: [BangPypers] How should I do it?
On Fri, Jan 15, 2010 at 4:03 PM, Noufal Ibrahim wrote:
> On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose wrote:
>
> > > It is a clever hack, taking advantage of the nature of the data. But
> > > it is far more faster than the other approaches posted here.
> >
> > I thought eval was evil :)
> >
>
> Given that the OPs data is fixed, eval is okay. :)
>
> Otherwise, it could be evil or unreliable (eg. => inside some of the data
> strings etc.)
>

My sentiments are the same. As long as you are sure of the safety of your data, such as the absence of %s etc. which could cause security issues, eval is safe. It can often be used for quick shortcuts such as the one above.

I started off with a regular expression solution, but after I observed that the pattern fits a recursive dict, I changed tack.

Btw "eval" is spelled e-v-a-l, not e-v-i-l :)

>
> --
> ~noufal
> http://nibrahim.net.in
>

-- 
--Anand
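The regular-expression route mentioned above is not shown in the thread. Here is a hypothetical sketch of what it could look like for the OP's exact question: one pattern per record that captures the English word and the first quoted key inside trans. It assumes every record lists count before trans and that "first word" means first in file order. DATA and RECORD are illustrative names, not Anand's code.

```python
import re

# Hypothetical, shortened copy of the OP's data.
DATA = r'''
"confident" => {
 count => 4,
 trans => {
"ashahvasahta" => 0.74918568,
 "atahmavaishahvaasa" => 0.09095465,
 },
},
"consumers" => {
 count => 4,
 trans => {
 "upabhaokahtaa" => 0.75144362,
 "upabhaokahtaaom\.n" => 0.12980166,
 },
},
'''

# One record: a quoted English word opening a block, its count,
# then the first quoted key of the trans block.
RECORD = re.compile(
    r'"([^"]+)"\s*=>\s*\{\s*'        # "english" => {
    r'count\s*=>\s*\d+\s*,\s*'       # count => N,
    r'trans\s*=>\s*\{\s*"([^"]*)"'   # trans => { "first translation"
)

pairs = RECORD.findall(DATA)
for english, first in pairs:
    print(english, first)
```

Translation entries are followed by a number rather than '=> {', so only the English words start a match.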
Re: [BangPypers] How should I do it?
On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose wrote:
> > It is a clever hack, taking advantage of the nature of the data. But
> > it is far more faster than the other approaches posted here.
>
> I thought eval was evil :)
>

Given that the OP's data is fixed, eval is okay. :)

Otherwise, it could be evil or unreliable (e.g. "=>" inside some of the data strings, etc.)

-- 
~noufal
http://nibrahim.net.in
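To make the "=> inside the data strings" concern concrete: a plain str.replace('=>', ':') also rewrites text inside quoted keys. A minimal sketch of one workaround, assuming strings are double-quoted with backslash escapes, is to match quoted strings and '=>' in a single regex alternation and replace only the bare separators. The sample key "a=>b" is hypothetical; the OP's data has no such key.

```python
import re

# Either a double-quoted string (with backslash escapes) or a bare '=>'.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|=>')


def naive(text):
    """What a plain replace does: every '=>' changes, even inside strings."""
    return text.replace('=>', ':')


def outside_strings(text):
    """Replace '=>' with ':' only where it is not inside a quoted string."""
    return TOKEN.sub(lambda m: ':' if m.group(0) == '=>' else m.group(0), text)


# Hypothetical record whose key itself contains '=>'.
sample = '"a=>b" => 0.5'
print(naive(sample))            # "a:b" : 0.5   (key corrupted)
print(outside_strings(sample))  # "a=>b" : 0.5  (only the separator changes)
```

Because the string alternative is tried first at each position, a '=>' is only replaced when the scanner is not inside a quoted string.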
Re: [BangPypers] How should I do it?
> It is a clever hack, taking advantage of the nature of the data. But
> it is far more faster than the other approaches posted here.

I thought eval was evil :)

Regards,
BG

-- 
Baishampayan Ghose
b.ghose at gmail.com
Re: [BangPypers] How should I do it?
On Fri, Jan 15, 2010 at 2:40 PM, Anand Balachandran Pillai wrote:
> # Now, count and trans are not strings in
> # data, so Python will complain, hence we
> # define these as strings with same name!
> count, trans = 'count','trans'
>

Clever, that. I got to there, threw up my hands and went downstairs to eat lunch.

-- 
rm
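The trick quoted above works because eval() with no extra arguments looks names up in the caller's own variables. A variation, sketched here as an assumption rather than what was posted, is to hand eval() an explicit namespace that maps the bare names to strings, so nothing has to be defined in the surrounding function. Emptying __builtins__ hides functions like open(), but it is not a security sandbox.

```python
# Hypothetical one-record input, already scrubbed ('=>' replaced by ':').
text = '{"confident" : { count : 4, trans : { "ashahvasahta" : 0.74918568, }, },}'

# count and trans resolve through this dict instead of local variables.
# An empty __builtins__ hides open(), __import__ etc., but this is NOT a sandbox.
namespace = {'__builtins__': {}, 'count': 'count', 'trans': 'trans'}
parsed = eval(text, namespace)
print(parsed['confident']['trans'])
```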
Re: [BangPypers] How should I do it?
On Friday 15 Jan 2010 12:01:56 pm Eknath Venkataramani wrote:
> and I need to extract "confident" , "ashahvasahta" from the first
> record, "consumers", "upabhaokahtaa" from the second record...
> i.e. "word in english" and the "first word in the probable-translations"
>

#!/usr/bin/python
# The records typed in as Python data: each 'trans' is a list of
# (translation, probability) pairs, so the first pair is the first translation.
words = [
    {'english': "confident",
     'count': 4,
     'trans': [("ashahvasahta", 0.74918568),
               ("atahmavaishahvaasa", 0.09095465),
               (r"pahraaram\.nbha", 0.06990729),
               ("mailatae", 0.02856427),
               ("utanai", 0.01929341),
               ("anaa", 0.01578552),
               ("uthaanae", 0.01403157),
               ("jaitanae", 0.01227762),
               ],
     },
    {'english': "consumers",
     'count': 4,
     'trans': [("upabhaokahtaa", 0.75144362),
               (r"upabhaokahtaaom\.n", 0.12980166),
               ],
     },
    {'english': "a",
     'count': 1164,
     'trans': [("eka", 0.14900491),
               ("kaisai", 0.08834675),
               ("haai", 0.06774697),
               ("kaoi", 0.05394308),
               ("kai", 0.04981982),
               (r"\(none\)", 0.04400085),
               ("kaa", 0.03726579),
               ("kae", 0.03446450),
               ],
     },
]

for word in words:
    print('%s %s' % (word['english'], word['trans'][0][0]))

-- 
regards
kg
http://lawgon.livejournal.com
Re: [BangPypers] How should I do it?
On Fri, Jan 15, 2010 at 2:17 PM, Noufal Ibrahim wrote:
> On Fri, Jan 15, 2010 at 1:04 PM, Dhananjay Nene
> wrote:
>
> > This seems to be an output of print_r of PHP. If you have a flexibility, try
> > to have the PHP code output the data into a language neutral format (eg
> > json, yaml, xml etc.) and then parse it in python using the appropriate
> > parser. If not you may have to write a custom parser. I did google to find
> > if one existed, but couldn't easily locate one.
> >
> >
> There is
> http://www.php.net/manual/en/book.json.php for PHP and Python2.6 onwards
> has json part of the stdlib.
>
> If you don't have access to the webserver, you might be able to use the php
> interpreter on your own machine to parse this into something more language
> neutral
>

If you take a look at your data, it is surprisingly close to how a nested Python dictionary looks, except that instead of ':' to separate key from value, it uses '=>', which is what Perl and PHP use. So, the following solution takes advantage of this fact and converts your data to a Python dictionary.

Here is the complete solution.

def scrub(data):
    # First remove the [code] and [/code] markers
    data = data.replace('[code]', '').replace('[/code]', '')
    # Replace '=>' with ':'
    data = data.replace('=>', ':')
    # Now, count and trans are not strings in
    # data, so Python will complain, hence we
    # define these as strings with same name!
    count, trans = 'count', 'trans'
    # Now prefix data with { and post-fix with }
    data = '{' + data + '}'
    print(data)
    # Eval it to a dictionary
    mydict = eval(data)
    print(mydict)
    # Return it so the caller can use the dictionary
    return mydict

if __name__ == "__main__":
    scrub(open('data.txt').read())

And it neatly prints as:

{'a': {'count': 1164, 'trans': {'kaoi': 0.0539430797, 'kaa': 0.03726579, 'haai': 0.0677469704, 'kaisai': 0.0883467502, 'kae': 0.0344645002, 'kai': 0.0498198201, 'eka': 0.149004909, '\\(none\\)': 0.0440008501}}, 'confident': {'count': 4, 'trans': {'mailatae': 0.0285642699, 'ashahvasahta': 0.749185676, 'anaa': 0.0157855201, 'jaitanae': 0.01227762, 'pahraaram\\.nbha': 0.0699072897, 'utanai': 0.01929341, 'atahmavaishahvaasa': 0.0909546498, 'uthaanae': 0.01403157}}, 'consumers': {'count': 4, 'trans': {'sauda\\\xef\xbf\xbd\\\xef\xbf\xbd\\\xef\xbf\xbddha': 0.11875471, 'upabhaokahtaa': 0.751443618, 'upabhaokahtaaom\\.n': 0.129801661}}}

Now, use the data as a Python dictionary.

It is a clever hack, taking advantage of the nature of the data. But it is far faster than the other approaches posted here.

--Anand

>
> --
> ~noufal
> http://nibrahim.net.in
>

-- 
--Anand
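Given the eval() concerns raised elsewhere in the thread, here is a variation on scrub() that swaps in a different technique: the standard library's ast.literal_eval (available since Python 2.6), which accepts only literals and raises ValueError on anything else, such as a function call. Because it cannot see variables, the bare names count and trans are turned into quoted strings with a regex instead of being defined as locals. scrub_literal and SAMPLE are hypothetical names. The sketch assumes count and trans are the only bare words, that backslashes in the data are literal characters, and that no quoted string contains '=>'.

```python
import ast
import re

# Hypothetical, shortened sample in the OP's format.
SAMPLE = r'''[code]
"confident" => {
 count => 4,
 trans => {
"ashahvasahta" => 0.74918568,
 "pahraaram\.nbha" => 0.06990729,
 },
},
[/code]'''


def scrub_literal(data):
    """Like scrub(), but parsed with ast.literal_eval instead of eval()."""
    data = data.replace('[code]', '').replace('[/code]', '')
    # Keep backslashes literal: \. becomes \\. in the Python source text.
    data = data.replace('\\', '\\\\')
    data = data.replace('=>', ':')
    # literal_eval cannot resolve names, so quote the bare keys.
    data = re.sub(r'\b(count|trans)\b(?=\s*:)', r"'\1'", data)
    return ast.literal_eval('{' + data + '}')


mydict = scrub_literal(SAMPLE)
print(mydict['confident']['count'])
```

Input that is not a plain literal, for example '"x" => __import__("os").getcwd()', makes literal_eval raise ValueError instead of running the call.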
Re: [BangPypers] How should I do it?
On Fri, Jan 15, 2010 at 1:04 PM, Dhananjay Nene wrote:
> This seems to be an output of print_r of PHP. If you have a flexibility, try
> to have the PHP code output the data into a language neutral format (eg
> json, yaml, xml etc.) and then parse it in python using the appropriate
> parser. If not you may have to write a custom parser. I did google to find
> if one existed, but couldn't easily locate one.
>

There is http://www.php.net/manual/en/book.json.php for PHP, and from Python 2.6 onwards json is part of the stdlib.

If you don't have access to the webserver, you might be able to use the PHP interpreter on your own machine to parse this into something more language-neutral.

-- 
~noufal
http://nibrahim.net.in
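On the Python side of that suggestion, once PHP emits JSON the stdlib json module does all the parsing. The payload below is a hypothetical example of the shape one might expect from json_encode() on the original nested array; it is not real output from the OP's system. Since the OP wants the "first" translation, the sketch passes object_pairs_hook=OrderedDict (supported since Python 2.7) so key order from the file is kept even on Python versions whose dicts are unordered.

```python
import json
from collections import OrderedDict

# Hypothetical JSON in the shape json_encode() might produce for this data.
payload = '''
{"confident": {"count": 4,
               "trans": {"ashahvasahta": 0.74918568,
                         "atahmavaishahvaasa": 0.09095465}},
 "consumers": {"count": 4,
               "trans": {"upabhaokahtaa": 0.75144362,
                         "sauda": 0.11875471}}}
'''

# Keep every JSON object's keys in file order.
records = json.loads(payload, object_pairs_hook=OrderedDict)

for english, entry in records.items():
    first = next(iter(entry['trans']))  # first translation listed
    print(english, first)
```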