date:20100115

Re: [BangPypers] How should I do it?

2010-01-15 Thread Dhananjay Nene

On Fri, Jan 15, 2010 at 12:01 PM, Eknath Venkataramani <
eknath.i...@gmail.com> wrote:

> I have a txt file in the following format:
> [code]
> "confident" => {
>  count => 4,
>  trans => {
> "ashahvasahta" => 0.74918568,
>"atahmavaishahvaasa" => 0.09095465,
>"pahraaram\.nbha" => 0.06990729,
> "mailatae" => 0.02856427,
>   "utanai" => 0.01929341,
> "anaa" => 0.01578552,
> "uthaanae" => 0.01403157,
> "jaitanae" => 0.01227762,
>},
> },
> "consumers" => {
>  count => 4,
>  trans => {
>"upabhaokahtaa" => 0.75144362,
>"upabhaokahtaaom\.n" => 0.12980166,
>"sauda\�\�\�dha" => 0.11875471,
>},
> },
> "a" => {
>  count => 1164,
>  trans => {
>  "eka" => 0.14900491,
>   "kaisai" => 0.08834675,
> "haai" => 0.06774697,
> "kaoi" => 0.05394308,
>  "kai" => 0.04981982,
> "\(none\)" => 0.04400085,
>  "kaa" => 0.03726579,
>  "kae" => 0.03446450,
>},
> },
> [/code]
>
> and I need to extract "confident" , "ashahvasahta" from the first
> record, "consumers",  "upabhaokahtaa" from the second record...
> i.e. "word in english" and the "first word in the probable-translations"
>
> Thanks in advance
> Eknath
> ___
> BangPypers mailing list
> BangPypers@python.org
> http://mail.python.org/mailman/listinfo/bangpypers
>

Since I hadn't had a chance to write a recursive descent parser, I took this
opportunity to do a bit of an exercise.
I have used a parsing library called pyparsing.

-- Begin Code --
# coding=utf-8
from pyparsing import *
import pprint
import sys

data = '''
"confident" => {
   count => 4,
   trans => {
 "ashahvasahta" => 0.74918568,
 "atahmavaishahvaasa" => 0.09095465,
 "pahraaram\.nbha" => 0.06990729,
 "mailatae" => 0.02856427,
 "utanai" => 0.01929341,
 "anaa" => 0.01578552,
 "uthaanae" => 0.01403157,
 "jaitanae" => 0.01227762,
   },
},
"consumers" => {
 count => 4,
 trans => {
   "upabhaokahtaa" => 0.75144362,
   "upabhaokahtaaom\.n" => 0.12980166,
   "sauda\�\�\�dha" => 0.11875471,
   },
},
"a" => {
 count => 1164,
 trans => {
 "eka" => 0.14900491,
  "kaisai" => 0.08834675,
"haai" => 0.06774697,
"kaoi" => 0.05394308,
 "kai" => 0.04981982,
"\(none\)" => 0.04400085,
 "kaa" => 0.03726579,
 "kae" => 0.03446450,
   },
}
'''

# Setup pyparsing tokens
dct = Forward()
pair_op = Literal("=>")
comma = Literal(",").suppress()
beg_brace = Literal("{").suppress()
end_brace = Literal("}").suppress()
num = Word("0123456789.")
key = (Word(alphas + nums) ^ quotedString).setResultsName("key")
val = (num ^ dct).setResultsName("value")
key_value_pair = Group(key + pair_op.suppress() + val)
key_value_pair_list = delimitedList(key_value_pair)
dct << Group(beg_brace + key_value_pair_list + Optional(comma) + end_brace)

# parse data
parsed = key_value_pair_list.parseString(data)

# function to extract ie. form a python datastructure
def extract(result):
    if 'key' in result.keys():
        if isinstance(result.value, ParseResults):
            return (result.key, extract(result.value))
        else:
            return (result.key, result.value)
    else:
        return dict(extract(elem) for elem in result)

# extract
extracted = extract(parsed)

# print extracted data
pprint.pprint(extracted, sys.stdout)

# print the english word and first translated word

print "\n\n\nTranslations\n\n"
print dict(
    (english,
     reduce(lambda x, y: (y[0], float(y[1])) if float(y[1]) > x[1] else x,
            translations['trans'].items(),
            ('', 0.0))[0]
    ) for english, translations in extracted.items()
)

-- End Code --
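
The reduce() call above is a hand-rolled search for the most probable translation. As a rough sketch only (written for Python 3, and assuming `extracted` has the shape of the hypothetical sample below, with the quotes still on the keys and the probabilities still held as strings), the same lookup could be written with max(). The name `best_translation` is made up for this illustration.

```python
# Hypothetical sample shaped like the `extracted` dict built above.
# Assumption: quoted keys keep their quotes and numbers are still strings.
extracted = {
    '"confident"': {
        'count': '4',
        'trans': {'"mailatae"': '0.02856427', '"ashahvasahta"': '0.74918568'},
    },
    '"consumers"': {
        'count': '4',
        'trans': {'"upabhaokahtaa"': '0.75144362', '"upabhaokahtaaom\\.n"': '0.12980166'},
    },
}


def best_translation(trans):
    """Return the translation whose probability is highest."""
    return max(trans, key=lambda word: float(trans[word]))


top = {english: best_translation(entry['trans'])
       for english, entry in extracted.items()}
print(top)
```

Picking by probability rather than by position means the result does not depend on the order in which the dict happens to store the translations.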

Dhananjay

-- 

blog: http://blog.dhananjaynene.com
twitter: http://twitter.com/dnene http://twitter.com/_pythonic

Re: [BangPypers] psyco V2

2010-01-15 Thread Noufal Ibrahim

On Fri, Jan 15, 2010 at 5:27 PM, Vishal wrote:

> Hi,
>
> Came across this post on codespeak.
> Christian Tismer of Stackless fame has taken up psyco and created a V2 of
> it... and it seems the effort continues...
>
> http://codespeak.net/pipermail/pypy-dev/2009q3/005288.html


Nice.. Thanks for pointing this out.


-- 
~noufal
http://nibrahim.net.in

[BangPypers] psyco V2

2010-01-15 Thread Vishal

Hi,

Came across this post on codespeak.
Christian Tismer of Stackless fame has taken up psyco and created a V2 of
it... and it seems the effort continues...

http://codespeak.net/pipermail/pypy-dev/2009q3/005288.html

Interesting stuff...

Best regards,
Vishal Sapre

Re: [BangPypers] How should I do it?

2010-01-15 Thread Noufal Ibrahim

On Fri, Jan 15, 2010 at 4:13 PM, Anand Chitipothu wrote:

> On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose wrote:
> >> It is a clever hack, taking advantage of the nature of the data. But
> >> it is far faster than the other approaches posted here.
> >
> > I thought eval was evil :)
>
> The data looks like valid json. You can use simplejson.loads instead of
> eval.
>

Don't the '=>' characters mess things up?

One of the nice things about the repr of Python objects is that they're
almost valid JSON. The same can't be said for PHP though.
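
To make that concrete, here is a small sketch (Python 3, standard library only; the sample strings and the helper name `is_json` are made up for illustration). It checks three things: a '=>' record, the repr of a dict, and the output of json.dumps.

```python
import json


def is_json(text):
    """Return True if `text` parses as a JSON document."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


perl_style = '"confident" => {"count": 4}'
record = {"count": 4, "trans": {"ashahvasahta": 0.74918568}}

print(is_json(perl_style))          # False: the '=>' is not JSON syntax
print(is_json(repr(record)))        # False: repr uses single quotes
print(is_json(json.dumps(record)))  # True
```

So the repr is close, but the quoting alone is enough to trip a strict JSON parser.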

-- 
~noufal
http://nibrahim.net.in

Re: [BangPypers] How should I do it?

2010-01-15 Thread Anand Balachandran Pillai

On Fri, Jan 15, 2010 at 4:13 PM, Anand Chitipothu wrote:

> On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose wrote:
> >> It is a clever hack, taking advantage of the nature of the data. But
> >> it is far faster than the other approaches posted here.
> >
> > I thought eval was evil :)
>
> The data looks like valid json. You can use simplejson.loads instead of
> eval.
>

Python 2.6.2 (r262:71600, Aug 21 2009, 12:23:57)
[GCC 4.4.1 20090818 (Red Hat 4.4.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import simplejson
>>> data=open('data.txt').read().replace('[code]','').replace('[/code]','')
>>> data
'\n"confident" => {\n count => 4,\n trans => {\n"ashahvasahta" =>
0.74918568,\n   "atahmavaishahvaasa" => 0.09095465,\n   "pahraaram\\.nbha"
=> 0.06990729,\n"mailatae" => 0.02856427,\n  "utanai" =>
0.01929341,\n"anaa" => 0.01578552,\n"uthaanae" =>
0.01403157,\n"jaitanae" => 0.01227762,\n   },\n},\n"consumers" =>
{\n count => 4,\n trans => {\n   "upabhaokahtaa" => 0.75144362,\n
"upabhaokahtaaom\\.n" => 0.12980166,\n
"sauda\\\xef\xbf\xbd\\\xef\xbf\xbd\\\xef\xbf\xbddha" => 0.11875471,\n
},\n},\n"a" => {\n count => 1164,\n trans => {\n "eka" =>
0.14900491,\n  "kaisai" => 0.08834675,\n"haai" =>
0.06774697,\n"kaoi" => 0.05394308,\n "kai" =>
0.04981982,\n"\\(none\\)" => 0.04400085,\n "kaa" =>
0.03726579,\n "kae" => 0.03446450,\n   },\n},\n\n'
>>> simplejson.loads(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/site-packages/simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.6/site-packages/simplejson/decoder.py", line 338, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 13 - line 37 column 1 (char 13 - 815)
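
The parser stops right after the first string because '=>' is not JSON. For what it is worth, the text can be massaged into JSON first. The following is only a sketch (Python 3, standard json module rather than simplejson). It assumes the only bare keys are count and trans and that no string contains '=>'. The function name `to_json_text` and the shortened sample are made up for illustration.

```python
import json
import re

# Shortened sample in the same format as data.txt.
sample = r'''
"confident" => {
 count => 4,
 trans => {
    "ashahvasahta" => 0.74918568,
    "pahraaram\.nbha" => 0.06990729,
 },
},
"a" => {
 count => 1164,
 trans => {
    "eka" => 0.14900491,
    "\(none\)" => 0.04400085,
 },
},
'''


def to_json_text(text):
    text = text.replace("\\", "\\\\")   # "\." is not a legal JSON escape
    text = text.replace("=>", ":")      # assumes no "=>" inside strings
    # Quote the bare keys (assumed to be only count and trans).
    text = re.sub(r'(?<!")\b(count|trans)\b(?=\s*:)', r'"\1"', text)
    text = re.sub(r",(\s*})", r"\1", text)  # drop commas before a closing brace
    text = text.strip().rstrip(",")         # drop the comma after the last record
    return "{" + text + "}"


records = json.loads(to_json_text(sample))
print(list(records["confident"]["trans"])[0])  # ashahvasahta
```

On Python 3.7 and later the resulting dicts keep the order of the file, so the first key under trans is the first translation listed.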


Anand



-- 
--Anand

Re: [BangPypers] How should I do it?

2010-01-15 Thread Anand Chitipothu

On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose wrote:
>> It is a clever hack, taking advantage of the nature of the data. But
>> it is far faster than the other approaches posted here.
>
> I thought eval was evil :)

The data looks like valid json. You can use simplejson.loads instead of eval.

Anand

Re: [BangPypers] How should I do it?

2010-01-15 Thread Anand Balachandran Pillai

On Fri, Jan 15, 2010 at 4:03 PM, Noufal Ibrahim wrote:

> On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose wrote:
>
> > > It is a clever hack, taking advantage of the nature of the data. But
> > > it is far faster than the other approaches posted here.
> >
> > I thought eval was evil :)
> >
>
> Given that the OP's data is fixed, eval is okay. :)
>
> Otherwise, it could be evil or unreliable (e.g. => inside some of the data
> strings etc.)
>

 My sentiments are the same. As long as you are sure your data is
 safe, i.e. free of things such as %s that could cause security
 issues, eval is safe. It is often handy for quick shortcuts such
 as the one above.
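
 When the data cannot be fully trusted, ast.literal_eval (in the standard library since Python 2.6) is a stricter option: it accepts only literals. A minimal sketch follows, written for Python 3. It assumes the text has already been scrubbed into a plain dict literal with the bare keys quoted, since literal_eval will not look up names such as count. The sample string is hypothetical.

```python
import ast

# Hypothetical, already-scrubbed text: "=>" replaced by ":" and the
# bare keys quoted, so it is a plain Python dict literal.
scrubbed = '{"confident": {"count": 4, "trans": {"ashahvasahta": 0.74918568}}}'

record = ast.literal_eval(scrubbed)
print(record["confident"]["count"])  # 4

# Anything that is not a literal, such as a function call, is refused.
try:
    ast.literal_eval('__import__("os").getcwd()')
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```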

 I started off with a regular expression solution first, but
 after I observed that the pattern fits a recursive dict,
 changed tack.

 Btw "eval" is spelled e-v-a-l, not e-v-i-l :)

>
> --
> ~noufal
> http://nibrahim.net.in

-- 
--Anand

Re: [BangPypers] How should I do it?

2010-01-15 Thread Noufal Ibrahim

On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose wrote:

> > It is a clever hack, taking advantage of the nature of the data. But
> > it is far faster than the other approaches posted here.
>
> I thought eval was evil :)
>

Given that the OP's data is fixed, eval is okay. :)

Otherwise, it could be evil or unreliable (e.g. => inside some of the data
strings etc.)
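
To illustrate the unreliable part, here is a small sketch (Python 3; the input line is invented and `swap_arrows` is a made-up helper name). A blind replace also rewrites a '=>' that sits inside a quoted string. A regex that matches whole quoted strings first can leave them alone.

```python
import re

line = '"a => b" => 0.5,'

# Blind replacement also rewrites the "=>" inside the quoted string.
naive = line.replace("=>", ":")

# Match either a whole double-quoted string (kept as is) or a bare "=>".
token = re.compile(r'"(?:\\.|[^"\\])*"|=>')


def swap_arrows(text):
    """Replace '=>' with ':' only outside double-quoted strings."""
    return token.sub(lambda m: ":" if m.group(0) == "=>" else m.group(0), text)


careful = swap_arrows(line)
print(naive)    # "a : b" : 0.5,
print(careful)  # "a => b" : 0.5,
```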

-- 
~noufal
http://nibrahim.net.in

Re: [BangPypers] How should I do it?

2010-01-15 Thread Baishampayan Ghose

> It is a clever hack, taking advantage of the nature of the data. But
> it is far faster than the other approaches posted here.

I thought eval was evil :)

Regards,
BG

-- 
Baishampayan Ghose
b.ghose at gmail.com

Re: [BangPypers] How should I do it?

2010-01-15 Thread Roshan Mathews

On Fri, Jan 15, 2010 at 2:40 PM, Anand Balachandran Pillai wrote:
>    # Now, count and trans are not strings in
>    # data, so Python will complain, hence we
>    # define these as strings with same name!
>    count, trans = 'count','trans'
>
Clever, that.  I got to there, threw up my hands and went downstairs
to eat lunch.
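
The trick works because eval looks up the bare words count and trans as ordinary variable names. A sketch of the same idea with the names passed in explicitly (Python 3; the text and the `names` mapping are made up for illustration):

```python
text = '{"confident": {count: 4, trans: {"ashahvasahta": 0.74918568}}}'

# Names that may appear bare in the text, each mapped to its own spelling.
names = {"count": "count", "trans": "trans"}

# The second argument is the globals dict, the third the locals dict.
# Note: emptying __builtins__ is NOT a security boundary; it only makes
# it explicit which names the expression is expected to use.
record = eval(text, {"__builtins__": {}}, names)
print(record)
```

Any other bare word in the text would raise a NameError, which gives a cheap sanity check on the input.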

  -- rm

Re: [BangPypers] How should I do it?

2010-01-15 Thread Kenneth Gonsalves

On Friday 15 Jan 2010 12:01:56 pm Eknath Venkataramani wrote:
> and I need to extract "confident" , "ashahvasahta" from the first
> record, "consumers",  "upabhaokahtaa" from the second record...
> i.e. "word in english" and the "first word in the probable-translations"
> 

#!/usr/bin/python

words = [
    {'english': "confident",
     'count': 4,
     'trans': [
         ("ashahvasahta", 0.74918568),
         ("atahmavaishahvaasa", 0.09095465),
         ("pahraaram\.nbha", 0.06990729),
         ("mailatae", 0.02856427),
         ("utanai", 0.01929341),
         ("anaa", 0.01578552),
         ("uthaanae", 0.01403157),
         ("jaitanae", 0.01227762),
     ],
    },
    {'english': "consumers",
     'count': 4,
     'trans': [
         ("upabhaokahtaa", 0.75144362),
         ("upabhaokahtaaom\.n", 0.12980166),
     ],
    },
    {'english': "a",
     'count': 1164,
     'trans': [
         ("eka", 0.14900491),
         ("kaisai", 0.08834675),
         ("haai", 0.06774697),
         ("kaoi", 0.05394308),
         ("kai", 0.04981982),
         ("\(none\)", 0.04400085),
         ("kaa", 0.03726579),
         ("kae", 0.03446450),
     ],
    },
]

for word in words:
    print word['english'], word['trans'][0][0]
-- 
regards
kg
http://lawgon.livejournal.com

Re: [BangPypers] How should I do it?

2010-01-15 Thread Anand Balachandran Pillai

On Fri, Jan 15, 2010 at 2:17 PM, Noufal Ibrahim wrote:

> On Fri, Jan 15, 2010 at 1:04 PM, Dhananjay Nene wrote:
>
> > This seems to be an output of print_r of PHP. If you have the flexibility,
> > try to have the PHP code output the data into a language neutral format
> > (e.g. json, yaml, xml etc.) and then parse it in python using the
> > appropriate parser. If not you may have to write a custom parser. I did
> > google to find if one existed, but couldn't easily locate one.
> >
>
> There is http://www.php.net/manual/en/book.json.php for PHP, and Python 2.6
> onwards has json as part of the stdlib.
>
> If you don't have access to the webserver, you might be able to use the php
> interpreter on your own machine to parse this into something more language
> neutral.
>

 If you take a look at your data, it is surprisingly close to what
 a nested Python dictionary looks like, except that instead of
 ':' to separate key from value, it uses '=>', which is what Perl
 and PHP use.

 So, the following solution takes advantage of this fact and
 converts your data to a Python dictionary.

 Here is the complete solution.

def scrub(data):
    # First replace [code][/code] parts
    data = data.replace('[code]','').replace('[/code]','')
    # Replace '=>' with ':'
    data = data.replace('=>',':')
    # Now, count and trans are not strings in
    # data, so Python will complain, hence we
    # define these as strings with same name!
    count, trans = 'count','trans'

    # Now prefix data with { and post-fix with }
    data = '{' + data + '}'
    print data

    # Eval it to a dictionary
    mydict = eval(data)
    print mydict

if __name__ == "__main__":
    scrub(open('data.txt').read())

 And it neatly prints as,

{'a': {'count': 1164, 'trans': {'kaoi': 0.0539430797, 'kaa':
0.03726579, 'haai': 0.0677469704, 'kaisai': 0.0883467502,
'kae': 0.0344645002, 'kai': 0.0498198201, 'eka':
0.149004909, '\\(none\\)': 0.0440008501}}, 'confident':
{'count': 4, 'trans': {'mailatae': 0.0285642699, 'ashahvasahta':
0.749185676, 'anaa': 0.0157855201, 'jaitanae': 0.01227762,
'pahraaram\\.nbha': 0.0699072897, 'utanai': 0.01929341,
'atahmavaishahvaasa': 0.0909546498, 'uthaanae': 0.01403157}},
'consumers': {'count': 4, 'trans':
{'sauda\\\xef\xbf\xbd\\\xef\xbf\xbd\\\xef\xbf\xbddha': 0.11875471,
'upabhaokahtaa': 0.751443618, 'upabhaokahtaaom\\.n':
0.129801661}}}

 Now, use the data as a Python dictionary.
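
 One caveat: in the printed dictionary the translations no longer appear in file order, so the "first" translation has to be recovered by probability. A rough sketch of that step (Python 3; `mydict` here is a hand-copied subset of the data and `ranked` is a made-up helper name):

```python
# Hand-copied subset of the parsed data, deliberately out of file order.
mydict = {
    'a': {'count': 1164,
          'trans': {'kaoi': 0.05394308, 'eka': 0.14900491, 'kaisai': 0.08834675}},
    'consumers': {'count': 4,
                  'trans': {'upabhaokahtaaom\\.n': 0.12980166,
                            'upabhaokahtaa': 0.75144362}},
}


def ranked(trans):
    """Translations ordered from most to least probable."""
    return sorted(trans, key=trans.get, reverse=True)


for english in sorted(mydict):
    print(english, ranked(mydict[english]['trans'])[0])
```

 This relies on the translations in the file being listed from most to least probable, as they are in the sample.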

It is a clever hack, taking advantage of the nature of the data, but
it is far faster than the other approaches posted here.

--Anand

>
>
> --
> ~noufal
> http://nibrahim.net.in

-- 
--Anand

Re: [BangPypers] How should I do it?

2010-01-15 Thread Noufal Ibrahim

On Fri, Jan 15, 2010 at 1:04 PM, Dhananjay Nene wrote:

> This seems to be an output of print_r of PHP. If you have the flexibility,
> try to have the PHP code output the data into a language neutral format
> (e.g. json, yaml, xml etc.) and then parse it in python using the
> appropriate parser. If not you may have to write a custom parser. I did
> google to find if one existed, but couldn't easily locate one.
>


There is http://www.php.net/manual/en/book.json.php for PHP, and Python 2.6
onwards has json as part of the stdlib.

If you don't have access to the webserver, you might be able to use the php
interpreter on your own machine to parse this into something more language
neutral.
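
On the Python side that would look roughly like the sketch below (written for Python 3). The JSON text is hypothetical: it assumes the PHP side emitted the records with json_encode() in this shape.

```python
import json

# Hypothetical JSON, as the PHP side might emit it with json_encode().
payload = '''
{"confident": {"count": 4,
               "trans": {"ashahvasahta": 0.74918568, "mailatae": 0.02856427}}}
'''

records = json.loads(payload)
for english, entry in records.items():
    # On Python 3.7+ dicts keep the order of the JSON text, so the first
    # key is the first translation listed.
    first = next(iter(entry["trans"]))
    print(english, first)  # confident ashahvasahta
```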


-- 
~noufal
http://nibrahim.net.in