python php/html file upload issue

2009-03-21 Thread S.Selvam Siva
Hi all,
I want to upload a file from Python to a PHP/HTML form using urllib2. My
code is below.

PYTHON CODE:
import urllib
import urllib2,sys,traceback
url='http://localhost/index2.php'
values={}
f=open('addons.xcu','r')
values['datafile']=f.read() #is this correct ?
values['Submit']='True'
data = urllib.urlencode(values)
req = urllib2.Request(url,data)
response = urllib2.urlopen(req,data)
the_page = response.read()
print the_page

PHP/HTML CODE:
<html>
<body>
<?php
if(isset($_POST["Submit"]))
{
   echo "upload-file-name:".$_FILES['datafile']['name']."<br/>";

} else {
  ?>
<form enctype="multipart/form-data" action="<?php echo $_SERVER["PHP_SELF"]; ?>" method="post">
Please enter a file name :<input type="file" name="datafile" size="100">
<input type="submit" value="submit" name="Submit"/>
</form>
<?php
}
?>
</body>
</html>

But $_FILES['datafile']['name'] in the response is always empty. I cannot
work out what is wrong with my code.
I would be grateful if you could spot the problem.
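
A likely cause: urllib.urlencode() produces an application/x-www-form-urlencoded body, while PHP only fills $_FILES from multipart/form-data parts that carry a filename. The following is a minimal sketch, in Python 3 syntax, of building such a body by hand. The helper name encode_multipart and the sample file content are made up for illustration; the field names come from the form above.

```python
import io
import uuid

def encode_multipart(fields, files):
    """Build a multipart/form-data body.

    fields: dict of name -> text value
    files:  dict of name -> (filename, bytes content)
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex  # random separator, unlikely to occur in the data
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(("--%s\r\n" % boundary).encode("ascii"))
        buf.write(('Content-Disposition: form-data; name="%s"\r\n\r\n' % name).encode("utf-8"))
        buf.write(value.encode("utf-8") + b"\r\n")
    for name, (filename, content) in files.items():
        buf.write(("--%s\r\n" % boundary).encode("ascii"))
        buf.write(('Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
                   % (name, filename)).encode("utf-8"))
        buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
        buf.write(content + b"\r\n")
    buf.write(("--%s--\r\n" % boundary).encode("ascii"))  # closing boundary
    return buf.getvalue(), "multipart/form-data; boundary=%s" % boundary

body, content_type = encode_multipart(
    {"Submit": "True"},
    {"datafile": ("addons.xcu", b"<example file content>")},  # placeholder content
)
# To send it (not executed here):
#   req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
#   urllib.request.urlopen(req)
```

The important part is the Content-Type header: without the multipart boundary, the server treats the upload as ordinary form fields.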


-- 
Yours,
S.Selvam
--
http://mail.python.org/mailman/listinfo/python-list


Need to store dictionary in file

2009-02-23 Thread S.Selvam Siva
Hi all,

I have a dictionary in which each key maps to a list of values.

e.g. dic={'a':['aa','ant','all']}

The dictionary contains 1.5 lakh (150,000) keys.
Now I want to store it in a file and load it into a Python program
at run time.
I would welcome your ideas and suggestions.

Note: I think cPickle or bsddb could be used for this purpose, but I want to know
which solution runs *fastest*.
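
For reference, a minimal pickle round trip might look like the sketch below (Python 3 syntax; in Python 2 the same calls exist in cPickle). The file name is hypothetical and the dictionary is a tiny stand-in for the real data. Whether this is the fastest option would need to be measured on the real 150,000-key dictionary.

```python
import os
import pickle
import tempfile

dic = {"a": ["aa", "ant", "all"], "b": ["bat", "ball"]}  # stand-in data

path = os.path.join(tempfile.mkdtemp(), "words.pickle")  # hypothetical file name

# A binary protocol is more compact and faster than the text protocol 0
with open(path, "wb") as f:
    pickle.dump(dic, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:
    loaded = pickle.load(f)
```

pickle loads the whole dictionary into memory at once. If only a few keys are needed per run, the standard shelve module is an alternative that reads entries on demand.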

-- 
Yours,
S.Selvam


Re: Levenshtein word comparison -performance issue

2009-02-19 Thread S.Selvam Siva
On Sat, Feb 14, 2009 at 3:01 PM, Peter Otten __pete...@web.de wrote:

 Gabriel Genellina wrote:

  On Fri, 13 Feb 2009 08:16:00 -0200, S.Selvam Siva s.selvams...@gmail.com
  wrote:
 
  I need some help.
  I am trying to find the top n (e.g. 5) most similar words for a given
  word, from a dictionary of 50,000 words.
  I used the python-levenshtein module, and sample code is as follows.
 
  def foo(self, searchword):
      disdict = {}
      for word in self.dictionary_words:
          distance = Levenshtein.ratio(searchword, word)
          disdict[word] = distance
      # sort the disdict dictionary by values in descending order
      similarwords = sorted(disdict, key=disdict.__getitem__, reverse=True)
      return similarwords[:5]
 
  You may replace the last steps (sort + slice top 5) by heapq.nlargest - at
  least you won't waste time sorting 49995 irrelevant words...
  Anyway you should measure the time taken by the first part (Levenshtein),
  it may be the most demanding. I think there is a C extension for this,
  should be much faster than pure Python calculations.
 

 [I didn't see the original post]

 You can use the distance instead of the ratio and put the words into bins of
 the same length. Then if you find enough words with a distance <= 1 in the
 bin with the same length as the search word you can stop looking.

 You might be able to generalize this approach to other properties that are
 fast to calculate and guarantee a minimum distance, e. g. set(word).

 Peter



Thank you all for your responses,

[Sorry, I was away for a while.]
I used the functools and heapq modules, but that helped only a little.
Then I grouped the words by length, so each search word is compared
with a smaller set (each set holds about 50,000/4 = 12,500 words),
and it now takes a quarter of the time of the older method.
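
The two ideas combined (bins by word length, plus heapq.nlargest instead of a full sort) could be sketched roughly as below, in Python 3 syntax. difflib.SequenceMatcher is only a standard-library stand-in for Levenshtein.ratio, and the names build_bins, top_similar and max_len_diff are made up for this illustration.

```python
import heapq
from collections import defaultdict
from difflib import SequenceMatcher

def ratio(a, b):
    # Stand-in for Levenshtein.ratio (assumption: any similarity in [0, 1] works here)
    return SequenceMatcher(None, a, b).ratio()

def build_bins(words):
    """Group words by length so a search only visits nearby lengths."""
    bins = defaultdict(list)
    for word in words:
        bins[len(word)].append(word)
    return bins

def top_similar(searchword, bins, n=5, max_len_diff=1):
    """Return the n best matches among words whose length is close to searchword's."""
    candidates = []
    low = len(searchword) - max_len_diff
    high = len(searchword) + max_len_diff
    for length in range(low, high + 1):
        candidates.extend(bins.get(length, []))
    # nlargest keeps only n items instead of sorting every candidate
    return heapq.nlargest(n, candidates, key=lambda w: ratio(searchword, w))

bins = build_bins(["ant", "aunt", "all", "ball", "tall", "elephant"])
print(top_similar("bal", bins, n=2))
```

The bins are built once and reused for every search word, so the per-search cost depends only on how many words have a similar length.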

Further, can I use threads to run the comparisons in parallel? I have little
knowledge of Python threading.
Will the following code achieve parallelism?

thread1 = threading.Thread(target=self.findsimilar,
                           args=(1, searchword, dic_word_set1))
thread2 = threading.Thread(target=self.findsimilar,
                           args=(2, searchword, dic_word_set2))
thread3 = threading.Thread(target=self.findsimilar,
                           args=(3, searchword, dic_word_set3))
thread1.start()
thread2.start()
thread3.start()
thread1.join()
thread2.join()
thread3.join()

I would like to hear your suggestions.
Note: I am developing a spell checker for my local language and may
use more than 2.5 lakh (250,000) words, so I need the best way to build the
list of alternative words.
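
On the threading question: in CPython the global interpreter lock lets only one thread run Python bytecode at a time, so threads usually do not speed up CPU-bound work unless the C extension releases the lock while it computes (not checked here for python-levenshtein). The sketch below, in Python 3 syntax, shows the split/score/merge structure with a worker pool. difflib is a stand-in for Levenshtein.ratio, and best_in_chunk and find_similar are hypothetical names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher

def ratio(a, b):
    # Stand-in for Levenshtein.ratio (assumption)
    return SequenceMatcher(None, a, b).ratio()

def best_in_chunk(searchword, chunk, n):
    """Score one chunk and keep its n best (score, word) pairs."""
    return heapq.nlargest(n, ((ratio(searchword, w), w) for w in chunk))

def find_similar(searchword, chunks, n=5, executor_cls=ThreadPoolExecutor):
    """Score every chunk in a worker pool, then merge the partial results."""
    with executor_cls(max_workers=len(chunks)) as pool:
        futures = [pool.submit(best_in_chunk, searchword, chunk, n) for chunk in chunks]
        partial = [pair for f in futures for pair in f.result()]
    return [word for score, word in heapq.nlargest(n, partial)]

chunks = [["ant", "aunt"], ["all", "ball"], ["tall", "elephant"]]
print(find_similar("bal", chunks, n=2))
```

For real parallelism on several cores, passing executor_cls=concurrent.futures.ProcessPoolExecutor uses processes instead of threads; in that case the calling code should sit under an `if __name__ == "__main__":` guard.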
-- 
Yours,
S.Selvam


Levenshtein word comparison -performance issue

2009-02-13 Thread S.Selvam Siva
Hi all,

I need some help.
I am trying to find the top n (e.g. 5) most similar words for a given word,
from a dictionary of 50,000 words.
I used the python-levenshtein module, and sample code is as follows.

import Levenshtein

def foo(self, searchword):
    disdict = {}
    for word in self.dictionary_words:
        distance = Levenshtein.ratio(searchword, word)
        disdict[word] = distance
    # sort the disdict dictionary by values in descending order
    similarwords = sorted(disdict, key=disdict.__getitem__, reverse=True)
    return similarwords[:5]

foo() takes a search word, compares it with each of the 50,000 dictionary
words, and assigns each word a value between 0 and 1.
After sorting in descending order, it returns the top 5 similar words.

The problem is that it *takes a long time* to process (I need to pass many
search words in a loop). I guess the code could be made more
efficient. Your suggestions are welcome.
-- 
Yours,
S.Selvam


BeautifulSoup - converting unicode to numerical representation

2009-02-09 Thread S.Selvam Siva
Hi all,

I need to parse feeds and post the data to SOLR. I want special
(Unicode) characters to be posted as numeric character references.

For example:
’ --> &#8217; (for which the HTML equivalent is &rsquo;)
I used BeautifulSoup, which seems to allow conversion from numeric
references of the form &#NNNN; to Unicode characters, as follows:

hdes=str(BeautifulStoneSoup(strdesc,
convertEntities=BeautifulStoneSoup.HTML_ENTITIES))
xdesc=str(BeautifulStoneSoup(hdes,
convertEntities=BeautifulStoneSoup.XML_ENTITIES))

But I want the *numeric representation of Unicode characters*.
I also want to convert HTML entities such as &rsquo; to their numeric
equivalent, &#8217;.

Thanks in advance.

*Note:*
The reason for this requirement is that I need a standard way to post to
SOLR to avoid errors.
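
One possible approach, sketched here with the Python 3 standard library rather than BeautifulSoup: decode every entity to its character, re-escape the XML markup characters, then let the "xmlcharrefreplace" error handler write each non-ASCII character as a numeric reference. The function name to_numeric_entities is made up for this example.

```python
import html

def to_numeric_entities(text):
    """Return ASCII text with named entities and non-ASCII characters
    written as numeric character references (e.g. &#8217;)."""
    decoded = html.unescape(text)                # &rsquo; -> ’, &amp; -> &
    escaped = html.escape(decoded, quote=False)  # re-escape &, <, > for XML
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_numeric_entities("It&rsquo;s caf\u00e9 &amp; bar"))
```

The encode step on its own also exists in Python 2 for unicode strings: u.encode('ascii', 'xmlcharrefreplace').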
-- 
Yours,
S.Selvam


string replace for back slash

2009-02-05 Thread S.Selvam Siva
Hi all,

I tried to do a string replace as follows,

>>> s="hi & people"
>>> s.replace("&","\&")
'hi \\& people'


but I was expecting 'hi \& people'. I don't know what is different
here with the escape sequence.

-- 
Yours,
S.Selvam


Re: string replace for back slash

2009-02-05 Thread S.Selvam Siva
On Thu, Feb 5, 2009 at 5:59 PM, rdmur...@bitdance.com wrote:

 S.Selvam Siva s.selvams...@gmail.com wrote:
  I tried to do a string replace as follows,
 
   >>> s="hi & people"
   >>> s.replace("&","\&")
  'hi \\& people'
  
 
  but I was expecting 'hi \& people'. I don't know what is different
  here with the escape sequence.

 You are running into the difference between the 'repr' of a string (which
 is what is printed by default at the python prompt) and the actual
 contents of the string.  In the repr the backslash needs to be escaped
 by prefixing it with a backslash, just as you would if you wanted to
 enter a backslash into a string in your program.  If you print the string,
 you'll see there is only one backslash.  Note that you didn't need to
 double the backslash in your replacement string only because it wasn't
 followed by a character that forms an escape...but the repr of that
 string will still have the backslash doubled, and that is really the
 way you should write it in your program to begin with for safety's sake.

 Python 2.6.1 (r261:67515, Jan  7 2009, 17:09:13)
 [GCC 4.3.2] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> s="hi & people"
 >>> replacementstring = "\&"
 >>> replacementstring
 '\\&'
 >>> print replacementstring
 \&
 >>> x = s.replace("&","\\&")
 >>> x
 'hi \\& people'
 >>> print x
 hi \& people




Thank you all for your responses,

Now I understand how the Python interactive prompt displays '\'.
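
The same point as a small script (Python 3 syntax), assuming the character being replaced was "&": repr() shows the escaped form that the prompt echoes, while print() shows the actual characters.

```python
s = "hi & people"
x = s.replace("&", "\\&")   # "\\&" is one backslash followed by &

print(repr(x))      # escaped form, as echoed by the interactive prompt
print(x)            # the actual characters in the string
print(len("\\"))    # "\\" in source code is a single character
```

Counting the characters confirms it: x is exactly one character longer than s.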



-- 
Yours,
S.Selvam


importing module-performance

2009-02-02 Thread S.Selvam Siva
Hi all,
I have a small query.
Suppose there is a task A which I want to perform.

To perform it, I have two options:
1) Write a small piece of code (approx. 50 lines), as efficient as possible.
2) Import a suitable module to perform task A.


I am eager to know which method will give the best performance.
-- 
Yours,
S.Selvam


Re: importing module-performance

2009-02-02 Thread S.Selvam Siva
On Mon, Feb 2, 2009 at 3:11 PM, Chris Rebert c...@rebertia.com wrote:

 On Mon, Feb 2, 2009 at 1:29 AM, S.Selvam Siva s.selvams...@gmail.com
 wrote:
  Hi all,
  I have a small query.
  Suppose there is a task A which I want to perform.
 
  To perform it, I have two options:
  1) Write a small piece of code (approx. 50 lines), as efficient as
  possible.
  2) Import a suitable module to perform task A.
 
 
  I am eager to know which method will give the best performance.

 A. Your question seems much too vague to answer.
 B. Premature optimization is the root of all evil. In all likelihood,
 the time taken by the `import` will be absolutely trivial compared to
 the rest of the script, so don't bother micro-optimizing ahead of
 time; write readable code first, then worry about optimization once
 it's working perfectly.

 Cheers,
 Chris
 --
 Follow the path of the Iguana...
 http://rebertia.com



Thank you Chris,

I faced an optimization problem, as follows.

For fuzzy string comparison I initially used 15 lines of code that compare
a word with the 29,000 words in a list. For each pair of words compared, the
15-line code produces the number of differing characters between the two words.

But when I used the python-levenshtein module for the same purpose, it ran faster
than the old method. This prompted me to raise that query.

Now I understand that it is not a question of importing a module versus writing
the code myself; the problem must be with my 15-line code.
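
When two implementations of the same task are available, the standard timeit module gives a quick comparison. The sketch below (Python 3 syntax) times two made-up ways of counting differing characters. Neither function is the 15-line code mentioned above, and the word list is invented sample data.

```python
import timeit

def count_differences_loop(a, b):
    """Count differing positions with an explicit index loop."""
    differences = abs(len(a) - len(b))
    for i in range(min(len(a), len(b))):
        if a[i] != b[i]:
            differences += 1
    return differences

def count_differences_zip(a, b):
    """Same result, using zip and sum."""
    return abs(len(a) - len(b)) + sum(x != y for x, y in zip(a, b))

words = ["kitten", "sitting", "mitten", "fitting"] * 250  # made-up sample data

for func in (count_differences_loop, count_differences_zip):
    seconds = timeit.timeit(lambda: [func("kitten", w) for w in words], number=20)
    print("%-24s %.4f s" % (func.__name__, seconds))
```

The absolute numbers depend on the machine; only the ratio between the two candidates is meaningful.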


-- 
Yours,
S.Selvam


Re: date handling problem

2009-01-29 Thread S.Selvam Siva
On Thu, Jan 29, 2009 at 2:27 PM, M.-A. Lemburg m...@egenix.com wrote:

 On 2009-01-29 03:38, Gabriel Genellina wrote:
  On Wed, 28 Jan 2009 18:55:21 -0200, S.Selvam Siva
  s.selvams...@gmail.com wrote:
 
  I need to parse RSS feeds based on their time stamps, but the feeds use
  different time zones for their dates (IST, EST, etc.).
  I don't know how to normalize these dates. It would be helpful if you
  could give me a hint.
 
  You may find the Olson timezone database useful.
  http://pytz.sourceforge.net/

 Or have a look at the date/time parser in mxDateTime:

 http://www.egenix.com/products/python/mxBase/mxDateTime/

 --
 Marc-Andre Lemburg
 eGenix.com



Thank you all,
  The link was really nice and i will try it out.

-- 
Yours,
S.Selvam


date handling problem

2009-01-28 Thread S.Selvam Siva
Hi all,

I need to parse RSS feeds based on their time stamps, but the feeds use
different time zones for their dates (IST, EST, etc.).
I don't know how to normalize these dates. It would be helpful if you could
give me a hint.
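
For feeds that use RFC 822 style dates, one standard-library approach (Python 3 syntax) is to parse each date with its offset and convert everything to UTC before comparing. The EXTRA_ZONES table is an assumption added for this sketch: "IST" is ambiguous (India, Israel and Ireland all use it) and is mapped to +0530 here only as an example.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# Assumed mapping for abbreviations the parser does not resolve on its own
EXTRA_ZONES = {"IST": "+0530"}

def to_utc(date_string):
    """Parse an RFC 822 style date and return it as an aware UTC datetime."""
    parts = date_string.rsplit(" ", 1)
    if len(parts) == 2 and parts[1].upper() in EXTRA_ZONES:
        # Swap the abbreviation for a numeric offset before parsing
        date_string = parts[0] + " " + EXTRA_ZONES[parts[1].upper()]
    parsed = parsedate_to_datetime(date_string)
    if parsed.tzinfo is None:
        raise ValueError("no usable time zone in %r" % date_string)
    return parsed.astimezone(timezone.utc)

print(to_utc("Thu, 29 Jan 2009 14:27:00 IST"))
print(to_utc("Wed, 28 Jan 2009 18:55:21 -0500"))
```

Once every timestamp is in UTC, entries from different feeds can be sorted and compared directly.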



-- 
Yours,
S.Selvam


Re: String comparison

2009-01-25 Thread S.Selvam Siva
Thank You Gabriel,

On Sun, Jan 25, 2009 at 7:12 AM, Gabriel Genellina
gagsl-...@yahoo.com.ar wrote:

 On Sat, 24 Jan 2009 15:08:08 -0200, S.Selvam Siva s.selvams...@gmail.com
 wrote:


  I am developing a spell checker for my local language (Tamil) using Python.
 I need to generate a list of alternative words for a misspelled word from the
 dictionary of words. The alternatives must be as close as possible to the
 misspelled word. As we know, ordinary string comparison won't work here.
 Any suggestion for this problem is welcome.


 I think it would better to add Tamil support to some existing library like
 GNU aspell: http://aspell.net/



That was my plan earlier, but I am not sure how aspell integrates with other
editors. I had better ask on the aspell mailing list.


 You are looking for fuzzy matching:
 http://en.wikipedia.org/wiki/Fuzzy_string_searching
 In particular, the Levenshtein distance is widely used; I think there is a
 Python extension providing those calculations.

 --
 Gabriel Genellina

The following code served my purpose (thanks to some unknown contributors):

import sys

def distance(a,b):
    # c[i,j] holds the edit distance between a[:i] and b[:j]
    c = {}
    n = len(a); m = len(b)

    for i in range(0,n+1):
        c[i,0] = i
    for j in range(0,m+1):
        c[0,j] = j

    for i in range(1,n+1):
        for j in range(1,m+1):
            x = c[i-1,j]+1
            y = c[i,j-1]+1
            if a[i-1] == b[j-1]:
                z = c[i-1,j-1]
            else:
                z = c[i-1,j-1]+1
            c[i,j] = min(x,y,z)
    return c[n,m]

a=sys.argv[1]
b=sys.argv[2]
d=distance(a,b)
print "d=",d
longer = float(max((len(a), len(b))))
shorter = float(min((len(a), len(b))))
r = ((longer - d) / longer) * (shorter / longer)
# r ranges between 0 and 1




-- 
Yours,
S.Selvam


String comparison

2009-01-24 Thread S.Selvam Siva
Hi all,

I am developing a spell checker for my local language (Tamil) using Python.
I need to generate a list of alternative words for a misspelled word from the
dictionary of words. The alternatives must be as close as possible to the
misspelled word. As we know, ordinary string comparison won't work here.
Any suggestion for this problem is welcome.

-- 
Yours,
S.Selvam


python resource management

2009-01-20 Thread S.Selvam Siva
Hi all,

I have found the actual solution to this problem.
I tried using BeautifulSoup.SoupStrainer() and it reduced memory usage
dramatically. The script now uses at most 20 MB (earlier
it was 800 MB on a system with 1 GB of RAM).
Thanks, all.
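
The underlying idea is to keep only the tags of interest instead of building a full tree for every file. The sketch below shows that idea with the Python 3 standard library's html.parser. It is not BeautifulSoup's API, and the LinkCollector class and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html_text):
    parser = LinkCollector()   # a fresh parser per document, nothing carried over
    parser.feed(html_text)
    parser.close()
    return parser.links

sample = ('<html><body><p>intro</p><a href="http://example.com/a">A</a>'
          '<a name="x">no href</a></body></html>')
print(extract_links(sample))
```

Because no document tree is kept, memory use per file stays proportional to the extracted data rather than to the size of the page.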

-- 
Yours,
S.Selvam


Re: python resource management

2009-01-20 Thread S.Selvam Siva
On Tue, Jan 20, 2009 at 7:27 PM, Tim Arnold tim.arn...@sas.com wrote:

 I had the same problem you did, but then I changed the code to create a new
 soup object for each file. That drastically increased the speed.  I don't
 know why, but it looks like the soup object just keeps getting bigger with
 each feed.

 --Tim

I have found the actual solution to this problem.
I tried using *BeautifulSoup.SoupStrainer()* and it reduced memory
usage dramatically. The script now uses at most 20 MB (earlier
it was 800 MB on a system with 1 GB of RAM).
Thanks, all.



-- 
Yours,
S.Selvam


python resource management

2009-01-19 Thread S.Selvam Siva
Hi all,

I am running a Python script that uses BeautifulSoup to parse nearly 22,000
locally stored HTML files.
The problem is that memory usage increases linearly as the files are
parsed.
Once the script has parsed 200 files or so, it consumes all the
available RAM and the CPU usage drops to 0% (maybe due to excessive
paging).

We tried 'del soup_object' and 'gc.collect()', but saw no improvement.

Please advise how to limit Python's memory usage, or the proper way to
handle BeautifulSoup objects in a resource-efficient manner.

-- 
Yours,
S.Selvam


Re: Extracting real-domain-name (without sub-domains) from a given URL

2009-01-13 Thread S.Selvam Siva
On Tue, Jan 13, 2009 at 1:50 PM, Chris Rebert c...@rebertia.com wrote:

 On Mon, Jan 12, 2009 at 11:46 PM, S.Selvam Siva s.selvams...@gmail.com 
 wrote:
  Hi all,
 
  I need to extract the domain name (without sub-domains) from a given URL.
  With urlparse, I am only able to fetch the host name (which includes the
  sub-domain).
 
  e.g.:
  http://feeds.huffingtonpost.com/posts/ , http://www.huffingtonpost.de/
  all must lead to huffingtonpost.com or huffingtonpost.de
 
  Please suggest some ideas regarding this problem.

 That would require (pardon the pun) domain-specific logic. For most
 TLDs (e.g. .com, .org) the domain name is just blah.com, blah.org,
 etc. But for ccTLDs, often only second-level registrations are
 allowed, e.g. for www.bbc.co.uk, so the main domain name would be
 bbc.co.uk  I think a few TLDs have even more complicated rules.

 I doubt anyone's created a general ready-made solution for this, you'd
 have to code it yourself.
 To handle the common case, you can cheat and just .split() at the
 periods and then slice and rejoin the list of domain parts, ex:
 '.'.join(domain.split('.')[-2:])

 Cheers,
 Chris


Thank you Chris Rebert,
  Actually I tried domain-specific logic: I keep a list of about 200 suffixes
such as .com, co.in and co.uk, and use it to extract the domain name.
  But my boss wants a more reliable solution than this, so I
will try to find some alternative.
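
A suffix-list lookup of this kind could be sketched as follows (Python 3 syntax). KNOWN_SUFFIXES is a tiny made-up sample and registered_domain is a hypothetical helper name. For a complete and maintained list of suffixes, the Public Suffix List (publicsuffix.org) is one source to consider.

```python
from urllib.parse import urlsplit

# Tiny illustrative sample; a real list would be much longer.
KNOWN_SUFFIXES = {"com", "org", "de", "co.uk", "co.in"}

def registered_domain(url, suffixes=KNOWN_SUFFIXES):
    """Return the longest known suffix plus one more label, or None."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try the longest candidate suffix first so co.uk wins over uk
    for start in range(1, len(labels)):
        if ".".join(labels[start:]) in suffixes:
            return ".".join(labels[start - 1:])
    return None

for url in ("http://feeds.huffingtonpost.com/posts/",
            "http://www.huffingtonpost.de/",
            "http://www.bbc.co.uk/"):
    print(url, "->", registered_domain(url))
```

The reliability of the result depends entirely on how complete the suffix set is, which is why a maintained list is preferable to a hand-written one.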


--
Yours,
S.Selvam


Extracting real-domain-name (without sub-domains) from a given URL

2009-01-12 Thread S.Selvam Siva
Hi all,

  I need to extract the domain name (without sub-domains) from a given URL.
With urlparse, I am only able to fetch the host name (which includes the
sub-domain).

e.g.:
  http://feeds.huffingtonpost.com/posts/ , http://www.huffingtonpost.de/
all must lead to *huffingtonpost.com or huffingtonpost.de*

Please suggest some ideas regarding this problem.


-- 
Yours,
S.Selvam


pygtkspell-help

2009-01-01 Thread S.Selvam Siva
Hello,
I am in the process of writing a spell checker for my local language (Tamil).
I wrote a plugin for gedit, with pygtk for the GUI. Recently I came to know about
pygtkspell, which can be used for spell checking and offering suggestions.
I am a bit confused about it and could not find useful information by
googling. It would be nice if someone could point me in the right direction
(maybe by giving appropriate links or an example program).

-- 
Yours,
S.Selvam


Posting File as a parameter to PHP/HTML using HTTP POST

2008-12-02 Thread S.Selvam Siva
I am trying to post a file from Python to PHP using the HTTP POST method. I tried
mechanize but was not able to pass the file object.

from mechanize import Browser
br=Browser()
response=br.open("http://localhost/test.php")
br.select_form('form1')
br['uploadedfile']=open("C:/Documents and Settings/user/Desktop/Today/newurl-ideas.txt")
response=br.submit()
print response.read()

But I get the error:
br['uploadedfile']=open("C:/Documents and Settings/user/Desktop/Today/newurl-ideas.txt")
  File "C:\Python25\lib\site-packages\clientform-0.2.9-py2.5.egg\ClientForm.py",
 line 2880, in __setitem__
ValueError: value attribute is readonly

But when the upload is done using a browser, it works.
-- 
Yours,
S.Selvam


Re: Posting File as a parameter to PHP/HTML using HTTP POST

2008-12-02 Thread S.Selvam Siva
I have found the solution myself.

Instead of:
br['uploadedfile']=open("C:/Documents and Settings/user/Desktop/Today/newurl-ideas.txt")

we need to use:
br.add_file(open("C:/Documents and Settings/user/Desktop/Today/newurl-ideas.txt"),
            filename="newurl-ideas.txt", name="uploadedfile")



On Tue, Dec 2, 2008 at 1:33 PM, S.Selvam Siva [EMAIL PROTECTED] wrote:

 I am trying to post a file from Python to PHP using the HTTP POST method. I tried
 mechanize but was not able to pass the file object.

 from mechanize import Browser
 br=Browser()
 response=br.open("http://localhost/test.php")
 br.select_form('form1')
 br['uploadedfile']=open("C:/Documents and Settings/user/Desktop/Today/newurl-ideas.txt")
 response=br.submit()
 print response.read()

 But I get the error:
 br['uploadedfile']=open("C:/Documents and Settings/user/Desktop/Today/newurl-ideas.txt")
   File "C:\Python25\lib\site-packages\clientform-0.2.9-py2.5.egg\ClientForm.py",
  line 2880, in __setitem__
 ValueError: value attribute is readonly

 But when the upload is done using a browser, it works.
 --
 Yours,
 S.Selvam



-- 
Yours,
S.Selvam


parsing javascript

2008-10-12 Thread S.Selvam Siva
I have to parse web pages and fetch URLs. My problem is that many of the URLs I
need are loaded dynamically by a JavaScript function
(onload()). How can I fetch those links from Python? Thanks in advance.