python php/html file upload issue
Hi all,

I want to upload a file from Python to a PHP/HTML form using urllib2. My code is below.

Python code:

    import urllib
    import urllib2

    url = 'http://localhost/index2.php'
    values = {}
    f = open('addons.xcu', 'r')
    values['datafile'] = f.read()  # is this correct?
    values['Submit'] = 'True'
    data = urllib.urlencode(values)
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)
    the_page = response.read()
    print the_page

PHP/HTML code:

    <html>
    <body>
    <?php
    if (isset($_POST['Submit'])) {
        echo "upload-file-name: " . $_FILES['datafile']['name'] . "<br/>";
    } else {
    ?>
    <form enctype="multipart/form-data" action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post">
    Please enter a file name: <input type="file" name="datafile" size="100">
    <input type="submit" value="submit" name="Submit"/>
    </form>
    <?php } ?>
    </body>
    </html>

But $_FILES['datafile']['name'] in the response is always empty. I cannot work out what went wrong with my code; I would be happy if you could figure out the problem.

--
Yours,
S.Selvam

--
http://mail.python.org/mailman/listinfo/python-list
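For what it's worth, the reason $_FILES comes back empty is that urllib.urlencode() produces an application/x-www-form-urlencoded body, while PHP only populates $_FILES for multipart/form-data requests. A minimal sketch of building such a body by hand (the field names match the form above; the boundary string is arbitrary):

```python
def encode_multipart(fields, files, boundary="----pythonformboundary"):
    """Build a multipart/form-data request body.

    fields: dict of ordinary form fields, e.g. {"Submit": "True"}
    files:  dict mapping field name -> (filename, file content string)
    """
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")
        lines.append(value)
    for name, (filename, content) in files.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"; filename="%s"'
                     % (name, filename))
        lines.append("Content-Type: application/octet-stream")
        lines.append("")
        lines.append(content)
    lines.append("--" + boundary + "--")
    lines.append("")
    body = "\r\n".join(lines)
    content_type = "multipart/form-data; boundary=" + boundary
    return content_type, body
```

The returned body can then be posted with urllib2.Request(url, body, {'Content-Type': content_type}), and PHP will see the upload in $_FILES['datafile'].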
Need to store dictionary in file
Hi all,

I have a dictionary in which each key is associated with a list as its value, e.g.:

    dic = {'a': ['aa', 'ant', 'all']}

The dictionary contains 1.5 lakh (150,000) keys. Now I want to store it in a file, and it needs to be loaded into the Python program during execution. I would appreciate your ideas/suggestions.

Note: I think cPickle or bsddb can be used for this purpose, but I want to know the best solution, i.e. whichever runs fastest.

--
Yours,
S.Selvam
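As a follow-up sketch: cPickle (plain `pickle` on Python 3) handles this directly; the file name here is illustrative:

```python
import os
import pickle  # on Python 2, "import cPickle as pickle" gives the faster C version
import tempfile

dic = {'a': ['aa', 'ant', 'all']}  # stand-in for the real 150,000-key dictionary

path = os.path.join(tempfile.gettempdir(), 'words.pkl')

# Dump once, with the highest binary protocol for speed and compactness.
with open(path, 'wb') as f:
    pickle.dump(dic, f, pickle.HIGHEST_PROTOCOL)

# Load at program start-up; the whole dictionary comes back in one call.
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded['a'])
```

If loading the whole dictionary up front turns out to be too slow, the stdlib `shelve` module keeps a pickled dictionary on disk and loads values lazily, key by key.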
Re: Levenshtein word comparison -performance issue
On Sat, Feb 14, 2009 at 3:01 PM, Peter Otten <__pete...@web.de> wrote:

> Gabriel Genellina wrote:
>> En Fri, 13 Feb 2009 08:16:00 -0200, S.Selvam Siva <s.selvams...@gmail.com> escribió:
>>> I need some help. I tried to find the top n (e.g. 5) similar words for a given word, from a dictionary of 50,000 words. I used the python-levenshtein module; sample code follows.
>>>
>>>     def foo(searchword):
>>>         disdict = {}
>>>         for word in self.dictionary_words:
>>>             distance = Levenshtein.ratio(searchword, word)
>>>             disdict[word] = distance
>>>         # sort the disdict dictionary by values in descending order
>>>         similarwords = sorted(disdict, key=disdict.__getitem__, reverse=True)
>>>         return similarwords[:5]
>>
>> You may replace the last steps (sort + slice top 5) by heapq.nlargest - at least you won't waste time sorting 49995 irrelevant words... Anyway you should measure the time taken by the first part (Levenshtein), it may be the most demanding. I think there is a C extension for this, should be much faster than pure Python calculations.
>
> [I didn't see the original post]
>
> You can use the distance instead of the ratio and put the words into bins of the same length. Then if you find enough words with a distance <= 1 in the bin with the same length as the search word, you can stop looking.
>
> You might be able to generalize this approach to other properties that are fast to calculate and guarantee a minimum distance, e.g. set(word).
>
> Peter

Thank you all for your response. [Sorry, I was away for a while.]

I used the functools and heapq modules, but that helped me only a little. Then I categorized the words by length and compared each search word against a smaller set (each set 50,000/4 = 12,500 words), so it now takes a quarter of the time of the older method.

Further, can I use threads to achieve parallel comparison? I have little knowledge of Python threading. Will the following code achieve parallelism?
    thread1 = threading.Thread(target=self.findsimilar, args=(1, searchword, dic_word_set1))
    thread2 = threading.Thread(target=self.findsimilar, args=(2, searchword, dic_word_set2))
    thread3 = threading.Thread(target=self.findsimilar, args=(3, searchword, dic_word_set3))
    thread1.start()
    thread2.start()
    thread3.start()
    thread1.join()
    thread2.join()
    thread3.join()

I would like to hear your suggestions.

Note: The issue is that I am developing a spell checker for my local language; I may use more than 2.5 lakh (250,000) words, so I need the best way to build the alternative word list.

--
Yours,
S.Selvam
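One caveat on the threading idea: CPython's GIL means pure-Python scoring loops in threads run one at a time, so worker processes are the usual way to get real parallelism for CPU-bound work. A hedged sketch with multiprocessing.Pool (the `similarity` scorer below is a self-contained stand-in, not Levenshtein.ratio, and all names are mine):

```python
from multiprocessing import Pool

def similarity(pair):
    # Illustrative stand-in for Levenshtein.ratio: the fraction of
    # aligned positions at which the two words agree.
    searchword, word = pair
    matches = sum(a == b for a, b in zip(searchword, word))
    return word, matches / float(max(len(searchword), len(word)))

def find_similar(searchword, words, topn=5, processes=3):
    # Pool.map farms the scoring out to worker *processes*, which really
    # do run in parallel; threads would not, because CPython's GIL lets
    # only one thread execute Python bytecode at a time.
    pool = Pool(processes)
    try:
        scored = pool.map(similarity, [(searchword, w) for w in words])
    finally:
        pool.close()
        pool.join()
    scored.sort(key=lambda ws: ws[1], reverse=True)
    return [w for w, score in scored[:topn]]
```

Whether the process overhead pays off depends on dictionary size; for 250,000 words per query it is worth measuring.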
Levenshtein word comparison -performance issue
Hi all,

I need some help. I tried to find the top n (e.g. 5) similar words for a given word, from a dictionary of 50,000 words. I used the python-levenshtein module; sample code follows.

    def foo(searchword):
        disdict = {}
        for word in self.dictionary_words:
            distance = Levenshtein.ratio(searchword, word)
            disdict[word] = distance
        # sort the disdict dictionary by values in descending order
        similarwords = sorted(disdict, key=disdict.__getitem__, reverse=True)
        return similarwords[:5]

foo() takes a search word, compares it with the dictionary of 50,000 words, and assigns each word a value (between 0 and 1). After sorting in descending order, it returns the top 5 similar words. The problem is that it takes a long time to process (and I need to pass many search words in a loop). I guess the code could be improved to work more efficiently. Your suggestions are welcome...

--
Yours,
S.Selvam
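As a sketch of one easy win: the sort-plus-slice at the end can be replaced by heapq.nlargest, which keeps a running heap of the best n candidates and never sorts the ~49,995 losing words (difflib stands in here for Levenshtein.ratio so the example is self-contained; names are illustrative):

```python
import difflib  # stdlib stand-in for Levenshtein.ratio (assumption)
import heapq

def top_similar(searchword, words, n=5):
    # heapq.nlargest(n, iterable, key) scans once and retains only the
    # n highest-scoring items, instead of sorting the whole word list.
    def ratio(word):
        return difflib.SequenceMatcher(None, searchword, word).ratio()
    return heapq.nlargest(n, words, key=ratio)

print(top_similar("pythn", ["python", "java", "pith", "typhon"], n=2))
```

The scoring call itself still dominates, so this helps the tail of foo(), not the 50,000 ratio computations.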
BeautifulSoup -converting unicode to numerical representaion
Hi all,

I need to parse feeds and post the data to SOLR. I want the special (Unicode) characters to be posted in numerical representation; for example, the right single quote should become &#8217; (for which the HTML equivalent is &rsquo;).

I used BeautifulSoup, which seems to allow conversion from numeric values (&#NNNN;) to Unicode characters, as follows:

    hdes = str(BeautifulStoneSoup(strdesc, convertEntities=BeautifulStoneSoup.HTML_ENTITIES))
    xdesc = str(BeautifulStoneSoup(hdes, convertEntities=BeautifulStoneSoup.XML_ENTITIES))

But I want the numerical representation of the Unicode characters. I also want to convert an HTML representation like &rsquo; to its numeric equivalent &#8217;.

Thanks in advance.

Note: The reason for the above requirement is that I need a standard way to post to SOLR, to avoid errors.

--
Yours,
S.Selvam
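A possible stdlib-only sketch: the ascii codec's xmlcharrefreplace error handler emits &#NNNN; references for every non-ASCII character, and the htmlentitydefs / html.entities table maps named entities like &rsquo; to their code points (the function name is mine):

```python
import re

try:
    from html.entities import name2codepoint   # Python 3
except ImportError:
    from htmlentitydefs import name2codepoint  # Python 2

def to_numeric_refs(text):
    # First rewrite named entities (&rsquo; -> &#8217;).  Existing numeric
    # references like &#8217; start with '#' and are left untouched.
    def named(match):
        name = match.group(1)
        if name in name2codepoint:
            return "&#%d;" % name2codepoint[name]
        return match.group(0)  # leave unknown entities as-is
    text = re.sub(r"&([A-Za-z][A-Za-z0-9]*);", named, text)
    # Then turn remaining non-ASCII characters into &#NNNN; references too.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_numeric_refs(u"it\u2019s &rsquo;"))  # it&#8217;s &#8217;
```

The output is pure ASCII, which sidesteps encoding errors when posting to SOLR.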
string replace for back slash
Hi all,

I tried to do a string replace as follows:

    >>> s = "hi & people"
    >>> s.replace("&", "\\")
    'hi \\ people'

but I was expecting 'hi \ people'. I don't know what is different here with the escape sequence.

--
Yours,
S.Selvam
Re: string replace for back slash
On Thu, Feb 5, 2009 at 5:59 PM, rdmur...@bitdance.com wrote:

> S.Selvam Siva <s.selvams...@gmail.com> wrote:
>> I tried to do a string replace as follows:
>>
>>     >>> s = "hi & people"
>>     >>> s.replace("&", "\\")
>>     'hi \\ people'
>>
>> but I was expecting 'hi \ people'. I don't know what is different here with the escape sequence.
>
> You are running into the difference between the 'repr' of a string (which is what is printed by default at the Python prompt) and the actual contents of the string. In the repr, the backslash needs to be escaped by prefixing it with a backslash, just as you would if you wanted to enter a backslash into a string in your program. If you print the string, you'll see there is only one backslash.
>
> Note that a lone backslash in a string literal only works when it isn't followed by a character that forms an escape; the repr of that string will still show the backslash doubled, and doubling it is really the way you should write it in your program to begin with, for safety's sake.
>
>     Python 2.6.1 (r261:67515, Jan 7 2009, 17:09:13)
>     [GCC 4.3.2] on linux2
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> replacementstring = "\\"
>     >>> replacementstring
>     '\\'
>     >>> print replacementstring
>     \
>     >>> s = "hi & people"
>     >>> x = s.replace("&", "\\")
>     >>> x
>     'hi \\ people'
>     >>> print x
>     hi \ people

Thank you all for your response. Now I understand the way the Python terminal displays '\'.

--
Yours,
S.Selvam
importing module-performance
Hi all,

I have a small query. Consider a task A which I want to perform. To perform it, I have two options:

1) Write a small piece of code (approx. 50 lines) that is as efficient as possible.
2) Import a suitable module to perform task A.

I am eager to know which method will give the best performance.

--
Yours,
S.Selvam
Re: importing module-performance
On Mon, Feb 2, 2009 at 3:11 PM, Chris Rebert <c...@rebertia.com> wrote:

> On Mon, Feb 2, 2009 at 1:29 AM, S.Selvam Siva <s.selvams...@gmail.com> wrote:
>> Hi all,
>> I have a small query. Consider a task A which I want to perform. To perform it, I have two options: 1) write a small piece of code (approx. 50 lines) that is as efficient as possible, or 2) import a suitable module to perform task A. I am eager to know which method will give the best performance.
>
> A. Your question seems much too vague to answer.
> B. Premature optimization is the root of all evil. In all likelihood, the time taken by the `import` will be absolutely trivial compared to the rest of the script, so don't bother micro-optimizing ahead of time; write readable code first, then worry about optimization once it's working perfectly.
>
> Cheers,
> Chris
> --
> Follow the path of the Iguana...
> http://rebertia.com

Thank you, Chris.

I faced an optimization problem, as follows. For fuzzy string comparison I initially used 15 lines of code which compared a word with a list of 29,000 words. For each pair of words compared, the 15-line code produced the number of differing characters between the two words. But when I used the python-levenshtein module for the same purpose, it ran much faster than the old method. That is what prompted my query. Now I understand that it is not a question of importing a module versus writing the code; the problem must be with my 15-line code.

--
Yours,
S.Selvam
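The comparison itself is easy to make concrete with the stdlib timeit module; the pure-Python distance function below is an illustrative stand-in for the hand-written 15-line code, and the commented-out lines show how the same measurement would look for the C extension:

```python
import timeit

# Pure-Python edit distance, defined inside the timeit setup string so the
# timed statement can call it.
setup = """
def distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]
"""

t = timeit.timeit("distance('levenshtein', 'meilenstein')", setup=setup, number=1000)
print("pure Python: %.4f s for 1000 calls" % t)

# The equivalent measurement for the C extension would be, e.g.:
# timeit.timeit("Levenshtein.distance('levenshtein', 'meilenstein')",
#               setup="import Levenshtein", number=1000)
```

Running both lines side by side turns "it felt faster" into a number, which is usually where this kind of question is settled.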
Re: date handling problem
On Thu, Jan 29, 2009 at 2:27 PM, M.-A. Lemburg <m...@egenix.com> wrote:

> On 2009-01-29 03:38, Gabriel Genellina wrote:
>> En Wed, 28 Jan 2009 18:55:21 -0200, S.Selvam Siva <s.selvams...@gmail.com> escribió:
>>> I need to parse RSS feeds based on time stamps, but the feeds follow different date standards (IST, EST, etc.). I don't know how to normalize these; it would be helpful if you could give me a hint.
>>
>> You may find the Olson timezone database useful: http://pytz.sourceforge.net/
>
> Or have a look at the date/time parser in mxDateTime:
> http://www.egenix.com/products/python/mxBase/mxDateTime/
>
> --
> Marc-Andre Lemburg
> eGenix.com
> Professional Python Services directly from the Source  (#1, Jan 29 2009)
> Python/Zope Consulting and Support ...        http://www.egenix.com/
> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
> ::: Try our new mxODBC.Connect Python Database Interface for free ! ::::
> eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str. 48
> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
> Registered at Amtsgericht Duesseldorf: HRB 46611
> http://www.egenix.com/company/contact/

Thank you all. The link was really nice, and I will try it out.

--
Yours,
S.Selvam
date handling problem
Hi all,

I need to parse RSS feeds based on time stamps, but the feeds follow different date standards (IST, EST, etc.). I don't know how to normalize these different standards; it would be helpful if you could give me a hint.

--
Yours,
S.Selvam
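For the common case, RSS 2.0 time stamps follow the RFC 822 date format, which the stdlib email.utils functions parse and normalize; note that unrecognized named zones such as "IST" yield no offset and would need a hand-made lookup table (this is a sketch, not a complete solution):

```python
from email.utils import mktime_tz, parsedate_tz

stamp = "Wed, 28 Jan 2009 18:55:21 +0530"

parts = parsedate_tz(stamp)   # 10-tuple; the last item is the UTC offset in seconds
print(parts[:6])              # (2009, 1, 28, 18, 55, 21)
print(parts[-1])              # 19800, i.e. +05:30

# mktime_tz folds the offset in, giving UTC epoch seconds that are
# directly comparable no matter which zone each feed used.
epoch = mktime_tz(parts)

# Named zones from the RFC 822 table (GMT, EST, ...) are understood,
# but zones outside it (such as IST) come back with a None offset.
assert parsedate_tz("Wed, 28 Jan 2009 18:55:21 EST")[-1] == -5 * 3600
```

Comparing feeds then reduces to comparing the epoch values.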
Re: String comparison
Thank you, Gabriel.

On Sun, Jan 25, 2009 at 7:12 AM, Gabriel Genellina <gagsl-...@yahoo.com.ar> wrote:

> En Sat, 24 Jan 2009 15:08:08 -0200, S.Selvam Siva <s.selvams...@gmail.com> escribió:
>> I am developing a spell checker for my local language (Tamil) using Python. I need to generate an alternative word list for a misspelled word from the dictionary of words. The alternatives must be as close as possible to the misspelled word. As we know, ordinary string comparison won't work here. Any suggestion for this problem is welcome.
>
> I think it would be better to add Tamil support to some existing library like GNU aspell: http://aspell.net/

That was my plan earlier, but I am not sure how aspell integrates with other editors. I had better ask on the aspell mailing list.

> You are looking for fuzzy matching: http://en.wikipedia.org/wiki/Fuzzy_string_searching
> In particular, the Levenshtein distance is widely used; I think there is a Python extension providing those calculations.
>
> --
> Gabriel Genellina

The following code served my purpose (thanks to some unknown contributors):

    import sys

    def distance(a, b):
        c = {}
        n = len(a); m = len(b)
        for i in range(0, n + 1):
            c[i, 0] = i
        for j in range(0, m + 1):
            c[0, j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                x = c[i-1, j] + 1
                y = c[i, j-1] + 1
                if a[i-1] == b[j-1]:
                    z = c[i-1, j-1]
                else:
                    z = c[i-1, j-1] + 1
                c[i, j] = min(x, y, z)
        return c[n, m]

    a = sys.argv[1]
    b = sys.argv[2]
    d = distance(a, b)
    print "d =", d
    longer = float(max(len(a), len(b)))
    shorter = float(min(len(a), len(b)))
    r = ((longer - d) / longer) * (shorter / longer)  # r ranges between 0 and 1

--
Yours,
S.Selvam
String comparison
Hi all,

I am developing a spell checker for my local language (Tamil) using Python. I need to generate an alternative word list for a misspelled word from the dictionary of words. The alternatives must be as close as possible to the misspelled word. As we know, ordinary string comparison won't work here. Any suggestion for this problem is welcome.

--
Yours,
S.Selvam
python resource management
Hi all,

I have found the actual solution for this problem. I tried using BeautifulSoup.SoupStrainer(), and it improved memory usage enormously: the script now uses at most 20 MB (earlier it was 800 MB on a 1 GB RAM system).

Thanks, all.

--
Yours,
S.Selvam
Re: python resource management
On Tue, Jan 20, 2009 at 7:27 PM, Tim Arnold <tim.arn...@sas.com> wrote:

> I had the same problem you did, but then I changed the code to create a new soup object for each file. That drastically increased the speed. I don't know why, but it looks like the soup object just keeps getting bigger with each feed.
> --Tim

I have found the actual solution for this problem. I tried using BeautifulSoup.SoupStrainer(), and it improved memory usage enormously: the script now uses at most 20 MB (earlier it was 800 MB on a 1 GB RAM system).

Thanks, all.

--
Yours,
S.Selvam
python resource management
Hi all,

I am running a Python script which parses nearly 22,000 locally stored HTML files using BeautifulSoup. The problem is that memory usage increases linearly as the files are parsed. By the time the script has parsed about 200 files, it has consumed all the available RAM, and CPU usage drops to 0% (perhaps due to excessive paging).

We tried 'del soup_object' and used 'gc.collect()', but there was no improvement. Please guide me on how to limit Python's memory usage, or on the proper way to handle BeautifulSoup objects in a resource-effective manner.

--
Yours,
S.Selvam
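The only-parse-what-you-need idea (which SoupStrainer later implements inside BeautifulSoup) can be sketched with the stdlib HTMLParser, assuming only anchor URLs are wanted from each file; no parse tree is built, so per-file memory stays small (class and variable names are mine):

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2

class LinkCollector(HTMLParser):
    """Collect <a href="..."> values without building a parse tree."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

parser = LinkCollector()
parser.feed('<html><body><a href="http://example.com/">x</a>'
            '<p>text</p><a href="/two">y</a></body></html>')
print(parser.links)
```

A fresh LinkCollector per file (or parser.close() plus a new instance) keeps nothing alive between the 22,000 files.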
Re: Extracting real-domain-name (without sub-domains) from a given URL
On Tue, Jan 13, 2009 at 1:50 PM, Chris Rebert <c...@rebertia.com> wrote:

> On Mon, Jan 12, 2009 at 11:46 PM, S.Selvam Siva <s.selvams...@gmail.com> wrote:
>> Hi all,
>> I need to extract the domain name from a given URL (without sub-domains). With urlparse, I am able to fetch only the host name, which includes the sub-domain as well.
>> E.g.: http://feeds.huffingtonpost.com/posts/ and http://www.huffingtonpost.de/ must lead to huffingtonpost.com and huffingtonpost.de respectively.
>> Please suggest some ideas regarding this problem.
>
> That would require (pardon the pun) domain-specific logic. For most TLDs (e.g. .com, .org) the domain name is just blah.com, blah.org, etc. But for ccTLDs, often only second-level registrations are allowed, e.g. for www.bbc.co.uk the main domain name would be bbc.co.uk. I think a few TLDs have even more complicated rules. I doubt anyone's created a general ready-made solution for this; you'd have to code it yourself. To handle the common case, you can cheat and just .split() at the periods and then slice and rejoin the list of domain parts, e.g.:
>
>     '.'.join(domain.split('.')[-2:])
>
> Cheers,
> Chris

Thank you, Chris Rebert. Actually, I did try domain-specific logic, with about 200 TLDs like .com, .co.in and .co.uk, and tried to extract the domain name that way. But my boss wants a more reliable solution than this method; anyway, I will try to find some alternative.

--
Yours,
S.Selvam
Extracting real-domain-name (without sub-domains) from a given URL
Hi all,

I need to extract the domain name from a given URL (without sub-domains). With urlparse, I am able to fetch only the host name, which includes the sub-domain as well.

E.g.: http://feeds.huffingtonpost.com/posts/ and http://www.huffingtonpost.de/ must lead to huffingtonpost.com and huffingtonpost.de respectively.

Please suggest some ideas regarding this problem.

--
Yours,
S.Selvam
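A sketch of the suffix-table approach (the three suffixes listed are purely illustrative; a reliable version needs the full Public Suffix List, which tracks every multi-label registry suffix):

```python
try:
    from urllib.parse import urlparse    # Python 3
except ImportError:
    from urlparse import urlparse        # Python 2

# Tiny illustrative table of suffixes under which registrations take a
# third label -- a real solution should load the Public Suffix List.
TWO_LABEL_SUFFIXES = {"co.uk", "co.in", "com.au"}

def registered_domain(url):
    host = urlparse(url).netloc.split(":")[0].lower()
    parts = host.split(".")
    # bbc.co.uk-style hosts: keep three labels, everything else: two.
    if len(parts) >= 3 and ".".join(parts[-2:]) in TWO_LABEL_SUFFIXES:
        return ".".join(parts[-3:])
    return ".".join(parts[-2:])

print(registered_domain("http://feeds.huffingtonpost.com/posts/"))  # huffingtonpost.com
print(registered_domain("http://www.bbc.co.uk/news"))               # bbc.co.uk
```

The reliability your boss wants comes entirely from how complete the suffix table is; the slicing logic itself stays this small.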
pygtkspell-help
Hello,

I am in the process of writing a spell checker for my local language (Tamil). I wrote a plugin for gedit, with PyGTK for the GUI. Recently I came to know about pygtkspell, which can be used for spell checking and offering suggestions. I am a bit confused about it and could not find useful information by googling. It would be nice if someone could point me in the right direction (perhaps with appropriate links or an example program).

--
Yours,
S.Selvam
Posting File as a parameter to PHP/HTML using HTTP POST
I am trying to post a file from Python to PHP using the HTTP POST method. I tried mechanize, but I am not able to pass the file object:

    from mechanize import Browser

    br = Browser()
    response = br.open("http://localhost/test.php")
    br.select_form('form1')
    br['uploadedfile'] = open("C:/Documents and Settings/user/Desktop/Today/newurl-ideas.txt")
    response = br.submit()
    print response.read()

But I get the error:

    br['uploadedfile'] = open("C:/Documents and Settings/user/Desktop/Today/newurl-ideas.txt")
      File "C:\Python25\lib\site-packages\clientform-0.2.9-py2.5.egg\ClientForm.py", line 2880, in __setitem__
    ValueError: value attribute is readonly

But when the upload is done through a browser, it works.

--
Yours,
S.Selvam
Re: Posting File as a parameter to PHP/HTML using HTTP POST
I have found the solution myself. Instead of:

    br['uploadedfile'] = open("C:/Documents and Settings/user/Desktop/Today/newurl-ideas.txt")

we need to use:

    br.add_file(open("C:/Documents and Settings/user/Desktop/Today/newurl-ideas.txt"),
                filename="newurl-ideas.txt", name="uploadedfile")

On Tue, Dec 2, 2008 at 1:33 PM, S.Selvam Siva [EMAIL PROTECTED] wrote:

> I am trying to post a file from Python to PHP using the HTTP POST method. I tried mechanize, but I am not able to pass the file object. [code snipped] I get the error "ValueError: value attribute is readonly", but when the upload is done through a browser, it works.

--
Yours,
S.Selvam
parsing javascript
I have to parse web pages and fetch URLs. My problem is that many of the URLs I need are loaded dynamically by a JavaScript function (onload()). How can I fetch those links from Python?

Thanks in advance.
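When the URLs appear as string literals inside the page's script blocks, a regular-expression sketch like this can pull them out of the raw source (the pattern and names are illustrative; URLs assembled at runtime genuinely require executing the JavaScript, e.g. with a headless browser):

```python
import re

# Any http(s) URL sitting inside quotes.
URL_RE = re.compile(r"""["'](https?://[^"'\s]+)["']""")

def urls_in_scripts(html):
    urls = []
    # Look only inside <script> blocks, where onload() handlers live.
    for script in re.findall(r"(?is)<script[^>]*>(.*?)</script>", html):
        urls.extend(URL_RE.findall(script))
    return urls

page = '''<html><head><script>
function onload() { window.location = "http://example.com/next"; }
</script></head><body><a href="http://example.com/static">x</a></body></html>'''

print(urls_in_scripts(page))  # ['http://example.com/next']
```

This finds only literals; links computed from pieces at run time will not match, which is the fundamental limit of scraping without a JavaScript engine.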