Re: [Web-SIG] Announcing bobo

2009-06-17 Thread Noah Gift
+1 Bobo.  I like stuff Jim writes.

-- 
Cheers,

Noah
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] WSGI Open Space @ PyCon.

2009-03-28 Thread Noah Gift
On Sun, Mar 29, 2009 at 5:10 PM, Robert Brewer fuman...@aminus.org wrote:

 Hi all,

 We had a good second meeting and answered more issues. My understanding
 is that there is another BoF scheduled for tomorrow (Sunday). Check the
 Open Space board for details.

 Those present at the second meeting:

  * Mark Ramm (TG)
  * Mike Orr (Pylons)
  * Bob Brewer (CherryPy)
  * Ian Bicking (Paste, etc)
  * Alan Kennedy (WSGI gateway servlets/Jython)
  * Rick Copeland (TG)
  * James Bennett (Django)
  * Gary Poster (Launchpad)
  * Chris McDonough (Zope, repoze, etc)
  * Garrett Smith (async WSGI server and middleware)
  * Kumar McMillan (Pylons)
  * Alex Morega (WSGI user)
  * Andrew Sawyer (lurker)
  * Marcus Cavanaugh (Pylons)
  * David Reed (used to be Twisted.web2 maintainer)
  * 8+ others, mostly lurking


 Revisited Topic: Unicode values in the WSGI environ
 ---------------------------------------------------

 Consensus: Response status and headers MUST BE unicode. Doing otherwise
 (handling both unicode and byte string) would unnecessarily complicate
 the construction of middleware components. Origin HTTP servers MUST
 encode these to the appropriate bytestrings (all ISO-8859-1?) before
 writing them out to the socket.
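
A minimal sketch of what that server-side step might look like, in Python 3 syntax. The helper names are hypothetical, and ISO-8859-1 is assumed for every value, which the notes above leave as an open question:

```python
def encode_status_and_headers(status, headers, charset="iso-8859-1"):
    """Encode a unicode status line and header pairs to bytes.

    Hypothetical helper; the name and signature are assumptions.
    """
    status_bytes = status.encode(charset)
    header_bytes = [
        (name.encode(charset), value.encode(charset))
        for name, value in headers
    ]
    return status_bytes, header_bytes


def format_response_head(status, headers, http_version="HTTP/1.1"):
    """Build the raw response head an origin server would write to the socket."""
    status_bytes, header_bytes = encode_status_and_headers(status, headers)
    lines = [http_version.encode("ascii") + b" " + status_bytes]
    lines.extend(name + b": " + value for name, value in header_bytes)
    return b"\r\n".join(lines) + b"\r\n\r\n"


head = format_response_head("200 OK", [("Content-Type", "text/plain")])
```

Middleware in between only ever sees unicode; the conversion to bytes happens once, at the edge.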


 Revisited Topic: wsgi.input
 ---

 I raised the issue that, if wsgi.input were an iterable, many apps would
 just have to take the extra step of wrapping it in a file-like object
 anyway to pass to cgi.FieldStorage. Others again raised the desire to allow
 the app to determine the size of each read().
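
For illustration, the "extra step" could be sketched roughly like this, assuming wsgi.input were an iterable of byte chunks. The class name IterableInput is hypothetical; the wrapper builds on the standard io module so the app still picks the size of each read():

```python
import io


class IterableInput(io.RawIOBase):
    """File-like view over an iterable of byte chunks (hypothetical name)."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._leftover = b""

    def readable(self):
        return True

    def readinto(self, buffer):
        # Pull chunks until there is data to hand out or the iterable ends.
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0  # end of input
        n = min(len(buffer), len(self._leftover))
        buffer[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


# BufferedReader adds read(size), readline(), and friends on top.
body = io.BufferedReader(IterableInput([b"a=1&", b"", b"b=2"]))
first = body.read(3)
rest = body.read()
```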

 We didn't reach consensus, IMO. Alan argued for an iterable to more
 easily support asynchronous servers.


+1 on the iterator, although I might just like the idea and might be missing
something important.  It seems like there are a lot of powerful things being
developed with generators in mind, and there are some nifty things you can
do with them like the contextlib example:
http://docs.python.org/library/contextlib.html#contextlib.closing
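
Roughly what I have in mind, as a sketch only. The function names are made up for the example, and a BytesIO stands in for whatever the server would really hand over:

```python
import io
from contextlib import closing


def read_chunks(stream, size=4096):
    """Yield chunks from a file-like object and close it when finished."""
    try:
        while True:
            chunk = stream.read(size)
            if not chunk:
                break
            yield chunk
    finally:
        # Runs on normal exhaustion and when the generator is closed early.
        stream.close()


def first_chunk(stream, size=4096):
    """Return only the first chunk; closing() still triggers the cleanup."""
    with closing(read_chunks(stream, size)) as chunks:
        for chunk in chunks:
            return chunk
    return b""


stream = io.BytesIO(b"hello world")
head = first_chunk(stream, size=5)
```

Leaving the with block calls close() on the generator, which runs its finally clause even though we stopped after one chunk.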

Glad to hear a wide range of people showed up, even a Django person :)


 The counter-argument was that
 servers could use non-blocking sockets to allow apps which read() to
 yield in the case of no immediate data rather than block indefinitely.
 If a file-like object were retained, it would help to publish a
 chainable file example to help middleware re-stream files they read any
 part of.
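
One possible shape for such a chainable file, offered as a sketch rather than a proposal for the spec. ChainedInput and peek_input are hypothetical names, and an empty read is treated as end-of-file, which would not hold for the non-blocking case described above:

```python
import io


class ChainedInput:
    """Read from several file-like objects in sequence (hypothetical name)."""

    def __init__(self, *files):
        self._files = list(files)

    def read(self, size=-1):
        read_all = size is None or size < 0
        parts = []
        remaining = size
        while self._files and (read_all or remaining > 0):
            current = self._files[0]
            data = current.read() if read_all else current.read(remaining)
            if not data:
                # Current file is exhausted; move on to the next one.
                self._files.pop(0)
                continue
            parts.append(data)
            if not read_all:
                remaining -= len(data)
        return b"".join(parts)


def peek_input(environ, size):
    """Read `size` bytes from wsgi.input, then re-stream them for later readers."""
    original = environ["wsgi.input"]
    peeked = original.read(size)
    environ["wsgi.input"] = ChainedInput(io.BytesIO(peeked), original)
    return peeked


environ = {"wsgi.input": io.BytesIO(b"name=bobo&lang=python")}
peeked = peek_input(environ, 9)
```

The downstream application then reads environ["wsgi.input"] as if the middleware had never touched it.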


 Response iterable type
 ----------------------

 The current spec says "all strings referred to in this specification
 must be of type str or StringType". James asked if this could be
 loosened to str-like objects. Perhaps we could replace strict typing
 with an ABC requirement? General consensus: -0.


 Continuing deferred issues
 --------------------------

  * Lots of little changes: the server's supported HTTP version,
   file_wrapper edge cases, etc.
  * Python 3, and the scheduling of WSGI improvements
  * Asynchronous WSGI support. Mostly non-existent. Fix it? Fork it?
   Drop it?
  * Lifecycle methods (start/stop/etc event API driven by the container)
  * Remove app_iter.close?


 Robert Brewer
 fuman...@aminus.org





-- 
Cheers,

Noah


Re: [Web-SIG] HTML parsing - get text position and font size

2009-01-12 Thread Noah Gift
2009/1/13 Girish Redekar girish.rede...@gmail.com:
 I'm trying to build a search engine in Python and am stuck at the place where I
 parse HTML to get useful text. One should ideally be able to parse the text
 (out of HTML tags) along with its position (for phrase searches) and
 font-size (to weigh words appropriately).

 However, this part gets very tedious (especially with bad html and css) and
 my code is already unwieldy. It seems to me that this task should've been a
 part of any Python-based semi-sophisticated screen scraper and that it would
 be a commonly solved problem. Yet, no amount of googling has returned
 anything useful.

 Any ideas?

I wrote this article a while back:

http://www.ibm.com/developerworks/aix/library/au-threadingpython/

I didn't fully explore it, but it seems like thread pools and
Beautiful Soup could work...
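
For the parsing side, here is a rough sketch using only the standard library's HTMLParser (Python 3 spelling) instead of Beautiful Soup. The tag weights are made-up stand-ins for font size, CSS is ignored entirely, and the class name is hypothetical:

```python
from html.parser import HTMLParser

# Assumed relative weights per tag; real font sizes from CSS are not consulted.
TAG_WEIGHTS = {"h1": 3.0, "h2": 2.5, "h3": 2.0, "b": 1.5, "strong": 1.5}
SKIPPED_TAGS = {"script", "style"}


class WeightedTextParser(HTMLParser):
    """Collect (position, word, weight) tuples from HTML text (hypothetical name)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._stack = []  # currently open tags

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate sloppy HTML: unwind to the most recent matching open tag.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(tag in SKIPPED_TAGS for tag in self._stack):
            return
        weight = max((TAG_WEIGHTS.get(tag, 1.0) for tag in self._stack),
                     default=1.0)
        for word in data.split():
            # Position is the running word index, usable for phrase searches.
            self.words.append((len(self.words), word, weight))


parser = WeightedTextParser()
parser.feed("<h1>Big title</h1><p>plain <b>bold</b> text"
            "<script>x=1</script></p>")
parser.close()
```

After this, parser.words holds each word with its index in the document and the weight of the heaviest enclosing tag.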



