Hello everyone,

Just to state it again: I'm not a Pylons developer, just a Python veteran, but the discussion about search engines caught my attention and I thought I might add my 2 cents :)
On 8/13/07, Marcin Kasperski <[EMAIL PROTECTED]> wrote:
> >> 3) What is the typical approach to generate things like
> >> table of contents, page-list menu etc. Do people tend
> >> to write such things manually? If I would like to
> >> scan my package for files and generate indexes automatically,
> >> should I use pkg_resources, os, or sth else?
> >
> > Dynamic applications usually require dynamic indexes, so you write
> > these like any other dynamic page. If you have a lot of static files
> > you want to index, you can use a template to create a static
> > index.html in websetup.py. If you want to guarantee the index is up
> > to date every time the application starts, put the code in
> > environment.py or middleware.py.
>
> I am not sure whether I wrote my question correctly - I wanted
> to ask HOW usually such code looks like - considering that
> a) app can be packaged into egg and b) app can mix dynamic
> forms, templates and static pages it need not be so obvious.
>
> Maybe I should rather ask about example of content-oriented
> (noticeable number of text pages) pylons website I could
> take a look at to learn sth...
>
> Thanks for the idea of generating some indexes during startup,
> it can be more efficient than building them always on demand.

On a pure Myghty project I developed, I wrote my own component that renders the menu (with caching), and I just call it wherever I need it. It parses an XML file and generates the menu from it, so it is quite flexible, and the caching makes it quite fast. But I'm an XML junkie ;) I don't know whether this would work in a Pylons/Mako environment; it is just a hint at what other people did in similar projects.

> >> 4) What about search? (...)
> >>
> > packages like pyLucene might help. (...)
>
> Thanks for the hint, I will take a look. I understand that there
> is no established practice for pylons.

My experiences with pyLucene are honestly not the best.
I tried to use it in the same Myghty project, where the pages are served by mod_python. The problem with pyLucene in my eyes is that it uses Jython to make calls to the actual .jar files and just wraps a thin layer of Python around the actual Java, so you get some strange datatypes when using it. The memory usage of my Apache process also shot up, because it had to load a complete Java environment as well. But the real problem was search database corruption and file locks that only disappeared after restarting Apache, and things like that. After a week of playing around with it, I abandoned the approach, because it wasn't nearly as stable as I would need in a production environment. This was about 10 months ago, though; maybe stability has improved, I haven't checked lately. It may also work better in a pure Paste environment, but I can't recommend it together with Apache.

Lucene by itself is incredibly powerful in terms of indexing and searching, and really, really fast. I use it now, but in combination with Solr (http://lucene.apache.org/solr/). This is a separate wrapper for Lucene that runs in a Java servlet container; you index and search using standard HTTP requests, and Solr can return data structures that can simply be eval'd by Python, so you have the results as a standard Python object. Quite neat. I'm no fan of Java servlet containers, but in this case it works better for me, because there is a real separation between the processes that serve the HTML pages and the processes that perform the search, and I can control memory usage and things like that.

But Lucene may be a bit bloated for just indexing some simple web pages. Its strengths lie more in indexing custom data (like XML ;) ), with indexing on different fields, and things like that.
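To illustrate the "search over plain HTTP, get a Python object back" idea, here is a minimal sketch. It assumes a Solr instance reachable at http://localhost:8983/solr/select and a Solr version that offers the Python response writer (wt=python); the URL, the helper names and the sample response are made up for illustration and are not taken from the setup described above. It parses the response with ast.literal_eval rather than a bare eval, so a misbehaving server cannot run arbitrary code in the web process.

```python
import ast
from urllib.parse import urlencode

# Assumed Solr location; adjust host, port and path to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(query, rows=10, base_url=SOLR_SELECT_URL):
    """Build a select URL that asks Solr for Python-literal output (wt=python)."""
    params = {"q": query, "rows": rows, "wt": "python"}
    return base_url + "?" + urlencode(params)


def parse_python_response(text):
    """Turn a wt=python response body into a dict without executing code."""
    return ast.literal_eval(text)


# Illustrative response body, NOT captured from a real server.
SAMPLE_RESPONSE = """{
 'responseHeader': {'status': 0, 'QTime': 3},
 'response': {'numFound': 1, 'start': 0,
              'docs': [{'id': 'doc-1', 'title': 'Hello'}]}
}"""

result = parse_python_response(SAMPLE_RESPONSE)
titles = [doc["title"] for doc in result["response"]["docs"]]
```

Against a live server, the text passed to parse_python_response would be the decoded body of urllib.request.urlopen(build_query_url("hello")); the search itself still runs in the separate servlet process.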
Lucene is also incredibly fast: I once test-loaded my Solr instance with 1 million XML documents, and even combined searches on subfields or full-text searches over all the documents finished in 100 ms or so.

You may also want to take a look at Xapian; I have heard it mentioned once or twice in the past months when the topic of search came up. I haven't actually tried it yet, but it looks interesting.

But for just indexing web pages (even dynamically generated ones), a solution with htdig or a similar program may be better. I had great success on one project using swish-e for full-text indexing and search. You can define crawler programs to index the rendered pages. Searching and indexing work by spawning a shell subprocess and letting an external program do the work. The result is easy to parse, but this may be outside the restrictions of the project you are working on. I have had no problems with it so far on a high-traffic, high-volume site (around 500k page views / 50k visits a month); it just works. For simple HTML indexing this would probably be my tool of choice.

> >> 5) How would you implement simple constants referred thorough
> >> the site in different templates? At the moment I have sth
> >> like
> >> _const.mako
> >> <%def name="author_name()" buffered="True">Marcin
> >> Kasperski</%def>
> >
> > I put those in the base controller's .__call__ method:
> >
> > c.author_name = u"Marcin Kaperski".
>
> Hmm. This is a little bit troublesome when I need to override sth
> for just a few pages (duplicating controller looks like overkill).
> But, well, maybe there is no good solution. Thx for the hint.

Again, in Myghty I have code in my base template that fills an m.global_args["config"] dictionary, so the values are available on all pages that use the template. But again, it may work differently for Myghty and Pylons ;)

I hope I could offer some help!
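As a rough illustration of the shared config dictionary with occasional per-page overrides from question 5, here is a plain-Python sketch. It is not Myghty's or Pylons' API: SITE_DEFAULTS, page_config and the "site_title" entry are hypothetical names and values chosen for the example.

```python
from collections import ChainMap

# Hypothetical site-wide defaults, filled once (for example by a base
# template or a base controller).
SITE_DEFAULTS = {"author_name": "Marcin Kasperski", "site_title": "Example site"}


def page_config(**overrides):
    """Return a view in which per-page values shadow the site-wide defaults."""
    return ChainMap(overrides, SITE_DEFAULTS)


# Most pages just use the defaults ...
default_page = page_config()
# ... and the few special pages override only what they need.
special_page = page_config(author_name="Guest Author")
```

Lookups fall through to the defaults, so special_page["site_title"] still gives the site-wide value, and the defaults themselves are never modified by an override.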
Jens
