Re: Tagging is coming along

Allen Gilliland Thu, 21 Sep 2006 13:45:24 -0700


Elias Torres wrote:


Allen Gilliland wrote:


Elias Torres wrote:

Guys,

I have committed my initial tagging code and believe I have gotten
quite far on this first pass.

http://svn.apache.org/viewvc?view=rev&revision=448460

If you have a chance to try it, you should be able to do the following:

- Create a new entry with tags
- Edit an existing entry and add tags
- Frontpage should display tags in the entry summary
- A new tab "Site Tags" in the frontpage will give you a nice cloud
- A Hot Tags section on the right
- /roller/frontpageblog/tags/tag1+tag2 should work.

The part that got really confusing to me was the pager part. I
definitely need guidance here on what is the right way to go. I'm also
confused at the moment on caching. It seems like the frontpage is
caching anything as the front page. I only noticed all of this once
I finally got a basic pager working, but it's a start. Please start
reviewing and tell me what parts of the code are too hacky and need to
be cleaned up.

cool, that's definitely a good start.

I think we may have goofed a little on one part of the proposal though,
which is the getWeblogEntriesByTags() method in WeblogManager.  I think
that instead of adding that method what we really want to do is update
the existing getWeblogEntriesXXXMap() methods to support tags.  The
problem with the existing method is that it returns a list, which is not
really how we do things right now, instead we return a map which has
sorted the result set by time and keyed each map entry by date.  i.e. so
that each map entry is a date with a list of entries from that date.
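
To make that map shape concrete, here is a minimal sketch. It is an
illustration only: BlogEntry, EntryGrouping and groupByDay() are
hypothetical names, not Roller's actual classes, and it assumes the
incoming list is already sorted by time.

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a weblog entry; not Roller's real class. */
final class BlogEntry {
    final String title;
    final Date pubTime;

    BlogEntry(String title, Date pubTime) {
        this.title = title;
        this.pubTime = pubTime;
    }
}

final class EntryGrouping {

    /** Truncates a timestamp to midnight of the same day (default time zone). */
    static Date startOfDay(Date when) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(when);
        cal.set(Calendar.HOUR_OF_DAY, 0);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTime();
    }

    /** Groups entries by day: newest day first, input order kept within a day. */
    static Map<Date, List<BlogEntry>> groupByDay(List<BlogEntry> entries) {
        Map<Date, List<BlogEntry>> byDay =
                new TreeMap<Date, List<BlogEntry>>(Collections.<Date>reverseOrder());
        for (BlogEntry entry : entries) {
            Date day = startOfDay(entry.pubTime);
            List<BlogEntry> sameDay = byDay.get(day);
            if (sameDay == null) {
                sameDay = new ArrayList<BlogEntry>();
                byDay.put(day, sameDay);
            }
            sameDay.add(entry);
        }
        return byDay;
    }
}
```

Each key is a day and each value is the list of entries published on
that day, which is the structure a tag-aware version of the map methods
would presumably keep returning.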


I had mentioned it to Dave but I ran into a problem. The query (an
optimal one) was written using HQL and the current
getWeblogEntriesXXXMap() uses CriteriaAPI. I'd love help converting my
simple query to CriteriaAPI; failing that, we could go with whichever
API is more efficient.

I don't have any preference between using HQL or the CriteriaAPI, but
the root of the problem is still that we don't actually want to return
those results as a list, and that we probably need to add tags to that
getWeblogEntriesXXXMap() method anyways.

Part of this decision hinges on whether or not we want to provide the
possibility of querying for a set of entries not only constrained by
tags, but also constrained by other criteria as well. i.e. entries with
tags=foo+bar and category=blah and date=200608

The way the method is set up now is pretty limiting because it only
accepts a weblog and a set of tags, so you can't limit the result set,
can't specify a time boundary, etc. I think we probably want to improve
on that and it's probably easiest to do that by adding tag support to
the existing methods.
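
A rough sketch of what a combined query could look like, just to show
the idea of tags being one optional constraint among several. Every
name here (TaggedEntry, EntryQuery, withTags() and so on) is
hypothetical, the filtering is done in memory rather than in HQL or the
CriteriaAPI, and it assumes tags=foo+bar means "has both tags".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified entry; the fields are illustrative only. */
final class TaggedEntry {
    final String title;
    final String category;
    final String month; // yyyyMM, e.g. "200608"
    final Set<String> tags;

    TaggedEntry(String title, String category, String month, String... tags) {
        this.title = title;
        this.category = category;
        this.month = month;
        this.tags = new HashSet<String>(Arrays.asList(tags));
    }
}

/** Hypothetical criteria object: an unset field means "no constraint". */
final class EntryQuery {
    private final Set<String> tags = new HashSet<String>();
    private String category;
    private String month;
    private int maxResults = Integer.MAX_VALUE;

    EntryQuery withTags(String... wanted) { tags.addAll(Arrays.asList(wanted)); return this; }
    EntryQuery withCategory(String wanted) { category = wanted; return this; }
    EntryQuery withMonth(String wanted) { month = wanted; return this; }
    EntryQuery withMaxResults(int max) { maxResults = max; return this; }

    /** True when the entry satisfies every constraint that was set. */
    boolean matches(TaggedEntry entry) {
        return entry.tags.containsAll(tags) // assumption: all requested tags must be present
                && (category == null || category.equals(entry.category))
                && (month == null || month.equals(entry.month));
    }

    /** Returns up to maxResults matching entries, preserving input order. */
    List<TaggedEntry> run(List<TaggedEntry> all) {
        List<TaggedEntry> out = new ArrayList<TaggedEntry>();
        for (TaggedEntry entry : all) {
            if (out.size() >= maxResults) {
                break;
            }
            if (matches(entry)) {
                out.add(entry);
            }
        }
        return out;
    }
}
```

With something like this, the example above would read
new EntryQuery().withTags("foo", "bar").withCategory("blah").withMonth("200608").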

Once we do that then I think the pager part is easier.  I don't think
you need your own pager actually, I think you can just make sure that
the existing pagers are able to make use of tags.  That way when someone
uses the url /tags/foo+bar you are just using the LatestPager with a
constraint to a set of tags.


I'm fine with that.

Caching is another big consideration which wasn't addressed in the
proposal at all, so we'll still have to sort that out.  A basic rundown
of caching goes like this ...

there are 3 rendering caches: sitewide, weblog pages, weblog feeds.
everything from the sitewide weblog goes into that cache because it has
special considerations with the way it expires content.  everything else
goes into the page and feed caches.

everything that is cached for a weblog is based off of the
weblog.lastModified attribute.  if that attribute is updated then all
the content for a weblog is expired and will be rerendered on the next
request.

content is put in the caches in the same way for all 3 caches and only
the page and feed servlets use the caches.  the process is basically ...

1. parse request into a version of XXXWeblogRequest object.
2. build cache key based on XXXWeblogRequest object.
3. check if cached content exists and is still valid.
4. render page (if needed)
5. cache it
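
The steps above can be sketched roughly like this. It is not the real
servlet code: RenderCacheSketch, Renderer and getPage() are made-up
names, the cache key is assumed to have been built already (steps 1
and 2), and "still valid" is assumed to mean "cached at or after the
weblog's lastModified time".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the check / render / cache flow described above. */
final class RenderCacheSketch {

    /** Stand-in for whatever actually renders the page. */
    interface Renderer {
        String render();
    }

    private static final class CachedPage {
        final String content;
        final long cachedAt;

        CachedPage(String content, long cachedAt) {
            this.content = content;
            this.cachedAt = cachedAt;
        }
    }

    private final Map<String, CachedPage> cache = new HashMap<String, CachedPage>();

    String getPage(String cacheKey, long weblogLastModified, long now, Renderer renderer) {
        // step 3: a hit is only usable if it was cached after the last weblog change
        CachedPage hit = cache.get(cacheKey);
        if (hit != null && hit.cachedAt >= weblogLastModified) {
            return hit.content;
        }
        // step 4: render because the content is missing or stale
        String content = renderer.render();
        // step 5: cache it along with the time it was rendered
        cache.put(cacheKey, new CachedPage(content, now));
        return content;
    }
}
```

Bumping the lastModified value passed in makes every earlier cached page
for that weblog stale on its next request, which is the expiry behavior
described above.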

you can look in the WeblogPageCache and WeblogFeedCache objects to see
how the cache key generation works, it basically just inspects the
attributes in the XXXWeblogRequest object and builds a unique string out
of it.  you will definitely need to update that to make sure the cache
keys work for tags.
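
As a sketch of what tag-aware key generation might look like (this is
not the actual WeblogPageCache code; TagCacheKey, build() and the
"weblog.page:" prefix are invented for illustration). One assumption
worth calling out: the tags are sorted and de-duplicated so that
/tags/foo+bar and /tags/bar+foo end up sharing a cache entry.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical cache key builder that takes tags into account. */
final class TagCacheKey {

    static String build(String weblogHandle, String category, List<String> tags) {
        StringBuilder key = new StringBuilder("weblog.page:").append(weblogHandle);
        if (category != null) {
            key.append("/category/").append(category);
        }
        if (tags != null && !tags.isEmpty()) {
            key.append("/tags/");
            boolean first = true;
            // TreeSet sorts the tags and drops duplicates, so tag order in the URL does not matter
            for (String tag : new TreeSet<String>(tags)) {
                if (!first) {
                    key.append('+');
                }
                key.append(tag);
                first = false;
            }
        }
        return key.toString();
    }
}
```

Whether tag order should be normalized like this is a design choice; it
cuts down on duplicate cache entries at the cost of a sort per request.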


Thanks for the description. I'm afraid to mention this, but developers
docs would be cool for the community. I'll definitely update the cache
key generation code for tags.


agreed.

the tricky part now is figuring out how we actually want the caching to
work for tags.  this is something that should probably be configurable
if at all possible.  i can see people wanting to completely disable
caching for tags (possibly default?), enable it for single tags only, etc.
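
One way such a setting could look, purely as a sketch: the
TagCachePolicy class, its Mode values and shouldCache() are
hypothetical, and nothing like this exists in the proposal yet.

```java
import java.util.List;

/** Hypothetical policy deciding whether a tag request may be cached. */
final class TagCachePolicy {

    enum Mode { DISABLED, SINGLE_TAG_ONLY, ALL }

    static boolean shouldCache(Mode mode, List<String> tags) {
        // requests without tags are not affected by the tag caching policy
        if (tags == null || tags.isEmpty()) {
            return true;
        }
        switch (mode) {
            case DISABLED:
                return false;
            case SINGLE_TAG_ONLY:
                return tags.size() == 1;
            default:
                return true;
        }
    }
}
```

The page and feed servlets would then consult the policy before putting
a rendered tag page in the cache.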


For single blogs we can cache everything, the question is how to expire
site-wide tags output.

i would agree that for individual weblogs the default should probably be
to cache everything, but there are also potential risks in that.  for
reasonably large sites you could quickly fill up your cache with lots of
rendered pages for tag combinations that would only be viewed once,
i.e. if someone looked for tags=foo+bar+etc and it's unlikely many other
people looked for that exact same combo.  that becomes a problem if
those pages push other important pages out of the cache.

caching for the site-wide weblog in general is pretty tricky since
technically it is supposed to respond and invalidate when *any* data in
the system changes.  that's obviously not very efficient, so it may make
sense to provide some ways for people to tune that a bit.  one
possibility is a forced cache time, so that pages are always used for
XXX minutes before being expired regardless of if the data changes.
heck, it may even make some sense to define a completely new cache
specifically for tag pages and feeds so that they can be treated
specially if desired.
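
to illustrate the forced cache time idea, here is a minimal sketch of a
cache whose items live for a fixed time no matter what changes in the
meantime.  TimedCacheSketch and its methods are hypothetical names, not
an existing Roller cache, and the clock is passed in explicitly to keep
the example easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache that keeps items for a fixed time, ignoring data changes. */
final class TimedCacheSketch {

    private static final class Item {
        final String content;
        final long expiresAt;

        Item(String content, long expiresAt) {
            this.content = content;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Item> items = new HashMap<String, Item>();
    private final long ttlMillis;

    TimedCacheSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(String key, String content, long now) {
        items.put(key, new Item(content, now + ttlMillis));
    }

    /** Returns the cached content, or null if it is missing or past its forced lifetime. */
    String get(String key, long now) {
        Item item = items.get(key);
        if (item == null) {
            return null;
        }
        if (now >= item.expiresAt) {
            items.remove(key); // expired: drop it so the caller rerenders
            return null;
        }
        return item.content;
    }
}
```

a separate instance of something like this could back the tag pages and
feeds if they end up getting their own cache.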


-- Allen

let me know if you have more questions about it.

-- Allen

Happy tagging!

-Elias
