On 9/14/05, Allen Gilliland <[EMAIL PROTECTED]> wrote:
> I have comments, but let me first explain the work I prototyped so that you
> have an idea of how I approached the problem.
>
> To add tag support I did 2 things:
> 1. added a form field for collecting tags on the weblog entry form.
> 2. created a TagServlet for finding entries with various combinations of tags.
>
> I created a custom table for storing my tags ...
>
> create table roller_weblogentrytags(
>     id varchar(48) not null,
>     tag varchar(32) not null,
>     weblogentryid varchar(48) not null,
>     primary key(id));
>
> I purposely set up the table to be ultra simple and partially denormalized
> (i.e. the relationship is really many-to-many, but I didn't set up the tables
> that way). So the table only maintains the tag value and the entry it
> corresponds to. At the time I wasn't concerned with maintaining any other
> metadata regarding a single tag.
>
> The TagServlet then allows users to enter a combination of keywords and see
> what entries come up. The URLs for the TagServlet look like this ...
>
> /entries/some+combo+of+tags
> /entries/atagvalue
>
> Using a "+" indicates an intersection, i.e. only show results that include
> all the listed tags.
>
> Obviously adding the tag collection form fields is trivial, but knowing how
> to store the data so that it's efficient to look up various tag
> combinations is tough. I would love to see what the data model for a site
> like del.icio.us looks like because it would give us some great insight into
> how they maintain efficiency.
>
> Caching could be tough because we will likely get extremely varied queries
> with different combinations of tags. On top of that, it would be nice to
> have the ability to do a lot of the "popular tags" stuff that del.icio.us
> does.
>
> Anyway ... I have comments on your email inline below ...
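A minimal in-memory sketch of the prototype's three moving parts, assuming whitespace-separated tags in the form field (del.icio.us style) and the `/entries/` URL prefix shown above. The class and method names (`TagServletSketch`, `parseTagField`, `parseTagPath`, `entriesWithAllTags`) are hypothetical, and a list of `TagRow` objects stands in for rows of `roller_weblogentrytags`; it illustrates the "+" means intersection rule, not Roller's actual servlet.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not Roller code: form field -> tags, URL -> tags,
// and an intersection over rows shaped like roller_weblogentrytags.
final class TagServletSketch {

    // One row of roller_weblogentrytags; the surrogate id column is omitted.
    static final class TagRow {
        final String tag;
        final String weblogEntryId;

        TagRow(String tag, String weblogEntryId) {
            this.tag = tag;
            this.weblogEntryId = weblogEntryId;
        }
    }

    // Form field "apple  farm apple" -> [apple, farm]. Assumes tags are
    // whitespace-separated; duplicates are dropped, typing order is kept.
    static Set<String> parseTagField(String field) {
        Set<String> tags = new LinkedHashSet<>();
        for (String t : field.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tags.add(t);
            }
        }
        return tags;
    }

    // "/entries/apple+farm" -> [apple, farm]; "/entries/apple" -> [apple].
    static Set<String> parseTagPath(String path) {
        String prefix = "/entries/";
        if (!path.startsWith(prefix)) {
            throw new IllegalArgumentException("not a tag URL: " + path);
        }
        Set<String> tags = new LinkedHashSet<>();
        for (String t : path.substring(prefix.length()).split("\\+")) {
            if (!t.isEmpty()) {
                tags.add(t);
            }
        }
        return tags;
    }

    // "+" means intersection: keep only entries that carry every requested tag.
    static Set<String> entriesWithAllTags(List<TagRow> rows, Set<String> wanted) {
        Map<String, Set<String>> tagsByEntry = new TreeMap<>();
        for (TagRow r : rows) {
            tagsByEntry.computeIfAbsent(r.weblogEntryId, k -> new TreeSet<>()).add(r.tag);
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : tagsByEntry.entrySet()) {
            if (e.getValue().containsAll(wanted)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Against the real table, the usual SQL counterpart is to filter with `tag in ('apple', 'farm')`, `group by weblogentryid`, and keep groups with `having count(distinct tag) = 2`. The `distinct` matters because nothing in the schema above prevents the same tag from being stored twice for one entry. Whether that query stays fast for arbitrary tag combinations is exactly the open efficiency question.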
The reason I'm not striving for lookup efficiency is that I wanted to leave the
queries to Lucene (or, in the IBM case, to the OmniFind search engine). I believe
Lucene has a way to add query terms so you can ask for posts with tag:apple and
tag:farm, etc. Plus, of course, there's the added benefit of having tags in the
rendered templates for Technorati to consume. I don't think /tag/apple+farm is
something Roller users desperately need at this moment, but I could be wrong.
Let's talk more about what we should concentrate on for this feature.

>
> On Wed, 2005-09-14 at 12:25, Elias Torres wrote:
> > Allen,
> >
> > I was thinking of using the entryattribute table, what do you think? I
> > don't think that we want another table for every little feature. At
> > first I was thinking of something simple, like a "text" field a la
> > del.icio.us as another + in the settings section of the post that can
> > be edited by the user anytime. Maybe then using the Tag render plugin
> > just for rendering the tags, and also making sure that Lucene indexes
> > the tags as well. I don't think we need to worry about the big content
> > or Technorati-style dashboards yet, but we should at least start
> > collecting the data.
>
> Hmm, it depends on how you want to use the entry attribute table. Were you
> going to set a single attribute called "tags" which is a list of all the
> entry tags? Or are you planning to do an attribute per tag?
>
> I agree that getting tag data is important, but if you can't use that tag
> data for something useful then what's the point? If we are going to do tag
> support then I'd at least like to see some way of finding tagged entries
> included in the first release.
>
> >
> > I do have a problem with the entryattribute table in general because
> > it's very limiting. For example, it's really cumbersome if I want to
> > store both the tag and the date it was inserted on.
> > Even worse if I
> > had another piece of metadata about that tagging to insert. It works
> > for MediaCast right now because you only have attributes about the
> > entry and not about the actual entry metadata. I had mentioned to
> > Dave on IRC that since my day job is on Semantic Web stuff, maybe
> > making that table a more RDF-friendly table would be really cool for
> > Roller.
>
> This is murky water if you ask me. I think I like the fact that the
> entryattribute table is a simple hashtable of data attached to a weblog
> entry; that keeps it simple. If you need to relate complex data to an entry
> then it's probably best that you create a new table for that data.
>
> I am all for reuse of existing architecture as long as it works, but if it is
> going to inhibit our ability to effectively use the tags then I say forget
> the entryattribute table and go ahead and do whatever you need to do.
>
> I don't really know what you mean by RDF-friendly, so you'd have to elaborate
> more.
>

Basically, the entryid column should be a general column that can hold a
URI/URL rather than just an entry id, with perhaps another column for the entry
id so we can quickly fetch all of the triples associated with that entry. I can
then do this:

<entryid-1> <hasTagging> <tagging1>
<entryid-1> <hasTagging> <tagging2>
<tagging1> <dc:date> "2005-09-13"
<tagging1> <tag> "blogs"
<tagging2> <dc:date> "2005-09-15"
<tagging2> <tag> "farm"

plus things like:

<tagging1> <syn> "weblogs"
<tagging1> <syn> "blog"
...

Again, if we are going to bake tags into the core, then the table you mentioned
would be best for the servlet to render entries. But for any
entryattribute/metadata, I think an RDF-style table might be more flexible for
things like structuredblogging.

> -- Allen
>
> >
> >
> > What does everyone think? Who else is using the entryattribute table
> > besides MediaCast?
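To make the RDF-friendly idea concrete, here is a minimal sketch that assumes each row of the metadata table holds one (subject, predicate, object) triple as plain strings. `TripleTableSketch`, `objects`, and `tagsOfEntry` are hypothetical names, and predicates such as `dc:date` are kept as literal strings with no namespace expansion. It shows why a tagging node can carry its own date and synonyms without new columns: reading an entry's tags becomes a two-hop lookup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an RDF-style metadata table: one triple per row,
// so extra facts about a tagging (date, synonyms, ...) need no new columns.
final class TripleTableSketch {

    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Every object o such that (subject, predicate, o) is stored, in row order.
    static List<String> objects(List<Triple> table, String subject, String predicate) {
        List<String> out = new ArrayList<>();
        for (Triple t : table) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                out.add(t.object);
            }
        }
        return out;
    }

    // Two hops: entry -hasTagging-> tagging node -tag-> tag value.
    static List<String> tagsOfEntry(List<Triple> table, String entry) {
        List<String> tags = new ArrayList<>();
        for (String tagging : objects(table, entry, "hasTagging")) {
            tags.addAll(objects(table, tagging, "tag"));
        }
        return tags;
    }
}
```

Loaded with the triples listed above, `tagsOfEntry(table, "entryid-1")` should return `[blogs, farm]`. The sketch scans a list; in a database each hop would be a self-join on the triple table, which is where the extra entry-id column Elias suggests could help, by letting an index narrow the first hop to one entry's rows.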
> >
> > Elias
> >
> > On 9/14/05, Allen Gilliland <[EMAIL PROTECTED]> wrote:
> > > Elias,
> > >
> > > I had actually begun working on tag support and prototyped it back in
> > > July, but I didn't get much feedback/support on it so I focused on some
> > > other things instead. I still have some code that works if that would
> > > help.
> > >
> > > http://www.rollerweblogger.org/wiki/Wiki.jsp?page=Proposal_WeblogTags
> > >
> > > It looks like I had also started a very simple design doc which you are
> > > welcome to elaborate on. To be honest I don't think adding tag support
> > > takes much code, but it will require a significant amount of design
> > > because it will require a lot of dynamic content on conceivably large
> > > sets of data.
> > >
> > > I'm definitely looking forward to what you come up with; this would be a
> > > great addition to Roller.
> > >
> > > -- Allen
> > >
> > >
> > > Elias Torres wrote:
> > >
> > > >I have updated my patch to now work with DB2. Everything seems to be
> > > >working beautifully.
> > > >
> > > >http://torrez.us/2005/08/23/roller/patches/db2_derby.hibernate3.patch
> > > >
> > > >Regards,
> > > >
> > > >Elias
> > > >
> > > >PS> Now onto tagging.
> > > >
> > > >Heads up. I would like to add tagging to Roller, possibly using the
> > > >metadata table. I'll try to draft something up on the wiki.
> > > >
> > > >On 9/13/05, Elias Torres <[EMAIL PROTECTED]> wrote:
> > > >
> > > >
> > > >>Hi Everyone,
> > > >>
> > > >>After getting the nice upgrade to Hibernate 3 by Dave, I started
> > > >>working on testing Derby support first, then DB2. I only found a
> > > >>couple of issues with Derby so far; everything else seems to run fine.
> > > >>
> > > >>Here's my patch:
> > > >>http://torrez.us/2005/08/23/roller/patches/derby_hibernate3.patch
> > > >>
> > > >>Basically:
> > > >>
> > > >>There was a getInt() that doesn't seem to work on strings for Derby,
> > > >>so I did this:
> > > >>- dbversion = rs.getInt(1);
> > > >>+ dbversion = Integer.parseInt(rs.getString(1));
> > > >>
> > > >>The next one was a query in HibernateRefererManagerImpl.java which is
> > > >>not performed via Hibernate, and there was a "limit" keyword which is
> > > >>not supported by Derby. I first tried the HSQL version, but Derby
> > > >>doesn't support TOP either. I added a check on the loop for max
> > > >>results; somebody please verify that this is OK. Thanks.
> > > >>
> > > >>Elias
> > > >>
> > > >>PS> Now onto DB2.
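The two Derby workarounds can be checked in isolation with a small sketch that needs no live database. `DerbyWorkaroundSketch`, `parseDbVersion`, and `firstN` are hypothetical names; an `Iterator` stands in for the JDBC `ResultSet`, and the `trim()` is an added assumption (a guard against a blank-padded CHAR column), not part of the patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the two Derby workarounds, written against an
// Iterator instead of a ResultSet so the logic can be tested without a DB.
final class DerbyWorkaroundSketch {

    // Patch 1: read the version column as text and parse it, rather than
    // relying on getInt() to convert a string column. trim() is an added
    // assumption, guarding against a blank-padded CHAR value.
    static int parseDbVersion(String raw) {
        return Integer.parseInt(raw.trim());
    }

    // Patch 2: emulate "LIMIT maxResults" by stopping the read loop early.
    // The size check runs first so no extra row is consumed; with JDBC the
    // same shape is: while (out.size() < max && rs.next()) { ... }
    static <T> List<T> firstN(Iterator<T> rows, int maxResults) {
        List<T> out = new ArrayList<>();
        while (out.size() < maxResults && rows.hasNext()) {
            out.add(rows.next());
        }
        return out;
    }
}
```

Checking the count before advancing matters with a real `ResultSet`, because `rs.next()` moves the cursor even if the row is then ignored. JDBC also has `Statement.setMaxRows(int)`, which asks the driver to cap the rows returned; whether it suits the query in HibernateRefererManagerImpl.java would need verifying against Derby.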