Re: Tagging support in Roller

Elias Torres Wed, 13 Sep 2006 11:03:32 -0700

I replied to your good comments and updated the wiki. Could we discuss
making the 3.1 branch and moving 3.0 branch to trunk?


-Elias

Allen Gilliland wrote:
> 
> 
> Elias Torres wrote:
>>
>> Allen Gilliland wrote:
>>> Ok.  I took another pass over the proposal and have a few thoughts/ideas
>>> I think we can talk about here on the list to get this moving.
>>
>> Excellent, thanks for taking a look at it.
>>
>>> 1. We should ditch the current branch based on roller 2.1 and recreate
>>> it using the 3.0 codebase, migrating whatever code is still relevant.
>>
>> +1
>>
>>> 2. I suggest we follow Anil's suggestion in the issues section and not
>>> try and make any decisions about categories vs. tags.  That's definitely
>>> still an important debate, but I suggest we get tags going first then we
>>> can actually see how each is used and make a decision based on acquired
>>> feedback rather than speculation.
>>
>> +1
>>
>>> 3. I suggest we hold off on the tagging for weblogs and just do tags for
>>> weblog entries.  This will just narrow the scope of the proposal a bit.
>>>
>>
>> +0 I hadn't noticed the suggestion until yesterday but it seems like a
>> fine idea. If we want to do it later, that's fine, but I don't mind
>> doing it together.
> 
> I mainly suggested this to keep things simple.  We know that we want
> tagging on entries, but tagging on weblogs could be debated.  If we
> leave it out for now then the proposal means less work and greater
> likelihood that we can get it done in time.  Plus, we don't even know if
> people really want tagging at the weblog level, it may be something that
> we would do that's not really all that useful.
> 
> So I still suggest we strip it out and just do tagging on weblog entries.
> 
> 

I'm ok with it.

>>
>>> As far as the data model, classes, and methods are concerned ...
>>>
>>> i think the weblogentrytag table looks pretty good, but i'm wondering if
>>> we really need a reference to username, what is that for?  also, how is
>>> the tagtime supposed to work?  does it only get set once when the tag is
>>> added or does it get updated when the tags are updated?
>>
>> tagtime is mostly for analysis of when a specific tag was inserted to
>> that entry, since authors can re-tag their entries. I'm fine with just
>> using entry modified time, but then the question is whether we duplicate
>> that time as well. For now, I'm fine with taking it out as well. What do
>> you think?
>>
>>> i'm not sure i understand all of the proposed manager method additions
>>> either ...
>>>
>>> getWebsiteTags(WebsiteData website) - not needed
>>
>> If we don't implement Website tagging.
>>
>>> getWeblogEntriesByTag(WebsiteData website, String tag) - ok
>>> getWeblogEntriesByTag(String tag) - ditch, just use method above
>>
>> +1
>>
>>> getAllTags() - how? this could return thousands of results
>>
>> This is for the tag cloud (I forgot to mention that the return type is
>> not TagData but TagCloudEntry, a pair of tag name and count).
>>
>>> getAllTags(WebsiteData website) - again, how?  why?
>>
>> Website cloud of entry tags.
> 
> doesn't that make it of even greater concern?  i would still expect a
> decent sized site to have thousands of unique tags and then to get an
> aggregate count of each of those tags to return in this method would be
> a lot of data.
> 
> i don't have a problem with this as long as the results can be limited
> somehow.
> 
> 

ok. I have been thinking of having a table

create table websitetagcloud (
  id,
  websiteid,
  name,
  count
);

so we can return this data quickly and we can do some limits here such
as only tags with count > 1 or something like that. I've updated the
Wiki page with this and other changes.
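
As a rough sketch of what the websitetagcloud idea amounts to, here is an in-memory Java stand-in rather than the actual table: it keeps a count per website and tag name, and only returns tags whose count is above a threshold, capped at a maximum number of results. The names TagCloudSketch, addTag, and getTags are hypothetical and not part of Roller.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the proposed websitetagcloud table.
final class TagCloudSketch {
    // websiteid -> (tag name -> count)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one use of a tag on an entry of the given website.
    void addTag(String websiteId, String name) {
        counts.computeIfAbsent(websiteId, k -> new HashMap<>())
              .merge(name, 1, Integer::sum);
    }

    // Tag names whose count is greater than minCount, ordered by count
    // descending and then by name, capped at maxResults.
    List<String> getTags(String websiteId, int minCount, int maxResults) {
        Map<String, Integer> site = counts.getOrDefault(websiteId, new HashMap<>());
        List<Map.Entry<String, Integer>> rows = new ArrayList<>(site.entrySet());
        rows.removeIf(e -> e.getValue() <= minCount);
        rows.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> row : rows) {
            if (names.size() >= maxResults) {
                break;
            }
            names.add(row.getKey());
        }
        return names;
    }
}
```

Calling getTags(websiteId, 1, 50) would correspond to the "only tags with count > 1" limit, with 50 as an arbitrary example cap.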

>>
>>> getTagsOrderByCount(WebsiteData website, int count) - ok, for cloud?
>>
>> I guess we don't need hot tags for a specific site; that could probably
>> be done with getAllTags(WebsiteData).
>>
>>> getTagsOrderByCount(int count) - ditch, just use method above
>>
>> This is used for HotTags for the entire site.
> 
> all of the ones where i said 'ditch, just use method above' i was trying
> to suggest that we only need the 1 method signature and if it accepts a
> website then that param is optional.  so if the website is non-null then
> the results are restricted to the website, otherwise they apply to the
> site as a whole.

I understood your suggestion and I'm taking it; I was just clarifying
the difference between the two calls.

> 
> that just cuts down on the number of methods and in all likelihood the
> implementation of getTagsOrderByCount(count) would have been just to
> call getTagsOrderByCount(null, count), so why have the extra method
> signature in the manager interface.
> 
> 

+1
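
To make the single-signature idea concrete, here is a minimal Java sketch in which a null website means "the whole site". It is an illustration only: TaggedEntry and TagQuerySketch are hypothetical classes, and a plain String websiteId stands in for the WebsiteData parameter discussed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entry: which weblog it belongs to and the tags it carries.
final class TaggedEntry {
    final String websiteId;
    final String title;
    final List<String> tags;

    TaggedEntry(String websiteId, String title, List<String> tags) {
        this.websiteId = websiteId;
        this.title = title;
        this.tags = tags;
    }
}

// Hypothetical manager with one method covering both the per-weblog
// and the site-wide query.
final class TagQuerySketch {
    private final List<TaggedEntry> entries = new ArrayList<>();

    void add(TaggedEntry entry) {
        entries.add(entry);
    }

    // A null websiteId means the results are not restricted to one weblog.
    List<TaggedEntry> getWeblogEntriesByTag(String websiteId, String tag) {
        List<TaggedEntry> result = new ArrayList<>();
        for (TaggedEntry e : entries) {
            boolean siteMatches = websiteId == null || websiteId.equals(e.websiteId);
            if (siteMatches && e.tags.contains(tag)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

With this shape, getWeblogEntriesByTag(null, "java") plays the role of the dropped getWeblogEntriesByTag(String tag) overload.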

>>
>>> removeTag(String id) - ok, also need removeTag(tag)
>>
>> +1
>>
>>> findTags(WebsiteData website, String pattern, int maxResults) - ok
>>> findTags(String pattern, int maxResults) - ditch, just use method above
>>
>> +1
>>
>>> also, i think every method needs to have a 'limit' parameter to limit
>>> the result set and the maxResults should be configurable at the site
>>> wide level so that we can prevent methods provided to users from
>>> returning overly large result sets.
>>
>> Could we use pagers instead? Limits feel too artificial for me and we
>> could be cutting out important information all of the time.
> 
> Yes we can, although our concept of a pager isn't like an iterator where
> you want to walk through the results one chunk at a time.  it only gives a
> view of a portion of an overall collection and provides a standard way
> to link to alternate views of the collection.  I'm not sure if that fits
> with what you are expecting to do.
> 
> 

I think for some things, like getting the hottest tags, a pager would
work, since we can retrieve just the first page. Pagers will definitely
be useful when displaying entries for a specific tag, since that number
is unbounded. However, for the tag cloud I'm not sure pagers would help
much; caching will be our friend.
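
For illustration, a pager in the sense described above (a view of one portion of a larger collection, plus enough information to link to the next view) could be sketched in Java as follows. PagerSketch is a hypothetical class written for this example, not Roller's existing pager.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pager: a view of one page of a larger list.
final class PagerSketch<T> {
    private final List<T> all;
    private final int page;      // zero-based page index
    private final int pageSize;  // items per page, must be > 0

    PagerSketch(List<T> all, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        this.all = all;
        this.page = page;
        this.pageSize = pageSize;
    }

    // Items visible on this page (empty if the page is past the end).
    List<T> getItems() {
        long size = all.size();
        int from = (int) Math.min((long) page * pageSize, size);
        int to = (int) Math.min((long) from + pageSize, size);
        return new ArrayList<>(all.subList(from, to));
    }

    // True if a following page would contain at least one item.
    boolean hasNext() {
        return ((long) page + 1) * pageSize < all.size();
    }
}
```

For hot tags only page 0 would ever be requested, while entries for a tag would follow hasNext() from page to page.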

>>
>>> none of the methods reference username, so that makes me think we don't
>>> really need the username associated with a tag.
>>
>> My thoughts on username were for the case you want type-ahead on *your*
>> tags and not just a specific weblog. I think a personal tagcloud would
>> be nice. Disclaimer: I can't believe I'm asking for all these clouds
>> when in reality I'm not a big fan of them, but oh well. I guess username
>> is important if more than one blog author exists, should we know who
>> entered which tag?
> 
> That makes sense and I would think we definitely would want to do that.
>  However, maybe the reference should be to user id then, since that is
> the primary key for a user.  The problem with username is that it's not
> the primary key of the user table, and I believe that at some point we
> expect that users should be allowed to change their username.

+1 fixed on the wiki.

> 
> 
>>
>>> the getAllTags() methods bother me a bit because i would think that on
>>> any site that gets a reasonable amount of usage those methods would
>>> return enormous result sets.  what do we need them for anyways?
>>
>> clouds. Would paging resolve the concern?
> 
> yes, paging could help.  Dave and I discussed but never implemented any
> restriction on pagers.  Some pagers have natural boundaries, like entries
> in a day, but the pager of the weblog's recent entries does not and it
> should.
> 
> this would be another example of where a site owner should be allowed to
> restrict tag paging to a certain limit so that users can't abuse the
> data they are given access to.

Definitely.
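
A site-wide restriction of this kind could be as simple as clamping whatever a caller asks for. The Java sketch below assumes two hypothetical site-level settings, a maximum page size and a maximum page depth; the class ResultLimits and its methods are illustrative names only, not existing Roller configuration.

```java
// Hypothetical helper: clamps caller-supplied paging values to site-wide caps.
final class ResultLimits {
    private final int maxResults; // largest page size the site owner allows
    private final int maxPages;   // how many pages deep a pager may go

    ResultLimits(int maxResults, int maxPages) {
        if (maxResults <= 0 || maxPages <= 0) {
            throw new IllegalArgumentException("caps must be positive");
        }
        this.maxResults = maxResults;
        this.maxPages = maxPages;
    }

    // Page size actually used: at least 1, at most the site-wide cap.
    int effectiveLimit(int requested) {
        return Math.max(1, Math.min(requested, maxResults));
    }

    // True if the zero-based page index is within the allowed depth.
    boolean isPageAllowed(int page) {
        return page >= 0 && page < maxPages;
    }
}
```

A template asking for 500 tags on a site capped at 30 would then get 30 back.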

> 
> -- Allen
> 
> 
>>
>>> everything else sounds about right, although it would be nice to see a
>>> bit more info about what methods we think are needed in the site and
>>> page models.
>>
>> I'll give that more thinking later today.
>>
>>> -- Allen
>>>
>>>
>>> Elias Torres wrote:
>>>> I have updated the proposal on the wiki page for tagging. Please
>>>> comment/delete/change/add/etc to it. I'll be glad to discuss and
>>>> improve it.
>>>>
>>>> http://rollerweblogger.org/wiki/Wiki.jsp?page=Proposal_WeblogTags
>>>>
>>>> -Elias
>>>>
>>>> On 9/11/06, Elias Torres <[EMAIL PROTECTED]> wrote:
>>>>> Hi Guys,
>>>>>
>>>>> We initially implemented a tagging function into Roller 2.0 (at
>>>>> IBM) but
>>>>> that really never made it into core because of my lack of effort in
>>>>> completing a few things that Allen had suggested before it was
>>>>> functional enough. I replied to his feedback answering some of the
>>>>> concerns (which I didn't think were major) [1], but I never got a
>>>>> direct
>>>>> reply to my email. We would like to move to 3.0+ but we can't until
>>>>> tagging is in place.
>>>>>
>>>>> There's the big decision as to whether we support either categories or
>>>>> tags
>>>>> or both. I'm fine with supporting both as long as we can disable
>>>>> either
>>>>> one or none in the UI through roller.properties.
>>>>>
>>>>> I'm willing to code it in any specific way and don't have a set way in
>>>>> mind. I'm fine with using Lucene, as Ian Kellen suggested a long
>>>>> time ago, for performance and using the db just as persistent
>>>>> storage. I'll be very thorough and make sure there's a tab for the
>>>>> tag cloud, feeds and proper methods in beans, velocity models, etc.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Elias
>>>>>
>>>>> [1]
>>>>> http://www.nabble.com/Re%3A-Evalutating-tag-support-p3972587s12275.html
>>>>>
>>>>>
>
