Re: Tagging support in Roller

Elias Torres Wed, 13 Sep 2006 11:42:17 -0700


Allen Gilliland wrote:
> 
> 
> Elias Torres wrote:
>> I replied to your good comments and updated the wiki. Could we discuss
>> making the 3.1 branch and moving 3.0 branch to trunk?
> 
> yeah, but let's talk about that in a different thread so we don't get
> mixed up.  i have a couple more comments below ...
> 
> 
>>
>> -Elias
>>
>> Allen Gilliland wrote:
>>>
> 
> *snip*
> 
>>>>
>>>>> getAllTags() - how? this could return thousands of results
>>>> This is for tcloud (I forgot to mention that the return is not TagData
>>>> but TagCloudEntry (a pair of tagname and count)).
>>>>
>>>>> getAllTags(WebsiteData website) - again, how?  why?
>>>> Website cloud of entry tags.
>>> doesn't that make it of even greater concern?  i would still expect a
>>> decent sized site to have thousands of unique tags and then to get an
>>> aggregate count of each of those tags to return in this method would be
>>> a lot of data.
>>>
>>> i don't have a problem with this as long as the results can be limited
>>> somehow.
>>>
>>>
>>
>> ok. I have been thinking of having a table
>>
>> create table websitetagcloud {
>>   id
>>   websiteid
>>   name
>>   count
>> };
>>
>> so we can return this data quickly and we can do some limits here such
>> as only tags with count > 1 or something like that. I've updated the
>> Wiki page with this and other changes.
> 
> i like that idea.  i'm not sure it's a 'cloud', it's more of a tagcount
> or tagaggregate which is likely to be used to create the cloud.  i
> actually see even more opportunity to extract relevant tag data into
> this table.
> 
> would it also make sense to put the date in this table?  that way the
> tag count could be time sensitive, so you could restrict the set to tags
> used in a certain timeframe, like tags used within the last hour.  it
> would also be cool to do counts given various timeframes, so a dayCnt,
> weekCnt, monthCnt, totalCnt.  that way you could track what tags are
> popular for a given day or week.  of course the downside to that is you
> have to worry about resetting those cnts :/


+1, although I wouldn't necessarily keep dayCnt in the table. Instead, we
could convert the day, week, or month to an integer and store it in the
row. Then, for any given week for example, we just query 'WHERE
mode=week (or an enum) AND value=50', so we never have to reset anything.

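 create table websitetagcloud (
   id,
   websiteid,
   name,
   count,
   mode,    -- time granularity: day, week, or month
   value    -- period number within that mode, e.g. 50 for week 50
 );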
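
To make the mode/value idea concrete, here is a rough Java sketch of how a
date could be turned into the integer stored in "value". The class name
TagPeriod and the Mode enum are made up for illustration. Folding the year
into the number is my own assumption and not part of the proposal: a bare
'value=50' would mix up week 50 of different years, which would bring the
reset problem back.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

/** Hypothetical helper: maps a date to the integer stored in the "value" column. */
final class TagPeriod {
    enum Mode { DAY, WEEK, MONTH, TOTAL }

    private TagPeriod() {}

    /**
     * Encodes the period that contains the given date as an integer.
     * The year is folded in (an assumption) so that, for example, week 50
     * of one year never shares a row with week 50 of the next.
     */
    static int value(Mode mode, LocalDate date) {
        switch (mode) {
            case DAY:
                // year followed by the three-digit day of the year
                return date.getYear() * 1000 + date.getDayOfYear();
            case WEEK:
                // ISO week-based year followed by the two-digit ISO week number
                return date.get(IsoFields.WEEK_BASED_YEAR) * 100
                        + date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
            case MONTH:
                // year followed by the two-digit month
                return date.getYear() * 100 + date.getMonthValue();
            default:
                // TOTAL: a single all-time bucket
                return 0;
        }
    }
}
```

With something like this, adding a tag would bump one row per mode (day,
week, month, total), and old periods simply stop being queried instead of
being reset.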

Just brainstorming; there might be better ways to do this in SQL. My
question is how to do this in Hibernate. For example:

WeblogEntryTagData.getTagSet();

In FolderData, you have a private setTagSet and added
add/removeBookmark(). However, according to the Hibernate docs, can't I
always do this:

WeblogEntryTagData.getTagSet().add(bookmarkData);

Anyway, I'm just trying to figure out how to "manage" the aggregate table
using Hibernate.
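
One possible shape for this, sketched in plain Java with no Hibernate
mapping shown: hand out a read-only view from the getter so that all
changes go through add/remove methods, and let those methods keep the
aggregate in step. EntryTags, addTag and removeTag are hypothetical names,
and the Map is only a stand-in for the aggregate table.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a weblog entry that owns a set of tag names. */
final class EntryTags {
    private final Set<String> tags = new HashSet<>();
    private final Map<String, Integer> counts; // stand-in for the aggregate table

    EntryTags(Map<String, Integer> sharedCounts) {
        this.counts = sharedCounts;
    }

    /** Read-only view, so getTags().add(...) cannot bypass the bookkeeping. */
    Set<String> getTags() {
        return Collections.unmodifiableSet(tags);
    }

    void addTag(String name) {
        // Set.add returns false for a duplicate, so a tag is counted once per entry
        if (tags.add(name)) {
            counts.merge(name, 1, Integer::sum);
        }
    }

    void removeTag(String name) {
        if (tags.remove(name)) {
            // returning null removes the mapping once the count would reach zero
            counts.computeIfPresent(name, (k, v) -> v > 1 ? v - 1 : null);
        }
    }
}
```

With a read-only getter, a call like getTags().add(...) fails with an
UnsupportedOperationException instead of silently skipping the aggregate
update.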

> 
> is there any other aggregate data which could be useful in this table?
> 
> 

Not off the top of my head, but good idea on the time ranges; this could
definitely help a lot with tag limits and very fresh, "useful" data.
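
As a rough illustration of how limits could work on top of the aggregate
rows, here is a small in-memory Java sketch. TagCounts, Row and hottest
are hypothetical names; a real implementation would presumably push the
filtering and the limit down into the query. A null websiteId means the
whole site, as Allen suggested, and minCount covers the "only tags with
count > 1" idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TagCounts {
    /** Hypothetical in-memory copy of one aggregate row. */
    static final class Row {
        final String websiteId;
        final String name;
        final int count;

        Row(String websiteId, String name, int count) {
            this.websiteId = websiteId;
            this.name = name;
            this.count = count;
        }
    }

    private TagCounts() {}

    /**
     * Returns at most limit tag names, highest count first (ties broken by name).
     * A null websiteId sums counts across all websites.
     */
    static List<String> hottest(List<Row> rows, String websiteId, int minCount, int limit) {
        // sum counts per tag name over the rows that match the website filter
        Map<String, Integer> totals = new HashMap<>();
        for (Row r : rows) {
            if (websiteId == null || websiteId.equals(r.websiteId)) {
                totals.merge(r.name, r.count, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(totals.entrySet());
        entries.removeIf(e -> e.getValue() < minCount);
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (names.size() == limit) {
                break;
            }
            names.add(e.getKey());
        }
        return names;
    }
}
```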

>>
>>>>> getTagsOrderByCount(WebsiteData website, int count) - ok, for cloud?
>>>> I guess we don't need a hottags for a specific site and could probably
>>>> be done with getAllTags(WebsiteData)
>>>>
>>>>> getTagsOrderByCount(int count) - ditch, just use method above
>>>> This is used for HotTags for the entire site.
>>> all of the ones where i said 'ditch, just use method above' i was trying
>>> to suggest that we only need the 1 method signature and if it accepts a
>>> website then that param is optional.  so if the website is non-null then
>>> the results are restricted to the website, otherwise they apply to the
>>> site as a whole.
>>
>> I understood your suggestion and I'm taking it, I was just clarifying
>> the difference between the two calls.
>>
>>> that just cuts down on the number of methods and in all likelihood the
>>> implementation of getTagsOrderByCount(count) would have been just to
>>> call getTagsOrderByCount(null, count), so why have the extra method
>>> signature in the manager interface.
>>>
>>>
>>
>> +1
>>
>>>>> removeTag(String id) - ok, also need removeTag(tag)
>>>> +1
>>>>
>>>>> findTags(WebsiteData website, String pattern, int maxResults) - ok
>>>>> findTags(String pattern, int maxResults) - ditch, just use method
>>>>> above
>>>> +1
>>>>
>>>>> also, i think every method needs to have a 'limit' parameter to limit
>>>>> the result set and the maxResults should be configurable at the site
>>>>> wide level so that we can prevent methods provided to users from
>>>>> returning overly large result sets.
>>>> Could we use pagers instead? Limits feel too artificial for me and we
>>>> could be cutting out important information all of the time.
>>> Yes we can, although our concept of a pager isn't like an iterator where
>>> you walk through the results one chunk at a time.  it only gives a
>>> view of a portion of an overall collection and provides a standard way
>>> to link to alternate views of the collection.  I'm not sure if that fits
>>> with what you are expecting to do.
>>>
>>>
>>
>> I think for some things, like getting the hottest tags, a pager would work,
>> since we can retrieve just the first page. Pagers will definitely be
>> useful when displaying entries for a specific tag, since that number is
>> unbounded. However, for the tag cloud, I'm not sure pagers would help much;
>> caching will be our friend.
> 
> yep.
> 
> -- Allen
> 
> 
>>
>>>>> none of the methods reference username, so that makes me think we
>>>>> don't
>>>>> really need the username associated with a tag.
>>>> My thoughts on username were for the case you want type-ahead on *your*
>>>> tags and not just a specific weblog. I think a personal tagcloud would
>>>> be nice. Disclaimer: I can't believe I'm asking for all these clouds
>>>> when in reality I'm not a big fan of them, but oh well. I guess
>>>> username
>>>> is important if more than one blog author exists, should we know who
>>>> entered which tag?
>>> That makes sense and I would think we definitely would want to do that.
>>>  However, maybe the reference should be to user id then, since that is
>>> the primary key for a user.  The problem with username is that it's not
>>> the primary key of the user table, and I believe that at some point we
>>> expect that users should be allowed to change their username.
>>
>> +1 fixed on the wiki.
>>
>>>
>>>>> the getAllTags() methods bother me a bit because i would think that on
>>>>> any site that gets a reasonable amount of usage those methods would
>>>>> return enormous result sets.  what do we need them for anyways?
>>>> clouds. Would paging resolve the concern?
>>> yes, paging could help.  Dave and I discussed but never implemented any
>>> restriction on pagers.  Some pagers have natural boundaries, like entries
>>> in a day, but the pager of the weblog's recent entries does not and it
>>> should.
>>>
>>> this would be another example of where a site owner should be allowed to
>>> restrict tag paging to a certain limit so that users can't abuse the
>>> data they are given access to.
>>
>> Definitely.
>>
>>> -- Allen
>>>
>>>
>>>>> everything else sounds about right, although it would be nice to see a
>>>>> bit more info about what methods we think are needed in the site and
>>>>> page models.
>>>> I'll give that more thinking later today.
>>>>
>>>>> -- Allen
>>>>>
>>>>>
>>>>> Elias Torres wrote:
>>>>>> I have updated the proposal on the wiki page for tagging. Please
>>>>>> comment/delete/change/add/etc to it. I'll be glad to discuss and
>>>>>> improve it.
>>>>>>
>>>>>> http://rollerweblogger.org/wiki/Wiki.jsp?page=Proposal_WeblogTags
>>>>>>
>>>>>> -Elias
>>>>>>
>>>>>> On 9/11/06, Elias Torres <[EMAIL PROTECTED]> wrote:
>>>>>>> Hi Guys,
>>>>>>>
>>>>>>> We initially implemented a tagging function into Roller 2.0 (at
>>>>>>> IBM) but
>>>>>>> that really never made it into core because of my lack of effort in
>>>>>>> completing a few things that Allen had suggested before it was
>>>>>>> functional enough. I replied to his feedback answering some of the
>>>>>>> concerns (which I didn't think were major) [1], but I never got a
>>>>>>> direct
>>>>>>> reply to my email. We would like to move to 3.0+ but we can't until
>>>>>>> tagging is in place.
>>>>>>>
>>>>>>> There's the big decision to whether we support either categories or
>>>>>>> tags
>>>>>>> or both. I'm fine with supporting both as long as we can disable
>>>>>>> either
>>>>>>> one or none in the UI through roller.properties.
>>>>>>>
>>>>>>> I'm willing to code it in any specific way and don't have a set
>>>>>>> way in mind. I'm fine with using Lucene for performance, as Ian
>>>>>>> Kellen suggested a long time ago, and using the db just as
>>>>>>> persistent storage. I'll be very thorough and make sure there's a
>>>>>>> tab for the tag cloud, feeds and proper methods in beans,
>>>>>>> velocity models, etc.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Elias
>>>>>>>
>>>>>>> [1]
>>>>>>> http://www.nabble.com/Re%3A-Evalutating-tag-support-p3972587s12275.html
>>>>>>>
>>>>>>>
>>>>>>>
>
