Allen Gilliland wrote:
> 
> 
> Elias Torres wrote:
>> 
>> Allen Gilliland wrote:
>>> Elias, there are a number of things in this commit that seem a
>>> little strange to me ...
>>> 
>>> 1. why do we need 3 separate tables for the tag aggregate data?
>> 
>> Actually we only need 2. The usertagagg is not going to be used.
>> The reason is that if we have n unique tags and m sites, then
>> queries for a site tag cloud would be against a table n x m rows
>> long vs n rows long. Weblog tagclouds can be answered quickly from
>> the weblogtagagg table by simply restricting the rows by websiteid
>> and taking every row. But the site tagcloud from the same table
>> would require reading the entire table. This might not be a big
>> deal, if you guys think is overkill, one could do.
> 
> I don't care if you keep a seperate record for the site-wide and 
> individual weblog aggregate data, but I think we can keep it all in
> one table right?  Basically, the site-wide version for a tag is a 
> weblogtagagg row with websiteid = NULL.
> 
> So the query for hot tags site-wide would be something like ...
> 
> select * from weblogtaggagg where websiteid = NULL order by cnt;
> 
> 

You are definitely right.

>> 
>>> 2. why do we need a SummariesTask to run asynchronously?  i
>>> thought the aggregate data was going to be compiled at the time
>>> the tags were saved with the entry?
>> 
>> This is a very good question and I'll gladly change it if we have a
>>  better option. However, I'm not sure if we can do it correctly.
>> Let me explain. If we have a new entry using an existent tag and I
>> make a query to get its current count and then increment it with
>> one won't I have problems with the way hibernate works? I'd think
>> that I would be clobbering values as multiple requests are coming
>> for the same tag since we are not "locking" the table and calling
>> flush() at different stages in the code. I used a task approach to
>> make sure I wouldn't lose any values and that we could easily
>> recompute the aggregate table through an admin setting if needed.
> 
> I think the likelihood of that is very slim.  Since the data is
> getting updated at entry publish time then the only chance for that
> to occur is if 2 or more people are posting their entry at the exact
> same time and using the exact same tags.  Even then I would expect
> Hibernate to account for that.
> 
> We do a far more impacting version of that same thing with the way
> the current ReferrerManager works for counting referrers throughout
> the day. That thing works the same way ... select RefererData, update
> cnt, save.  And it does that on every single request to a weblog, so
> it's far more likely that we would get a collision problem there on a
> high volume site then with the tag agg cnt.

k. sounds good then.

> 
> The other problem I see with doing it as a task is that then the data
>  isn't kept as fresh and you are spending a lot of extra time
> iterating over every single tag and recomputing its count.  I think I
> would prefer if we try and stick with the original plan first and
> update the aggregate data when the entry is actually saved.
> 

It runs hourly (same as our cache anyways, right?) and it doesn't spend
a lot of extra time. It stores the last time it ran and selects only on
the new tags since and does all of the counting in the query. Anyways,
I'll switch it over to the referer way.

> 
>> 
>>> 3. why do we need to move things into a separate TagManager
>>> class?
>> 
>> I was chatting with Dave yesterday and initially wrote the
>> sumarizing code directly to hibernate but I then noticed that I
>> needed to abstract that. I asked which one he preferred: new
>> interface or add them to the same one. He leaned towards a new one,
>> but I can shove them into WeblogManager.
> 
> I don't follow that at all.  Maybe you can start by explaining what
> that method is for and why we need it, it's not documented in the
> code.

getTags() was in the proposal and summarize() is the one used by the
task to summarize the tag table into the aggregate table. It doesn't
matter anymore since, I'll be removing the TagManager and use the
Referer approach.

> 
> 
>> 
>>> Also, I am noticing that the formatting of your code is not
>>> always consistent.  For example, in the WeblogTagAggregateData
>>> class you are mixing code formatting styles.  Maybe that's just a
>>> copy/paste issue, but I think it's important that as we add new
>>> code we keep the formatting consistent.
>> 
>> I have set my IDE to convert tabs to spaces and using 2 spaces for
>> each tab. Maybe I need to change my settings. Sorry.
> 
> It looks like it was code that you copied and pasted and probably
> didn't reformat.  It's not that big of a deal, but it's definitely
> nice if everyone can keep their code formatted nicely when adding it
> to the repository so it doesn't have to be cleaned up later.

Sure. What's the coding convention so I can fix my IDE settings? tabs?
spaces? how many?

-Elias


Reply via email to