I wish I could make pathetic whining noises and puppy-dog eyes...

If anyone with some relational nous could give me even some vague
pointers, that would be great...

Chris

Chris Withers wrote:
> Hi All,
> 
> I'm looking to do (something like) weblog analysis and thought it'd be 
> interesting to try an RDB through SQLAlchemy.
> 
> Apologies in advance: my knowledge of RDBs and SA is much poorer than I'd 
> like, so simple pointers may help...
> 
> So, assuming I have data something like:
> 
> 2009-05-20-00:01:02.345 user1 192.168.1.1  /some/folder/some/path
> 2009-05-20-00:01:03.452 user1 192.168.1.1  /some/folder/other/path
> 2009-05-20-00:01:02.345 user2 192.168.1.10 /some/folder/some/path
> 
> ...what would people recommend for storage?
> 
> One table mapped to a Row class or some such?
> What would the primary key be? (since two entries could theoretically 
> happen at exactly the same time, I can't use the date...)
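> 
> (A rough sketch of the sort of thing I was imagining -- the 'Hit' name
> and the surrogate integer key are just my guesses, not anything blessed:
> 
>     from sqlalchemy import Column, DateTime, Integer, String
>     from sqlalchemy.ext.declarative import declarative_base
> 
>     Base = declarative_base()
> 
>     class Hit(Base):
>         __tablename__ = 'hits'
>         # autoincrementing surrogate key, since two hits can share
>         # exactly the same timestamp
>         id = Column(Integer, primary_key=True)
>         timestamp = Column(DateTime, index=True)
>         username = Column(String, index=True)
>         address = Column(String)
>         path = Column(String, index=True)
> 
> ...but is a surrogate key the right answer here?)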
> 
> I'd like to then aggregate first by user, then by month, giving a UI like:
> 
> Users
> =====
> 
> Username   Total Pages    Last Viewed
> --------   -----------    -----------
> user1      32434          27th July 2009 16:05
> user2      15000          1st Jan 2009 01:05
> ...lots more users, probably batched...
> 
> Clicking on a username gives:
> 
> User1
> =====
> 
> Month       Total Pages  Last Viewed
> ----------  -----------  -----------
> May 2009    1000         15th May 2009 13:50
> April 2009  1000         30th April 2009 23:55
> ...lots of months...
> 
> Clicking on a month gives:
> 
> User1 -> May 2009
> =================
> 
> Page                     Hits   Last Viewed
> -----------------------  -----  -----------
> /some/folder/some/path   20     15th May 2009 13:50
> /some/folder/other/path  33     1st May 2009 13:50
> 
> What queries would I use to generate these views? I'm envisaging some 
> group-bys here, but don't know how I'd plug those into SA. Would I have 
> classes for users, months and pages with selects mapped to them, or 
> something else?
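> 
> (For concreteness, the sort of thing I'm imagining, assuming the single
> 'Hit' table sketched above and a session set up in the usual way -- no
> idea if this is idiomatic:
> 
>     from sqlalchemy import create_engine, func
>     from sqlalchemy.orm import sessionmaker
> 
>     session = sessionmaker(bind=create_engine('sqlite:///hits.db'))()
> 
>     # per-user summary: total pages and most recent hit
>     per_user = session.query(
>         Hit.username,
>         func.count(Hit.id).label('total_pages'),
>         func.max(Hit.timestamp).label('last_viewed'),
>         ).group_by(Hit.username)
> 
>     # drill-down for one user, grouped by month; strftime is
>     # SQLite-specific, other backends need their own date functions
>     month_expr = func.strftime('%Y-%m', Hit.timestamp)
>     per_month = (session.query(
>             month_expr.label('month'),
>             func.count(Hit.id).label('total_pages'),
>             func.max(Hit.timestamp).label('last_viewed'))
>         .filter(Hit.username == 'user1')
>         .group_by(month_expr))
> 
> ...is that roughly the right shape?)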
> 
> How would I batch them such that if there are several thousand rows in a 
> table I can show the old "1 to 20 of 3000" without having to load all 
> 3000 rows?
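> 
> (I'm guessing batching is limit/offset plus a separate count, something
> like:
> 
>     total = per_user.count()
>     batch = per_user.order_by(Hit.username).limit(20).offset(0).all()
> 
> ...but I don't know if that's the recommended way, or how well it
> behaves on a big table.)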
> 
> Another option I thought of was to not actually store the raw rows, but 
> have several tables:
> 
> class Users(Base):
>     __tablename__ = 'users'
>     username = Column(String,primary_key=True)
>     total_pages = Column(Integer)
>     last_viewed = Column(DateTime)
> 
> class Month(Base):
>     __tablename__ = 'months'
>     username = Column(
>                 String,ForeignKey('users.username'),primary_key=True
>                 )
>     monthname = Column(String,primary_key=True)
>     total_pages = Column(Integer)
>     last_viewed = Column(DateTime)
> 
> class Page(Base):
>     __tablename__ = 'pages'
>     username = Column(
>                 String,ForeignKey('months.username'),primary_key=True
>                 )
>     monthname = Column(
>                 String,ForeignKey('months.monthname'),primary_key=True
>                 )
>     path = Column(String,primary_key=True)
>     hits = Column(Integer)
>     last_viewed = Column(DateTime)
> 
> ...and then populate them while parsing the logs. I can see this storing 
> a lot less data, but would it be faster? Still, how would I do batching 
> of results?
> Ideally, I'd prefer not to go down this route, as it limits what other 
> analysis can be done further down the line, but...
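> 
> (The population step I imagined as a get-or-create-then-increment per
> parsed line, roughly as below, which I suspect is painfully slow done
> row by row:
> 
>     user = session.query(Users).get(username)
>     if user is None:
>         user = Users(username=username, total_pages=0)
>         session.add(user)
>     user.total_pages += 1
>     if user.last_viewed is None or timestamp > user.last_viewed:
>         user.last_viewed = timestamp
> 
> ...and similarly for the month and page tables.)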
> 
> What do people think? What's the normal/correct way of handling large 
> amounts of data to be sliced and diced like this?
> 
> cheers,
> 
> Chris
> 
> 

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk
