I wish I could make pathetic whining noises and puppydog eyes...
If anyone with some relational nous could give me even some vague
pointers that would be great...
Chris
Chris Withers wrote:
> Hi All,
>
> I'm looking to do (something like) weblog analysis and thought it'd be
> interesting to try an rdb through SQLAlchemy.
>
> Apologies in advance, my knowledge of relational databases and
> SQLAlchemy is much poorer than I'd like, so simple pointers may help...
>
> So, assuming I have data something like:
>
> 2009-05-20-00:01:02.345 user1 192.168.1.1 /some/folder/some/path
> 2009-05-20-00:01:03.452 user1 192.168.1.1 /some/folder/other/path
> 2009-05-20-00:01:02.345 user2 192.168.1.10 /some/folder/some/path
>
> ...what would people recommend for storage?
>
> One table mapped to a Row class or some such?
> What would the primary key be? (since two entries could theoretically
> happen at exactly the same time, I can't use the date...)
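One common answer to the primary-key question is a surrogate autoincrement
integer, which sidesteps timestamp collisions entirely. A minimal sketch of
the one-table approach, using SQLAlchemy 1.4+ names; the `Hit` class and
`hits` table name are my own illustrative choices, not anything from the
original post:

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Hit(Base):
    """One row per log line (illustrative name)."""
    __tablename__ = 'hits'
    # Surrogate key: autoincrements, so two hits at the exact same
    # instant are no problem at all.
    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime, index=True)
    username = Column(String, index=True)
    ip = Column(String)
    path = Column(String)

engine = create_engine('sqlite://')   # in-memory database for the sketch
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(Hit(timestamp=datetime(2009, 5, 20, 0, 1, 2, 345000),
                username='user1', ip='192.168.1.1',
                path='/some/folder/some/path'))
session.commit()
```

The indexes on `timestamp` and `username` are there because the aggregate
views below will filter and group on those columns.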
>
> I'd like to then aggregate first by user, then by month, giving a UI like:
>
> Users
> =====
>
> Username Total Pages Last Viewed
> -------- ----------- -----------
> user1 32434 27th July 2009 16:05
> user2 15000 1st Jan 2009 01:05
> ...lots more users, probably batched...
>
> Clicking on a username gives:
>
> User1
> =====
>
> Month Total Pages Last Viewed
> ---------- ----------- -----------
> May 2009 1000 15th May 2009 13:50
> April 2009 1000 30th April 2009 23:55
> ...lots of months...
>
> Clicking on a month gives:
>
> User1 -> May 2009
> =================
>
> Page Hits Last Viewed
> ----------------------- ----- -----------
> /some/folder/some/path 20 15th May 2009 13:50
> /some/folder/other/path 33 1st May 2009 13:50
>
> What queries would I use to generate these views? I'm envisaging some
> group-by's here, but don't know how I'd plug those into SA. Would I have
> classes for users, months and pages that had selects map to them or
> something else?
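Yes, these are GROUP BYs, and in SQLAlchemy they plug in via `func.count` /
`func.max` plus `Query.group_by` on the raw-rows table, with no extra mapped
classes needed for the aggregate views. A sketch under the assumption of a
single `Hit` class mapped to the raw log rows (an invented name); note that
`strftime` for the month rollup is SQLite-specific, and on PostgreSQL you
would use `func.date_trunc('month', ...)` instead:

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine, func
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Hit(Base):
    __tablename__ = 'hits'
    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime)
    username = Column(String)
    path = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Hit(timestamp=datetime(2009, 5, 20, 0, 1, 2), username='user1',
        path='/some/folder/some/path'),
    Hit(timestamp=datetime(2009, 5, 20, 0, 1, 3), username='user1',
        path='/some/folder/other/path'),
    Hit(timestamp=datetime(2009, 5, 20, 0, 1, 2), username='user2',
        path='/some/folder/some/path'),
])
session.commit()

# "Users" view: one row per user, with total pages and last-viewed time.
users = (session.query(Hit.username,
                       func.count(Hit.id).label('total_pages'),
                       func.max(Hit.timestamp).label('last_viewed'))
         .group_by(Hit.username)
         .order_by(Hit.username)
         .all())

# Drill-down for one user, grouped by month (SQLite strftime).
months = (session.query(func.strftime('%Y-%m', Hit.timestamp).label('month'),
                        func.count(Hit.id),
                        func.max(Hit.timestamp))
          .filter(Hit.username == 'user1')
          .group_by('month')
          .all())
```

The page-level view is the same pattern again: filter by user and month,
then `group_by(Hit.path)` with the same `count`/`max` pair.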
>
> How would I batch them such that if there are several thousand rows in a
> table I can show the old "1 to 20 of 3000" without having to load all
> 3000 rows?
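Batching is what `Query.offset()` / `Query.limit()` are for: the database
only ever sends back the 20 rows for the current page, and a separate
`count()` gives the "of 3000" total. A sketch, again assuming an invented
`Hit` model over the raw rows:

```python
from datetime import datetime, timedelta

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Hit(Base):
    __tablename__ = 'hits'
    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime)
    username = Column(String)
    path = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Fifty fake rows to page through.
start = datetime(2009, 5, 1)
session.add_all([Hit(timestamp=start + timedelta(minutes=i),
                     username='user1', path='/p/%d' % i)
                 for i in range(50)])
session.commit()

query = session.query(Hit).order_by(Hit.id)
total = query.count()                    # one COUNT(*) round trip
page = query.offset(20).limit(20).all()  # only rows 21-40 are fetched
print('%d to %d of %d' % (21, 20 + len(page), total))
```

An ORDER BY is essential here: without it, OFFSET/LIMIT pages are not
guaranteed to be stable between requests.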
>
> Another option I thought of was to not actually store the raw rows, but
> have several tables:
>
> class Users(Base):
>     __tablename__ = 'users'
>     name = Column(String, primary_key=True)
>     total_pages = Column(Integer)
>     last_viewed = Column(DateTime)
>
> class Month(Base):
>     __tablename__ = 'months'
>     username = Column(
>         String, ForeignKey('users.name'), primary_key=True
>     )
>     monthname = Column(String, primary_key=True)
>     total_pages = Column(Integer)
>     last_viewed = Column(DateTime)
>
> class Page(Base):
>     __tablename__ = 'pages'
>     # months has a composite primary key, so the reference needs a
>     # composite ForeignKeyConstraint rather than two separate FKs.
>     __table_args__ = (
>         ForeignKeyConstraint(
>             ['username', 'monthname'],
>             ['months.username', 'months.monthname'],
>         ),
>     )
>     username = Column(String, primary_key=True)
>     monthname = Column(String, primary_key=True)
>     path = Column(String, primary_key=True)
>     hits = Column(Integer)
>     last_viewed = Column(DateTime)
>
> ...and then populate them while parsing the logs. I can see this storing
> a lot less data, but would it be faster? Still, how would I do batching
> of results?
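Populating such summary tables while parsing boils down to "look up the
row by primary key, create it if missing, then bump the counters". A
sketch for the `Users` table alone, using `Session.get()` (SQLAlchemy
1.4+); the parsed tuples are invented sample data, and this relies on the
session's default autoflush so that a row added earlier in the loop is
found by the next lookup:

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Users(Base):
    __tablename__ = 'users'
    name = Column(String, primary_key=True)
    total_pages = Column(Integer)
    last_viewed = Column(DateTime)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Pretend these (username, timestamp) pairs came out of the log parser:
parsed = [
    ('user1', datetime(2009, 5, 20, 0, 1, 2)),
    ('user1', datetime(2009, 5, 20, 0, 1, 3)),
    ('user2', datetime(2009, 5, 20, 0, 1, 2)),
]

for name, ts in parsed:
    user = session.get(Users, name)   # primary-key lookup
    if user is None:
        user = Users(name=name, total_pages=0, last_viewed=ts)
        session.add(user)
    user.total_pages += 1
    if ts > user.last_viewed:
        user.last_viewed = ts
session.commit()
```

Batching the results of these summary tables works exactly the same way
as for raw rows: `count()` plus `offset()`/`limit()` on the query.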
> Ideally, I'd prefer not to go down this route as it limits different
> types of analysis later down the line but...
>
> What do people think? What's the normal/correct way of handling large
> amounts of data to be sliced and diced like this?
>
> cheers,
>
> Chris
>
>
--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---