Re: [libreoffice-users] Currently Using Spreadsheet for Personal Project - Thinking About Database

Mark Phillips Mon, 11 Aug 2014 15:35:25 -0700

Calibre is also a great idea/solution for this project, unless one of the
goals of the project is to learn about databases and how to use a database
to solve a real world problem. I added the second clause, because just
learning about databases an admirable goal, but imo, one also should learn
how to build the bits that allow an end user to create/read/update/delete
data in the database. Django is one vehicle for doing this. Paul, I have to
disagree with your analysis of django...I have worked with many
non-programmers on building useful projects with that framework. There is a
learning curve, but it is not insurrmountable by a non-programmer. ymmv.


Mark


On Mon, Aug 11, 2014 at 3:26 PM, Paul <[email protected]> wrote:

> Hi Joel,
>
> Some comments on your diagram, in the order I thought of them, not of
> importance:
>
> 1) I tend to use integers for primary keys. Smallints only go into the
> thousands, and most of the stuff I have worked on has needed to be able
> to scale well beyond that. Even things that seem small to start with
> have a habit of growing unexpectedly, and you don't want to have to
> change your database design down the road for something small like
> that.
>
> Planning ahead really is the name of the game with databases, as
> half the reason we use them to begin with is to handle things that
> have grown beyond our simple spreadsheets, or that are expected to
> become unwieldy in the future. The whole point of a good design is
> that it should handle this sort of thing. And the extra storage space
> really shouldn't be an issue. For example, a quick estimate
> suggests that using integer instead of smallint for keys in the diagram
> you gave will only use an extra 1 mb of storage space if you go all the
> way up to the limit of the smallint range, and beyond that it is
> academic as you would be *required* to use int instead of smallint.
>
> But for a home project you're unlikely to exceed that number, so this
> is fine. It's just worth being aware of this when designing it, so that
> you know the choices you have made, and why you have made them.
>
> I would make them unsigned smallints, though. Not really a good idea to
> use negative numbers for keys, and so making them signed just wastes
> half their range.
>
> 2) I wouldn't put a Read flag in the Book table. You have a Status
> table with a ReadStatus, so this would be duplication of data.
>
> 3) Same with the SeriesComplete flag in the Series table. You can get
> that information by doing a query that would check if all the books in
> the series are in the read status.
>
> It's usually much better practice to generate this sort of information
> as you need it from the relevant pieces of information in the database
> (assuming you have all the required pieces in the database), than to
> store this information a second time, which is just duplication. Even
> if the space wasted isn't important, it just leads to the possibility
> of the data being inconsistant.
>
> The only time one stores such things is when the actual calculation to
> work this out from the pieces of data takes so long that it is actually
> necessary to store calculated results to save time in the application.
>
> 4) Same with AuthorID in the Series table. Each book will have an
> author, so you don't need a series author. That way if the books in the
> series are written by different authors everything still works. This
> way, who would the series author be, and how would that be any
> different from the authors of the individual books?
>
> Note that you can still show this information in the frontend when
> viewing the series. You can make a query to check if the authors of all
> the books in the series are the same, and if so show that author, but
> if they are different just show "various" or something.
>
> Also, when entering all the books in a series, you can still have the
> frontend ask you for the series information first, including author,
> and if you fill it in then it will prepopulate the author field when
> adding each book.
>
> Of course, maybe you actually do have a use for a series author as
> separate to the authors of the individual books, so maybe this does
> make sense to you, it just doesn't seem right to me.
>
> 5) Some people want author names to be a single field, which is simpler
> for some things, and avoids hassles with authors with multiple names,
> and other people want them to be a name and surname field, which does
> mean you can do things like order by surname or order by first name. And
> avoids the problem of you sometimes entering the name surname first and
> sometimes first name first.
>
> With the name split into parts it is easier to check if the author
> exists by both name and surname when entering new authors, so even if
> you mistype one of them the author will still be found if they are
> already in the database, but this can still be done if the name is all
> in one field, it is just harder. You can also still sort the author
> list by surname and by first name, it is just harder if both are in one
> field.
>
> The choice is up to you.
>
> 6) I would make the ReadStatus slightly larger, say text(30).
> If you use varchar there is no wasted space if some of the entries have
> less text, so you could even make this varchar(100) or something. Or
> give it a short code and a longer description. Something like "could not
> finish" is already too long for 15 characters. Sooner or later you're
> going to want to add a Status that is longer than the 15...
>
> 7) The Status table is incorrect. The way you have it you could have
> multiple statuses for a book, and each one would be a hand typed status
> description. One time you might type "reading", and another time
> you might mistype "raeding". Instead, you should change the BookID to a
> StatusID in the Status table, and add a StatusID to the Book table. The
> PageNum field would also need to move to the Book table.
>
> This way you would put all the possible statuses into the Status table,
> and each book would have a link to that table, showing which of those
> possible statuses the book currently is in.
>
> Uh...
>
> I think that's all for now. :)
>
> Otherwise it looks good. You're almost there.
>
>
> Paul
>
>
>
> On Sun, 10 Aug 2014 18:48:21 -0700
> Joel Madero <[email protected]> wrote:
>
> > I decided to do the smart thing and diagram it out - the diagram
> > isn't perfect (looks wise) but I hope that I can get a +1 before
> > actually making the databases.
> >
> > Note: I decided to put rankings with date read table, this way I can
> > have different rankings for the same book (for each instance that I
> > read it, some books seem better - or worse - the second time
> > around) :)
> >
> > Thoughts appreciated, a million thanks for all the advice given so
> > far.
> >
> > Link:
> >
> https://drive.google.com/file/d/0B2kdRhc960qdZzJZenR1Qno2LWM/edit?usp=sharing
> >
> >
> > Best,
> > Joel
> >
> >
> >
> > On 08/10/2014 01:35 PM, Paul wrote:
> > >
> > >
> > > On Sun, 10 Aug 2014 21:30:18 +0100
> > > Mark Bourne <[email protected]> wrote:
> > >
> > >> It looks like each book should only have one of the "ReadStatus"
> > >> flags set, so I'd make that an enum field on the BookInformation
> > >> table, with possible values of "Not Read", "Reading" and "Read".
> > >> You can set the default value for the field to "Not Read" so a new
> > >> record will be set to that status if no value is specified for
> > >> that field.
> > > That's essentially the same idea as the status tables I was
> > > suggesting. In my experience we've always used status tables, so I
> > > would suggest those. Partly because I'm not familiar with database
> > > support for enums. How well supported and widely supported is that
> > > by the most common databases?
> > >
> > >> "NumberOfBooksByAuthor" and "ReadBooksByAuthor" are not needed on
> > >> the "AuthorInfo" table - you can get those by querying the
> > >> database. I may have the syntax slightly wrong here, but along the
> > >> lines of: SELECT `ai`.`AuthorID`, `ai`.`AuthorName`,
> > >> COUNT(`bi`.`BookID`) from `AuthorInfo` `ai` LEFT JOIN
> > >> `BookInformation` `bi` ON `bi`.`AuthorID` = `ai`.`AuthorID` GROUP
> > >> BY `ai`.`AuthorID` should give the number of books by each author.
> > >> Add: AND `bi`.`ReadStatus` = "Read"
> > >> to the ON condition and you can get the number of read books by
> > >> each author.
> > >>
> > >> That's the kind of thing a database enables you to do much more
> > >> easily than with a spreadsheet ;o)
> > >>
> > >> Mark.
> > >>
> > >>
> > >> Joel Madero wrote:
> > >>> Hi All,
> > >>>
> > >>> So I went back to planning stage. Link to what I think might work
> > >>> - hoping to get the planning stage done today so I can start
> > >>> actually putting together the db - I have 3 days off so now's a
> > >>> good time for me to get the basic structure together :) Thanks in
> > >>> advance!
> > >>>
> > >>>
> https://drive.google.com/file/d/0B2kdRhc960qdbGJIQ1M3NWtrdmc/edit?usp=sharing
> > >>>
> > >>>
> > >>> Best,
> > >>> Joel
> > >
> >
> >
>
>
> --
> To unsubscribe e-mail to: [email protected]
> Problems?
> http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
> Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
> List archive: http://listarchives.libreoffice.org/global/users/
> All messages sent to this list will be publicly archived and cannot be
> deleted
>
>

-- 
To unsubscribe e-mail to: [email protected]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted

Re: [libreoffice-users] Currently Using Spreadsheet for Personal Project - Thinking About Database

Reply via email to