Re: [Koha-devel] Proposed "metadata" table for Koha
2015-12-01 1:44 GMT-03:00 David Cook:

> My main concern about Zebra is with it not being fast enough. Tomas mentioned that Zebra runs updates every 5 seconds, but it looks to me like rebuild_zebra.pl (via /etc/cron.d/koha-common) is only run every 5 minutes on a Debian package install. At least that was the case in Koha 3.20. That's a huge gap when it comes to importing records. Say you import record A at 5:01pm… and then you try to import record A again at 5:03pm using a matching rule. The update from 5:01pm hasn't been processed yet, so you wind up with 2 copies of record A in your Koha catalogue.

You are right that the default setup is a cronjob that checks the queue every 5 minutes. I planned to change that default behaviour: set USE_INDEXER_DAEMON=yes in /etc/default/koha-common and have the cron line commented out, or conditional on USE_INDEXER_DAEMON=no. But I got distracted in the last couple of weeks before the release and forgot to post a patch for it. Anyway, rebuild_zebra.pl -daemon should be run by default.

> We run openSUSE and we define our own Zebra indexer, which does run every 5 seconds. I haven't stress tested it yet, but 5 seconds might be a bit long even under ideal circumstances, if an import job is running every 2-3 seconds. Sure, that 2-3 seconds might be a bit optimistic… maybe it will also be every 5 seconds. But what happens if the Zebra queue backs up? Someone runs "touch_all_biblios.pl" or something like that and fills up the zebraqueue while you're importing records. Zebra is going to be out of date.

True.

> There needs to be a source of truth, and that's the metadata record in MySQL. Zebra is an indexed cache, and while usually it doesn't matter too much if that cache is a little bit stale, it can matter when you're importing records.

I agree the importing step should rely on the source of truth.

> Another scenario… what happens if Zebra is down? You're going to get duplicate records because the matcher won't work. That said, we could mitigate that by double-checking that Zebra is actually alive programmatically before commencing an import. There could be other heuristics as well… like not running an import (which uses matching rules) unless the zebraqueue has only X waiting updates. Ideally, it would be 0, but that's unlikely in large systems. It's also impossible if you're importing at a rate of more than one record every 5 seconds (which would be absurdly slow).

I wouldn't create a workaround for Zebra being down just to be able to match existing records... I would just print a red box saying the tool is not available.

> I am curious as to whether the default Zebra update time for Debian packages is 5 minutes or 5 seconds. While it doesn't affect me too much as I don't use Debian, it matters for regular users of Koha.

Ok, I'll provide the mentioned patch :-P

--
Tomás Cohen Arazi
Theke Solutions (http://theke.io)
✆ +54 9351 3513384
GPG: B76C 6E7C 2D80 551A C765 E225 0A27 2EA1 B2F3 C15F

___
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
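The "double-check that Zebra is actually alive programmatically before commencing an import" idea discussed above boils down to a connection test. A minimal sketch in Python; the host and port here are illustrative assumptions only (a real implementation would read the server address from koha-conf.xml):

```python
import socket

def zebra_is_alive(host="localhost", port=9998, timeout=2.0):
    """Return True if something is accepting connections on what we
    assume is Zebra's Z39.50 listener. A False result would abort the
    import run before any matching rules are consulted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

This only proves the listener is up, not that the index is current, so it addresses the "Zebra is down" scenario but not the staleness one.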
Re: [Koha-devel] Proposed "metadata" table for Koha
Just to contradict myself a bit, it might be worth mentioning that Zebra will do a better job with ISSN and ISBN matching, as I think it normalizes those strings. That would be nicer than a plain string-matching SQL query…

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007

From: Tomas Cohen Arazi [mailto:tomasco...@gmail.com]
Sent: Tuesday, 1 December 2015 1:20 AM
To: David Cook <dc...@prosentient.com.au>
Cc: koha-devel <koha-devel@lists.koha-community.org>
Subject: Re: [Koha-devel] Proposed "metadata" table for Koha

I think we should have a metadata_record table storing the serialized metadata, and any other needed information (basically the fields Koha::MetadataRecord has...), and let the fulltext engine do the job of accessing those values. The codebase is already too bloated trying to band-aid our "minimal" usage of the search engines' features.

Of course, while trying to fix that we might find our search engine has problems and/or broken functionality (Zebra facets are so slow that they are not cool). But we should definitely get rid of tons of code in favour of using the search engine more, and probably have QueryParser be the standard, with a driver for ES...

--
Tomás Cohen Arazi
Theke Solutions (http://theke.io)
✆ +54 9351 3513384
GPG: B76C 6E7C 2D80 551A C765 E225 0A27 2EA1 B2F3 C15F
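The normalization credited to Zebra above is roughly "strip punctuation and whitespace before comparing". A rough sketch of what that buys you over plain SQL string equality (this is my own illustrative rule, not Zebra's actual normalizer):

```python
def normalize_isbn(raw):
    """Strip hyphens and spaces and uppercase the ISBN-10 check digit,
    so that '0-306-40615-2' and '0306406152' compare equal. A plain
    SQL string comparison would treat them as different values."""
    return "".join(ch for ch in raw if ch.isalnum()).upper()
```

Any identifier table used for matching would need the same normalization applied both at store time and at query time, or it inherits exactly this problem.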
Re: [Koha-devel] Proposed "metadata" table for Koha
I'm not 100% sure what I think yet, but in the past I was certainly in favour of a metadata_record table that stored the serialized metadata and whatever else it needed to support that. I still think it's an all right idea to have that table.

In general, I'm in favour of using the full-text engine for searching, although as Katrin has noted on http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662, Zebra isn't necessarily updated fast enough to be used for "matching" when importing records, especially when records are being downloaded and imported every 2-3 seconds. Also, what happens if Zebra goes down? Suddenly your catalogue gets flooded with duplicates. I suppose one way of getting around that is querying Zebra to make sure it is alive before starting an import. However, that doesn't solve the speed problem. I don't think there's any reliable way of knowing if the record you want to match on has already been indexed (or indexed again) in Koha. Don't we only update Zebra once every 60 seconds?

The OAI-PMH import wouldn't be the only one affected by the indexing. The OCLC Connexion daemon and any staged import both use Zebra for matching. If Zebra hasn't indexed the relevant additions or updates, the matching won't work when it should. For records in the hundreds, thousands, and millions, that can cause major problems, both with duplicates and with failed updates.

Maybe a "metadata" table is overkill. In fact, off the top of my head I can't see a lot of advantages to storing mass quantities of metadata in the relational database, but perhaps some way of keeping record identifiers in the relational database would be doable. If we think about the metadata in terms of a "source of truth", the relational database is always going to contain the source of truth.
Zebra is basically just an indexed cache, and when it comes to importing records… I'd rather be querying a source of truth than a cache, as the cache might be stale. At the moment, it's going to be stale by at least 1-59 seconds… longer if Zebra has a lot of indexing jobs to do when it receives an update. Maybe there's a way to mitigate that, like waiting to do an import until Zebra has reported that it emptied the zebraqueue X seconds ago, although the zebraqueue may never be empty. There's always the possibility that you're going to miss something, and that possibility doesn't exist in the relational database, as it's the source of truth. If the identifier doesn't exist in the database, then it doesn't exist for that record (or there's a software bug, which can be fixed). While we probably could use the search engine more throughout Koha, I think it might not be wise to use it during an import.

(As for the QueryParser, I totally agree about it being the standard and creating a driver for ES. I chatted with Robin about this a bit over the past few months, but I haven't had time to help out with that. The QueryParser also isn't quite right for Zebra yet either, so it would probably make sense to focus on finalizing the PQF driver first.)

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007

From: Tomas Cohen Arazi [mailto:tomasco...@gmail.com]
Sent: Tuesday, 1 December 2015 1:20 AM
To: David Cook <dc...@prosentient.com.au>
Cc: koha-devel <koha-devel@lists.koha-community.org>
Subject: Re: [Koha-devel] Proposed "metadata" table for Koha
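The "wait until the zebraqueue has at most X waiting updates" heuristic floated above could be sketched as follows. The zebraqueue table and its done flag are real Koha concepts, but the helper names are hypothetical and sqlite3 stands in for MySQL here:

```python
import sqlite3

def pending_updates(conn):
    """Count zebraqueue rows the indexer has not yet processed."""
    return conn.execute(
        "SELECT COUNT(*) FROM zebraqueue WHERE done = 0").fetchone()[0]

def safe_to_import(conn, max_backlog=0):
    """Only start a matching import when the indexing backlog is at or
    below a threshold. Ideally 0, but as noted in the thread, the
    queue may never be empty on a busy system."""
    return pending_updates(conn) <= max_backlog
```

Even when this check passes, a record committed between the check and the match can still be missed, which is the core argument for matching against the database rather than the index.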
Re: [Koha-devel] Proposed "metadata" table for Koha
If we do this, I very much vote for doing it the way Tomas is describing (aka storing entire chunks of metadata as blobs). Koha 2.2 had a row-per-subfield structure kind of like what you're suggesting, and it required a lot of monkeying around to accurately represent all the vagaries and ordering of MARC subfields. It was also (from what I remember) quite a disk hog.

2015-11-30 15:42 GMT-07:00 David Cook <dc...@prosentient.com.au>:

> Just to contradict myself a bit, it might be worth mentioning that Zebra will do a better job with ISSN and ISBN matching, as I think it normalizes those strings. That would be nicer than a regular string matching SQL query…

--
Jesse Weaver
Re: [Koha-devel] Proposed "metadata" table for Koha
I can't remember if I sent another message just to Tomas or if it was to the list, but I can see how a "metadata" table might be heavy-handed, so I suggested an "identifier" table which would extract identifiers from the record for easy recall. Of course, as I said in the previous email, Zebra is going to do a better job in some cases, because it can normalize values for indexing and retrieval while MySQL would not.

My main concern about Zebra is with it not being fast enough. Tomas mentioned that Zebra runs updates every 5 seconds, but it looks to me like rebuild_zebra.pl (via /etc/cron.d/koha-common) is only run every 5 minutes on a Debian package install. At least that was the case in Koha 3.20. That's a huge gap when it comes to importing records. Say you import record A at 5:01pm… and then you try to import record A again at 5:03pm using a matching rule. The update from 5:01pm hasn't been processed yet, so you wind up with 2 copies of record A in your Koha catalogue.

We run openSUSE and we define our own Zebra indexer, which does run every 5 seconds. I haven't stress tested it yet, but 5 seconds might be a bit long even under ideal circumstances, if an import job is running every 2-3 seconds. Sure, that 2-3 seconds might be a bit optimistic… maybe it will also be every 5 seconds. But what happens if the Zebra queue backs up? Someone runs "touch_all_biblios.pl" or something like that and fills up the zebraqueue while you're importing records. Zebra is going to be out of date.

There needs to be a source of truth, and that's the metadata record in MySQL. Zebra is an indexed cache, and while usually it doesn't matter too much if that cache is a little bit stale, it can matter when you're importing records. Another scenario… what happens if Zebra is down? You're going to get duplicate records because the matcher won't work. That said, we could mitigate that by double-checking that Zebra is actually alive programmatically before commencing an import.
There could be other heuristics as well… like not running an import (which uses matching rules) unless the zebraqueue has only X waiting updates. Ideally, it would be 0, but that's unlikely in large systems. It's also impossible if you're importing at a rate of more than one record every 5 seconds (which would be absurdly slow).

So MySQL is the source of truth… but we can't very well do an ExtractValue query on biblioitems.marcxml when we have a database with over a million rows (the threshold for a speedy query is probably actually much lower than that).

(On a related note, I think we're also going to run into massive problems with authority records, as I don't think the 001 of incoming records is saved at all. Maybe the MARC templates can handle that scenario, though. I admit I haven't looked at them much. We probably should be moving the 001 into the 035 for both bibliographic and authority records during an import…)

I suppose it's possible that we might have to accept that Zebra might not be fast enough. I suppose the onus is on me (and whoever else is interested in not using Zebra for matching, like Katrin and Andreas) to prove that Zebra is too slow under stress. I am curious as to whether the default Zebra update time for Debian packages is 5 minutes or 5 seconds. While it doesn't affect me too much as I don't use Debian, it matters for regular users of Koha.

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007

From: Jesse [mailto:pianohac...@gmail.com]
Sent: Tuesday, 1 December 2015 12:03 PM
To: David Cook <dc...@prosentient.com.au>
Cc: koha-devel <koha-devel@lists.koha-community.org>
Subject: Re: [Koha-devel] Proposed "metadata" table for Koha
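To make the ExtractValue concern concrete: matching on the 001 without an identifier table means parsing every row's MARCXML blob. A sketch of that per-row work in Python (illustrative only; Koha itself does this sort of thing in Perl with MARC::Record):

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

def extract_001(marcxml):
    """Parse a MARCXML blob and return its 001 control number, or None.
    This full parse is what an ExtractValue-style query must repeat for
    every row, which is why it cannot scale to millions of
    biblioitems.marcxml rows; a dedicated identifier column turns the
    same lookup into a single indexed comparison."""
    root = ET.fromstring(marcxml)
    for cf in root.iter("{%s}controlfield" % MARC_NS):
        if cf.get("tag") == "001":
            return cf.text
    return None
```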
Re: [Koha-devel] Proposed "metadata" table for Koha
2015-11-29 21:52 GMT-03:00 David Cook:

> Hi all:
>
> For those not following along at http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662, we've recently started talking about the possibility of adding a "metadata" table to Koha.
>
> The basic schema I have in mind would be something like: metadata.id, metadata.record_id, metadata.scheme, metadata.qualifier, metadata.value.
>
> The row would look like: 1, 1, marc21, 001, 123456789
>
> It might also be necessary to store "metadata.record_type" so as to know where metadata.record_id points. This obviously has a lot of disadvantages… redundant data between "metadata" rows, no database cascades via foreign keys, etc. However, it might be necessary in the short term as a temporary measure.
>
> Of course, adding "yet another place" to store metadata might not seem like a great idea. We already store metadata in biblioitems.marcxml (and biblioitems.marc), Zebra, and other biblio/biblioitems/items relational database fields. Do we really need a new place to worry about data?

I think we should have a metadata_record table storing the serialized metadata, and any other needed information (basically the fields Koha::MetadataRecord has...), and let the fulltext engine do the job of accessing those values. The codebase is already too bloated trying to band-aid our "minimal" usage of the search engines' features.

Of course, while trying to fix that we might find our search engine has problems and/or broken functionality (Zebra facets are so slow that they are not cool). But we should definitely get rid of tons of code in favour of using the search engine more, and probably have QueryParser be the standard, with a driver for ES...

--
Tomás Cohen Arazi
Theke Solutions (http://theke.io)
✆ +54 9351 3513384
GPG: B76C 6E7C 2D80 551A C765 E225 0A27 2EA1 B2F3 C15F
[Koha-devel] Proposed "metadata" table for Koha
Hi all:

For those not following along at http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662, we've recently started talking about the possibility of adding a "metadata" table to Koha.

The basic schema I have in mind would be something like: metadata.id, metadata.record_id, metadata.scheme, metadata.qualifier, metadata.value.

The row would look like: 1, 1, marc21, 001, 123456789

It might also be necessary to store "metadata.record_type" so as to know where metadata.record_id points. This obviously has a lot of disadvantages… redundant data between "metadata" rows, no database cascades via foreign keys, etc. However, it might be necessary in the short term as a temporary measure.

Of course, adding "yet another place" to store metadata might not seem like a great idea. We already store metadata in biblioitems.marcxml (and biblioitems.marc), Zebra, and other biblio/biblioitems/items relational database fields. Do we really need a new place to worry about data?

That said, if we're ever going to move away from MARC as the internal metadata format, we need to start transitioning to something new. I've noticed this metadata table model in DSpace and other library systems, and it seems to work reasonably well. I don't know if we'd break down the whole record into this structure, or if we'd just break down certain fields as defined by a configuration file.

In the short term, I'd like to use something like this to access a record's 001 without going to Zebra, which can be slow to update. I need to be able to query a record using the 001 as soon as it's added to the database, and I can't necessarily get that from Zebra. I also need to be able to query a record even if Zebra is down. Failing the metadata table idea, I'm not sure how else we'd expose the 001 and any number of other fields without using Zebra. We store the 020 and 022 in biblioitems.isbn and biblioitems.issn, but we're putting multiple values in a single field, and that's not so great for searching.

We might also want to add the 035 to the fields we're searching, so I don't think just adding to the biblio or biblioitems tables will really do, especially since we're trying to move away from MARC.

Anyway, please let me know your thoughts.

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007
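The proposed table and the "query a record by its 001 as soon as it's added" use case can be sketched as follows, with sqlite3 standing in for MySQL. Column names come from the proposal; the index is my own addition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metadata (
        id        INTEGER PRIMARY KEY,
        record_id INTEGER NOT NULL,   -- biblionumber, for now
        scheme    TEXT NOT NULL,      -- e.g. 'marc21'
        qualifier TEXT NOT NULL,      -- e.g. '001'
        value     TEXT NOT NULL
    );
    CREATE INDEX metadata_lookup ON metadata (scheme, qualifier, value);
""")

# The example row from the proposal: 1, 1, marc21, 001, 123456789
conn.execute(
    "INSERT INTO metadata VALUES (1, 1, 'marc21', '001', '123456789')")

# Match on the 001 immediately after insert: no Zebra round-trip,
# no staleness window, works even when the indexer is down.
row = conn.execute(
    "SELECT record_id FROM metadata"
    " WHERE scheme = 'marc21' AND qualifier = '001'"
    " AND value = '123456789'").fetchone()
```

The trade-offs named in the mail (redundant data, no cascades via foreign keys) apply to this sketch too.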
Re: [Koha-devel] Proposed "metadata" table for Koha
For now, I think metadata.record_id would link to biblionumber, but long-term it would probably link to some "record" table. So if you wanted to get all bibliographic records, you'd do something like:

select * from record join metadata on record.id = metadata.record_id where record.type = 'bibliographic'

Or maybe you want to search for a bibliographic record with a 001 of 123456789:

select * from record join metadata on record.id = metadata.record_id where record.type = 'bibliographic' and metadata.qualifier = '001' and metadata.value = '123456789'

Of course, off the top of my head, I don't know how you'd store indicators and subfields in an extensible way. I suppose indicators are attributes and subfields are child elements... I suppose DSpace actually does an "element" and "qualifier" approach for DC. So you'd have "dc", "author", "primary". Or "marc21", "100", "a". Of course, that creates a limit of a single level of hierarchy, which may or may not be desirable… and still doesn't account for indicators/attributes. I suppose there is more thinking to do there.

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007

From: Barton Chittenden [mailto:bar...@bywatersolutions.com]
Sent: Monday, 30 November 2015 2:17 PM
To: David Cook <dc...@prosentient.com.au>
Subject: Re: [Koha-devel] Proposed "metadata" table for Koha

> The basic schema I have in mind would be something like: metadata.id, metadata.record_id, metadata.scheme, metadata.qualifier, metadata.value.
>
> The row would look like: 1, 1, marc21, 001, 123456789

I think this is an interesting idea... Obviously the replication of biblio data is not ideal, but I think that that's a necessary and worthwhile trade-off in terms of moving away from MARC. How do you propose linking the metadata fields to the biblio records? Does metadata.record_id link to biblionumber?
Re: [Koha-devel] Proposed "metadata" table for Koha
> Of course, off the top of my head, I don't know how you'd store indicators and subfields in an extensible way. I suppose indicators are attributes and subfields are child elements...
>
> I suppose DSpace actually does an "element" and "qualifier" approach for DC. So you'd have "dc", "author", "primary". Or "marc21", "100", "a". Of course, that creates a limit of a single level of hierarchy which may or may not be desirable… and still doesn't account for indicators/attributes.
>
> I suppose there is more thinking to do there.

My mind flew off into several different schemes for recursively sub-dividing metadata. I had to reboot my brain because I ran out of stack space. Dang infinite recursion. This reminded me of a Larry Wall quote... my memory of the quote was about abstraction, but there was a bit more to it:

> I think that the biggest mistake people make is latching onto the first idea that comes to them and trying to do that. It really comes to a thing that my folks taught me about money. Don't buy something unless you've wanted it three times. Similarly, don't throw in a feature when you first think of it. Think if there's a way to generalize it, think if it should be generalized. Sometimes you can generalize things too much. I think like the things in Scheme were generalized too much. There is a level of abstraction beyond which people don't want to go. Take a good look at what you want to do, and try to come up with the long-term lazy way, not the short-term lazy way.

So... what's the long-term lazy way of handling the sub-division of metadata?

--Barton
Re: [Koha-devel] Proposed "metadata" table for Koha
I've been thinking along these lines recently too. I've been thinking "wouldn't it be nice to do a NoSQL or directory-hierarchy sort of thing where you just add subfields and attributes to the record as needed". Of course, in a relational database you would do that by having an attribute field that was serialized by some standard method. But you could only have non-critical information there, since it would have to be unserialized to query it.

As far as indicators go, you would have to have some internally consistent way to map them to serialized attributes (and some subfields could be handled that way too). For example, with the indicator for MARC21 245$a you would have an attribute like 'ii',3 for 'indexing: ignore first three characters', and you could do the same with authority IDs instead of using $9 (off the top of my head).

And of course you would have to have some framework(s) to convert the metadata to other formats (MARC21, UNIMARC, NORMARC, and DC, for example), which would make the requirements for attributes quite large (to handle all the possible indicators and serializable subfields). Something like that, I suppose.

On Sun, Nov 29, 2015 at 10:02 PM, Barton Chittenden <bar...@bywatersolutions.com> wrote:

> My mind flew off into several different schemes for recursively sub-dividing metadata. I had to reboot my brain because I ran out of stack space. Dang infinite recursion.
>
> So... what's the long-term lazy way of handling the sub-division of metadata?

--
Michael Hafen
Washington County School District Technology Department
Systems Analyst
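The "serialized attributes" idea above could look something like this, with JSON as the "standard method". All names are hypothetical, and the ('ii', 3) attribute is the 'indexing ignore first three characters' example from the mail:

```python
import json

def serialize_field(tag, subfields, attributes):
    """Store one field's non-critical extras (indicator-style
    attributes, minor subfields) as a JSON blob alongside the row.
    As noted in the mail, anything serialized this way cannot be
    queried directly; it must be unserialized first."""
    return json.dumps({"tag": tag,
                       "subfields": subfields,
                       "attributes": attributes})

def deserialize_field(blob):
    """Round-trip the blob back into a plain dict."""
    return json.loads(blob)
```

Conversion frameworks for MARC21, UNIMARC, NORMARC, or DC would then map each format's indicators onto an agreed attribute vocabulary before serializing.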