Re: [Koha-devel] Proposed "metadata" table for Koha

2015-12-08 Thread Tomas Cohen Arazi
2015-12-01 1:44 GMT-03:00 David Cook :
>
> My main concern about Zebra is with it not being fast enough. Tomas
mentioned that Zebra runs updates every 5 seconds, but it looks to me like
rebuild_zebra.pl (via /etc/cron.d/koha-common) is only run every 5 minutes
on a Debian package install. At least that was the case in Koha 3.20.
That’s a huge gap when it comes to importing records. Say you import record
A at 5:01pm… and then you try to import record A again at 5:03 using a
matching rule. The update from 5:01pm hasn’t been processed yet, so you
wind up with 2 copies of record A in your Koha catalogue.

You are right that the default setup sets a cronjob to check the
queue every 5 minutes. I planned to change that default behaviour to set
USE_INDEXER_DAEMON=yes in /etc/default/koha-common and have the cron line
commented out or conditional on USE_INDEXER_DAEMON=no. But I got distracted in
the last couple of weeks before the release and forgot to post a patch for that.
Anyway, rebuild_zebra.pl -daemon should be run by default.
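Roughly, the intended default could look like this (a sketch from memory of
the Debian packaging; the exact cron syntax and koha-rebuild-zebra flags may
differ):

# /etc/default/koha-common
USE_INDEXER_DAEMON=yes

# /etc/cron.d/koha-common: only fall back to cron-based indexing
# when the indexer daemon is disabled
*/5 * * * * root . /etc/default/koha-common; [ "$USE_INDEXER_DAEMON" = "no" ] && koha-rebuild-zebra -q $(koha-list --enabled)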

> We run openSUSE and we define our own Zebra indexer, which does run every
5 seconds. I haven’t stress tested it yet, but 5 seconds might be a bit
long even under ideal circumstances, if an import job is running every 2-3
seconds. Sure, that 2-3 seconds might be a bit optimistic… maybe it will
also be every 5 seconds. But what happens if the Zebra queue backs up?
Someone runs “touch_all_biblios.pl” or something like that and fills up the
zebra queue, while you’re importing records. Zebra is going to be out of
date.

True

> There needs to be a source of truth, and that’s the metadata record in
MySQL. Zebra is an indexed cache, and while usually it doesn’t matter too
much if that cache is a little bit stale, it can matter when you’re
importing records.

I agree the importing step should rely on the source of truth.

> Another scenario… what happens if Zebra is down? You’re going to get
duplicate records because the matcher won’t work. That said, we could
mitigate that by double-checking that Zebra is actually alive
programmatically before commencing an import. There could be other
heuristics as well… like not running an import (which uses matching rules)
unless the zebraqueue only has X number of waiting updates. Ideally, it
would be 0, but that’s unlikely in large systems. It’s also impossible if
you’re importing faster than one record every 5 seconds (and importing any
slower than that would be absurdly slow).

I wouldn't create a workaround for Zebra being down just to be able to
match existing records... I would just print a red box saying the tool is
not available.

> I am curious as to whether the default Zebra update time for Debian
packages is 5 minutes or 5 seconds. While it doesn’t affect me too much as
I don’t use Debian, it matters for regular users of Koha.

Ok, I'll provide the mentioned patch :-P

--
Tomás Cohen Arazi
Theke Solutions (http://theke.io)
✆ +54 9351 3513384
GPG: B76C 6E7C 2D80 551A C765  E225 0A27 2EA1 B2F3 C15F
___
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

Re: [Koha-devel] Proposed "metadata" table for Koha

2015-11-30 Thread David Cook
Just to contradict myself a bit, it might be worth mentioning that Zebra will 
do a better job with ISSN and ISBN matching, as I think it normalizes those 
strings. That would be nicer than a regular string matching SQL query…
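To illustrate the difference (a hedged sketch; the ISBN is just the usual
example value, and whether Zebra also bridges ISBN-10/ISBN-13 variants is
something I'd have to check):

-- Naive matching fails on hyphenation variants:
-- '978-0-306-40615-7' vs. '9780306406157' never match as raw strings.
SELECT biblionumber FROM biblioitems
WHERE isbn = '978-0-306-40615-7';

-- A crude SQL-side normalization (strip hyphens and spaces) helps,
-- but it has to be applied to both sides on every lookup:
SELECT biblionumber FROM biblioitems
WHERE REPLACE(REPLACE(isbn, '-', ''), ' ', '') =
      REPLACE(REPLACE('978-0-306-40615-7', '-', ''), ' ', '');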

 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St, Ultimo, NSW 2007

 

From: Tomas Cohen Arazi [mailto:tomasco...@gmail.com] 
Sent: Tuesday, 1 December 2015 1:20 AM
To: David Cook <dc...@prosentient.com.au>
Cc: koha-devel <koha-devel@lists.koha-community.org>
Subject: Re: [Koha-devel] Proposed "metadata" table for Koha

 

 

 

2015-11-29 21:52 GMT-03:00 David Cook <dc...@prosentient.com.au>:

Hi all:

 

For those not following along at 
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662, we’ve recently 
started talking about the possibility of adding a “metadata” table to Koha.

 

The basic schema I have in mind would be something like: metadata.id, 
metadata.record_id, metadata.scheme, metadata.qualifier, metadata.value.

 

The row would look like: 1, 1, marc21, 001, 123456789

 

It might also be necessary to store “metadata.record_type” so as to know where 
metadata.record_id points. This obviously has a lot of disadvantages… redundant 
data between “metadata” rows, no database cascades via foreign keys, etc. 
However, it might be necessary in the short-term as a temporary measure.

 

Of course, adding “yet another place” to store metadata might not seem like a 
great idea. We already store metadata in biblioitems.marcxml (and 
biblioitems.marc), Zebra, and other biblio/biblioitems/items relational 
database fields. Do we really need a new place to worry about data?

 

I think we should have a metadata_record table storing the serialized metadata, 
and any other needed information (basically the fields Koha::MetadataRecord has...) 
and let the fulltext engine do the job for accessing those values.

 

The codebase is already too bloated trying to band-aid our "minimal" usage of 
the search engines' features. Of course, while trying to fix that we might find 
our search engine has problems and/or broken functionalities (zebra facets are 
so slow that they are not cool). But we should definitely get rid of tons of code in 
favour of using the search engine more, and probably have QueryParser be the 
standard, having a driver for ES...

 

-- 

Tomás Cohen Arazi

Theke Solutions (http://theke.io)
✆ +54 9351 3513384
GPG: B76C 6E7C 2D80 551A C765  E225 0A27 2EA1 B2F3 C15F

___
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

Re: [Koha-devel] Proposed "metadata" table for Koha

2015-11-30 Thread David Cook
I’m not 100% sure what I think yet, but in the past I was certainly in favour 
of a metadata_record table that stored the serialized metadata and whatever 
else it needed to support that. I still think it’s an all right idea to have 
that table.

 

In general, I’m in favour of using the full text engine for searching, although 
as Katrin has noted on 
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662, Zebra isn’t 
necessarily updated fast enough to be used for “matching” when importing 
records, especially when records are being downloaded and imported every 2-3 
seconds. Also, what happens if Zebra goes down? Suddenly your catalogue gets 
flooded with duplicates. I suppose one way of getting around that is querying 
Zebra to make sure it is alive before starting an import. However, that doesn’t 
solve the speed problem. I don’t think there’s any reliable way of knowing if 
the record you want to match on has already been indexed (or indexed again) in 
Koha. Don’t we only update Zebra once every 60 seconds? 

 

The OAI-PMH import wouldn’t be the only one affected by the indexing. The OCLC 
Connexion daemon and any Staged Import both use Zebra for matching. If Zebra 
hasn’t indexed relevant additions or updates, the matching won’t work when it 
should work. For records in the hundreds, thousands, and millions, that can 
cause major problems both with duplicates and failed updates. 

 

Maybe a “metadata” table is overkill. In fact, I can’t necessarily see a lot of 
advantages to storing mass quantities of metadata in the relational database 
off the top of my head, but perhaps some way of keeping record identifiers in 
the relational database would be doable. 

 

If we think about the metadata in terms of a “source of truth”, the relational 
database is always going to contain the source of truth. Zebra is basically 
just an indexed cache, and when it comes to importing records… I’d rather be 
querying a source of truth than a cache as the cache might be stale. At the 
moment, it’s going to be stale by at least 1-59 seconds… longer if Zebra has a 
lot of indexing jobs to do when it receives an update.

 

Maybe there’s a way to mitigate that. Like… waiting to do an import until Zebra 
has reported that it’s emptied the zebraqueue X seconds ago, although 
zebraqueue may never be empty. There’s always that possibility that you’re 
going to miss something, and that possibility doesn’t exist in the relational 
database, as it’s the source of truth. If the identifier doesn’t exist in the 
database, then it doesn’t exist for that record (or there’s a software bug 
which can be fixed). 

 

While we probably could use the search engine more throughout Koha, I think it 
might not be wise to use it during an import. 

 

(As for the QueryParser, I totally agree about it being the standard, and 
creating a driver for ES. I chatted with Robin about this a bit over the past 
few months, but I haven’t had time to help out with that. The QueryParser 
isn’t quite right for Zebra yet either, so it would probably make sense to 
focus on finalizing the PQF driver first.)

 

 

 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St, Ultimo, NSW 2007

 

From: Tomas Cohen Arazi [mailto:tomasco...@gmail.com] 
Sent: Tuesday, 1 December 2015 1:20 AM
To: David Cook <dc...@prosentient.com.au>
Cc: koha-devel <koha-devel@lists.koha-community.org>
Subject: Re: [Koha-devel] Proposed "metadata" table for Koha

 

I think we should have a metadata_record table storing the serialized metadata, 
and any other needed information (basically the fields Koha::MetadataRecord has...) 
and let the fulltext engine do the job for accessing those values.

 

The codebase is already too bloated trying to band-aid our "minimal" usage of 
the search engines' features. Of course, while trying to fix that we might find 
our search engine has problems and/or broken functionalities (zebra facets are 
so slow that they are not cool). But we should definitely get rid of tons of code in 
favour of using the search engine more, and probably have QueryParser be the 
standard, having a driver for ES...

 

-- 

Tomás Cohen Arazi

Theke Solutions (http://theke.io)
✆ +54 9351 3513384
GPG: B76C 6E7C 2D80 551A C765  E225 0A27 2EA1 B2F3 C15F

___
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

Re: [Koha-devel] Proposed "metadata" table for Koha

2015-11-30 Thread Jesse
If we do this, I very much vote for doing it the way Tomas is describing
(aka, storing entire chunks of metadata as blobs). Koha 2.2 had a
row-per-subfield structure kind of like what you're suggesting, and it
required a lot of monkeying around to accurately represent all the vagaries
and ordering of MARC subfields. It was also (from what I remember) quite a
disk hog.

2015-11-30 15:42 GMT-07:00 David Cook <dc...@prosentient.com.au>:

> Just to contradict myself a bit, it might be worth mentioning that Zebra
> will do a better job with ISSN and ISBN matching, as I think it normalizes
> those strings. That would be nicer than a regular string matching SQL query…
>
>
>
> David Cook
>
> Systems Librarian
>
> Prosentient Systems
>
> 72/330 Wattle St, Ultimo, NSW 2007
>
>
>
> *From:* Tomas Cohen Arazi [mailto:tomasco...@gmail.com]
> *Sent:* Tuesday, 1 December 2015 1:20 AM
> *To:* David Cook <dc...@prosentient.com.au>
> *Cc:* koha-devel <koha-devel@lists.koha-community.org>
> *Subject:* Re: [Koha-devel] Proposed "metadata" table for Koha
>
>
>
>
>
>
>
> 2015-11-29 21:52 GMT-03:00 David Cook <dc...@prosentient.com.au>:
>
> Hi all:
>
>
>
> For those not following along at
> http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662, we’ve
> recently started talking about the possibility of adding a “metadata” table
> to Koha.
>
>
>
> The basic schema I have in mind would be something like: metadata.id,
> metadata.record_id, metadata.scheme, metadata.qualifier, metadata.value.
>
>
>
> The row would look like: 1, 1, marc21, 001, 123456789
>
>
>
> It might also be necessary to store “metadata.record_type” so as to know
> where metadata.record_id points. This obviously has a lot of disadvantages…
> redundant data between “metadata” rows, no database cascades via foreign
> keys, etc. However, it might be necessary in the short-term as a temporary
> measure.
>
>
>
> Of course, adding “yet another place” to store metadata might not seem
> like a great idea. We already store metadata in biblioitems.marcxml (and
> biblioitems.marc), Zebra, and other biblio/biblioitems/items relational
> database fields. Do we really need a new place to worry about data?
>
>
>
> I think we should have a metadata_record table storing the serialized
> metadata, and any other needed information (basically the fields
> Koha::MetadataRecord has...) and let the fulltext engine do the job for
> accessing those values.
>
>
>
> The codebase is already too bloated trying to band-aid our "minimal" usage
> of the search engines' features. Of course, while trying to fix that we
> might find our search engine has problems and/or broken functionalities
> (zebra facets are so slow that they are not cool). But we should definitely get
> rid of tons of code in favour of using the search engine more, and probably
> have QueryParser be the standard, having a driver for ES...
>
>
>
> --
>
> Tomás Cohen Arazi
>
> Theke Solutions (http://theke.io)
> ✆ +54 9351 3513384
> GPG: B76C 6E7C 2D80 551A C765  E225 0A27 2EA1 B2F3 C15F
>
> ___
> Koha-devel mailing list
> Koha-devel@lists.koha-community.org
> http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> website : http://www.koha-community.org/
> git : http://git.koha-community.org/
> bugs : http://bugs.koha-community.org/
>



-- 
Jesse Weaver
___
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

Re: [Koha-devel] Proposed "metadata" table for Koha

2015-11-30 Thread David Cook
I can’t remember if I sent another message just to Tomas or if it was to the 
list, but I can see how a “metadata” table might be heavy-handed, so I 
suggested an “identifier” table which would extract identifiers from the record 
for easy recall. Of course, as I said in the previous email, Zebra is going to 
do a better job in some cases because it can normalize values for indexing and 
retrieval while MySQL would not. 

 

My main concern about Zebra is with it not being fast enough. Tomas mentioned 
that Zebra runs updates every 5 seconds, but it looks to me like 
rebuild_zebra.pl (via /etc/cron.d/koha-common) is only run every 5 minutes on a 
Debian package install. At least that was the case in Koha 3.20. That’s a huge 
gap when it comes to importing records. Say you import record A at 5:01pm… and 
then you try to import record A again at 5:03 using a matching rule. The update 
from 5:01pm hasn’t been processed yet, so you wind up with 2 copies of record A 
in your Koha catalogue. 

 

We run openSUSE and we define our own Zebra indexer, which does run every 5 
seconds. I haven’t stress tested it yet, but 5 seconds might be a bit long even 
under ideal circumstances, if an import job is running every 2-3 seconds. Sure, 
that 2-3 seconds might be a bit optimistic… maybe it will also be every 5 
seconds. But what happens if the Zebra queue backs up? Someone runs 
“touch_all_biblios.pl” or something like that and fills up the zebra queue, 
while you’re importing records. Zebra is going to be out of date. 

 

There needs to be a source of truth, and that’s the metadata record in MySQL. 
Zebra is an indexed cache, and while usually it doesn’t matter too much if that 
cache is a little bit stale, it can matter when you’re importing records.

 

Another scenario… what happens if Zebra is down? You’re going to get duplicate 
records because the matcher won’t work. That said, we could mitigate that by 
double-checking that Zebra is actually alive programmatically before commencing 
an import. There could be other heuristics as well… like not running an import 
(which uses matching rules) unless the zebraqueue only has X number of waiting 
updates. Ideally, it would be 0, but that’s unlikely in large systems. It’s also 
impossible if you’re importing faster than one record every 5 seconds (and 
importing any slower than that would be absurdly slow).
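As a sketch of that backlog heuristic (assuming the stock zebraqueue schema,
where each queued update carries a done flag):

-- How many updates is Zebra still waiting to process?
-- If this exceeds some threshold X, defer the import.
SELECT server, COUNT(*) AS waiting
FROM zebraqueue
WHERE done = 0
GROUP BY server;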

 

So MySQL is the source of truth… but we can’t very well do an ExtractValue 
query on biblioitems.marcxml when we have a database with over a million rows 
(the threshold for a speedy query is probably actually much lower than that).
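To make that concrete, compare the two lookups below (a sketch; it ignores
MARCXML namespace handling, which makes the XPath even more painful in
practice):

-- MySQL evaluates ExtractValue() per row, so this can never use an
-- index and has to parse the XML of every record in the table:
SELECT biblionumber FROM biblioitems
WHERE ExtractValue(marcxml, '//controlfield[@tag="001"]') = '123456789';

-- With an extracted-identifier table, the same lookup is an ordinary
-- indexed equality match:
SELECT record_id FROM metadata
WHERE scheme = 'marc21' AND qualifier = '001' AND value = '123456789';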

 

(On a related note, I think we’re also going to run into massive problems with 
Authority records, as I don’t think the 001 of incoming records is saved at 
all. Maybe the MARC templates can handle that scenario though. I admit I 
haven’t looked at them much. We probably should be moving the 001 into the 035 
for both bibliographic and authority records during an import…)

 

--

 

I suppose it’s possible that we might just have to accept that Zebra might not be 
fast enough. I suppose the onus is on me (and whoever else is interested in not 
using Zebra for matching, like Katrin and Andreas) to prove that Zebra is too 
slow under stress. 

 

I am curious as to whether the default Zebra update time for Debian packages is 
5 minutes or 5 seconds. While it doesn’t affect me too much as I don’t use 
Debian, it matters for regular users of Koha. 

 

 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St, Ultimo, NSW 2007

 

From: Jesse [mailto:pianohac...@gmail.com] 
Sent: Tuesday, 1 December 2015 12:03 PM
To: David Cook <dc...@prosentient.com.au>
Cc: koha-devel <koha-devel@lists.koha-community.org>
Subject: Re: [Koha-devel] Proposed "metadata" table for Koha

 

If we do this, I very much vote for doing it the way Tomas is describing (aka, 
storing entire chunks of metadata as blobs). Koha 2.2 had a row-per-subfield 
structure kind of like what you're suggesting, and it required a lot of 
monkeying around to accurately represent all the vagaries and ordering of MARC 
subfields. It was also (from what I remember) quite a disk hog. 

 

2015-11-30 15:42 GMT-07:00 David Cook <dc...@prosentient.com.au>:

Just to contradict myself a bit, it might be worth mentioning that Zebra will 
do a better job with ISSN and ISBN matching, as I think it normalizes those 
strings. That would be nicer than a regular string matching SQL query…

 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St, Ultimo, NSW 2007

 

From: Tomas Cohen Arazi [mailto:tomasco...@gmail.com] 
Sent: Tuesday, 1 December 2015 1:20 AM
To: David Cook <dc...@prosentient.com.au>
Cc: koha-devel <koha-devel@lists.koha-community.org>
Subject: Re: [Koha-devel] Proposed "metadata" table for Koha

Re: [Koha-devel] Proposed "metadata" table for Koha

2015-11-30 Thread Tomas Cohen Arazi
2015-11-29 21:52 GMT-03:00 David Cook :

> Hi all:
>
>
>
> For those not following along at
> http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662, we’ve
> recently started talking about the possibility of adding a “metadata” table
> to Koha.
>
>
>
> The basic schema I have in mind would be something like: metadata.id,
> metadata.record_id, metadata.scheme, metadata.qualifier, metadata.value.
>
>
>
> The row would look like: 1, 1, marc21, 001, 123456789
>
>
>
> It might also be necessary to store “metadata.record_type” so as to know
> where metadata.record_id points. This obviously has a lot of disadvantages…
> redundant data between “metadata” rows, no database cascades via foreign
> keys, etc. However, it might be necessary in the short-term as a temporary
> measure.
>
>
>
> Of course, adding “yet another place” to store metadata might not seem
> like a great idea. We already store metadata in biblioitems.marcxml (and
> biblioitems.marc), Zebra, and other biblio/biblioitems/items relational
> database fields. Do we really need a new place to worry about data?
>

I think we should have a metadata_record table storing the serialized
metadata, and any other needed information (basically the fields
Koha::MetadataRecord has...) and let the fulltext engine do the job for
accessing those values.

The codebase is already too bloated trying to band-aid our "minimal" usage
of the search engines' features. Of course, while trying to fix that we
might find our search engine has problems and/or broken functionalities
(zebra facets are so slow that they are not cool). But we should definitely get
rid of tons of code in favour of using the search engine more, and probably
have QueryParser be the standard, having a driver for ES...

-- 
Tomás Cohen Arazi
Theke Solutions (http://theke.io)
✆ +54 9351 3513384
GPG: B76C 6E7C 2D80 551A C765  E225 0A27 2EA1 B2F3 C15F
___
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

[Koha-devel] Proposed "metadata" table for Koha

2015-11-29 Thread David Cook
Hi all:

 

For those not following along at
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662, we’ve
recently started talking about the possibility of adding a “metadata” table
to Koha.

 

The basic schema I have in mind would be something like: metadata.id,
metadata.record_id, metadata.scheme, metadata.qualifier, metadata.value.

 

The row would look like: 1, 1, marc21, 001, 123456789
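In SQL terms, a minimal sketch of the table could be something like this
(column types and indexes purely illustrative):

CREATE TABLE metadata (
    id         INT(11) NOT NULL AUTO_INCREMENT,
    record_id  INT(11) NOT NULL,       -- biblionumber, for now
    scheme     VARCHAR(16) NOT NULL,   -- e.g. 'marc21'
    qualifier  VARCHAR(16) NOT NULL,   -- e.g. '001'
    value      VARCHAR(255) NOT NULL,  -- e.g. '123456789'
    PRIMARY KEY (id),
    KEY lookup (scheme, qualifier, value),
    KEY record (record_id)
) ENGINE=InnoDB;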

 

It might also be necessary to store “metadata.record_type” so as to know
where metadata.record_id points. This obviously has a lot of disadvantages…
redundant data between “metadata” rows, no database cascades via foreign
keys, etc. However, it might be necessary in the short-term as a temporary
measure.

 

Of course, adding “yet another place” to store metadata might not seem like
a great idea. We already store metadata in biblioitems.marcxml (and
biblioitems.marc), Zebra, and other biblio/biblioitems/items relational
database fields. Do we really need a new place to worry about data?

That said, if we’re ever going to move away from MARC as the internal
metadata format, we need to start transitioning to something new. I’ve
noticed this “metadata” table model in DSpace and other library systems, and
it seems to work reasonably well. 

 

I don’t know if we’d break down the whole record into this structure, or if
we’d just break down certain fields as defined by a configuration file. In
the short term, I’d like to use something like this to access a record’s 001
without going to Zebra, which can be slow to update. I need to be able to
query a record using the 001 as soon as it’s added to the database, and I
can’t necessarily get that from Zebra. I also need to be able to query a
record, even if Zebra is down.

 

Failing the “metadata” table idea, I’m not sure how else we’d expose the 001
and any number of other fields without using Zebra. We store the 020 and 022
in biblioitems.isbn and biblioitems.issn, but we’re putting multiple values
in a single field, and that’s not so great for searching. We might also want
to add the 035 to the fields we’re searching, so I don’t think just adding
to the biblio or biblioitems tables will really do… especially since we’re
trying to move away from MARC.
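For example, here is a sketch of the current workaround (if memory serves,
multiple ISBNs end up packed into the one column):

-- With several values in one field, equality matching breaks and we
-- fall back to substring matching:
SELECT biblionumber FROM biblioitems
WHERE isbn LIKE '%9780306406157%';
-- A LIKE with a leading wildcard can never use an index, and it will
-- also match any longer string that merely contains this value.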

 

Anyway, please let me know your thoughts. 

 

 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St, Ultimo, NSW 2007

 

___
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

Re: [Koha-devel] Proposed "metadata" table for Koha

2015-11-29 Thread David Cook
For now, I think the metadata.record_id would link to biblionumber, but 
long-term it would probably link to some “record” table. So if you wanted to 
get all bibliographic records, you’d do something like:

select * from record
join metadata ON record.id = metadata.record_id
where record.type = 'bibliographic';

 

Or maybe you want to search for a bibliographic record with a 001 of 123456789:

 

select * from record
join metadata ON record.id = metadata.record_id
where record.type = 'bibliographic'
  and metadata.qualifier = '001'
  and metadata.value = '123456789';

 

--

 

Of course, off the top of my head, I don’t know how you’d store indicators and 
subfields in an extensible way. I suppose indicators are attributes and 
subfields are child elements...

 

I suppose DSpace actually does an “element” and “qualifier” approach for DC. So 
you’d have a “dc”, “author”, “primary”. Or “marc21” “100” “a”. Of course, that 
creates a limit of a single level of hierarchy which may or may not be 
desirable… and still doesn’t account for indicators/attributes.

 

I suppose there is more thinking to do there.
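Just to show what the DSpace-style flattening would mean in rows (the
bibliographic values are invented for illustration, and a real design would
also need a position column to preserve field and subfield order):

INSERT INTO metadata (record_id, scheme, qualifier, value) VALUES
    (1, 'marc21', '100a', 'Wall, Larry'),
    (1, 'marc21', '245a', 'Programming Perl'),
    (1, 'marc21', '245b', 'a subtitle');
-- Indicators have nowhere to live in this shape; they'd need extra
-- columns or a serialized attribute alongside each row.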

 

 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St, Ultimo, NSW 2007

 

From: Barton Chittenden [mailto:bar...@bywatersolutions.com] 
Sent: Monday, 30 November 2015 2:17 PM
To: David Cook <dc...@prosentient.com.au>
Subject: Re: [Koha-devel] Proposed "metadata" table for Koha

 


> The basic schema I have in mind would be something like: metadata.id, 
> metadata.record_id, metadata.scheme, metadata.qualifier, metadata.value.
>
>  
>
> The row would look like: 1, 1, marc21, 001, 123456789

I think this is an interesting idea... Obviously the replication of biblio data 
is not ideal, but I think that that's a necessary and worthwhile trade-off in 
terms of moving away from MARC.

How do you propose linking the metadata fields to the biblio records? Does the 
metadata.record_id link to biblionumber?

___
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

Re: [Koha-devel] Proposed "metadata" table for Koha

2015-11-29 Thread Barton Chittenden
>
>
>
>
> Of course, off the top of my head, I don’t know how you’d store indicators
> and subfields in an extensible way. I suppose indicators are attributes and
> subfields are child elements...
>
>
>
> I suppose DSpace actually does an “element” and “qualifier” approach for
> DC. So you’d have a “dc”, “author”, “primary”. Or “marc21” “100” “a”. Of
> course, that creates a limit of a single level of hierarchy which may or
> may not be desirable… and still doesn’t account for indicators/attributes.
>
>
>
> I suppose there is more thinking to do there.
>

My mind flew off into several different schemes for recursively
sub-dividing metadata. I had to reboot my brain because I ran out of stack
space. Dang infinite recursion. This reminded me of a Larry Wall quote ...
my memory of the quote was about abstraction, but there was a bit more to
it:

 I think that the biggest mistake people make is latching onto the first
> idea that comes to them and trying to do that. It really comes to a thing
> that my folks taught me about money. Don't buy something unless you've
> wanted it three times. Similarly, don't throw in a feature when you first
> think of it. Think if there's a way to generalize it, think if it should be
> generalized. Sometimes you can generalize things too much. I think like the
> things in Scheme were generalized too much. There is a level of abstraction
> beyond which people don't want to go. Take a good look at what you want to
> do, and try to come up with the long-term lazy way, not the short-term lazy
> way.


So... what's the long-term lazy way of handling the sub-division of
metadata?

--Barton
___
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

Re: [Koha-devel] Proposed "metadata" table for Koha

2015-11-29 Thread Michael Hafen
I've been thinking along these lines too recently. I've been thinking
'wouldn't it be nice to do a NoSQL or directory-hierarchy sort of thing
where you just add subfields and attributes to the record as needed'. Of
course, in a relational database you would do that by having an attribute
field that was serialized by some standard method. But you could only keep
non-critical information there, since it would have to be deserialized to
query it. As for indicators, you would need some internally consistent way
to map them to serialized attributes (and some subfields could be handled
that way too). For example, for the indicator on MARC21 245$a you might
have an attribute like 'ii',3 for 'indexing: ignore first three
characters', and you could do the same with authority IDs instead of using
$9 (off the top of my head). And of course you would need some framework(s)
to convert the metadata to other formats (MARC21, UNIMARC, NORMARC, and DC,
for example), which would make the set of required attributes quite large
(to handle all the possible indicators and serializable subfields).
Something like that I suppose.
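A minimal sketch of that attributed-field idea (table and column names
hypothetical; the serialization format could be JSON or anything else
consistent):

CREATE TABLE metadata_value (
    id         INT(11) NOT NULL AUTO_INCREMENT,
    record_id  INT(11) NOT NULL,
    qualifier  VARCHAR(16) NOT NULL,  -- e.g. '245a'
    value      TEXT,
    attributes TEXT,  -- serialized, non-queryable detail,
                      -- e.g. '{"ii": 3, "authid": 42}'
    PRIMARY KEY (id)
) ENGINE=InnoDB;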

On Sun, Nov 29, 2015 at 10:02 PM, Barton Chittenden <
bar...@bywatersolutions.com> wrote:

>
>>
>>
>> Of course, off the top of my head, I don’t know how you’d store
>> indicators and subfields in an extensible way. I suppose indicators are
>> attributes and subfields are child elements...
>>
>>
>>
>> I suppose DSpace actually does an “element” and “qualifier” approach for
>> DC. So you’d have a “dc”, “author”, “primary”. Or “marc21” “100” “a”. Of
>> course, that creates a limit of a single level of hierarchy which may or
>> may not be desirable… and still doesn’t account for indicators/attributes.
>>
>>
>>
>> I suppose there is more thinking to do there.
>>
>
> My mind flew off into several different schemes for recursively
> sub-dividing metadata. I had to reboot my brain because I ran out of stack
> space. Dang infinite recursion. This reminded me of a Larry Wall quote ...
> my memory of the quote was about abstraction, but there was a bit more to
> it:
>
>  I think that the biggest mistake people make is latching onto the first
>> idea that comes to them and trying to do that. It really comes to a thing
>> that my folks taught me about money. Don't buy something unless you've
>> wanted it three times. Similarly, don't throw in a feature when you first
>> think of it. Think if there's a way to generalize it, think if it should be
>> generalized. Sometimes you can generalize things too much. I think like the
>> things in Scheme were generalized too much. There is a level of abstraction
>> beyond which people don't want to go. Take a good look at what you want to
>> do, and try to come up with the long-term lazy way, not the short-term lazy
>> way.
>
>
> So... what's the long-term lazy way of handling the sub-division of
> metadata?
>
> --Barton
>
> ___
> Koha-devel mailing list
> Koha-devel@lists.koha-community.org
> http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> website : http://www.koha-community.org/
> git : http://git.koha-community.org/
> bugs : http://bugs.koha-community.org/
>



-- 
Michael Hafen
Washington County School District Technology Department
Systems Analyst
___
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/