Hi all,
John, the supplemented approach you describe is how we go about it in
our Lemon8-XML (L8X) software (http://pkp.sfu.ca/lemon8). The way L8X
handles parsing is to pass the original unparsed string to a number
of different parsers in turn (Freecite, each of the 3 Paracite
complex topic that I'm hoping to make the
subject of a submission to the Code4Lib journal. :-)
MJ
MJ Suhonos [EMAIL PROTECTED] 11/14/08 3:18 PM
Hi Phil,
I was just at a PKP workshop in Sydney in December, and the same
developers from the Australian National University who developed the
OJS METS export plugin unveiled a SWORD 1.2 deposit plugin that works
with both Fedora and DSpace:
Hi all, couldn't resist jumping in on this one:
But it appears that the Handle system is quite a bit more fleshed out than a
simple PURL server; it's a distributed, protocol-independent network. The
protocol-independent part may or may not be useful, but it certainly seems
like it could be,
I would definitely nominate the Qubit Toolkit and the PKP software suite as
candidates for this list:
http://qubit-toolkit.org/
http://pkp.sfu.ca/
Qubit is somewhat nascent, but is actively being developed and is fairly
well-supported (by the ICA, UNESCO, LAC, among others), and the PKP suite
I think that the single critical question to ask about any
development in a digital library environment is its ability
to deal with Unicode and its related standards such as UTF-8.
Last time I looked at it, PHP had problems in that area.
These problems will bedevil anything you write
or bring in from elsewhere.
MJ
who also loves Kingston in the spring (but more in the summer when CORK is on)
On 2010-01-20, at 10:32 AM, Walter Lewis wrote:
On 20 Jan 10, at 10:16 AM, MJ Suhonos wrote:
I think mode of transportation is something to consider; for those of us in
South/Eastern Ontario
+1 Thursday-Friday 6-7 May here as well.
MJ
On 2010-01-27, at 10:50 PM, William Denton wrote:
I went through all the mail about this and counted a + for each of the top
two choices people made (if they made two; otherwise just one + for their
single vote). The results:
Kingston
Yes, a group of us at the University of British Columbia and Simon Fraser
University in sushi-ski-beach-beer-MichaelBuble-soaked Vancouver, BC
intend to submit a proposal to host.
More specifically, I wonder what thoughts people have about how a
VanC4L2011 might affect / be affected by the C4L North proposal, and
Eric's comment that C4L was originally envisioned as an Access USA.
There seems to be a strong contingent on both sides of the 49th
parallel these days.
Contemporary library web development: a Series of Hoses.
http://en.wikipedia.org/wiki/Series_of_tubes
MJ
On 2010-03-25, at 11:00 AM, Joe Hourcle wrote:
On Thu, 25 Mar 2010, Brian Stamper wrote:
On Wed, 24 Mar 2010 17:51:38 -0400, Mark Tomko mark.to...@simmons.edu
wrote:
I wouldn't
Also...it's pretty good for plugging leaks in ducts.
Actually, true story:
I was in the hardware store, poking around the tape section, with a roll of
your typical silver duct tape in my hand, obviously browsing. An employee came
up to me asking what I was looking for, and for what purpose.
So far there are just three people with ideas for talks (me, Walter Lewis,
Art Rhyno).
I added mytpl.ca (alluringly entitled Location-aware Mobile Search). I
figure it could be a good trailer for the forthcoming journal article. ;-)
MJ
- there is a JavaScript CSL-Processor. JavaScript is kind of a punishment but
it is the natural environment for the Web 2.0 Mashup crowd that is going to
implement applications that use Twitter annotations
A quick word of caution here; we got excited about citeproc-js until learning
that it
Hi all,
I'm digging into earlier threads on Code4Lib and NGC4lib and trying to get some
concrete examples around the DCTERMS element set — maybe I haven't been a
subscriber for long enough.
What I'm looking for in particular are things I can work with *in
code/implementation*, most notably:
Okay, I know it's cool to hate on OpenURL, but I feel I have to clarify a few
points:
OpenURL is of no use if you separate it from the existing infrastructure,
which is mainly held by companies. No sane person will try to build an open
alternative infrastructure because OpenURL is a crappy
It's not that it's cool to hate on OpenURL, but if you've really
worked with it, it's easy to grow bitter.
Well, fair enough. Perhaps what I'm defending isn't OpenURL per se, but rather
the concept of being able to transport descriptive assertions the way the 1.0
spec proposes.
The reason
Let me correct myself (for the detail-oriented among us):
Actually the difference between OpenURL and DC is that one is a transport
protocol and one is a metadata schema. :-)
OpenURL is a *serialization format* which happens to be actionable by a
transport protocol (HTTP), which is its main
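To make the serialization point concrete: a KEV (key/encoded-value) ContextObject is just URL-encoded key/value pairs, which any HTTP stack can build. Here's a minimal sketch against the Z39.88-2004 journal format; the resolver address is a hypothetical placeholder, not a real service.

```python
from urllib.parse import urlencode

def make_openurl(resolver_base, **citation):
    # Serialize a journal-article citation as an OpenURL 1.0 KEV
    # ContextObject (NISO Z39.88-2004). The rft.* keys carry the
    # descriptive assertions about the referent.
    params = {
        "url_ver": "Z39.88-2004",
        "url_ctx_fmt": "info:ofi/fmt:kev:mtx:ctx",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    params.update({f"rft.{k}": v for k, v in citation.items()})
    return resolver_base + "?" + urlencode(params)

# Hypothetical resolver address, for illustration only.
url = make_openurl(
    "http://resolver.example.org/openurl",
    atitle="A Series of Tubes",
    jtitle="Code4Lib Journal",
    date="2010",
    volume="9",
    spage="1",
)
```

The point being: the serialization itself is trivial; it's the resolver infrastructure behind the base URL that carries all the weight.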
What I hope for is that OpenURL 1.0 eventually takes a place alongside SGML
as a too-complex standard that directly paves the way for a universally
adopted foundational technology like XML. What I fear is that it takes a
place alongside MARC as an anachronistic standard that paralyzes an
dcterms is so terribly lossy that it would be a shame to reduce MARC to it.
This is *precisely* the other half of my rationale — a shame? Why? If MARC is
the mind prison that some purport it to be, then let's see what a system built
devoid of MARC, but based on the best alternative we have
NB: When Karen Coyle, Eric Morgan, and Roy Tennant all reply to your thread
within half an hour of each other, you know you've hit the big time. Time to
retire young I think.
That would be Eric *Lease* Morgan — oh my god, you're right! I'm already
losing data! It *is* insidious! I
I'd just like to say a word of thanks for everyone who has contributed so far
on this thread. The viewpoints raised certainly help clarify at least my
understanding of some of the issues and concepts involved.
MARCXML is a step in the right direction. MODS goes even further. Neither
really
Let me give another example: the Open Library API returns a JSON tree, eg.
http://openlibrary.org/books/OL1M.json
But what schema is this? And if it doesn't conform to a standard schema,
does that make it useless? If it were based on DCTERMS, at least I'd have a
reference at
functionally requires
semantics beyond those in the DCTERMS. All the better if some of those terms
just happen to be available in Bibliontology or some other namespace...
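One way to get that reference point is a thin mapping layer over whatever JSON comes back. A minimal sketch, assuming a handful of Open Library-ish field names; the sample record and the field names on the left are invented for illustration, not an actual API response:

```python
# Map a few JSON fields onto DCTERMS properties. The source-side
# field names are assumptions about the schema, not a spec.
DCTERMS_MAP = {
    "title": "dcterms:title",
    "publish_date": "dcterms:issued",
    "publishers": "dcterms:publisher",
    "subjects": "dcterms:subject",
}

def to_dcterms(rec):
    # Keep only the fields we have a DCTERMS home for.
    return {term: rec[key] for key, term in DCTERMS_MAP.items() if key in rec}

# Invented sample record, loosely shaped like a books API response.
record = {"title": "Example Book", "publish_date": "1997",
          "publishers": ["Example Press"]}
mapped = to_dcterms(record)
```

Anything the map doesn't know about simply drops out, which is exactly the lossiness trade-off under discussion.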
Thanks again,
-Corey
MJ Suhonos wrote:
Let me give another example: the Open Library API returns a JSON tree,
eg. http
There's been some talk in code4lib about using MongoDB to store MARC
records in some kind of JSON format. I'd like to know if you have
experimented with indexing those documents in MongoDB. From my limited
exposure to MongoDB, it seems difficult, unless MongoDB supports some
kind of custom
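One workaround I've seen discussed is to denormalize before inserting: flatten the MARC-in-JSON field array into top-level keys that an ordinary MongoDB index can target. A sketch, assuming records follow the MARC-in-JSON layout (an array of single-tag objects under "fields"); the "245_a" key-naming convention is my own, since MongoDB key names can't contain dots:

```python
def index_keys(marc_json):
    # Flatten a MARC-in-JSON record's fields into "tag_subfield" keys
    # (e.g. "245_a") so an index can target them directly, e.g. with
    # pymongo: collection.create_index("245_a")  (not run here).
    flat = {}
    for field in marc_json.get("fields", []):
        for tag, value in field.items():
            if isinstance(value, dict):  # variable data field
                for sf in value.get("subfields", []):
                    for code, text in sf.items():
                        flat.setdefault(f"{tag}_{code}", []).append(text)
            else:                        # control field (001, 008, ...)
                flat.setdefault(tag, []).append(value)
    return flat

# Minimal invented record in MARC-in-JSON shape.
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"001": "ocm12345"},
        {"245": {"ind1": "1", "ind2": "0",
                 "subfields": [{"a": "Example title"}]}},
    ],
}
flat = index_keys(record)
```

You'd store the flattened keys alongside (or instead of) the faithful array form, trading storage for indexability.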
Sorry, meant to include this link, which compares Elastic Search and Solr:
http://blog.sematext.com/2010/05/03/elastic-search-distributed-lucene/
MJ
It's helpful to think of MARCXML as a sort of lingua franca.
- Existing libraries for reading, manipulating and searching XML-based
documents are very mature.
Including XSLT and XPath; very powerful stuff.
There's nothing stopping you from reading the MARCXML into a binary blob and
working
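As an illustration of that maturity: a few lines of standard-library Python suffice to pull a title out of MARCXML with an XPath-style query (the record content below is invented for the example):

```python
import xml.etree.ElementTree as ET

# A minimal MARCXML fragment in the MARC21/slim namespace.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title</subfield>
  </datafield>
</record>"""

NS = {"marc": "http://www.loc.gov/MARC21/slim"}
root = ET.fromstring(MARCXML)
# ElementTree's limited XPath is enough for tag/subfield lookups.
path = ".//marc:datafield[@tag='245']/marc:subfield[@code='a']"
title = root.find(path, NS).text
```

The same query works unchanged against a full record, which is the lingua-franca argument in miniature.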
I'll just leave this here:
http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records
That trade-off ought to offend both camps, though I happen to think it's quite
clever.
MJ
On 2010-10-25, at 3:22 PM, Eric Hellman wrote:
I think you'd have a very hard time demonstrating any
JSON++
I routinely re-index about 2.5M JSON records (originally from binary MARC), and
it's several orders of magnitude faster than XML (measured in single-digit
minutes rather than double-digit hours). I'm not sure if it's in the same
range as binary MARC, but as Tim says, it's plenty fast
But it looks just like the old thing using insert data scheme and some
templates?
Ah yes, but now we're doing it in XML!
I think this applies to 90% of instances where XML was adopted, especially
within the enterprise IT industry. Through marketing or misunderstanding,
XML was
be
a non-alphanumeric attribute value in MARCXML? Is this a non-MARC21 thing?
C
On 10/25/10 3:35 PM, MJ Suhonos wrote:
I'll just leave this here:
http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records
That trade-off ought to offend both camps, though I happen to think it's quite clever.
The first comment claims a 30-40% increase in XML parsing speed, which seems
plausible when you compare the number of characters in the example provided:
277 vs. 419, or about 34% fewer going through the parser.
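The arithmetic behind that figure, using the two character counts from the example:

```python
plain_marcxml, turbomarc = 419, 277  # characters in the example records
pct_fewer = round((plain_marcxml - turbomarc) / plain_marcxml * 100)
# pct_fewer is 34, i.e. roughly a third fewer characters for the parser
```

Fewer characters is only a proxy for parse time, of course, which is why the measured speedup can differ from the raw size difference.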
The speedup can be much greater than that -- from the blog post
itself, Using
Hi all,
I've actually worked with the Public Knowledge Project for many years, so just
to shed a little light on the PHP framework that we use: Alec Smecher, our lead
architect, has gone on record several times as saying that the last thing the
world needs is another PHP framework.
It came
Likewise, I've been using it since mid-2010 (0.6.0). What do you want to know
about it?
MJ
To these responses, I would also add: extremely easy to install and configure
-- that is, NO configuration is required to get it running out-of-the-box
(including schema definitions, servlet containers, etc.) This alone was what
drew me to ES in lieu of Solr way back, though I don't know if it
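To make the zero-configuration claim concrete, here is a sketch of the out-of-the-box workflow; the version number and download URL are illustrative placeholders, not a specific release, and a live server is assumed for the final step.

```shell
# Illustrative only: version and URL are placeholders. Note that no
# config file is edited at any point.
curl -LO https://example.org/elasticsearch-X.Y.Z.tar.gz
tar xzf elasticsearch-X.Y.Z.tar.gz
./elasticsearch-X.Y.Z/bin/elasticsearch          # starts with sensible defaults

# Index a document immediately: the "books" index and its mapping
# are created on the fly, schema-free by default.
curl -XPUT 'http://localhost:9200/books/book/1' -d '{"title": "Example"}'
```

Compare the equivalent Solr setup of the day, which meant a servlet container plus a schema.xml before the first document went in.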