Re: [dspace-tech] Special characters in metadata

2017-08-14 Thread Andrea Schweer

  
  
Hi Gary,

I think the answer, unfortunately, is that when repositories were
"invented", the stance on special characters in metadata was not to
use them. The ecosystem around them (harvesters etc.) has been built
on that assumption. So at this stage, workarounds are the best you're
going to get.

With the curation task approach, I don't think you're necessarily
looking at a nightmare of keeping the fields in synch, just one-off
custom development to create the task and hook it up to metadata
change events.

cheers,
Andrea

On 08/15/2017 09:52 AM, Gary Browne wrote:

> Thanks Claudia, Mark and Andrea for your comments.
>
> It makes intuitive sense to me to avoid HTML “pollution” within the metadata
> fields. But it still raises the issue of what to do about special characters
> in metadata fields.
>
> The field + formatted field idea seems ok, but I fear it will be a bit of a
> data management nightmare.
>
> Thanks again for your thoughts on this.
>
> Gary
>
> Gary Browne | Technical Manager, Developments
> Online Services
> University of Sydney Library
> THE UNIVERSITY OF SYDNEY
> Level 1, Fisher Library F03, The University of Sydney NSW 2006
> T +61 2 9351 5946 | M +61 405 647 868
> E gary.bro...@sydney.edu.au


Re: [dspace-tech] Special characters in metadata

2017-08-14 Thread Gary Browne
Thanks Claudia, Mark and Andrea for your comments.

It makes intuitive sense to me to avoid HTML “pollution” within the metadata 
fields. But it still raises the issue of what to do about special characters in 
metadata fields.

The field + formatted field idea seems ok, but I fear it will be a bit of a 
data management nightmare.

Thanks again for your thoughts on this.

Gary


Gary Browne | Technical Manager, Developments
Online Services
University of Sydney Library
THE UNIVERSITY OF SYDNEY
Level 1, Fisher Library F03, The University of Sydney NSW 2006
T +61 2 9351 5946 | M +61 405 647 868
E gary.bro...@sydney.edu.au



-- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.


Re: [dspace-tech] Special characters in metadata

2017-08-14 Thread Andrea Schweer

  
  
Hi Gary, all,

On 08/15/2017 02:03 AM, Mark H. Wood wrote:

> On Sunday, August 13, 2017 at 9:26:56 PM UTC-4, Gary Browne wrote:
>> This leads me to a more general question of how people handle special
>> characters in the metadata, generally speaking?
>>
>> Is this usually accomplished using Unicode, or are there hacks to allow
>> HTML (I presume including HTML in metadata values is generally frowned
>> upon)?
>
> They must be using Unicode.  Only a few fields are equipped to render HTML
> *as* HTML.  I haven't checked, but I think we'd find that all of these are
> fields such as abstract which are displayed as block elements, not inline
> fields like title and author.

  


It's pretty easy to make DSpace (XMLUI) render HTML as HTML. Look at
how the introductory text for collection pages is rendered; it's
really just a matter of using copy-of not value-of in the XSL
crosswalk. 
https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace-xmlui-mirage2/src/main/webapp/xsl/aspect/artifactbrowser/collection-view.xsl#L58
However, I'd be very careful with this; you wouldn't want to allow
just about anything and risk showing malicious content on your item
pages. Plus of course, Mark's comment on harvesters:


  
> And those HTML-enabled fields raise another question:  what are harvesters
> to make of metadata which are sprinkled with HTML?  Even if the harvesting
> site is using the data for display, it may not be taking any trouble to
> render embedded HTML.  If the harvesting site wants plain text (e.g. for
> searching), what will it do with the HTML pollution?
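
To illustrate the copy-of versus value-of distinction mentioned earlier in this message, here is a rough Python analogy. It is not DSpace or XSLT code. It only mimics the two behaviours with the standard library, and it assumes the field content reaches the stylesheet as parsed child elements rather than as escaped text. The helper names `value_of` and `copy_of_children` are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# A field whose content contains inline markup (hypothetical sample value).
field = ET.fromstring("<field>A <s>Bad</s> Idea</field>")


def value_of(node):
    """Roughly what xsl:value-of yields: the text content only, markup dropped."""
    return "".join(node.itertext())


def copy_of_children(node):
    """Roughly what xsl:copy-of on the child nodes yields: text plus child elements."""
    parts = [node.text or ""]
    for child in node:
        # ET.tostring serialises the element together with its trailing text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


print(value_of(field))
print(copy_of_children(field))
```

The first call flattens the value to plain text, while the second keeps the `<s>` element so a browser would render it. That is also why the second form needs care about what markup is allowed through.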

  


The best way I can think of (this has already been suggested to the
U Sydney folks on a different mailing list by someone else) is to
have two parallel fields: one for the "formatted" version, one for
plain text. Then you can expose the plain text one to harvesters /
search indexing and use the formatted one for item pages in your
repository. The challenge will be keeping the two in synch -- I
guess you could instruct repository admin staff to only edit the
formatted version, and write a curation task that strips the
formatting and puts the remainder into the plain text version. Or of
course keep the two values in synch manually.
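
The stripping step of such a curation task could look roughly like the sketch below. A real DSpace curation task would be written in Java against the DSpace API. This Python version only shows the logic of deriving the plain-text value from the formatted one. The field names `local.title.formatted` and `dc.title` and the dict-based item are assumptions made for illustration.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_formatting(formatted: str) -> str:
    """Return the value with tags removed, entities decoded, whitespace collapsed."""
    parser = _TextOnly()
    parser.feed(formatted)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def sync_plain_field(item: dict, formatted_key: str, plain_key: str) -> dict:
    """Overwrite the plain-text field with stripped copies of the formatted values."""
    item[plain_key] = [strip_formatting(v) for v in item.get(formatted_key, [])]
    return item


# Hypothetical item with one formatted title value.
item = {"local.title.formatted": ["A <s>Bad</s> Idea &amp; more"]}
sync_plain_field(item, "local.title.formatted", "dc.title")
print(item["dc.title"])
```

Because the plain field is always regenerated from the formatted one, staff would only ever edit the formatted field. Note that this sketch keeps all text content, including anything inside script or style elements, so it is not a substitute for sanitising the formatted value before display.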

cheers,
Andrea

-- 
Dr Andrea Schweer
Lead Software Developer, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
+64-7-837 9120
  






Re: [dspace-tech] Special characters in metadata

2017-08-14 Thread Mark H. Wood
On Sunday, August 13, 2017 at 9:26:56 PM UTC-4, Gary Browne wrote:
>
> This leads me to a more general question of how people handle special 
> characters in the metadata, generally speaking? 
>
> Is this usually accomplished using Unicode, or are there hacks to allow 
> HTML (I presume including HTML in metadata values is generally frowned 
> upon)? 
>


They must be using Unicode.  Only a few fields are equipped to render HTML 
*as* HTML.  I haven't checked, but I think we'd find that all of these are 
fields such as abstract which are displayed as block elements, not inline 
fields like title and author.

And those HTML-enabled fields raise another question:  what are harvesters 
to make of metadata which are sprinkled with HTML?  Even if the harvesting 
site is using the data for display, it may not be taking any trouble to 
render embedded HTML.  If the harvesting site wants plain text (e.g. for 
searching), what will it do with the HTML pollution?
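
As a small illustration of the searching concern, the sketch below tokenises a value the way a very naive plain-text indexer might. The sample titles and the `tokens` helper are hypothetical. Real harvesters and search engines will behave differently, but the failure mode is similar.

```python
def tokens(value: str) -> list[str]:
    """Lower-case the value and split it on whitespace."""
    return value.lower().split()


raw = "A <s>Bad</s> Idea"   # metadata value with embedded HTML
plain = "A Bad Idea"        # the same value as plain text

print(tokens(raw))          # ['a', '<s>bad</s>', 'idea']
print("bad" in tokens(raw), "bad" in tokens(plain))  # False True
```

A search for "bad" misses the HTML-laden value because the tag characters end up inside the token.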



Re: [dspace-tech] Special characters in metadata

2017-08-13 Thread Gary Browne
This leads me to a more general question of how people handle special 
characters in the metadata, generally speaking?

Is this usually accomplished using Unicode, or are there hacks to allow HTML (I 
presume including HTML in metadata values is generally frowned upon)?

Thanks,
Gary

 
Gary Browne | Technical Manager, Developments
Online Services
University of Sydney Library
THE UNIVERSITY OF SYDNEY
Level 1, Fisher Library F03, The University of Sydney NSW 2006
T +61 2 9351 5946 | M +61 405 647 868
E gary.bro...@sydney.edu.au 

 


Re: [dspace-tech] Special characters in metadata

2017-08-13 Thread Gary Browne
Thanks Claudia,

Unfortunately the Unicode generators only seem to strike through the individual
letters, rather than whole words as the HTML strikethrough tag does (see
attached sample).
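
For context on why generated strikethrough text appears struck letter by letter: such text is typically built by following each character with a combining overlay character, so the font draws the stroke per character rather than across the word. The sketch below assumes the overlay is U+0336 (COMBINING LONG STROKE OVERLAY). Whether a particular generator uses that code point is an assumption, and the helper names `strike` and `unstrike` are made up for illustration.

```python
import unicodedata

COMBINING_LONG_STROKE = "\u0336"


def strike(text: str) -> str:
    """Follow every character with the combining stroke overlay."""
    return "".join(ch + COMBINING_LONG_STROKE for ch in text)


def unstrike(text: str) -> str:
    """Remove the overlay again to recover the plain characters."""
    return text.replace(COMBINING_LONG_STROKE, "")


struck = strike("Blah")
print(struck)
print(unicodedata.name(COMBINING_LONG_STROKE))
print(len("Blah"), len(struck))  # 4 8
print("Blah" in struck)          # False
```

The last line shows a side effect worth keeping in mind: the struck value no longer contains the plain word as a substring, so a search for it may fail unless the indexer strips combining marks.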

Gary

 
Gary Browne | Technical Manager, Developments
Online Services
University of Sydney Library
THE UNIVERSITY OF SYDNEY
Level 1, Fisher Library F03, The University of Sydney NSW 2006
T +61 2 9351 5946 | M +61 405 647 868
E gary.bro...@sydney.edu.au 

 



Re: [dspace-tech] Special characters in metadata

2017-08-10 Thread Claudia Jürgen

Hello Gary,

try something like https://yaytext.com/strike/

Hope this helps

Claudia



On 11.08.2017 at 01:38, Gary Browne wrote:

> Hi all,
>
> We have an item submitted where the title has strikethrough text in one of
> the title words - how can this be represented in item metadata?
>
> I tried adding Blah, wrapped in an HTML strikethrough tag, to the title in
> the metadata (dc.title), but it doesn't parse the HTML.
>
> Thanks,
> Gary



--
Claudia Juergen
Eldorado

Technische Universität Dortmund
Universitätsbibliothek
Vogelpothsweg 76
44227 Dortmund

Tel.: +49 231-755 40 43
Fax: +49 231-755 40 32
claudia.juer...@tu-dortmund.de
www.ub.tu-dortmund.de

Important note: The information included in this e-mail is confidential. It is 
solely intended for the recipient. If you are not the intended recipient of 
this e-mail please contact the sender and delete this message. Thank you. 
Without prejudice of e-mail correspondence, our statements are only legally 
binding when they are made in the conventional written form (with personal 
signature) or when such documents are sent by fax.
