Re: [dspace-tech] Re: [Dspace-tech] Media Filter does not work with non-english documents (?) SOLVED

2018-10-14 Thread Alan Orth
Hi,

If you're using Tomcat 7, make sure the HTTP connector in server.xml is set
to UTF-8 via its URIEncoding attribute (UTF-8 is already the default in
Tomcat 8).
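
To illustrate why the connector's URI encoding matters, here is a minimal Java sketch (not Tomcat or DSpace code; the class and method names are made up for this example). It decodes the same percent-encoded query value once as UTF-8 and once as ISO-8859-1, the charset that Tomcat 7 assumes for URIs unless told otherwise:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriEncodingDemo {
    // Decode a percent-encoded value using the named charset.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // A four-letter Polish word with three accented letters,
        // percent-encoded from its UTF-8 bytes as a browser would send it.
        String raw = "%C5%82%C3%B3d%C5%BA";

        String asUtf8 = decode(raw, "UTF-8");
        String asLatin1 = decode(raw, "ISO-8859-1");

        // Decoded as UTF-8 the word has its original four characters.
        System.out.println("UTF-8 length: " + asUtf8.length());
        // Decoded as ISO-8859-1 every byte becomes its own character,
        // so the accented letters turn into two-character garbage.
        System.out.println("ISO-8859-1 length: " + asLatin1.length());
    }
}
```

With the wrong charset the word comes out longer than it went in, which is the kind of mismatch that breaks search terms and item names containing non-ASCII characters.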

 wrote:

> Hi Dimitris,
>
> can you elaborate on "changing the system locale to UTF-8"?
> When I view the extracted-text bitstream in a browser, special characters
> (e.g. Polish) are not displayed correctly, but search seems to work fine.
> Is that expected, or is there a way to make text extraction consistent
> with UTF-8 encoding? I use DSpace 6.3.
>
> Best, Peter
>
> On Monday, 24 August 2015 at 19:14:16 UTC+1, Dimitrios
> A. Koutsomitropoulos wrote:
>>
>>
>>
>> It simply was a matter of changing the system locale to UTF-8. Of course
>> verbose output is still not readable (no ??? this time though), but
>> searching works ok.
>>
>> Many thanks,
>>
>> Dimitris
>>
>> > -----Original Message-----
>> > From: Stuve, David H [mailto:david...@hp.com]
>> > Sent: Wednesday, November 17, 2004 9:22 PM
>> > To: Dimitrios A. Koutsomitropoulos; dspac...@lists.sourceforge.net
>> > Subject: RE: [Dspace-tech] Media Filter does not work with
>> > non-english documents (?)
>> >
>> > Hi Dimitrios,
>> >
>> > Have you tried searching for Greek words that should be
>> > extracted and can't find them?  It is possible that the text
>> > extraction is working properly and the -verbose flag just
>> > isn't printing the extracted text correctly.  However, if
>> > search isn't finding the terms that should be there, then
>> > there is probably a bug in the encoding used by the filtering code.
>> >
>> > Dave
>> >
>> > -----Original Message-----
>> > From: dspace-t...@lists.sourceforge.net
>> > [mailto:dspace-t...@lists.sourceforge.net] On Behalf Of
>> > Dimitrios A. Koutsomitropoulos
>> > Sent: Monday, November 15, 2004 11:40 AM
>> > To: dspac...@lists.sourceforge.net
>> > Subject: [Dspace-tech] Media Filter does not work with
>> > non-english documents (?)
>> >
>> >
>> >
>> >
>> > Hello,
>> >
>> > I notice that the media filter facility, and particularly the
>> > PDF and MS Word filtering, does not work well with Greek documents.
>> >
>> > When executing filter-media in verbose mode I get a series of
>> > question marks (????) while English text shows correctly.
>> >
>> > I've tried to run MediaFilterManager with the
>> > -Dfile.encoding = UTF-8 parameter but still...
>> >
>> > I also changed the PDFFilter.java method getDestinationStream to:
>> >
>> >
>> > public InputStream getDestinationStream(InputStream source)
>> >         throws Exception
>> > {
>> >     // get input stream from bitstream
>> >     // pass to filter, get string back
>> >     PDFTextStripper pts = new PDFTextStripper();
>> >     PDFParser parser = new PDFParser(source);
>> >
>> >     parser.parse();
>> >
>> >     COSDocument cos = parser.getDocument();
>> >
>> >     String extractedText = pts.getText(parser.getDocument());
>> >     // round trip through UTF-8; for well-formed text this leaves
>> >     // the String unchanged
>> >     extractedText = new String(extractedText.getBytes("UTF-8"),
>> >             "UTF-8");
>> >
>> >     // now close the pdf
>> >     cos.close();
>> >
>> >     // if verbose flag is set, print out extracted text to STDOUT
>> >     if (MediaFilterManager.isVerbose)
>> >     {
>> >         System.out.println(extractedText);
>> >     }
>> >
>> >     // generate an input stream with the extracted text
>> >     byte[] textBytes = extractedText.getBytes("UTF-8");
>> >     ByteArrayInputStream bais = new ByteArrayInputStream(textBytes);
>> >
>> >     // will this work? or will the byte array be out of scope?
>> >     return bais;
>> > }
>> >
>> > But no luck.
>> >
>> >
>> > Is this the expected behavior or is there a workaround?
>> >
>> >
>> > Many thanks,
>> >
>> >
>> > Dimitrios A. Koutsomitropoulos, M.Sc.
>> >
>> > Computer & Informatics Engineer
>> > Postgraduate Researcher
>> > High Performance Information Systems Laboratory
>> >
>> >  Contact
>> >  e-mail: kots...@hpclab.ceid.upatras.gr
>> >  work:  +30 2610 993805
>> >  fax:+30 2610 997706
>> >  http://www.hpclab.ceid.upatras.gr
>> >
>> >
>> >
>> >
>> > ___
>> > DSpace-tech mailing list
>> > dspac...@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/dspace-tech
>> >
>> >
>>
>>
>>
>> --
> All messages to this mailing list should adhere to the DuraSpace Code of
> Conduct: https://duraspace.org/about/policies/code-of-conduct/
> ---
> You received this message because you are subscribed to the Google Groups
> "DSpace Technical Support" group.
> To unsubscribe from 
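
The 2004 exchange above turns on two points: re-encoding a Java String to UTF-8 and back does not change it, and verbose output can show question marks simply because the output stream's charset cannot represent the characters. The following is a minimal sketch of both, assuming nothing about DSpace itself; the class and method names are invented for the example, and US-ASCII stands in for a non-UTF-8 system locale.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class VerboseEncodingDemo {
    // Print text through a PrintStream that uses the named charset and
    // return the raw bytes that reach the underlying stream.
    static byte[] printWith(String text, String charsetName) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Greek letters alpha, beta, gamma, written as escapes so the
        // source file itself stays plain ASCII.
        String greek = "\u03b1\u03b2\u03b3";

        // Encoding to UTF-8 and decoding again yields an equal String.
        String roundTrip = new String(
                greek.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println("round trip unchanged: " + roundTrip.equals(greek));

        // A stream whose charset cannot encode Greek substitutes '?'.
        byte[] ascii = printWith(greek, "US-ASCII");
        System.out.println("US-ASCII output: "
                + new String(ascii, StandardCharsets.US_ASCII));

        // A UTF-8 stream writes two bytes per Greek letter instead.
        byte[] utf8 = printWith(greek, "UTF-8");
        System.out.println("UTF-8 byte count: " + utf8.length);
    }
}
```

This is consistent with Dave's suggestion that extraction may be fine while only the printed output is wrong: the bytes handed to the search index are produced with an explicit charset, whereas System.out follows the platform default.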

Re: [dspace-tech] Thumbnails and sRGB

2018-10-14 Thread Alan Orth
Hello,

The CMYK support is only related to correct handling of colors when
creating JPG thumbnails from CMYK PDFs. If you set the cmyk_profile and
srgb_profile options in the DSpace configuration, filter-media will attempt
to detect if a PDF is using the CMYK color space. This does not affect PDFs
that are using the sRGB color space.

To test whether it's working you need to find a PDF that is using the CMYK
color space. Here is one on our repository[0]:

$ identify alc_contrastes_desafios.pdf\[0\]
alc_contrastes_desafios.pdf[0]=>alc_contrastes_desafios.pdf PDF 612x792
612x792+0+0 16-bit ColorSeparation CMYK 1.8491MiB 0.090u 0:00.080

Without this filter-media patch the colors in the resulting thumbnail were
horrible and inaccurate (our editors even noticed)!
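
If you want to script that check across many files, something like the following Java sketch could work. It is only an illustration and not how filter-media detects the color space: the class and method names are hypothetical, it simply looks for a CMYK token in a line of identify output, and the second sample line is invented for contrast.

```java
public class IdentifyColorspace {
    // Return true when a line of `identify` output contains the
    // whitespace-separated token "CMYK".
    static boolean reportsCmyk(String identifyLine) {
        for (String token : identifyLine.trim().split("\\s+")) {
            if (token.equals("CMYK")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Output quoted from the message above, joined onto one line.
        String cmykLine = "alc_contrastes_desafios.pdf[0]=>alc_contrastes_desafios.pdf"
                + " PDF 612x792 612x792+0+0 16-bit ColorSeparation CMYK"
                + " 1.8491MiB 0.090u 0:00.080";
        // Hypothetical line for an sRGB PDF, made up for this example.
        String srgbLine = "example.pdf PDF 612x792 612x792+0+0 16-bit sRGB"
                + " 100KB 0.000u 0:00.000";

        System.out.println(reportsCmyk(cmykLine)); // prints "true"
        System.out.println(reportsCmyk(srgbLine)); // prints "false"
    }
}
```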

Hope that helps,

[0] https://hdl.handle.net/10568/51999

On Sun, Oct 14, 2018 at 12:30 AM admin  wrote:

> Hi,
>
> I enabled:
> org.dspace.app.mediafilter.ImageMagickThumbnailFilter.cmyk_profile =
> /usr/share/color/icc/ghostscript/default_cmyk.icc
> org.dspace.app.mediafilter.ImageMagickThumbnailFilter.srgb_profile =
> /usr/share/color/icc/ghostscript/default_rgb.icc
>
> 1. Should I now expect all generated thumbnails to have an sRGB profile
> embedded?
> 2. What does
> org.dspace.app.mediafilter.ImageMagickThumbnailFilter.cmyk_profile actually
> do? Does it embed a CMYK profile into the thumbnail? I would rather have
> thumbnails with only an sRGB profile; should I disable the CMYK filter then?
> If so, will CMYK files still get sRGB thumbnails?
>
> However, with these two lines uncommented in local.cfg, I noticed that
> color profiles are not embedded into the thumbnails after running
> filter-media -f. The paths seem to be valid.
>
>
> Peter
>


-- 
Alan Orth
alan.o...@gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche

-- 
All messages to this mailing list should adhere to the DuraSpace Code of 
Conduct: https://duraspace.org/about/policies/code-of-conduct/
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.


Re: [dspace-tech] Need to delete records on OAI

2018-10-14 Thread admin
Did you try running [dspace]/bin/dspace oai import -c?
I had many old records with old handles, and this command cleared them.
See: https://wiki.duraspace.org/display/DSDOC6x/OAI+2.0+Server
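
To confirm that an import actually changed the index, the numFound check described later in this thread can be scripted. Below is a small Java sketch that pulls numFound out of a Solr XML response. It is an assumption-laden illustration: the class and method names are made up, the sample response string is hypothetical (the count 97 is the figure mentioned in the quoted message), and a real script would fetch the response from the OAI Solr core first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SolrNumFound {
    // Matches the numFound attribute of the <result> element.
    private static final Pattern NUM_FOUND =
            Pattern.compile("numFound=\"(\\d+)\"");

    // Extract the numFound count from a Solr XML response body.
    static long parseNumFound(String solrXml) {
        Matcher m = NUM_FOUND.matcher(solrXml);
        if (!m.find()) {
            throw new IllegalArgumentException("no numFound attribute in response");
        }
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) {
        // Hypothetical response body, trimmed to the relevant element.
        String sample = "<response>"
                + "<result name=\"response\" numFound=\"97\" start=\"0\"></result>"
                + "</response>";
        System.out.println(parseNumFound(sample)); // prints "97"
    }
}
```

Comparing the count before and after the import shows whether the old records were really replaced.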

Peter

On Friday, 13 April 2018 at 19:51:41 UTC+1, Aprendiz 2018 wrote:
>
> Hello Andrea, I have a problem with DSpace and OAI-PMH similar to 
> Demis's: the records in OAI are not being updated. 
> ../oai/request?verb=ListSets still shows the old configuration and the old 
> records (33), although there are now 97. Here is the output of the command:
> # curl 'http://localhost:8080/solr/oai/select?q=*:*&indent=true&rows=0'
> 
> 
>
> 
>   0
>   0
>   
> *:*
> true
> 0
>   
> 
> 
> 
> 
>
>
> Any help would be appreciated.
> Thanks
>
> On Monday, 8 February 2016 at 23:15:17 (UTC-5), Andrea Schweer wrote:
>>
>> Hi,
>>
>> I'm really not sure what's going on. Your log shows DSpace posting 
>> information to Solr, which is what I'd expect. Have you checked the 
>> permissions of the Solr directory tree? It needs to be owned by the same 
>> user that Tomcat runs under ([dspace]/solr/oai and subdirectories).
>>
>> As a last resort, you could manually delete everything from that solr 
>> index, then run another import. It shouldn't really do anything different 
>> compared to clean-cache, but you never know. 
>>
>> If you run e.g.
>> curl 'http://localhost:8080/solr/oai/select?q=*:*&indent=true&rows=0'
>> then you'll see how many items the OAI solr core currently knows about -- 
>> look for numFound. numFound should be equal to the number of items (this 
>> might include non-public items and deleted items) after running import.
>>
>> You can run
>> curl --globoff http://localhost:8080/solr/oai/update -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
>> followed by
>> curl --globoff 'http://localhost:8080/solr/oai/update?commit=true'
>> to force-delete everything in that solr index (you might want to try this 
>> on a test server first).
>>
>> numFound in the first query above should then be 0. Run clean-cache and 
>> access your web interface *without* running the import command -- it too 
>> should tell you that there are 0 identifiers. If it still claims that there 
>> are items, you may have a web cache layer in front of your repository or 
>> something? If you do get 0 identifiers via the web interface after the 
>> delete, run the import command and check again. Hopefully you'll then see 
>> exactly those items you're expecting to see.
>>
>> Just a very final comment -- what are you looking at in the OAI 
>> interface? I just followed the links in your initial e-mail again and the 
>> OAI interface now shows 30 records/identifiers, the same number of items as 
>> shown in JSPUI. So perhaps your earlier steps actually did work and you're 
>> just looking at an out-of-date web page for some reason?
>>
>> cheers,
>> Andrea
>>
>> On 09/02/16 16:48, Demis Estabridis wrote:
>>
>> Hi Andrea, 
>>
>> Thanks again for your answer.
>> I can confirm that we're using SOLR as database source for OAI.
>>
>> We ran the suggested commands again. There were no errors in the logs, 
>> but nothing changed. 
>>
>> /opt/dspace/bin/dspace oai clean-cache
>> /opt/dspace/bin/dspace oai import -c
>>
>> Please find the logs below, with debug mode on.
>> We ran the commands as the tomcat user:
>>
>> 2016-02-08 22:24:29,115 INFO  org.dspace.core.ConfigurationManager @ 
>> Loading from classloader: file:/opt/dspace/config/dspace.cfg
>> 2016-02-08 22:24:29,148 INFO  org.dspace.core.ConfigurationManager @ 
>> Using dspace provided log configuration (log.init.config)
>> 2016-02-08 22:24:29,148 INFO  org.dspace.core.ConfigurationManager @ 
>> Loading: /opt/dspace/config/log4j.properties
>> 2016-02-08 22:24:33,149 DEBUG net.sf.ehcache.config.ConfigurationFactory 
>> @ Configuring ehcache from InputStream
>> 2016-02-08 22:24:33,271 DEBUG net.sf.ehcache.config.BeanHandler @ 
>> Ignoring ehcache attribute xmlns:xsi
>> 2016-02-08 22:24:33,271 DEBUG net.sf.ehcache.config.BeanHandler @ 
>> Ignoring ehcache attribute xsi:noNamespaceSchemaLocation
>> 2016-02-08 22:24:33,280 DEBUG 
>> net.sf.ehcache.config.DiskStoreConfiguration @ Disk Store Path: /tmp
>> 2016-02-08 22:24:33,306 DEBUG net.sf.ehcache.config.ConfigurationHelper @ 
>> No CacheManagerEventListenerFactory class specified. Skipping...
>> 2016-02-08 22:24:33,346 DEBUG net.sf.ehcache.config.ConfigurationHelper @ 
>> No BootstrapCacheLoaderFactory class specified. Skipping...
>> 2016-02-08 22:24:33,346 DEBUG net.sf.ehcache.config.ConfigurationHelper @ 
>> No CacheExceptionHandlerFactory class specified. Skipping...
>> 2016-02-08 22:24:35,926 INFO  org.dspace.storage.rdbms.DatabaseManager @ 
>> DBMS is 'PostgreSQL'
>> 2016-02-08 22:24:35,926 INFO  org.dspace.storage.rdbms.DatabaseManager @ 
>> DBMS driver version is '9.4.5'
>> 2016-02-08 22:24:35,966 INFO  org.dspace.storage.rdbms.DatabaseUtils @ 
>> Loading Flyway DB migrations from: filesystem:/opt/dspace/etc/postgres, 
>> 

[dspace-tech] filter-media vs. filter-media -f

2018-10-14 Thread admin
Hi,

I noticed that when I run filter-media (no params) via cron, the 
thumbnails look different from those generated manually via filter-media -f.
I did a test and:
1. The thumbnail generated via filter-media (cron) had no frame around the 
image and was a grayscale image (checked in Photoshop).
2. The thumbnail generated manually via filter-media -f had a frame around 
the image and was an RGB image.

Neither image had a color profile embedded (for this issue I started another 
topic: https://groups.google.com/forum/#!topic/dspace-tech/xVFm0WNYYIE).

So:
1. What causes these differences, and
2. How can I control the features of the resulting thumbnails when running 
filter-media (with or without parameters)?

Do these two commands use different plugins?


Peter



[dspace-tech] Re: filter-media vs. filter-media -f

2018-10-14 Thread admin
OK, I had enabled different plugins for the same purpose (such as PDFBox JPEG 
Thumbnail and ImageMagick PDF Thumbnail Generator), which probably caused the 
overlap.
Still, I'm curious why the same command gives different results with 
different parameters.




[dspace-tech] Moving repositories, need to modify existing handles

2018-10-14 Thread Gary Browne
Hi all,

Running DSpace 4.1 on Tomcat 7, RHEL 7.

We are moving our repository to another platform and I need to update the 
handles from our existing repository to point to the migrated items in the 
new location.

I am told I need to "recreate the current DSpace handles". Is that correct? 
How do I do that?

The new repository can give me a map of current handles against new URLs. I 
assume this is useful for doing the handle redirect/modification?

I don't know how to access the DSpace handle server and issue a modify 
command (or batch modify commands). Is there documentation somewhere on 
this, or can anyone describe the technical process please?

Thanks very much,
Gary
