A successful execution of update-discovery-index -b with the proper LANG
environment variable (pt_BR, UTF-8) seems to have fixed the issue.
Regards,
Alcides Carlos de Moraes Neto
Sometimes I think we're alone. Sometimes I think we're not. In either
case, the thought is staggering.
- R. Buckminster
As I suspected, it's the SOLR index that's messed up.
Executing this SOLR query:
I ran update-discovery-index -f, but the results still show encoding issues.
http://www2.senado.leg.br/bdsf/discover?filtertype_0=type&filter_relational_operator_0=notequals&filter_0=not%C3%ADcia+de+jornal&submit_apply_filter=Aplicar&query=andr%C3%A9+luiz+lopes+de+alcantara
I'm stumped right now,
Just a follow-up.
filter-media -f seems to have fixed the issue with the OCR .txt files,
but some search results still show encoding issues.
I believe I need to regenerate the Solr index.
Regards,
Alcides Carlos de Moraes Neto
Hi Alcides,
We edit the Tomcat file server.xml to force UTF-8:
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="2"
               URIEncoding="UTF-8"
               redirectPort="8443" />
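
To illustrate why the URIEncoding attribute matters, here is a small Python sketch (not Tomcat code) that percent-decodes part of the query value from the search URL quoted in this thread using two different charsets. It assumes the servlet container would otherwise decode request URIs as ISO-8859-1, which was the default in older Tomcat releases; the variable names are only illustrative.

```python
from urllib.parse import unquote_plus

# Percent-encoded UTF-8, as a browser would send it in the query string.
raw = "andr%C3%A9+luiz"

# Decoding with the charset the bytes were actually written in.
as_utf8 = unquote_plus(raw, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 (assumed fallback charset).
as_latin1 = unquote_plus(raw, encoding="latin-1")

print(as_utf8)    # andré luiz
print(as_latin1)  # andrÃ© luiz
```

If the second form shows up in search terms, the connector setting is a likely suspect; if the terms look right and only the indexed text is garbled, the problem is probably elsewhere.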
Regards,
Tiago R. M. Murakami
Comunicação Científica e Acadêmica
Departamento Técnico -
Hello helix, thank you for your input.
Indeed, it is a problem with the filter-media generated txt.
Running filter-media -f resolved the issue for this specific item. I have
scheduled a full filter-media -f run over the repository for tonight.
Regards,
Alcides Carlos de Moraes Neto
2013/9/3 helix84
Thank you all,
Tomcat is set to URIEncoding=UTF-8, so that's not the issue.
I suspect that filter-media is generating invalid .txt files, but I haven't
found anything yet.
Regards,
Alcides Carlos de Moraes Neto
2013/9/3 Tiago Rodrigo Marçal Murakami tiago.murak...@dt.sibi.usp.br
Hi Alcides,
We
On Tue, Sep 3, 2013 at 1:24 AM, Alcides Carlos de Moraes Neto
alcides.n...@gmail.com wrote:
I have checked the .txt files that media-filter generates; they are all UTF-8.
What I see (see attachment) looks like double-encoded UTF-8 (it
happens when a charset converter is told that a file is to be encoded
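
As an illustration of the double-encoding mechanism described above, the following Python sketch reproduces and reverses it. It assumes the wrong intermediate charset is Latin-1 (ISO-8859-1); the thread does not establish which charset or which component was actually involved, and both function names are hypothetical.

```python
def double_encode(text: str, wrong_charset: str = "latin-1") -> bytes:
    """Simulate a converter that misreads UTF-8 bytes as `wrong_charset`
    and then encodes the misread text to UTF-8 a second time."""
    utf8_bytes = text.encode("utf-8")
    misread = utf8_bytes.decode(wrong_charset)
    return misread.encode("utf-8")


def undo_double_encoding(data: bytes, wrong_charset: str = "latin-1") -> str:
    """Reverse the mistake. May raise UnicodeError when the data
    was not double-encoded in this way."""
    return data.decode("utf-8").encode(wrong_charset).decode("utf-8")


original = "André"
garbled = double_encode(original)

print(garbled.decode("utf-8"))        # AndrÃ©
print(undo_double_encoding(garbled))  # André
```

Note that the garbled bytes are still valid UTF-8, so a check that only asks "is this file UTF-8?" will not catch this kind of corruption.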
Hello all,
We have this problem with our current DSpace 3.1 installation: Discovery
search results show some invalid characters due to encoding issues.
Only the full text search/highlight portion of the results has this
problem. Example:
Is it a problem that the extension is .pdf.txt?
On Mon, 2013-09-02 at 20:24 -0300, Alcides Carlos de Moraes Neto wrote:
Hello all,
We have this problem with our current DSpace 3.1 installation:
Discovery search results show some invalid characters due to encoding
issues.
Only the full