Re: Config option (or hack) to disable fulltext indexing?

theod Thu, 3 Dec 2009 09:32:01 +0100

If you want you can also `pause' fulltext-indexing by rewinding the
time:

$ echo "UPDATE idxINDEX SET last_updated='2100-01-01' WHERE name='fulltext'" | \
   /opt/cds-invenio/bin/dbexec
and resume it manually later. (e.g. when encoding troubles are solved)

Super! That's _exactly_ what I was looking for...


We should not be generating such errors.  What version of pdftotext
(xpdf, poppler) are you running?  Is its output UTF-8 perfect?  Is your
DB running in nice UTF-8 mode?  Can we get the test file to check if our
dev branch behaves fine?

I was running poppler 0.6.x, but after your reply I realized that several new (stable) versions have been released for that package, so I updated to 0.10.5... I'm still getting the same errors. The output of "pdftotext -enc UTF-8 input.pdf output.txt" is not perfect (some words in the exported text file are split the wrong way, probably because non-Latin 2-byte characters are not taken into consideration, but this is not your fault :). Having said that, adding the "-layout" switch solves the problem. Oh, and I should probably mention that some of our pdf docs are simply jpg images converted to pdf. Running pdftotext on these will probably create a lot of garbage...
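One way to answer the "is the output UTF-8 perfect?" question mechanically is to round-trip the extracted text through iconv. This is only a sketch: the function name is_valid_utf8 is made up for illustration, and it checks byte-level validity only, so it will not catch wrongly split words or garbage extracted from image-only PDFs.

```shell
# Hypothetical helper (not part of Invenio or poppler):
# succeeds only if the given file decodes cleanly as UTF-8.
is_valid_utf8() {
    # iconv exits non-zero at the first byte sequence it cannot decode.
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example usage (file names are placeholders):
#   pdftotext -enc UTF-8 -layout input.pdf output.txt
#   is_valid_utf8 output.txt && echo "valid UTF-8" || echo "invalid bytes found"
```

A file that passes this check can still contain nonsense text; the check only rules out encoding errors as the cause.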


mysql should be ok as far as charset/collation is concerned:
character set client        utf8
character set connection    utf8
character set database      utf8
character set filesystem    binary
character set results       utf8
character set server        utf8
character set system        utf8
collation connection        utf8_unicode_ci
(Global value)              utf8_general_ci
collation database          utf8_general_ci
collation server            utf8_general_ci

btw, you are more than welcome to use the fulltext in order to perform any test you wish!

Just for the history of things, I'm using the Upload_Files.py websubmit function, so up to now I couldn't take advantage of the template (*.tpl) files to insert 8564_u into MARC (but even without it, invenio is smart enough to figure out the related fulltext files). Having said that, I was recently asked to put the fulltext links in the search page as well, so I had to run bibdocfile --fix-marc for some collections. Several 856s were created, and after the scheduled bibindex was run, I began to get the registered exceptions.

I'm not sure, but the exception _seems_ to be thrown only for filenames that contain spaces and/or greek characters... I'll be happy to give you any additional info/fulltext files/logs/etc you may need...
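To test that suspicion, the candidate files could be listed up front. The sketch below is an assumption-laden illustration: list_suspect_names is a made-up name, the directory to scan is whatever holds the fulltext files on a given installation, and paths containing newlines are not handled.

```shell
# Hypothetical helper: print files under directory $1 whose base name
# contains a space or any byte outside printable ASCII (e.g. Greek in UTF-8).
list_suspect_names() (
    # Runs in a subshell so the locale change does not leak to the caller.
    LC_ALL=C
    export LC_ALL
    find "$1" -type f | while IFS= read -r path; do
        name=${path##*/}
        case $name in
            # Matches any character that is NOT in the range '!'..'~'.
            *[!!-~]*) printf '%s\n' "$path" ;;
        esac
    done
)
```

Comparing this list against the records named in the exceptions would show whether the correlation with spaces and greek characters actually holds.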


Best regards,
Theodoropoulos Theodoros

ps. Should I not be worried by the fact that ghostscript is also complaining ("GPL Ghostscript 8.62: Unrecoverable error, exit code 1")?
