Dear Jörn,

On 22.08.2013, at 23:49, Jörn Friedrich Dreyer wrote:
> The warnings about pdf and word are from getid3 lib and can be ignored if you
> are using search lucene. It comes with special indexers for these filetypes.
>
> The error about not being able to determine the file format for txt files
> also is from getid3 and might be caused by empty txt files.

Can we get rid of these error messages?

> Can you check if the reported txt file has 0 bytes? Can you search for a text
> in the pdf or word files and see if you get any results?

The text file is not empty. We have manually scheduled a re-scan of all files, and this might be the reason that *some* search terms now yield results in that txt file; we also get hits inside the PDF file. So, in principle, search_lucene does seem to do something. Is there a way to monitor what lucene is doing exactly, and whether it has already indexed a particular file at all?

However, simple matching of file names (which should be much simpler, and is really helpful if you have a nested directory structure with many files) is not nearly as good as it could be: it requires the full "readme" before "readme.txt" is offered as a hit; likewise, all characters of "tourismus" must be typed before "tourismus.jpg" turns up as a potential hit. Similarly, "Serverraum" finds "Serverraum" in a PDF, but "server" or "raum" triggers nothing. I will not say that this is useless, but it does not compare favorably with either the Google or the Spotlight search engine. Is this maybe something that is configurable?

Many thanks in advance.

Warm regards,
 Stefan

>
> So long
>
> Jörn
>
> Stefan Vollmar <[email protected]> wrote:
> Hello,
>
> we seem to have problems with indexing files - this apparently works well for
> some files and does not for others - so far we have not worked out a pattern.
>
> uname -a
> Linux owncloud 3.5.0-39-generic #60~precise1-Ubuntu
>
> ownCloud 5.0.10
>
> Error messages in /owncloud/data/owncloud.log (see below) seem to suggest
> that the file type of simple ".txt" files could not be determined? These
> days, I would also expect indexing of PDF data - but a failure to index
> ".txt" files definitely sounds like a bug, right?
>
> Many thanks in advance.
>
> Best regards,
> Stefan
>
> {"app":"PHP","message":"iconv(): Detected an illegal character in input string at \/var\/www\/owncloud\/apps\/search_lucene\/3rdparty\/Zend\/Search\/Lucene\/Analysis\/Analyzer\/Common\/TextNum.php#58","level":2,"time":"2013-08-22T20:00:07+00:00"}
> {"app":"PHP","message":"Only variables should be passed by reference at \/var\/www\/owncloud\/apps\/search_lucene\/lib\/indexer.php#163","level":2,"time":"2013-08-22T20:02:33+00:00"}
>
> {"app":"search_lucene","message":"failed to extract meta information for \/stefan\/files\/x.pdf: PDF parsing not enabled in this version of getID3() [1.9.3-20111213]","level":2,"time":"2013-08-22T20:02:34+00:00"}
> {"app":"search_lucene","message":"failed to extract meta information for \/stefan\/files\/y.doc: MS Office (.doc, .xls, etc) parsing not enabled in this version of getID3() [1.9.3-20111213]","level":2,"time":"2013-08-22T20:02:55+00:00"}
> {"app":"search_lucene","message":"failed to extract meta information for \/stefan\/files\/z.txt: unable to determine file format","level":2,"time":"2013-08-22T20:03:22+00:00"}
> {"app":"search_lucene","message":"failed to extract meta information for \/stefan\/files\/z (2).txt: unable to determine file format","level":2,"time":"2013-08-22T20:03:42+00:00"}
>
> --
> This message was sent from my Android mobile phone with K-9 Mail.
> _______________________________________________
> Owncloud mailing list
> [email protected]
> https://mail.kde.org/mailman/listinfo/owncloud

--
Dr. Stefan Vollmar, Dipl.-Phys.
Head of IT group
Max-Planck-Institut für neurologische Forschung
Gleuelerstr. 50, 50931 Köln, Germany
Tel.: +49-221-4726-213  FAX +49-221-4726-298
Tel.: +49-221-478-5713  Mobile: 0160-93874279
Email: [email protected]
http://www.nf.mpg.de
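Regarding Jörn's question about empty txt files and the iconv() warning, the owncloud.log excerpt quoted above can be scanned mechanically. What follows is a minimal Python sketch, assuming that each log line is one JSON object as in the excerpt and that the paths logged by search_lucene (e.g. /stefan/files/z.txt) are relative to the ownCloud data directory. The script, its function names and its two command-line arguments are hypothetical and not part of ownCloud. It lists the files for which meta-information extraction failed and, for .txt files, reports their size and whether their content is valid UTF-8. A file that is not valid UTF-8 is one plausible, unconfirmed cause of the iconv() "illegal character" warning.

```python
import json
import re
import sys
from pathlib import Path

# Assumption: every line of owncloud.log is one JSON object with the keys
# "app", "message", "level" and "time", as in the excerpt quoted above.
FAILED_RE = re.compile(r"^failed to extract meta information for (.+?): (.*)$")


def read_records(lines, app="search_lucene"):
    """Yield decoded log records written by `app`, skipping non-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a truncated or wrapped line
        if isinstance(record, dict) and record.get("app") == app:
            yield record


def failed_extractions(records):
    """Return (path, reason) pairs from 'failed to extract meta information' messages."""
    pairs = []
    for record in records:
        match = FAILED_RE.match(str(record.get("message", "")))
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def inspect_file(path):
    """Return (size in bytes, whether the content decodes as UTF-8)."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return len(data), False
    return len(data), True


def report(log_file, data_dir):
    """Print one line per failed extraction; inspect .txt files on disk."""
    with open(log_file, encoding="utf-8", errors="replace") as fh:
        for owncloud_path, reason in failed_extractions(read_records(fh)):
            # Assumption: logged paths are relative to the ownCloud data directory.
            local = Path(data_dir) / owncloud_path.lstrip("/")
            if local.suffix.lower() == ".txt" and local.is_file():
                size, utf8 = inspect_file(local)
                print(f"{owncloud_path}: {size} bytes, valid UTF-8: {utf8} ({reason})")
            else:
                print(f"{owncloud_path}: {reason}")


if __name__ == "__main__" and len(sys.argv) == 3:
    report(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as check_index_log.py, it could be run as `python3 check_index_log.py /owncloud/data/owncloud.log /owncloud/data`. It only reads what search_lucene chose to log, so it does not answer whether a particular file has actually been added to the index.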
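The matching behaviour described above ("readme" finds "readme.txt" but shorter inputs do not; "server" and "raum" do not find "Serverraum") is consistent with whole-token matching. The following Python sketch is purely illustrative. It is not search_lucene's code, and the tokenizer and function names are made up for this example. It contrasts whole-token, prefix and substring matching on the examples from this mail, treating file names and terms from file content the same way.

```python
import re


def tokens(text):
    """Lower-cased runs of letters and digits, e.g. 'readme.txt' -> ['readme', 'txt']."""
    return re.findall(r"[^\W_]+", text.lower())


def whole_token(query, text):
    # The query must equal one complete token.
    return query.lower() in tokens(text)


def prefix(query, text):
    # The query only has to be the beginning of some token.
    q = query.lower()
    return any(t.startswith(q) for t in tokens(text))


def substring(query, text):
    # The query may occur anywhere inside a token.
    q = query.lower()
    return any(q in t for t in tokens(text))


STRATEGIES = {"whole-token": whole_token, "prefix": prefix, "substring": substring}


def search(query, items, strategy):
    """Return the items that match `query` under the named strategy."""
    match = STRATEGIES[strategy]
    return [item for item in items if match(query, item)]


if __name__ == "__main__":
    items = ["readme.txt", "tourismus.jpg", "Serverraum"]
    for query in ["read", "readme", "server", "raum"]:
        for name in STRATEGIES:
            print(f"{query!r:10} {name:12} {search(query, items, name)}")
```

Prefix matching would already cover "read" and "server"; "raum" only matches as a substring because it sits inside the German compound "Serverraum". Substring matching over every indexed term is expensive on large indexes, which is why full-text engines usually rely on wildcard or prefix queries, n-gram indexing or decompounding filters instead. Whether search_lucene exposes any of these as a setting would still need to be checked.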
