Hi everyone,

I took the day off to polish the initial work on the Lucene Search app that was born during the sweet ownCloud developer meeting.
Before issuing a merge request I would like some more feedback on the integration with the web frontend, especially the async ajax calls, as I seem to be doing something wrong: my browser freezes while receiving an OC_Eventsource stream.

Let me give you a rundown of the current state and the hacks still in use:

After checking out the app from [1] it will automagically reindex your files (even encrypted files) upon a page reload. There is the first hack: currently I synchronize an indexer state table with the oc_fscache table on every web page reload (@klaas: WebDAV access bypasses the indexing for speed, but requires marking changed files as dirty). Upon a page reload I meant to use an ajax call to run the indexing in the background while the connection is open. This somehow locks my browser, so I'm doing it wrong. Nevertheless, it is happily building an index which will be used to present new search results. File deletion is also handled correctly and now cleans up the Lucene index.

Improvements:
* We now have full text search in plain text files!
* We now have full text search in HTML files by using the classes provided by Zend Lucene Search (BSD license)!
* We have limited full text search in PDF files with code from [2], which lacks a proper license [3] and features a github project [4] with outdated sources ... meh.
* We could use the nice Lucene query language [5], but I implemented it to be as similar to the current search as possible.

Problems I still need to figure out how to solve:
* The Zend classes for MS Office 2007 files use ZipArchive, which bypasses the OC_Filesystem layer and thus breaks indexing of encrypted files. @robin any idea?
* Still no support for Open/LibreOffice, ODF, older Word, RTF ... do we want to index source code?
* My ajax background code still locks the browser ... a progress bar on the status page would be nice.
  I tried to understand the ajax code from the gallery and calendar apps and copied some of the code to come up with something usable. At some point it stopped working and I switched to jQuery ajax calls instead of Eventsource ... and I admit I'm lost now. Furthermore, I would like to start the background indexing via ajax when a file has been uploaded.
* Can we somehow filter out or overwrite search results from the default search?

Tedious work:
* Store more meta information from getID3 in the index. This would obsolete the current database-based full text search. But then I would also like to merge the current Lucene search status table into the oc_fscache table. It has only one flag column, anyway.

I tried to document the code and hope everything is well in place and ready for inclusion in owncloud/master. Maybe disabled by default ;)

so long
Jörn

[1] https://gitorious.org/~butonic/owncloud/butonics-owncloud/trees/lucene_search/apps/search_lucene
[2] http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-2-pdf-data-extraction-437.html
[3] The "Our Philosophy" section on http://www.hashbangcode.com/about states the following: 'All of the code placed onto this site has been tested to the best of our ability and resources so it should work out of the box. If you spot any problems then please let us know! You should be aware the all the code here is "use at your own risk" and we can't take any responsibility for loss of data or server downtime as a result of the code on this site.'
[4] http://github.com/philipnorton42/PDFSearch

-- 
A. Because it breaks the logical sequence of discussion
Q. Why is top posting bad?
