Re: simple questions
D Bera [EMAIL PROTECTED] writes:

>> 4) This python/perl is needed because some of my e-mails are html-entity, quoted-printable, 8-bit, iso-8859-2, utf-8 encoded and so on. I guess this is not a problem for Beagle at all, i.e. it can search in any e-mail no matter how it is encoded. What about the attached files?
>
> If you throw email files (containing single emails, maildir style) at beagle, it knows how to index emails. Beagle will also take care of the attachments itself. Sometimes there is a problem in determining the mimetype of email files: instead of message/rfc822 they are recognized as text files by our mime type sniffer. So, if you can somehow ensure that the files that are sent to beagle have the mimetype explicitly set to message/rfc822, beagle will correctly index them for you.

Interesting: does this happen implicitly within the file crawler (i.e. not the other backends), or explicitly via the other backends (say the KMailQueryable)? Because if it's implicit, I might as well back out whatever I'm doing for the Gnus backend and work on MIME type detection. :)

P.S. I haven't really read a whole lot of the current Beagle codebase, so forgive me if the answer lies there -- I'm reading it while sending this out.

--
JM Ibanez
Software Architect
Orange Bronze Software Labs, Ltd. Co.
[EMAIL PROTECTED]
http://software.orangeandbronze.com/
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: simple questions
>>> 4) This python/perl is needed because some of my e-mails are html-entity, quoted-printable, 8-bit, iso-8859-2, utf-8 encoded and so on. I guess this is not a problem for Beagle at all, i.e. it can search in any e-mail no matter how it is encoded. What about the attached files?
>>
>> If you throw email files (containing single emails, maildir style) at beagle, it knows how to index emails. Beagle will also take care of the attachments itself. Sometimes there is a problem in determining the mimetype of email files: instead of message/rfc822 they are recognized as text files by our mime type sniffer. So, if you can somehow ensure that the files that are sent to beagle have the mimetype explicitly set to message/rfc822, beagle will correctly index them for you.
>
> Interesting: does this happen implicitly within the file crawler (i.e. not the other backends), or explicitly via the other backends (say the KMailQueryable)? Because if it's implicit, might as well back out whatever I'm doing for the Gnus backend and work on MIME type detection. :)

[Long email warning]

(There is a wiki page which might be helpful: http://beagle-project.org/Architecture_Overview)

The tedious work of extracting data from the physical files (or from files embedded in other files, like attachments) is done by the drones, aka Filters. The work of finding data to index from who_knows_which_weird_location and maintaining the state of some application's data (e.g. which mails are deleted in mbox-based mail apps, where deleted emails are not immediately removed from the disk) is done by the smart agents, aka Backends. Though it is possible, a Backend rarely extracts indexable data from a file itself. Backends merely set up a request to the filters to index a physical file (as you have done in the Gnus backend).

So, there is a generic Mail filter which can index all message/rfc822 emails. It is used by the Files backend to index any email messages it finds on the disk, and by all the mail backends.
The reason specialized mail backends exist is that they want to maintain/index some additional information not handled by the generic files backend. Typically the mail backends use some application-specific information.

So, the files backend can index any email file perfectly well. There are three general problems with this:

1) Sometimes mimetype detection misses a valid mail file and marks it as text. We use xdgmime mimetype detection, and the issue arises because some mail clients add a first line to the mail message that differs from what is expected. xdgmime checks the first line to detect a mail message.

2) Once an email is found as a search result, what should happen when a user clicks on it? If the emails come from the kmail, evolution or t-bird backends, beagle-search knows which application to open. For emails on the disk, I am not sure what to do.

3) Sometimes additional email-client-specific information could be useful. E.g. in the gnus backend you are writing, the folder name is something that the files backend will never be able to give you. Also, if you are able to parse the .overview files (or whatever other gnus-specific state files there are), they generally contain information that is worth reporting to the end user. The files backend will never be able to report such information.

In a sense the mail backend is a specialized files backend.

I would still encourage you to work on a gnus backend. But if you want a quick working solution, you can just use the files backend to index those email files. In case beagle is not able to detect the mimetypes properly, you can get away with writing an empty FilterGnus, subclassed from FilterMail, with

    AddSupportedFlavor (new FilterFlavor ("file:///path/to/~/Mail/*", null, null, 1));

This will force all files in ~/Mail/* to be indexed using FilterGnus and thus use FilterMail.

Warning: If there is any non-mail file in ~/Mail, the Mail filter might crash! You have been warned.
- dBera

--
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user
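Given the warning about non-mail files in ~/Mail, it may be worth checking a directory before forcing everything through the Mail filter. The following is a minimal Python sketch, not Beagle or xdgmime code: it uses the standard library `email` parser and treats a file as a mail message when at least two common headers are present. The names `looks_like_mail` and `non_mail_files`, the header list and the two-header threshold are all assumptions made for illustration; xdgmime's real rule is a match on the start of the file and may well differ.

```python
from email import message_from_bytes
from pathlib import Path

# Headers taken as evidence of an RFC 822 message.
# Illustrative assumption only, not xdgmime's actual rule.
COMMON_HEADERS = ("From", "To", "Subject", "Date", "Message-ID", "Received")


def looks_like_mail(raw: bytes) -> bool:
    """Return True if raw parses as a message with >= 2 common headers."""
    # Drop an mbox-style "From " envelope line so it cannot confuse the check.
    if raw.startswith(b"From "):
        raw = raw.split(b"\n", 1)[1] if b"\n" in raw else b""
    msg = message_from_bytes(raw)
    found = [name for name in COMMON_HEADERS if msg[name] is not None]
    return len(found) >= 2


def non_mail_files(directory: str, probe_bytes: int = 8192) -> list:
    """List regular files under directory that do not look like mail."""
    suspects = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            # Only the header block matters, so read a bounded prefix.
            head = handle.read(probe_bytes)
        if not looks_like_mail(head):
            suspects.append(path)
    return suspects


if __name__ == "__main__":
    import sys

    for suspect in non_mail_files(sys.argv[1]):
        print(suspect)
```

Run it with the mail directory as its only argument; any path it prints is a candidate for moving out of the way before indexing. Because it is a heuristic, a message with unusual headers could be reported as well.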
Re: simple questions
Hi,

On 4/21/07, Nagyon Almos [EMAIL PROTECTED] wrote:
> 1) I would like to search in some directories outside my home in which I don't have write permission. I added those directories with AddRoot. Beagle seems to ignore those files. Can I make beagle index them right now, and if yes, how?

I don't think there's any way to do it immediately; IIRC they will be processed after the first root (probably your home directory) is crawled. I'll double check on this and look into changing the behavior. I seem to remember the reason why it isn't processed until the end is to avoid unnecessary disk thrash. Since the added roots are often on the same hard drive but in a different partition, crawling them simultaneously causes a lot of disk seeks and hits the page cache pretty hard.

> Or does beagle need write permissions (for acls or anything)?

It doesn't need write permissions, although they do provide a performance benefit (because of Beagle's use of extended attributes).

> 2) I am trying to make some of my own filters, hence I would like to specify the extensions (in the filter xml file) with egrep-style regular expressions. Is it possible?

Do you mean in the external-filters.xml file? Extensions are matched exactly there, so you'll have to provide multiple extensions to match; there's no regex matching there.

> 3) Does anyone have any experience with Thunderbird, Dovecot (maildir) and Beagle? I tried to look through the source but it is well beyond my programming knowledge.

There is a Thunderbird backend for Beagle, but it's disabled by default due to some issues with its memory consumption. Can you give a little more detail on your setup?

> 4) This python/perl is needed because some of my e-mails are html-entity, quoted-printable, 8-bit, iso-8859-2, utf-8 encoded and so on. I guess this is not a problem for Beagle at all, i.e. it can search in any e-mail no matter how it is encoded.

It should be able to handle these.
Beagle uses the GMime library to handle emails, so it should do any encoding conversions itself.

> What about the attached files?

I don't see any files attached? Or do you mean generically files attached to emails? Beagle indexes those.

Joe
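To make the encoding point concrete: the sketch below is not GMime or Beagle code, only Python's standard `email` package performing the same kind of conversion. The message in `RAW` is made up for illustration; it has an RFC 2047 encoded subject and a quoted-printable ISO-8859-2 body, and it borrows the search term "árvíz" from this thread. The function name `decode_message` is likewise hypothetical.

```python
from email import message_from_bytes, policy

# A made-up message: ISO-8859-2 text, quoted-printable body,
# RFC 2047 encoded-word in the Subject header.
RAW = (
    b"From: sender@example.com\r\n"
    b"Subject: =?iso-8859-2?Q?=E1rv=EDz?=\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-2\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"=E1rv=EDzt=FBr=F5\r\n"
)


def decode_message(raw: bytes):
    """Return (subject, body) as Unicode strings, whatever the wire encoding."""
    # policy.default makes header access return decoded text and enables
    # get_body()/get_content() on the parsed message.
    msg = message_from_bytes(raw, policy=policy.default)
    subject = str(msg["Subject"])
    # get_content() undoes the transfer encoding and the charset in one step.
    body = msg.get_body(preferencelist=("plain",)).get_content()
    return subject, body.strip()


if __name__ == "__main__":
    subject, body = decode_message(RAW)
    print(subject)
    print(body)
```

Once the text has been normalized like this, a search term matches regardless of whether the sender used iso-8859-2, utf-8, quoted-printable or 8-bit on the wire, which is the behaviour described above for Beagle.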
Re: simple questions
> This should work. I never tried it but I will try it sometime and let you know. Do you see any specific error in the log files? Are you sure those directories are added correctly - in log file of beagled, in the initial 50-60 lines, there should be a line saying Adding root /

I did some experiments on this and:
- if there is a symbolic link pointing to such a directory, then beagle follows it and finds the results,
- if I just add them with AddRoot, then beagle says:

    [...]
    Debug: Adding root: /home/tarol/test_files
    Debug: '' now seems to be called '/home/tarol/test_files'
    Debug: Loading Beagle.Util.Conf DaemonConfig from daemon.xml
    Debug: Loading Beagle.Util.Conf SearchingConfig from searching.xml
    Debug: Loading Beagle.Util.Conf NetworkingConfig from networking.xml
    Debug: Loading Beagle.Util.Conf WebServicesConfig from webservices.xml
    Debug: Parsed query 'árv' as text_query
    Debug: Parsed query 'árvíz' as text_query
    Debug: Parsed query 'árvíz*' as text_query
    Debug: Couldn't find path of file with name 'arvizstb.pdf' and parent 'CUJNP0ylXkSYY5ZqKOCVUw'
    Debug: Couldn't find path of file with name 'arvizstb.sxw' and parent 'CUJNP0ylXkSYY5ZqKOCVUw'
    [...]

Those files contain strings that match árvíz*.

> Also, as Stephan pointed out, you can use a static index to index such files.

I haven't tested static indices yet, but in my opinion it should work this way.

>> 2) I am trying to make some of my own filters, hence I would like to specify the extensions (in the filter xml file) with egrep-style regular expressions. Is it possible?

I tried it too, but this didn't work. I specified

    <filter>
      <mimetype>text/x-vcard</mimetype>
      <extension>.(vcf|vcard)</extension>
      <command>cat</command>
      <arguments>%s</arguments>
    </filter>

in the xml file and then beagle said:

    Debug: Saw event in '/home/user'
    Debug: *** Add '/home/user' 'szilard.vcf' (file)
    Debug: -file:///home/user/szilard.vcf
    Debug: file:///home/user/szilard.vcf
    Debug: No filter for /home/user/szilard.vcf (text/directory)

Thanks for your help.

Regards,
A.
Re: simple questions
Hi,

On 4/23/07, Nagyon Almos [EMAIL PROTECTED] wrote:
>> 2) I am trying to make some of my own filters, hence I would like to specify the extensions (in the filter xml file) with egrep-style regular expressions. Is it possible?
>
> I tried it too, but this didn't work. I specified
>
>     <filter>
>       <mimetype>text/x-vcard</mimetype>
>       <extension>.(vcf|vcard)</extension>
>       <command>cat</command>
>       <arguments>%s</arguments>
>     </filter>

Indeed, this won't work. You have to do:

    <extension>.vcf</extension>
    <extension>.vcard</extension>

You can specify multiple (or zero) extension and mimetype conditions, any of which match.

Joe
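Since patterns are not expanded, one option is to let a small script write out one extension element per suffix. This is only a sketch using Python's `xml.etree.ElementTree`: the element names are the ones used in the example in this thread, the helper name `build_filter` is made up here, and the enclosing root element of external-filters.xml is left out because the thread does not show it.

```python
import xml.etree.ElementTree as ET


def build_filter(mimetype, extensions, command, arguments):
    """Build a <filter> element with one <extension> child per suffix."""
    flt = ET.Element("filter")
    ET.SubElement(flt, "mimetype").text = mimetype
    # Extensions are matched exactly, so each one gets its own element.
    for ext in extensions:
        ET.SubElement(flt, "extension").text = ext
    ET.SubElement(flt, "command").text = command
    ET.SubElement(flt, "arguments").text = arguments
    return flt


if __name__ == "__main__":
    element = build_filter("text/x-vcard", [".vcf", ".vcard"], "cat", "%s")
    # Prints the element as a single line of XML.
    print(ET.tostring(element, encoding="unicode"))
```

The printed fragment can then be pasted into the existing filter file by hand; the script does not try to read or modify that file.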
Re: simple questions
I am sorry, I was too quick. (We seem to have posted at the same time.)

Joe Shaw wrote:
> Hi,
>
> I don't think there's any way to do it immediately; IIRC they will be processed after the first root (probably your home directory) is crawled.

This is not a problem for me. I just asked.

[..]

I accidentally deleted the text, but the point is: beagle now indexes the extra directory and finds the results. (Crawling my home directory just took a long time.)

> Do you mean in the external-filters.xml file? Extensions are matched exactly there, so you'll have to provide multiple extensions to match; there's no regex matching there.

Yes, there.

> There is a Thunderbird backend for Beagle, but it's disabled by default due to some issues with its memory consumption. Can you give a little more detail on your setup?

I am trying to switch to Dovecot, but before that I want to make sure that it can do what I want:
- allow simultaneous access independently of the mail user agent (like pine, mutt or thunderbird),
- handle e-mails separately (maildir, just in case),
- let me use beagle to search in the e-mails independently of the MUA used.

> It should be able to handle these. Beagle uses the GMime library to handle emails, so it should do any encoding conversions itself.

This is good news. I have been switching from iso-8859-2 to utf-8 everywhere (filenames, email encodings, html encodings, tex files and so on) and this is a big pain for me.

> I don't see any files attached? Or do you mean generically files attached to emails? Beagle indexes those.

Good to hear that. Thank you for the information.

Best wishes,
A.
simple questions
Hello all,

I have some (very) simple questions about beagle and I could not find anything about them.

1) I would like to search in some directories outside my home in which I don't have write permission. I added those directories with AddRoot. Beagle seems to ignore those files. Can I make beagle index them right now, and if yes, how? Or does beagle need write permissions (for acls or anything)?

2) I am trying to make some of my own filters, hence I would like to specify the extensions (in the filter xml file) with egrep-style regular expressions. Is it possible?

3) Does anyone have any experience with Thunderbird, Dovecot (maildir) and Beagle? I tried to look through the source but it is well beyond my programming knowledge. But if there were egrep regex extension support, then I could write my very own filter for those files using python or perl libraries...

4) This python/perl is needed because some of my e-mails are html-entity, quoted-printable, 8-bit, iso-8859-2, utf-8 encoded and so on. I guess this is not a problem for Beagle at all, i.e. it can search in any e-mail no matter how it is encoded. What about the attached files?

Thanks for your help in advance.

Regards,
A.
Re: simple questions
Hi,

Nagyon Almos wrote:
> 1) I would like to search in some directories outside my home in which I don't have write permission.

I suggest building a static index for these directories:
http://beagle-project.org/Static_Indexes

I use this for a collection of ebooks outside my $HOME.

Rgds,
Stephan.