Re: simple questions

2007-04-23 Thread JM Ibanez
D Bera [EMAIL PROTECTED] writes:

>> 4) This python/perl is needed because
>> some of my e-mails are html-entity, quoted-printable, 8-bit,
>> iso-8859-2, utf-8 encoded and so on.
>> I guess this is not a problem for Beagle at all, i.e. it can search
>> in any e-mail no matter how it is encoded.
>> What about the attached files?
>
> If you throw email files (containing single emails, maildir style) to
> beagle, it knows how to index emails. Also beagle will take care of
> the attachments itself. Sometimes there is a problem in determining
> the mimetype of email files, instead of message/rfc822 they are
> recognized as text files by our mime type sniffer. So, if you can
> somehow ensure that the files that are sent to beagle have the
> mimetype explicitly set to message/rfc822, beagle will correctly index
> them for you.

Interesting: does this happen implicitly within the file crawler
(i.e. not the other backends), or explicitly via the other backends (say
the KMailQueryable)? Because if it's implicit, might as well back out
whatever I'm doing for the Gnus backend and work on MIME type
detection. :)

P.S. I haven't really read a whole lot of the current Beagle codebase,
so forgive me if the answer lies there -- I'm reading it while sending
this out.

-- 
JM Ibanez
Software Architect
Orange & Bronze Software Labs, Ltd. Co.

[EMAIL PROTECTED]
http://software.orangeandbronze.com/
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


Re: simple questions

2007-04-23 Thread Debajyoti Bera
>>> 4) This python/perl is needed because
>>> some of my e-mails are html-entity, quoted-printable, 8-bit,
>>> iso-8859-2, utf-8 encoded and so on.
>>> I guess this is not a problem for Beagle at all, i.e. it can search
>>> in any e-mail no matter how it is encoded.
>>> What about the attached files?
>>
>> If you throw email files (containing single emails, maildir style) to
>> beagle, it knows how to index emails. Also beagle will take care of
>> the attachments itself. Sometimes there is a problem in determining
>> the mimetype of email files, instead of message/rfc822 they are
>> recognized as text files by our mime type sniffer. So, if you can
>> somehow ensure that the files that are sent to beagle have the
>> mimetype explicitly set to message/rfc822, beagle will correctly index
>> them for you.
>
> Interesting: does this happen implicitly within the file crawler
> (i.e. not the other backends), or explicitly via the other backends (say
> the KMailQueryable)? Because if it's implicit, might as well back out
> whatever I'm doing for the Gnus backend and work on MIME type
> detection. :)

[Long email warning]

(There is a wiki page which might be helpful 
http://beagle-project.org/Architecture_Overview)

The tedious work of extracting data from the physical files (or embedded 
files in files like attachments) is done by the drones aka Filters.
The work of finding data to index from who_knows_which_weird_location and 
maintaining state of data of some application (e.g. which mails are deleted 
in mbox-based Mail apps, deleted emails are not immediately deleted from the 
disk) is done by smart agents aka Backends.

Though it is possible, a Backend rarely extracts indexable data from a file 
itself. Backends merely set up a request to the filters to index a physical 
file (as you have done in the Gnus backend). So, there is a generic Mail 
filter which can index all message/rfc822 emails. This is used by the Files 
backend to index any email messages it finds on the disk and by all the mail 
backends. The reason specialized mail backends exist is that they want to 
maintain/index some additional information not handled by the generic files 
backend. Typically the mail backends use some application-specific information.

So, the files backend can perfectly index any email files. There are three 
general problems with this:
1) Sometimes mimetype detection misses a valid mail file and marks it as text. 
We use xdgmime mimetype detection and the issue arises because some mail 
clients add a first line to the mail message that differs from what is 
expected. xdgmime checks the first line to detect a mail message.
2) Once an email is found as a search result, what should happen when a user 
clicks on it? If the emails come from the kmail, evolution or t-bird backends, 
beagle-search knows which application to open. For emails on the disk, I am 
not sure what to do.
3) Sometimes additional email-client specific information could be useful. 
E.g. in the gnus backend you are writing, the information about 
the folder name is something that the files backend will never be able to 
give you. Also, if you are able to parse the .overview files (or whatever 
other gnus specific state files), they generally contain information 
which is useful to report to the end-user. The files backend will never be 
able to report such information.
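
Problem 1) can be illustrated with a minimal Python sketch. It is not
xdgmime's or Beagle's code: the prefix list FIRST_LINE_PREFIXES and both
function names are assumptions made up for illustration. It contrasts a
check that looks only at the first line with one that parses the whole
header block, using a made-up message whose first line is an unexpected
client-specific header.

```python
from email import message_from_string

# Hypothetical first-line prefixes; the real xdgmime magic rules may differ.
FIRST_LINE_PREFIXES = ("From ", "From:", "Return-Path:", "Received:",
                       "Date:", "Subject:")


def sniff_first_line(text):
    """Guess the type from the first line only."""
    first = text.splitlines()[0] if text else ""
    if first.startswith(FIRST_LINE_PREFIXES):
        return "message/rfc822"
    return "text/plain"


def sniff_headers(text):
    """Guess the type by parsing the whole header block."""
    msg = message_from_string(text)
    has_core = msg["From"] is not None and (
        msg["Date"] is not None or msg["Subject"] is not None)
    return "message/rfc822" if has_core else "text/plain"


# Made-up message whose first line is an unexpected, client-specific header.
odd_mail = (
    "X-Client-Banner: hello\n"
    "From: sender@example.org\n"
    "Date: Mon, 23 Apr 2007 10:00:00 +0000\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(sniff_first_line(odd_mail))
print(sniff_headers(odd_mail))
```

For this sample the first-line check falls back to text/plain while the
header-based check still recognizes a mail message, which is the kind of
miss described in 1).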

In a sense the mail backend is a specialized files backend. I would still 
encourage you to work on a gnus backend. But if you want a quick working 
solution, you can just use the files backend to index those email files. In 
case beagle is not able to detect the mimetypes properly, you can get away 
with writing an empty FilterGnus, subclassed from FilterMail, with
AddSupportedFlavor (new FilterFlavor ("file:///path/to/~/Mail/*", null, null, 1));

This will force all files in ~/Mail/* to be indexed using FilterGnus and thus 
use FilterMail. Warning: If there is any non-mail file in ~/Mail, the Mail 
filter might crash! You have been warned.

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: simple questions

2007-04-23 Thread Joe Shaw
Hi,

On 4/21/07, Nagyon Almos [EMAIL PROTECTED] wrote:
> 1) I would like to search in some directories outside
> my home in which I don't have write permission.
> I added those directories with AddRoot.
> Beagle seems to ignore those files.
> Can I make beagle index them right now? If yes, how?

I don't think there's any way to do it immediately; IIRC they will be
processed after the first root (probably your home directory) is
crawled.

I'll double check on this and look into changing the behavior.  I seem
to remember the reason why it isn't processed until the end is to
avoid unnecessary disk thrash.  Since the added roots are often on the
same hard drive but in a different partition, crawling them
simultaneously causes a lot of disk seeks and hits the page cache
pretty hard.

> Or does beagle need write permissions (for acls or anything)?

It doesn't need write permissions, although they do provide a
performance benefit (because of Beagle's use of extended attributes).

> 2) I am trying to make some of my own filters, hence
> I would like to specify the extensions (in the filter xml file)
> with egrep-style regular expressions. Is it possible?

Do you mean in the external-filters.xml file?  Extensions are matched
exactly there, so you'll have to provide multiple extensions to match;
there's no regex matching there.

> 3) Does anyone have any experience with Thunderbird,
> Dovecot (maildir) and Beagle? I tried to look through
> the source but it is well beyond my programming knowledge.

There is a Thunderbird backend for Beagle, but it's disabled by
default due to some issues with its memory consumption.  Can you give
a little more detail on your setup?

> 4) This python/perl is needed because
> some of my e-mails are html-entity, quoted-printable, 8-bit,
> iso-8859-2, utf-8 encoded and so on.
> I guess this is not a problem for Beagle at all, i.e. it can search
> in any e-mail no matter how it is encoded.

It should be able to handle these.  Beagle uses the GMime library to
handle emails, so it should do any encoding conversions itself.

> What about the attached files?

I don't see any files attached?  Or do you mean generically files
attached to emails?  Beagle indexes those.
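
As a rough illustration of what this kind of encoding handling involves,
here is a minimal sketch using Python's standard email package rather than
GMime. It says nothing about how Beagle or GMime are implemented; the
sample message, the addresses, the attachment and the helper names are all
made up for illustration. It builds a message with an ISO-8859-2,
quoted-printable body and one attachment, then recovers the decoded text
and the attachment file names.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample():
    """Build a made-up message: ISO-8859-2 quoted-printable body, one attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "reader@example.org"
    msg["Subject"] = "test"
    msg.set_content("árvíztűrő tükörfúrógép",
                    charset="iso-8859-2", cte="quoted-printable")
    msg.add_attachment(b"placeholder bytes", maintype="application",
                       subtype="octet-stream", filename="sample.bin")
    return msg.as_bytes()


def text_and_attachments(raw):
    """Return the decoded plain-text body and the attachment file names."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    names = [part.get_filename() for part in msg.iter_attachments()]
    return text, names


text, names = text_and_attachments(build_sample())
print(text.strip())
print(names)
```

The caller only ever sees Unicode text, whatever charset and transfer
encoding the message used on disk, and the attachments can be handed on
for separate processing.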

Joe


Re: simple questions

2007-04-23 Thread Nagyon Almos





> This should work. I never tried it but I will try it sometime and let
> you know. Do you see any specific error in the log files? Are you
> sure those directories are added correctly - in log file of beagled,
> in the initial 50-60 lines, there should be a line saying Adding root
> /
I did some experiments on this:
- If there is a symbolic link pointing to such a directory,
then beagle follows it and finds the results.
- If I just add them with AddRoot, then beagle says:
[...]
Debug: Adding root: /home/tarol/test_files
Debug: '' now seems to be called '/home/tarol/test_files'
Debug: Loading Beagle.Util.Conf DaemonConfig from daemon.xml
Debug: Loading Beagle.Util.Conf SearchingConfig from searching.xml
Debug: Loading Beagle.Util.Conf NetworkingConfig from networking.xml
Debug: Loading Beagle.Util.Conf WebServicesConfig from webservices.xml
Debug: Parsed query 'árv' as text_query
Debug: Parsed query 'árvíz' as text_query
Debug: Parsed query 'árvíz*' as text_query
Debug: Couldn't find path of file with name 'arvizstb.pdf' and parent 
'CUJNP0ylXkSYY5ZqKOCVUw'
Debug: Couldn't find path of file with name 'arvizstb.sxw' and parent 
'CUJNP0ylXkSYY5ZqKOCVUw'
[...]
Those files contain strings that match árvíz*.



> Also, as Stephan pointed out, you can use a static index to index such files.

I haven't tested static indices yet, but
in my opinion it should work this way.


> 2) I am trying to make some of my own filters, hence
> I would like to specify the extensions (in the filter xml file)
> with egrep-style regular expressions. Is it possible?

I tried it too, but this didn't work.
I specified
<filter>
  <mimetype>text/x-vcard</mimetype>
  <extension>.(vcf|vcard)</extension>
  <command>cat</command>
  <arguments>%s</arguments>
</filter>
in the xml file and then beagle said:
Debug: Saw event in '/home/user'
Debug: *** Add '/home/user' 'szilard.vcf' (file)
Debug: -file:///home/user/szilard.vcf
Debug:  file:///home/user/szilard.vcf
Debug: No filter for /home/user/szilard.vcf (text/directory)

Thanks for your help.

Regards,
A.





Re: simple questions

2007-04-23 Thread Joe Shaw
Hi,

On 4/23/07, Nagyon Almos [EMAIL PROTECTED] wrote:
>> 2) I am trying to make some of my own filters, hence
>> I would like to specify the extensions (in the filter xml file)
>> with egrep-style regular expressions. Is it possible?
>
> I tried it too but this didn't work:
> I specified
> <filter>
>   <mimetype>text/x-vcard</mimetype>
>   <extension>.(vcf|vcard)</extension>
>   <command>cat</command>
>   <arguments>%s</arguments>
> </filter>

Indeed, this won't work.  You have to do:

<extension>.vcf</extension>
<extension>.vcard</extension>

You can specify multiple (or zero) extension and mimetype conditions;
a file that matches any one of them is handled by the filter.
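
The rule can be modelled in a few lines of Python. This is only a sketch of
the matching behaviour as described in this message, not Beagle's code:
build_filter and matches are hypothetical helper names, and the element
names are the ones used in the filter entry quoted earlier in this message.

```python
import xml.etree.ElementTree as ET


def build_filter(mimetypes, extensions, command, arguments):
    """Build one <filter> entry with one element per condition."""
    flt = ET.Element("filter")
    for mimetype in mimetypes:
        ET.SubElement(flt, "mimetype").text = mimetype
    for extension in extensions:
        ET.SubElement(flt, "extension").text = extension
    ET.SubElement(flt, "command").text = command
    ET.SubElement(flt, "arguments").text = arguments
    return flt


def matches(flt, extension, mimetype):
    """Exact comparison only: any single condition that is equal is enough."""
    extensions = [e.text for e in flt.findall("extension")]
    mimetypes = [m.text for m in flt.findall("mimetype")]
    return extension in extensions or mimetype in mimetypes


vcard_filter = build_filter(["text/x-vcard"], [".vcf", ".vcard"], "cat", "%s")
print(ET.tostring(vcard_filter, encoding="unicode"))
print(matches(vcard_filter, ".vcf", "text/directory"))
```

In the log posted earlier in the thread the .vcf file was detected as
text/directory, so in this model only the extension condition can match it,
and a pattern such as .(vcf|vcard) never equals a real extension.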

Joe


Re: simple questions

2007-04-23 Thread Nagyon Almos


I am sorry, I was too quick.
(We seemed to post at the same time).


Joe Shaw wrote:
> Hi,
>
> I don't think there's any way to do it immediately; IIRC they will be
> processed after the first root (probably your home directory) is
> crawled.
This is not a problem for me. I just asked.


[..]
I accidentally deleted the text, but the point is:
now beagle indexes the extra directory and finds the results.
(Crawling my home directory took too long).


> Do you mean in the external-filters.xml file?  Extensions are matched
> exactly there, so you'll have to provide multiple extensions to match;
> there's no regex matching there.
Yes, there.


> There is a Thunderbird backend for Beagle, but it's disabled by
> default due to some issues with its memory consumption.  Can you give
> a little more detail on your setup?
I am trying to switch to Dovecot, but before that
I want to make sure that it can do what I want:
- allow simultaneous access independently of the mail
user agent (like pine, mutt or thunderbird),
- handle e-mails separately (maildir, just in case),
- let me use beagle to search in the e-mails independently of the
MUA used.


> It should be able to handle these.  Beagle uses the GMime library to
> handle emails, so it should do any encoding conversions itself.
This is good news.
I have been switching from iso-8859-2 to utf-8 everywhere (filenames, email 
encodings, html encodings, tex files and so on) and this is a big pain
for me.


> I don't see any files attached?  Or do you mean generically files
> attached to emails?  Beagle indexes those.
Good to hear that.


Thank you for the information.

Best wishes,
A.




simple questions

2007-04-21 Thread Nagyon Almos

Hello all,

I have some (very) simple questions about beagle
and I could not find anything about them.
  1) I would like to search in some directories outside
my home in which I don't have write permission.
I added those directories with AddRoot.
Beagle seems to ignore those files.
Can I make beagle index them right now? If yes, how?
Or does beagle need write permissions (for acls or anything)?
  2) I am trying to make some of my own filters, hence
I would like to specify the extensions (in the filter xml file)
with egrep-style regular expressions. Is it possible?
  3) Does anyone have any experience with Thunderbird,
Dovecot (maildir) and Beagle? I tried to look through
the source but it is well beyond my programming knowledge.
  But if there were egrep regex extension support, then I could
write my very own filter for those files using python
or perl libraries...
  4) This python/perl is needed because
some of my e-mails are html-entity, quoted-printable, 8-bit,
iso-8859-2, utf-8 encoded and so on.
I guess this is not a problem for Beagle at all, i.e. it can search
in any e-mail no matter how it is encoded.
What about the attached files?

Thanks for your help in advance.

Regards,
A.





Re: simple questions

2007-04-21 Thread Stephan Hegel
Hi,

Nagyon Almos wrote:
> 1) I would like to search in some directories outside
> my home in which I don't have write permission.
I suggest building a static index for these directories:
 http://beagle-project.org/Static_Indexes

I use this for a collection of ebooks outside my $HOME.

Rgds,
Stephan.