Re: [Tracker] Desktop Crawler's feature comparison

Jamie McCracken Thu, 06 Mar 2008 07:55:32 -0800

On Thu, 2008-03-06 at 01:44 +0000, John Smith wrote:
> Hi,
> 
>  I am trying to make an updated (and sometimes the documentation or
>  webpages are outdated) comparison table in terms of features for the
>  following desktop search tools: Beagle, Tracker, Recoll, Strigi and
>  Jindex. Which would then be added to wikipedia.
> 
>  I would ask your help to tell me what features are implemented in your
>  tool (1) or are foreseen in the future... these are just a couple of
>  Yes or No questions, so it's brief.
>  (1) Note that I'm sending this email (using BCC) to all the
>  corresponding tool's mailing list or developers.
> 
>  I think having this information would be good for users and developers
>  since there are already several desktop crawlers available.
>  It would be nice if your website maintainer added this information
>  (maybe in the form of a table) in your Features or FAQ section.
>  This list can also be seen as ideas for possible features to be added.
> 
>  Thank you for your consideration.
> 
>  PS: I'm aware that the data crawler uses different backends for the
>  different file types, in that case, please refer the backend when
>  appropriate. For example, "PDF indexing capabilities is limited by
>  xpdf. It does not recognize words with hyphens."
> 
>  01) Regular expressions (e.g.: com*on st?ff [A-F] (this | that))


via our rdf query - yes
also future xesam implementation will do this too

>  02) Boolean operators (+and -not)

they are anded by default - we have an expression tree patch to do other
booleans (I have asked for this patch to be applied to trunk so the
answer is effectively yes)
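
To illustrate what "anded by default" means, here is a minimal sketch in Python. It is not Tracker's code: the names build_index and search_all_terms are hypothetical, and the whitespace tokenizer is an assumption made to keep the example short. A document matches only when it contains every term in the query.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; not part of Tracker.

def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_all_terms(index, query):
    """Return ids of documents that contain every query term (implicit AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Supporting other booleans, as the expression tree patch is meant to, would replace the flat intersection with an evaluation of a parsed tree of AND, OR and NOT nodes.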


>  03) Searching non-alphanumeric characters, maybe through the use of
>  backslash (e.g. := + ? { ] &)

nope - no point searching them (we always filter them out)


>  04) Exact sentences using double quotes (support for line breaks?
>  hyphenation? text in columns?)

Exact and precise phrases will be supported shortly (it will be
case-insensitive but otherwise precise including non-alphanumerics)


>  05) tex, pdf and ps (index sentences correctly even when text is
>  organized in columns or uses hyphens; this is common in scientific
>  articles using the pdf format)

the extractor removes tables from pdf so they should be correct

>  06) Different encoding and languages (ascii, utf8, japanese, etc)

everything is converted to utf-8. non-utf8 needs user locales set up
appropriately so that data can be successfully converted to utf-8 
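
A rough sketch of the conversion step described above, in Python. This is an assumption about the general approach, not Tracker's implementation: the function name to_utf8 and its fallback_encoding parameter are hypothetical. Data that is already valid UTF-8 passes through unchanged; anything else is decoded with the encoding reported by the user's locale, which is why the locale has to be set up correctly.

```python
import locale

# Hypothetical helper for illustration only; not part of Tracker.

def to_utf8(raw, fallback_encoding=None):
    """Return the given bytes re-encoded as UTF-8."""
    try:
        # Already valid UTF-8: nothing to convert.
        return raw.decode("utf-8").encode("utf-8")
    except UnicodeDecodeError:
        pass
    # Otherwise rely on the locale's preferred encoding (or an explicit one).
    encoding = fallback_encoding or locale.getpreferredencoding(False)
    # Undecodable bytes are replaced rather than aborting the conversion.
    return raw.decode(encoding, errors="replace").encode("utf-8")
```

If the locale names the wrong encoding, the decode step may still succeed but produce the wrong characters, so the indexed text would not match what the user types.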

>  07) Index archive files (tar, bz2, rar, 7zip, etc) recursively

nope but will probably do so soon

>  08) Index simultaneously with and without stemming (for example,
>  flooring, floors, floored would all be transformed to floor)

yes
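
As an illustration of indexing with and without stemming at the same time, here is a small Python sketch. The suffix-stripping rule in toy_stem is a deliberately crude stand-in for a real stemmer and is an assumption, as are the names toy_stem and index_terms; it is only meant to show the idea of keeping both the exact word and its stem.

```python
# Hypothetical helpers for illustration only; not part of Tracker.

def toy_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Require at least three remaining characters so short words survive.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Return (exact_terms, stemmed_terms) so both kinds of lookup are possible."""
    exact = set()
    stemmed = set()
    for word in text.lower().split():
        exact.add(word)
        stemmed.add(toy_stem(word))
    return exact, stemmed
```

With both sets stored, an exact search for "floored" can be answered from the first set, while a stemmed search for "floor" would also find "flooring" and "floors" through the second.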

>  09) Use of tags to better organize data (allows the user to have collections)

yes

>  10) Restrict search to specific directories or tags

yes

>  11) Provide thumbnails for images and video (allow specifying number
>  of thumbnails for video and time interval between thumbs)

yes

>  12) Image and video content search (something like imgseek... maybe
>  better or maybe it could use it as backend)

dunno what you mean? tags and metadata are extracted from them

>  13) Index removable media (making possible to index and organize data
>  in dvds or external hard drives)


we have a partial patch for this but it needs more work

>  14) Databases supported

sqlite

>  15) Allow having different databases catalogs (useful for searching
>  collection of external devices)

yes

>  16) Checksum (allows finding duplicate files)

we have db support for this but it has not been fully implemented as
it's not been necessary
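
For reference, finding duplicates from checksums could look roughly like the Python sketch below. It is not how Tracker stores or computes checksums: the choice of SHA-256 and the names file_checksum and find_duplicates are assumptions for illustration.

```python
import hashlib
from collections import defaultdict

# Hypothetical helpers for illustration only; not part of Tracker.

def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(paths):
    """Group paths by checksum and return the groups with more than one file."""
    by_checksum = defaultdict(list)
    for path in paths:
        by_checksum[file_checksum(path)].append(path)
    return [group for group in by_checksum.values() if len(group) > 1]
```

Storing the digest in the database alongside each file's metadata would turn the duplicate check into a simple query on that column.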

>  17) Other aspects worthy of mention

metadata store - can store user objects and index their metadata by
using tracker as the primary storage



_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list
