On Thu, 2008-03-06 at 01:44 +0000, John Smith wrote:
> Hi,
>
> I am trying to make an up-to-date feature comparison table (the
> documentation and webpages are sometimes outdated) for the following
> desktop search tools: Beagle, Tracker, Recoll, Strigi and Jindex. The
> table would then be added to Wikipedia.
>
> I would like to ask for your help in telling me which features are
> implemented in your tool (1) or are planned for the future. These are
> just a few yes-or-no questions, so it's brief.
> (1) Note that I'm sending this email (using BCC) to the mailing lists
> or developers of each of these tools.
>
> I think having this information would be good for users and developers
> since there are already several desktop crawlers available.
> It would be nice if your website maintainer added this information
> (maybe in the form of a table) to your Features or FAQ section.
> This list can also be seen as ideas for possible features to be added.
>
> Thank you for your consideration.
>
> PS: I'm aware that the data crawler uses different backends for
> different file types; in that case, please mention the backend where
> appropriate. For example, "PDF indexing capabilities are limited by
> xpdf. It does not recognize words with hyphens."
>
> 01) Regular expressions (e.g.: com*on st?ff [A-F] (this | that))
Yes, via our RDF query support.
Our future Xesam implementation will do this too.
> 02) Boolean operators (+and -not)
Terms are ANDed by default. We have an expression-tree patch that adds
the other boolean operators (I have asked for this patch to be applied
to trunk, so the answer is effectively yes).
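To illustrate the difference between AND-by-default and a full expression tree, here is a minimal Python sketch over a toy in-memory inverted index (term to set of document ids). The node classes, evaluate() and default_and() are hypothetical names chosen for illustration; this is not the code in the patch mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Union

# Hypothetical toy inverted index: term -> set of document ids.
Index = Dict[str, Set[int]]


@dataclass
class Term:
    word: str


@dataclass
class And:
    left: "Node"
    right: "Node"


@dataclass
class Or:
    left: "Node"
    right: "Node"


@dataclass
class Not:
    child: "Node"


Node = Union[Term, And, Or, Not]


def evaluate(node: Node, index: Index, all_docs: Set[int]) -> Set[int]:
    """Recursively reduce an expression tree to the set of matching doc ids."""
    if isinstance(node, Term):
        return index.get(node.word, set())
    if isinstance(node, And):
        return evaluate(node.left, index, all_docs) & evaluate(node.right, index, all_docs)
    if isinstance(node, Or):
        return evaluate(node.left, index, all_docs) | evaluate(node.right, index, all_docs)
    if isinstance(node, Not):
        # NOT is the complement relative to every indexed document.
        return all_docs - evaluate(node.child, index, all_docs)
    raise TypeError(f"unknown node: {node!r}")


def default_and(words: Iterable[str]) -> Node:
    """A plain keyword query: fold the terms into a left-leaning chain of ANDs."""
    nodes = [Term(w) for w in words]
    if not nodes:
        raise ValueError("empty query")
    tree: Node = nodes[0]
    for n in nodes[1:]:
        tree = And(tree, n)
    return tree
```

A plain query such as "desktop search" only ever builds the AND chain; the expression tree is what makes OR and NOT expressible at all.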
> 03) Searching for non-alphanumeric characters, perhaps by escaping them
> with a backslash (e.g. := + ? { ] &)
Nope. There is no point searching for them, since we always filter them
out.
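A minimal sketch of what filtering out non-alphanumeric characters at tokenization time looks like, assuming a simple regex tokenizer (this is an illustration, not Tracker's actual tokenizer). It also shows why a query like := can never match: it produces no tokens at all.

```python
import re

# Any run of characters that is not a letter or digit acts as a separator.
# \W is Unicode-aware for str patterns; "_" counts as a word character,
# so it is added to the class explicitly.
_SEPARATORS = re.compile(r"[\W_]+")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric tokens, dropping all punctuation."""
    return [tok.lower() for tok in _SEPARATORS.split(text) if tok]
```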
> 04) Exact phrases using double quotes (support for line breaks?
> hyphenation? text in columns?)
Exact phrase search will be supported shortly. It will be
case-insensitive but otherwise precise, including non-alphanumerics.
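"Case-insensitive but otherwise precise" can be sketched as a substring search on case-folded text, as below. This is a hypothetical illustration of the matching rule only; it works on raw text rather than on an index, and the function name find_phrase() is made up.

```python
def find_phrase(phrase: str, text: str) -> list[int]:
    """Return start offsets of every case-insensitive exact occurrence of phrase.

    Only case is ignored: punctuation, symbols and spacing must match exactly.
    Offsets refer to the case-folded text, which can differ in length from
    the original for a few characters (e.g. German sharp s folds to "ss").
    """
    needle, haystack = phrase.casefold(), text.casefold()
    if not needle:
        return []
    hits: list[int] = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits
```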
> 05) TeX, PDF and PS (index sentences correctly even when text is
> organized in columns or uses hyphens; this is common in scientific
> articles in PDF format)
The extractor removes tables from PDFs, so sentences should be indexed
correctly.
> 06) Different encodings and languages (ASCII, UTF-8, Japanese, etc.)
Everything is converted to UTF-8. Non-UTF-8 data requires the user's
locale to be set up appropriately so that it can be converted to UTF-8
successfully.
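The locale dependency can be sketched like this: data that is already valid UTF-8 passes through, and anything else is decoded with the user's locale encoding before being re-encoded as UTF-8. This is a Python illustration under that assumption, not Tracker's code (in C, GLib's g_locale_to_utf8() does the same job); the optional fallback parameter is a hypothetical addition so the behaviour can be shown without changing the locale.

```python
import locale
from typing import Optional


def to_utf8(raw: bytes, fallback: Optional[str] = None) -> bytes:
    """Return raw as UTF-8, decoding non-UTF-8 input with a fallback encoding.

    With no explicit fallback the user's locale encoding is used, which is
    why the locale must match the data for the conversion to succeed.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, keep as-is
    except UnicodeDecodeError:
        pass
    encoding = fallback or locale.getpreferredencoding(False)
    # Raises UnicodeDecodeError if the bytes are not valid in that encoding.
    return raw.decode(encoding).encode("utf-8")
```

Note that a single-byte locale such as Latin-1 accepts any byte sequence, so a wrongly configured locale may "succeed" and still produce garbled text rather than an error.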
> 07) Index archive files (tar, bz2, rar, 7z, etc.) recursively
Nope, but we will probably add this soon.
> 08) Index simultaneously with and without stemming (for example,
> flooring, floors, floored would all be transformed to floor)
yes
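Indexing with and without stemming at the same time can be sketched as building two inverted indexes in one pass and choosing one at query time. The suffix-stripping toy_stem() below is a deliberately crude, hypothetical stand-in for a real stemmer (such as a Porter/Snowball implementation), and all names are illustrative.

```python
from collections import defaultdict

# Crude suffix list for illustration only; real stemmers are far more careful.
_SUFFIXES = ("ing", "ed", "s")


def toy_stem(word: str) -> str:
    """Strip one known suffix, keeping at least three characters of stem."""
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs: dict[int, list[str]]) -> tuple[dict, dict]:
    """Build the exact-term index and the stemmed-term index in one pass."""
    exact: defaultdict = defaultdict(set)
    stemmed: defaultdict = defaultdict(set)
    for doc_id, words in docs.items():
        for w in words:
            exact[w].add(doc_id)
            stemmed[toy_stem(w)].add(doc_id)
    return exact, stemmed


def search(word: str, exact: dict, stemmed: dict, use_stemming: bool = True) -> set:
    """Look the word up in whichever index the caller asked for."""
    if use_stemming:
        return stemmed.get(toy_stem(word), set())
    return exact.get(word, set())
```

Keeping both indexes costs extra space but lets a single query choose between broad (stemmed) and literal matching without reindexing.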
> 09) Use of tags to better organize data (allows the user to have collections)
yes
> 10) Restrict search to specific directories or tags
yes
> 11) Provide thumbnails for images and video (allow specifying the number
> of thumbnails for video and the time interval between them)
yes
> 12) Image and video content search (something like imgseek, perhaps
> better, or perhaps using it as a backend)
I'm not sure what you mean. Tags and metadata are extracted from them.
> 13) Index removable media (making it possible to index and organize
> data on DVDs or external hard drives)
We have a partial patch for this, but it needs more work.
> 14) Databases supported
SQLite
> 15) Allow multiple database catalogs (useful for searching collections
> of external devices)
yes
> 16) Checksum (allows finding duplicate files)
We have database support for this, but it has not been fully
implemented because it has not been needed.
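For reference, checksum-based duplicate detection amounts to grouping files by a content hash. The sketch below is a hypothetical illustration, not Tracker's schema or code; SHA-1 and 64 KiB read chunks are assumptions, since the source does not say which checksum the database support is meant to store.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file's contents in chunks so large files are not read into memory."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of two or more files under root with identical contents."""
    by_sum: defaultdict = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_sum[file_checksum(p)].append(p)
    return [sorted(paths) for paths in by_sum.values() if len(paths) > 1]
```

An indexer that already stores the checksum per file could answer the same question with a single grouped database query instead of rehashing every file.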
> 17) Other aspects worthy of mention
Metadata store: applications can store user objects and index their
metadata by using Tracker as the primary storage.
_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list