Hi Adrien,

This looks very interesting - I'm happy to see your work, and I briefly
looked into your sources and API. With your 440 000 images, do you
have any clear idea of the accuracy of ORB? For context: I'm working
on Elog.io, which provides a *similar* service and API[1] to yours,
but uses a rather different algorithm and store, and targets a
different use case. Our algorithm is a variant of Blockhash[2], which
does no feature detection at all, but which can easily run in a
browser or on a mobile platform (we have versions for JavaScript, C
and Python) to generate 256-bit hashes of images. We then determine
the quality of a match with a Hamming distance calculation.
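As a minimal sketch of that matching step (illustrative only; the
threshold below is my own placeholder, not Elog.io's actual cut-off):

```python
def hamming_distance(h1: int, h2: int) -> int:
    """Count the bits that differ between two 256-bit hash values."""
    return bin(h1 ^ h2).count("1")

def is_match(h1: int, h2: int, max_bits: int = 10) -> bool:
    """Call it a match if few enough of the 256 bits differ.
    max_bits is a placeholder threshold, not a tuned value."""
    return hamming_distance(h1, h2) <= max_bits
```

A distance of 0 indicates a verbatim or near-verbatim copy; larger
distances indicate increasingly different images.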

We work primarily with a use case of verbatim use: a user gets images
from Wikimedia and re-uses them elsewhere. Algorithms without feature
detection give very poor results for any modifications to an image,
like rotating, cropping, etc. Since that's not within our use case,
this works for us, but the flip side is of course that you can't
expect to photograph something (a newspaper article with an image,
for instance) and then match it against a set of images, as you can
with your approach.
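To illustrate why position-based hashes break under such
modifications, here is a heavily simplified sketch in the spirit of a
blockhash-style algorithm (this is NOT the actual blockhash.io
algorithm, just a toy version): every output bit is tied to a fixed
block position, so any rotation or crop moves pixels between blocks
and scrambles the hash.

```python
from statistics import median

def blockhash_like(gray, bits_side=4):
    """Toy blockhash-style hash (not the real blockhash.io algorithm):
    split a grayscale image (list of rows of 0-255 values) into
    bits_side x bits_side blocks and emit one bit per block, set when
    the block's mean brightness exceeds the median of all block means.
    Assumes the image dimensions are divisible by bits_side."""
    h, w = len(gray), len(gray[0])
    bh, bw = h // bits_side, w // bits_side
    means = []
    for by in range(bits_side):
        for bx in range(bits_side):
            block = [gray[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            means.append(sum(block) / len(block))
    med = median(means)
    # Each bit depends on an absolute block position in the frame.
    return sum(1 << i for i, m in enumerate(means) if m > med)
```

Scaling or re-encoding the same image leaves the block means roughly
intact, while cropping or rotating shifts content across block
boundaries, which is exactly the failure mode described above.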

The other difference is that our database store isn't specifically
tailored to our hashes: we use W3C Media Annotations to store any kind
of metadata about images, and could equally well store your ORB
signatures assuming they can be serialised.

To give you some numbers: for our use cases (verbatim use, potentially
with a format change such as JPG->PNG, and scaling down to 100px
width) we can successfully match ca. 87% of cases, and we have a
collision rate (different images producing the same or nearly the same
hash) of ca. 1.2%. Both numbers are against the Wikimedia Commons set.

While we currently have the full ~22M images from Wikimedia Commons in
our database, we're still ironing out the kinks of the system and
making some additional improvements. If you think we should consider
ORB instead of, or in addition to, our current algorithm, we'd love to
give that a try, and it would obviously be very interesting if we
could end up with signatures compatible with your database.

Sincerely,
Jonas

[1] http://docs.cmcatalog.apiary.io
[2] http://blockhash.io

On 24 November 2014 at 11:25, Adrien Maglo <[email protected]> wrote:
> Hello,
>
>
> I am not sure this is the right mailing list to introduce this project, but I
> have just released Displee. It is a small Android app that lets you search
> for images in the English Wikipedia by taking pictures:
> https://play.google.com/store/apps/details?id=org.visualink.displee
> It is a kind of open source Google Goggles for images from en.wikipedia.org.
>
> I have developed Displee as a demonstrator for Pastec http://pastec.io, my
> open source image recognition index and search engine for mobile apps.
> The index hosted on my server in France currently contains about 440 000
> images. They may not be the most relevant ones, but it's a start. ;-)
> I also have other ideas for improving this small app if it is of interest
> to the community.
>
> Displee source code (MIT) is available here:
> https://github.com/Visu4link/displee
> Pastec source code (LGPL) is available here:
> https://github.com/Visu4link/pastec
> The source code of the Displee back-end is not released yet. It is basically
> a Python 3 Django application.
>
> I will be glad to receive your feedback and answer any questions!
>
> Best regards,
>
>
> --
> Adrien Maglo
> Pastec developer
> http://www.pastec.io
> +33 6 27 94 34 41
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jonas Öberg, Founder & Shuttleworth Foundation Fellow
Commons Machinery | [email protected]
E-mail is the fastest way to my attention

