Re: Color search for images

2010-09-18 Thread Govind Kanshi
Not exactly sure how one would put context of what object is more dominant
than other.
Think of landscape with snow, green mountains and set of flowers of varied
colors including a rose

On Fri, Sep 17, 2010 at 8:43 PM, Shashi Kant sk...@sloan.mit.edu wrote:

 
  What I am envisioning (at least to start) is have all this add two fields
 in
  the index.  One would be for color information for the color similarity
  search.  The other would be a simple multivalued text field that we put
  keywords into based on what OpenCV can detect about the image.  If it
  detects faces, we would put face into this field.  Other things that it
  can detect would result in other keywords.
 
  For the color search, I have a few inter-related hurdles.  I've got to
  figure out what form the color data actually takes and how to represent
 it
  in Solr.  I need Java code for Solr that can take an input color value
 and
  find similar values in the index.  Then I need some code that can go in
 our
  feed processing scripts for new content.  That code would also go into a
  crawler script to handle existing images.
 

 You are on the right track. You can create a set of representative
 keywords from the image. OpenCV  gets a color histogram from the image
 - you can set the bin values to be as granular as you need, and create
 a look-up list of color names to generate a MVF representative of the
 image.
 If you want to get more sophisticated, represent the colors with
 payloads in correlation with the distribution of the color in the
 image.

 Another approach would be to segment the image and extract colors from
 each. So if you have a red rose with all white background, the textual
 representation would be something like:

 white, white...red...white, white

 Play around and see which works best.

 HTH



Re: Color search for images

2010-09-17 Thread Dennis Gearon
Sounds like someone is/has going to say/said:

Make it so, number one

There are some good links off of this article about the color Magenta, (like, 
uh, who knows what 'cyan' or 'magenta' are anyway? So I looked it up. Refilling 
my printer cartridges required an explanation.)

http://en.wikipedia.org/wiki/Magenta


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Thu, 9/16/10, Shawn Heisey elyog...@elyograg.org wrote:

 From: Shawn Heisey elyog...@elyograg.org
 Subject: Re: Color search for images
 To: solr-user@lucene.apache.org
 Date: Thursday, September 16, 2010, 7:58 PM
  On 9/16/2010 7:45 AM, Shashi Kant
 wrote:
  Lire is a nascent effort and based on a cursory
 overview a while back,
  IMHO was an over-simplified version of what a CBIR
 engine should be.
  They use CEDD (color & edge descriptors).
  Wouldn't work for the kind of applications I am
 working on - which
  needs among other things, Color, Shape, Orientation,
 Pose, Edge/Corner
  etc.
  
  OpenCV has a steep learning curve, but having been
 through it, is very
  powerful toolkit - the best there is by far! BTW the
 code is in C++,
  but has both Java & .NET bindings.
  This is a fabulous book to get hold of:
  http://www.amazon.com/Learning-OpenCV-Computer-Vision-Library/dp/0596516134,
  if you are seriously into OpenCV.
  
  Pls feel free to reach out if you need any help
 with OpenCV +
  Solr/Lucene. I have spent quite a bit of time on
 this.
 
 What I am envisioning (at least to start) is have all this
 add two fields in the index.  One would be for color
 information for the color similarity search.  The other
 would be a simple multivalued text field that we put
 keywords into based on what OpenCV can detect about the
 image.  If it detects faces, we would put face into
 this field.  Other things that it can detect would
 result in other keywords.
 
 For the color search, I have a few inter-related
 hurdles.  I've got to figure out what form the color
 data actually takes and how to represent it in Solr.  I
 need Java code for Solr that can take an input color value
 and find similar values in the index.  Then I need some
 code that can go in our feed processing scripts for new
 content.  That code would also go into a crawler script
 to handle existing images.
 
 We can probably handle most of the development if we can
 figure out the methods and data formats.  Naturally we
 would be interested in using off-the-shelf stuff as much as
 possible.  Today I learned that our CTO has already
 been looking into OpenCV and has a copy of the O'Reilly
 book.
 
 Thanks,
 Shawn
 



Re: Color search for images

2010-09-17 Thread Shashi Kant

 What I am envisioning (at least to start) is have all this add two fields in
 the index.  One would be for color information for the color similarity
 search.  The other would be a simple multivalued text field that we put
 keywords into based on what OpenCV can detect about the image.  If it
 detects faces, we would put face into this field.  Other things that it
 can detect would result in other keywords.

 For the color search, I have a few inter-related hurdles.  I've got to
 figure out what form the color data actually takes and how to represent it
 in Solr.  I need Java code for Solr that can take an input color value and
 find similar values in the index.  Then I need some code that can go in our
 feed processing scripts for new content.  That code would also go into a
 crawler script to handle existing images.


You are on the right track. You can create a set of representative
keywords from the image. OpenCV  gets a color histogram from the image
- you can set the bin values to be as granular as you need, and create
a look-up list of color names to generate a MVF representative of the
image.
If you want to get more sophisticated, represent the colors with
payloads in correlation with the distribution of the color in the
image.

Another approach would be to segment the image and extract colors from
each. So if you have a red rose with all white background, the textual
representation would be something like:

white, white...red...white, white

Play around and see which works best.

HTH
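
A minimal sketch of the histogram-to-keywords idea above, written in plain Python rather than OpenCV so it stays self-contained. The palette, the `min_share` cutoff, and the function names are all assumptions made for illustration; a real pipeline would read pixels from the image library and use a much larger look-up list.

```python
from collections import Counter

# Hypothetical look-up list of color names; a real palette would be larger.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def nearest_color_name(pixel):
    """Return the palette name closest to an (r, g, b) pixel (squared RGB distance)."""
    return min(
        PALETTE,
        key=lambda name: sum((p - c) ** 2 for p, c in zip(pixel, PALETTE[name])),
    )


def color_keywords(pixels, min_share=0.05):
    """Bin pixels by color name and keep names covering at least min_share of the image."""
    counts = Counter(nearest_color_name(p) for p in pixels)
    total = sum(counts.values())
    return {
        name: n / total
        for name, n in counts.most_common()
        if n / total >= min_share
    }


# Toy "red rose on a white background": 80 near-white pixels, 20 reddish ones.
pixels = [(250, 250, 250)] * 80 + [(200, 20, 30)] * 20
shares = color_keywords(pixels)
print(shares)
```

The keys of `shares` could feed the multivalued keyword field, and the values are the per-color distribution that a payload or term-repetition scheme would carry.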


Re: Color search for images

2010-09-16 Thread Li Li
do you mean content based image retrieval or just search images by tag?
if the former, you can try LIRE

2010/9/15 Shawn Heisey s...@elyograg.org:
  My index consists of metadata for a collection of 45 million objects, most
 of which are digital images.  The executives have fallen in love with
 Google's color image search.  Here's a search for flower with a red color
 filter:

 http://www.google.com/images?q=flower&tbs=isch:1,ic:specific,isc:red

 I am interested in duplicating this.  Can this group of fine people point me
 in the right direction?  I don't want anyone to do it for me, just help me
 find software and/or algorithms that can extract the color information, then
 find a way to get Solr to index and search it.

 Thanks,
 Shawn




Re: Color search for images

2010-09-16 Thread Lance Norskog
Yes, notice the flowers are all a medium-dark crimson red. There are a 
bunch of these image-indexing & search technologies, but there is no (to 
my knowledge) finished technology- it's very much an area of research. 
If you want to search the word 'flower' and index data that can find 
blobs of red, that might be easy with public tools. But there are many 
hard problems.


Lance

Stephen Weiss wrote:

There's a project out there called LIRE (I heard about it on this list) that's 
supposed to create a lucene-based CBIR index for images.  I wonder if this 
could be integrated with Solr?  Personally I don't really care about the flower 
part, I'm more worried about searching whether the flower is red... we have 
good object keywording but not good color keywording - and color is so much 
more subjective too, red can mean a lot of things.  I'm already working on 
testing it separately but it sure would be more useful if the scoring could be 
integrated with the rest of the search index.

--
Steve

On Sep 15, 2010, at 11:56 PM, Shashi Kant wrote:

   

I'm sure there's some post doctoral types who could get a graphic shape 
analyzer, color analyzer, to at least say it's a flower.

However, even Google would have to build new datacenters to have the horsepower 
to do that kind of graphic processing.

   

Not necessarily true. Like.com - which incidentally got acquired by
Google recently - built a true visual search technology and applied it
on a large scale.
 
   


Re: Color search for images

2010-09-16 Thread Shawn Heisey

 On 9/15/2010 10:50 AM, Shashi Kant wrote:

Shawn, I have done some research into this, machine-vision especially
on a large scale is a hard problem, not to be entered into lightly. I
would recommend starting with OpenCV - a comprehensive toolkit for
extracting various features such as Color, Edge etc from images. Also
there is a project LIRE http://www.semanticmetadata.net/lire/ which
attempts to do something along what you are thinking of. Not sure how
well it works.



Lire looks promising, but how hard is it to integrate the content-based 
search into Solr as opposed to Lucene?  I myself am not a Java 
developer.  I have access to people who are, but their time is scarce.


I use DIH to populate my index, so I would have to do analysis outside 
of Solr to populate the database.  From there, I would come up with a 
new schema and DIH config to re-import either the entire index or just 
documents that have been recently updated.  I have a build system to 
handle these things on all my shards.


OpenCV looks intimidating, but potentially very useful and for most 
things would probably not require custom code in Solr.  To mention the 
most obvious capability I can find, I think many of our customers would 
love to be able to check a box to include or exclude photos with faces 
in them.


I can tell it's getting late ... I imagined a scenario similar to the 
Kohler commercial where a woman pulls out a faucet at an architect's 
office ... Design a website around #00ebc9.


Thanks,
Shawn



Re: Color search for images

2010-09-16 Thread Shashi Kant
 Lire looks promising, but how hard is it to integrate the content-based
 search into Solr as opposed to Lucene?  I myself am not a Java developer.  I
 have access to people who are, but their time is scarce.



Lire is a nascent effort and based on a cursory overview a while back,
IMHO was an over-simplified version of what a CBIR engine should be.
They use CEDD (color & edge descriptors).
Wouldn't work for the kind of applications I am working on - which
needs among other things, Color, Shape, Orientation, Pose, Edge/Corner
etc.

OpenCV has a steep learning curve, but having been through it, is very
powerful toolkit - the best there is by far! BTW the code is in C++,
but has both Java & .NET bindings.
This is a fabulous book to get hold of:
http://www.amazon.com/Learning-OpenCV-Computer-Vision-Library/dp/0596516134,
if you are seriously into OpenCV.

Pls feel free to reach out if you need any help with OpenCV +
Solr/Lucene. I have spent quite a bit of time on this.


Re: Color search for images

2010-09-16 Thread Dennis Gearon
That's impressive!

So Google has BOUGHT some doctoral types, or highly specialized geeks,

And is looking at X number of images.

I bet the number of images on his video film library is at least several orders 
of magnitude above what Like deals with.

Dennis Gearon



--- On Wed, 9/15/10, Shashi Kant sk...@sloan.mit.edu wrote:

 From: Shashi Kant sk...@sloan.mit.edu
 Subject: Re: Color search for images
 To: solr-user@lucene.apache.org
 Date: Wednesday, September 15, 2010, 8:56 PM
  I'm sure there's some post
 doctoral types who could get a graphic shape analyzer, color
 analyzer, to at least say it's a flower.
 
  However, even Google would have to build new
 datacenters to have the horsepower to do that kind of
 graphic processing.
 
 
 Not necessarily true. Like.com - which incidentally got
 acquired by
 Google recently - built a true visual search technology and
 applied it
 on a large scale.
 


Re: Color search for images

2010-09-16 Thread Dennis Gearon
LOL! now that is one of the wisest things I've seen in a while.
Dennis Gearon



--- On Thu, 9/16/10, Shashi Kant sk...@sloan.mit.edu wrote:

 From: Shashi Kant sk...@sloan.mit.edu
 Subject: Re: Color search for images
 To: solr-user@lucene.apache.org
 Date: Thursday, September 16, 2010, 6:36 AM
 On Thu, Sep 16, 2010 at 3:21 AM,
 Lance Norskog goks...@gmail.com
 wrote:
  Yes, notice the flowers are all a medium-dark crimson
 red. There are a bunch
  of these image-indexing & search technologies, but
 there is no (to my
  knowledge) finished technology- it's very much an
 area of research. If you
  want to search the word 'flower' and index data that
 can find blobs of red,
  that might be easy with public tools. But there are
 many hard problems.
 
 
 Lance, is there *ever* a finished technology? -)
 


Re: Color search for images

2010-09-16 Thread Shawn Heisey

 On 9/16/2010 7:45 AM, Shashi Kant wrote:

Lire is a nascent effort and based on a cursory overview a while back,
IMHO was an over-simplified version of what a CBIR engine should be.
They use CEDD (color & edge descriptors).
Wouldn't work for the kind of applications I am working on - which
needs among other things, Color, Shape, Orientation, Pose, Edge/Corner
etc.

OpenCV has a steep learning curve, but having been through it, is very
powerful toolkit - the best there is by far! BTW the code is in C++,
but has both Java & .NET bindings.
This is a fabulous book to get hold of:
http://www.amazon.com/Learning-OpenCV-Computer-Vision-Library/dp/0596516134,
if you are seriously into OpenCV.

Pls feel free to reach out if you need any help with OpenCV +
Solr/Lucene. I have spent quite a bit of time on this.


What I am envisioning (at least to start) is have all this add two 
fields in the index.  One would be for color information for the color 
similarity search.  The other would be a simple multivalued text field 
that we put keywords into based on what OpenCV can detect about the 
image.  If it detects faces, we would put face into this field.  Other 
things that it can detect would result in other keywords.


For the color search, I have a few inter-related hurdles.  I've got to 
figure out what form the color data actually takes and how to represent 
it in Solr.  I need Java code for Solr that can take an input color 
value and find similar values in the index.  Then I need some code that 
can go in our feed processing scripts for new content.  That code would 
also go into a crawler script to handle existing images.


We can probably handle most of the development if we can figure out the 
methods and data formats.  Naturally we would be interested in using 
off-the-shelf stuff as much as possible.  Today I learned that our CTO 
has already been looking into OpenCV and has a copy of the O'Reilly book.


Thanks,
Shawn



Re: Color search for images

2010-09-15 Thread Ken Krugler


On Sep 15, 2010, at 7:59am, Shawn Heisey wrote:

My index consists of metadata for a collection of 45 million  
objects, most of which are digital images.  The executives have  
fallen in love with Google's color image search.  Here's a search  
for flower with a red color filter:


http://www.google.com/images?q=flower&tbs=isch:1,ic:specific,isc:red

I am interested in duplicating this.  Can this group of fine people  
point me in the right direction?  I don't want anyone to do it for  
me, just help me find software and/or algorithms that can extract  
the color information, then find a way to get Solr to index and  
search it.


When I took a look at the search results, it seems like the word
red shows up in the image name, or description, or tag for every  
found image.


Are you sure Google is extracting color information? Or just being  
smart about color-specific keywords found in associated text?


-- Ken

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g







Re: Color search for images

2010-09-15 Thread Shashi Kant
Shawn, I have done some research into this, machine-vision especially
on a large scale is a hard problem, not to be entered into lightly. I
would recommend starting with OpenCV - a comprehensive toolkit for
extracting various features such as Color, Edge etc from images. Also
there is a project LIRE http://www.semanticmetadata.net/lire/ which
attempts to do something along what you are thinking of. Not sure how
well it works.

HTH,
Shashi


On Wed, Sep 15, 2010 at 10:59 AM, Shawn Heisey s...@elyograg.org wrote:
  My index consists of metadata for a collection of 45 million objects, most
 of which are digital images.  The executives have fallen in love with
 Google's color image search.  Here's a search for flower with a red color
 filter:

 http://www.google.com/images?q=flower&tbs=isch:1,ic:specific,isc:red

 I am interested in duplicating this.  Can this group of fine people point me
 in the right direction?  I don't want anyone to do it for me, just help me
 find software and/or algorithms that can extract the color information, then
 find a way to get Solr to index and search it.

 Thanks,
 Shawn




Re: Color search for images

2010-09-15 Thread Shashi Kant

 On a related note, I'm curious if anyone has run across a good set of
 algorithms (or hopefully a library) for doing naive image
 classification. I'm looking for something that can classify images
 into something similar to the broad categories that Google image
 search has (Face, Photo, Clip Art, Line Drawing, etc.).


 --Paul


OpenCV is the way to go. Very comprehensive set of algorithms.


Re: Color search for images

2010-09-15 Thread Dennis Gearon
My guess is that they are leveraging text on the same web page. 

I'm sure there's some post doctoral types who could get a graphic shape 
analyzer, color analyzer, to at least say it's a flower.

However, even Google would have to build new datacenters to have the horsepower 
to do that kind of graphic processing.

So, since the names of all the images have something that says flower and red, 
I vote for image name or image attributes being the source.

Good luck with rolls of film.

Dennis Gearon



--- On Wed, 9/15/10, Ken Krugler kkrugler_li...@transpac.com wrote:

 From: Ken Krugler kkrugler_li...@transpac.com
 Subject: Re: Color search for images
 To: solr-user@lucene.apache.org
 Date: Wednesday, September 15, 2010, 9:41 AM
 
 On Sep 15, 2010, at 7:59am, Shawn Heisey wrote:
 
  My index consists of metadata for a collection of 45
 million objects, most of which are digital images.  The
 executives have fallen in love with Google's color image
 search.  Here's a search for flower with a red color
 filter:
  
  http://www.google.com/images?q=flower&tbs=isch:1,ic:specific,isc:red
  
  I am interested in duplicating this.  Can this
 group of fine people point me in the right direction? 
 I don't want anyone to do it for me, just help me find
 software and/or algorithms that can extract the color
 information, then find a way to get Solr to index and search
 it.
 
 When I took a look at the search results, it seems like
 the word red shows up in the image name, or description,
 or tag for every found image.
 
 Are you sure Google is extracting color information? Or
 just being smart about color-specific keywords found in
 associated text?
 
 -- Ken
 
 --
 Ken Krugler
 +1 530-210-6378
 http://bixolabs.com
 e l a s t i c   w e b   m i n
 i n g
 
 
 
 
 



Re: Color search for images

2010-09-15 Thread Shashi Kant
 I'm sure there's some post doctoral types who could get a graphic shape 
 analyzer, color analyzer, to at least say it's a flower.

 However, even Google would have to build new datacenters to have the 
 horsepower to do that kind of graphic processing.


Not necessarily true. Like.com - which incidentally got acquired by
Google recently - built a true visual search technology and applied it
on a large scale.


Re: Color search

2007-09-29 Thread Guangwei Yuan

 can you explain exactly how you are indexing the data and what your
 query looks like?


I used the same field name (color), not 10 different names (c0 - c9).

So the index fields look like (50% #000000, 20% #999999):
color: #000000
color: #000000
color: #000000
color: #000000
color: #000000
color: #999999
color: #999999

The query for black dresses will be:
color:#000000


Re: Color search

2007-09-29 Thread Chris Hostetter

: I used the same field name (color), not 10 different names (c0 - c9).

ah .. got it.  then what you are probably seeing is because of length 
normalization, if you use omitNorms=true then it shouldn't matter.
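
A rough sketch of why length normalization can distort this kind of ranking. It assumes the classic Lucene similarity, where the term-frequency factor is sqrt(freq) and the length norm is 1/sqrt(number of terms in the field), and it ignores idf, query normalization, and the precision lost when norms are stored. The function name and the two sample documents are hypothetical.

```python
import math


def classic_field_score(term_freq, field_length, omit_norms):
    """Approximate per-field score: sqrt(tf) times the length norm (1.0 if omitted)."""
    tf = math.sqrt(term_freq)
    norm = 1.0 if omit_norms else 1.0 / math.sqrt(field_length)
    return tf * norm


# (occurrences of the queried color, total values in the color field)
docs = {
    "half_black": (5, 7),    # 50% black + 20% gray, indexed as 7 values
    "tenth_black": (1, 1),   # 10% black, the only color indexed
}

with_norms = {
    name: classic_field_score(tf, length, omit_norms=False)
    for name, (tf, length) in docs.items()
}
without_norms = {
    name: classic_field_score(tf, length, omit_norms=True)
    for name, (tf, length) in docs.items()
}

print(with_norms)     # the shorter field wins despite having less black
print(without_norms)  # score now follows the number of repeated values
```

Under these assumptions the score with norms tracks the share of the field occupied by the color, not the share of the image, so a product with a single indexed value can outrank one that is half black.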

(i don't know why i suggested a separate field for each 10% block ... i'm 
sure i had a good reason but i can't think of it now) 


-Hoss



Re: Color search

2007-09-28 Thread Steven Rowe
Hi Guangwei,

When you index your products, you could have a single color field, and
include duplicates of each color component proportional to its weight.

For example, if you decide to use 10% increments, for your black dress
with 70% of black, 20% of gray, 10% of brown, you would index the
following terms for the color field:

  black black black black black black black
  gray gray
  brown

This works because Lucene natively interprets document term frequencies
as weights.
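
The term-duplication scheme above could be generated at indexing time roughly like this. It is only a sketch: the function name and the `step` parameter are made up for illustration, and percentages are rounded to the nearest increment.

```python
def weighted_terms(weights, step=10):
    """Repeat each color name once per `step` percent of its weight."""
    terms = []
    for color, percent in weights.items():
        terms.extend([color] * round(percent / step))
    return " ".join(terms)


field_value = weighted_terms({"black": 70, "gray": 20, "brown": 10})
print(field_value)
```

The resulting string would be sent as the value of the single color field, so a coarser or finer `step` trades index size against weight resolution.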

Steve

Guangwei Yuan wrote:
 Hi,
 
 We're running an e-commerce site that provides product search. We've been
 able to extract colors from product images, and we think it'd be cool and
 useful to search products by color. A product image can have up to 5 colors
 (from a color space of about 100 colors), so we can implement it easily with
 Solr's facet search (thanks all who've developed Solr).
 
 The problem arises when we try to sort the results by the color relevancy.
 What's different from a normal facet search is that colors are weighted. For
 example, a black dress can have 70% of black, 20% of gray, 10% of brown. A
 search query color:black should return results in which the black dress
 ranks higher than other products with less percentage of black.
 
 My question is: how to configure and index the color field so that products
 with higher percentage of color X ranks higher for query color:X?
 
 Thanks for your help!
 
 - Guangwei


Re: Color search

2007-09-28 Thread Grant Ingersoll
Another option would be to extend Solr (and donate back) to  
incorporate Lucene's payload functionality, in which case you could  
associate the percentile of the color as a payload and use the  
BoostingTermQuery... :-)  If you're interested in this, a discussion  
on solr-dev is probably warranted to figure out the best way to do this.


-Grant

On Sep 28, 2007, at 9:23 AM, Yonik Seeley wrote:


If it were just a couple of colors, you could have a separate field
for each color and then index the percent in that field.

black:70
grey:20

and then you could use a function query to influence the score (or you
could sort by the color percent).

However, this doesn't scale well to a large index with a large  
number of colors.
Each field used like that will take up 4 bytes per document in the  
index.


so if you have 1M documents, that's 1Mdocs * 100colors * 4bytes =  
400MB

Doable depending on your index size (use int or float and not
sint or sfloat type for this... it will be better on the memory).

If you needed to be better on the memory, you could encode all of the
colors into a single value (perhaps into a compact string... one
percentile per byte or something) and then have a custom function that
extracts the value for a particular color.  (this involves some java
development)

-Yonik


On 9/28/07, Guangwei Yuan [EMAIL PROTECTED] wrote:

Hi,

We're running an e-commerce site that provides product search.  
We've been
able to extract colors from product images, and we think it'd be  
cool and
useful to search products by color. A product image can have up to  
5 colors
(from a color space of about 100 colors), so we can implement it  
easily with

Solr's facet search (thanks all who've developed Solr).

The problem arises when we try to sort the results by the color  
relevancy.
What's different from a normal facet search is that colors are  
weighted. For
example, a black dress can have 70% of black, 20% of gray, 10% of  
brown. A
search query color:black should return results in which the  
black dress

ranks higher than other products with less percentage of black.

My question is: how to configure and index the color field so that  
products

with higher percentage of color X ranks higher for query color:X?

Thanks for your help!

- Guangwei



--
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




Re: Color search

2007-09-28 Thread Yonik Seeley
If it were just a couple of colors, you could have a separate field
for each color and then index the percent in that field.

black:70
grey:20

and then you could use a function query to influence the score (or you
could sort by the color percent).

However, this doesn't scale well to a large index with a large number of colors.
Each field used like that will take up 4 bytes per document in the index.

so if you have 1M documents, that's 1Mdocs * 100colors * 4bytes = 400MB
Doable depending on your index size (use int or float and not
sint or sfloat type for this... it will be better on the memory).

If you needed to be better on the memory, you could encode all of the
colors into a single value (perhaps into a compact string... one
percentile per byte or something) and then have a custom function that
extracts the value for a particular color.  (this involves some java
development)
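
To make the arithmetic and the compact encoding concrete, here is a small sketch. The three-entry color list and the helper names are assumptions; the real color space would have about 100 entries, and the extraction step would live in a custom Java function inside Solr, not in Python.

```python
# Hypothetical fixed ordering of the color space (about 100 entries in practice).
COLORS = ["black", "gray", "brown"]


def pack(percentages):
    """Encode one percentage (0-100) per byte, in COLORS order."""
    return bytes(percentages.get(color, 0) for color in COLORS)


def extract(packed, color):
    """Read back the percentage stored for one color."""
    return packed[COLORS.index(color)]


# Memory estimate from the message: one 4-byte field per color per document.
docs, colors, bytes_per_value = 1_000_000, 100, 4
per_field_bytes = docs * colors * bytes_per_value   # 400,000,000 bytes = 400MB
packed_bytes = docs * colors * 1                    # one byte per color instead

packed = pack({"black": 70, "gray": 20, "brown": 10})
print(per_field_bytes, packed_bytes, extract(packed, "gray"))
```

With one byte per color the same data needs about a quarter of the memory, at the cost of the custom extraction function.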

-Yonik


On 9/28/07, Guangwei Yuan [EMAIL PROTECTED] wrote:
 Hi,

 We're running an e-commerce site that provides product search. We've been
 able to extract colors from product images, and we think it'd be cool and
 useful to search products by color. A product image can have up to 5 colors
 (from a color space of about 100 colors), so we can implement it easily with
 Solr's facet search (thanks all who've developed Solr).

 The problem arises when we try to sort the results by the color relevancy.
 What's different from a normal facet search is that colors are weighted. For
 example, a black dress can have 70% of black, 20% of gray, 10% of brown. A
 search query color:black should return results in which the black dress
 ranks higher than other products with less percentage of black.

 My question is: how to configure and index the color field so that products
 with higher percentage of color X ranks higher for query color:X?

 Thanks for your help!

 - Guangwei



RE: Color search

2007-09-28 Thread Renaud Waldura
Here's another idea: encode color mixes as one RGB value (32 bits) and sort
according to those values. To find the closest color is like finding the
closest points in the color space. It would be like a distance search.

70% black #000000 = 0
20% gray #f0f0f0 = #303030
10% brown #8b4513 = #0e0702
= #3e3732

The distance would be:
sqrt( (r1 - r0)^2 + (g1 - g0)^2 + (b1 - b0)^2 )

Where r0g0b0 is the color the user asked for, and r1g1b1 is the composite
color of the item, calculated above.
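
The composite and the distance can be checked with a few lines of Python. The helper names are hypothetical, and each channel is rounded to the nearest integer, which reproduces the #3e3732 figure above.

```python
import math


def parse_hex(color):
    """Turn '#rrggbb' into an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def composite(mix):
    """Weighted per-channel sum of (weight, '#rrggbb') pairs, rounded to ints."""
    return tuple(
        round(sum(weight * parse_hex(color)[ch] for weight, color in mix))
        for ch in range(3)
    )


def to_hex(rgb):
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def distance(c0, c1):
    """Euclidean distance between two RGB points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c0, c1)))


item = composite([(0.7, "#000000"), (0.2, "#f0f0f0"), (0.1, "#8b4513")])
print(to_hex(item))                           # composite color of the dress
print(distance(parse_hex("#000000"), item))   # distance from a query for black
```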

--Renaud


-Original Message-
From: Steven Rowe [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 28, 2007 7:14 AM
To: solr-user@lucene.apache.org
Subject: Re: Color search

Hi Guangwei,

When you index your products, you could have a single color field, and
include duplicates of each color component proportional to its weight.

For example, if you decide to use 10% increments, for your black dress with
70% of black, 20% of gray, 10% of brown, you would index the following terms
for the color field:

  black black black black black black black
  gray gray
  brown

This works because Lucene natively interprets document term frequencies as
weights.

Steve

Guangwei Yuan wrote:
 Hi,
 
 We're running an e-commerce site that provides product search. We've 
 been able to extract colors from product images, and we think it'd be 
 cool and useful to search products by color. A product image can have 
 up to 5 colors (from a color space of about 100 colors), so we can 
 implement it easily with Solr's facet search (thanks all who've developed
Solr).
 
 The problem arises when we try to sort the results by the color relevancy.
 What's different from a normal facet search is that colors are 
 weighted. For example, a black dress can have 70% of black, 20% of 
 gray, 10% of brown. A search query color:black should return results 
 in which the black dress ranks higher than other products with less
percentage of black.
 
 My question is: how to configure and index the color field so that 
 products with higher percentage of color X ranks higher for query
color:X?
 
 Thanks for your help!
 
 - Guangwei




Re: Color search

2007-09-28 Thread Steven Rowe
Hi Renaud,

I think your method will produce strange results, probably in most
cases, e.g.

33% red #FF0000 = #550000
33% green #00FF00 = #005500
33% blue #0000FF = #000055
= #555555

Thus, red, green and blue dress would score well against a search for
medium gray.  Not good.
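
The collision is easy to reproduce. This sketch uses exact thirds rather than 33% (which is what yields 0x55 per channel) and a hypothetical `blend` helper; it shows that an evenly mixed red, green and blue item and a plain medium-gray item collapse to the same composite point.

```python
def blend(mix):
    """Weighted per-channel sum of (weight, (r, g, b)) pairs, rounded to ints."""
    return tuple(
        round(sum(weight * rgb[ch] for weight, rgb in mix))
        for ch in range(3)
    )


tricolor = blend([
    (1 / 3, (255, 0, 0)),   # red
    (1 / 3, (0, 255, 0)),   # green
    (1 / 3, (0, 0, 255)),   # blue
])
medium_gray = blend([(1.0, (85, 85, 85))])

print(tricolor, medium_gray, tricolor == medium_gray)
```

Any distance computed from the composite alone is therefore zero between the two items, which is why keeping the per-color weights separate is the safer representation.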

Steve

Renaud Waldura wrote:
 Here's another idea: encode color mixes as one RGB value (32 bits) and sort
 according to those values. To find the closest color is like finding the
 closest points in the color space. It would be like a distance search.
 
 70% black #000000 = 0
 20% gray #f0f0f0 = #303030
 10% brown #8b4513 = #0e0702
 = #3e3732
 
 The distance would be:
 sqrt( (r1 - r0)^2 + (g1 - g0)^2 + (b1 - b0)^2 )
 
 Where r0g0b0 is the color the user asked for, and r1g1b1 is the composite
 color of the item, calculated above.
 
 --Renaud
 
 
 -Original Message-
 From: Steven Rowe [mailto:[EMAIL PROTECTED] 
 Sent: Friday, September 28, 2007 7:14 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Color search
 
 Hi Guangwei,
 
 When you index your products, you could have a single color field, and
 include duplicates of each color component proportional to its weight.
 
 For example, if you decide to use 10% increments, for your black dress with
 70% of black, 20% of gray, 10% of brown, you would index the following terms
 for the color field:
 
   black black black black black black black
   gray gray
   brown
 
 This works because Lucene natively interprets document term frequencies as
 weights.
 
 Steve
 
 Guangwei Yuan wrote:
 Hi,

 We're running an e-commerce site that provides product search. We've 
 been able to extract colors from product images, and we think it'd be 
 cool and useful to search products by color. A product image can have 
 up to 5 colors (from a color space of about 100 colors), so we can 
 implement it easily with Solr's facet search (thanks all who've developed
 Solr).
 The problem arises when we try to sort the results by the color relevancy.
 What's different from a normal facet search is that colors are 
 weighted. For example, a black dress can have 70% of black, 20% of 
 gray, 10% of brown. A search query color:black should return results 
 in which the black dress ranks higher than other products with less
 percentage of black.
 My question is: how to configure and index the color field so that 
 products with higher percentage of color X ranks higher for query
 color:X?
 Thanks for your help!

 - Guangwei
 
 



Re: Color search

2007-09-28 Thread Matthew Runo
This discussion is incredibly interesting to me! We solved this  
simply by indexing the color names, and faceting on that. Not a very  
elegant solution, to be sure - but it works. If people search for a  
green running shoe they get -green- running shoes.


I would be very very interested in having a color picker ajax app  
which then went out and found the products with colors most like the  
one you chose.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Sep 28, 2007, at 1:00 AM, Guangwei Yuan wrote:


Hi,

We're running an e-commerce site that provides product search. We've been
able to extract colors from product images, and we think it'd be cool and
useful to search products by color. A product image can have up to 5 colors
(from a color space of about 100 colors), so we can implement it easily with
Solr's facet search (thanks all who've developed Solr).

The problem arises when we try to sort the results by the color relevancy.
What's different from a normal facet search is that colors are weighted. For
example, a black dress can have 70% of black, 20% of gray, 10% of brown. A
search query color:black should return results in which the black dress
ranks higher than other products with less percentage of black.

My question is: how to configure and index the color field so that products
with higher percentage of color X ranks higher for query color:X?

Thanks for your help!

- Guangwei




Re: Color search

2007-09-28 Thread Chris Hostetter

: useful to search products by color. A product image can have up to 5 colors
: (from a color space of about 100 colors), so we can implement it easily with
: Solr's facet search (thanks all who've developed Solr).
: 
: The problem arises when we try to sort the results by the color relevancy.
: What's different from a normal facet search is that colors are weighted. For
: example, a black dress can have 70% of black, 20% of gray, 10% of brown. A

if 5 is a hard max on the number of colors that you support, then you can 
always use 5 separate fields to store the colors in order of dominance 
and then query on those 5 fields with varying boosts...

 color_1:black^10 color_2:black^7 color_3:black^4 color_4:black color_5:black^0.1

...something like this will lose the % granularity info that you have (so 
a 60% black skirt and an 80% black dress would both score the same against 
black since it's the dominant color)
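
A rough sketch of both sides of the ranked-field idea in plain Java, assuming colors arrive as a map from name to percentage. RankedColorFields, toFields and queryFor are hypothetical helpers; the boosts are the ones from the example query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RankedColorFields {
    // Boost suffixes for color_1..color_5, copied from the example query.
    static final String[] BOOSTS = {"^10", "^7", "^4", "", "^0.1"};

    /** Index side: assigns colors to color_1, color_2, ... by descending percentage. */
    static Map<String, String> toFields(Map<String, Integer> percentByColor) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(percentByColor.entrySet());
        entries.sort((a, b) -> b.getValue() - a.getValue());
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < entries.size() && i < BOOSTS.length; i++) {
            fields.put("color_" + (i + 1), entries.get(i).getKey());
        }
        return fields;
    }

    /** Query side: one clause per rank field, most dominant rank boosted highest. */
    static String queryFor(String color) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < BOOSTS.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append("color_").append(i + 1).append(':').append(color).append(BOOSTS[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> dress = new LinkedHashMap<>();
        dress.put("gray", 20);
        dress.put("black", 70);
        dress.put("brown", 10);
        System.out.println(toFields(dress));
        System.out.println(queryFor("black"));
    }
}
```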

alternately: i'm assuming your percentage data only has so much confidence
-- maybe on the order of 10%?.  you can have a separate field for each 
bucket of color percentages and index the name of the color in the 
corresponding bucket.  with 10% granularity that's only 10 fields -- a 10 
clause boolean query for the color is no big deal ... even going to 5% 
would be trivial.
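
A sketch of the bucket idea in plain Java. The field naming scheme (color_pct_10 ... color_pct_100) and the boost of one unit per bucket are assumptions made for illustration; nothing here is a Solr API.

```java
final class BucketedColorFields {
    /** Index side: name of the bucket field that a color with this percentage goes into. */
    static String fieldFor(int percent, int step) {
        int bucket = Math.round(percent / (float) step) * step;
        // Clamp so that tiny percentages still land in the lowest bucket.
        bucket = Math.max(step, Math.min(100, bucket));
        return "color_pct_" + bucket;
    }

    /** Query side: one clause per bucket, higher buckets boosted more. */
    static String queryFor(String color, int step) {
        StringBuilder sb = new StringBuilder();
        for (int bucket = 100; bucket >= step; bucket -= step) {
            if (sb.length() > 0) sb.append(' ');
            sb.append("color_pct_").append(bucket).append(':').append(color)
              .append('^').append(bucket / step);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fieldFor(70, 10) + " <- black");
        System.out.println(queryFor("black", 10));
    }
}
```

Because each color sits in exactly one bucket field, a product whose colors do not add up to 100% needs no filler values under this layout.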


Incidentally: people interested in the general topic of color faceting at 
a finer granularity than just color names may want to check out this 
thread from last...

http://www.nabble.com/faceting-and-categorizing-on-color--tf1801106.html



-Hoss



Re: Color search

2007-09-28 Thread Mike Klaas


On 28-Sep-07, at 6:31 AM, Grant Ingersoll wrote:

Another option would be to extend Solr (and donate back) to  
incorporate Lucene's payload functionality, in which case you could  
associate the percentile of the color as a payload and use the  
BoostingTermQuery... :-)  If you're interested in this, a  
discussion on solr-dev is probably warranted to figure out the best  
way to do this.


For reference, here is a summary of the changes needed:

1. A payload analyzer. Here is an example that tokenizes strings of the
form token:whatever:score into the token with the score as its payload:


  /** Returns the next token in the stream, or null at EOS. */
  public final Token next() throws IOException {
    Token t = input.next();
    if (null == t)
      return null;

    String s = t.termText();
    if (s.indexOf(":") > -1) {
      String[] parts = s.split(":");
      assert parts.length == 3;
      String colour = parts[0];
      // the score is the third field of token:whatever:score
      int bits = Float.floatToIntBits(Float.parseFloat(parts[2]));
      // pack the float's bits into 4 bytes, least significant byte first
      byte[] buf = new byte[4];
      for (int shift = 0, i = 0; shift < 32; shift += 8, i++) {
        buf[i] = (byte) ((bits >> shift) & 0xff);
      }
      Token gen = new Token(colour, t.startOffset(), t.endOffset());
      gen.setPayload(new Payload(buf));
      t = gen;
    }
    return t;
  }


2. A payload deserializer.  Add this method to your custom Similarity  
class:


  public float scorePayload(byte[] payload, int offset, int length) {
    assert length == 4;
    // reassemble the 4 bytes (least significant first) into the float's bits
    int accum = ((payload[0 + offset] & 0xff)) |
                ((payload[1 + offset] & 0xff) << 8) |
                ((payload[2 + offset] & 0xff) << 16) |
                ((payload[3 + offset] & 0xff) << 24);
    return Float.intBitsToFloat(accum);
  }

3. Add a relevant query clause.  In a custom request handler, you  
could have a parameter to add BoostingTermQueries:


  Query q = new BoostingTermQuery(new Term("colourPayload", colour));
  query.add(q, Occur.SHOULD);
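
The byte packing used in steps 1 and 2 can be checked on its own, without Lucene. This is a minimal sketch in plain Java; PayloadCodec, encode and decode are hypothetical names that only mirror the bit manipulation, not any Lucene class.

```java
final class PayloadCodec {
    /** Packs a float into 4 bytes, least significant byte first (as in step 1). */
    static byte[] encode(float score) {
        int bits = Float.floatToIntBits(score);
        byte[] buf = new byte[4];
        for (int shift = 0, i = 0; shift < 32; shift += 8, i++) {
            buf[i] = (byte) ((bits >> shift) & 0xff);
        }
        return buf;
    }

    /** Rebuilds the float from 4 bytes starting at offset (as in step 2). */
    static float decode(byte[] payload, int offset) {
        // Mask with 0xff so that negative bytes are not sign-extended.
        int accum = (payload[offset] & 0xff)
                | ((payload[offset + 1] & 0xff) << 8)
                | ((payload[offset + 2] & 0xff) << 16)
                | ((payload[offset + 3] & 0xff) << 24);
        return Float.intBitsToFloat(accum);
    }

    public static void main(String[] args) {
        float score = 0.7f;
        System.out.println(decode(encode(score), 0) == score);
    }
}
```

The round trip is exact because only the raw IEEE 754 bits are moved around; no arithmetic is done on the float itself.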

How to add this generically is an interesting question.  There are  
many possibilities, especially on the request handler and tokenizer  
side of things.  If there is a consensus on a sensible way of doing  
this, I could contribute the bits of code that I have.


HTH,
-Mike



Re: Color search

2007-09-28 Thread Guangwei Yuan
Thanks for all the replies. I think creating 10 fields, and filling one field
with a color's value for each 10% of that color, is a reasonable approach and
easy to implement too. One problem, though, is that not all products' colors
add up to 100% (for various reasons, including our color extraction
algorithm). So, for a product with 50% of #00 and 20% of #99, I'll have to
fill the remaining three fields with some dummy values. Otherwise, Lucene
seems to score it higher than products that also have 50% of #00 but more
than 20% of some other colors. Since I also need a way to exclude the dummy
value when faceting, is there a neater solution?

I'll certainly look at the payload functionality, which is new to me :)

- Guangwei