Re: Color search for images
I'm not sure how one would capture which object in an image is more dominant than another. Think of a landscape with snow, green mountains, and a set of flowers of varied colors, including a rose.

On Fri, Sep 17, 2010 at 8:43 PM, Shashi Kant sk...@sloan.mit.edu wrote:

What I am envisioning (at least to start) is have all this add two fields in the index. One would be for color information for the color similarity search. The other would be a simple multivalued text field that we put keywords into based on what OpenCV can detect about the image. If it detects faces, we would put face into this field. Other things that it can detect would result in other keywords. For the color search, I have a few inter-related hurdles. I've got to figure out what form the color data actually takes and how to represent it in Solr. I need Java code for Solr that can take an input color value and find similar values in the index. Then I need some code that can go in our feed processing scripts for new content. That code would also go into a crawler script to handle existing images.

You are on the right track. You can create a set of representative keywords from the image. OpenCV gets a color histogram from the image - you can set the bin values to be as granular as you need, and create a look-up list of color names to generate a MVF representative of the image. If you want to get more sophisticated, represent the colors with payloads in correlation with the distribution of the color in the image. Another approach would be to segment the image and extract colors from each. So if you have a red rose with all white background, the textual representation would be something like: white, white...red...white, white

Play around and see which works best. HTH
Re: Color search for images
Sounds like someone is/has going to say/said: Make it so, number one There are some good links off of this article about the color Magenta, (like, uh, who knows what 'cyan' or 'magenta' are anyway? So I looked it up. Refilling my printer cartidges required an explanation.) http://en.wikipedia.org/wiki/Magenta Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Thu, 9/16/10, Shawn Heisey elyog...@elyograg.org wrote: From: Shawn Heisey elyog...@elyograg.org Subject: Re: Color search for images To: solr-user@lucene.apache.org Date: Thursday, September 16, 2010, 7:58 PM On 9/16/2010 7:45 AM, Shashi Kant wrote: Lire is a nascent effort and based on a cursory overview a while back, IMHO was an over-simplified version of what a CBIR engine should be. They use CEDD (color edge descriptors). Wouldn't work for the kind of applications I am working on - which needs among other things, Color, Shape, Orientation, Pose, Edge/Corner etc. OpenCV has a steep learning curve, but having been through it, is very powerful toolkit - the best there is by far! BTW the code is in C++, but has both Java .NET bindings. This is a fabulous book to get hold of: http://www.amazon.com/Learning-OpenCV-Computer-Vision-Library/dp/0596516134, if you are seriously into OpenCV. Pls feel free to reach out of if you need any help with OpenCV + Solr/Lucene. I have spent quite a bit of time on this. What I am envisioning (at least to start) is have all this add two fields in the index. One would be for color information for the color similarity search. The other would be a simple multivalued text field that we put keywords into based on what OpenCV can detect about the image. If it detects faces, we would put face into this field. Other things that it can detect would result in other keywords. For the color search, I have a few inter-related hurdles. 
I've got to figure out what form the color data actually takes and how to represent it in Solr. I need Java code for Solr that can take an input color value and find similar values in the index. Then I need some code that can go in our feed processing scripts for new content. That code would also go into a crawler script to handle existing images. We can probably handle most of the development if we can figure out the methods and data formats. Naturally we would be interested in using off-the-shelf stuff as much as possible. Today I learned that our CTO has already been looking into OpenCV and has a copy of the O'Reilly book. Thanks, Shawn
Re: Color search for images
What I am envisioning (at least to start) is have all this add two fields in the index. One would be for color information for the color similarity search. The other would be a simple multivalued text field that we put keywords into based on what OpenCV can detect about the image. If it detects faces, we would put face into this field. Other things that it can detect would result in other keywords. For the color search, I have a few inter-related hurdles. I've got to figure out what form the color data actually takes and how to represent it in Solr. I need Java code for Solr that can take an input color value and find similar values in the index. Then I need some code that can go in our feed processing scripts for new content. That code would also go into a crawler script to handle existing images. You are on the right track. You can create a set of representative keywords from the image. OpenCV gets a color histogram from the image - you can set the bin values to be as granular as you need, and create a look-up list of color names to generate a MVF representative of the image. If you want to get more sophisticated, represent the colors with payloads in correlation with the distribution of the color in the image. Another approach would be to segment the image and extract colors from each. So if you have a red rose with all white background, the textual representation would be something like: white, white...red...white, white Play around and see which works best. HTH
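The histogram-to-keywords idea in the message above could be sketched roughly as follows. This is plain Java, not OpenCV: it assumes the pixels have already been decoded into packed 0xRRGGBB ints, and the lookup list (`NAMES`, `RGB`) and the `minShare` threshold are made-up placeholders. A real lookup list would need many more entries, and probably a perceptual color space rather than raw RGB.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: map pixels to the nearest named color and emit keywords for a multivalued field. */
final class ColorKeywords {
    // Hypothetical look-up list of color names; not taken from any library.
    static final String[] NAMES = {"black", "white", "red", "green", "blue"};
    static final int[] RGB = {0x000000, 0xFFFFFF, 0xFF0000, 0x00FF00, 0x0000FF};

    /** Index of the named color closest to rgb (squared Euclidean distance in RGB). */
    static int nearest(int rgb) {
        int best = 0;
        long bestDist = Long.MAX_VALUE;
        for (int i = 0; i < RGB.length; i++) {
            long dr = ((rgb >> 16) & 0xFF) - ((RGB[i] >> 16) & 0xFF);
            long dg = ((rgb >> 8) & 0xFF) - ((RGB[i] >> 8) & 0xFF);
            long db = (rgb & 0xFF) - (RGB[i] & 0xFF);
            long d = dr * dr + dg * dg + db * db;
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    /** Names of colors covering at least minShare (0..1) of the pixels, in lookup order. */
    static List<String> keywords(int[] pixels, double minShare) {
        int[] counts = new int[NAMES.length];
        for (int p : pixels) {
            counts[nearest(p & 0xFFFFFF)]++;
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            if (pixels.length > 0 && (double) counts[i] / pixels.length >= minShare) {
                out.add(NAMES[i]);
            }
        }
        return out;
    }
}
```

The returned names would go into the multivalued keyword field. The per-color counts could also feed the payload or term-repetition approaches if the distribution matters.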
Re: Color search for images
Do you mean content-based image retrieval, or just searching images by tag? If the former, you can try LIRE.

2010/9/15 Shawn Heisey s...@elyograg.org:

My index consists of metadata for a collection of 45 million objects, most of which are digital images. The executives have fallen in love with Google's color image search. Here's a search for flower with a red color filter: http://www.google.com/images?q=flower&tbs=isch:1,ic:specific,isc:red

I am interested in duplicating this. Can this group of fine people point me in the right direction? I don't want anyone to do it for me, just help me find software and/or algorithms that can extract the color information, then find a way to get Solr to index and search it.

Thanks, Shawn
Re: Color search for images
Yes, notice the flowers are all a medium-dark crimson red. There are a bunch of these image-indexing search technologies, but there is no (to my knowledge) finished technology; it's very much an area of research. If you want to search the word 'flower' and index data that can find blobs of red, that might be easy with public tools. But there are many hard problems.

Lance

Stephen Weiss wrote:

There's a project out there called LIRE (I heard about it on this list) that's supposed to create a Lucene-based CBIR index for images. I wonder if this could be integrated with Solr? Personally I don't really care about the flower part, I'm more worried about searching whether the flower is red... we have good object keywording but not good color keywording - and color is so much more subjective too, red can mean a lot of things. I'm already working on testing it separately but it sure would be more useful if the scoring could be integrated with the rest of the search index. -- Steve

On Sep 15, 2010, at 11:56 PM, Shashi Kant wrote:

I'm sure there's some post doctoral types who could get a graphic shape analyzer, color analyzer, to at least say it's a flower. However, even Google would have to build new datacenters to have the horsepower to do that kind of graphic processing.

Not necessarily true. Like.com - which incidentally got acquired by Google recently - built a true visual search technology and applied it on a large scale.
Re: Color search for images
On 9/15/2010 10:50 AM, Shashi Kant wrote: Shawn, I have done some research into this, machine-vision especially on a large scale is a hard problem, not to be entered into lightly. I would recommend starting with OpenCV - a comprehensive toolkit for extracting various features such as Color, Edge etc from images. Also there is a project LIRE http://www.semanticmetadata.net/lire/ which attempts to do something along what you are thinking of. Not sure how well it works. Lire looks promising, but how hard is it to integrate the content-based search into Solr as opposed to Lucene? I myself am not a Java developer. I have access to people who are, but their time is scarce. I use DIH to populate my index, so I would have to do analysis outside of Solr to populate the database. From there, I would come up with a new schema and DIH config to re-import either the entire index or just documents that have been recently updated. I have a build system to handle these things on all my shards. OpenCV looks intimidating, but potentially very useful and for most things would probably not require custom code in Solr. To mention the most obvious capability I can find, I think many of our customers would love to be able to check a box to include or exclude photos with faces in them. I can tell it's getting late ... I imagined a scenario similar to the Kohler commercial where a woman pulls out a faucet at an architect's office ... Design a website around #00ebc9. Thanks, Shawn
Re: Color search for images
Lire looks promising, but how hard is it to integrate the content-based search into Solr as opposed to Lucene? I myself am not a Java developer. I have access to people who are, but their time is scarce.

Lire is a nascent effort and, based on a cursory overview a while back, IMHO was an over-simplified version of what a CBIR engine should be. They use CEDD (color edge descriptors). It wouldn't work for the kind of applications I am working on, which need, among other things, Color, Shape, Orientation, Pose, Edge/Corner etc.

OpenCV has a steep learning curve, but having been through it, I can say it is a very powerful toolkit - the best there is by far! BTW the code is in C++, but it has both Java and .NET bindings. This is a fabulous book to get hold of if you are seriously into OpenCV: http://www.amazon.com/Learning-OpenCV-Computer-Vision-Library/dp/0596516134

Please feel free to reach out if you need any help with OpenCV + Solr/Lucene. I have spent quite a bit of time on this.
Re: Color search for images
That's impressive! So Google has BOUGHT some doctoral types, or highly specialized geeks, and is looking at X number of images. I bet the number of images on his video film library is at least several orders of magnitude above what Like deals with.

Dennis Gearon

--- On Wed, 9/15/10, Shashi Kant sk...@sloan.mit.edu wrote:

From: Shashi Kant sk...@sloan.mit.edu
Subject: Re: Color search for images
To: solr-user@lucene.apache.org
Date: Wednesday, September 15, 2010, 8:56 PM

I'm sure there's some post doctoral types who could get a graphic shape analyzer, color analyzer, to at least say it's a flower. However, even Google would have to build new datacenters to have the horsepower to do that kind of graphic processing.

Not necessarily true. Like.com - which incidentally got acquired by Google recently - built a true visual search technology and applied it on a large scale.
Re: Color search for images
LOL! Now that is one of the wisest things I've seen in a while.

Dennis Gearon

--- On Thu, 9/16/10, Shashi Kant sk...@sloan.mit.edu wrote:

From: Shashi Kant sk...@sloan.mit.edu
Subject: Re: Color search for images
To: solr-user@lucene.apache.org
Date: Thursday, September 16, 2010, 6:36 AM

On Thu, Sep 16, 2010 at 3:21 AM, Lance Norskog goks...@gmail.com wrote:

Yes, notice the flowers are all a medium-dark crimson red. There are a bunch of these image-indexing search technologies, but there is no (to my knowledge) finished technology- it's very much an area of research. If you want to search the word 'flower' and index data that can find blobs of red, that might be easy with public tools. But there are many hard problems.

Lance, is there *ever* a finished technology? -)
Re: Color search for images
On 9/16/2010 7:45 AM, Shashi Kant wrote: Lire is a nascent effort and based on a cursory overview a while back, IMHO was an over-simplified version of what a CBIR engine should be. They use CEDD (color edge descriptors). Wouldn't work for the kind of applications I am working on - which needs among other things, Color, Shape, Orientation, Pose, Edge/Corner etc. OpenCV has a steep learning curve, but having been through it, is very powerful toolkit - the best there is by far! BTW the code is in C++, but has both Java .NET bindings. This is a fabulous book to get hold of: http://www.amazon.com/Learning-OpenCV-Computer-Vision-Library/dp/0596516134, if you are seriously into OpenCV. Pls feel free to reach out of if you need any help with OpenCV + Solr/Lucene. I have spent quite a bit of time on this. What I am envisioning (at least to start) is have all this add two fields in the index. One would be for color information for the color similarity search. The other would be a simple multivalued text field that we put keywords into based on what OpenCV can detect about the image. If it detects faces, we would put face into this field. Other things that it can detect would result in other keywords. For the color search, I have a few inter-related hurdles. I've got to figure out what form the color data actually takes and how to represent it in Solr. I need Java code for Solr that can take an input color value and find similar values in the index. Then I need some code that can go in our feed processing scripts for new content. That code would also go into a crawler script to handle existing images. We can probably handle most of the development if we can figure out the methods and data formats. Naturally we would be interested in using off-the-shelf stuff as much as possible. Today I learned that our CTO has already been looking into OpenCV and has a copy of the O'Reilly book. Thanks, Shawn
Re: Color search for images
On Sep 15, 2010, at 7:59am, Shawn Heisey wrote:

My index consists of metadata for a collection of 45 million objects, most of which are digital images. The executives have fallen in love with Google's color image search. Here's a search for flower with a red color filter: http://www.google.com/images?q=flower&tbs=isch:1,ic:specific,isc:red

I am interested in duplicating this. Can this group of fine people point me in the right direction? I don't want anyone to do it for me, just help me find software and/or algorithms that can extract the color information, then find a way to get Solr to index and search it.

When I took a look at the search results, it seemed like the word red shows up in the image name, description, or tag for every found image. Are you sure Google is extracting color information? Or is it just being smart about color-specific keywords found in associated text?

-- Ken

-- Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g
Re: Color search for images
Shawn, I have done some research into this, machine-vision especially on a large scale is a hard problem, not to be entered into lightly. I would recommend starting with OpenCV - a comprehensive toolkit for extracting various features such as Color, Edge etc from images. Also there is a project LIRE http://www.semanticmetadata.net/lire/ which attempts to do something along what you are thinking of. Not sure how well it works. HTH, Shashi On Wed, Sep 15, 2010 at 10:59 AM, Shawn Heisey s...@elyograg.org wrote: My index consists of metadata for a collection of 45 million objects, most of which are digital images. The executives have fallen in love with Google's color image search. Here's a search for flower with a red color filter: http://www.google.com/images?q=flowertbs=isch:1,ic:specific,isc:red I am interested in duplicating this. Can this group of fine people point me in the right direction? I don't want anyone to do it for me, just help me find software and/or algorithms that can extract the color information, then find a way to get Solr to index and search it. Thanks, Shawn
Re: Color search for images
On a related note, I'm curious if anyone has run across a good set of algorithms (or hopefully a library) for doing naive image classification. I'm looking for something that can classify images into something similar to the broad categories that Google image search has (Face, Photo, Clip Art, Line Drawing, etc.). --Paul

OpenCV is the way to go. It has a very comprehensive set of algorithms.
Re: Color search for images
My guess is that they are leveraging text on the same web page. I'm sure there's some post doctoral types who could get a graphic shape analyzer, color analyzer, to at least say it's a flower. However, even Google would have to build new datacenters to have the horsepower to do that kind of graphic processing. So, since the names of all the images have something that says flower and red, I vote for image name or image attributes being the source. Good luck with rolls of film.

Dennis Gearon

--- On Wed, 9/15/10, Ken Krugler kkrugler_li...@transpac.com wrote:

From: Ken Krugler kkrugler_li...@transpac.com
Subject: Re: Color search for images
To: solr-user@lucene.apache.org
Date: Wednesday, September 15, 2010, 9:41 AM

On Sep 15, 2010, at 7:59am, Shawn Heisey wrote:

My index consists of metadata for a collection of 45 million objects, most of which are digital images. The executives have fallen in love with Google's color image search. Here's a search for flower with a red color filter: http://www.google.com/images?q=flower&tbs=isch:1,ic:specific,isc:red

I am interested in duplicating this. Can this group of fine people point me in the right direction? I don't want anyone to do it for me, just help me find software and/or algorithms that can extract the color information, then find a way to get Solr to index and search it.

When I took a look at the search results, it seemed like the word red shows up in the image name, description, or tag for every found image. Are you sure Google is extracting color information? Or is it just being smart about color-specific keywords found in associated text?

-- Ken
Re: Color search for images
I'm sure there's some post doctoral types who could get a graphic shape analyzer, color analyzer, to at least say it's a flower. However, even Google would have to build new datacenters to have the horsepower to do that kind of graphic processing. Not necessarily true. Like.com - which incidentally got acquired by Google recently - built a true visual search technology and applied it on a large scale.
Re: Color search
can you explain exactly how you are indexing the data and what your query looks like?

I used the same field name (color), not 10 different names (c0 - c9). So the index fields look like (50% #00, 20% #99):

color: #00
color: #00
color: #00
color: #00
color: #00
color: #99
color: #99

The query for black dresses will be: color:#00
Re: Color search
: I used the same field name (color), not 10 different names (c0 - c9).

ah .. got it. then what you are probably seeing is because of length normalization; if you use omitNorms=true then it shouldn't matter.

(i don't know why i suggested a separate field for each 10% block ... i'm sure i had a good reason but i can't think of it now)

-Hoss
Color search
Hi, We're running an e-commerce site that provides product search. We've been able to extract colors from product images, and we think it'd be cool and useful to search products by color. A product image can have up to 5 colors (from a color space of about 100 colors), so we can implement it easily with Solr's facet search (thanks all who've developed Solr). The problem arises when we try to sort the results by the color relevancy. What's different from a normal facet search is that colors are weighted. For example, a black dress can have 70% of black, 20% of gray, 10% of brown. A search query color:black should return results in which the black dress ranks higher than other products with less percentage of black. My question is: how to configure and index the color field so that products with higher percentage of color X ranks higher for query color:X? Thanks for your help! - Guangwei
Re: Color search
Hi Guangwei, When you index your products, you could have a single color field, and include duplicates of each color component proportional to its weight. For example, if you decide to use 10% increments, for your black dress with 70% of black, 20% of gray, 10% of brown, you would index the following terms for the color field: black black black black black black black gray gray brown This works because Lucene natively interprets document term frequencies as weights. Steve Guangwei Yuan wrote: Hi, We're running an e-commerce site that provides product search. We've been able to extract colors from product images, and we think it'd be cool and useful to search products by color. A product image can have up to 5 colors (from a color space of about 100 colors), so we can implement it easily with Solr's facet search (thanks all who've developed Solr). The problem arises when we try to sort the results by the color relevancy. What's different from a normal facet search is that colors are weighted. For example, a black dress can have 70% of black, 20% of gray, 10% of brown. A search query color:black should return results in which the black dress ranks higher than other products with less percentage of black. My question is: how to configure and index the color field so that products with higher percentage of color X ranks higher for query color:X? Thanks for your help! - Guangwei
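As a rough illustration of the term-repetition approach described above, here is a minimal sketch that builds the value for the color field from a map of percentages. The class and method names are invented for this example, and the rounding rule (round to the nearest multiple of the step) is an assumption. Note that, as mentioned elsewhere in this thread, length normalization can affect the resulting scores unless the field uses omitNorms=true.

```java
import java.util.Map;

/** Sketch: repeat each color term in proportion to its weight. */
final class WeightedColorTerms {
    /**
     * Repeats each color once per `step` percent, rounding to the nearest copy,
     * e.g. 70% with step 10 yields 7 copies. Map iteration order is preserved.
     */
    static String expand(Map<String, Integer> percentByColor, int step) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : percentByColor.entrySet()) {
            int copies = Math.round(e.getValue() / (float) step);
            for (int i = 0; i < copies; i++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }
}
```

With an insertion-ordered map holding black=70, gray=20, brown=10 and a step of 10, this produces the ten-term string from the example: seven copies of black, two of gray, one of brown.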
Re: Color search
Another option would be to extend Solr (and donate back) to incorporate Lucene's payload functionality, in which case you could associate the percentile of the color as a payload and use the BoostingTermQuery... :-) If you're interested in this, a discussion on solr-dev is probably warranted to figure out the best way to do this. -Grant On Sep 28, 2007, at 9:23 AM, Yonik Seeley wrote: If it were just a couple of colors, you could have a separate field for each color and then index the percent in that field. black:70 grey:20 and then you could use a function query to influence the score (or you could sort by the color percent). However, this doesn't scale well to a large index with a large number of colors. Each field used like that will take up 4 bytes per document in the index. so if you have 1M documents, that's 1Mdocs * 100colors * 4bytes = 400MB Doable depending on your index size (use int or float and not sint or sfloat type for this... it will be better on the memory). If you needed to be better on the memory, you could encode all of the colors into a single value (perhaps into a compact string... one percentile per byte or something) and then have a custom function that extracts the value for a particular color. (this involves some java development) -Yonik On 9/28/07, Guangwei Yuan [EMAIL PROTECTED] wrote: Hi, We're running an e-commerce site that provides product search. We've been able to extract colors from product images, and we think it'd be cool and useful to search products by color. A product image can have up to 5 colors (from a color space of about 100 colors), so we can implement it easily with Solr's facet search (thanks all who've developed Solr). The problem arises when we try to sort the results by the color relevancy. What's different from a normal facet search is that colors are weighted. For example, a black dress can have 70% of black, 20% of gray, 10% of brown. 
A search query color:black should return results in which the black dress ranks higher than other products with less percentage of black. My question is: how to configure and index the color field so that products with higher percentage of color X ranks higher for query color:X? Thanks for your help! - Guangwei -- Grant Ingersoll http://lucene.grantingersoll.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Color search
If it were just a couple of colors, you could have a separate field for each color and then index the percent in that field. black:70 grey:20 and then you could use a function query to influence the score (or you could sort by the color percent). However, this doesn't scale well to a large index with a large number of colors. Each field used like that will take up 4 bytes per document in the index. so if you have 1M documents, that's 1Mdocs * 100colors * 4bytes = 400MB Doable depending on your index size (use int or float and not sint or sfloat type for this... it will be better on the memory). If you needed to be better on the memory, you could encode all of the colors into a single value (perhaps into a compact string... one percentile per byte or something) and then have a custom function that extracts the value for a particular color. (this involves some java development) -Yonik On 9/28/07, Guangwei Yuan [EMAIL PROTECTED] wrote: Hi, We're running an e-commerce site that provides product search. We've been able to extract colors from product images, and we think it'd be cool and useful to search products by color. A product image can have up to 5 colors (from a color space of about 100 colors), so we can implement it easily with Solr's facet search (thanks all who've developed Solr). The problem arises when we try to sort the results by the color relevancy. What's different from a normal facet search is that colors are weighted. For example, a black dress can have 70% of black, 20% of gray, 10% of brown. A search query color:black should return results in which the black dress ranks higher than other products with less percentage of black. My question is: how to configure and index the color field so that products with higher percentage of color X ranks higher for query color:X? Thanks for your help! - Guangwei
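To make the memory arithmetic and the "one percentile per byte" idea above concrete, here is a small sketch. It is only an illustration: the class name, the fixed array layout (one byte per color index), and the method names are assumptions, and the custom Solr function that would read such a value is not shown.

```java
/** Sketch: pack per-color percentages into one byte per color, and estimate memory. */
final class PackedColorPercents {
    static final int NUM_COLORS = 100; // size of the color space mentioned in the thread

    /** Packs (colorIndex, percent) pairs into a byte array; colors not listed stay at 0. */
    static byte[] pack(int[] colorIndexes, int[] percents) {
        byte[] packed = new byte[NUM_COLORS];
        for (int i = 0; i < colorIndexes.length; i++) {
            packed[colorIndexes[i]] = (byte) percents[i]; // 0..100 fits in one byte
        }
        return packed;
    }

    /** Extracts the percentage stored for one color. */
    static int percentOf(byte[] packed, int colorIndex) {
        return packed[colorIndex] & 0xFF;
    }

    /** Memory for the one-field-per-color layout: docs * colors * bytes per value. */
    static long perFieldBytes(long docs, int colors, int bytesPerValue) {
        return docs * colors * bytesPerValue;
    }
}
```

With 1M documents, 100 colors and 4 bytes per value, `perFieldBytes` gives 400,000,000 bytes, matching the 400MB figure above. The packed layout needs 100 bytes per document, so roughly 100MB for the same index. Since a product has at most 5 colors, a sparse encoding of (index, percent) pairs would be smaller still.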
RE: Color search
Here's another idea: encode color mixes as one RGB value (32 bits) and sort according to those values. Finding the closest color is like finding the closest points in the color space. It would be like a distance search.

70% black #000000 = #000000
20% gray #f0f0f0 = #303030
10% brown #8b4513 = #0e0702
composite = #3e3732

The distance would be:

sqrt( (r1 - r0)^2 + (g1 - g0)^2 + (b1 - b0)^2 )

where r0g0b0 is the color the user asked for, and r1g1b1 is the composite color of the item, calculated above.

--Renaud

-Original Message-
From: Steven Rowe [mailto:[EMAIL PROTECTED]
Sent: Friday, September 28, 2007 7:14 AM
To: solr-user@lucene.apache.org
Subject: Re: Color search

Hi Guangwei, When you index your products, you could have a single color field, and include duplicates of each color component proportional to its weight. For example, if you decide to use 10% increments, for your black dress with 70% of black, 20% of gray, 10% of brown, you would index the following terms for the color field: black black black black black black black gray gray brown This works because Lucene natively interprets document term frequencies as weights. Steve

Guangwei Yuan wrote:

Hi, We're running an e-commerce site that provides product search. We've been able to extract colors from product images, and we think it'd be cool and useful to search products by color. A product image can have up to 5 colors (from a color space of about 100 colors), so we can implement it easily with Solr's facet search (thanks all who've developed Solr). The problem arises when we try to sort the results by the color relevancy. What's different from a normal facet search is that colors are weighted. For example, a black dress can have 70% of black, 20% of gray, 10% of brown. A search query color:black should return results in which the black dress ranks higher than other products with less percentage of black.
My question is: how to configure and index the color field so that products with higher percentage of color X ranks higher for query color:X? Thanks for your help! - Guangwei
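The composite-color calculation and distance formula from Renaud's message can be sketched as follows. This is only an illustration of the arithmetic: the class and method names are made up, weights are passed as fractions, and rounding to the nearest integer per channel is an assumption chosen because it reproduces the #3e3732 value in the example.

```java
/** Sketch: weighted composite of RGB colors and Euclidean distance between colors. */
final class CompositeColor {
    /** Weighted per-channel sum of packed 0xRRGGBB colors; returns {r, g, b}. */
    static int[] composite(int[] rgbs, double[] weights) {
        double r = 0, g = 0, b = 0;
        for (int i = 0; i < rgbs.length; i++) {
            r += weights[i] * ((rgbs[i] >> 16) & 0xFF);
            g += weights[i] * ((rgbs[i] >> 8) & 0xFF);
            b += weights[i] * (rgbs[i] & 0xFF);
        }
        return new int[] {(int) Math.round(r), (int) Math.round(g), (int) Math.round(b)};
    }

    /** sqrt((r1 - r0)^2 + (g1 - g0)^2 + (b1 - b0)^2) for two {r, g, b} triples. */
    static double distance(int[] c0, int[] c1) {
        double dr = c1[0] - c0[0];
        double dg = c1[1] - c0[1];
        double db = c1[2] - c0[2];
        return Math.sqrt(dr * dr + dg * dg + db * db);
    }
}
```

Calling `composite` with black, #f0f0f0 and #8b4513 at weights 0.7, 0.2 and 0.1 yields the channels 0x3e, 0x37, 0x32. Ranking would then sort items by `distance` from the requested color, smallest first.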
Re: Color search
Hi Renaud, I think your method will produce strange results, probably in most cases, e.g.

33% red #FF0000 = #550000
33% green #00FF00 = #005500
33% blue #0000FF = #000055
composite = #555555

Thus, a red, green and blue dress would score well against a search for medium gray. Not good.

Steve

Renaud Waldura wrote:

Here's another idea: encode color mixes as one RGB value (32 bits) and sort according to those values. To find the closest color is like finding the closest points in the color space. It would be like a distance search.

70% black #000000 = #000000
20% gray #f0f0f0 = #303030
10% brown #8b4513 = #0e0702
composite = #3e3732

The distance would be: sqrt( (r1 - r0)^2 + (g1 - g0)^2 + (b1 - b0)^2 ) Where r0g0b0 is the color the user asked for, and r1g1b1 is the composite color of the item, calculated above. --Renaud

-Original Message-
From: Steven Rowe [mailto:[EMAIL PROTECTED]
Sent: Friday, September 28, 2007 7:14 AM
To: solr-user@lucene.apache.org
Subject: Re: Color search

Hi Guangwei, When you index your products, you could have a single color field, and include duplicates of each color component proportional to its weight. For example, if you decide to use 10% increments, for your black dress with 70% of black, 20% of gray, 10% of brown, you would index the following terms for the color field: black black black black black black black gray gray brown This works because Lucene natively interprets document term frequencies as weights. Steve

Guangwei Yuan wrote:

Hi, We're running an e-commerce site that provides product search. We've been able to extract colors from product images, and we think it'd be cool and useful to search products by color. A product image can have up to 5 colors (from a color space of about 100 colors), so we can implement it easily with Solr's facet search (thanks all who've developed Solr). The problem arises when we try to sort the results by the color relevancy. What's different from a normal facet search is that colors are weighted.
For example, a black dress can have 70% of black, 20% of gray, 10% of brown. A search query color:black should return results in which the black dress ranks higher than other products with less percentage of black. My question is: how to configure and index the color field so that products with higher percentage of color X ranks higher for query color:X? Thanks for your help! - Guangwei
Re: Color search
This discussion is incredibly interesting to me! We solved this simply by indexing the color names, and faceting on that. Not a very elegant solution, to be sure - but it works. If people search for a green running shoe they get -green- running shoes. I would be very very interested in having a color picker ajax app which then went out and found the products with colors most like the one you chose. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Sep 28, 2007, at 1:00 AM, Guangwei Yuan wrote: Hi, We're running an e-commerce site that provides product search. We've been able to extract colors from product images, and we think it'd be cool and useful to search products by color. A product image can have up to 5 colors (from a color space of about 100 colors), so we can implement it easily with Solr's facet search (thanks all who've developed Solr). The problem arises when we try to sort the results by the color relevancy. What's different from a normal facet search is that colors are weighted. For example, a black dress can have 70% of black, 20% of gray, 10% of brown. A search query color:black should return results in which the black dress ranks higher than other products with less percentage of black. My question is: how to configure and index the color field so that products with higher percentage of color X ranks higher for query color:X? Thanks for your help! - Guangwei
Re: Color search
: useful to search products by color. A product image can have up to 5 colors
: (from a color space of about 100 colors), so we can implement it easily with
: Solr's facet search (thanks all who've developed Solr).
:
: The problem arises when we try to sort the results by the color relevancy.
: What's different from a normal facet search is that colors are weighted. For
: example, a black dress can have 70% of black, 20% of gray, 10% of brown. A

If 5 is a hard max on the number of colors that you support, then you can always use 5 separate fields to store the colors in order of dominance and then query on those 5 fields with varying boosts...

  color_1:black^10 color_2:black^7 color_3:black^4 color_4:black color_5:black^0.1

...something like this will lose the % granularity info that you have (so a 60% black skirt and an 80% black dress would both score the same against black, since it's the dominant color in each).

Alternately: I'm assuming your percentage data only has so much confidence -- maybe on the order of 10%? You can have a separate field for each bucket of color percentages and index the name of the color in the corresponding bucket. With 10% granularity that's only 10 fields -- a 10 clause boolean query for the color is no big deal ... even going to 5% would be trivial.

Incidentally: people interested in the general topic of color faceting at a finer granularity than just color names may want to check out this thread from last...

http://www.nabble.com/faceting-and-categorizing-on-color--tf1801106.html

-Hoss
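Both suggestions above come down to generating a boosted boolean query string. Here is a minimal sketch of how that could be done, assuming plain string building on the client side. The rank boosts are the ones quoted in the message. The bucket field names (color_pct_10 ... color_pct_100), the linear bucket boosts, and the round-down bucketing rule are all assumptions for illustration, not something Solr prescribes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical query builder; field names and bucket boosts are assumptions.
final class ColorQuerySketch {

    // Boost suffixes for color_1 .. color_5, as given in the message.
    private static final String[] RANK_BOOSTS = {"^10", "^7", "^4", "", "^0.1"};

    /** One clause per dominance-ranked field: color_1:X^10 color_2:X^7 ... */
    static String rankedQuery(String color) {
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < RANK_BOOSTS.length; i++) {
            clauses.add("color_" + (i + 1) + ":" + color + RANK_BOOSTS[i]);
        }
        return String.join(" ", clauses);
    }

    /** Field for a percentage, rounded down to the bucket size (at least one bucket). */
    static String bucketField(int percent, int step) {
        int bucket = Math.max(step, (percent / step) * step);
        return "color_pct_" + bucket;
    }

    /** One clause per bucket, highest percentage first, boosted in proportion to the bucket. */
    static String bucketQuery(String color, int step) {
        List<String> clauses = new ArrayList<>();
        for (int pct = 100; pct >= step; pct -= step) {
            clauses.add("color_pct_" + pct + ":" + color + "^" + (pct / step));
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(rankedQuery("black"));
        System.out.println(bucketField(73, 10));
        System.out.println(bucketQuery("black", 10));
    }
}
```

At index time, a 70% black dress would put "black" into the field returned by bucketField(70, 10). At query time, bucketQuery("black", 10) yields the 10-clause query mentioned above.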
Re: Color search
On 28-Sep-07, at 6:31 AM, Grant Ingersoll wrote:

Another option would be to extend Solr (and donate back) to incorporate Lucene's payload functionality, in which case you could associate the percentile of the color as a payload and use the BoostingTermQuery... :-) If you're interested in this, a discussion on solr-dev is probably warranted to figure out the best way to do this.

For reference, here is a summary of the changes needed:

1. A payload analyzer. Here is an example that tokenizes strings of token:whatever:score into token with payload score:

    /** Returns the next token in the stream, or null at EOS. */
    public final Token next() throws IOException {
      Token t = input.next();
      if (null == t)
        return null;
      String s = t.termText();
      if (s.indexOf(":") > -1) {
        String[] parts = s.split(":");
        assert parts.length == 3;
        String colour = parts[0];
        // The score is read from the second component, as posted;
        // adjust the index if your tokens carry the score last.
        int bits = Float.floatToIntBits(Float.parseFloat(parts[1]));
        // Serialize the float's bits as 4 bytes, least significant byte first.
        byte[] buf = new byte[4];
        for (int shift = 0, i = 0; shift < 32; shift += 8, i++) {
          buf[i] = (byte) ((bits >> shift) & 0xff);
        }
        // Replace the token with one that holds only the colour plus its payload.
        Token gen = new Token(colour, t.startOffset(), t.endOffset());
        gen.setPayload(new Payload(buf));
        t = gen;
      }
      return t;
    }

2. A payload deserializer. Add this method to your custom Similarity class:

    public float scorePayload(byte[] payload, int offset, int length) {
      assert length == 4;
      // Reassemble the 4 bytes (least significant first) into the float's bits.
      int accum = ((payload[0 + offset] & 0xff))
                | ((payload[1 + offset] & 0xff) << 8)
                | ((payload[2 + offset] & 0xff) << 16)
                | ((payload[3 + offset] & 0xff) << 24);
      return Float.intBitsToFloat(accum);
    }

3. Add a relevant query clause. In a custom request handler, you could have a parameter to add BoostingTermQueries:

    q = new BoostingTermQuery(new Term(colourPayload, colour));
    query.add(q, Occur.SHOULD);

How to add this generically is an interesting question. There are many possibilities, especially on the request handler and tokenizer side of things. If there is a consensus on a sensible way of doing this, I could contribute the bits of code that I have.

HTH,
-Mike
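The byte packing in steps 1 and 2 can be checked on its own, without any Lucene classes. The sketch below pulls just the serialization out of the analyzer and the Similarity method into a standalone class. PayloadCodecSketch and its method names are made up for this illustration. The byte order (least significant byte first) follows the shifts in the code above.

```java
// Hypothetical standalone version of the payload encode/decode logic.
final class PayloadCodecSketch {

    /** Packs a float score into 4 bytes, least significant byte first. */
    static byte[] encode(float score) {
        int bits = Float.floatToIntBits(score);
        byte[] buf = new byte[4];
        for (int shift = 0, i = 0; shift < 32; shift += 8, i++) {
            buf[i] = (byte) ((bits >> shift) & 0xff);
        }
        return buf;
    }

    /** Reads 4 bytes starting at offset and rebuilds the float score. */
    static float decode(byte[] payload, int offset) {
        int accum = (payload[offset] & 0xff)
                | ((payload[offset + 1] & 0xff) << 8)
                | ((payload[offset + 2] & 0xff) << 16)
                | ((payload[offset + 3] & 0xff) << 24);
        return Float.intBitsToFloat(accum);
    }

    public static void main(String[] args) {
        byte[] payload = encode(0.7f);
        // A round trip through the byte array should give back the same float.
        System.out.println(decode(payload, 0));
    }
}
```

The masking with 0xff on decode matters because Java bytes are signed; without it, any byte with the high bit set would sign-extend and corrupt the upper bits of the result.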
Re: Color search
Thanks for all the replies. I think creating 10 fields, and filling one field with a color's value for each 10% that the color covers, is a reasonable approach, and easy to implement too.

One problem, though, is that not all products have colors totalling 100% (for various reasons, including our color extraction algorithm). So, for a product with 50% of #00 and 20% of #99, I'll have to fill the remaining three fields with some dummy values. Otherwise, Lucene seems to score it higher than products that also have 50% of #00 but more than 20% of some other colors. Since I also need a way to exclude the dummy value when faceting, is there a neater solution?

I'll certainly look at the payload functionality, which is new to me :)

- Guangwei
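For what it's worth, the slot-filling scheme described here could be sketched as follows. This is only an illustration of the idea in plain Java, not a recommendation. The class name, the filler value "none", and the decision to round percentages down to whole slots are all assumptions. It also shows one simple way to drop the filler before building facet values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper for the ten-slot scheme; names and filler value are assumptions.
final class ColorSlotSketch {

    static final int SLOTS = 10;
    static final String FILLER = "none"; // made-up dummy value

    /** One slot per full 10% of a color, padded with FILLER up to SLOTS entries. */
    static List<String> fillSlots(Map<String, Integer> percentByColor) {
        List<String> slots = new ArrayList<>();
        for (Map.Entry<String, Integer> e : percentByColor.entrySet()) {
            int count = e.getValue() / (100 / SLOTS); // rounds down
            for (int i = 0; i < count && slots.size() < SLOTS; i++) {
                slots.add(e.getKey());
            }
        }
        while (slots.size() < SLOTS) {
            slots.add(FILLER);
        }
        return slots;
    }

    /** Distinct real colors, in slot order, with the filler left out. */
    static List<String> facetValues(List<String> slots) {
        LinkedHashSet<String> distinct = new LinkedHashSet<>(slots);
        distinct.remove(FILLER);
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        Map<String, Integer> product = new LinkedHashMap<>();
        product.put("black", 50);
        product.put("gray", 20);
        List<String> slots = fillSlots(product);
        System.out.println(slots);
        System.out.println(facetValues(slots));
    }
}
```

Keeping the facet values in a separate field from the padded slots would avoid having to filter the dummy value at query time, though that is a design choice rather than something the thread settles.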