Re: Converting German special characters / umlaute

2007-09-28 Thread Thorsten Scherler
On Thu, 2007-09-27 at 13:26 -0400, J.J. Larrea wrote:
 At 12:13 PM -0400 9/27/07, Steven Rowe wrote:
 Chris Hostetter wrote:
...
 As for implementation, the first part could easily and flexibly be accomplished
 with the current PatternReplaceFilter, and I'm thinking the second could be
 done with an extension to that, or better yet a new Filter which allows
 parsing synonymous tokens from a flat to an overlaid format, e.g. something on
 the order of:
 
 <filter class="solr.PatternReplaceFilterFactory"
         pattern="(.*)(ü|ue)(.*)"
         replacement="$1ue$3|$1u$3"
         tokensep="|"
         replace="first"/>  <!-- tokensep not currently implemented -->
 
 or perhaps better,
 
 <filter class="solr.PatternReplaceFilterFactory"
         pattern="(.*)(ü|ue)(.*)"
         replacement="$1ue$3|$1u$3"
         replace="first"/>
 <filter class="solr.OverlayTokenFilterFactory"
         tokensep="|"/>  <!-- not currently implemented -->
 
 which in my fantasy implementation would map:
 
 Müller  -> Mueller|Muller
 Mueller -> Mueller|Muller
 Muller  -> Muller
 
 and could be run at index-time and/or query-time as appropriate.
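
A minimal sketch of the intended expansion in Python (the regex mirrors the pattern above; the function name is made up, and this is only an illustration of the proposed token mapping, not actual Solr filter code):

```python
import re

def umlaut_variants(token):
    """Normalize 'ü' (or an existing 'ue') to 'ue', then also emit the
    diacritic-stripped 'u' form as an overlaid synonym, separated by '|'."""
    normalized = re.sub(r"ü|ue", "ue", token)
    stripped = normalized.replace("ue", "u")
    if normalized == stripped:
        return normalized
    return normalized + "|" + stripped

print(umlaut_variants("Müller"))   # Mueller|Muller
print(umlaut_variants("Mueller"))  # Mueller|Muller
print(umlaut_variants("Muller"))   # Muller
```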
 
 Does anyone know if there are other (Latin-1-utilizing) languages
 besides German with standardized diacritic substitutions that involve
 something other than just stripping the diacritics?
 
 I'm curious about this too.
 

I am German, but working in Spain, so I have not faced the problem so
far. Anyhow, IMO

Müller  -> Mueller
Mueller -> Mueller

is right; to shorten the word further does not seem right, since one
would be changing the meaning too much.

Further:

groß  -> gross
gross -> gross

ß is pronounced 'sz' but is only replaced by 'ss'.

salu2

 - J.J.
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions



Color search

2007-09-28 Thread Guangwei Yuan
Hi,

We're running an e-commerce site that provides product search. We've been
able to extract colors from product images, and we think it'd be cool and
useful to search products by color. A product image can have up to 5 colors
(from a color space of about 100 colors), so we can implement it easily with
Solr's facet search (thanks all who've developed Solr).

The problem arises when we try to sort the results by color relevancy.
What's different from a normal facet search is that colors are weighted. For
example, a black dress can have 70% black, 20% gray, and 10% brown. A
search query color:black should return results in which the black dress
ranks higher than other products with a lower percentage of black.

My question is: how do I configure and index the color field so that products
with a higher percentage of color X rank higher for the query color:X?

Thanks for your help!

- Guangwei


Indexing without application server

2007-09-28 Thread Jae Joo
Hi,

I have multi-millions of documents to be indexed and am looking for a way to
index them without a J2EE application server.
It is not incremental indexing; this is a kind of "index once, use forever",
all batch mode.

I can guess that if there is a way to index it without J2EE, it may be much
faster...

Thanks,

Jae Joo


Re: Color search

2007-09-28 Thread Steven Rowe
Hi Guangwei,

When you index your products, you could have a single color field, and
include duplicates of each color component proportional to its weight.

For example, if you decide to use 10% increments, for your black dress
with 70% of black, 20% of gray, 10% of brown, you would index the
following terms for the color field:

  black black black black black black black
  gray gray
  brown

This works because Lucene natively interprets document term frequencies
as weights.
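
As a quick sketch of this term-duplication trick (Python; the helper name and the 10%-increment choice are mine, just to make the example above concrete):

```python
def color_terms(weights, increment=10):
    """Expand {color: percent} weights into repeated index terms, so that
    term frequency in the color field acts as the weight."""
    terms = []
    # Emit the heaviest colors first, one copy per `increment` percent.
    for color, percent in sorted(weights.items(), key=lambda kv: -kv[1]):
        terms.extend([color] * (percent // increment))
    return terms

# The black dress from the example: 70% black, 20% gray, 10% brown.
print(" ".join(color_terms({"black": 70, "gray": 20, "brown": 10})))
# black black black black black black black gray gray brown
```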

Steve

Guangwei Yuan wrote:
 Hi,
 
 We're running an e-commerce site that provides product search. We've been
 able to extract colors from product images, and we think it'd be cool and
 useful to search products by color. A product image can have up to 5 colors
 (from a color space of about 100 colors), so we can implement it easily with
 Solr's facet search (thanks all who've developed Solr).
 
 The problem arises when we try to sort the results by the color relevancy.
 What's different from a normal facet search is that colors are weighted. For
 example, a black dress can have 70% of black, 20% of gray, 10% of brown. A
 search query color:black should return results in which the black dress
 ranks higher than other products with less percentage of black.
 
 My question is: how to configure and index the color field so that products
 with higher percentage of color X ranks higher for query color:X?
 
 Thanks for your help!
 
 - Guangwei


Re: Color search

2007-09-28 Thread Grant Ingersoll
Another option would be to extend Solr (and donate back) to incorporate
Lucene's payload functionality, in which case you could associate the
percentile of the color as a payload and use the BoostingTermQuery... :-)
If you're interested in this, a discussion on solr-dev is probably
warranted to figure out the best way to do this.


-Grant

On Sep 28, 2007, at 9:23 AM, Yonik Seeley wrote:


If it were just a couple of colors, you could have a separate field
for each color and then index the percent in that field.

black:70
grey:20

and then you could use a function query to influence the score (or you
could sort by the color percent).

However, this doesn't scale well to a large index with a large number of colors.
Each field used like that will take up 4 bytes per document in the index.

so if you have 1M documents, that's 1Mdocs * 100colors * 4bytes = 400MB

Doable depending on your index size (use int or float and not
sint or sfloat type for this... it will be better on the memory).

If you needed to be better on the memory, you could encode all of the
colors into a single value (perhaps into a compact string... one
percentile per byte or something) and then have a custom function that
extracts the value for a particular color.  (this involves some java
development)

-Yonik


On 9/28/07, Guangwei Yuan [EMAIL PROTECTED] wrote:

Hi,

We're running an e-commerce site that provides product search. We've been
able to extract colors from product images, and we think it'd be cool and
useful to search products by color. A product image can have up to 5 colors
(from a color space of about 100 colors), so we can implement it easily with
Solr's facet search (thanks all who've developed Solr).

The problem arises when we try to sort the results by the color relevancy.
What's different from a normal facet search is that colors are weighted. For
example, a black dress can have 70% of black, 20% of gray, 10% of brown. A
search query color:black should return results in which the black dress
ranks higher than other products with less percentage of black.

My question is: how to configure and index the color field so that products
with higher percentage of color X ranks higher for query color:X?

Thanks for your help!

- Guangwei



--
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




locallucene former custom-sort thread

2007-09-28 Thread patrick o'leary




Changing thread name;

Are you using local lucene or local solr, and which version?


P

[EMAIL PROTECTED] wrote:

  
 i have been testing locallucene with our data for the last couple of days.
 one issue i faced with it is during when using geo sorting is that it seems
 to eat up all the memory, however big and become progressively slower,
 finally after several requests (10 or so in my case) it throws up a
 java.lang.OutOfMemoryError: Java heap space error.

 is there a way to get around this?

 -Original Message-
 From: Jon Pierce [mailto:[EMAIL PROTECTED]]
 Sent: 28 September 2007 15:48
 To: solr-user@lucene.apache.org
 Subject: Re: custom sorting


 Is the machinery in place to do this now (hook up a function query to
 be used in sorting)?

 I'm trying to figure out what's the best way to do a distance sort:
 custom comparator or function query.

 Using a custom comparator seems straightforward and reusable across
 both the standard and dismax handlers.  But it also seems most likely
 to impact performance (or at least require the most work/knowledge to
 get right by minimizing calculations, caching, watching out for memory
 leaks, etc.).  (Speaking of which, could anyone with more Lucene/Solr
 experience than I comment on the performance characteristics of the
 locallucene implementation mentioned on the list recently?  I've taken
 a first look and it seems reasonable to me.)

 Using a function query, as Yonik suggests above, is another approach.
 But to get a true sort, you have to boost the original query to zero?
 How does this impact the results returned by the original query?  Will
 the requirements (and boosts) of the original (now nested) query
 remain intact, only sorted by the function?  Also, is there any way to
 do this with the dismax handler?

 Thanks,
 - Jon

 On 9/27/07, Yonik Seeley [EMAIL PROTECTED] wrote:
  On 9/27/07, Erik Hatcher [EMAIL PROTECTED] wrote:
   Using something like this, how would the custom SortComparatorSource
   get a parameter from the request to use in sorting calculations?

  perhaps hook in via function query:
    dist(10.4,20.2,geoloc)

  And either manipulate the score with that and sort by score,

  q=+(foo bar)^0 dist(10.4,20.2,geoloc)
  sort=score asc

  or extend solr's sorting mechanisms to allow specifying a function to sort
  by.

  sort="dist(10.4,20.2,geoloc) asc"

  -Yonik


 This email is confidential and may also be privileged. If you are not the
 intended recipient please notify us immediately by telephoning +44 (0)20
 7452 5300 or email [EMAIL PROTECTED]. You should not copy it or use it
 for any purpose nor disclose its contents to any other person. Touch Local
 cannot accept liability for statements made which are clearly the sender's
 own and are not made on behalf of the firm.

 Touch Local Limited
 Registered Number: 2885607
 VAT Number: GB896112114
 Cardinal Tower, 12 Farringdon Road, London EC1M 3NN
 +44 (0)20 7452 5300
  


-- 
Patrick O'Leary

AOL Local Search Technologies
Phone: + 1 703 265 8763

You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
  - Albert Einstein

View Patrick O Leary's profile: http://www.linkedin.com/in/pjaol





Re: Color search

2007-09-28 Thread Yonik Seeley
If it were just a couple of colors, you could have a separate field
for each color and then index the percent in that field.

black:70
grey:20

and then you could use a function query to influence the score (or you
could sort by the color percent).

However, this doesn't scale well to a large index with a large number of colors.
Each field used like that will take up 4 bytes per document in the index.

so if you have 1M documents, that's 1Mdocs * 100colors * 4bytes = 400MB
Doable depending on your index size (use int or float and not
sint or sfloat type for this... it will be better on the memory).

If you needed to be better on the memory, you could encode all of the
colors into a single value (perhaps into a compact string... one
percentile per byte or something) and then have a custom function that
extracts the value for a particular color.  (this involves some java
development)
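
A rough sketch of that packed-value idea (Python; the palette, the hex-string format, and the helper names are all made up for illustration, and a real version would live in custom Java code on the Solr side):

```python
# Hypothetical fixed palette; a real one would list all ~100 colors.
COLORS = ["black", "gray", "brown"]

def encode_colors(weights):
    """Pack one percentile per byte, in palette order, as a hex string."""
    return bytes(weights.get(c, 0) for c in COLORS).hex()

def color_percent(encoded, color):
    """Extract one color's percentile from the packed value."""
    return bytes.fromhex(encoded)[COLORS.index(color)]

enc = encode_colors({"black": 70, "gray": 20, "brown": 10})
print(enc)                          # 46140a
print(color_percent(enc, "black"))  # 70
```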

-Yonik


On 9/28/07, Guangwei Yuan [EMAIL PROTECTED] wrote:
 Hi,

 We're running an e-commerce site that provides product search. We've been
 able to extract colors from product images, and we think it'd be cool and
 useful to search products by color. A product image can have up to 5 colors
 (from a color space of about 100 colors), so we can implement it easily with
 Solr's facet search (thanks all who've developed Solr).

 The problem arises when we try to sort the results by the color relevancy.
 What's different from a normal facet search is that colors are weighted. For
 example, a black dress can have 70% of black, 20% of gray, 10% of brown. A
 search query color:black should return results in which the black dress
 ranks higher than other products with less percentage of black.

 My question is: how to configure and index the color field so that products
 with higher percentage of color X ranks higher for query color:X?

 Thanks for your help!

 - Guangwei



RE: custom sorting

2007-09-28 Thread Sandeep Shetty
I have been testing locallucene with our data for the last couple of days.
One issue I faced with it when using geo sorting is that it seems
to eat up all the memory, however big, and becomes progressively slower;
finally, after several requests (10 or so in my case), it throws a
java.lang.OutOfMemoryError: Java heap space error.

Is there a way to get around this?

-Original Message-
From: Jon Pierce [mailto:[EMAIL PROTECTED]
Sent: 28 September 2007 15:48
To: solr-user@lucene.apache.org
Subject: Re: custom sorting


Is the machinery in place to do this now (hook up a function query to
be used in sorting)?

I'm trying to figure out what's the best way to do a distance sort:
custom comparator or function query.

Using a custom comparator seems straightforward and reusable across
both the standard and dismax handlers.  But it also seems most likely
to impact performance (or at least require the most work/knowledge to
get right by minimizing calculations, caching, watching out for memory
leaks, etc.).  (Speaking of which, could anyone with more Lucene/Solr
experience than I comment on the performance characteristics of the
locallucene implementation mentioned on the list recently?  I've taken
a first look and it seems reasonable to me.)

Using a function query, as Yonik suggests above, is another approach.
But to get a true sort, you have to boost the original query to zero?
How does this impact the results returned by the original query?  Will
the requirements (and boosts) of the original (now nested) query
remain intact, only sorted by the function?  Also, is there any way to
do this with the dismax handler?

Thanks,
- Jon

On 9/27/07, Yonik Seeley [EMAIL PROTECTED] wrote:
 On 9/27/07, Erik Hatcher [EMAIL PROTECTED] wrote:
  Using something like this, how would the custom SortComparatorSource
  get a parameter from the request to use in sorting calculations?

 perhaps hook in via function query:
   dist(10.4,20.2,geoloc)

 And either manipulate the score with that and sort by score,

 q=+(foo bar)^0 dist(10.4,20.2,geoloc)
 sort=score asc

 or extend solr's sorting mechanisms to allow specifying a function to sort
by.

 sort=dist(10.4,20.2,geoloc) asc

 -Yonik





RE: locallucene former custom-sort thread

2007-09-28 Thread Sandeep Shetty
Hi, i'm using local lucene, downloaded the latest zip file
solr-example_s1.3_ls0.2.tgz

is there a newer version available? 

Thanks!
Sandeep

-Original Message-
From: patrick o'leary [mailto:[EMAIL PROTECTED]
Sent: 28 September 2007 16:08
To: solr-user@lucene.apache.org
Subject: locallucene former custom-sort thread


Changing thread name;

Are you using local lucene or local solr, and which version?


P

[EMAIL PROTECTED] wrote:
 i have been testing locallucene with our data for the last couple of days.
 one issue i faced with it is during when using geo sorting is that it
seems
 to eat up all the memory, however big and become progressively slower,
 finally after several requests (10 or so in my case) it throws up a
 java.lang.OutOfMemoryError: Java heap space error.

 is there a way to get around this?

 -Original Message-
 From: Jon Pierce [mailto:[EMAIL PROTECTED]
 Sent: 28 September 2007 15:48
 To: solr-user@lucene.apache.org
 Subject: Re: custom sorting


 Is the machinery in place to do this now (hook up a function query to
 be used in sorting)?

 I'm trying to figure out what's the best way to do a distance sort:
 custom comparator or function query.

 Using a custom comparator seems straightforward and reusable across
 both the standard and dismax handlers.  But it also seems most likely
 to impact performance (or at least require the most work/knowledge to
 get right by minimizing calculations, caching, watching out for memory
 leaks, etc.).  (Speaking of which, could anyone with more Lucene/Solr
 experience than I comment on the performance characteristics of the
 locallucene implementation mentioned on the list recently?  I've taken
 a first look and it seems reasonable to me.)

 Using a function query, as Yonik suggests above, is another approach.
 But to get a true sort, you have to boost the original query to zero?
 How does this impact the results returned by the original query?  Will
 the requirements (and boosts) of the original (now nested) query
 remain intact, only sorted by the function?  Also, is there any way to
 do this with the dismax handler?

 Thanks,
 - Jon

 On 9/27/07, Yonik Seeley [EMAIL PROTECTED] wrote:
  On 9/27/07, Erik Hatcher [EMAIL PROTECTED] wrote:
   Using something like this, how would the custom SortComparatorSource
   get a parameter from the request to use in sorting calculations?

  perhaps hook in via function query:
    dist(10.4,20.2,geoloc)

  And either manipulate the score with that and sort by score,

  q=+(foo bar)^0 dist(10.4,20.2,geoloc)
  sort=score asc

  or extend solr's sorting mechanisms to allow specifying a function to sort
  by.

  sort=dist(10.4,20.2,geoloc) asc

  -Yonik


   



RE: Color search

2007-09-28 Thread Renaud Waldura
Here's another idea: encode color mixes as one RGB value (32 bits) and sort
according to those values. To find the closest color is like finding the
closest points in the color space. It would be like a distance search.

70% black #00 = 0
20% gray #f0f0f0 = #303030
10% brown #8b4513 = #0e0702
= #3e3732

The distance would be:
sqrt( (r1 - r0)^2 + (g1 - g0)^2 + (b1 - b0)^2 )

Where r0g0b0 is the color the user asked for, and r1g1b1 is the composite
color of the item, calculated above.
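
A small sketch of that blend-then-distance computation (Python; it reproduces the #3e3732 composite above, and the helper names are made up):

```python
import math

def blend(components):
    """Composite RGB from (weight, 0xRRGGBB) pairs, e.g. (0.7, 0x000000)."""
    channels = [0.0, 0.0, 0.0]
    for weight, rgb in components:
        for i, shift in enumerate((16, 8, 0)):
            channels[i] += weight * ((rgb >> shift) & 0xFF)
    return tuple(round(c) for c in channels)

def distance(a, b):
    """Euclidean distance between two (r, g, b) triples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# 70% black + 20% gray + 10% brown from the example above.
item = blend([(0.7, 0x000000), (0.2, 0xF0F0F0), (0.1, 0x8B4513)])
print(item)  # (62, 55, 50), i.e. #3e3732
```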

--Renaud


-Original Message-
From: Steven Rowe [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 28, 2007 7:14 AM
To: solr-user@lucene.apache.org
Subject: Re: Color search

Hi Guangwei,

When you index your products, you could have a single color field, and
include duplicates of each color component proportional to its weight.

For example, if you decide to use 10% increments, for your black dress with
70% of black, 20% of gray, 10% of brown, you would index the following terms
for the color field:

  black black black black black black black
  gray gray
  brown

This works because Lucene natively interprets document term frequencies as
weights.

Steve

Guangwei Yuan wrote:
 Hi,
 
 We're running an e-commerce site that provides product search. We've been
 able to extract colors from product images, and we think it'd be cool and
 useful to search products by color. A product image can have up to 5 colors
 (from a color space of about 100 colors), so we can implement it easily with
 Solr's facet search (thanks all who've developed Solr).
 
 The problem arises when we try to sort the results by the color relevancy.
 What's different from a normal facet search is that colors are weighted. For
 example, a black dress can have 70% of black, 20% of gray, 10% of brown. A
 search query color:black should return results in which the black dress
 ranks higher than other products with less percentage of black.
 
 My question is: how to configure and index the color field so that products
 with higher percentage of color X ranks higher for query color:X?
 
 Thanks for your help!
 
 - Guangwei




Re: locallucene former custom-sort thread

2007-09-28 Thread patrick o'leary




That's the latest. I was experimenting with caching, which might be the
problem.
I'll have a look; could you give me an idea of how large the radius was
and how many results were coming back?

Thanks
P

Sandeep Shetty wrote:

 Hi, i'm using local lucene, downloaded the latest zip file
 solr-example_s1.3_ls0.2.tgz

is there a newer version available? 

Thanks!
Sandeep

-Original Message-
From: patrick o'leary [mailto:[EMAIL PROTECTED]]
Sent: 28 September 2007 16:08
To: solr-user@lucene.apache.org
Subject: locallucene former custom-sort thread


Changing thread name;

Are you using local lucene or local solr, and which version?


P

[EMAIL PROTECTED] wrote:
  
  
 i have been testing locallucene with our data for the last couple of days.
 one issue i faced with it is during when using geo sorting is that it seems
 to eat up all the memory, however big and become progressively slower,
 finally after several requests (10 or so in my case) it throws up a
 java.lang.OutOfMemoryError: Java heap space error.

 is there a way to get around this?








Re: custom sorting

2007-09-28 Thread Narayanan Palasseri
Hi all,
Regarding this issue, we tried using a custom request handler which in turn
uses the CustomComparator. But this has a memory leak, and we almost got
stuck at that point. As somebody mentioned, we are thinking of moving
towards a function query to achieve the same. Please let me know whether
anybody has faced a similar issue, or whether we are doing something wrong.
The additional code that we have written in the default handler is given
below.

if (myappRequestHandler.equalsIgnoreCase(requestHandler))
{
    sort = getSortCriteria(new SimpleSortComparatorSourceImpl());
}

Thanks and Regards
Narayanan


On 9/28/07, Yonik Seeley [EMAIL PROTECTED] wrote:

 On 9/27/07, Erik Hatcher [EMAIL PROTECTED] wrote:
  Using something like this, how would the custom SortComparatorSource
  get a parameter from the request to use in sorting calculations?

 perhaps hook in via function query:
 dist(10.4,20.2,geoloc)

 And either manipulate the score with that and sort by score,

 q=+(foo bar)^0 dist(10.4,20.2,geoloc)
 sort=score asc

 or extend solr's sorting mechanisms to allow specifying a function to sort
 by.

 sort=dist(10.4,20.2,geoloc) asc

 -Yonik



Re: Indexing without application server

2007-09-28 Thread Walter Underwood
I do not think it will be much faster. The data transfer time is small
compared to the indexing time.

The indexing will probably take less than a day, so if you spend more
than 30 minutes coding a faster method, the project will take longer.

wunder

On 9/28/07 6:06 AM, Jae Joo [EMAIL PROTECTED] wrote:

 Hi,
 
 I have a multi millions document to be indexed and looking for the way to
 index it without j2ee application server.
 It is not incremental indexing, this is a kind of Index once, use forever
 - all batch mode.
 
 I can guess if there is a way to index it without J2EE, it may be much
 faster...
 
 Thanks,
 
 Jae Joo



one query or multiple queries

2007-09-28 Thread Xuesong Luo
Hi there,
I have a user index (each user has a unique index record). If I want to
search for 10 users, should I run 10 queries or 1 query with multiple user
ids? Is there any performance difference?
 
Thanks
Xuesong
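
For what it's worth, the one-query variant would just OR the ids together into a single boolean query, saving the extra round trips; a tiny sketch (the field name is an assumption):

```python
def user_query(user_ids, field="id"):
    """Build a single Solr boolean query matching any of the given ids."""
    return "%s:(%s)" % (field, " OR ".join(str(u) for u in user_ids))

print(user_query([101, 102, 103]))
# id:(101 OR 102 OR 103)
```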

 



RE: locallucene former custom-sort thread

2007-09-28 Thread Sandeep Shetty
also probably a point to consider, the index has about 2.9 million records
in total

-Original Message-
From: Sandeep Shetty 
Sent: 28 September 2007 17:15
To: 'solr-user@lucene.apache.org'
Subject: RE: locallucene former custom-sort thread


yes i was thinking abt the same. 

i was searching for a radius of 25 miles. we get about 2500 results back for
the search. it seems like it's storing all those geo results in cache, and it
keeps on adding to it each time a geo request is made...

thanks for looking into it! 

Sandeep

-Original Message-
From: patrick o'leary [mailto:[EMAIL PROTECTED]
Sent: 28 September 2007 17:02
To: solr-user@lucene.apache.org
Subject: Re: locallucene former custom-sort thread


That's the latest. I was experimenting with caching, which might be the
problem.
I'll have a look, could you give me an idea of how large the radius was
and how many results were coming back.

Thanks
P

Sandeep Shetty wrote:
 Hi, i'm using local lucene, downloaded the latest zip file
 solr-example_s1.3_ls0.2.tgz

 is there a newer version available? 

 Thanks!
 Sandeep

 -Original Message-
 From: patrick o'leary [mailto:[EMAIL PROTECTED]
 Sent: 28 September 2007 16:08
 To: solr-user@lucene.apache.org
 Subject: locallucene former custom-sort thread


 Changing thread name;

 Are you using local lucene or local solr, and which version?


 P

 [EMAIL PROTECTED] wrote:
   
 i have been testing locallucene with our data for the last couple of days.
 one issue i faced with it is during when using geo sorting is that it seems
 to eat up all the memory, however big and become progressively slower,
 finally after several requests (10 or so in my case) it throws up a
 java.lang.OutOfMemoryError: Java heap space error.

 is there a way to get around this?


-- 

Patrick O'Leary

AOL Local Search Technologies
Phone: + 1 703 265 8763

You see, wire telegraph is a kind of a very, very long cat. You pull his
tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they receive
them there. The only difference is that there is no cat.
  - Albert Einstein

http://www.linkedin.com/in/pjaol


Re: custom sorting

2007-09-28 Thread Jon Pierce
Is the machinery in place to do this now (hook up a function query to
be used in sorting)?

I'm trying to figure out what's the best way to do a distance sort:
custom comparator or function query.

Using a custom comparator seems straightforward and reusable across
both the standard and dismax handlers.  But it also seems most likely
to impact performance (or at least require the most work/knowledge to
get right by minimizing calculations, caching, watching out for memory
leaks, etc.).  (Speaking of which, could anyone with more Lucene/Solr
experience than I comment on the performance characteristics of the
locallucene implementation mentioned on the list recently?  I've taken
a first look and it seems reasonable to me.)

Using a function query, as Yonik suggests above, is another approach.
But to get a true sort, you have to boost the original query to zero?
How does this impact the results returned by the original query?  Will
the requirements (and boosts) of the original (now nested) query
remain intact, only sorted by the function?  Also, is there any way to
do this with the dismax handler?

Thanks,
- Jon

On 9/27/07, Yonik Seeley [EMAIL PROTECTED] wrote:
 On 9/27/07, Erik Hatcher [EMAIL PROTECTED] wrote:
  Using something like this, how would the custom SortComparatorSource
  get a parameter from the request to use in sorting calculations?

 perhaps hook in via function query:
   dist(10.4,20.2,geoloc)

 And either manipulate the score with that and sort by score,

 q=+(foo bar)^0 dist(10.4,20.2,geoloc)
 sort=score asc

 or extend solr's sorting mechanisms to allow specifying a function to sort by.

 sort=dist(10.4,20.2,geoloc) asc

 -Yonik



RE: locallucene former custom-sort thread

2007-09-28 Thread Sandeep Shetty
yes, i was thinking about the same. 

i was searching for a radius of 25 miles. we get about 2500 results back for
the search. it seems like it's storing all those geo results in cache and it
keeps on adding to it each time a geo request is made...

thanks for looking into it! 

Sandeep

-Original Message-
From: patrick o'leary [mailto:[EMAIL PROTECTED]
Sent: 28 September 2007 17:02
To: solr-user@lucene.apache.org
Subject: Re: locallucene former custom-sort thread


That's the latest. I was experimenting with caching, which might be the
problem.
I'll have a look, could you give me an idea of how large the radius was
and how many results were coming back.

Thanks
P

Sandeep Shetty wrote:
 Hi, i'm using local lucene, downloaded the latest zip file
 solr-example_s1.3_ls0.2.tgz

 is there a newer version available? 

 Thanks!
 Sandeep

 -Original Message-
 From: patrick o'leary [mailto:[EMAIL PROTECTED]
 Sent: 28 September 2007 16:08
 To: solr-user@lucene.apache.org
 Subject: locallucene former custom-sort thread


 Changing thread name;

 Are you using local lucene or local solr, and which version?


 P

 [EMAIL PROTECTED] wrote:
   
 i have been testing locallucene with our data for the last couple of
days.
 one issue i faced with it is during when using geo sorting is that it
 
 seems
   
 to eat up all the memory, however big and become progressively slower,
 finally after several requests (10 or so in my case) it throws up a
 java.lang.OutOfMemoryError: Java heap space error.

 is there a way to get around this?

 -Original Message-
 From: Jon Pierce [mailto:[EMAIL PROTECTED]
 Sent: 28 September 2007 15:48
 To: solr-user@lucene.apache.org
 Subject: Re: custom sorting


 Is the machinery in place to do this now (hook up a function query to
 be used in sorting)?

 I'm trying to figure out what's the best way to do a distance sort:
 custom comparator or function query.

 Using a custom comparator seems straightforward and reusable across
 both the standard and dismax handlers.  But it also seems most likely
 to impact performance (or at least require the most work/knowledge to
 get right by minimizing calculations, caching, watching out for memory
 leaks, etc.).  (Speaking of which, could anyone with more Lucene/Solr
 experience than I comment on the performance characteristics of the
 locallucene implementation mentioned on the list recently?  I've taken
 a first look and it seems reasonable to me.)

 Using a function query, as Yonik suggests above, is another approach.
 But to get a true sort, you have to boost the original query to zero?
 How does this impact the results returned by the original query?  Will
 the requirements (and boosts) of the original (now nested) query
 remain intact, only sorted by the function?  Also, is there any way to
 do this with the dismax handler?

 Thanks,
 - Jon

  On 9/27/07, Yonik Seeley [EMAIL PROTECTED] wrote:
   On 9/27/07, Erik Hatcher [EMAIL PROTECTED] wrote:
    Using something like this, how would the custom SortComparatorSource
    get a parameter from the request to use in sorting calculations?
  
   perhaps hook in via function query:
     dist(10.4,20.2,geoloc)
  
   And either manipulate the score with that and sort by score,
  
   q=+(foo bar)^0 dist(10.4,20.2,geoloc)
   sort=score asc
  
   or extend solr's sorting mechanisms to allow specifying a function to sort by.
  
   sort=dist(10.4,20.2,geoloc) asc
  
   -Yonik
   



Dismax and Grouping query

2007-09-28 Thread Ty Hahn
Hi,

I've tried to use a grouping query with DisMaxRequestHandler without success.
When I sent a grouping query in Solr Admin, I could see the parens of the query
escaped in the 'querystring' line with debugQuery on.
Is this the cause of the failure?

e.g.
When I send a query like +(lucene solr),
I can see the following line in the result page.
<str name="querystring">+\(lucene solr\)</str>

When I tried this with StandardRequestHandler, the parens of the query were
not escaped, and the query was answered successfully.
Digging into the source of Solr, I could find the following line at
DisMaxRequestHandler.java.
userQuery = U.partialEscape(U.stripUnbalancedQuotes(userQuery)).toString();
And partialEscape function seems to carry out the escaping.

So... can I carry out a grouping query on DisMaxRequestHandler?
If so, should I use a special character for grouping instead of parens?

I'm pretty new on Solr. Any reply will help.
Thanks in advance.


Re: searching remote indexes

2007-09-28 Thread Venkatraman S
resending due to lack of response :
[We are using embedded solr 1.2 ]

I need a mechanism by which I can search over 3 remote indexes. Can I use
the Lucene remote APIs to access the index created via embedded Solr?

-Venkat

On 9/4/07, Venkatraman S [EMAIL PROTECTED] wrote:

 Hi,

 [I am new to Solr].

 How do i search remote indexes using Solr? I am not able to find suitable
 documentation on this - can you guys guide me?

 Regards,
 Venkat

 --




--


Re: Color search

2007-09-28 Thread Steven Rowe
Hi Renaud,

I think your method will produce strange results, probably in most
cases, e.g.

33% red #FF0000 = #550000
33% green #00FF00 = #005500
33% blue #0000FF = #000055
= #555555

Thus, a red, green and blue dress would score well against a search for
medium gray.  Not good.

Steve

Renaud Waldura wrote:
 Here's another idea: encode color mixes as one RGB value (32 bits) and sort
 according to those values. To find the closest color is like finding the
 closest points in the color space. It would be like a distance search.
 
 70% black #000000 = 0
 20% gray #f0f0f0 = #303030
 10% brown #8b4513 = #0e0702
 = #3e3732
 
 The distance would be:
 sqrt( (r1 - r0)^2 + (g1 - g0)^2 + (b1 - b0)^2 )
 
 Where r0g0b0 is the color the user asked for, and r1g1b1 is the composite
 color of the item, calculated above.
 
 --Renaud
 
 
 -Original Message-
 From: Steven Rowe [mailto:[EMAIL PROTECTED] 
 Sent: Friday, September 28, 2007 7:14 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Color search
 
 Hi Guangwei,
 
 When you index your products, you could have a single color field, and
 include duplicates of each color component proportional to its weight.
 
 For example, if you decide to use 10% increments, for your black dress with
 70% of black, 20% of gray, 10% of brown, you would index the following terms
 for the color field:
 
   black black black black black black black
   gray gray
   brown
 
 This works because Lucene natively interprets document term frequencies as
 weights.
 
 Steve
 
 Guangwei Yuan wrote:
 Hi,

 We're running an e-commerce site that provides product search. We've 
 been able to extract colors from product images, and we think it'd be 
 cool and useful to search products by color. A product image can have 
 up to 5 colors (from a color space of about 100 colors), so we can 
 implement it easily with Solr's facet search (thanks all who've developed
 Solr).
 The problem arises when we try to sort the results by the color relevancy.
 What's different from a normal facet search is that colors are 
 weighted. For example, a black dress can have 70% of black, 20% of 
 gray, 10% of brown. A search query color:black should return results 
 in which the black dress ranks higher than other products with less
 percentage of black.
 My question is: how to configure and index the color field so that 
 products with higher percentage of color X ranks higher for query
 color:X?
 Thanks for your help!

 - Guangwei
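The blend-and-distance arithmetic discussed in this thread is easy to check in a standalone sketch. Nothing below touches Solr; the class and method names are invented for illustration. It computes Renaud's weighted composite color and Euclidean distance, and reproduces Steve's objection that equal parts red, green and blue blend to medium gray:

```java
// Sketch of the composite-color + RGB-distance idea (illustrative only).
public class ColorBlend {
    // Blend packed 0xRRGGBB colors by weights (fractions summing to <= 1.0).
    static int composite(int[] colors, double[] weights) {
        double r = 0, g = 0, b = 0;
        for (int i = 0; i < colors.length; i++) {
            r += weights[i] * ((colors[i] >> 16) & 0xff);
            g += weights[i] * ((colors[i] >> 8) & 0xff);
            b += weights[i] * (colors[i] & 0xff);
        }
        return ((int) r << 16) | ((int) g << 8) | (int) b;
    }

    // Euclidean distance between two packed RGB colors:
    // sqrt((r1-r0)^2 + (g1-g0)^2 + (b1-b0)^2)
    static double distance(int c1, int c2) {
        int dr = ((c1 >> 16) & 0xff) - ((c2 >> 16) & 0xff);
        int dg = ((c1 >> 8) & 0xff) - ((c2 >> 8) & 0xff);
        int db = (c1 & 0xff) - (c2 & 0xff);
        return Math.sqrt(dr * dr + dg * dg + db * db);
    }

    public static void main(String[] args) {
        // 1/3 red + 1/3 green + 1/3 blue blends to #555555 (medium gray),
        // so a tricolor item sits at distance 0 from a gray query.
        int blend = composite(new int[] {0xFF0000, 0x00FF00, 0x0000FF},
                              new double[] {1 / 3.0, 1 / 3.0, 1 / 3.0});
        System.out.printf("blend=#%06X dist-to-gray=%.1f%n",
                          blend, distance(blend, 0x555555));
    }
}
```

The zero distance to gray is exactly the failure mode Steve describes: blending before comparing throws away which colors were actually present.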
 
 



Re: one query or multiple queries

2007-09-28 Thread Ian Lea
I'd guess the latter would be faster, but who knows?  Try it both ways.


--
Ian.


On 9/28/07, Xuesong Luo [EMAIL PROTECTED] wrote:
 Hi, there,
 I have a user index(each user has a unique index record). If I want to
 search 10 users, should I run 10 queries or 1 query with multiple user
 ids? Is there any performance difference?

 Thanks
 Xuesong
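For the record, the "1 query with multiple user ids" variant can be built by OR-ing the ids into a single clause. This is a minimal sketch; the field name "user_id" is an assumption about the schema, not taken from the thread:

```java
import java.util.List;

// Fold N user-id lookups into one boolean query instead of N requests.
public class UserIdQuery {
    static String combined(List<String> ids) {
        // Produces: user_id:(u1 OR u2 OR ...)
        return "user_id:(" + String.join(" OR ", ids) + ")";
    }

    public static void main(String[] args) {
        System.out.println(combined(List.of("u1", "u2", "u3")));
    }
}
```

One request with a 10-clause boolean query avoids nine round trips of HTTP and response-parsing overhead, which is usually why the combined form wins in practice.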






Re: Color search

2007-09-28 Thread Matthew Runo
This discussion is incredibly interesting to me! We solved this  
simply by indexing the color names, and faceting on that. Not a very  
elegant solution, to be sure - but it works. If people search for a  
green running shoe they get -green- running shoes.


I would be very very interested in having a color picker ajax app  
which then went out and found the products with colors most like the  
one you chose.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Sep 28, 2007, at 1:00 AM, Guangwei Yuan wrote:


Hi,

We're running an e-commerce site that provides product search.  
We've been
able to extract colors from product images, and we think it'd be  
cool and
useful to search products by color. A product image can have up to  
5 colors
(from a color space of about 100 colors), so we can implement it  
easily with

Solr's facet search (thanks all who've developed Solr).

The problem arises when we try to sort the results by the color  
relevancy.
What's different from a normal facet search is that colors are  
weighted. For
example, a black dress can have 70% of black, 20% of gray, 10% of  
brown. A
search query color:black should return results in which the black  
dress

ranks higher than other products with less percentage of black.

My question is: how to configure and index the color field so that  
products

with higher percentage of color X ranks higher for query color:X?

Thanks for your help!

- Guangwei




Index multiple languages with multiple analyzers with the same field

2007-09-28 Thread Wu, Daniel
Hi,
 
I know this probably has been asked before, but I was not able to find
it in the mailing list.  So forgive me if I repeated the same question.
 
We are trying to build a search application to support multiple
languages.  Users can potentially query with any language.  First
thought come to us is to index the text of all languages in the same
field using language specific analyzer.  As all the data are indexed in
the same field, it would just find results with the language that
matches the user query.
 
Looking at the Solr schema, it seems each field can have one and only one
analyzer.  Is it possible to have multiple analyzers for the same field?
 
Or is there any other approaches that can achieve the same thing?
 
Daniel


Re: Index multiple languages with multiple analyzers with the same field

2007-09-28 Thread Mike Klaas

On 28-Sep-07, at 11:13 AM, Wu, Daniel wrote:


Hi,

I know this probably has been asked before, but I was not able to find
it in the mailing list.  So forgive me if I repeated the same  
question.


This thread hashes out the issues in quite a lot of detail:

http://www.nabble.com/Multi-language-indexing-and-searching-tf3885324.html#a11012939


-Mike


Re: searching remote indexes

2007-09-28 Thread Mike Klaas
Solr's main interface is HTTP, so you can connect to that remotely.
Query each machine and combine the results using your own business logic.


Alternatively, you can try out the query distribution code being  
developed in

http://issues.apache.org/jira/browse/SOLR-303

-Mike

On 28-Sep-07, at 1:59 AM, Venkatraman S wrote:


resending due to lack of response :
[We are using embedded solr 1.2 ]

I need a mechanism by which i can search over 3 remote indexes? Can  
i use

the Lucene remote apis to access the index created via Embedded solr?

-Venkat

On 9/4/07, Venkatraman S [EMAIL PROTECTED] wrote:


Hi,

[I am new to Solr].

How do i search remote indexes using Solr? I am not able to find  
suitable

documentation on this - can you guys guide me?

Regards,
Venkat

--





--




Re: Index multiple languages with multiple analyzers with the same field

2007-09-28 Thread Thom Nelson
I had the same problem, but never found a good solution.  The best 
solution would be a more dynamic way of determining which analyzer to 
return, such as some kind of conditional expression evaluation in 
the fieldType/analyzer element, where either the document or the query 
request could be used as the comparison object.


<fieldtype type="textMultiLingual" class="solr.TextField">
   <analyzer type="query" expression="request.lang == 'EN'">
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.StandardFilterFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.StopFilterFactory"/>
   <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
</fieldtype>

Analyzers could still be cached by adding the expression to the cache key.

Unfortunately I have switched jobs, so I don't have the time or 
motivation to do this, but it should be a very useful addition.


- Thom

Wu, Daniel wrote:

Hi,
 
I know this probably has been asked before, but I was not able to find

it in the mailing list.  So forgive me if I repeated the same question.
 
We are trying to build a search application to support multiple

languages.  Users can potentially query with any language.  First
thought come to us is to index the text of all languages in the same
field using language specific analyzer.  As all the data are indexed in
the same field, it would just find results with the language that
matches the user query.
 
Looking at the Solr schema, it seems each field can have one and only

analyzer.  Is it possible to have multiple analyzers for the same field?
 
Or is there any other approaches that can achieve the same thing?
 
Daniel


  




Re: Color search

2007-09-28 Thread Chris Hostetter

: useful to search products by color. A product image can have up to 5 colors
: (from a color space of about 100 colors), so we can implement it easily with
: Solr's facet search (thanks all who've developed Solr).
: 
: The problem arises when we try to sort the results by the color relevancy.
: What's different from a normal facet search is that colors are weighted. For
: example, a black dress can have 70% of black, 20% of gray, 10% of brown. A

if 5 is a hard max on the number of colors that you support, then you can 
always use 5 seperate fields to store the colors in order of dominance 
and then query on those 5 fields with varying boosts...

 color_1:black^10 color_2:black^7 color_3:black^4 color_4:black 
color_5:black^0.1

...something like this will lose the % granularity info that you have (so 
a 60% black skirt and an 80% black dress would both score the same against 
black since it's the dominant color)

alternately: i'm assuming your percentage data only has so much confidence
-- maybe on the order of 10%?  you can have a separate field for each 
bucket of color percentages and index the name of the color in the 
corresponding bucket.  with 10% granularity that's only 10 fields -- a 10 
clause boolean query for the color is no big deal ... even going to 5% 
would be trivial.
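The five-field dominance query quoted earlier in this message can be generated mechanically. A minimal sketch, with the boost values taken from the example above but otherwise untuned assumptions:

```java
// Build the rank-field query: color_1..color_5 with decreasing boosts.
public class ColorQuery {
    static String dominanceQuery(String color) {
        String[] boosts = {"10", "7", "4", "1", "0.1"};
        StringBuilder q = new StringBuilder();
        for (int i = 0; i < boosts.length; i++) {
            if (i > 0) q.append(' ');
            q.append("color_").append(i + 1).append(':')
             .append(color).append('^').append(boosts[i]);
        }
        return q.toString();
    }

    public static void main(String[] args) {
        // color_1:black^10 color_2:black^7 ... color_5:black^0.1
        System.out.println(dominanceQuery("black"));
    }
}
```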


Incidentally: people interested in the general topic of color faceting at 
a finer granularity than just color names may want to check out this 
thread from last...

http://www.nabble.com/faceting-and-categorizing-on-color--tf1801106.html



-Hoss



Re: Request for graphics

2007-09-28 Thread Chris Hostetter

: I am trying to make a presentation on SOLR and have been unable to find the
: SOLR graphic in high quality.  Could someone point me in the right direction
: or provide the graphics?

you're right -- i can't find the original source files for it in subversion.

I think i know who made it (here at CNET).  I'll ping him and see if i can 
get the original source files and get them into subversion so alternate 
resolutions can be generated.


-Hoss



Schema version question

2007-09-28 Thread Robert Purdy

I was wondering if anyone could help me, I just completed a full index of my
data (about 4 million documents) and noticed that when I was first setting
up the schema I set the version number to 1.2 thinking that solr 1.2 uses
schema version 1.2... ugh... so I am wondering if I can just set the schema
to 1.1 without having to rebuild the full index? I ask because I am hoping
that, given an invalid schema version number, version 1.0 is not used by
default, making all my fields multivalued. Any help would be greatly
appreciated. Thanks in advance
-- 
View this message in context: 
http://www.nabble.com/Schema-version-question-tf4536802.html#a12948588
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Color search

2007-09-28 Thread Mike Klaas


On 28-Sep-07, at 6:31 AM, Grant Ingersoll wrote:

Another option would be to extend Solr (and donate back) to  
incorporate Lucene's payload functionality, in which case you could  
associate the percentile of the color as a payload and use the  
BoostingTermQuery... :-)  If you're interested in this, a  
discussion on solr-dev is probably warranted to figure out the best  
way to do this.


For reference, here is a summary of the changes needed:

1. A payload analyzer (here is an example that tokenizes strings of the
form token:whatever:score into token with payload score):

  /** Returns the next token in the stream, or null at EOS. */
  public final Token next() throws IOException {
    Token t = input.next();
    if (null == t)
      return null;

    String s = t.termText();
    if (s.indexOf(":") > -1) {
      String[] parts = s.split(":");
      assert parts.length == 3;
      String colour = parts[0];
      int bits = Float.floatToIntBits(Float.parseFloat(parts[1]));
      byte[] buf = new byte[4];
      for (int shift = 0, i = 0; shift < 32; shift += 8, i++) {
        buf[i] = (byte)((bits >> shift) & 0xff);
      }
      Token gen = new Token(colour, t.startOffset(), t.endOffset());
      gen.setPayload(new Payload(buf));
      t = gen;
    }
    return t;
  }


2. A payload deserializer.  Add this method to your custom Similarity
class:

  public float scorePayload(byte[] payload, int offset, int length) {
    assert length == 4;
    int accum = ((payload[0+offset] & 0xff)) |
                ((payload[1+offset] & 0xff) << 8) |
                ((payload[2+offset] & 0xff) << 16) |
                ((payload[3+offset] & 0xff) << 24);
    return Float.intBitsToFloat(accum);
  }

3. Add a relevant query clause.  In a custom request handler, you
could have a parameter to add BoostingTermQueries:

  Query q = new BoostingTermQuery(new Term("colourPayload", colour));
  query.add(q, Occur.SHOULD);

How to add this generically is an interesting question.  There are  
many possibilities, especially on the request handler and tokenizer  
side of things.  If there is a consensus on a sensible way of doing  
this, I could contribute the bits of code that I have.


HTH,
-Mike
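The float-to-bytes packing in Mike's analyzer and scorePayload snippets can be sanity-checked outside Lucene. This standalone round-trip uses the same little-endian byte order; the class name is invented:

```java
// Round-trip check of the little-endian float <-> byte[4] packing used
// for term payloads above. No Lucene dependency.
public class PayloadCodec {
    static byte[] encode(float f) {
        int bits = Float.floatToIntBits(f);
        byte[] buf = new byte[4];
        for (int shift = 0, i = 0; shift < 32; shift += 8, i++) {
            buf[i] = (byte) ((bits >> shift) & 0xff);
        }
        return buf;
    }

    static float decode(byte[] payload, int offset) {
        int accum = (payload[offset] & 0xff) |
                    ((payload[1 + offset] & 0xff) << 8) |
                    ((payload[2 + offset] & 0xff) << 16) |
                    ((payload[3 + offset] & 0xff) << 24);
        return Float.intBitsToFloat(accum);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(0.7f), 0)); // prints 0.7
    }
}
```

Because the payload carries the raw IEEE-754 bits, the round trip is exact; no precision is lost between indexing and scoring.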



Re: small rsync index question

2007-09-28 Thread Yonik Seeley
On 9/28/07, Brian Whitman [EMAIL PROTECTED] wrote:
 For some reason sending a <commit/> is not refreshing the index

It should... are there any errors in the logs?  do you see the commit
in the logs?
Check the stats page to see info about when the current searcher was
last opened too.

-Yonik


Re: Schema version question

2007-09-28 Thread Yonik Seeley
On 9/28/07, Robert Purdy [EMAIL PROTECTED] wrote:
 I was wondering if anyone could help me, I just completed a full index of my
 data (about 4 million documents) and noticed that when I was first setting
 up the schema I set the version number to 1.2 thinking that solr 1.2 uses
 schema version 1.2... ugh... so I am wondering if I can just set the schema
 to 1.1 without having to rebuild the full index? I ask because I am hoping
 that given an invalid schema version number, that version 1.0 is not used by
 default and all my fields are now mulitvalued. Any help would be greatly
 appreciated. Thanks in advance

Yes, it should be OK to set it back to 1.1 w/o reindexing.
The index format does not differentiate between single and
multi-valued fields so you should be fine there.

-Yonik


Re: Request for graphics

2007-09-28 Thread Yonik Seeley
On 9/28/07, Clay Webster [EMAIL PROTECTED] wrote:
 i'm late for dinner out, so i'm just attaching it here.

Most attachments are stripped :-)

-Yonik


RE: Index multiple languages with multiple analyzers with the same field

2007-09-28 Thread Lance Norskog
Other people custom-create a separate dynamic field for each language they
want to support.  The spellchecker in Solr 1.2 wants just one field to use
as its word source, so this fits. 

We have a more complex version of this problem: we have content with both
English and other languages. Searching is one problem; we also want to have
spelling correction dictionaries for each language. We have many world
languages which need very different handling and semantics, like CJK
processing. We will have to use the multiple-field trick; I don't think we
can shoehorn our complexity into this technique. It is a valiant effort,
though.

It's possible we could separate out the different-language words in the
document, put them each in separate words_en_text, word_sp_text, etc. and
make the default search field out of 
copyField source=*_text dest=defaultText/
Hmm.

Lance
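For what it's worth, Lance's per-language layout might look roughly like this in schema.xml; the field and type names below are invented for illustration, not taken from any shipped schema:

```xml
<!-- one analyzed field per language, e.g. body_en_text, body_sp_text -->
<dynamicField name="*_en_text" type="text_en" indexed="true" stored="false"/>
<dynamicField name="*_sp_text" type="text_sp" indexed="true" stored="false"/>

<!-- funnel every language-specific field into one default search field -->
<copyField source="*_text" dest="defaultText"/>
```

Each `text_XX` type would carry its own analyzer chain (stemmer, stopwords, CJK handling), while `defaultText` gives a single cross-language search target.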

-Original Message-
From: Thom Nelson [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 28, 2007 12:07 PM
To: solr-user@lucene.apache.org; [EMAIL PROTECTED]
Subject: Re: Index multiple languages with multiple analyzers with the same
field

I had the same problem, but never found a good solution.  The best solution
would be a more dynamic way of determining which analyzer to return, such
as some kind of conditional expression evaluation in the
fieldType/analyzer element, where either the document or the query request
could be used as the comparison object.

<fieldtype type="textMultiLingual" class="solr.TextField">
<analyzer type="query" expression="request.lang == 'EN'">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldtype>

Analyzers could still be cached by adding the expression to the cache key.

Unfortunately I have switched jobs, so I don't have the time or motivation
to do this, but it should be a very useful addition.

- Thom

Wu, Daniel wrote:
 Hi,
  
 I know this probably has been asked before, but I was not able to find 
 it in the mailing list.  So forgive me if I repeated the same question.
  
 We are trying to build a search application to support multiple 
 languages.  Users can potentially query with any language.  First 
 thought come to us is to index the text of all languages in the same 
 field using language specific analyzer.  As all the data are indexed 
 in the same field, it would just find results with the language that 
 matches the user query.
  
 Looking at the Solr schema, it seems each field can have one and only 
 analyzer.  Is it possible to have multiple analyzers for the same field?
  
 Or is there any other approaches that can achieve the same thing?
  
 Daniel

   



Re: custom sorting

2007-09-28 Thread Chris Hostetter

:  Using something like this, how would the custom SortComparatorSource
:  get a parameter from the request to use in sorting calculations?

in general: you wouldn't; you would have to specify all options as init 
params for the FieldType -- which makes it pretty horrible for distance 
calculations, and isn't something i considered when i posted that.

the only way i can think of that you can really solve the problem with a 
plugin at the moment (without some serious internal changes that yonik 
describes below) would be to use a dynamicField when you want geodistance 
sort, and encode the center lat/lon point in the field name, ala:

   sort=geodist_-124.75_93.45

: or extend solr's sorting mechanisms to allow specifying a function to sort by.
: 
: sort=dist(10.4,20.2,geoloc) asc

that would, in fact, kick ass.  even if there is a better solution for the 
distance stuff, the idea of being able to specify a raw function as a sort 
would be pretty sick. (NOTE: that's sick as in so good it's amazing 
... since the last person i used that idiom with didn't understand and 
thought i meant bad)



-Hoss
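For concreteness, here is one thing a dist() function as discussed above could compute: the standard haversine great-circle distance between a fixed point and a document's lat/lon. This is a generic sketch of the math only, not Solr's or locallucene's actual implementation:

```java
// Great-circle (haversine) distance between two lat/lon points, in miles.
public class GeoDist {
    static final double EARTH_RADIUS_MILES = 3958.8;

    static double haversineMiles(double lat1, double lon1,
                                 double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_MILES * 2 * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // distance from a point to itself is zero
        System.out.println(haversineMiles(10.4, 20.2, 10.4, 20.2));
    }
}
```

A sort=dist(...) asc as Yonik sketches would evaluate something like this per document; it is also why caching per-request distance computations (the locallucene memory issue earlier in this thread) needs care.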



Re: Color search

2007-09-28 Thread Guangwei Yuan
Thanks for all the replies. I think creating 10 fields and filling each
field with a color's name per 10% of that color is a reasonable approach,
and easy to implement too. One problem, though, is that not all products have
a total of 100% colors (due to various reasons including our color
extraction algorithm, etc.) So, for a product with 50% of #00, and 20%
of #99, I'll have to fill the remaining three fields with some dummy
values. Otherwise, Lucene seems to score it higher than products that also
have 50% of #00, but more than 20% of some other colors. Since I also
need a way to exclude the dummy value when faceting, is there a neater
solution?

I'll certainly look at the payload functionality, which is new to me :)

- Guangwei


Re: Dismax and Grouping query

2007-09-28 Thread Chris Hostetter

: I've tried to use grouping query on DisMaxRequestHandler without success.

: e.g.
: When I send query like +(lucene solr),
: I can see following line in the result page.
: <str name="querystring">+\(lucene solr\)</str>

the dismax handler does not consider parens to be special characters.  if 
it did, it's not clear what the semantics would be of a query like...
q=A +(B C)&qf=X Y Z
...when building the query structure, what happens if X:B exists and Y:C 
exists? is that considered a match?

Generally, the mm param is used to indicate how many of the query terms 
(that don't have a + or - prefix) are required, or you can explicitly 
require/prohibit a term using + or -, but there is no way to require that 
at least one of N sub-terms matches (prohibiting any of N sub-terms is 
easy: just prohibit them all individually)



-Hoss



Re: custom sorting

2007-09-28 Thread Chris Hostetter

: leaks, etc.).  (Speaking of which, could anyone with more Lucene/Solr
: experience than I comment on the performance characteristics of the
: locallucene implementation mentioned on the list recently?  I've taken
: a first look and it seems reasonable to me.)

i can't speak for anyone else, but i haven't had a chance to drill into it 
yet.

: Using a function query, as Yonik suggests above, is another approach.
: But to get a true sort, you have to boost the original query to zero?

or a very close approximation thereof (0.01 perhaps)

keep in mind: a true distance sort, while easy to explain, may not be as 
useful as a sort by score where the distance is factored into the score 
... there have been some threads about this on the java-user list in the 
past, and it's been discussed that a really relevant result 2 miles away is 
probably better than a mildly relevant result 1.5 miles away ... that's 
where a function query with well-chosen boosts might serve you better.

: How does this impact the results returned by the original query?  Will
: the requirements (and boosts) of the original (now nested) query
: remain intact, only sorted by the function?  Also, is there any way to

it should ... but i won't swear to that.

: do this with the dismax handler?

a strict sort on the value of a function?  put the function in the bf
param, don't bother with bq or pf params, and change your qf params to all 
have really small boosts.



-Hoss



Re: small rsync index question

2007-09-28 Thread Chris Hostetter

: To completely remove the window of inconsistency, comment out the
: post-commit hook in solrconfig.xml that takes a snapshot, then send a
: commit to get a new snapshot and rsync from that.

i think yonik meant UN-comment the postCommit hook in the example 
solrconfig.xml.


-Hoss