adjusting score slightly by date field

2007-05-09 Thread mike topper

Hello,

In our application there are a lot of old records that we still want in 
our index but would like for them to be scored lower than some newer 
records.


Is it possible for a date field to weigh in on the score slightly in 
some way?  Or if not is there another way to push up newer records in 
the order of results while still maintaining the scoring?


-Mike
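One commonly suggested sketch for this, assuming the DisMax handler and a date field (here named creationDate purely as a placeholder): rord() gives the newest document an ordinal of 1, so recip() yields a boost that decays smoothly for older documents while leaving the rest of the scoring intact.

```xml
<!-- illustrative only: "creationDate" stands in for your own date field -->
<str name="bf">
   recip(rord(creationDate),1,1000,1000)^0.5
</str>
```

The boost-function value is added to the normal relevancy score, so newer records float up without replacing the regular ordering.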


Re: Solr Update Handler Fails with Some Doc Characters

2007-05-09 Thread Brian Whitman


I see that the update handler fails even if the character is NOT  
right next to XML closing tag. If the character is anywhere in any  
of the XML tags, the update handler fails to parse the XML.





Does posting the utf8-example in the exampledocs directory work?




Re: Solr Update Handler Fails with Some Doc Characters

2007-05-09 Thread [EMAIL PROTECTED]
Hi,

I specify the following encoding when POSTING the data to Solr:

text/xml; charset=utf-8

The encoding of the actual XML is also UTF-8.

I see that the update handler fails even if the character is NOT right next to 
XML closing tag. If the character is anywhere in any of the XML tags, the 
update handler fails to parse the XML.

Thanks,
Av

- Original Message 
From: Yonik Seeley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, May 9, 2007 10:45:43 AM
Subject: Re: Solr Update Handler Fails with Some Doc Characters


On 5/9/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 I run the example using Jetty on a Windows 2003 machine. When I submit some 
 documents containing upper-ASCII characters, the Solr update handler fails with 
 an XML parsing error saying that it encountered an EOF before the closing 
 tags.

Normally if there is a charset mixup, you will just get weird-looking results.
I suppose that if a char greater than 128 is used, and Solr is
treating the stream as UTF-8, then the following char would be treated as part of
a single multibyte character.  Hence if the char is directly followed
by XML markup, part of that markup will be lost (causing the parse
exception).

In short, this is probably a char encoding issue.  What character
encoding are you using when posting to Solr, and is it declared in the
HTTP header?

-Yonik
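The mixup Yonik describes can be sketched without Solr at all (plain Python; the field text mirrors the thread's example):

```python
# The (R) symbol is U+00AE.  In Latin-1 it is the single byte 0xAE;
# in UTF-8 it is the two-byte sequence 0xC2 0xAE.
text = 'service-amenities\u00ae</field>'

utf8_bytes = text.encode('utf-8')
latin1_bytes = text.encode('latin-1')

# Correctly declared UTF-8 round-trips cleanly:
assert utf8_bytes.decode('utf-8') == text

# Latin-1 bytes fed to a strict UTF-8 decoder blow up, because a bare
# 0xAE is only legal as a *continuation* byte of a multibyte sequence:
try:
    latin1_bytes.decode('utf-8')
except UnicodeDecodeError as e:
    print(e.reason)  # invalid start byte

# A lenient streaming parser may instead try to pair the stray byte with
# whatever follows it -- if that is the '<' of a closing tag, part of the
# markup is consumed, which matches the EOF-before-end-tag error in this thread.
```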


 


Re: Solr Update Handler Fails with Some Doc Characters

2007-05-09 Thread Koji Sekiguchi
I'm not sure this is the case, but did you use a CDATA section in your XML?
Or try using a character reference to represent the copyright symbol;
I believe it is &copy; or &#169; .

Hope this helps,

Koji



[EMAIL PROTECTED] wrote:
 Hi,

 I run the example using Jetty on a Windows 2003 machine. When I submit some 
 documents containing upper-ASCII characters, the Solr update handler fails with 
 an XML parsing error saying that it encountered an EOF before the closing 
 tags.

 The XML is perfectly correct and is using UTF-8 encoding. It is generated 
 using XmlWriter from C#. When viewed in a browser, the XML parses and 
 displays properly.

 For example, Solr breaks on the copyright symbol (c).

 Is there some configuration setting that I need to change to make sure it is 
 able to parse these documents correctly?

 Thank you in advance!
 Av




Re: Solr Update Handler Fails with Some Doc Characters

2007-05-09 Thread [EMAIL PROTECTED]
Hi,

I tried CDATA. It fails the same way. I will check if the utf8-example.xml 
works OK (I just have to change it to match my schema).

I just ran a test by adding an (R) symbol into the XML to get the exact error 
message. See below.

Thanks,
Av

*** SUBMITTED REQUEST *** (as captured by HTTP proxy)

POST /solr/update HTTP/1.1
Content-Type: text/xml; charset=utf-8
Host: ws2006b:8983
Content-Length: 695
Expect: 100-continue
Proxy-Connection: Close

<?xml version="1.0" encoding="UTF-8"?>
<add>
<doc>
  <field name="id">1000194</field>
  <field name="url">http://www®barharborinn®com</field>
  <field name="title">Bar Harbor Hotels and Bar Harbor Inn near Acadia in Bar Harbor ME</field>
  <field name="metaDescription">Bar Harbor Inn- premier oceanfront hotel in Bar Harbor ME® Rated Superior First Class by OHG, Best in-town location®Special Value Packages-fine dining-personal service-amenities®</field>
  <field name="metaKeywords">Bar Harbor Inn- premier oceanfront hotel in Bar Harbor ME® Rated Superior First Class by OHG, Best in-town location®Special Value Packages-fine dining-personal service-amenities®</field>
</doc>

*** ERROR MESSAGE ***

<html>
<head>
<title>Error 500 no more data available - expected end tag &lt;/add&gt; to close start tag &lt;add&gt; from line 2, parser stopped on END_TAG seen ...e Packages-fine dining-personal service-amenities\uae&lt;/field&gt;\r\n&lt;/doc&gt;... @9:7
java.io.EOFException: no more data available - expected end tag &lt;/add&gt; to close start tag &lt;add&gt; from line 2, parser stopped on END_TAG seen ...e Packages-fine dining-personal service-amenities\uae&lt;/field&gt;\r\n&lt;/doc&gt;... @9:7
    at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:3015)
    at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
    at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1144)
    at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
    at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1078)
    at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:159)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:188)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:156)
    at org.mortbay.jetty.servlet.WebApplicationHandler$CachedChain.doFilter(WebApplicationHandler.java:821)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:471)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:568)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1530)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:633)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1482)
    at org.mortbay.http.HttpServer.service(HttpServer.java:909)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:820)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:986)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:837)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:245)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
</title>
</head>
<body>
<h2>HTTP ERROR: 500</h2><pre>no more data available - expected end tag &lt;/add&gt; to close start tag &lt;add&gt; from line 2, parser stopped on END_TAG seen ...e Packages-fine dining-personal service-amenities\uae&lt;/field&gt;\r\n&lt;/doc&gt;... @9:7
java.io.EOFException: no more data available - expected end tag &lt;/add&gt; to close start tag &lt;add&gt; from line 2, parser stopped on END_TAG seen ...e Packages-fine dining-personal service-amenities\uae&lt;/field&gt;\r\n&lt;/doc&gt;... @9:7
    at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:3015)
    at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
    at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1144)
    at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
    at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1078)
    at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:159)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:188)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:156)
    at org.mortbay.jetty.servlet.WebApplicationHandler$CachedChain.doFilter(WebApplicationHandler.java:821)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:471)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:568)
    at

Re: Solr Update Handler Fails with Some Doc Characters

2007-05-09 Thread Yonik Seeley

On 5/9/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

<add>
<doc>
  <field name="id">1000194</field>
  <field name="url">http://www(r)barharborinn(r)com</field>
  <field name="title">Bar Harbor Hotels and Bar Harbor Inn near Acadia in Bar Harbor ME</field>
  <field name="metaDescription">Bar Harbor Inn- premier oceanfront hotel in Bar Harbor ME(r) Rated Superior First Class by OHG, Best in-town location(r)Special Value Packages-fine dining-personal service-amenities(r)</field>
  <field name="metaKeywords">Bar Harbor Inn- premier oceanfront hotel in Bar Harbor ME(r) Rated Superior First Class by OHG, Best in-town location(r)Special Value Packages-fine dining-personal service-amenities(r)</field>
</doc>

*** ERROR MESSAGE ***

<html>
<head>
<title>Error 500 no more data available - expected end tag </add> to close start tag


That seems to be the problem... where is the </add>?

-Yonik


date range search

2007-05-09 Thread Will Johnson
does solr support date range searching?  i've tried all the examples on
the lucene site as well as using the solr response format and a few
others that seemed nifty but so far I always get query parsing errors.
i know i can easily convert the dates to ints and do ranges that way, but
all the documentation seemed to imply it was possible to do range
searches on real dates; it just didn't give an example that i could see.

 

- will
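For what it's worth, Solr's DateField does accept range queries, but the endpoints must be complete ISO-8601 values with the trailing Z, or the special tokens * and NOW (NOW and date math depend on the Solr version). A sketch, assuming a DateField named timestamp:

```text
timestamp:[2007-01-01T00:00:00Z TO 2007-06-01T00:00:00Z]
timestamp:[* TO NOW]
```

Queries without the full canonical date format are what typically produce the query-parsing errors described above.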



Re: Index corruptions?

2007-05-09 Thread Yonik Seeley

On 5/7/07, Tom Hill [EMAIL PROTECTED] wrote:

Is the cp -lr in the snapshot script really guaranteed to be atomic? Or is it just
fast, and unlikely to be interrupted?


It's called from Solr within a synchronized context, and it's
guaranteed that no index changes (via Solr at least) will happen
concurrently.

-Yonik


Dismax Config?

2007-05-09 Thread Matthew Runo
I'd love to see some explanation of what's going on here, and how to  
configure it for my own use. I've changed the fields to match my own  
columns, but it'd be great if I could actually understand it..


  <requestHandler name="dismax" class="solr.DisMaxRequestHandler" >
    <lst name="defaults">
     <str name="echoParams">explicit</str>
     <float name="tie">0.01</float>
     <str name="qf">
        text^0.5 description^1.0 name^5.0 style_id^1.5 product_id^10.0 brand^4.1 product_type^1.4
     </str>
     <str name="pf">
        text^0.2 description^1.1 name^1.5 brand^1.4 brandexact^1.9
     </str>
     <str name="bf">
        ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
     </str>
     <str name="fl">
        product_id,name,price,score
     </str>
     <str name="mm">
        2&lt;-1 5&lt;-2 6&lt;90%
     </str>
     <int name="ps">100</int>
    </lst>
  </requestHandler>

Thank you!

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++




Re: Dismax Config?

2007-05-09 Thread Ryan McKinley


check:
http://wiki.apache.org/solr/DisMaxRequestHandler

For now, most of the docs for dismax are in the javadocs:
http://lucene.apache.org/solr/api/org/apache/solr/request/DisMaxRequestHandler.html



Matthew Runo wrote:
I'd love to see some explanation of what's going on here, and how to 
configure it for my own use. I've changed the fields to match my own 
columns, but it'd be great if I could actually understand it..









Re: Dismax Config?

2007-05-09 Thread Matthew Runo
Perfect! I had seen the wiki, but did not visit the class page since  
I am using Perl.


What is slop? heh

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On May 9, 2007, at 11:00 AM, Ryan McKinley wrote:



check:
http://wiki.apache.org/solr/DisMaxRequestHandler

For now, most of the docs for dismax are in the javadocs:
http://lucene.apache.org/solr/api/org/apache/solr/request/ 
DisMaxRequestHandler.html










Re: Dismax Config?

2007-05-09 Thread Matthew Runo
Ah hah! After doing some research, slop is a fun term for how sloppy  
a match SOLR will make.


Eg, slop = 0, means that only exact matches will work. Slop = 1 means  
that they can be off by one word... etc


Yes?

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On May 9, 2007, at 11:19 AM, Matthew Runo wrote:

Perfect! I had seen the wiki, but did not visit the class page  
since I am using Perl.


What is slop? heh



On May 9, 2007, at 11:00 AM, Ryan McKinley wrote:



check:
http://wiki.apache.org/solr/DisMaxRequestHandler

For now, most of the docs for dismax are in the javadocs:
http://lucene.apache.org/solr/api/org/apache/solr/request/ 
DisMaxRequestHandler.html












Re: Dismax Config?

2007-05-09 Thread Yonik Seeley

On 5/9/07, Matthew Runo [EMAIL PROTECTED] wrote:

Ah hah! After doing some research, slop is a fun term for how sloppy
a match SOLR will make.

Eg, slop = 0, means that only exact matches will work. Slop = 1 means
that they can be off by one word... etc

Yes?


All terms must appear, but the positions can be off.  It's called a
sloppy phrase query, or proximity query.  It's actually based on how
many moves need to be made to get the tokens to match in the correct
positions.

Example:
"a b"~1  will match fields with "a b", "a x b", or "b a", but not "b x a".
The last would require a slop of 2.

-Yonik


Re: Dismax Config?

2007-05-09 Thread Chris Hostetter
: Example:
: "a b"~1  will match fields with "a b", "a x b", or "b a", but not "b x a".
: The last would require a slop of 2.

also note that there are two slop params in the DisMax handler ... qs
refers to how much slop will be used when querying the qf fields if the
user actually types in a query string containing quotes.  ie, if the user
types in...

"Chris Hostetter" Solr

that will create a big complex DisjunctionMaxQuery across all of the qf
fields for the term Solr and the phrase "Chris Hostetter" ... and the qs
param will determine how much slop is allowed for that phrase.

ps refers to the amount of slop that will be used on the artificially
constructed phrase query used to boost the scores of documents that
match the entire query string as a single phrase on any of the pf
fields.  in the previous example, "Chris Hostetter Solr" as a single
phrase will be queried across all of the pf fields with ps slop, and
any matches will get their overall scores increased.

As a general rule, you probably want qs to be small since it affects how
loose your matching will be, while ps can be quite large (because it's
only increasing the scores of existing matches, and regardless of the ps
value, looser matches will score lower than tighter matches).  qs exists
mainly to deal with situations where you know there might be a small
offset between terms that you would otherwise consider sequential (ie: due
to synonym injection, or stop word removal)



-Hoss
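In solrconfig.xml terms, the two knobs described above might be set like this inside a dismax handler's defaults (values illustrative only, following the example config's style):

```xml
<!-- small qs: slop for phrases the user types explicitly -->
<str name="qs">1</str>
<!-- large ps: slop for the automatic whole-query phrase boost -->
<int name="ps">100</int>
```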



Re: Look ahead queries

2007-05-09 Thread Yonik Seeley

You could perhaps use faceting to do this for single terms.
Set the base query to whatever you want (or *:* for everything),
then use facet.field=text&facet.prefix=foo

If you indexed field values as strings like "fuel consumption"
(instead of breaking it up into tokens) then you could get your
phrases, but phrase detection is not automatic.

-Yonik

On 5/3/07, Ge, Yao (Y.) [EMAIL PROTECTED] wrote:

I am planning to develop look-ahead queries with Solr so that as the user
types query terms, a list of related terms is shown in a popup window
(similar to Google suggest).

Index Concurrency

2007-05-09 Thread joestelmach

Hello,

I'm a bit new to search indexing and I'm hoping some of you here can help me
with an e-mail application I'm working on.  I have a mail retrieval program
that accesses multiple POP accounts in parallel, and parses each message
into a database.  I would like to add a new document to a solr index each
time I process a message.

My first intuition is to give each user their own index. My thinking here is
that querying would be faster (since each user's index would be much smaller
than one big index), and, more importantly, that I would dodge any
concurrency issues stemming from multiple threads trying to update the same
index simultaneously.  I realize that Lucene implements a locking mechanism
to protect against concurrent access, but I seem to hit the lock access
timeout quite easily with only a couple threads.

After looking at solr, I would really like to take advantage of the many
features it adds to Lucene, but it doesn't look like I'll be able to achieve
multiple indexes.

Am I completely off in thinking that I need multiple indexes?  Is there some
best practice for this sort of thing that I haven't stumbled upon?

Any advice would be greatly appreciated.

Thanks,
Joe
-- 
View this message in context: 
http://www.nabble.com/Index-Concurrency-tf3718634.html#a10403918
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Facet only support english?

2007-05-09 Thread Yonik Seeley

On 5/5/07, James liu [EMAIL PROTECTED] wrote:

I expect it to support other languages like Chinese.

Maybe the solr facet could be configured like this when it supports other languages:

<str name="facet.query">title:诺基亚</str>


solrconfig.xml is a normal XML document.  It currently starts off with
<?xml version="1.0"?>
which has no char encoding specified, and the XML parser may default to
something you don't want.

If you are saving the file in UTF-8 format, then try changing the
first line to be this:
<?xml version="1.0" encoding="UTF-8"?>

-Yonik


Re: Facet only support english?

2007-05-09 Thread Yonik Seeley

On 5/9/07, Yonik Seeley [EMAIL PROTECTED] wrote:

If you are saving the file in UTF-8 format, then try changing the
first line to be this:
<?xml version="1.0" encoding="UTF-8"?>


We should probably change the example solrconfig.xml and schema.xml to
be UTF-8 by default.  Any objections?

-Yonik


Re: Facet only support english?

2007-05-09 Thread Mike Klaas

On 5/9/07, Yonik Seeley [EMAIL PROTECTED] wrote:

On 5/9/07, Yonik Seeley [EMAIL PROTECTED] wrote:
 If you are saving the file in UTF-8 format, then try changing the
 first line to be this:
<?xml version="1.0" encoding="UTF-8"?>

We should probably change the example solrconfig.xml and schema.xml to
be UTF-8 by default.  Any objections?


No--I'm not sure that it'll bring clarity for anyone who isn't aware
of xml encoding issues, but I can't see it hurting.

-Mike


Re: Index Concurrency

2007-05-09 Thread Yonik Seeley

On 5/9/07, joestelmach [EMAIL PROTECTED] wrote:

My first intuition is to give each user their own index. My thinking here is
that querying would be faster (since each user's index would be much smaller
than one big index), and, more importantly, that I would dodge any
concurrency issues stemming from multiple threads trying to update the same
index simultaneously.  I realize that Lucene implements a locking mechanism
to protect against concurrent access, but I seem to hit the lock access
timeout quite easily with only a couple threads.

After looking at solr, I would really like to take advantage of the many
features it adds to Lucene, but it doesn't look like I'll be able to achieve
multiple indexes.


No, not currently.  Start your implementation with just a single
index... unless it is very large, it will likely be fast enough.

Solr also handles all the concurrency issues, and you should never hit
lock access timeout when updating from multiple threads.

-Yonik


Re: Facet only support english?

2007-05-09 Thread Ryan McKinley

Yonik Seeley wrote:

On 5/9/07, Yonik Seeley [EMAIL PROTECTED] wrote:

If you are saving the file in UTF-8 format, then try changing the
first line to be this:
<?xml version="1.0" encoding="UTF-8"?>


We should probably change the example solrconfig.xml and schema.xml to
be UTF-8 by default.  Any objections?



I'm for it...

but if the xml parser uses getReader() does it make any difference?


Re: Facet only support english?

2007-05-09 Thread Yonik Seeley

On 5/9/07, Ryan McKinley [EMAIL PROTECTED] wrote:

Yonik Seeley wrote:
 We should probably change the example solrconfig.xml and schema.xml to
 be UTF-8 by default.  Any objections?


I'm for it...

but if the xml parser uses getReader() does it make any difference?


For Solr's XML config files, DocumentBuilder.parse(InputStream) is
called, so we don't construct a reader first.

-Yonik


Re: Facet only support english?

2007-05-09 Thread Koji Sekiguchi

+1 on explicit encoding declarations.

Yonik Seeley wrote:

On 5/9/07, Yonik Seeley [EMAIL PROTECTED] wrote:

If you are saving the file in UTF-8 format, then try changing the
first line to be this:
<?xml version="1.0" encoding="UTF-8"?>


We should probably change the example solrconfig.xml and schema.xml to
be UTF-8 by default.  Any objections?

-Yonik





Re: Facet only support english?

2007-05-09 Thread Yonik Seeley

+1 on explicit encoding declarations.


Done  (even though it really wasn't needed since there were no int'l
chars in the example).

As Mike points out, it only marginally helps... if the user adds
international chars to the config and saves it as something other than
UTF-8 they are still hosed.  At least UTF-8 is a better default than
something like latin-1 though.

-Yonik


Re: Facet only support english?

2007-05-09 Thread Mike Klaas

On 5/9/07, Yonik Seeley [EMAIL PROTECTED] wrote:

 +1 on explicit encoding declarations.

Done  (even though it really wasn't needed since there were no int'l
chars in the example).

As Mike points out, it only marginally helps... if the user adds
international chars to the config and saves it as something other than
UTF-8 they are still hosed.  At least UTF-8 is a better default than
something like latin-1 though.


I thought that conformant parsers use UTF-8 as the default anyway:

http://www.w3.org/TR/REC-xml/#charencoding

-Mike


Re: Facet only support english?

2007-05-09 Thread Walter Underwood
I didn't remember that requirement, so I looked it up. It was added
in XML 1.0 2nd edition. Originally, unspecified encodings were open
for auto-detection.

Content type trumps encoding declarations, of course, per RFC 3023
and allowed by the XML spec.

wunder

On 5/9/07 4:19 PM, Mike Klaas [EMAIL PROTECTED] wrote:

 I thought that conformant parsers use UTF-8 as the default anyway:
 
 http://www.w3.org/TR/REC-xml/#charencoding
 
 -Mike



Re: Solr Sorting, merging/weighting sort fields

2007-05-09 Thread Walter Underwood
No problem. Use a boost function. In a DisMaxRequestHandler spec
in solrconfig.xml, specify this:

  <str name="bf">
     popularity^0.5
  </str>

This value will be added to the score before ranking.

You will probably need to fuss with the multiplier to get the popularity
to the right proportion of the total score. I find it handy to return the
score and the popularity value and look over a few test queries to adjust
that.

wunder

On 5/9/07 4:58 PM, Nick Jenkin [EMAIL PROTECTED] wrote:

 Hi all,
 
 I have a popularity field in my solr index, this field is a popularity
 rating of a particular product (based on the number of product views
 etc).
 
 I want to be able to integrate this number into the search result
 sorting such that a product with a higher popularity rating is ranking
 higher in the search results.
 
 I can always do:
 title:(harry potter); popularity desc, score desc;
 
 In this example I will say be searching for harry potter books,
 obviously the latest book has a very high popularity rating, but lets
 say a completely unrelated product with a high popularity also matches
 for harry potter and gets into the search results (but has a very low
 score value), this will bring it to the top of the results.
 
 I want to be able to weight popularity in such a way that it boosts
 the score, but will not greatly affect the search results.
 
 Is this possible?
 
 Thanks



Re: facet.sort does not work in python output

2007-05-09 Thread Yonik Seeley

On 5/3/07, Mike Klaas [EMAIL PROTECTED] wrote:

On 5/3/07, Jack L [EMAIL PROTECTED] wrote:
 The Python output uses nested dictionaries for facet counts.

This might be fixed in the future


It's fixed in the current development version (future 1.2), already.
See http://wiki.apache.org/solr/SolJSON
which is the base for both Python and Ruby.

The default is json.nl=flat which results in alternating term and
count in a flat array.

"facet_fields":{
  "cat":[
    "electronics",3,
    "card",2,
    "graphics",2,
    "music",1]}},

-Yonik


Re: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

2007-05-09 Thread Sean Timm




Yes, for good (hopefully) or bad.

-Sean

Shridhar Venkatraman wrote on 5/7/2007, 12:37 AM:


Interesting..
Surrogates can also bring the searcher's subjectivity (opinion and
context) into it by the learning process?
shridhar

Sean Timm wrote:

It may not be easy or even possible without major changes, but having
global collection statistics would allow scores to be compared across
searchers.  To do this, the master indexes would need to be able to
communicate with each other.

An other approach to merging across searchers is described here:
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, Greg Pass, Ophir
Frieder, 
"Surrogate Scoring for Improved Metasearch Precision" , Proceedings
of the 2005 ACM Conference on Research and Development in Information
Retrieval (SIGIR-2005), Salvador, Brazil, August 2005.

-Sean

[EMAIL PROTECTED] wrote:

On 4/11/07, Chris Hostetter [EMAIL PROTECTED] wrote:

A custom Similarity class with simplified tf, idf, and queryNorm functions
might also help you get scores from the Explain method that are more
easily manageable since you'll have predictable query structures hard
coded into your application.

ie: run the large query once, get the results back, and for each result
look at the explanation and pull out the individual pieces of the
explanation and compare them with those of the other matches to create
your own "normalization".
Chuck Williams mentioned a proposal he had for normalization of scores
that 
would give a constant score range that would allow comparison of
scores. 
Chuck, did you ever write any code to that end or was it just
algorithmic 
discussion? 
  
Here is the point I'm at now: 
  
I have my matching engine working. The fields to be indexed and the
queries 
are defined by the user. Hoss, I'm not sure how that affects your idea
of 
having a custom Similarity class since you mentioned that having
predictable 
query structures was important... 
The user kicks off an indexing then defines the queries they want to
try 
matching with. Here is an example of the query fragments I'm working
with 
right now: 
year_str:"${Year}"^2 year_str:[${Year -1} TO ${Year +1}] 
title_title_mv:"${Title}"^10 title_title_mv:${Title}^2 
+(title_title_mv:"${Title}"~^5 title_title_mv:${Title}~) 
director_name_mv:"${Director}"~2^10 director_name_mv:${Director}^5 
director_name_mv:${Director}~.7 
  
For each item in the source feed, the variables are interpolated (the
query 
term is transformed into a grouped term if there are multiple values
for a 
variable). That query is then made to find the overall best match. 
I then determine the relevance for each query fragment. I haven't
written 
any plugins for Lucene yet, so my current method of determining the 
relevance is by running each query fragment by itself then iterating
through 
the results looking to see if the overall best match is in this result
set. 
If it is, I record the rank and multiply that rank (e.g. 5 out of 10)
by a 
configured fragment weight. 
  
Since the scores aren't normalized, I have no good way of determining a
poor 
overall match from a really high quality one. The overall item could be
the 
first item returned in each of the query fragments. 
  
Any help here would be very appreciated. Ideally, I'm hoping that maybe
  
Chuck has a patch or plugin that I could use to normalize my scores
such 
that I could let the user do a matching run, look at the results and 
determine what score threshold to set for subsequent runs. 
  
Thanks, 
Daniel 
  
  

  





Re: Solr Sorting, merging/weighting sort fields

2007-05-09 Thread Nick Jenkin

Thanks, worked perfectly!
-Nick

On 5/10/07, Walter Underwood [EMAIL PROTECTED] wrote:

No problem. Use a boost function. In a DisMaxRequestHandler spec
in solrconfig.xml, specify this:

  <str name="bf">
     popularity^0.5
  </str>

This value will be added to the score before ranking.

You will probably need to fuss with the multiplier to get the popularity
to the right proportion of the total score. I find it handy to return the
score and the popularity value and look over a few test queries to adjust
that.

wunder

On 5/9/07 4:58 PM, Nick Jenkin [EMAIL PROTECTED] wrote:

 Hi all,

 I have a popularity field in my solr index, this field is a popularity
 rating of a particular product (based on the number of product views
 etc).

 I want to be able to integrate this number into the search result
 sorting such that a product with a higher popularity rating is ranked
 higher in the search results.

 I can always do:
 title:(harry potter); popularity desc, score desc;

 In this example, say I am searching for harry potter books; obviously the
 latest book has a very high popularity rating. But let's say a completely
 unrelated product with a high popularity also matches for harry potter and
 gets into the search results (with a very low score value); this will bring
 it to the top of the results.

 I want to be able to weight popularity in such a way that it boosts
 the score, but will not greatly affect the search results.

 Is this possible?

 Thanks





--
- Nick


Re: Index Concurrency

2007-05-09 Thread joestelmach

Yonik,

Thanks for your fast reply.

 No, not currently.  Start your implementation with just a single
 index... unless it is very large, it will likely be fast enough.

My index will get quite large.

 Solr also handles all the concurrency issues, and you should never hit
 lock access timeout when updating from multiple threads.

Does solr provide any additional concurrency control over what Lucene
provides?  In my simple testing of indexing 2,000 messages, Solr would issue
lock access timeouts with as few as 10 threads. Running all 2,000 messages
through sequentially yields no problems at all. Actually, I'm able to churn
through over 100,000 messages when no threads are involved. Am I missing
some concurrency settings?

Thanks,
Joe


-- 
View this message in context: 
http://www.nabble.com/Index-Concurrency-tf3718634.html#a10406382
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Index Concurrency

2007-05-09 Thread Yonik Seeley

On 5/9/07, joestelmach [EMAIL PROTECTED] wrote:

Does solr provide any additional concurrency control over what Lucene
provides?


Yes: the coordination between the main index searcher, the index writer,
and the index reader that is needed to delete other documents.


In my simple testing of indexing 2,000 messages, Solr would issue
lock access timeouts with as few as 10 threads.


That's weird... I've never seen that.
The lucene write lock is only obtained when the IndexWriter is created.
Can you post the relevant part of the log file where the exception happens?

Also, unless you have at least 6 CPU cores or so, you are unlikely to
see greater throughput with 10 threads.  If you add multiple documents
per HTTP-POST (such that HTTP latency is minimized), the best setting
would probably be nThreads == nCores.  For a single doc per POST, more
threads will serve to cover the latency and keep Solr busy.
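To make the multiple-documents-per-POST point concrete, the update body can carry a whole batch inside one add element. A small Python sketch of building such a body (a hypothetical helper, not part of Solr; actually POSTing it to /solr/update with Content-Type text/xml is left out):

```python
from xml.sax.saxutils import escape

def build_add_xml(docs):
    """Serialize many documents into a single <add> update message,
    so one HTTP POST indexes the whole batch instead of one doc
    per request."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # escape &, <, > in field values so the XML stays well-formed
            parts.append('<field name="%s">%s</field>'
                         % (name, escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)
```

Batching this way amortizes the per-request HTTP latency, which is usually what limits single-doc-per-POST indexing throughput.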

-Yonik


Question about delete

2007-05-09 Thread James liu

I use commands like this:


curl http://localhost:8983/solr/update --data-binary 
'<delete><query>name:DDR</query></delete>'
curl http://localhost:8983/solr/update --data-binary '<commit/>'



and i get


numDocs : 0
maxDoc : 1218819



When I search for something that existed before the delete, I find nothing.

But the index file size has not changed, and maxDoc has not changed.

Why does this happen?


--
regards
jl


Re: Requests per second/minute monitor?

2007-05-09 Thread Ian Holsman


Walter Underwood wrote:
 This is for monitoring -- what happened in the last 30 seconds.
 Log file analysis doesn't really do that.
 

I would respectfully disagree.
Log file analysis of each request can give you that, and a whole lot more.

You could either grab the stats via a regular cron job, or create a separate
filter to parse them in real time.
It would then let you grab more sophisticated stats if you choose to.

What I would like to know is (and excuse the newbieness of the question) how
to get Solr to write a log file with the following data.


- time spent (ms) in the request
- IP of the incoming request
- what the request was (and which handler executed it)
- a status code to signal if the request failed for some reason
- the number of rows fetched, and
- the number of rows actually returned

Is this possible? (I'm using Tomcat, if that changes the answer.)
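For the container-level fields, one option (assuming Tomcat 5.x; the directory, prefix, and suffix values here are illustrative) is an AccessLogValve, which can record the client IP, request line, status, and time taken, though not the Solr-level row counts; those would need logging inside Solr itself:

```xml
<!-- Inside the <Host> element of conf/server.xml -->
<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs" prefix="solr_access." suffix=".log"
       pattern='%a %t "%r" %s %b %D'/>
<!-- %a client IP, %t timestamp, %r request line, %s status,
     %b bytes sent, %D time taken in milliseconds -->
```

The resulting log can then be tailed by a cron job or a real-time filter, as suggested above.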

regards
Ian
-- 
View this message in context: 
http://www.nabble.com/Re%3A-Requests-per-second-minute-monitor--tf3659369.html#a10407072
Sent from the Solr - User mailing list archive at Nabble.com.