AW: Help with Setup

2007-04-27 Thread Burkamp, Christian
Hi,

You can use curl with a file if you put the @ char in front of its name.
(Otherwise curl expects the data on the command line).

curl http://localhost:8080/solr/update --data-binary @articles.xml
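A minimal sketch of the same POST from plain JDK code (not from the original mail; the URL and file name are the ones used in this thread):

  import java.io.FileInputStream;
  import java.io.InputStream;
  import java.io.OutputStream;
  import java.net.HttpURLConnection;
  import java.net.URL;

  // Stream articles.xml to the update handler, as the curl command above does.
  class PostFile {
      public static void main(String[] args) throws Exception {
          URL url = new URL("http://localhost:8080/solr/update");
          HttpURLConnection conn = (HttpURLConnection) url.openConnection();
          conn.setDoOutput(true);
          conn.setRequestMethod("POST");
          conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
          InputStream in = new FileInputStream("articles.xml");
          OutputStream out = conn.getOutputStream();
          byte[] buf = new byte[4096];
          for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
          out.close();
          in.close();
          System.out.println("HTTP " + conn.getResponseCode());
      }
  }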

-----Original Message-----
From: Sean Bowman [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 26, 2007 11:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Help with Setup

Try:

curl http://localhost:8080/solr/update --data-binary '<add><doc><field name="id">2008</field><field name="storyText">The Rain in Spain Falls Mainly In The Plain</field></doc></add>'

And see if that works.  I don't think curl lets you put a filename in for the
--data-binary parameter.  It has to be the actual data, though something like this
might also work:

curl http://localhost:8080/solr/update --data-binary `cat articles.xml`

Those are backticks, not apostrophes.

On 4/26/07, Ryan McKinley [EMAIL PROTECTED] wrote:
 
  paladin:/data/solr mtorgler1$ curl http://localhost:8080/solr/update --data-binary articles.xml
  <result status="1">org.xmlpull.v1.XmlPullParserException: only whitespace content allowed before start tag and not a (position: START_DOCUMENT seen a... @1:1)
  at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1519)
  at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1395)

 My guess is you have some funny character at the start of the document.
 I have seen funny chars show up when I edit a UTF-8 file and
 save it as ASCII.  If you don't see it in your normal editor, try a
 different one.

 If that does not help, start with the working example and modify a
 little bit at a time...
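 One hedged way to check for that (not from the original mail): dump the first
 few bytes of the file and look for a UTF-8 BOM (EF BB BF) or other invisible junk.

   import java.io.FileInputStream;

   // Print the leading bytes of articles.xml in hex so a BOM becomes visible.
   class ShowLeadingBytes {
       public static void main(String[] args) throws Exception {
           FileInputStream in = new FileInputStream("articles.xml");
           for (int i = 0; i < 8; i++) {
               int b = in.read();
               if (b == -1) break;
               System.out.printf("%02X ", b);
           }
           in.close();
           System.out.println();
       }
   }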

 ryan




Re: Help with Setup

2007-04-27 Thread Mike
I thought that too. I opened it up via vi and nothing was there.
Usually if I have a PC encoding issue (I use EditPlus as a text
editor) it will show up in vi.

On Apr 26, 2007, at 5:19 PM, Ryan McKinley wrote:

paladin:/data/solr mtorgler1$ curl http://localhost:8080/solr/update --data-binary articles.xml
<result status="1">org.xmlpull.v1.XmlPullParserException: only whitespace content allowed before start tag and not a (position: START_DOCUMENT seen a... @1:1)

at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1519)
at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1395)


My guess is you have some funny character at the start of the
document.  I have seen funny chars show up when I edit a UTF-8
file and save it as ASCII.  If you don't see it in your normal
editor, try a different one.


If that does not help, start with the working example and
modify a little bit at a time...


ryan





Re: Help with Setup

2007-04-27 Thread Mike
The wrapping is purely from email; the XML is a single line (on
purpose; originally it was a 3-paragraph field that was HTML-encoded,
but I simplified the text as a 'just in case').

On Apr 26, 2007, at 5:09 PM, Cody Caughlan wrote:


For the storyText field element, is that wrapping only in this email
or is the source document wrapping like that as well?

/cody

On 4/26/07, Mike [EMAIL PROTECTED] wrote:

Greetings

I've gotten SOLR installed and the admin screens working.  At this
point I'm just trying to get my add record to be grabbed by the SOLR
update process, but unfortunately, I'm getting a whitespace error
that I could use some pointers on.  I've searched the site and found
similar errors but no tips that could help me out.

paladin:/data/solr mtorgler1$ curl http://localhost:8080/solr/update --data-binary articles.xml
<result status="1">org.xmlpull.v1.XmlPullParserException: only whitespace content allowed before start tag and not a (position: START_DOCUMENT seen a... @1:1)
        at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1519)
        at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1395)
        at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
        at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1078)
        at org.apache.solr.core.SolrCore.update(SolrCore.java:661)
        at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Thread.java:613)


---

My schema.xml is pretty straightforward. I didn't modify the types
section, and modified the fields to the following:


<fields>
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="storyText" type="text" indexed="true" stored="true"/>
</fields>

<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey>

<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>storyText</defaultSearchField>

<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
---

My example document at this time is a single doc (I've tried both with
and without an XML declaration):

<add>
<doc>
<field name="id">2008</field>
<field name="storyText">The Rain in Spain Falls Mainly In The Plain</field>
</doc>
</add>


What else am I missing?

My Tomcat installation is 5.5.20

My SOLR info is
Solr Specification Version: 1.1.0
Solr Implementation Version: 1.1.0-incubating - Yonik - 2006-12-17
17:09:54
Lucene Specification Version: nightly
Lucene Implementation Version: build 2006-11-15

java.vm.version = 1.5.0_07-87


Any and all help is appreciated and will be rewarded with a warm
glowing feeling of accomplishment!  Thx!





Re: Question to php to do with multi index

2007-04-27 Thread Michael Kimsal

The curl_multi is probably the most effective way, using straight PHP.
Another option would be to spawn several jobs, assuming unix/linux, and wait
for them to get done.  It doesn't give you very good error handling (well,
none at all actually!) but would let you run multiple indexing jobs at once.

Visit http://us.php.net/shell_exec and look at the 'class exec' contributed
note about halfway down the page.  It'll give you an idea of how to easily
spawn multiple jobs.

If you're using PHP5, the proc_open function may be another way to go.
proc_open was available in 4, but there were a number of extra parameters
and controls made available in 5.
http://us.php.net/manual/en/function.proc-open.php

An adventurous soul could combine the two concepts into one class to manage
pipe communication between multiple child processes effectively.

On 4/26/07, James liu [EMAIL PROTECTED] wrote:


PHP does not support multithreading, so how can you handle multiple indexes
in parallel?

Right now I use curl_multi.

Maybe there is a more effective way that I don't know about, so if you know one, please tell me. Thanks.


--
regards
jl





--
Michael Kimsal
http://webdevradio.com


Re: case sensitivity

2007-04-27 Thread Yonik Seeley

On 4/26/07, Michael Kimsal [EMAIL PROTECTED] wrote:

We're (and by 'we' I mean my esteemed colleague!) working on patching a few
of these items to be in the solrconf.xml file and should likely have some
patches submitted next week.  It's being done on 'company time' and I'm not
sure about the exact policy/procedure for this sort of thing here (or
indeed, if there is one at all).


That's fine, as long as your company has agreed to contribute back the
patch (under the Apache license).  Apache enjoys a lot of business
support (being business friendly) and a *lot* of contributions are done
on company time.

Anything really big would probably need a CLA, but patches only
require clicking the "Grant license to ASF" button in JIRA.

-Yonik


Re: case sensitivity

2007-04-27 Thread Michael Kimsal

Can you point me to the process for submitting these small patches?  I'm
looking at the jira site but don't see much of anything there outlining a
process for submitting patches.  Sorry to be so basic about this, but I'm
trying to follow correct procedures on both sides of the aisle, so to speak.


On 4/27/07, Yonik Seeley [EMAIL PROTECTED] wrote:


On 4/26/07, Michael Kimsal [EMAIL PROTECTED] wrote:
 We're (and by 'we' I mean my esteemed colleague!) working on patching a
few
 of these items to be in the solrconf.xml file and should likely have
some
 patches submitted next week.  It's being done on 'company time' and I'm
not
 sure about the exact policy/procedure for this sort of thing here (or
 indeed, if there is one at all).

That's fine, as long as your company has agreed to contribute back the
patch (under the Apache license).  Apache enjoys a lot of business
support (being business friendly) and a *lot* of contributions are done
on company time.

Anything really big would probably need a CLA, but patches only
require clicking the "Grant license to ASF" button in JIRA.

-Yonik





--
Michael Kimsal
http://webdevradio.com


Re: Help with Setup

2007-04-27 Thread Sean Bowman

That's an awesome tip to keep in the ol' toolbox, Christian.


Facet Results Strange - Help

2007-04-27 Thread realw5

Hello,
I'm running into some strange results for some facets of mine. Below you'll
see the XML returned from Solr. I did a query using the standard request
handler. Notice the duplicated values returned (american standard, delta,
etc). There are actually quite a few of them. At first I thought it might be
because of case sensitivity, but I have since lowercased everything going to
Solr.

Hopefully someone can chime in with some tips, thanks!

Dan

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
  </lst>
  <result name="response" numFound="2328" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="manufacturer_facet">
        <int name="kohler">1560</int>
        <int name="american standard">197</int>
        <int name="toto">181</int>
        <int name="bemis">83</int>
        <int name="porcher">56</int>
        <int name="ginger">45</int>
        <int name="elements of design">40</int>
        <int name="brasstech">18</int>
        <int name="st thomas">18</int>
        <int name="hansgrohe">15</int>
        <int name="sterling">14</int>
        <int name="whitehaus">13</int>
        <int name="delta">12</int>
        <int name="jacuzzi">10</int>
        <int name="cifial">8</int>
        <int name="kwc">8</int>
        <int name="herbeau">7</int>
        <int name="jado">7</int>
        <int name="elizabethan classics">6</int>
        <int name="showhouse by moen">5</int>
        <int name="grohe">4</int>
        <int name="creative specialties">3</int>
        <int name="latoscana">3</int>
        <int name="american standard">2</int>
        <int name="danze">2</int>
        <int name="ronbow">2</int>
        <int name="belle foret">1</int>
        <int name="dornbracht">1</int>
        <int name="kohler">1</int>
        <int name="myson">1</int>
        <int name="newport brass">1</int>
        <int name="price pfister">1</int>
        <int name="quayside publishing">1</int>
        <int name="st. thomas">1</int>
        <int name="adagio">0</int>
        <int name="alno">0</int>
        <int name="alsons">0</int>
        <int name="bates and bates">0</int>
        <int name="blanco">0</int>
        <int name="cec">0</int>
        <int name="cole and co">0</int>
        <int name="competitive">0</int>
        <int name="corstone">0</int>
        <int name="creative specialties">0</int>
        <int name="danze">0</int>
        <int name="decolav">0</int>
        <int name="dolan designs">0</int>
        <int name="doralfe">0</int>
        <int name="dornbracht">0</int>
        <int name="dreamline">0</int>
        <int name="elkay">0</int>
        <int name="fontaine">0</int>
        <int name="franke">0</int>
        <int name="grohe">0</int>
        <int name="hamat">0</int>
        <int name="hydrosystems">0</int>
        <int name="improvement direct">0</int>
        <int name="insinkerator">0</int>
        <int name="kenroy international">0</int>
        <int name="kichler">0</int>
        <int name="kindred">0</int>
        <int name="maxim">0</int>
        <int name="mico">0</int>
        <int name="moen">0</int>
        <int name="moen">0</int>
        <int name="mr sauna">0</int>
        <int name="mr steam">0</int>
        <int name="neo elements">0</int>
        <int name="newport brass">0</int>
        <int name="ondine">0</int>
        <int name="pegasus">0</int>
        <int name="price pfister">0</int>
        <int name="progress lighting">0</int>
        <int name="pulse">0</int>
        <int name="quoizel">0</int>
        <int name="robern">0</int>
        <int name="rohl">0</int>
        <int name="sagehill designs">0</int>
        <int name="sea gull lighting">0</int>
        <int name="show house">0</int>
        <int name="sloan">0</int>
        <int name="st%2e thomas">0</int>
        <int name="st%2e thomas creations">0</int>
        <int name="steamist">0</int>
        <int name="swanstone">0</int>
        <int name="thomas lighting">0</int>
        <int name="warmatowel">0</int>
        <int name="waste king">0</int>
        <int name="waterstone">0</int>
      </lst>
    </lst>
  </lst>
</response>
-- 
View this message in context: 
http://www.nabble.com/Facet-Results-Strange---Help-tf3658597.html#a1084
Sent from the Solr - User mailing list archive at Nabble.com.



Re: resin fails to start with solr.

2007-04-27 Thread Bill Au

Have you tried using the schema.xml that is in example/solr/conf?  If that
works then the problem is definitely in your schema.xml.

Bill

On 4/26/07, James liu [EMAIL PROTECTED] wrote:


But it is OK when I use Tomcat.

2007/4/26, Ken Krugler [EMAIL PROTECTED]:

 3.0.23. Yesterday I tried it and it failed.
 
 Which version do you use? I just don't use the pro version.

 From the error below, either your schema.xml file is messed up, or it
 might be that you still need to uncomment the lines at the
 beginning of the web.xml file.

 These are the ones that say "Uncomment if you are trying to use a
 Resin version before 3.0.19". Even though you're using a later
 version of Resin, I've had lots of issues with their XML parsing.

 -- Ken



 
 2007/4/26, Bill Au [EMAIL PROTECTED]:
 
Have you tried resin 3.0.x?  3.1 is a development branch so it is less
stable than 3.0.
 
 Bill
 
 On 4/19/07, James liu [EMAIL PROTECTED] wrote:
 
  It works well when I use Tomcat with Solr.

  Now I want to test Resin; I use resin-3.1.0.

  Now it shows me
 
  [03:47:34.047] WebApp[http://localhost:8080] starting
  [03:47:34.691] WebApp[http://localhost:8080/resin-doc] starting
  [03:47:34.927] WebApp[http://localhost:8080/solr1] starting
  [03:47:35.051] SolrServlet.init()
  [03:47:35.077] Solr home set to '/usr/solrapp/solr1/'
  [03:47:35.077] user.dir=/tmp/resin-3.1.0/bin
  [03:47:35.231] Loaded SolrConfig: solrconfig.xml
  [03:47:35.522] adding requestHandler standard=solr.StandardRequestHandler
  [03:47:35.621] adding requestHandler dismax=solr.DisMaxRequestHandler
  [03:47:35.692] adding requestHandler partitioned=solr.DisMaxRequestHandler
  [03:47:35.721] adding requestHandler instock=solr.DisMaxRequestHandler
  [03:47:35.819] Opening new SolrCore at /usr/solrapp/solr1/, dataDir=/usr/solrapp/solr1/data
  [03:47:35.884] Reading Solr Schema
  [03:47:35.916] Schema name=example
  [03:47:35.929] org.apache.solr.core.SolrException: Schema Parsing Failed
  [03:47:35.929]  at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:441)
  [03:47:35.929]  at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:69)
  [03:47:35.929]  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:191)
 
 
 
   --
   regards
jl

 --
 Ken Krugler
 Krugle, Inc.
 +1 530-210-6378
 Find Code, Find Answers




--
regards
jl



Re[8]: Things are not quite stable...

2007-04-27 Thread Jack L
Hello Bertrand,

Is there a build script that automagically grabs files from jetty's
source tree (local) and builds a solr release? In other words, I can
try building with a newer version of jetty if it doesn't take too
much work - I don't know much about jetty or solr at the code level.

-- 
Best regards,
Jack

 Agreed, but note that we don't have any factual evidence that the
 Jetty RC that we use is indeed the cause of SOLR-118, so upgrading
 might not solve the problem.

 We're just at the wild guess stage at this point, and many of us have
 never seen the problem. In my case, we have more urgent stuff to do
 before looking at the problem in more detail.

 -Bertrand 



Re: AW: Help with Setup

2007-04-27 Thread Mike


On Apr 27, 2007, at 4:24 AM, Burkamp, Christian wrote:


curl http://localhost:8080/solr/update --data-binary @


I think the issue was with a bad file in /data/solr/conf.  After a
bunch of testing out of the installation directory, I was able to use
the post.sh script to post correctly into the server and had a
successful commit.


I'm now going to try again to customize the data set and see what I can
screw up.


Thanks for the help! 


Solr and memcached

2007-04-27 Thread Otis Gospodnetic
Hi,

I'm considering adding support for caching results in memcached.  Questions:

1. Has anyone already done this? (searched, didn't find anything)
2. Can anyone think of any reasons why this might not be a good idea? (I *just* 
started considering this)
3. I read http://wiki.apache.org/solr/SolrCaching , but I think the whole cache 
discarding and warming wouldn't be needed if what I store in memcached is: 
StringRepresentationOfQuery => DocSet or DocList.  Hm, I see the QueryResultKey 
class now.  Then maybe I'd store QueryResultKey => DocSet or DocList in the 
memcached cache.  Is this correct?

Thanks,
Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share




Re: case sensitivity

2007-04-27 Thread Otis Gospodnetic
Once the code/patch in the issue is committed to SVN, it means it will be 
in the next release.  You get your patch committed faster if it's clear, well 
written and explained, if it comes with a unit test (if it's a code change), and 
so on.

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: Michael Kimsal [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, April 27, 2007 1:47:06 PM
Subject: Re: case sensitivity

What's the procedure then for something to get included in the next
release?

Thanks again all!

On 4/27/07, Michael Kimsal [EMAIL PROTECTED] wrote:

 So I just create my own 'issue' first?  OK.  Thanks.

 On 4/27/07, Ryan McKinley [EMAIL PROTECTED] wrote:
 
  Michael Kimsal wrote:
   Can you point me to the process for submitting these small
  patches?  I'm
   looking at the jira site but don't see much of anything there
  outlining a
   process for submitting patches.  Sorry to be so basic about this, but
  I'm
   trying to follow correct procedures on both sides of the aisle, so to
   speak.
  
 
  Check: http://wiki.apache.org/solr/HowToContribute
 
  Essentially you will create a new issue on JIRA, then upload a svn diff
  to that issue.
 
  holler if you have any troubles
 
  ryan
 
 


 --
 Michael Kimsal
 http://webdevradio.com




-- 
Michael Kimsal
http://webdevradio.com





Re: Requests per second/minute monitor?

2007-04-27 Thread Otis Gospodnetic
Would creating a new QueryRateFilter servlet filter be a good place to put 
this?  This way it could stay out of the Solr core and could be turned on/off 
via web.xml.
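
A minimal sketch of that idea (the class name comes from this mail; everything
else is a guess, and reporting is left out):

  import java.io.IOException;
  import java.util.concurrent.atomic.AtomicLong;
  import javax.servlet.Filter;
  import javax.servlet.FilterChain;
  import javax.servlet.FilterConfig;
  import javax.servlet.ServletException;
  import javax.servlet.ServletRequest;
  import javax.servlet.ServletResponse;

  // Counts every request passing through the webapp; exposing the count
  // (JMX, logging, a status page) is left to taste.
  public class QueryRateFilter implements Filter {
      private final AtomicLong requests = new AtomicLong();

      public void init(FilterConfig config) throws ServletException {}

      public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
              throws IOException, ServletException {
          requests.incrementAndGet();
          chain.doFilter(req, res);
      }

      public void destroy() {}
  }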

Otis

- Original Message 
From: Chris Hostetter [EMAIL PROTECTED]
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Thursday, April 19, 2007 5:37:56 PM
Subject: Re: Requests per second/minute monitor?


: Is there a good spot to track request rate in Solr? Has anyone
: built a monitor?

I would think it would make more sense to track this in your application
server than to add it to Solr itself.




-Hoss






EmbeddedSolr class from Wiki

2007-04-27 Thread Daniel Einspanjer

The example EmbeddedSolr class on the wiki makes use of getUpdateHandler
which was added after 1.1 (so it seems to be available only on trunk).
I'd really like to move to an embedded Solr sooner rather than later.  My
questions are:

  - Would it be easy/possible to work around the lack of
  getUpdateHandler in 1.1 or would it completely change the
  implementation of EmbeddedSolr?
  - How stable is trunk right now?
  - Is 1.2 due out soon?
  - Is this approach significantly better than the one roughly outlined
  in
  http://www.mail-archive.com/solr-user@lucene.apache.org/msg03137.html?


Re: Additive scoring using Dismax...

2007-04-27 Thread Chris Hostetter

: AND does not controll scoring, only matching.  If you want dismax to
: be purely additive, pass tie=1.0 to the handler.

more specifically: the defaultOperator option in the schema.xml does not
affect the dismax parser used on the q param at all (only the stock
SolrQueryParser used for things like the fq and bq params)



-Hoss



Re: EmbeddedSolr class from Wiki

2007-04-27 Thread Ryan McKinley

Daniel Einspanjer wrote:

The example EmbeddedSolr class on the wiki makes use of getUpdateHandler
which was added after 1.1 (so it seems to be available only on trunk).
I'd really like to move to an embedded Solr sooner rather than later.  My
questions are:

  - Would it be easy/possible to work around the lack of
  getUpdateHandler in 1.1 or would it completely change the
  implementation of EmbeddedSolr?


If you need to update, it will be difficult ;)



  - How stable is trunk right now?


There is nothing we know that is not stable.



  - Is 1.2 due out soon?


Hopefully soon.  I hope within a week or two.



  - Is this approach significantly better than the one roughly outlined
  in
  http://www.mail-archive.com/solr-user@lucene.apache.org/msg03137.html?



The wiki page gives an example of how you can internally access most 
things you would need to try.  It is a superset of what I suggested in 
the email.  Check the printDocs(DocList docs) code.


Another approach is:
 https://issues.apache.org/jira/browse/SOLR-212
In this case you don't get access to the lucene Document; you just get 
the output text without an HTTP connection.


Your specific approach will depend on what you need.

ryan




Re: EmbeddedSolr class from Wiki

2007-04-27 Thread Fuad Efendi

Additional questions regarding EmbeddedSolr (for using the Solr API directly
without HTTP):

- Can I use separate JVMs for the same Directory object? One process will
create/update/delete, and another will search.
- Can I use separate JEE contexts inside the same JVM?

Looks like singleton is a must, but separate search should be
possible...

Thanks,
Fuad
-- 
View this message in context: 
http://www.nabble.com/EmbeddedSolr-class-from-Wiki-tf3659379.html#a10225263
Sent from the Solr - User mailing list archive at Nabble.com.



multiple solr instances using same index files?

2007-04-27 Thread Ryan McKinley
Is it possible / is it an ok idea to have multiple solr instances 
running on the same machine pointing to the same index files?


Essentially, I have two distinct needs - in some cases i need a commit 
immediately after indexing one document, but most of the time it is fine 
to wait 10 mins for changes if that has better performance.


Alternatively I could rsync the index and use the standard distribution 
stuff.  If that's avoidable, it is one less thing...


thanks
ryan



Re: Solr and memcached

2007-04-27 Thread Yonik Seeley

If you store internal docids, then you need to add the specific reader
(or index version?) as part of the key since the ids are transient.
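
A tiny sketch of that point (names are hypothetical, not a Solr API): bake the
index version into the cache key so entries built against an old reader can
never be served against a new one.

  class CacheKeys {
      // Internal docids are only stable for one index version, so the
      // version must be part of the memcached key.
      static String key(String canonicalQuery, long indexVersion) {
          return indexVersion + "|" + canonicalQuery;
      }
  }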

-Yonik

On 4/27/07, Otis Gospodnetic [EMAIL PROTECTED] wrote:

Hi,

I'm considering adding support for caching results in memcached.  Questions:

1. Has anyone already done this? (searched, didn't find anything)
2. Can anyone think of any reasons why this might not be a good idea? (I *just* 
started considering this)
3. I read http://wiki.apache.org/solr/SolrCaching , but I think the whole cache 
discarding and warming wouldn't be needed if what I store in memcached is: 
StringRepresentationOfQuery => DocSet or DocList.  Hm, I see the QueryResultKey class 
now.  Then maybe I'd store QueryResultKey => DocSet or DocList in the memcached 
cache.  Is this correct?

Thanks,
Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share


Re: multiple solr instances using same index files?

2007-04-27 Thread Yonik Seeley

On 4/27/07, Ryan McKinley [EMAIL PROTECTED] wrote:

Is it possible / is it an ok idea to have multiple solr instances
running on the same machine pointing to the same index files?


If only one at a time is used to update the index, then yes it is possible.


Essentially, I have two distinct needs - in some cases i need a commit
immediately after indexing one document, but most of the time it is fine
to wait 10 mins for changes if that has better performance.


Sounds like a configuration issue... set autocommit to 10 minutes, but
explicitly
commit the important documents?

I don't quite get why one would want two solr instances for this.

-Yonik


Re: Solr and memcached

2007-04-27 Thread Ken Krugler

Hi Otis,


I'm considering adding support for caching results in memcached.  Questions:

1. Has anyone already done this? (searched, didn't find anything)


Not exactly, but we do something similar to this for Nutch searches 
using ehcache (http://krugle.com/kse/projects/eFNJEmX). But we store 
the (rewritten) query string and then the serialized XML response, as 
that way we don't have dependencies on stable searcher/doc ids (which 
is, for Nutch, the only reference we get from the remote searchers).


So depending on the # of entries in your cache and the size of the 
hit (docs * XML representation for each), storing the XML might be a 
reasonable option.


-- Ken

2. Can anyone think of any reasons why this might not be a good 
idea? (I *just* started considering this)
3. I read http://wiki.apache.org/solr/SolrCaching , but I think the 
whole cache discarding and warming wouldn't be needed if what I 
store in memcached is: StringRepresentationOfQuery => DocSet or 
DocList.  Hm, I see the QueryResultKey class now.  Then maybe I'd store 
QueryResultKey => DocSet or DocList in the memcached cache.  Is this 
correct?


Thanks,
Otis


--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
Find Code, Find Answers


Re: case sensitivity

2007-04-27 Thread Yonik Seeley

On 4/26/07, Erik Hatcher [EMAIL PROTECTED] wrote:

I think we should open up as many of the switches as we can to
QueryParser, allowing users to tinker with them if they want, setting
the defaults to the most common reasonable settings we can agree upon.


I think we should also try to handle what we can automatically.
Always lowercasing or not isn't elegant, as the right thing to do
depends on the field.

I always had it in my head that the QueryParser should figure it out.
Actually, for good performance, the fieldType should figure it out just once.
The presence of a LowerCaseFilter could be one signal to lowercase
prefix strings, or one could actually run a test token through analysis
and test if it comes out lowercased.
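
A rough sketch of that probe, assuming the Lucene 2.x-era API that Solr used at
the time:

  import java.io.StringReader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenStream;

  // Feed a test token through a field's analyzer and check whether it
  // comes out lowercased, so the parser could decide automatically
  // whether to lowercase prefix strings for that field.
  class LowercaseProbe {
      static boolean analyzerLowercases(Analyzer analyzer, String field) throws Exception {
          TokenStream ts = analyzer.tokenStream(field, new StringReader("ABC"));
          Token t = ts.next(); // first analyzed token, or null
          return t != null && "abc".equals(t.termText());
      }
  }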

Numeric fields are a sticking point... prefix queries and wildcard
queries aren't even possible there.  Of course, even stemming is
problematic with wildcard queries.

-Yonik


Re: Requests per second/minute monitor?

2007-04-27 Thread Yonik Seeley

On 4/27/07, Otis Gospodnetic [EMAIL PROTECTED] wrote:

Would creating a new QueryRateFilter servlet filter be a good place to put 
this?  This way it could stay out of the Solr core and could be turned on/off 
via web.xml.


There's already gotta be some nice external tools that parse log files
and produce pretty graphs, no?

-Yonik


Re: Solr and memcached

2007-04-27 Thread Chris Hostetter

: 2. Can anyone think of any reasons why this might not be a good idea? (I
: *just* started considering this)

: 3. I read http://wiki.apache.org/solr/SolrCaching , but I think the
: whole cache discarding and warming wouldn't be needed if what I store in
: memcached is: StringRepresentationOfQuery => DocSet or DocList.  Hm, I
: see the QueryResultKey class now.  Then maybe I'd store QueryResultKey =>
: DocSet or DocList in the memcached cache.  Is this correct?

The nice thing about the internal caching is that because it's internal,
it can be autowarmed, and it can store things that only make sense as part
of the internal API (ie: big OpenBitSet based DocSets that rely on the
IndexReader for getting the real field contents)

when you start talking about caching the data outside of Solr, I don't
think the internal SolrCache APIs make as much sense anymore: what you
can effectively/efficiently cache changes, and it may not make sense to
hook that cache in at such a low level anymore -- it starts making more
sense to talk about caching request=>response pairs (with
pagination and field lists baked into them) instead of caching
individual DocLists and DocSets ... at that level hooking into memcached
might make sense, but it's probably easier and just as effective to use
something like squid as a proxy cache in front of Solr instead.

(that's what i do)



-Hoss



Re: Facet Results Strange - Help

2007-04-27 Thread realw5

I have a dynamic field setup for facets. It looks like this:

<dynamicField name="*_facet" type="string" indexed="true" stored="false" multiValued="true"/>

I do this because we add facets quite often, so having to modify the schema
every time would be unfeasible.

I'm currently reindexing from scratch, so I cannot try wt=python for a little
bit longer. Once it's done indexing I'll give that a go and see if I notice
anything.

Dan


Yonik Seeley wrote:
 
 On 4/27/07, realw5 [EMAIL PROTECTED] wrote:
 Hello,
 I'm running into some strange results for some facets of mine. Below
 you'll
 see the XML returned from Solr. I did a query using the standard request
 handler. Notice the duplicated values returned (american standard, delta,
 etc). There are actually quite a few of them. At first I thought it might be
 because of case sensitivity, but I have since lowercased everything going to
 Solr.

 Hopefully someone can chime in with some tips, thanks!
 
 What's the field definition for manufacturer_facet in your schema?  Is
 it multi-valued or not?
 
 Also, can you try the python response format (wt=python) as it outputs
 only ASCII and escapes everything else... there is an off chance the
 strings look the same but aren't.
 
 -Yonik
 
 

-- 
View this message in context: 
http://www.nabble.com/Facet-Results-Strange---Help-tf3658597.html#a10226359
Sent from the Solr - User mailing list archive at Nabble.com.



Re: EmbeddedSolr class from Wiki

2007-04-27 Thread Chris Hostetter

: - Can I use separate JVMs for same Directory object? One process will
: create/update/delete, and another search.
: - Can I use separate JEE contexts inside same JVM?
:
: Looks like singleton is a must, but separate search should be
: possible...

in theory it should work. Solr doesn't do any additional locking on top of
Lucene, so having multiple Solr instances acting as readers and a single
Solr instance acting as a writer should work (but I've never tried it).

you could even have the postCommit hook of your writer trigger a commit
call on your readers so they reopen the newly updated index.




-Hoss



Re: multiple solr instances using same index files?

2007-04-27 Thread Chris Hostetter
:  Essentially, I have two distinct needs - in some cases i need a commit
:  immediately after indexing one document, but most of the time it is fine
:  to wait 10 mins for changes if that has better performance.
:
: Sounds like a configuration issue... set autocommit to 10 minutes, but
: explicitly
: commit the important documents?
:
: I don't quite get why one would want two solr instances for this.

i think he means that he has some search clients that want to see changes
immediately, but other search clients can have more lag and he wants
the speed benefits of good caching for those clients

in essence what you describe is the standard master/slave setup where you
commit on the master very frequently, search the master when you need
real-time info, and only run snapinstaller on your slave at large
regular intervals ... the difference is you aren't really making or
pulling snapshots.

(I'm guessing you and Fuad are in the exact same situation)


-Hoss



Re: Facet Results Strange - Help

2007-04-27 Thread Yonik Seeley

On 4/27/07, realw5 [EMAIL PROTECTED] wrote:

I have a dynamic field setup for facets. It looks like this:

<dynamicField name="*_facet" type="string" indexed="true" stored="false" multiValued="true"/>

I do this, because we add facets quite often, so having to modify the schema
every time would be unfeasible.

I'm currently reindexing from scratch, so I cannot try wt=python for little
bit longer. Once it's done indexing I'll give that a go and see if I notice
anything.


If it's really the same field value repeated, you've hit a bug.
If so, it would be helpful if you could open a JIRA bug, and anything
you can do to help us reproduce the problem would be appreciated.

-Yonik


Re: Requests per second/minute monitor?

2007-04-27 Thread Ryan McKinley

Walter Underwood wrote:

This is for monitoring -- what happened in the last 30 seconds.
Log file analysis doesn't really do that.

I think the XML output in admin/stats.jsp may be enough for us.
That gives the cumulative requests on each handler. Those are
counted in StandardRequestHandler and DisMaxRequestHandler, and
are available through the MBean interface.



If you are running a solr build since yesterday, take a look at the 
PluginInfoHandler


 http://localhost:8983/solr/admin/plugins?stats=true

This gives you standard response format access to the same info.

I'll write up some docs for the wiki shortly

Perhaps it should be modified to let you specify a single handler.


Re: just advice

2007-04-27 Thread Chris Hostetter

: i will use /usr/solrapp/conf/solr1_solrconfig.xml, solr2_solrconfig.xml,
: solr3_solrconfig.xml...and so.
:
: when i test these instance, i just stay in /usr/solrapp.conf/,,,not like
: now,
:
: i have to change
: /usr/solrapp/solr1/conf,,,/usr/solrapp/solr2/conf,,,/usr/solrapp/solr3/conf,

Hmmm... i suppose Solr could support a system property for overriding the
name of the solrconfig.xml file ... but honestly i think a simpler way to
get the behavior you are describing for your development environment would
be to have separate solr.home directories for each of your indexes, leave
the config files named solrconfig.xml in each, and make a new directory
containing symlinks to each of those config files with whatever name you
want that helps you remember what it's for.



-Hoss



Re: wrong path in snappuller

2007-04-27 Thread Chris Hostetter

: The solr on the rsync command line is just a label which is defined in
: rsyncd.conf on the master.  rsyncd.conf is created on the fly by the script
: rsyncd-start:
...
: This label is then mapped to the path defined in $data_dir.

Ah... right, i forgot about that.

:  Why does it need to start an rsyncd in the master in a different port
:  for each ap, is it not enough to call rsync on master:path?

one of the reasons for this approach is to make it easier to run solr in a
somewhat self-contained setup ... you don't have to rely on an external
(to the Solr install) instance of rsyncd running rooted at the base of the
filesystem.  the other nice thing with having a separate rsyncd for each
solr instance is that you can shut off all replication with a single
command on a master solr port (without disabling other solr masters
running on the same machine, or breaking other non-solr uses of rsync on
that machine)

this can be handy when you want to do an upgrade to a solr tier without any
down time:
  1) turn off the master's rsync port,
  2) disable snappuller on all of the slaves
  3) shutdown and upgrade the master solr port
  4) rebuild the index on the master as needed
  5) run queries against the master to test things are working well.
  6) start the master's rsyncd port
  7) take half of your slaves out of rotation from your load balancer
  8) shutdown and upgrade the slaves that are out of rotation
  9) enable snappulling on the slaves that are out of rotation
 10) swap which slaves are in/out of rotation on your load balancer
 11) repeat steps 8 and 9
 12) add all slaves back into rotation on your load balancer.

...if you had a single rsync port for the entire machine, then this
wouldn't work very cleanly if the machine you were using as the master
was hosting more than a solr index (or any other apps using rsync)


-Hoss



Re: Facet Results Strange - Help

2007-04-27 Thread realw5

Ok, I just finished indexing about 20k documents. I took a look, and so far
the problem has not appeared again. What I'm thinking caused it was that I was
not adding overwritePending & overwriteCommitted in the add process. Therefore,
over time as data was being cleaned up, it was just appending to the
existing data.

I did have one case of repeated values, but after looking at the python
writer, I noticed a space at the end. I can fix this issue by trimming all my
values before sending them to solr :-)

I'm going to continue indexing, and if the problem pops up once fully
indexed I'll post back again. Otherwise thanks for the quick replies!

Dan


Yonik Seeley wrote:
 
 On 4/27/07, realw5 [EMAIL PROTECTED] wrote:
 I have a dynamic field setup for facets. It looks like this:

 dynamicField name=*_facet type=string indexed=true stored=false
 multiValued=true /

 I do this, because we add facets quite often, so having to modify the
 schema
 every time would be unfeasible.

 I'm currently reindexing from scratch, so I cannot try wt=python for
 little
 bit longer. Once it's done indexing I'll give that a go and see if I
 notice
 anything.
 
 If it's really the same field value repeated, you've hit a bug.
 If so, it would be helpful if you could open a JIRA bug, and anything
 you can do to help us reproduce the problem would be appreciated.
 
 -Yonik
 
 

-- 
View this message in context: 
http://www.nabble.com/Facet-Results-Strange---Help-tf3658597.html#a10226731
Sent from the Solr - User mailing list archive at Nabble.com.



Re: multiple solr instances using same index files?

2007-04-27 Thread Ryan McKinley

Chris Hostetter wrote:

:  Essentially, I have two distinct needs - in some cases i need a commit
:  immediately after indexing one document, but most of the time it is fine
:  to wait 10 mins for changes if that has better performance.
:
: Sounds like a configuration issue... set autocommit to 10 minutes, but
: explicitly
: commit the important documents?
:
: I don't quite get why one would want two solr instances for this.

i think he means that he has some search clinets that want to see changes
immediately, but other search clients can have more lag and he wants
the speed benefits of good caching for those clients



Yes - I want the 1% use to not disrupt the 99% (readonly) use, but the 
overall load does not require multiple machines.


I want one set of users to have a large infrequently disrupted cache and 
another that has no auto warming and opens new searchers frequently.


If lucene supports that, i'll give it a try!

ryan



Re: Facet Results Strange - Help

2007-04-27 Thread Yonik Seeley

On 4/27/07, realw5 [EMAIL PROTECTED] wrote:

Ok, I just finished indexing about 20k documents. I took a look, and so far
the problem has not appeared again. What I'm thinking caused it was that I was
not adding overwritePending & overwriteCommitted in the add process. Therefore,
over time as data was being cleaned up, it was just appending to the
existing data.


That is the default anyway.  Even if duplicate documents were somehow
added, that should not cause duplicates in facet results.  It should
be impossible to get duplicate values from facet.field, regardless of
what the index looks like.


I did have one case of repeated values, but after looking at the python
writer, I noticed a space at the end. I can fix this issue by trimming all my
values before sending them to solr :-)


Hopefully you should have also seen the space in the XML response...
if it's not there, that would be a bug.

-Yonik


Re: Facet Results Strange - Help

2007-04-27 Thread Chris Hostetter

: It's likely you have the facet category added more than once for one
: or more docs. Like this;
:
: <field name="manufacturer_facet">american standard</field>
: <field name="manufacturer_facet">american standard</field>
:
: Are you adding the facet values on-the-fly? This happened to me and I
: solved it by removing the duplicate facet fields.

that's really odd ... i can't think of any way that exactly duplicate
field values would be counted twice in the current facet.field code.

I just tested this using the exampledocs by adding "electronics" to the
"cat" field of some docs multiple times, and I couldn't reproduce this
behavior.

can you elaborate more on how to trigger it?


-Hoss



Re: case sensitivity

2007-04-27 Thread Yonik Seeley

On 4/26/07, Michael Kimsal [EMAIL PROTECTED] wrote:

My colleague, after some digging, found in SolrQueryParser

(around line 62)
setLowercaseExpandedTerms(false);

The default for Lucene is true.  Was this intentional?  Or an oversight?


Way back before Solr was open-sourced, and Chris was the only
user, I thought he needed to do case-sensitive prefix and
wildcard queries (hence I set it to false).  I think I may have been
mistaken about that need, but by that time, I didn't know if anyone
depended on it, so I never changed it back.

A default of false is actually more powerful too.  You can do prefix
queries on fields that have a LowercaseFilter in their analyzer, and
also fields that don't.  If it's set to true, you can't reliably do
prefix queries on fields that don't have a LowercaseFilter.
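
A small sketch of the switch being discussed, using the Lucene 2.x-era query
parser API (the field and analyzer here are arbitrary, not from the original mail):

  import org.apache.lucene.analysis.WhitespaceAnalyzer;
  import org.apache.lucene.queryParser.QueryParser;

  class ExpandedTermsDemo {
      public static void main(String[] args) throws Exception {
          QueryParser qp = new QueryParser("storyText", new WhitespaceAnalyzer());
          qp.setLowercaseExpandedTerms(true); // Lucene's default: "Wind*" becomes "wind*"
          System.out.println(qp.parse("Wind*"));
      }
  }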

-Yonik


Vote For Jira Issues

2007-04-27 Thread Chris Hostetter

Hi everybody,

I just wanted to point out that there has been some discussion going on on
the solr-dev list about making a Solr 1.2 release in the near future.
There is no ETA on when this will happen, at this point it's mainly a
discussion of what pending Jira patches should we ensure to include?

The best way to indicate that you think a patch for a bug fix or new
feature should be included is to login to Jira (anyone can create an
account) and vote for the issues you care about...

http://issues.apache.org/jira/browse/SOLR?report=com.atlassian.jira.plugin.system.project:popularissues-panel

Please keep in mind, this is not a guarantee that the most popular
bugs/features will be fixed/included in the next release -- the likelihood
of a patch being applied is proportionate to its popularity, but it's
also inversely proportionate to its size, complexity, and impact on
existing users.



-Hoss



Unicode characters

2007-04-27 Thread HUYLEBROECK Jeremy RD-ILAB-SSF

Hi,

We experience some encoding probs with the unicode characters getting
out of solr.
Let me explain our flow:

-fetch a web page
-decode entities and unicode characters (such as &#149;) using the Neko
library
-get a unicode String in Java
-send it to SOLR through XML created by SAX, with the right encoding
(UTF-8) specified everywhere (writer, header etc...)
-it apparently arrives clean on the SOLR side (verified in our logs).
-In the query output from SOLR (XML message), the character is not
encoded as an entity (not &#149;) but the character itself is used
(character 149 = 95 hexadecimal).

And we can see in firefox and our logs a code instead of the character
(code 149 or 95), even if the original XML message sent to SOLR was
properly rendered in Firefox, or the shell etc...

We might have missed something somewhere as we easily get our minds lost
in all the encoding/unicode nightmare ;)

We've seen the escape method in the XML object of SOLR. It escapes
only a few codes as entities. Could it be the source of our problem?
What would be the right approach to properly encode our input without
having to tweak solr code? Or is it a bug?

Thanks

Jeremy.



Re: Facet Results Strange - Help

2007-04-27 Thread Chris Hostetter
: writer, I noticed a space at the end. I can fix this issue by trimming all my
: values before sending them to solr :-)

The built-in Field Faceting works on the indexed values, so Solr can solve
this for you if you use something like this for your facet field type...

   <fieldType name="facetString" class="solr.TextField" omitNorms="true">
     <analyzer>
       <!-- KeywordTokenizer does no actual tokenizing, so the entire
            input string is preserved as a single token -->
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <!-- The LowerCase TokenFilter does what you expect, which can be
            handy when you want your sorting to be case insensitive -->
       <filter class="solr.LowerCaseFilterFactory"/>
       <!-- The TrimFilter removes any leading or trailing whitespace -->
       <filter class="solr.TrimFilterFactory"/>
     </analyzer>
   </fieldType>



-Hoss



Re: case sensitivity

2007-04-27 Thread Michael Pelz Sherman
In our experience, setting a LowercaseFilter in the query analyzer did not work; 
we had to call setLowercaseExpandedTerms(true) to get wildcard queries to be 
case-insensitive.
   
  Here's our analyzer definition from our solr schema:
   
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
   
  If calling setLowercaseExpandedTerms(true) is *not* in fact necessary for 
case-insensitive wildcard queries, could you please provide an example of a 
solr schema that can achieve this?
   
  Thanks!
  - mps
  
Yonik Seeley [EMAIL PROTECTED] wrote:
  On 4/26/07, Michael Kimsal wrote:
 My colleague, after some digging, found in SolrQueryParser

 (around line 62)
 setLowercaseExpandedTerms(false);

 The default for Lucene is true. Was this intentional? Or an oversight?

Way back before Solr was open-sourced, and Chris was the only
user, I thought he needed to do case-sensitive prefix and
wildcard queries (hence I set it to false). I think I may have been
mistaken about that need, but by that time, I didn't know if anyone
depended on it, so I never changed it back.

A default of false is actually more powerful too. You can do prefix
queries on fields that have a LowercaseFilter in their analyzer, and
also fields that don't. If it's set to true, you can't reliably do
prefix queries on fields that don't have a LowercaseFilter.

-Yonik



Re: case sensitivity

2007-04-27 Thread Yonik Seeley

On 4/27/07, Michael Pelz Sherman [EMAIL PROTECTED] wrote:

In our experience, setting a LowercaseFilter in the query did not work; we had 
to call setLowercaseExpandedTerms(true) to get wildcard queries to be 
case-insensitive.


Correct, because in that case the QueryParser does not invoke analysis
(because it's a partial word, not a whole word).


  If calling setLowercaseExpandedTerms(true) is *not* in fact necessary for 
case-insensitive wildcard queries, could you please provide an example of a 
solr schema that can achieve this?


I didn't say that :-)

I'm saying setLowercaseExpandedTerms(true) is not sufficient for
wildcard queries in general.  If the term is indexed as Windows95,
then a prefix query of Windows* won't find anything with
setLowercaseExpandedTerms(true).

-Yonik



Yonik Seeley [EMAIL PROTECTED] wrote:
  On 4/26/07, Michael Kimsal wrote:
 My colleague, after some digging, found in SolrQueryParser

 (around line 62)
 setLowercaseExpandedTerms(false);

 The default for Lucene is true. Was this intentional? Or an oversight?

Way back before Solr was open-sourced, and Chris was the only
user, I thought he needed to do case-sensitive prefix and
wildcard queries (hence I set it to false). I think I may have been
mistaken about that need, but by that time, I didn't know if anyone
depended on it, so I never changed it back.

A default of false is actually more powerful too. You can do prefix
queries on fields that have a LowercaseFilter in their analyzer, and
also fields that don't. If it's set to true, you can't reliably do
prefix queries on fields that don't have a LowercaseFilter.

-Yonik




Re: Unicode characters

2007-04-27 Thread Yonik Seeley

On 4/27/07, HUYLEBROECK Jeremy RD-ILAB-SSF

-In the query output from SOLR (XML message), the character is not
encoded as an entity (not #149;) but the character itself is used
(character 149=95 hexadecimal).


That's fine, as they are equivalent representations, and that
character is directly representable in UTF-8 (which Solr uses for its
output).
Is this causing a problem for you somehow?

-Yonik


Re: Unicode characters

2007-04-27 Thread Chris Hostetter

: -fetch a web page
: -decode entities and unicode characters (such as &#149;) using the Neko
: library
: -get a unicode String in Java
: -send it to SOLR through XML created by SAX, with the right encoding
: (UTF-8) specified everywhere (writer, header etc...)
: -it apparently arrives clean on the SOLR side (verified in our logs).
: -In the query output from SOLR (XML message), the character is not
: encoded as an entity (not &#149;) but the character itself is used
: (character 149 = 95 hexadecimal).

Just because someone uses an html entity to display a character in a web
page doesn't mean it needs to be escaped in XML ... I think that in
theory we could use numeric entities to escape *every* character but that
would make the XML responses a lot bigger ... so in general Solr only
escapes the characters that need to be escaped to have a valid UTF-8 XML
response.

You may also be having some additional problems since 149 (hex 95) is not
a printable UTF-8 character, it's a control character (MESSAGE_WAITING)
... it sounds like you're dealing with HTML where people were using the
numeric value from the Windows-1252 charset.

you may want to modify your parsing code to do some mappings between
control characters that you know aren't meant to be control characters
before you ever send them to solr.  a quick search for "Neko
windows-1252" indicates that enough people have had problems with this
that it is a built-in feature...
http://people.apache.org/~andyc/neko/doc/html/settings.html
http://cyberneko.org/html/features/scanner/fix-mswindows-refs
 "Specifies whether to fix character entity references for Microsoft
 Windows characters as described at
 http://www.cs.tut.fi/~jkorpela/www/windows-chars.html."
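
A hedged sketch of that cleanup in Java, assuming the stray characters really
are Windows-1252 codepoints (0x80-0x9F) that leaked in as raw control characters:

  // Re-decode control-range characters as Windows-1252 before sending
  // text to Solr; e.g. 149 (0x95) becomes the bullet character U+2022.
  static String fixCp1252Controls(String s) throws java.io.UnsupportedEncodingException {
      StringBuilder out = new StringBuilder(s.length());
      for (char c : s.toCharArray()) {
          if (c >= 0x80 && c <= 0x9F) {
              out.append(new String(new byte[]{(byte) c}, "windows-1252"));
          } else {
              out.append(c);
          }
      }
      return out.toString();
  }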

(I've run into this a number of times over the years when dealing with
content created by windows users, as you can see from my one and only
thread on JavaJunkies ...
  http://www.javajunkies.org/index.pl?node_id=3436
)


-Hoss



Re: AW: Leading wildcards

2007-04-27 Thread Paul Fryer

PLEASE REMOVE ME FROM THIS MAILING LIST!!!

Whoever manages this list, can you please remove me? I have tried sending 
emails to the unsubscribe address, but I just keep getting more emails. This 
is really an issue for me... so your help would be great!


Thanks,

Paul



From: Chris Hostetter [EMAIL PROTECTED]
Reply-To: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
Subject: Re: AW: Leading wildcards
Date: Fri, 27 Apr 2007 16:25:37 -0700 (PDT)


: when we do a search on a nonexisting field, we get a  SolrException:
: undefined field
: (this was for query nonfield:test)
:
: but when we use wildcards in our query, we dont get the undefined field
: exception,
: so the query nonfield:*test works fine ... just zero results...
:
: is this normal behaviour ?

the error about undefined fields comes up because the Lucene QueryParser
is attempting to analyze the field, and the Solr IndexSchema
complains if it can't find the field it's asked to provide an analyzer
for.

for wildcard (and fuzzy and prefix) queries, the input is not analyzed
(the Lucene FAQ explains this a bit) so the Solr IndexSchema is never
consulted about the field.


It is certainly an odd bit of behavior, and we should try to be
consistent.  I think it would be fairly straightforward to make the
SolrQueryParser *always* test that the field is viable according to the
IndexSchema ... would you mind opening a bug in Jira for this?



-Hoss







Re: just advice

2007-04-27 Thread James liu

Hmmm...

mkdir + ln -s solved my problem.

Thanks, Hoss.

2007/4/28, Chris Hostetter [EMAIL PROTECTED]:



: i will use /usr/solrapp/conf/solr1_solrconfig.xml, solr2_solrconfig.xml,
: solr3_solrconfig.xml...and so.
:
: when i test these instance, i just stay in /usr/solrapp.conf/,,,not like
: now,
:
: i have to change
:
/usr/solrapp/solr1/conf,,,/usr/solrapp/solr2/conf,,,/usr/solrapp/solr3/conf,

Hmmm... i suppose Solr could support a system property for overriding the
name ofhte solrconfig.xml file ... but honestly i think a simpler way to
get the behavior you are describing for your development envirnment would
be to have seperate solr.home directories for each of your indexes, leave
the config files named solrconfig.xml in each, and mke a new directory
containing symlinks to each of those config files with wahtever ame you
want that helps you remember what it's for.



-Hoss





--
regards
jl


Re: Solr and memcached

2007-04-27 Thread James liu

I used to think about caching data with memcached.

Why did I think that?

For example, I have 45 Solr instances, and I have to merge their results into
one array and sort by score or datetime. If I use rows=10, it means I will
get at most 45*10 results, but only 10 results are shown per page.

What to do with the other 440 results: just throw them away, or save them into
memcached? I saved them into memcached.

It seems to only save merge and sort time, because I find that for the same
query with just the page number changed, the response is very quick.

The merge and sort time is not much in my tests. And if we use memcached, we have
to manage it, add hardware, and have to learn how to use it and solve its problems.

So now I don't use memcached and just use Solr's *internal* cache.
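
A rough sketch of the merge step described above, under the assumption that each
shard returns its own top hits with a score (the Hit class is hypothetical, not
a Solr API):

  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.Comparator;
  import java.util.List;

  class Hit {
      String id;
      float score;
      Hit(String id, float score) { this.id = id; this.score = score; }
  }

  class ShardMerger {
      // Pool every shard's top hits, sort by descending score, keep one page.
      static List<Hit> mergeTopN(List<List<Hit>> perShard, int rows) {
          List<Hit> all = new ArrayList<Hit>();
          for (List<Hit> shard : perShard) all.addAll(shard);
          Collections.sort(all, new Comparator<Hit>() {
              public int compare(Hit a, Hit b) {
                  return Float.compare(b.score, a.score);
              }
          });
          return all.subList(0, Math.min(rows, all.size()));
      }
  }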


2007/4/28, Chris Hostetter [EMAIL PROTECTED]:



: 2. Can anyone think of any reasons why this might not be a good idea? (I
: *just* started considering this)

: 3. I read http://wiki.apache.org/solr/SolrCaching , but I think the
: whole cache discarding and warming wouldn't be needed if what I store in
: memcached is: StringRepresentationOfQuery => DocSet or DocList.  Hm, I
: see the QueryResultKey class now.  Then maybe I'd store QueryResultKey =>
: DocSet or DocList in the memcached cache.  Is this correct?

The nice thing about the internal caching is that because it's internal,
it can be autowarmed, and it can store things that only make sense as part
of the internal API (ie: big OpenBitSet based DocSets that rely on the
IndexReader for getting the real field contents)

when you start talking about caching the data outside of Solr, I don't
think the internal SolrCache APIs make as much sense anymore: what you
can effectively/efficiently cache changes, and it may not make sense to
hook that cache in at such a low level anymore -- it starts making more
sense to talk about caching request=>response pairs (with
pagination and field lists baked into them) instead of caching
individual DocLists and DocSets ... at that level hooking into memcached
might make sense, but it's probably easier and just as effective to use
something like squid as a proxy cache in front of Solr instead.

(that's what i do)



-Hoss





--
regards
jl


showing range facet example = by Range ( 1 to 1000 )

2007-04-27 Thread Jery Cook
I'm stuck:

 

Have a facet, and a field in a document called estimatedRepairs. It is
declared in the schema.xml as:

<field name="estimatedRepairs" type="sfloat" indexed="true" stored="true" multiValued="true"/>

I execute a query with the below parameters

 

q=state%3Avirgina;

facet.query=estimatedRepairs:[*+TO+1000.0]

facet.query=estimatedRepairs:[1000.0+TO+*]

facet=true

facet.field=state

facet.field=country

facet.field=zip

facet.field=estimatedProfit

facet.field=marketValue

facet.field=numberOfBaths

facet.field=numberOfBeds

facet.field=price

facet.field=type

facet.limit=10

facet.zeros=false

facet.missing=false

version=2.2

debugQuery=true

 

 

However my results show:

facet name: [estimatedRepairs] value count: [10]
[ListingApp] INFO [main] ListingManagerImplTest.testCreateQueryFromSearchParams(186) | count Name: [24153.0], count: [7]
[ListingApp] INFO [main] ListingManagerImplTest.testCreateQueryFromSearchParams(186) | count Name: [1469.0], count: [6]
[ListingApp] INFO [main] ListingManagerImplTest.testCreateQueryFromSearchParams(186) | count Name: [4249.0], count: [6]
[ListingApp] INFO [main] ListingManagerImplTest.testCreateQueryFromSearchParams(186) | count Name: [16444.0], count: [6]
[ListingApp] INFO [main] ListingManagerImplTest.testCreateQueryFromSearchParams(186) | count Name: [21555.0], count: [6]
[ListingApp] INFO [main] ListingManagerImplTest.testCreateQueryFromSearchParams(186) | count Name: [23132.0], count: [6]
[ListingApp] INFO [main] ListingManagerImplTest.testCreateQueryFromSearchParams(186) | count Name: [25669.0], count: [6]
[ListingApp] INFO [main] ListingManagerImplTest.testCreateQueryFromSearchParams(186) | count Name: [26160.0], count: [6]
[ListingApp] INFO [main] ListingManagerImplTest.testCreateQueryFromSearchParams(186) | count Name: [27058.0], count: [6]
[ListingApp] INFO [main] ListingManagerImplTest.testCreateQueryFromSearchParams(186) | count Name: [171.0], count: [5]

AND I DON'T WANT THIS.

 

I want it to show something like:

by estimated repairs:
1 to 1000 [23]
1000 to 2000 [53]

I thought facet.query would let me do this? If not, what will let Solr
generate the query counts for the results in intervals of 1000?
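
For reference, a sketch of the bucketed approach with one facet.query per
interval (the bucket boundaries are arbitrary, and the counts come back
under the facet_queries section of the response rather than under
facet.field):

facet=true
facet.query=estimatedRepairs:[0+TO+1000]
facet.query=estimatedRepairs:[1000+TO+2000]
facet.query=estimatedRepairs:[2000+TO+3000]

which should return something like (counts invented for illustration):

<lst name="facet_queries">
  <int name="estimatedRepairs:[0 TO 1000]">23</int>
  <int name="estimatedRepairs:[1000 TO 2000]">53</int>
  <int name="estimatedRepairs:[2000 TO 3000]">17</int>
</lst>

Note that [a TO b] endpoints are inclusive, so adjacent buckets share their
boundary value.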

 

Jeryl Cook

 

^ Pharaoh ^
http://pharaohofkush.blogspot.com/

1f u c4n r34d th1s u r34lly n33d t0 g37 l41d 



 



Re: resin fails to start with solr.

2007-04-27 Thread James liu

Yes, I tried it and it failed.

This afternoon I will re-download Solr and test again.

2007/4/28, Bill Au [EMAIL PROTECTED]:


Have you tried using the schema.xml that is in example/solr/conf?  If that
works then the problem is definitely in your schema.xml.

Bill

On 4/26/07, James liu [EMAIL PROTECTED] wrote:

 But it is OK when I use Tomcat.

 2007/4/26, Ken Krugler [EMAIL PROTECTED]:
 
  3.0.23: yesterday I tried it and it failed.
  
  Which version do you use? I just don't use the pro version.
 
  From the error below, either your schema.xml file is messed up, or it
  might be that you still need to uncomment the lines at the
  beginning of the web.xml file.
 
  These are the ones that say "Uncomment if you are trying to use a
  Resin version before 3.0.19".  Even though you're using a later
  version of Resin, I've had lots of issues with their XML parsing.
 
  -- Ken
 
 
 
  
  2007/4/26, Bill Au [EMAIL PROTECTED]:
  
  Have you tried resin 3.0.x?  3.1 is a development branch so it is less
  stable than 3.0.
  
  Bill
  
  On 4/19/07, James liu [EMAIL PROTECTED] wrote:
  
It works well when I use Tomcat with Solr.
  
Now I want to test Resin; I am using resin-3.1.0.
  
Now it shows me:
  
[03:47:34.047] WebApp[http://localhost:8080] starting
[03:47:34.691] WebApp[http://localhost:8080/resin-doc] starting
[03:47:34.927] WebApp[http://localhost:8080/solr1] starting
[03:47:35.051] SolrServlet.init()
[03:47:35.077] Solr home set to '/usr/solrapp/solr1/'
[03:47:35.077] user.dir=/tmp/resin-3.1.0/bin
[03:47:35.231] Loaded SolrConfig: solrconfig.xml
[03:47:35.522] adding requestHandler standard=
  solr.StandardRequestHandler
[03:47:35.621] adding requestHandler dismax=
 solr.DisMaxRequestHandler
[03:47:35.692] adding requestHandler partitioned=
  solr.DisMaxRequestHandler
[03:47:35.721] adding requestHandler instock=
  solr.DisMaxRequestHandler
[03:47:35.819] Opening new SolrCore at /usr/solrapp/solr1/,
dataDir=/usr/solrapp/solr1/data
[03:47:35.884] Reading Solr Schema
 [03:47:35.916] Schema name=example
[03:47:35.929] org.apache.solr.core.SolrException: Schema Parsing
  Failed
 [03:47:35.929]  at org.apache.solr.schema.IndexSchema.readConfig
(
IndexSchema.java:441)
[03:47:35.929]  at org.apache.solr.schema.IndexSchema.init(
IndexSchema.java:69)
[03:47:35.929]  at org.apache.solr.core.SolrCore.init(
 SolrCore.java
  :191)
  
  
  
--
regards
 jl
 
  --
  Ken Krugler
  Krugle, Inc.
  +1 530-210-6378
  Find Code, Find Answers
 



 --
 regards
 jl






--
regards
jl


Re: Question to php to do with multi index

2007-04-27 Thread James liu

I think curl_multi is slow.

Thanks, I will try it.

2007/4/27, Michael Kimsal [EMAIL PROTECTED]:


The curl_multi functions are probably the most effective way, using straight
PHP.  Another option would be to spawn several jobs, assuming unix/linux,
and wait for them to get done.  It doesn't give you very good error handling
(well, none at all actually!) but would let you run multiple indexing jobs
at once.

Visit http://us.php.net/shell_exec and look at the 'class exec' contributed
note about halfway down the page.  It'll give you an idea of how to easily
spawn multiple jobs.

If you're using PHP5, the proc_open function may be another way to go.
proc_open was available in PHP 4, but a number of extra parameters and
controls were made available in 5.
http://us.php.net/manual/en/function.proc-open.php

An adventurous soul could combine the two concepts into one class to manage
pipe communication between multiple child processes effectively.
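
A minimal curl_multi sketch along those lines (the shard URLs are
hypothetical placeholders, and the busy loop is a simplification;
curl_multi_select() avoids spinning on the CPU):

<?php
// Query several Solr instances in parallel with curl_multi.
$urls = array(
    'http://shard1:8080/solr/select?q=ipod',
    'http://shard2:8080/solr/select?q=ipod',
    'http://shard3:8080/solr/select?q=ipod',
);

$mh      = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body, don't print
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers to completion.
$running = 0;
do {
    curl_multi_exec($mh, $running);
} while ($running > 0);

// Collect each raw response for merging and sorting afterwards.
$responses = array();
foreach ($handles as $ch) {
    $responses[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>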

On 4/26/07, James liu [EMAIL PROTECTED] wrote:

 PHP does not support multi-threading, so how can you handle multiple indexes
 in parallel?

 Now I use curl_multi.

 Maybe there is a more effective way that I don't know about; if you know one,
 tell me. Thanks.


 --
 regards
 jl




--
Michael Kimsal
http://webdevradio.com





--
regards
jl


Re: resin fails to start with solr.

2007-04-27 Thread James liu

Now I have tested the newest Solr (nothing modified).

I failed to start Solr with Resin 3.0.

2007/4/28, James liu [EMAIL PROTECTED]:


Yes, I tried it and it failed.

This afternoon I will re-download Solr and test again.

[... rest of quoted thread snipped; it repeats the messages quoted above verbatim ...]





--
regards
jl


Re: Requests per second/minute monitor?

2007-04-27 Thread Otis Gospodnetic
I think the real-time-ness of this is the key.  What's the current QPS?  How
many in-flight queries do we have?  What is the average (mean) response
time?  What's the response time at the 90th percentile?  And so on.  Anyhow,
not my current itch, just trying to point out what Wunder is after.

Otis


- Original Message 
From: Yonik Seeley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, April 27, 2007 4:15:33 PM
Subject: Re: Requests per second/minute monitor?

On 4/27/07, Otis Gospodnetic [EMAIL PROTECTED] wrote:
 Would creating a new QueryRateFilter servlet filter be a good place to put
 this?  This way it could stay out of the Solr core and could be turned on/off
 via web.xml.

There's already gotta be some nice external tools that parse log files
and produce pretty graphs, no?
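
As a throwaway example of that kind of tool, a little script that tallies
requests per second from an access log (the log path and the
[dd/Mon/yyyy:HH:MM:SS ...] timestamp format are assumptions):

<?php
// Count requests per second from a common-log-format access log.
$counts = array();
foreach (file('/var/log/tomcat/access.log') as $line) {
    if (preg_match('#\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})#', $line, $m)) {
        $sec = $m[1];
        $counts[$sec] = isset($counts[$sec]) ? $counts[$sec] + 1 : 1;
    }
}

// Print the ten busiest seconds.
arsort($counts);
foreach (array_slice($counts, 0, 10, true) as $sec => $n) {
    echo "$sec  $n req/s\n";
}
?>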

-Yonik