Re: hello, a question about solr.

2008-08-18 Thread finy finy
The name field is of type text, which is analysed. I use the query:
name:ibmT63notebook

2008/8/18, Shalin Shekhar Mangar [EMAIL PROTECTED]:

 Hi,

 What is the type of the field name?
 Does a query like name:ibm OR name:T63 OR name:notebook work for you?

 On Mon, Aug 18, 2008 at 10:43 AM, finy finy [EMAIL PROTECTED] wrote:

  I have used Solr for 3 months, and I have the following question:
 
  I checked the Solr source code and found that it uses Lucene's QueryParser
  to parse the user's input query string.
 
  For example, given a query like name:ibmT63notebook, Solr parses it as
  'name:ibm T63 notebook'. It regards this as a PhraseQuery, so it uses a
  PhraseQuery.
 
  But I want results that include ibm and T63 and notebook at any position.
  For example, it should match a sentence like "i have a notebook, it is
  t63 of ibm".
 
  But Solr doesn't do that; it treats the parsed query as a PhraseQuery.
  How can I get the behaviour I want?
 
  thanks,
  your friend!
 



 --
 Regards,
 Shalin Shekhar Mangar.
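
For reference, the explicit boolean form suggested above can be built on the client side. This is only a minimal sketch: it assumes the client has already split the input into terms, the helper name build_and_query is hypothetical, and it does not escape Lucene query-syntax special characters.

```python
def build_and_query(field, terms):
    """Join terms into an explicit boolean AND query on one field.

    Each clause is required, but no order or adjacency is implied,
    unlike a phrase query.
    """
    clauses = [f"{field}:{term}" for term in terms if term]
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_and_query("name", ["ibm", "T63", "notebook"]))
```

Swapping " AND " for " OR " gives the looser query from the reply above.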



Re: Jetty Multicore installation doesn't work

2008-08-18 Thread Shalin Shekhar Mangar
It seems that you are trying to use a Solr 1.3 feature (multiple cores) with
a Solr 1.2 war file.

If you want to use multiple cores, you must use a nightly build of Solr.
Take a look at the CoreAdmin page (formerly known as MultiCore):

http://wiki.apache.org/solr/CoreAdmin

On Mon, Aug 18, 2008 at 2:19 PM, parthad76 [EMAIL PROTECTED] wrote:


 Hi

 I tried to run the multicore installation of Jetty after downloading it.
 It's throwing the following error and I am not sure why. I added the
 multicore.xml file in solr.home but that doesn't work either. Can someone
 please help?

 INFO: Solr home set to 'multicore/'
 2008-08-18 14:18:31.796::WARN:  failed SolrRequestFilter
 java.lang.NoClassDefFoundError
     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:74)
     at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
     at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
     at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
     at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
     at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
     at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
     at org.mortbay.jetty.Server.doStart(Server.java:210)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
     at java.lang.reflect.Method.invoke(Unknown Source)
     at org.mortbay.start.Main.invokeMain(Main.java:183)
     at org.mortbay.start.Main.start(Main.java:497)
     at org.mortbay.start.Main.main(Main.java:115)
 2008-08-18 14:18:31.875::WARN:  failed
 [EMAIL PROTECTED]

 8e059{/solr,jar:file:/D:/Projects/SaaS%20-%20Social%20Commerce%20Platform/Core%20Services/Search/apache-solr-1.2.0_Single/example/webapps/solr.war!/}
 java.lang.NoClassDefFoundError
     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:74)
     at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
     at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
     at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
     at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
     at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
     at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
     at org.mortbay.jetty.Server.doStart(Server.java:210)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
     at java.lang.reflect.Method.invoke(Unknown Source)
     at org.mortbay.start.Main.invokeMain(Main.java:183)
     at org.mortbay.start.Main.start(Main.java:497)
     at org.mortbay.start.Main.main(Main.java:115)
 2008-08-18 14:18:31.906::WARN:  

Re: IndexOutOfBoundsException

2008-08-18 Thread Michael McCandless


Hi Ian,

I sent this to java-user, but maybe you didn't see it, so let's try  
again on solr-user:



It looks like your stored fields file (_X.fdt) is corrupt.

Are you using multiple threads to add docs?

Can you try switching to SerialMergeScheduler to verify it's  
reproducible?


When you hit this exception, can you stop Solr and then run Lucene's
CheckIndex tool (org.apache.lucene.index.CheckIndex) to verify the
index is corrupt and see which segment it is?  Then post back the
exception and ls -l of your index directory?

If you could post the client-side code you're using to build & submit
docs to Solr, and if I can get access to the Medline content, and I
can then repro the bug, then I'll track it down...

Mike
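
For reference, a minimal sketch of how the CheckIndex run mentioned above could be scripted. The class name org.apache.lucene.index.CheckIndex comes from the message; the jar name and index path are assumptions to be replaced with your own, and the helper name check_index_command is hypothetical. Stop Solr before running it.

```python
import subprocess


def check_index_command(lucene_jar, index_dir):
    """Build the argv for a read-only run of Lucene's CheckIndex tool."""
    return [
        "java",
        "-cp", lucene_jar,                      # assumed path to the Lucene core jar
        "org.apache.lucene.index.CheckIndex",   # tool named in the message above
        index_dir,                              # assumed path to the Solr index directory
    ]


if __name__ == "__main__":
    cmd = check_index_command("lucene-core.jar", "solr/data/index")
    print(" ".join(cmd))
    # Uncomment to actually run the check once Solr is stopped:
    # subprocess.run(cmd, check=False)
```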

On Aug 14, 2008, at 10:18 PM, Ian Connor wrote:


I seem to be able to reproduce this very easily and the data is
medline (so I am sure I can share it if needed with a quick email to
check).

- I am using fedora:
%uname -a
Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30
13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
%java -version
java version 1.7.0
IcedTea Runtime Environment (build 1.7.0-b21)
IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
- single core (will use shards, but each machine has just one HDD so I
didn't see how cores would help, but I am new at this)
- next run I will keep the output to check for earlier errors
- very reproducible, and I can share code + data if that will help

On Thu, Aug 14, 2008 at 4:23 PM, Yonik Seeley [EMAIL PROTECTED]  
wrote:

Yikes... not good.  This shouldn't be due to anything you did wrong
Ian... it looks like a lucene bug.

Some questions:
- what platform are you running on, and what JVM?
- are you using multicore? (I fixed some index locking bugs recently)
- are there any exceptions in the log before this?
- how reproducible is this?

-Yonik

On Thu, Aug 14, 2008 at 2:47 PM, Ian Connor [EMAIL PROTECTED]  
wrote:

Hi,

I have rebuilt my index a few times (it should get up to about 4
Million but around 1 Million it starts to fall apart).

Exception in thread Lucene Merge Thread #0
org.apache.lucene.index.MergePolicy$MergeException:
java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
  at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:323)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:300)
Caused by: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
  at java.util.ArrayList.rangeCheck(ArrayList.java:572)
  at java.util.ArrayList.get(ArrayList.java:350)
  at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
  at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:188)
  at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
  at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:349)
  at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3998)
  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3650)
  at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:214)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:269)



When this happens, the disk usage goes right up and the indexing
really starts to slow down. I am using a Solr build from about a week
ago - so my Lucene is at 2.4 according to the war files.

Has anyone seen this error before? Is it possible to tell which Array
is too large? Would it be an Array I am sending in or another internal
one?

Regards,
Ian Connor







--
Regards,

Ian Connor




Re: IndexOutOfBoundsException

2008-08-18 Thread Ian Connor
Hi Mike,

I am currently ruling out some bad memory modules. Knowing that this
is an index corruption makes memory corruption more likely. If
replacing the RAM (which I need to do anyway due to segmentation faults)
does not fix the problem, I will package up the crash into a
reproducible scenario.

On Mon, Aug 18, 2008 at 5:56 AM, Michael McCandless
[EMAIL PROTECTED] wrote:

 Hi Ian,

 I sent this to java-user, but maybe you didn't see it, so let's try again on
 solr-user:


 It looks like your stored fields file (_X.fdt) is corrupt.

 Are you using multiple threads to add docs?

 Can you try switching to SerialMergeScheduler to verify it's reproducible?

 When you hit this exception, can you stop Solr and then run Lucene's
 CheckIndex tool (org.apache.lucene.index.CheckIndex) to verify the
 index is corrupt and see which segment it is?  Then post back the
 exception and ls -l of your index directory?

 If you could post the client-side code you're using to build & submit
 docs to Solr, and if I can get access to the Medline content, and I
 can then repro the bug, then I'll track it down...

 Mike

 On Aug 14, 2008, at 10:18 PM, Ian Connor wrote:

 I seem to be able to reproduce this very easily and the data is
 medline (so I am sure I can share it if needed with a quick email to
 check).

 - I am using fedora:
 %uname -a
 Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30
 13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
 %java -version
 java version 1.7.0
 IcedTea Runtime Environment (build 1.7.0-b21)
 IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
 - single core (will use shards, but each machine has just one HDD so I
 didn't see how cores would help, but I am new at this)
 - next run I will keep the output to check for earlier errors
 - very reproducible, and I can share code + data if that will help

 On Thu, Aug 14, 2008 at 4:23 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

 Yikes... not good.  This shouldn't be due to anything you did wrong
 Ian... it looks like a lucene bug.

 Some questions:
 - what platform are you running on, and what JVM?
 - are you using multicore? (I fixed some index locking bugs recently)
 - are there any exceptions in the log before this?
 - how reproducible is this?

 -Yonik

 On Thu, Aug 14, 2008 at 2:47 PM, Ian Connor [EMAIL PROTECTED] wrote:

 Hi,

 I have rebuilt my index a few times (it should get up to about 4
 Million but around 1 Million it starts to fall apart).

 Exception in thread Lucene Merge Thread #0
 org.apache.lucene.index.MergePolicy$MergeException:
 java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
   at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:323)
   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:300)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
   at java.util.ArrayList.rangeCheck(ArrayList.java:572)
   at java.util.ArrayList.get(ArrayList.java:350)
   at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:188)
   at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
   at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:349)
   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
   at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3998)
   at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3650)
   at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:214)
   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:269)


 When this happens, the disk usage goes right up and the indexing
 really starts to slow down. I am using a Solr build from about a week
 ago - so my Lucene is at 2.4 according to the war files.

 Has anyone seen this error before? Is it possible to tell which Array
 is too large? Would it be an Array I am sending in or another internal
 one?

 Regards,
 Ian Connor





 --
 Regards,

 Ian Connor





-- 
Regards,

Ian Connor


solr doc

2008-08-18 Thread dudes dudes

Hello all, 

I'm looking for documentation that covers the following situation.

How can two Solr servers be synchronised with each other? And if one of them
goes down for whatever reason, how can the other one take over?

Does Solr have anything like master/slave takeover?

Any docs or suggestions are gratefully welcomed.

many thanks 
ak

Restrict Wildcards

2008-08-18 Thread Erlend Hamnaberg
Hi list.

Is it possible to create a field type in solr that does not match with
wildcard queries?

I want it to match only the complete string, so if I have indexed foo123
and foo234, I don't want foo* to match either of these.

This does not work with just using the predefined string type.

Any suggestions?


Warm regards

Erlend Hamnaberg


Re: solr doc

2008-08-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
Keep a slave handy as the second master, and if the real master goes
down, let the second one take over.

On Mon, Aug 18, 2008 at 4:44 PM, dudes dudes [EMAIL PROTECTED] wrote:

 Hello all,

 I'm looking for documentation that covers the following situation.

 How can two Solr servers be synchronised with each other? And if one of them
 goes down for whatever reason, how can the other one take over?

 Does Solr have anything like master/slave takeover?

 Any docs or suggestions are gratefully welcomed.

 many thanks
 ak



-- 
--Noble Paul


Re: solr doc

2008-08-18 Thread Shalin Shekhar Mangar
Take a look at http://wiki.apache.org/solr/CollectionDistribution

On Mon, Aug 18, 2008 at 4:44 PM, dudes dudes [EMAIL PROTECTED] wrote:


 Hello all,

 I'm looking for documentation that covers the following situation.

 How can two Solr servers be synchronised with each other? And if one of them
 goes down for whatever reason, how can the other one take over?

 Does Solr have anything like master/slave takeover?

 Any docs or suggestions are gratefully welcomed.

 many thanks
 ak




-- 
Regards,
Shalin Shekhar Mangar.


RE: solr doc

2008-08-18 Thread dudes dudes

thanks :)

 Date: Mon, 18 Aug 2008 17:54:20 +0530
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: Re: solr doc
 
 Take a look at http://wiki.apache.org/solr/CollectionDistribution
 
 On Mon, Aug 18, 2008 at 4:44 PM, dudes dudes  wrote:
 

 Hello all,

 I'm looking for documentation that covers the following situation.

 How can two Solr servers be synchronised with each other? And if one of them
 goes down for whatever reason, how can the other one take over?

 Does Solr have anything like master/slave takeover?

 Any docs or suggestions are gratefully welcomed.

 many thanks
 ak
 
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.


Re: partialResults, distributed search SOLR-502

2008-08-18 Thread Ian Connor
I don't think this patch is working yet. If I take a shard out of
rotation (even just one out of four), I get an error:

org.apache.solr.client.solrj.SolrServerException:
java.net.ConnectException: Connection refused

org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException:
java.net.ConnectException: Connection refused
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:256)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1156)

from

http://localhost:8983/solr/select/?shards=10.0.16.181:8983,10.0.16.182:8983,10.0.16.183:8983,10.0.16.184:8983/solr&timeAllowed=1000&q=cancer%0D%0A&version=2.2&start=0&rows=10&indent=on

where .181 is down but .183-.184 are up.

On Fri, Aug 15, 2008 at 1:23 PM, Brian Whitman [EMAIL PROTECTED] wrote:
 I was going to file a ticket like this:

 A SOLR-303 query with shards=host1,host2,host3 when host3 is down returns
 an error. One of the advantages of a shard implementation is that data can
 be stored redundantly across different shards, either as direct copies (e.g.
 when host1 and host3 are snapshooter'd copies of each other) or where there
 is some data RAID that stripes indexes for redundancy.

 But then I saw SOLR-502, which appears to be committed.

 If I have the above scenario (host1,host2,host3 where host3 is not up) and
 set a timeAllowed, will I still get a 400 or will it come back with
 partial results? If not, can we think of a way to get this to work? It's
 my understanding already that duplicate docIDs are merged in the SOLR-303
 response, so other than building in some "this host isn't working, just move
 on and report it" logic and, of course, the work to index redundantly, we
 wouldn't need anything else to achieve a good redundant shard implementation.

 B






-- 
Regards,

Ian Connor


Re: hello, a question about solr.

2008-08-18 Thread Norberto Meijome
On Mon, 18 Aug 2008 15:33:02 +0800
finy finy [EMAIL PROTECTED] wrote:

 the name field is text,which is analysed, i use the query
 name:ibmT63notebook

why do you search with no spaces? is this free text entered by a user, or is it 
part of a link which you control ?

PS: please dont top-post

_
{Beto|Norberto|Numard} Meijome

Commitment is active, not passive. Commitment is doing whatever you can to 
bring about the desired result. Anything less is half-hearted.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Boosting fields by default

2008-08-18 Thread Rakesh Godhani

Hi, I'm using the data import mechanism to pull data into my index.  If I
want to boost a certain field for all docs (e.g. the title over the body),
what is the best way to do that?  I was expecting to change something in
schema.xml but I don't see any info on boosting there.

Thanks in advance
-Rakesh





Re: Restrict Wildcards

2008-08-18 Thread Otis Gospodnetic
Erlend,

This doesn't work with string?  Maybe something there is removing numbers.  
Have you tried with an example without numbers?
e.g. fooaaa and foobbb.  Does foo* match them both?  If it does, then perhaps 
you can create a custom field type and use KeywordTokenizer in it.  Example 
schema.xml has some of this stuff.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Erlend Hamnaberg [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, August 18, 2008 7:42:22 AM
 Subject: Restrict Wildcards
 
 Hi list.
 
 Is it possible to create a field type in solr that does not match with
 wildcard queries?
 
 I want it to only match the complete string, so if I have indexed foo123
 and foo234 i dont want foo* to match any of these.
 
 This does not work with just using the predefined string type.
 
 Any suggestions?
 
 
 Warm regards
 
 Erlend Hamnaberg



Re: hello, a question about solr.

2008-08-18 Thread finy finy
Because I use Chinese characters. For example, for ibm笔记本电脑,
Solr will parse it into a term ibm and a phrase 笔记本 电脑.
Can I use Solr to query with a term ibm, a term 笔记本 and a term 电脑?


2008/8/18, Norberto Meijome [EMAIL PROTECTED]:

 On Mon, 18 Aug 2008 15:33:02 +0800
 finy finy [EMAIL PROTECTED] wrote:

  the name field is text,which is analysed, i use the query
  name:ibmT63notebook

 why do you search with no spaces? is this free text entered by a user, or
 is it part of a link which you control ?

 PS: please dont top-post

 _
 {Beto|Norberto|Numard} Meijome

 Commitment is active, not passive. Commitment is doing whatever you can to
 bring about the desired result. Anything less is half-hearted.

 I speak for myself, not my employer. Contents may be hot. Slippery when
 wet. Reading disclaimers makes you go blind. Writing them is worse. You have
 been Warned.



Re: Order of returned fields

2008-08-18 Thread Erik Hatcher

Yes, this is normal behavior.

Does order matter in your application?  Could you explain why?

Order is maintained with multiple values of the same field name,  
though - which is important.


Erik


On Aug 17, 2008, at 6:38 PM, Pierre Auslaender wrote:


Hello,

After a Solr query, I always get the fields back in alphabetical  
order, no matter how I insert them.

Is this the normal behaviour?

This is when adding the document...
  <doc>
    <field name="uid">ch.tsr.esg.domain.ProgramCollection[id: 1]</field>
    <field name="genre">collection</field>
    <field name="collection">Bac à sable</field>
    <field name="collection.url">http://localhost:8080/esg/api/collections/1</field>
  </doc>

... and this is when retrieving it:
  <doc>
    <str name="collection">Bac à sable</str>
    <str name="collection.url">http://localhost:8080/esg/api/collections/1</str>
    <str name="genre">collection</str>
    <str name="uid">ch.tsr.esg.domain.ProgramCollection[id: 1]</str>
  </doc>

Thanks a lot,
Pierre Auslaender




SimpleFacets: Performance Boost for Tokenized Fields

2008-08-18 Thread Fuad Efendi

Hello:


Term Vectors could be much faster than intersections with the FilterCache.
The exception: when the size of the DocSet is close to (more than 50% of) the
total count of documents in the index.


When it works (100 times faster than the current approach; very specific scenario):
- use stored Term Vectors;
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.

Obviously, calculating the sizes of 200,000 intersections with the FilterCache
is slower than traversing 10 - 20,000 documents for smaller DocSets
and counting the frequencies of Terms.



There are some related TODOs in SOLR source.
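
To illustrate the trade-off, here is a minimal sketch in Python that uses plain dictionaries and sets as stand-ins for the index structures. It is not the SimpleFacets code; the function names and data layout are assumptions for illustration only. The first function does one intersection per unique term, the second walks only the documents in the DocSet and counts their terms.

```python
from collections import Counter


def facet_by_intersection(term_to_docs, doc_set):
    """One set intersection per unique term (cost grows with the term count)."""
    counts = {}
    for term, docs in term_to_docs.items():
        size = len(docs & doc_set)
        if size:
            counts[term] = size
    return counts


def facet_by_term_vectors(doc_to_terms, doc_set):
    """Walk only the matching docs and count their terms (cost grows with the DocSet)."""
    counts = Counter()
    for doc_id in doc_set:
        # Count each term once per document, as a facet count would.
        counts.update(set(doc_to_terms.get(doc_id, ())))
    return dict(counts)


if __name__ == "__main__":
    doc_to_terms = {1: ["solr", "lucene"], 2: ["solr"], 3: ["nutch"]}
    term_to_docs = {"solr": {1, 2}, "lucene": {1}, "nutch": {3}}
    doc_set = {1, 2}
    print(facet_by_intersection(term_to_docs, doc_set))
    print(facet_by_term_vectors(doc_to_terms, doc_set))
```

Both functions return the same counts; only the amount of work differs, which is why the second approach stops paying off once the DocSet covers a large share of the index.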


--
Thanks,

Fuad Efendi
416-993-2060(cell)
Tokenizer Inc.
==
http://www.linkedin.com/in/liferay
http://www.tokenizer.org







.wsdl for example....

2008-08-18 Thread Norberto Meijome
hi :)

does anyone have a .wsdl definition for the example bundled with SOLR? 

if nobody has it, would it be useful to have one ?

cheers,
B
_
{Beto|Norberto|Numard} Meijome

Intelligence: Finding an error in a Knuth text.
Stupidity: Cashing that $2.56 check you got.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Boosting fields by default

2008-08-18 Thread Shalin Shekhar Mangar
On Mon, Aug 18, 2008 at 7:12 PM, Rakesh Godhani [EMAIL PROTECTED] wrote:


 Hi, I'm using the data import mechanism to pull data into my index.  If I
 want to boost a certain field for all docs (e.g. the title over the body),
 what is the best way to do that?  I was expecting to change something in
 schema.xml but I don't see any info on boosting there.


You can specify the boost as an attribute on the field in data-config.xml

<field column="title" boost="2.0" />

-- 
Regards,
Shalin Shekhar Mangar.


Re: partialResults, distributed search SOLR-502

2008-08-18 Thread Ian Connor
Hi,

I have traced this as far as I can figure. It does seem as though the
patch is in the trunk. I can see that timeAllowed is certainly being
set and the lucene class TimeLimitedCollector is being used when the
param is there.

However, I have tried to trace RequestHandlerBase from this stack
through to SearchHandler and get lost when the Shard is submitted. I
can see it creates a CommonsHttpSolrServer to make the request and
that at least at this point the timeAllowed param is alive and well.

However, when I try to dive into the QueryRequest and SolrServer I
realize my java is a little rusty. Can anyone explain how the
QueryRequest here uses the code that is found in SolrIndexSearcher?

On Mon, Aug 18, 2008 at 9:31 AM, Ian Connor [EMAIL PROTECTED] wrote:
 I don't think this patch is working yet. If I take a shard out of
 rotation (even just one out of four), I get an error:

 org.apache.solr.client.solrj.SolrServerException:
 java.net.ConnectException: Connection refused

 org.apache.solr.common.SolrException:
 org.apache.solr.client.solrj.SolrServerException:
 java.net.ConnectException: Connection refused
at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:256)
at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1156)

 from

 http://localhost:8983/solr/select/?shards=10.0.16.181:8983,10.0.16.182:8983,10.0.16.183:8983,10.0.16.184:8983/solr&timeAllowed=1000&q=cancer%0D%0A&version=2.2&start=0&rows=10&indent=on

 where .181 is down but .183-.184 are up.

 On Fri, Aug 15, 2008 at 1:23 PM, Brian Whitman [EMAIL PROTECTED] wrote:
 I was going to file a ticket like this:

 A SOLR-303 query with shards=host1,host2,host3 when host3 is down returns
 an error. One of the advantages of a shard implementation is that data can
 be stored redundantly across different shards, either as direct copies (e.g.
 when host1 and host3 are snapshooter'd copies of each other) or where there
 is some data RAID that stripes indexes for redundancy.

 But then I saw SOLR-502, which appears to be committed.

 If I have the above scenario (host1,host2,host3 where host3 is not up) and
 set a timeAllowed, will I still get a 400 or will it come back with
 partial results? If not, can we think of a way to get this to work? It's
 my understanding already that duplicate docIDs are merged in the SOLR-303
 response, so other than building in some "this host isn't working, just move
 on and report it" logic and, of course, the work to index redundantly, we
 wouldn't need anything else to achieve a good redundant shard implementation.

 B






 --
 Regards,

 Ian Connor




-- 
Regards,

Ian Connor


Re: partialResults, distributed search SOLR-502

2008-08-18 Thread Brian Whitman

On Aug 18, 2008, at 11:51 AM, Ian Connor wrote:
On Mon, Aug 18, 2008 at 9:31 AM, Ian Connor [EMAIL PROTECTED]  
wrote:

I don't think this patch is working yet. If I take a shard out of
rotation (even just one out of four), I get an error:

org.apache.solr.client.solrj.SolrServerException:
java.net.ConnectException: Connection refused




It's my understanding that SOLR-502 is really only concerned with
queries timing out (i.e. they connect but take over N seconds to
return). If the connection gets refused, then a non-Solr Java connection
exception is thrown. Something would have to be put in that
(optionally) catches connection errors and still builds the response
from the shards that did respond.
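
A minimal sketch of that idea, written in Python rather than against Solr's SearchHandler. Everything here is hypothetical: fetch stands in for whatever sends the request to one shard, and the partialResults and failedShards keys are illustrative names, not an existing Solr response format.

```python
def query_shards(shards, fetch):
    """Query every shard, skipping the ones that cannot be reached.

    fetch(shard) is assumed to return a list of documents or to raise
    ConnectionError (which includes ConnectionRefusedError).
    """
    docs, failed = [], []
    for shard in shards:
        try:
            docs.extend(fetch(shard))
        except ConnectionError:
            # Remember the dead shard and move on instead of failing the query.
            failed.append(shard)
    return {"docs": docs, "partialResults": bool(failed), "failedShards": failed}


if __name__ == "__main__":
    def fake_fetch(shard):
        if shard == "host3":
            raise ConnectionRefusedError("Connection refused")
        return [f"doc-from-{shard}"]

    print(query_shards(["host1", "host2", "host3"], fake_fetch))
```

Making the catch optional (for example behind a request parameter) would keep the current fail-fast behaviour available for callers that prefer an error over incomplete results.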






On Fri, Aug 15, 2008 at 1:23 PM, Brian Whitman [EMAIL PROTECTED] 
 wrote:

I was going to file a ticket like this:

A SOLR-303 query with shards=host1,host2,host3 when host3 is down returns
an error. One of the advantages of a shard implementation is that data can
be stored redundantly across different shards, either as direct copies (e.g.
when host1 and host3 are snapshooter'd copies of each other) or where there
is some data RAID that stripes indexes for redundancy.

But then I saw SOLR-502, which appears to be committed.

If I have the above scenario (host1,host2,host3 where host3 is not up) and
set a timeAllowed, will I still get a 400 or will it come back with
partial results? If not, can we think of a way to get this to work? It's
my understanding already that duplicate docIDs are merged in the SOLR-303
response, so other than building in some "this host isn't working, just move
on and report it" logic and, of course, the work to index redundantly, we
wouldn't need anything else to achieve a good redundant shard implementation.

B







--
Regards,

Ian Connor





--
Regards,

Ian Connor


--
http://variogr.am/





Re: Localisation, faceting

2008-08-18 Thread Otis Gospodnetic
Hi,

Regarding Boolean operator localization -- there was a person who submitted 
patches for the same functionality, but for Lucene's QueryParser.  This was a 
few years ago.  I think his patch was never applied.  Perhaps that helps.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Pierre Auslaender [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Saturday, August 16, 2008 12:50:53 PM
 Subject: Localisation, faceting
 
 Hello,
 
 I have a couple of questions:
 
 1/ Is it possible to localise query operator names without writing code? 
 For instance, I'd like to issue queries with French operator names, e.g. 
 ET (instead of AND), OU (instead of OR), etc.
 
 2/ Is it possible for Solr to generate, in the XML response, the URLs or 
 complete queries for each facet in a faceted search?
 
 Here's an example. Say my first query is :
 http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.limit=-1
 
 The kind field has three values: material, immaterial, time. I get 
 back something like this:
 
 
 
 
 
 1024
 27633
 389
 
 
 
 
 If I want to drill down into one facet, say into material, I have to 
 manually rebuild a query like this:
 http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.limit=-1&fq=kind:material
 
 It's not too difficult, but surely Solr could add this URL or query 
 string under the material element. Is this possible? Or do I have to 
 XSLT the result myself?
 
 Thanks,
 
 Pierre Auslaender
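
For reference, a minimal sketch of rebuilding the drill-down URLs on the client side instead of via XSLT. The helper name drill_down_urls is hypothetical, and the facet values are assumed to have been read from the response already. Note that urlencode percent-encodes the colon in the fq value.

```python
from urllib.parse import urlencode


def drill_down_urls(base_url, params, facet_field, values):
    """Return one URL per facet value, adding fq=<field>:<value> to the query."""
    urls = {}
    for value in values:
        query = list(params.items()) + [("fq", f"{facet_field}:{value}")]
        urls[value] = f"{base_url}?{urlencode(query)}"
    return urls


if __name__ == "__main__":
    params = {"q": "bac", "facet": "true", "facet.field": "kind", "facet.limit": "-1"}
    urls = drill_down_urls("http://localhost:8080/solr/select", params,
                           "kind", ["material", "immaterial", "time"])
    for value, url in urls.items():
        print(value, url)
```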



Re: Solr Logo thought

2008-08-18 Thread Otis Gospodnetic
I like it, even its asymmetry. :)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Lukáš Vlček [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Sunday, August 17, 2008 7:02:25 PM
 Subject: Re: Solr Logo thought
 
 Hi,
 
 My initial draft of Solr logo can be found here:
 http://picasaweb.google.com/lukas.vlcek/Solr
 The reason why I haven't attached it to SOLR-84 for now is that this is just
 draft and not final design (there are a lot of unfinished details). I would
 like to get some feedback before I spend more time on it.
 
 I had several ideas but in the end I found that the simplicity works best.
 Simple font, sun motive, just two colors. Should look fine in both the large
 and small formats. As for the favicon I would use the sun motive only - it
 means the O letter with the beams. The logo font still needs a lot of small
 (but important) touches. For now I would like to get feedback mostly about
 the basic idea.
 
 Regards,
 Lukas
 
 On Sat, Aug 9, 2008 at 8:21 PM, Mark Miller wrote:
 
  Plenty left, but here is a template to get things started:
  http://wiki.apache.org/solr/LogoContest
 
   Speaking of which, if we want to maintain the momentum of interest in this
  topic, someone (ie: not me) should setup a LogoContest wiki page with 
  some
  of the goals discussed in the various threads on solr-user and solr-dev
  recently, as well as draft up some good guidelines for how we should run 
  the
  contest
 
 
 
 
 -- 
 http://blog.lukas-vlcek.com/



Re: partialResults, distributed search SOLR-502

2008-08-18 Thread Otis Gospodnetic
Yes, as far as I know, what Brian said is correct.  Also, as far as I know, 
there is nothing that gracefully handles problematic Solr instances during 
distributed search.  Solr 1.4 request?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Brian Whitman [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, August 18, 2008 11:57:23 AM
 Subject: Re: partialResults, distributed search  SOLR-502
 
 On Aug 18, 2008, at 11:51 AM, Ian Connor wrote:
  On Mon, Aug 18, 2008 at 9:31 AM, Ian Connor   
  wrote:
  I don't think this patch is working yet. If I take a shard out of
  rotation (even just one out of four), I get an error:
 
  org.apache.solr.client.solrj.SolrServerException:
  java.net.ConnectException: Connection refused
 
 
 
 It's my understanding that SOLR-502 is really only concerned with  
 queries timing out (i.e. they connect but take over N seconds to  
 return). If the connection gets refused then a non-solr java connection  
 exception is thrown. Something would have to get put in that  
 (optionally) catches connection errors and still builds the response  
 from the shards that did respond.
 
 
 
 
 
  On Fri, Aug 15, 2008 at 1:23 PM, Brian Whitman 
   wrote:
  I was going to file a ticket like this:
 
  A SOLR-303 query with shards=host1,host2,host3 when host3 is  
  down returns
  an error. One of the advantages of a shard implementation is that  
  data can
  be stored redundantly across different shards, either as direct  
  copies (e.g.
  when host1 and host3 are snapshooter'd copies of each other) or  
  where there
  is some data RAID that stripes indexes for redundancy.
 
  But then I saw SOLR-502, which appears to be committed.
 
  If I have the above scenario (host1,host2,host3 where host3 is not  
  up) and
  set a timeAllowed, will I still get a 400 or will it come back with
  partial results? If not, can we think of a way to get this to  
  work? It's
  my understanding already that duplicate docIDs are merged in the  
  SOLR-303
  response, so other than building in some this host isn't working,  
  just move
  on and report it and of course the work to index redundantly, we  
  wouldn't
  need anything to achieve a good redundant shard implementation.
 
  B
 
 
 
 
 
 
  --
  Regards,
 
  Ian Connor
 
 
 
 
  -- 
  Regards,
 
  Ian Connor
 
 --
 http://variogr.am/
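
The behaviour Brian asks for (catch connection errors and still build the response from the shards that did respond) can be sketched roughly as follows. This is not how Solr merges shard responses; fetch is a hypothetical callable standing in for the per-shard request, and the partialResults / failedShards keys are illustrative names only.

```python
def merge_shard_results(shards, fetch):
    """Query every shard and skip the ones that refuse the connection.

    `fetch(shard)` is a hypothetical callable that returns a list of
    (doc_id, score) pairs and raises ConnectionError when the shard is down.
    """
    merged = {}
    failed = []
    for shard in shards:
        try:
            docs = fetch(shard)
        except ConnectionError:
            # Record the failure and move on instead of aborting the query.
            failed.append(shard)
            continue
        for doc_id, score in docs:
            # Duplicate ids (redundant copies) collapse to the best score.
            if doc_id not in merged or score > merged[doc_id]:
                merged[doc_id] = score
    ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return {"docs": ranked, "partialResults": bool(failed), "failedShards": failed}


def fake_fetch(shard):
    """Stand-in for a real shard request; host3 is 'down' in this example."""
    data = {
        "host1": [("doc1", 2.0), ("doc2", 1.0)],
        "host2": [("doc2", 1.5)],
    }
    if shard not in data:
        raise ConnectionRefusedError(f"{shard}: connection refused")
    return data[shard]


result = merge_shard_results(["host1", "host2", "host3"], fake_fetch)
print(result)
```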



Re: partialResults, distributed search SOLR-502

2008-08-18 Thread Ian Connor
When I put logging into SolrIndexSearcher just to see if we get there,
I don't see any messages. However, I do see logging without a problem
in QueryRequest and above. My issue is that I just cannot understand
how SolrIndexSearcher comes into play here.

On Mon, Aug 18, 2008 at 11:57 AM, Brian Whitman
[EMAIL PROTECTED] wrote:
 On Aug 18, 2008, at 11:51 AM, Ian Connor wrote:

 On Mon, Aug 18, 2008 at 9:31 AM, Ian Connor [EMAIL PROTECTED] wrote:

 I don't think this patch is working yet. If I take a shard out of
 rotation (even just one out of four), I get an error:

 org.apache.solr.client.solrj.SolrServerException:
 java.net.ConnectException: Connection refused



 It's my understanding that SOLR-502 is really only concerned with queries
 timing out (i.e. they connect but take over N seconds to return). If the
 connection gets refused then a non-solr java connection exception is thrown.
 Something would have to get put in that (optionally) catches connection
 errors and still builds the response from the shards that did respond.





 On Fri, Aug 15, 2008 at 1:23 PM, Brian Whitman [EMAIL PROTECTED]
 wrote:

 I was going to file a ticket like this:

 A SOLR-303 query with shards=host1,host2,host3 when host3 is down
 returns
 an error. One of the advantages of a shard implementation is that data
 can
 be stored redundantly across different shards, either as direct copies
 (e.g.
 when host1 and host3 are snapshooter'd copies of each other) or where
 there
 is some data RAID that stripes indexes for redundancy.

 But then I saw SOLR-502, which appears to be committed.

 If I have the above scenario (host1,host2,host3 where host3 is not up)
 and
 set a timeAllowed, will I still get a 400 or will it come back with
 partial results? If not, can we think of a way to get this to work?
 It's
 my understanding already that duplicate docIDs are merged in the
 SOLR-303
 response, so other than building in some this host isn't working, just
 move
 on and report it and of course the work to index redundantly, we
 wouldn't
 need anything to achieve a good redundant shard implementation.

 B






 --
 Regards,

 Ian Connor




 --
 Regards,

 Ian Connor

 --
 http://variogr.am/







-- 
Regards,

Ian Connor
1 Leighton St #605
Cambridge, MA 02141
Direct Line: +1 (978) 672
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1(770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor


Re: Boosting fields by default

2008-08-18 Thread Rakesh Godhani
Sweet, cool, thanks
-Rakesh



On 8/18/08 11:31 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote:

 On Mon, Aug 18, 2008 at 7:12 PM, Rakesh Godhani [EMAIL PROTECTED] wrote:
 
 
 Hi, I'm using the data import mechanism to pull data into my index.  If I
 want to boost a certain field for all docs, (e.g. the title over the body)
 what is the best way to do that?  I was expecting to change something in
 schema.xml but I don't see any info on boosting there.
 
 
 You can specify the boost as an attribute on the field in data-config.xml
 
 <field column="title" boost="2.0" />




Re: partialResults, distributed search SOLR-502

2008-08-18 Thread Yonik Seeley
On Mon, Aug 18, 2008 at 12:16 PM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
 Yes, as far as I know, what Brian said is correct.  Also, as far as I know, 
 there is nothing that gracefully handles problematic Solr instances during 
 distributed search.

Right... we punted that issue to a load balancer (which assumes that
you have more than one copy of each shard).

-Yonik


Re: partialResults, distributed search SOLR-502

2008-08-18 Thread Brian Whitman

On Aug 18, 2008, at 12:31 PM, Yonik Seeley wrote:


On Mon, Aug 18, 2008 at 12:16 PM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
Yes, as far as I know, what Brian said is correct.  Also, as far as  
I know, there is nothing that gracefully handles problematic Solr  
instances during distributed search.


Right... we punted that issue to a load balancer (which assumes that
you have more than one copy of each shard).



Can you explain how you have a LB handling shards? Do you put a  
separate LB in front of each group of replica shards?




Re: partialResults, distributed search SOLR-502

2008-08-18 Thread Otis Gospodnetic
Right.  And a LB that is configured to, say, make use of Solr's ping response 
to determine if Solr is healthy?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Yonik Seeley [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, August 18, 2008 12:31:03 PM
 Subject: Re: partialResults, distributed search  SOLR-502
 
 On Mon, Aug 18, 2008 at 12:16 PM, Otis Gospodnetic
 wrote:
  Yes, as far as I know, what Brian said is correct.  Also, as far as I know, 
 there is nothing that gracefully handles problematic Solr instances during 
 distributed search.
 
 Right... we punted that issue to a load balancer (which assumes that
 you have more than one copy of each shard).
 
 -Yonik



Re: Administrative questions

2008-08-18 Thread Otis Gospodnetic
Thanks!
I put that up on http://wiki.apache.org/solr/Daemontools , so if you want to 
add/change anything, you can do so at any time (anyone can edit or create wiki 
pages).

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Jon Drukman [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Friday, August 15, 2008 4:47:27 PM
 Subject: Re: Administrative questions
 
 Jason Rennie wrote:
  On Wed, Aug 13, 2008 at 1:52 PM, Jon Drukman wrote:
  
  Duh.  I should have thought of that.  I'm a big fan of djbdns so I'm quite
  familiar with daemontools.
 
  Thanks!
 
  
  :)  My pleasure.  Was nice to hear recently that DJB is moving toward more
  flexible licensing terms.  For anyone unfamiliar w/ daemontools, here's
  DJB's explanation of why they rock compared to inittab, ttys, init.d, and
  rc.local:
  
  http://cr.yp.to/daemontools/faq/create.html#why
 
 in case anybody wants to know, here's how to run solr under daemontools.
 
 1. install daemontools
 2. create /etc/solr
 3. create a user and group called solr
 4. create shell script /etc/solr/run  (edit to taste, i'm using the 
 default jetty that comes with solr)
 
 #!/bin/sh
 exec 2>&1
 cd /usr/local/apache-solr-1.2.0/example
 exec setuidgid solr java -jar start.jar
 
 
 5. create /etc/solr/log/run containing:
 
 #!/bin/sh
 exec setuidgid solr multilog t ./main
 
 6. ln -s /etc/solr /service/solr
 
 that is all.  as long as you've got svscan set to launch when the system 
 boots, solr will run and auto-restart on crashes.  logs will be in 
 /service/solr/log/main (auto-rotated).
 
 yay.
 -jsd-



Re: partialResults, distributed search SOLR-502

2008-08-18 Thread Yonik Seeley
On Mon, Aug 18, 2008 at 12:34 PM, Brian Whitman
[EMAIL PROTECTED] wrote:
 On Aug 18, 2008, at 12:31 PM, Yonik Seeley wrote:

 On Mon, Aug 18, 2008 at 12:16 PM, Otis Gospodnetic
 [EMAIL PROTECTED] wrote:

 Yes, as far as I know, what Brian said is correct.  Also, as far as I
 know, there is nothing that gracefully handles problematic Solr instances
 during distributed search.

 Right... we punted that issue to a load balancer (which assumes that
 you have more than one copy of each shard).


 Can you explain how you have a LB handling shards? Do you put a separate LB
 in front of each group of replica shards?

A single load balancer should be fine... each shard has its own VIP
which maps to 2 or more solr servers with a replica of that shard.

-Yonik
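
A rough sketch of what that setup does on behalf of the distributed query, assuming a hypothetical is_healthy check (for instance one built on Solr's ping response) and made-up host names. In the arrangement Yonik describes the load balancer does this selection behind each VIP; the code below only illustrates the idea by building a shards parameter from one healthy replica per shard.

```python
def pick_replicas(shard_replicas, is_healthy):
    """Pick one healthy replica per shard and build a `shards` value.

    `shard_replicas` maps a shard name to its replica addresses.
    `is_healthy` is a hypothetical callable (e.g. wrapping a ping request).
    """
    chosen = []
    for shard, replicas in shard_replicas.items():
        healthy = [replica for replica in replicas if is_healthy(replica)]
        if not healthy:
            raise RuntimeError(f"no healthy replica for {shard}")
        chosen.append(healthy[0])
    return ",".join(chosen)


# Hypothetical topology: two shards, two replicas each.
topology = {
    "shard1": ["solr1a:8983/solr", "solr1b:8983/solr"],
    "shard2": ["solr2a:8983/solr", "solr2b:8983/solr"],
}
down = {"solr1a:8983/solr"}  # pretend this replica fails its health check

shards_param = pick_replicas(topology, lambda replica: replica not in down)
print(shards_param)
```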


Re: partialResults, distributed search SOLR-502

2008-08-18 Thread Ian Connor
My interest now is beyond the initial problem and would love if
someone could explain how you get from a QueryRequest being created to
using the code in SolrIndexSearcher.

On Mon, Aug 18, 2008 at 12:34 PM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
 Right.  And a LB that is configured to, say, make use of Solr's ping response 
 to determine if Solr is healthy?


 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: Yonik Seeley [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, August 18, 2008 12:31:03 PM
 Subject: Re: partialResults, distributed search  SOLR-502

 On Mon, Aug 18, 2008 at 12:16 PM, Otis Gospodnetic
 wrote:
  Yes, as far as I know, what Brian said is correct.  Also, as far as I know,
 there is nothing that gracefully handles problematic Solr instances during
 distributed search.

 Right... we punted that issue to a load balancer (which assumes that
 you have more than one copy of each shard).

 -Yonik





-- 
Regards,

Ian Connor


Re: Auto commit error and java.io.FileNotFoundException

2008-08-18 Thread Chris Harris
I'm assuming that one way to do this would be to set the logging level
to FINEST in the logging page in the solr admin tool, and then to
make sure my logging.properties file is also set to record the FINEST
logging level. Let me know if that won't enable the sort of debugging
info you are talking about. (I do understand that the logging page in
the admin tool makes temporary changes that will get reverted when you
restart Solr.)

On Mon, Aug 18, 2008 at 3:05 AM, Michael McCandless
[EMAIL PROTECTED] wrote:

 Since it seems reproducible, could you turn on debugging output
 (IndexWriter.setInfoStream(...)), get the FileNotFoundException to happen
 again, and post the resulting output?

 Mike


Re: Auto commit error and java.io.FileNotFoundException

2008-08-18 Thread Michael McCandless


Alas, I think this won't actually turn on IndexWriter's infoStream.

I think you may need to modify the SolrIndexWriter.java sources, in  
the init method, to add a call to setInfoStream(...).


Can any Solr developers confirm this?

Mike

Chris Harris wrote:


I'm assuming that one way to do this would be to set the logging level
to FINEST in the logging page in the solr admin tool, and then to
make sure my logging.properties file is also set to record the FINEST
logging level. Let me know if that won't enable the sort of debugging
info you are talking about. (I do understand that the logging page in
the admin tool makes temporary changes that will get reverted when you
restart Solr.)

On Mon, Aug 18, 2008 at 3:05 AM, Michael McCandless
[EMAIL PROTECTED] wrote:


Since it seems reproducible, could you turn on debugging output
(IndexWriter.setInfoStream(...)), get the FileNotFoundException to  
happen

again, and post the resulting output?

Mike




Synonyms with spaces not working

2008-08-18 Thread Matthew Runo

Hello folks!

Sorry to ask such a basic question but synonyms might be the end of  
me.. I suspect that there is something fundamentally wrong with the  
field type I've set up..


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

In synonyms.txt I have a *large* list of synonyms in the following  
format..


a, b, c d e, f, g => something

I'm having the behavior that searches for a, b, f, and g all work, but  
the c d e does not. I suspected that was because things were getting  
split on white space before they were going to the synonym filter, so  
I moved the synonym filters to be before the tokenizer. Something's  
still wrong though... any help would be most appreciated!


Thank you for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833



Re: Synonyms with spaces not working

2008-08-18 Thread Otis Gospodnetic
Matthew, there is a good page about synonyms on the Wiki that covers the 
multi-word synonyms stuff.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Matthew Runo [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, August 18, 2008 1:39:52 PM
 Subject: Synonyms with spaces not working
 
 Hello folks!
 
 Sorry to ask such a basic question but synonyms might be the end of  
 me.. I suspect that there is something fundamentally wrong with the  
 field type I've set up..
 
 
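 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.TrimFilterFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.EnglishPorterFilterFactory"
             protected="protwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>
 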
 
 In synonyms.txt I have a *large* list of synonyms in the following  
 format..
 
 a, b, c d e, f, g => something
 
 I'm having the behavior that searches for a, b, f, and g all work, but  
 the c d e does not. I suspected that was because things were getting  
 split on white space before they were going to the synonym filter, so  
 I moved the synonym filters to be before the tokenizer. Something's  
 still wrong though... any help would be most appreciated!
 
 Thank you for your time!
 
 Matthew Runo
 Software Engineer, Zappos.com
 [EMAIL PROTECTED] - 702-943-7833



Re: Auto commit error and java.io.FileNotFoundException

2008-08-18 Thread Fuad Efendi

Lucene v.2.1 has a bug with autocommit...



Re: Auto commit error and java.io.FileNotFoundException

2008-08-18 Thread Yonik Seeley
On Mon, Aug 18, 2008 at 1:12 PM, Michael McCandless
[EMAIL PROTECTED] wrote:

 Alas, I think this won't actually turn on IndexWriter's infoStream.

 I think you may need to modify the SolrIndexWriter.java sources, in the init
 method, to add a call to setInfoStream(...).

 Can any Solr developers confirm this?

Yeah, we don't have that feature yet.

-Yonik


Re: Auto commit error and java.io.FileNotFoundException

2008-08-18 Thread Yonik Seeley
On Mon, Aug 18, 2008 at 6:05 AM, Michael McCandless
[EMAIL PROTECTED] wrote:
 The output from CheckIndex shows quite a few missing files!  Is there any
 possibility that two instances of Solr were somehow sharing the same index
 directory?

To eliminate that possibility, the lock factory should be set to
simple and unlockOnStartup should be false in solrconfig.xml

-Yonik


RE: Synonyms with spaces not working

2008-08-18 Thread Steven A Rowe
Hi Matthew,

On 08/18/2008 at 1:39 PM, Matthew Runo wrote:
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
 [...]
 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
 ignoreCase="true" expand="true"/>
 [...]

I can see from SOLR-702 that most of your synonym rules have a single 
term/phrase on the right-hand side.

The SynonymFilterFactory section of the AnalyzersTokenizersTokenFilters wiki 
page 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
 says:

   # If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit 
mapping:
   ipod, i-pod, i pod => ipod, i-pod, i pod

AFAICT from looking at the code, however, the expand option is ignored when 
there is an explicit right-hand side of a rule (i.e. => something).

 a, b, c d e, f, g => something

So documents containing "c d e" (or "a" or "b" or "f" or "g") will only be 
indexed with "something".

 I'm having the behavior that searches for a, b, f, and g all
 work, but the c d e does not.

As Otis mentioned earlier in this thread, the above-linked wiki page mentions 
some gotchas about mixing phrases, synonyms, and the Lucene QueryParser.

Perhaps you could address the problem by creating separate rules for your 
phrasal terms, e.g.:

   a, b, f, g => something
   c d e, something

Using the above rule with no right-hand side, and with expand==true, both 
"c d e" and "something" will be indexed for documents containing "c d e".

Steve
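
The difference between the two rule styles can be sketched with a small model of the behaviour described above. This is not Solr's SynonymFilter code; parse_rules is a hypothetical helper that only mimics the documented mapping semantics (an explicit => right-hand side always wins, and expand only matters for rules without one).

```python
def parse_rules(lines, expand=True):
    """Map each source term/phrase to the terms that would be indexed for it."""
    mapping = {}
    for line in lines:
        if "=>" in line:
            # Explicit mapping: every left-hand term maps to the right-hand
            # side only; `expand` plays no part here.
            left, right = line.split("=>")
            sources = [term.strip() for term in left.split(",")]
            targets = [term.strip() for term in right.split(",")]
        else:
            # No right-hand side: with expand, every term maps to all terms;
            # without it, every term maps to the first one.
            terms = [term.strip() for term in line.split(",")]
            sources = terms
            targets = terms if expand else terms[:1]
        for source in sources:
            indexed = mapping.setdefault(source, [])
            for target in targets:
                if target not in indexed:
                    indexed.append(target)
    return mapping


rules = ["a, b, f, g => something", "c d e, something"]
mapping = parse_rules(rules)
print(mapping["a"])
print(mapping["c d e"])
```

With these two rules, "a" is indexed only as "something", while "c d e" is indexed as both "c d e" and "something".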


Re: .wsdl for example....

2008-08-18 Thread Erik Hatcher


On Aug 18, 2008, at 11:27 AM, Norberto Meijome wrote:

does anyone have a .wsdl definition for the example bundled with SOLR?


WSDL?   surely you jest.

Erik



Re: partialResults, distributed search SOLR-502

2008-08-18 Thread Ian Connor
I have been using HAProxy on different ports (same IP). It seems to
work but have not tested it in production yet.

On Mon, Aug 18, 2008 at 12:37 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 On Mon, Aug 18, 2008 at 12:34 PM, Brian Whitman
 [EMAIL PROTECTED] wrote:
 On Aug 18, 2008, at 12:31 PM, Yonik Seeley wrote:

 On Mon, Aug 18, 2008 at 12:16 PM, Otis Gospodnetic
 [EMAIL PROTECTED] wrote:

 Yes, as far as I know, what Brian said is correct.  Also, as far as I
 know, there is nothing that gracefully handles problematic Solr instances
 during distributed search.

 Right... we punted that issue to a load balancer (which assumes that
 you have more than one copy of each shard).


 Can you explain how you have a LB handling shards? Do you put a separate LB
 in front of each group of replica shards?

 A single load balancer should be fine... each shard has its own VIP
 which maps to 2 or more solr servers with a replica of that shard.

 -Yonik




-- 
Regards,

Ian Connor


Re: Order of returned fields

2008-08-18 Thread Pierre Auslaender
Order matters in my application because I'm indexing structured data - 
actually, a domain object model (a bit like with Hibernate Search), only 
I'm adding parents to children, instead of children to parents. So say I 
have Cities and People, with a 1-N relationship between City and People. 
I'm indexing documents for Cities, and documents for People, and the 
documents for People contain the fields of the City they're living in.


When I display the results, I'd like the People fields to display before 
the City fields. I can parse the Solr response and rearrange the fields 
(in the Java middle-tier, or with XSLT, or in the Javascript client), 
but then I have to know of the domain in too many places. I have to 
know of the domain in my Java application, in the SOLR schema file, 
and in the Javascript that rearranges the fields... I thought maybe I 
could avoid the latter and put as much application information as 
possible in the SOLR schema, for instance specify an order for the 
returned fields...


Thanks anyway,

Pierre

Erik Hatcher a écrit :

Yes, this is normal behavior.

Does order matter in your application?  Could you explain why?

Order is maintained with multiple values of the same field name, 
though - which is important.


Erik


On Aug 17, 2008, at 6:38 PM, Pierre Auslaender wrote:


Hello,

After a Solr query, I always get the fields back in alphabetical 
order, no matter how I insert them.

Is this the normal behaviour?

This is when adding the document...
  <doc>
      <field name="uid">ch.tsr.esg.domain.ProgramCollection[id: 1]</field>
      <field name="genre">collection</field>
      <field name="collection">Bac à sable</field>
      <field name="collection.url">http://localhost:8080/esg/api/collections/1</field>
  </doc>

... and this is when retrieving it:
  <doc>
      <str name="collection">Bac à sable</str>
      <str name="collection.url">http://localhost:8080/esg/api/collections/1</str>
      <str name="genre">collection</str>
      <str name="uid">ch.tsr.esg.domain.ProgramCollection[id: 1]</str>
  </doc>

Thanks a lot,
Pierre Auslaender





Re: Localisation, faceting

2008-08-18 Thread Pierre Auslaender
Would that be of any interest to the SOLR / Lucene community, given the 
trend to globalisation / regionalisation ? My base is Switzerland - 4 
official national tongues, none of them English.


If one were to localise the boolean operators, would that have to be at 
the Lucene level, or could that be done at the SOLR level ?


Thanks,
Pierre

Otis Gospodnetic a écrit :

Hi,

Regarding Boolean operator localization -- there was a person who submitted 
patches for the same functionality, but for Lucene's QueryParser.  This was a 
few years ago.  I think his patch was never applied.  Perhaps that helps.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
  

From: Pierre Auslaender [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Saturday, August 16, 2008 12:50:53 PM
Subject: Localisation, faceting

Hello,

I have a couple of questions:

1/ Is it possible to localise query operator names without writing code? 
For instance, I'd like to issue queries with French operator names, e.g. 
ET (instead of AND), OU (instead of OR), etc.


2/ Is it possible for Solr to generate, in the XML response, the URLs or 
complete queries for each facet in a faceted search?


Here's an example. Say my first query is :
http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.limit=-1

The kind field has three values: material, immaterial, time. I get 
back something like this:






1024

27633
389




If I want to drill down into one facet, say into material, I have to 
manually rebuild a query like this:

http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.limit=-1&fq=kind:material

It's not too difficult, but surely Solr could add this URL or query 
string under the material element. Is this possible? Or do I have to 
XSLT the result myself?


Thanks,

Pierre Auslaender




  


Re: Localisation, faceting

2008-08-18 Thread Walter Underwood
I would do it in the client, even if it meant parsing the query,
modifying it, then unparsing it.

This is exactly like changing To: to Zu: in a mail header.
Show that in the client, but make it standard before it goes
onto the network.

If queries at the Solr/Lucene level are standard, then users
with different locale settings could share saved queries.

wunder
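
A minimal sketch of that client-side step, assuming the French operator names from Pierre's message (ET, OU). NON as the French form of NOT is an assumption added for illustration, as are the names FRENCH_OPERATORS and to_standard_query. Quoted phrases are left alone so that an operator word inside a phrase is not rewritten.

```python
import re

# ET and OU come from the original question; NON -> NOT is an assumption.
FRENCH_OPERATORS = {"ET": "AND", "OU": "OR", "NON": "NOT"}

# A token is a run of non-space characters in which any double-quoted
# phrase (spaces included) counts as part of the same token.
TOKEN = re.compile(r'(?:[^\s"]|"[^"]*")+')


def to_standard_query(query, operators=FRENCH_OPERATORS):
    """Replace localised operator names with the standard ones."""
    tokens = TOKEN.findall(query)
    return " ".join(operators.get(token, token) for token in tokens)


print(to_standard_query('bac ET sable OU "ET OU"'))
print(to_standard_query('title:"a ET b" OU bac'))
```

Only upper-case standalone tokens are translated, so an indexed lower-case term such as "et" passes through unchanged. The sketch also normalises runs of whitespace to single spaces and makes no attempt to handle unbalanced quotes.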

On 8/18/08 2:18 PM, Pierre Auslaender [EMAIL PROTECTED] wrote:

 Would that be of any interest to the SOLR / Lucene community, given the
 trend to globalisation / regionalisation ? My base is Switzerland - 4
 official national tongues, none of them English.
 
 If one were to localise the boolean operators, would that have to be at
 the Lucene level, or could that be done at the SOLR level ?
 
 Thanks,
 Pierre
 
 Otis Gospodnetic a écrit :
 Hi,
 
 Regarding Boolean operator localization -- there was a person who submitted
 patches for the same functionality, but for Lucene's QueryParser.  This was a
 few years ago.  I think his patch was never applied.  Perhaps that helps.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
   
 From: Pierre Auslaender [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Saturday, August 16, 2008 12:50:53 PM
 Subject: Localisation, faceting
 
 Hello,
 
 I have a couple of questions:
 
 1/ Is it possible to localise query operator names without writing code?
 For instance, I'd like to issue queries with French operator names, e.g.
 ET (instead of AND), OU (instead of OR), etc.
 
 2/ Is it possible for Solr to generate, in the XML response, the URLs or
 complete queries for each facet in a faceted search?
 
 Here's an example. Say my first query is :
 http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.limit=-1
 
 The kind field has three values: material, immaterial, time. I get
 back something like this:
 
 
 
 
 
 1024
 27633
 389
 
 
 
 
 If I want to drill down into one facet, say into material, I have to
 manually rebuild a query like this:
 http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.limit=-1&fq=kind:material
 
 It's not too difficult, but surely Solr could add this URL or query
 string under the material element. Is this possible? Or do I have to
 XSLT the result myself?
 
 Thanks,
 
 Pierre Auslaender
 
 
 
   



Re: Order of returned fields

2008-08-18 Thread Alexander Ramos Jardim
Hey Pierre,

I don't know if my case helps you, but what I do to keep relational
information is to put the related data all in the same field.

Let me give you an example:

I have a product index. Each product has a list of manufacturer properties,
like dimensions, color, connections supported (usb, bluetooth and so on),
etc etc etc. Each property  belongs to a context, so I index data following
this model:

propertyId ^ propertyLabel ^ propertyType ^ propertyValue

Then I parse each result returned on my application.

Does that help you?
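
A small sketch of the packing and parsing described above, assuming " ^ " as the separator exactly as in the model line and assuming that the individual values never contain "^" themselves. The Property type and the pack/unpack helpers are hypothetical names, as are the sample values.

```python
from typing import NamedTuple


class Property(NamedTuple):
    property_id: str
    label: str
    kind: str
    value: str


def pack(prop):
    """Join the four parts into one field value: id ^ label ^ type ^ value."""
    return " ^ ".join(prop)


def unpack(raw):
    """Split a packed field value back into its four parts."""
    parts = [part.strip() for part in raw.split("^")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)}: {raw!r}")
    return Property(*parts)


packed = pack(Property("42", "Color", "string", "black"))
print(packed)
print(unpack(packed))
```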

2008/8/18 Pierre Auslaender [EMAIL PROTECTED]

 Order matters in my application because I'm indexing structured data -
 actually, a domain object model (a bit like with Hibernate Search), only I'm
 adding parents to children, instead of children to parents. So say I have
 Cities and People, with a 1-N relationship between City and People. I'm
 indexing documents for Cities, and documents for People, and the documents
 for People contain the fields of the City they're living in.

 When I display the results, I'd like the People fields to display before
 the City fields. I can parse the Solr response and rearrange the fields (in
 the Java middle-tier, or with XSLT, or in the Javascript client), but then I
 have to know of the domain in too many places. I have to know of the
 domain in my Java application, in the SOLR schema file, and in the
 Javascript that rearranges the fields... I thought maybe I could avoid the
 latter and put as much application information as possible in the SOLR
 schema, for instance specify an order for the returned fields...

 Thanks anyway,

 Pierre

 Erik Hatcher a écrit :

  Yes, this is normal behavior.

 Does order matter in your application?  Could you explain why?

 Order is maintained with multiple values of the same field name, though -
 which is important.

Erik


 On Aug 17, 2008, at 6:38 PM, Pierre Auslaender wrote:

  Hello,

 After a Solr query, I always get the fields back in alphabetical order,
 no matter how I insert them.
 Is this the normal behaviour?

 This is when adding the document...
  <doc>
      <field name="uid">ch.tsr.esg.domain.ProgramCollection[id: 1]</field>
      <field name="genre">collection</field>
      <field name="collection">Bac à sable</field>
      <field name="collection.url">http://localhost:8080/esg/api/collections/1</field>
  </doc>

 ... and this is when retrieving it:
  <doc>
      <str name="collection">Bac à sable</str>
      <str name="collection.url">http://localhost:8080/esg/api/collections/1</str>
      <str name="genre">collection</str>
      <str name="uid">ch.tsr.esg.domain.ProgramCollection[id: 1]</str>
  </doc>

 Thanks a lot,
 Pierre Auslaender






-- 
Alexander Ramos Jardim


Re: Localisation, faceting

2008-08-18 Thread Pierre Auslaender
Excellent point about the saved queries. Thanks! So I could sniff the 
locale (from the HTML page or the Java application,...) and infer the 
query language, or try to do automatic guessing of the language 
based on the operator names (if they don't collide with indexed terms).


This brings up another question: which query parser should I use? I 
guess it would be a bad idea to invent one, it would be better to reuse 
or adapt the query parser used by SOLR - or is it Lucene? Can you 
point me to the parser?


Thanks,
Pierre

Walter Underwood a écrit :

I would do it in the client, even if it meant parsing the query,
modifying it, then unparsing it.

This is exactly like changing To: to Zu: in a mail header.
Show that in the client, but make it standard before it goes
onto the network.

If queries at the Solr/Lucene level are standard, then users
with different locale settings could share saved queries.

wunder

On 8/18/08 2:18 PM, Pierre Auslaender [EMAIL PROTECTED] wrote:

  

Would that be of any interest to the SOLR / Lucene community, given the
trend to globalisation / regionalisation ? My base is Switzerland - 4
official national tongues, none of them English.

If one were to localise the boolean operators, would that have to be at
the Lucene level, or could that be done at the SOLR level ?

Thanks,
Pierre

Otis Gospodnetic a écrit :


Hi,

Regarding Boolean operator localization -- there was a person who submitted
patches for the same functionality, but for Lucene's QueryParser.  This was a
few years ago.  I think his patch was never applied.  Perhaps that helps.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
  
  

From: Pierre Auslaender [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Saturday, August 16, 2008 12:50:53 PM
Subject: Localisation, faceting

Hello,

I have a couple of questions:

1/ Is it possible to localise query operator names without writing code?
For instance, I'd like to issue queries with French operator names, e.g.
ET (instead of AND), OU (instead of OR), etc.

2/ Is it possible for Solr to generate, in the XML response, the URLs or
complete queries for each facet in a faceted search?

Here's an example. Say my first query is :
http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.limit=-1

The kind field has three values: material, immaterial, time. I get
back something like this:





1024

27633
389





If I want to drill down into one facet, say into material, I have to
manually rebuild a query like this:
http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.limit=-1&fq=kind:material

It's not too difficult, but surely Solr could add this URL or query
string under the material element. Is this possible? Or do I have to
XSLT the result myself?

Thanks,

Pierre Auslaender
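
Until Solr emits such links itself, the drill-down URL can be rebuilt in the client. A minimal sketch, assuming the parameters of the first query are kept as a list of pairs; the helper name is hypothetical, and a facet value containing spaces or query-syntax characters would need quoting or escaping first.

```python
from urllib.parse import urlencode


def drill_down_url(base_url, params, field, value):
    """Return the original query URL with an extra fq=field:value filter."""
    # A list of pairs (not a dict) keeps repeated parameters such as fq.
    pairs = list(params) + [("fq", f"{field}:{value}")]
    return base_url + "?" + urlencode(pairs)


base = "http://localhost:8080/solr/select"
first_query = [
    ("q", "bac"),
    ("facet", "true"),
    ("facet.field", "kind"),
    ("facet.limit", "-1"),
]
url = drill_down_url(base, first_query, "kind", "material")
```

Note that urlencode percent-encodes the colon in the filter (kind%3Amaterial), which a servlet container decodes back to kind:material.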


  
  



  


Re: .wsdl for example....

2008-08-18 Thread Alexander Ramos Jardim
Do you want a full web service for the SOLR example? How will a .wsdl help you?
Why don't you use the HTTP interface SOLR provides?

Anyway, if you need to develop a (SOAP-compliant) web service to access
SOLR, just remember to use an embedded core in your web service.

2008/8/18 Norberto Meijome [EMAIL PROTECTED]

 hi :)

 does anyone have a .wsdl definition for the example bundled with SOLR?

 if nobody has it, would it be useful to have one ?

 cheers,
 B
 _
 {Beto|Norberto|Numard} Meijome

 Intelligence: Finding an error in a Knuth text.
 Stupidity: Cashing that $2.56 check you got.

 I speak for myself, not my employer. Contents may be hot. Slippery when
 wet. Reading disclaimers makes you go blind. Writing them is worse. You have
 been Warned.




-- 
Alexander Ramos Jardim


Re: Restrict Wildcards

2008-08-18 Thread Erlend Hamnaberg
I will try this tomorrow.

Thanks for the suggestion.

- Erlend

On Mon, Aug 18, 2008 at 5:01 PM, Otis Gospodnetic 
[EMAIL PROTECTED] wrote:

 Erlend,

 This doesn't work with string?  Maybe something there is removing
 numbers.  Have you tried with an example without numbers?
 e.g. fooaaa and foobbb.  Does foo* match them both?  If it does, then
 perhaps you can create a custom field type and use KeywordTokenizer in it.
  Example schema.xml has some of this stuff.


 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Erlend Hamnaberg [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Sent: Monday, August 18, 2008 7:42:22 AM
  Subject: Restrict Wildcards
 
  Hi list.
 
  Is it possible to create a field type in solr that does not match with
  wildcard queries?
 
  I want it to only match the complete string, so if I have indexed
 foo123
  and foo234 i dont want foo* to match any of these.
 
  This does not work with just using the predefined string type.
 
  Any suggestions?
 
 
  Warm regards
 
  Erlend Hamnaberg




Solr won't start under jetty on RHEL5.2

2008-08-18 Thread Jon Drukman
I just migrated my solr instance to a new server, running RHEL5.2.  I 
installed java from yum but I suspect it's different from the one I used 
to use.


Anyway, my Solr no longer works.

2008-08-18 18:01:12.079::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2008-08-18 18:01:12.229::INFO:  jetty-6.1.3
2008-08-18 18:01:12.330::INFO:  Extract jar:file:/home/apps/solr/solr-1.2.0/webapps/solr.war!/ to /tmp/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp
2008-08-18 18:01:12.452::INFO:  NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet

18-Aug-08 6:01:12 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
18-Aug-08 6:01:12 PM org.apache.solr.core.Config getInstanceDir
INFO: JNDI not configured for Solr (NoInitialContextEx)
18-Aug-08 6:01:12 PM org.apache.solr.core.Config getInstanceDir
INFO: Solr home defaulted to 'null' (could not find system property or JNDI)
18-Aug-08 6:01:12 PM org.apache.solr.core.Config setInstanceDir
INFO: Solr home set to 'solr/'
18-Aug-08 6:01:12 PM org.apache.solr.core.SolrConfig initConfig
INFO: Loaded SolrConfig: solrconfig.xml
18-Aug-08 6:01:12 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: user.dir=/home/apps/solr/solr-1.2.0
2008-08-18 18:01:12.663::WARN:  failed SolrRequestFilter
java.lang.NoClassDefFoundError: org.apache.solr.core.SolrCore
   at java.lang.Class.initializeClass(libgcj.so.7rh)
   at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75)
   at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
   at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
   at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
   at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
   at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
   at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
   at org.mortbay.jetty.Server.doStart(Server.java:210)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
   at java.lang.reflect.Method.invoke(libgcj.so.7rh)
   at org.mortbay.start.Main.invokeMain(Main.java:183)
   at org.mortbay.start.Main.start(Main.java:497)
   at org.mortbay.start.Main.main(Main.java:115)


All attempts to load solr pages result in 404 not found errors.  I 
suspect this is a Jetty configuration problem but I know nothing about 
jetty or servlet containers or anything like that.  Could someone 
explain in words of one syllable or less how to get it to find the 
installation please?


Thanks
-jsd-



Re: Solr won't start under jetty on RHEL5.2

2008-08-18 Thread Jon Drukman

Jon Drukman wrote:
I just migrated my solr instance to a new server, running RHEL5.2.  I 
installed java from yum but I suspect it's different from the one I used 
to use.



Turns out my instincts were correct.  The Java installed from yum (GNU gij) 
does not work with Solr. I installed the official Sun JDK and now it starts fine.


bad:

java version 1.4.2
gij (GNU libgcj) version 4.1.2 20071124 (Red Hat 4.1.2-42)

good:

java version 1.6.0_07
Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)
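
A start script could guard against this by inspecting the version banner before launching Jetty. A rough sketch, assuming the banner text has already been captured (the Sun JVM prints it on stderr for `java -version`); the function name is hypothetical and the check is a plain substring match on the two banners shown above.

```python
def looks_like_gcj(version_output):
    """Return True if the version banner comes from GNU gij/libgcj."""
    text = version_output.lower()
    return "gij" in text or "libgcj" in text


# The two banners reported in this thread.
bad = "java version 1.4.2\ngij (GNU libgcj) version 4.1.2 20071124 (Red Hat 4.1.2-42)"
good = (
    "java version 1.6.0_07\n"
    "Java(TM) SE Runtime Environment (build 1.6.0_07-b06)\n"
    "Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)"
)
```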


-jsd-



Re: hello, a question about solr.

2008-08-18 Thread Norberto Meijome
On Mon, 18 Aug 2008 23:07:19 +0800
finy finy [EMAIL PROTECTED] wrote:

 because i use chinese character, for example ibm___
 solr will parse it into a term ibm and a phraze _ __
 can i use solr to query with a term ibm and a term _  and a term 
 __?

Hi finy,
you should look into n-gram tokenizers. Not sure if it is documented in the 
wiki, but it has been discussed in the mailing list quite a few times.

In short, an n-gram tokenizer breaks your input into overlapping blocks of 
n characters, which are then matched against the index. I think for Chinese, 
bi-gram (n = 2) is the favoured approach.
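
To illustrate what such a tokenizer produces, here is a small stand-alone sketch. It is not Solr's implementation (in Solr this is configured in the field type's analyzer), and the sample word is only an example.

```python
def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(text) <= n:
        # Text shorter than n yields itself as the only token (or nothing).
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# A three-character word gives two overlapping bigrams.
bigrams = char_ngrams("笔记本")
```

Because both the indexed text and the query are cut the same way, a query matches wherever its bigrams occur, without needing word boundaries.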

good luck,
B
_
{Beto|Norberto|Numard} Meijome

I used to hate weddings; all the Grandmas would poke me and
say, You're next sonny! They stopped doing that when i
started to do it to them at funerals.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: hello, a question about solr.

2008-08-18 Thread finy finy
thanks for your help.

could you give me your gmail talk address or msn?


2008/8/19, Norberto Meijome [EMAIL PROTECTED]:

 On Mon, 18 Aug 2008 23:07:19 +0800
 finy finy [EMAIL PROTECTED] wrote:

  because i use chinese character, for example ibm___
  solr will parse it into a term ibm and a phraze _ __
  can i use solr to query with a term ibm and a term _  and a
 term __?

 Hi finy,
 you should look into n-gram tokenizers. Not sure if it is documented in the
 wiki, but it has been discussed in the mailing list quite a few times.

 in short, an n-gram tokenizer breaks your input into blocks of characters
 of size n , which are then used to compare in the index. I think for Chinese
 , bi-gram is the favoured approach.

 good luck,
 B
 _
 {Beto|Norberto|Numard} Meijome

 I used to hate weddings; all the Grandmas would poke me and
 say, You're next sonny! They stopped doing that when i
 started to do it to them at funerals.

 I speak for myself, not my employer. Contents may be hot. Slippery when
 wet. Reading disclaimers makes you go blind. Writing them is worse. You have
 been Warned.



Re: .wsdl for example....

2008-08-18 Thread Norberto Meijome
On Mon, 18 Aug 2008 19:08:24 -0300
Alexander Ramos Jardim [EMAIL PROTECTED] wrote:

 Do you wanna a full web service for SOLR example? How a .wsdl will help you?
 Why don't you use the HTTP interface SOLR provides?
 
 Anyways, if you need to develop a web service (SOAP compliant) to access
 SOLR, just remember to use an embedded core on your webservice.

On Mon, 18 Aug 2008 15:37:24 -0400
Erik Hatcher [EMAIL PROTECTED] wrote:

 WSDL?   surely you jest.
 
   Erik

:D I obviously said something terribly stupid. Oh well, not the first time and 
most likely won't be the last one either.

Anyway, the reason for my asking is:
 - I've put together a SOLR search service with a few cores. Nothing fancy, it 
works great as is.
 - the .NET developer I am working with on this asked for a .wsdl (or .asmx) 
file to import into Visual Studio. Yes, he can access the service directly, 
but he seems to prefer a more 'well defined' interface (I haven't really decided 
whether it is worth the effort, but that is another question altogether).

The way I see it, SOLR is a RESTful service. I am not looking into wrapping 
the whole thing behind SOAP (I actually much prefer REST to SOAP, but that 
is entering quasi-religious grounds...). The HTTP interface as it stands should 
be describable with a .wsdl (v1.1 should suffice, as only GET + POST are 
supported in SOLR anyway).

Am I missing anything here ?

thanks in advance for your time + thoughts ,
B
_
{Beto|Norberto|Numard} Meijome

He has no enemies, but is intensely disliked by his friends.
  Oscar Wilde

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: .wsdl for example....

2008-08-18 Thread Norberto Meijome
On Tue, 19 Aug 2008 11:23:48 +1000
Norberto Meijome [EMAIL PROTECTED] wrote:

 On Mon, 18 Aug 2008 19:08:24 -0300
 Alexander Ramos Jardim [EMAIL PROTECTED] wrote:
 
  Do you wanna a full web service for SOLR example? How a .wsdl will help you?
  Why don't you use the HTTP interface SOLR provides?
  
  Anyways, if you need to develop a web service (SOAP compliant) to access
  SOLR, just remember to use an embedded core on your webservice.
 
 On Mon, 18 Aug 2008 15:37:24 -0400
 Erik Hatcher [EMAIL PROTECTED] wrote:
 
  WSDL?   surely you jest.
  
  Erik
 
 :D I obviously said something terribly stupid, oh well, not the first time 
 and most likely wont be the last one either.
 
 Anyway, the reason for my asking is : 
  - I've put together a SOLR search service with a few cores. Nothing fancy, 
 it works great as is.
  -  the .NET developer I am working with on this  asked for a .wsdl (or 
 .asmx) file to import into Visual Studio ... yes, he can access the service 
 directly, but he seems to prefer a more 'well defined' interface (haven't 
 really decided whether it is worth the effort, but that is another question 
 altogether)
 
 The way I see it, SOLR is a  RESTful service. I am not looking into wrapping 
 the whole thing behind SOAP ( I actually much prefer REST than SOAP, but that 
 is entering into quasi-religious grounds...) - which should be able to be 
 defined with a .wsdl ( v 1.1 should suffice as only GET + POST are supported 
 in SOLR anyway).
 
 Am I missing anything here ?
 
 thanks in advance for your time + thoughts ,
 B

To be clear, I am not suggesting we should have a .wsdl for the example; I am 
simply asking whether there would be any use in having one.

But given the responses I got, I'm now curious to understand what I have gotten 
wrong :)

Best,
B
_
{Beto|Norberto|Numard} Meijome

 I sense much NT in you.
 NT leads to Bluescreen.
 Bluescreen leads to downtime.
 Downtime leads to suffering.
 NT is the path to the darkside.
 Powerful Unix is.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Clarification on facets

2008-08-18 Thread Norberto Meijome
On Tue, 19 Aug 2008 10:18:12 +1200
Gene Campbell [EMAIL PROTECTED] wrote:

 Is this interpreted as meaning, there are 10 documents that will match
 with 'car' in the title, and likewise 6 'boat' and 2 'bike'?

Correct.

 If so, is there any way to get counts for the *number times* a value
 is found in a document.  I'm looking for a way to determine the number
 of times 'car' is repeated in the title, for example

Not sure. I would expect a field with a term repeated several times to 
receive a higher score when searching for that term, but I am not sure how you 
could get the count you seek... maybe with the Luke handler? (But that works on 
a per-document basis, so it may be slow.)
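
If the title is a stored field and comes back in the response, one workaround is to count in the client. This is only a sketch under that assumption: the helper is hypothetical and uses a simple lower-case word split, which will not match Solr's analyzer exactly (no stemming, stop words or synonyms).

```python
import re


def term_count(text, term):
    """Count how many times a term occurs as a whole word in text."""
    # Lower-case and split on non-word characters; a rough stand-in
    # for whatever analyzer the field really uses.
    tokens = re.findall(r"\w+", text.lower())
    return tokens.count(term.lower())


count = term_count("Car wash: my car, your Car", "car")
```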

B
_
{Beto|Norberto|Numard} Meijome

Computers are like air conditioners; they can't do their job properly if you 
open windows.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: .wsdl for example....

2008-08-18 Thread Ryan McKinley

check SolrSharp
http://wiki.apache.org/solr/SolrSharp


On Aug 18, 2008, at 9:23 PM, Norberto Meijome wrote:


On Mon, 18 Aug 2008 19:08:24 -0300
Alexander Ramos Jardim [EMAIL PROTECTED] wrote:

Do you wanna a full web service for SOLR example? How a .wsdl will  
help you?

Why don't you use the HTTP interface SOLR provides?

Anyways, if you need to develop a web service (SOAP compliant) to  
access

SOLR, just remember to use an embedded core on your webservice.


On Mon, 18 Aug 2008 15:37:24 -0400
Erik Hatcher [EMAIL PROTECTED] wrote:


WSDL?   surely you jest.

Erik


:D I obviously said something terribly stupid, oh well, not the  
first time and most likely wont be the last one either.


Anyway, the reason for my asking is :
- I've put together a SOLR search service with a few cores. Nothing  
fancy, it works great as is.
-  the .NET developer I am working with on this  asked for a .wsdl  
(or .asmx) file to import into Visual Studio ... yes, he can access  
the service directly, but he seems to prefer a more 'well defined'  
interface (haven't really decided whether it is worth the effort,  
but that is another question altogether)


The way I see it, SOLR is a  RESTful service. I am not looking into  
wrapping the whole thing behind SOAP ( I actually much prefer REST  
than SOAP, but that is entering into quasi-religious grounds...) -  
which should be able to be defined with a .wsdl ( v 1.1 should  
suffice as only GET + POST are supported in SOLR anyway).


Am I missing anything here ?

thanks in advance for your time + thoughts ,
B
_
{Beto|Norberto|Numard} Meijome

He has no enemies, but is intensely disliked by his friends.
 Oscar Wilde

I speak for myself, not my employer. Contents may be hot. Slippery  
when wet. Reading disclaimers makes you go blind. Writing them is  
worse. You have been Warned.




RE: .wsdl for example....

2008-08-18 Thread Lance Norskog
Various Java web service libraries come with 'wsdl2java' and 'java2wsdl'
programs. You just run 'java2wsdl' on the Java soap description. 

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 18, 2008 6:53 PM
To: solr-user@lucene.apache.org
Subject: Re: .wsdl for example

check SolrSharp
http://wiki.apache.org/solr/SolrSharp


On Aug 18, 2008, at 9:23 PM, Norberto Meijome wrote:

 On Mon, 18 Aug 2008 19:08:24 -0300
 Alexander Ramos Jardim [EMAIL PROTECTED] wrote:

 Do you wanna a full web service for SOLR example? How a .wsdl will 
 help you?
 Why don't you use the HTTP interface SOLR provides?

 Anyways, if you need to develop a web service (SOAP compliant) to 
 access SOLR, just remember to use an embedded core on your 
 webservice.

 On Mon, 18 Aug 2008 15:37:24 -0400
 Erik Hatcher [EMAIL PROTECTED] wrote:

 WSDL?   surely you jest.

  Erik

 :D I obviously said something terribly stupid, oh well, not the first 
 time and most likely wont be the last one either.

 Anyway, the reason for my asking is :
 - I've put together a SOLR search service with a few cores. Nothing 
 fancy, it works great as is.
 -  the .NET developer I am working with on this  asked for a .wsdl (or 
 .asmx) file to import into Visual Studio ... yes, he can access the 
 service directly, but he seems to prefer a more 'well defined'
 interface (haven't really decided whether it is worth the effort, but 
 that is another question altogether)

 The way I see it, SOLR is a  RESTful service. I am not looking into 
 wrapping the whole thing behind SOAP ( I actually much prefer REST 
 than SOAP, but that is entering into quasi-religious grounds...) - 
 which should be able to be defined with a .wsdl ( v 1.1 should suffice 
 as only GET + POST are supported in SOLR anyway).

 Am I missing anything here ?

 thanks in advance for your time + thoughts , B 
 _ {Beto|Norberto|Numard} Meijome

 He has no enemies, but is intensely disliked by his friends.
  Oscar Wilde

 I speak for myself, not my employer. Contents may be hot. Slippery 
 when wet. Reading disclaimers makes you go blind. Writing them is 
 worse. You have been Warned.




Deadlock in lucene?

2008-08-18 Thread Matthew Runo

Hello folks!

I was just wondering if anyone else has seen this issue under heavy  
load. We had some servers set to very high thread limits (12 core  
servers with 32 gigs of ram), and found several threads would end up  
in this state


Name: http-8080-891
State: BLOCKED on [EMAIL PROTECTED] owned by: http-8080-191

Total blocked: 97,926  Total waited: 16

Stack trace:
org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:674)
org.apache.solr.search.function.FunctionQuery$AllScorer.next(FunctionQuery.java:116)
org.apache.lucene.util.ScorerDocQueue.topNextAndAdjustElsePop(ScorerDocQueue.java:116)
org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:175)
org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:228)
org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:76)
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:357)
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:137)
org.apache.lucene.search.Searcher.search(Searcher.java:126)
org.apache.lucene.search.Searcher.search(Searcher.java:105)
org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1148)
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:834)
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
org.apache.solr.core.SolrCore.execute(SolrCore.java:1143)
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
java.lang.Thread.run(Thread.java:619)

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833



Re: Deadlock in lucene?

2008-08-18 Thread Yonik Seeley
It's not a deadlock (just a synchronization bottleneck) , but it is a
known issue in Lucene and there has been some progress in improving
the situation.
-Yonik


On Mon, Aug 18, 2008 at 10:55 PM, Matthew Runo [EMAIL PROTECTED] wrote:
 Hello folks!

 I was just wondering if anyone else has seen this issue under heavy load. We
 had some servers set to very high thread limits (12 core servers with 32
 gigs of ram), and found several threads would end up in this state

 Name: http-8080-891
 State: BLOCKED on [EMAIL PROTECTED] owned by:
 http-8080-191
 Total blocked: 97,926  Total waited: 16

 Stack trace:
 org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:674)
 org.apache.solr.search.function.FunctionQuery$AllScorer.next(FunctionQuery.java:116)
 org.apache.lucene.util.ScorerDocQueue.topNextAndAdjustElsePop(ScorerDocQueue.java:116)
 org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:175)
 org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:228)
 org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:76)
 org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:357)
 org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:137)
 org.apache.lucene.search.Searcher.search(Searcher.java:126)
 org.apache.lucene.search.Searcher.search(Searcher.java:105)
 org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1148)
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:834)
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
 org.apache.solr.core.SolrCore.execute(SolrCore.java:1143)
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
 java.lang.Thread.run(Thread.java:619)

 Thanks for your time!

 Matthew Runo
 Software Engineer, Zappos.com
 [EMAIL PROTECTED] - 702-943-7833




Re: Clarification on facets

2008-08-18 Thread Gene Campbell
Thank you for the response.  Always nice to have someone willing to
validate your thinking!

Of course, if anyone has any ideas on how to get the number of times
a term is repeated in a document,
I'm all ears.

cheers
gene


On Tue, Aug 19, 2008 at 1:42 PM, Norberto Meijome [EMAIL PROTECTED] wrote:
 On Tue, 19 Aug 2008 10:18:12 +1200
 Gene Campbell [EMAIL PROTECTED] wrote:

 Is this interpreted as meaning, there are 10 documents that will match
 with 'car' in the title, and likewise 6 'boat' and 2 'bike'?

 Correct.

 If so, is there any way to get counts for the *number times* a value
 is found in a document.  I'm looking for a way to determine the number
 of times 'car' is repeated in the title, for example

 Not sure - i would suggest that a field with a term repeated several times 
 would receive a higher score when searching for that term, but not sure how 
 you could get the information you seek...maybe with the Luke handler ? ( but 
 on a per-document basis...slow... ? )

 B
 _
 {Beto|Norberto|Numard} Meijome

 Computers are like air conditioners; they can't do their job properly if you 
 open windows.

 I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
 Reading disclaimers makes you go blind. Writing them is worse. You have been 
 Warned.