Re: sorting on aggregate averages

2008-04-02 Thread Umar Shah
Thanks!
I'll have a look at that.



On Wed, Apr 2, 2008 at 6:25 AM, Chris Hostetter [EMAIL PROTECTED]
wrote:

 : I am computing a sorted rank list and returning a slice (for pagination)
 : but have to recompute the result for each request, although the actual q
 : parameter and fq would be cached but not the sorted list which I could
 : cache to reuse on subsequent requests.
 :
 : I might have a look at the caching also, any suggestions in this regard.

 Take a look at User/Generic Caches here...

http://wiki.apache.org/solr/SolrCaching

 Your custom handler/component can use SolrIndexSearcher.getCache to see if
 a cache with a specific name has been defined; if it has, you can do the
 normal get/put operations on it. The cache will worry about expulsion of
 items if it's full (the only Impl that comes with Solr is an LRUCache, but
 you could write your own if you want), and SolrCore will worry about
 giving you a new cache instance when a new reader is opened.  If you
 implement a CacheRegenerator (and configure it for this cache) then you
 can put whatever custom code in that you want for autowarming entries in
 the cache based on the keys/values of the old cache (ie: warm all the
 keys, warm the first N keys, warm all the keys whose values indicate
 they were expensive to compute, etc)

 (just make sure your custom handler/component can function ok even if the
 cache doesn't exist, or if there are cache misses even when you don't
 expect them -- it is after all just a cache, good code should be able to
 function (slowly) without it if it's turned off.)
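
 (A minimal sketch of that pattern, assuming a user cache named "rankCache"
 has been declared in solrconfig.xml; the component method, cache name, and
 computeSortedRanks helper here are illustrative, not part of Solr:)

   public void process(ResponseBuilder rb) throws IOException {
     SolrIndexSearcher searcher = rb.req.getSearcher();
     // getCache returns null if no cache with this name is configured
     SolrCache cache = searcher.getCache("rankCache");
     Object key = rb.req.getParams().get("q");
     List<Integer> ranked = (cache == null) ? null : (List<Integer>) cache.get(key);
     if (ranked == null) {
       ranked = computeSortedRanks(rb);           // hypothetical expensive step
       if (cache != null) cache.put(key, ranked);
     }
     // ... slice 'ranked' for the requested page and add it to the response
   }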

 -Hoss




Search exact terms

2008-04-02 Thread Tim Mahy
Hi all,

is there a Solr-wide setting with which I can achieve the following:

if I search for q=onderwij, I also receive documents with results of 
onderwijs etc. This is of course the behavior that is described, but if I 
search on onderwij I still get the onderwijs hits. I use for this field 
the type text from the schema.xml that is supplied with the default Solr.

Is there a global setting in Solr to always search exact?

Greetings,

Tim







problem with ShowFileRequestHandler

2008-04-02 Thread 李银松
Edward.Zhang had reported this problem before:

 I want to programmatically retrieve the schema and the config from the
 ShowFileRequestHandler.  I encounter some trouble. There are CJK
characters
 in the xml files as follows:


   <!-- Field to use to determine and enforce document uniqueness.
        Unless this field is marked with required="false", it will be a
        required field
     -->
   <uniqueKey>记录号</uniqueKey>
 

 But I get a confusing response from solr using /admin/file/?file=
 schema.xml. IE and firefox both report parse errors. I try
 /admin/file/?file=schema.xml&contentType=text/plain and I get the same
 result as follows:


   <!-- Field to use to determine and enforce document uniqueness.
        Unless this field is marked with required="false", it will be a
        required field
     -->
   <uniqueKey>?</uniqueKey>


 BTW: The xml files are encoded in UTF-8 and they work fine when I open
 these files locally using IE. And I set tomcat's 8080 connector
 URIEncoding argument to UTF-8 too.
 So is there anything missing for me? Or is it a bug?

 Every reply would be appreciated.

Ryan has changed the RawResponseWriter to use a Reader,
but the problem seems not solved.
For example:
my schema.xml is a UTF-8 file,
but the Reader's default encoding is GBK,
so I still can't get the right String.


Re: Search exact terms

2008-04-02 Thread Umar Shah
If you want this behavior then the field type should not be 'text'.
For the default fieldtype 'text' there are many filters applied before the
values are indexed; this includes stemming (reducing the word to its root
word, removing the trailing 's' in your case).

Try using fieldtype 'string' instead. This will match strictly against the
values in the field (exact match, case sensitive); see the sketch below.

Try tweaking schema.xml in the conf folder.

You can tweak the definition in this file to be able to use delimiter/case
filters as seems fit for your case.
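
(A minimal sketch of an exact-match field declaration in schema.xml; the
field name here is illustrative:)

  <field name="title_exact" type="string" indexed="true" stored="true"/>

A 'string' field is indexed verbatim, so a query on title_exact only
matches documents whose entire field value is exactly the query term.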


-umar


2008/4/2 Tim Mahy [EMAIL PROTECTED]:

 Hi all,

 is there a Solr wide setting that with which I can achieve the following :

 if I now search for q=onderwij, I also receive documents with results of
 onderwijs etc.. this is ofcourse the behavior that is described but if I
 search on onderwij, I still get the onderwijs hits, I use for this field
 the type text from the schema.xml that is supplied with the default
 Solr.

 Is there a global setting on Solr to always search Exact ?

 Greetings,

 Tim








Wildcard search + case insensitive

2008-04-02 Thread Tim Mahy
Hi all,

I use this type definition in my schema.xml :

<fieldtype name="exactText" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>

When I have a document with the term demo in it and I search for dem*, I 
receive the document back from Solr, but when I search on Dem* I don't get 
the document.

Is the LowerCaseFilterFactory not executed when a wildcard search is being 
performed?

Greetings,
Tim






java.io.FileNotFoundException?

2008-04-02 Thread Doug Steigerwald

We just started hitting a FileNotFoundException for no real apparent reason
for both our regular index and our spellchecker index, only a few minutes
after we restarted Solr.  I did some searching and didn't find much that
helped.


We started to do some load testing, and after about 10 minutes we started 
getting these errors.

We hit the spellchecker on every request through a SpellcheckComponent that
we created (ie, code ripped out of SpellCheckRequestHandler for now).  It
runs essentially the same code as the spellcheck request handler when we
specify a parameter (spellcheck=true).


We have 34 cores.  All but two cores are fully optimized (haven't been updated in 2 months).  Only 
two cores are actively updated.  We started Solr around 11:45am, not much happened until 12:27 when 
we started load testing (just a few queries, maybe 100 updates).


find /home/dsteiger/local/solr/cores/*/data/index|wc -l  = 414
find /home/dsteiger/local/solr/cores/*/data/spell|wc -l  = 6 (only the two 'active' cores use the 
spell checker).  So, not many files are open.


Anyone have any idea what might cause the two below errors to happen?  When I restarted Solr around 
11:45am it was to test a new patch that set the mergeFactor in the lucene spellchecker to 2 instead 
of 300 because we kept running into 'too many files open' errors when rebuilding more than one spell 
index at a time.  The spell indexes were rebuilt manually using the mergeFactor of 300, solr 
restarted, and any subsequent rebuild of the spell index would use a mergeFactor of 2.


After we hit this error, I rebuilt the spell indexes with the new code,
replicated them to the slave, restarted Solr, and all has been well.  We
ran the load testing for more than an hour and the issue hasn't returned.


Could the old spell indexes that were created using the high mergeFactor cause an issue like this 
somehow?  Could the opening and closing of searchers so fast cause this?  I don't have the slightest 
idea.  All of our search queries hit the slave, and the master just handles updates.  The master had 
no issues through all of this.


Caused by: java.io.IOException: cannot read directory
org.apache.lucene.store.FSDirectory@/home/dsteiger/local/solr/cores/qaa/data/spell:
 list() returned null
at 
org.apache.lucene.index.SegmentInfos.getCurrentSegmentGeneration(SegmentInfos.java:115)
at org.apache.lucene.index.IndexReader.indexExists(IndexReader.java:506)
at 
org.apache.lucene.search.spell.SpellChecker.setSpellIndex(SpellChecker.java:102)
at 
org.apache.lucene.search.spell.SpellChecker.<init>(SpellChecker.java:89)


And this happened I believe when running the snapinstaller (done through 
cron)...

Caused by: java.io.FileNotFoundException: no segments* file found in
org.apache.lucene.store.FSDirectory@/home/dsteiger/local/solr/cores/qab/data/index:
 files: null
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:587)
at 
org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
at 
org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:93)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:706)

We're running r614955.

Thanks.
Doug



Re: How to use Solr in java program

2008-04-02 Thread khirb7


hossman wrote:
 
 
 :  I recommend using Solr as a webservice, even if your client is Java.
 :  but there are options for embedding Solr directly into your
 :  applications using
 
 : thank you hossman for your response, I have another question: I have
 : written a small java program using sockets to send an http query to solr
 : which is running under tomcat, then I got a response in xml format. Is
 : that an example of using it as Web services, since the communication is
 : based on http/xml; or is using tools such as Axis mandatory to talk
 : about web services (or is Solr in itself, by its behaviour, a web
 : service)?
 
 semantics are either wonderful or horrible - depending on perspective. to 
 some people, the term webservice has a *very* specific meaning, i 
 however was just using it in the more relaxed sense of communicating over 
 HTTP - so yes, you understood my meaning.
 
 But really: opening your own raw Socket to do the HTTP communication is 
 one level lower than anyone should ever consider coding.  it's HTTP, 
 there are lots of libraries that will take care of the nitty gritty 
 details for you and make your life easier.
 
 Like i said before: look at the wiki, try out SolrJ, it should make your 
 life much easier.
 
 
 
 -Hoss
 
 
 
Thank you Hossman for your reply, now I see Solr differently and clearly; I
will try SolrJ.
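
(For reference, a minimal SolrJ sketch; the server URL and query string are
illustrative and assume the example Solr server running locally:)

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  // query Solr over HTTP without hand-rolling sockets
  SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
  SolrQuery query = new SolrQuery("solr");
  QueryResponse rsp = server.query(query);
  for (SolrDocument doc : rsp.getResults()) {
    System.out.println(doc.getFieldValue("id"));
  }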
-- 
View this message in context: 
http://www.nabble.com/How-to-use-Solr-in-java-program-tp16236930p16446909.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Wildcard search + case insensitive

2008-04-02 Thread Tim Mahy
Hi all,

I already found the answer to my question on the following blog : 
http://michaelkimsal.com/blog/2007/04/solr-case-sensitivty/

greetings,
Tim


-Original Message-
From: Tim Mahy [mailto:[EMAIL PROTECTED]
Sent: Wed 2008-04-02 13:19
To: solr-user@lucene.apache.org
Subject: Wildcard search + case insensitive
 
Hi all,

I use this type definition in my schema.xml : [...]

When I have a document with the term demo in it and I search for dem*, I
receive the document back from Solr, but when I search on Dem* I don't get
the document.

Is the LowerCaseFilterFactory not executed when a wildcard search is being
performed?

Greetings,
Tim








Re: Multiple unique field?

2008-04-02 Thread Ryan McKinley


Thank you for your reply.
In other words, can I set 2 unique key fields?


directly in solr: no

In your own code, yes -- either in the client or in a custom plugin.

ryan


Help with XmlPullParserException

2008-04-02 Thread Phillip Farber

Hello all,

I'm indexing a body of OCR and encountered this exception. Apparently 
it's some kind of XML parser error.  Out of thousands of documents, 
which I create with significant processing to make sure they are XML 
compliant, only this one appears to have a problem.  But can anyone tell 
me what this specific error message means?



SEVERE: org.xmlpull.v1.XmlPullParserException: character reference (with 
decimal value) may not contain a (position: START_TAG seen ...dieses aus 
dem \nZusammenbestehen der Gleichungen \n\naajj2 -)- 2&#1a... @21781:16)



Thanks!

Phil

==

Full trace:

 SEVERE: org.xmlpull.v1.XmlPullParserException: character reference 
(with decimal value) may not contain a (position: START_TAG seen 
...dieses aus dem \nZusammenbestehen der Gleichungen \n\naajj2 -)- 
2&#1a... @21781:16)

at org.xmlpull.mxp1.MXParser.parseEntityRef(MXParser.java:2195)
at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1275)
at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
	at 
org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
	at 
org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
	at 
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
	at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
	at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
	at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
	at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)

at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)

at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)

at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


Re: problem with ShowFileRequestHandler

2008-04-02 Thread Ryan McKinley


On Apr 2, 2008, at 5:03 AM, 李银松 wrote:

Edward.Zhang had reported this problem before:

 There are CJK characters in the xml files, e.g.
 <uniqueKey>记录号</uniqueKey>, but I get a confusing response from solr
 using /admin/file/?file=schema.xml. IE and firefox both report parse
 errors. [...]

Ryan has changed the RawResponseWriter to use a Reader, but the problem
seems not solved. For example: my schema.xml is a UTF-8 file, but the
Reader's default encoding is GBK, so I still can't get the right String.



I just changed this to use the same ContentStream code we use for
posting files -- so it should now respect the contentType param.

You should be able to see things properly with:
 ?file=xxx&contentType=UTF-8

ryan



Re: Search exact terms

2008-04-02 Thread Ryan McKinley

search is based on the fields you index and how you index them.

If you index using the text field -- with stemming etc -- you will
have to search with the same criteria.

If you want exact search, consider the string type.  If you want
both, you can use copyField to copy the same content into
multiple fields so it is searchable multiple ways.
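
(A minimal sketch of that copyField setup in schema.xml; the field names
are illustrative:)

  <field name="body" type="text" indexed="true" stored="true"/>
  <field name="body_exact" type="string" indexed="true" stored="false"/>
  <copyField source="body" dest="body_exact"/>

Queries against body get the stemmed behavior; queries against body_exact
match the untokenized value.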


ryan



On Apr 2, 2008, at 4:46 AM, Tim Mahy wrote:

Hi all,

is there a Solr wide setting with which I can achieve the following:
if I search for q=onderwij, I also receive documents with results of
onderwijs etc. [...] Is there a global setting in Solr to always search
exact?

Greetings,

Tim




Re: java.io.FileNotFoundException?

2008-04-02 Thread Otis Gospodnetic
Hi Doug,

Sounds fishy, especially increasing/decreasing mergeFactor to funny values 
(try changing your OS setting instead).

My guess is this is happening only with the 2 indices that are being modified 
and I'll guess that the FNFE is due to a bad/incomplete rsync from the master.  
Do snappuller logs mention any errors?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Doug Steigerwald [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday, April 1, 2008 4:12:25 PM
Subject: java.io.FileNotFoundException?

We just started hitting a FileNotFoundException for no real apparent reason
for both our regular index and our spellchecker index, only a few minutes
after we restarted Solr. [...]

We're running r614955.

Thanks.
Doug






Brazilian Portuguese synonyms

2008-04-02 Thread Rogerio Pereira
Hi guys!

Lucas, I would like to know more about your work with support of Brazilian
Portuguese synonyms in Solr.

Thanks for any help.

-- 
Yours truly (Atenciosamente),

Rogério (_rogerio_)
http://faces.eti.br

Make a difference! Help your country grow: don't hold back knowledge;
share it and learn more. (http://faces.eti.br/?p=45)


Re: java.io.FileNotFoundException?

2008-04-02 Thread Doug Steigerwald
The user that runs our apps is configured to allow 65536 open files in limits.conf.  Shouldn't even 
come close to that number.  Solr is the only app we have running on these machines as our app user.


We hit the same type of issue when we had our mergeFactor set to 40 for all of our indexes.  We 
lowered it to 5 and have been fine since.


No errors in the snappuller for either core.  The spellcheck index is rebuilt once a night around 
midnight and copied to the slave afterwards.  I had even rebuilt the spell index manually for the 
two cores, pulled them, installed them, and tested to make sure it was working with a few queries 
before the load testing started (this was before we released the patch to lower the spell index 
mergeFactor).


We were even getting errors trying to run our postCommit script on the
slave (it doesn't end up doing anything since it's the slave).


SEVERE: java.io.IOException: Cannot run program ./solr/bin/snapctl: java.io.IOException: error=24, 
Too many open files

at java.lang.ProcessBuilder.start(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)

And a correction from my previous email.  The errors started 10 -seconds- after load testing 
started.  This was about 40 minutes after Solr started, and less than 30 queries had been run on the 
server before load testing started.


Load testing has been fine since I restarted Solr and rebuilt the spellcheck indexes with the 
lowered mergeFactor.


Doug

Otis Gospodnetic wrote:

Hi Doug,

Sounds fishy, especially increasing/decreasing mergeFactor to funny values 
(try changing your OS setting instead).

My guess is this is happening only with the 2 indices that are being modified 
and I'll guess that the FNFE is due to a bad/incomplete rsync from the master.  
Do snappuller logs mention any errors?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: Help with XmlPullParserException

2008-04-02 Thread Phillip Farber
I just looked at this again and I think the problem is that the message 
is referring to the garbage string of characters 2&#1a where &#1a 
looks like a decimal numeric character reference but the letter 'a' is a 
hex digit.  I'll have to go back to my OCR cleanup routine ...  Thanks 
for reading.
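
(For reference, assuming standard XML character-reference syntax: a
decimal reference may contain only digits, e.g. &#26; for the code point
that the hex form writes as &#x1a;. The bare sequence 2&#1a is valid as
neither, which is why the parser rejects it.)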


Phil

Phillip Farber wrote:

 Hello all,

 I'm indexing a body of OCR and encountered this exception. Apparently
 it's some kind of XML parser error.  Out of thousands of documents,
 which I create with significant processing to make sure they are XML
 compliant, only this one appears to have a problem.  But can anyone tell
 me what this specific error message means?

 SEVERE: org.xmlpull.v1.XmlPullParserException: character reference (with
 decimal value) may not contain a (position: START_TAG seen ...dieses aus
 dem \nZusammenbestehen der Gleichungen \n\naajj2 -)- 2&#1a... @21781:16)

 [...]


Re: Brazilian Portuguese synonyms

2008-04-02 Thread Lucas F. A. Teixeira

Synonyms support?

Actually, we just have a big list of Portuguese synonyms.
I was talking about a Portuguese stemmer. Interested?

Anything, just mail me @ [EMAIL PROTECTED]

[]s,

Lucas


Rogerio Pereira wrote:

Hi guys!

Lucas, I would like to know more about your work with support of Brazilian
Portuguese synonyms in Solr.

Thanks for any help.

  


Re: Wildcard search + case insensitive

2008-04-02 Thread Matthew Runo
Hmm. I'd like the ability to turn case sensitivity on or off in the
config... I'm looking forward to this patch.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Apr 2, 2008, at 5:48 AM, Tim Mahy wrote:


Hi all,

I already found the answer to my question on the following blog :
http://michaelkimsal.com/blog/2007/04/solr-case-sensitivty/

greetings,
Tim

[...]







Re: Brazilian Portuguese synonyms

2008-04-02 Thread Rogerio Pereira
Yes!

2008/4/2, Lucas F. A. Teixeira [EMAIL PROTECTED]:

 Synonyms support?

 Actually, we just have a big list of Portuguese synonyms.
 I was talking about a Portuguese stemmer. Interested?

 Anything, just mail me @ [EMAIL PROTECTED]

 []s,


 Lucas



 Rogerio Pereira wrote:
  Hi guys!
 
  Lucas, I would like to know more about your work with support of
  Brazilian Portuguese synonyms in Solr.
 
  Thanks for any help.
 
 




-- 
Yours truly (Atenciosamente),

Rogério (_rogerio_)
http://faces.eti.br

Make a difference! Help your country grow: don't hold back knowledge;
share it and learn more. (http://faces.eti.br/?p=45)


numDocs and maxDoc

2008-04-02 Thread Vinci

Hi,

I am trying to update the index by 2-stage posting: part of the index will
be posted in stage 1 by 1.xml, then after a while the rest of the index
entries will be posted by 2.xml. Assume both 1.xml and 2.xml have 3
documents and id is used as the unique field; what I see in the admin panel
makes me confused:
numDocs : 3
maxDoc : 6
Which number is the count of documents that exist in the system? Is maxDoc
just a stat, not involved in any calculation?
If maxDoc is the true number of documents in the system, is the
optimization tool the only way to compact the index?

Thank you,
Vinci
-- 
View this message in context: 
http://www.nabble.com/numDocs-and-maxDoc-tp16448068p16448068.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexing a word in url

2008-04-02 Thread Simon Rosenthal
I also couldn't get the exact results I wanted for indexing URL components
using WordDelimiterFilter or PatternTokenizer, so resorted to adding a new
field ('pathparts'), plus a few lines of code to generate the tokens in our
content preprocessor which submits documents to SOLR for indexing.
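
(For reference, a minimal sketch of the PatternTokenizerFactory approach
Hoss mentions below; the type name and pattern are illustrative: treat any
run of non-alphanumeric characters as a separator:)

  <fieldtype name="urlparts" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.PatternTokenizerFactory" pattern="[^A-Za-z0-9]+"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>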

-Simon

On Tue, Apr 1, 2008 at 7:24 PM, Chris Hostetter [EMAIL PROTECTED]
wrote:


 : Actually I want to use anything that is not alphabet or digit to be the
 : separator - anything between them will be a word (so that I can use the
 URL
 : fragment to see what is indexed about this site)...any suggestion?

 In addition to Mike's suggestion of trying out the WordDelimiterFilter,
 take a look at the PatternTokenizerFactory.



 -Hoss




Re: numDocs and maxDoc

2008-04-02 Thread Mike Klaas


On 2-Apr-08, at 11:29 AM, Vinci wrote:


Hi,

I am trying to update the index by 2 stage posting: part of the  
index will
be posted in stage 1 by 1.xml, then after a meanwhiles the left of  
the index
of the entry will be posted by 2.xml. Assume both 1.xml and 2.xml  
have 3
document and id is used as unique field, what I see in the admin  
panel make

me feels confusing:
numDocs : 3
maxDoc : 6
which number is the value of document exist in system? Is maxDoc  
just only a

stat, not involved in any calculating process?
If the maxDoc is the true number of document exist in system, is the
optimization tool is the only way to compress the index?


When you add a document that has the same unique id as a document  
currently in the index, the previous document is marked as deleted  
and the new one added.   This results in 6 documents physically on  
disk (BUT when searching you will never see the deleted docs).


Deleted documents are purged during segment merging, which will occur  
for the whole index during optimization and will happen naturally as  
you add more documents to the system without optimization.  Normally  
it isn't something to worry about.


-Mike 


Re: Wildcard search + case insensitive

2008-04-02 Thread Chris Hostetter

: Hmm. I'd like the ability to turn on or off in the config case sensitivity...
: I'm looking forward to this patch.

FYI: here's the relevant issue...

http://issues.apache.org/jira/browse/SOLR-218

NOTE: no one has ever contributed any patches to address this problem. 
(although yonik did flesh out a POC patch for an alternate DWIM approach 
in SOLR-219)



-Hoss



Re: numDocs and maxDoc

2008-04-02 Thread Chris Hostetter

: I am trying to update the index by 2 stage posting: part of the index will
: be posted in stage 1 by 1.xml, then after a meanwhiles the left of the index
: of the entry will be posted by 2.xml. Assume both 1.xml and 2.xml have 3
: document and id is used as unique field, what I see in the admin panel make

my gut tells me that what you mean by this is that you want to index 
fields A and B for documents 1, 2, and 3; and then later you want to 
provide values for additional fields C and D for the same documents (1,2 
and 3)

updating documents is not currently supported in Solr.  there has 
been lots of discussion about it in the past, and some patches exist in 
Jira that approach the problem, but it's a lot harder than it seems like 
it should be because of the way Lucene works - essentially Solr under the 
covers does the exact same thing you currently have to do: keep a record 
of all the fields for all the documents, and reindex the *whole* document 
once you have them.

: me feels confusing:
: numDocs : 3
: maxDoc : 6 

numDocs is the number of unique live Documents in the index.  it's how 
many docs you would get back from a query for *:*.  maxDoc is the maximum 
internal document id currently in use.  the difference between those 
numbers gives you an idea of how many deleted (or replaced) documents are 
currently still in the index ... they gradually get cleaned up as segments 
get merged or when the index gets optimized.



-Hoss



RE: Search exact terms

2008-04-02 Thread Norskog, Lance
This is confusing advice to a beginner. A string field will not find a
word in the middle of a sentence.

To get normal searches without this confusion, copy the 'text' type and
make a variant without the stemmer. The problem is that you are using an
English language stemmer for what appears to be Dutch. There is a Dutch
stemmer; it might be better for your needs if the content is all Dutch.

To make an exact-search field which still has helpful searching
properties, make another variant of text that breaks up words but does
not stem. You might also want to add the ISOLatin1 filter, which maps all
European characters to US-ASCII equivalents. This is also very helpful
for multi-language searching. (A sketch follows.)
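
(A minimal sketch of such a variant, assuming the filters shipped with
Solr; the type name is illustrative -- tokenize, fold accents and case,
but do not stem:)

  <fieldtype name="textExact" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>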

Lance

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 02, 2008 7:06 AM
To: solr-user@lucene.apache.org
Subject: Re: Search exact terms

search is based on the fields you index and how you index them.

If you index using the text field -- with stemming etc, you will have
to search with the same criteria.

If you want exact search, consider the string type.  If you want both,
you can use the copyField to copy the same content into multiple
fields so it is searchable multiple ways

ryan



On Apr 2, 2008, at 4:46 AM, Tim Mahy wrote:
 Hi all,

 is there a Solr wide setting with which I can achieve the following:
 if I search for q=onderwij, I also receive documents with results of
 onderwijs etc. [...] Is there a global setting in Solr to always search
 exact?

 Greetings,

 Tim



dataimport handler multiple databases

2008-04-02 Thread Ismail Siddiqui
Hi, I have a situation where I am using the dataimport handler with a
development db and am going to use it with a production database in the
production environment.

I have an entry in solrconfig.xml like this:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/username/data-config.xml</str>
    <lst name="datasource">
      <str name="driver">com.mysql.jdbc.Driver</str>
      <str name="url">jdbc:mysql://localhost/dbname</str>
      <str name="user">db_username</str>
      <str name="password">db_password</str>
    </lst>
  </lst>
</requestHandler>

I understand I can add another datasource called datasource-2, but how can
I use this datasource to index data?

Currently I am calling something like /dataimport?command=full-import or
/dataimport?command=delta-import. How can I define a particular db to be
called, so that it indexes the dev db on the development machine and the
prod db in the production environment?


thanks


searching like RDBMS way

2008-04-02 Thread Sunil . Sarje

This is very general requirement and I am sure somebody might have thought
about the solution.

Sample scenario to explain my question ---

There is a many-to-many relationship between 2 entities - Sales Person &
Client

One sales person can work for many clients.
One Client may be served by many sales persons.

I will have 3 separate index storages.

1. Only for Sales Persons
2. Id combinations for IDs of sales persons and clients  (many-to-many
list)
3. Only for Clients

Query Requirement -  Get all the clients for a given sales person.
For this I need to hook to index 2 and 3 to get the full result.

One immediate solution would be
- Make first query to get client ids from 2nd index
- and then make another query using those client ids to pull client detail
information from 3rd index.

I cannot make 2 separate search calls since there could be thousands of
clients for a sales person.
This results in a maxClause count error. I know how to increase it, but
that is not a good solution.
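
(For illustration, a sketch of the two-step approach with hypothetical
field names, which shows where the clause explosion comes from:)

  1) /solr/select?q=salesperson_id:42&fl=client_id&rows=10000
  2) /solr/select?q=client_id:(17 OR 305 OR 9211 OR ...)

Step 2 turns every client id into a boolean clause, which is what trips
the maxClauseCount limit.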


Thanks
Sunil


Re: problem with ShowFileRequestHandler

2008-04-02 Thread 李银松
Thanks Ryan

2008/4/2, Ryan McKinley [EMAIL PROTECTED]:


 On Apr 2, 2008, at 5:03 AM, 李银松 wrote: [...]

 I just changed this to use the same ContentStream code we use for posting
 files -- so it should now respect the contentType param.

 You should be able to see things properly with:
  ?file=xxx&contentType=UTF-8

 ryan




Re: searching like RDBMS way

2008-04-02 Thread Norberto Meijome
On Wed, 2 Apr 2008 15:31:43 -0500
[EMAIL PROTECTED] wrote:

 This is very general requirement and I am sure somebody might have thought
 about the solution.

Hi Sunil,
- please don't hijack the thread :)

- why don't you use the right tool for the problem? From what you said, an
RDBMS sounds like what you need.
B

_
{Beto|Norberto|Numard} Meijome

Sysadmins can't be sued for malpractice, but surgeons don't have to
deal with patients who install new versions of their own innards.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: dataimport handler multiple databases

2008-04-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
Each entity has an optional attribute called dataSource.
If you have multiple dataSources, give them a name and refer to that name
in the entity's dataSource attribute. So your solrconfig must look like:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/username/data-config.xml</str>
    <lst name="datasource">
      <str name="name">datasource-1</str>
      <str name="driver">com.mysql.jdbc.Driver</str>

    </lst>
    <lst name="datasource">
      <str name="name">datasource-2</str>
      <str name="driver">com.mysql.jdbc.Driver</str>

    </lst>
  </lst>
</requestHandler>

and each entity can have its dataSource attribute refer to one of them,
eg:

<entity name="one" dataSource="datasource-1" ..>
</entity>

<entity name="two" dataSource="datasource-2" ..>
</entity>

But as I see it, you have a use case where prod and QA use different DBs,
so between prod and QA you can change the solrconfig.xml.
--Noble

Ismail Siddiqui [EMAIL PROTECTED] wrote:
 Hi I have a situation where I am using dataimport handler with development
 db and going to use it with production database in production environment
 [...] How can I define a particular db to be called, so that it indexes
 the dev db on the development machine and the prod db in the production
 environment?

 thanks




-- 
--Noble Paul


Re: numDocs and maxDoc

2008-04-02 Thread Vinci

Hi,

Thanks hossman, this is exactly what I want to do.
Final question: so I need to merge the fields by myself first? (Actually my
original plan is to do 2 consecutive postings, so merging is possible.)

Thank you,
Vinci


hossman wrote:
 
 my gut tells me that what you mean by this is that you want to index
 fields A and B for documents 1, 2, and 3; and then later you want to
 provide values for additional fields C and D for the same documents (1,2
 and 3)
 
 updating documents is not currently supported in Solr. [...]
 
 -Hoss

-- 
View this message in context: 
http://www.nabble.com/numDocs-and-maxDoc-tp16448068p16465796.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple unique field?

2008-04-02 Thread Vinci

Hi,

Thank you for your reply.
When I set 2 unique key fields, it looks like Solr only accepts the first
definition in schema.xml... question: so once the uniqueKey is defined, it
can't be overridden?

Thank you,
Vinci


ryantxu wrote:
 

 Thank you for your reply
 In other words, can I set 2 unique key fields?
 
 directly in solr: no
 
 In your own code, yes -- either in the client or in custom plugin.
 
 ryan
 
 

-- 
View this message in context: 
http://www.nabble.com/Multiple-unique-field--tp16367339p16465798.html
Sent from the Solr - User mailing list archive at Nabble.com.