Text search within facets?

2010-02-12 Thread chasiubao

Hello,

Is it possible to do a text search within facets?  Something that would
return the words Solr used to gather my results, and how many of those
results were found.

For example, if I have the following field:

<field name="dog" type="string" indexed="true" stored="true"/>

and it has docs that contain something like

<str name="dog">english bulldog</str>
<str name="dog">french bulldog</str>
<str name="dog">bichon frise</str>

If I search for "english bulldog" and facet on dog, I will get the
following:

<int name="english bulldog">135</int>
<int name="french bulldog">23</int>
<int name="bichon frise">12</int>

But I really want only the ones that contain the words "english" and
"bulldog", like:

<int name="english bulldog">135</int>
<int name="french bulldog">23</int>
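
(As a hedged aside, not from the original message: Solr 1.4 has no "contains"
filter for facet values, but the facet.prefix parameter can narrow them by
prefix, e.g.

q="english bulldog"&facet=true&facet.field=dog&facet.prefix=english

which would keep only facet values starting with "english".)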

Thanks for your help!



Re: How to reindex data without restarting server

2010-02-12 Thread Emad Mushtaq
Hi,

Thanks ! This is very useful :) :)

On Fri, Feb 12, 2010 at 7:55 AM, Joe Calderon calderon@gmail.com wrote:

 if you use the core model via solr.xml you can reload a core without having
 to restart the servlet container,
 http://wiki.apache.org/solr/CoreAdmin
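
 For example, a reload request looks like this (a sketch assuming a core named
 core0 declared in solr.xml):

 http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0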

 On 02/11/2010 02:40 PM, Emad Mushtaq wrote:

 Hi,

 I would like to know if there is a way of reindexing data without
 restarting
 the server. Let's say I make a change in the schema file. That would
 require
 me to reindex data. Is there a solution to this ?







-- 
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/


EmbeddedSolrServer vs CommonsHttpSolrServer

2010-02-12 Thread dcdmailbox-info
Hi all,

I am new to solr/solrj.

I correctly started up the server example given in the distribution 
(apache-solr-1.4.0\example\solr), populated the index with the test data set, and 
successfully tested with an HTTP query string via browser (e.g. 
http://localhost:8983/solr/select/?indent=on&q=video&fl=name,id)

I am trying to set up solrj clients using both CommonsHttpSolrServer and 
EmbeddedSolrServer.

My examples are with single core configuration.

Here below is the method used for CommonsHttpSolrServer initialization:

[code.1]
public SolrServer getCommonsHttpSolrServer() throws IOException,
        ParserConfigurationException, SAXException, SolrServerException {
    String url = "http://localhost:8983/solr";
    CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
    server.setSoTimeout(1000); // socket read timeout
    server.setConnectionTimeout(100);
    server.setDefaultMaxConnectionsPerHost(100);
    server.setMaxTotalConnections(100);
    server.setFollowRedirects(false); // defaults to false
    // allowCompression defaults to false.
    // Server side must support gzip or deflate for this to have any effect.
    server.setAllowCompression(true);
    server.setMaxRetries(1); // defaults to 0. > 1 not recommended.
    return server;
}

Here below is the method used for EmbeddedSolrServer initialization (provided in
the wiki section):

[code.2]
public SolrServer getEmbeddedSolrServer() throws IOException,
        ParserConfigurationException, SAXException, SolrServerException {
    System.setProperty("solr.solr.home",
        "/WORKSPACE/bin/apache-solr-1.4.0/example/solr");
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = initializer.initialize();
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
    return server;
}

Here below is the common code used to query the server:
[code.3]

SolrServer server = mintIdxMain.getEmbeddedSolrServer();
//SolrServer server = mintIdxMain.getCommonsHttpSolrServer();

SolrQuery query = new SolrQuery("video");
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();

System.out.println("Found: " + docs.getNumFound());
System.out.println("Start: " + docs.getStart());
System.out.println("Max Score: " + docs.getMaxScore());

 
CommonsHttpSolrServer gives correct results, whereas EmbeddedSolrServer always
gives no results.
What's wrong with the initialization and/or the configuration of the
EmbeddedSolrServer?
CoreContainer.Initializer() seems not to recognize the single core from
solrconfig.xml...

If I modify [code.2] with the following code, it seems to work.
I manually added only the explicit core registration.
Is [code.4] the correct way?

[code.4]
public SolrServer getEmbeddedSolrServer() throws IOException,
        ParserConfigurationException, SAXException, SolrServerException {
    System.setProperty("solr.solr.home",
        "/WORKSPACE/bin/apache-solr-1.4.0/example/solr");

    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = initializer.initialize();

    /* explicit core registration */
    SolrConfig solrConfig = new SolrConfig(
        "/WORKSPACE/bin/apache-solr-1.4.0/example/solr", "solrconfig.xml",
        null);
    IndexSchema indexSchema = new IndexSchema(solrConfig, "schema.xml", null);
    CoreDescriptor coreDescriptor = new CoreDescriptor(coreContainer, "",
        solrConfig.getResourceLoader().getInstanceDir());
    SolrCore core = new SolrCore(null,
        "/WORKSPACE/bin/apache-solr-1.4.0/example/solr/data", solrConfig, indexSchema,
        coreDescriptor);
    coreContainer.register("", core, false);
    /* end of explicit core registration */

    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
    return server;
}

Many thanks in advance for the support and the great work done on all the
lucene/solr projects.

Dino.
--


  

inconsistency between analysis.jsp and actual search

2010-02-12 Thread Lukas Kahwe Smith
Hi

I am indexing the name "FC St. Gallen" using the following type:

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
            maxGramSize="20" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
  </analyzer>
</fieldType>

Which according to analysis.jsp gets split into:
f | fc | s | st | g | ga | gal | gall | galle | gallen

So far so good.

Now if I search for "fc st.gallen", according to analysis.jsp it will search for:
fc | st | gallen

But when I do a dismax search using the following handler:

<requestHandler name="auto" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="qf">name firstname email^0.5 telefon^0.5 city^0.6 street^0.6</str>
    <str name="fl">id,type,name,firstname,zipcode,city,street,urlizedname</str>
  </lst>
</requestHandler>

I do not get a match.
Looking at the debug output of the query I can see that it is actually splitting
the query into "fc" and the phrase "st gallen":

<str name="rawquerystring">fc st.gallen</str>
<str name="querystring">fc st.gallen</str>
<str name="parsedquery">
+((DisjunctionMaxQuery((telefon:fc^0.5 | firstname:fc | email:fc^0.5 |
street:fc^0.6 | city:fc^0.6 | name:fc)) DisjunctionMaxQuery((telefon:"st
gallen"^0.5 | firstname:"st gallen" | email:"st gallen"^0.5 | street:"st
gallen"^0.6 | city:"st gallen"^0.6 | name:"st gallen")))~2) ()
</str>
<str name="parsedquery_toString">
+(((telefon:fc^0.5 | firstname:fc | email:fc^0.5 | street:fc^0.6 | city:fc^0.6
| name:fc) (telefon:"st gallen"^0.5 | firstname:"st gallen" | email:"st
gallen"^0.5 | street:"st gallen"^0.6 | city:"st gallen"^0.6 | name:"st
gallen"))~2) ()
</str>

What's going on there?

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





Re: EmbeddedSolrServer vs CommonsHttpSolrServer

2010-02-12 Thread Ron Chan
I suspect this has something to do with the dataDir setting in the example's
solrconfig.xml:

<dataDir>${solr.data.dir:./solr/data}</dataDir>

we use the example's solrconfig.xml as the base for our deployments and always
comment this out

the default of having conf and data sitting under the Solr home works well





Local Solr Inconsistent results for radius

2010-02-12 Thread Emad Mushtaq
Hello,

I have a question related to Local Solr. For certain locations (latitude,
longitude), the spatial search does not work. Here is the query I try to
make, which gives me no results:

q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450

However if I make the same query with radius=449, it gives me results.

Here is part of my solrconfig.xml containing startTier and endTier:

<updateRequestProcessorChain>
  <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
    <str name="latField">latitude</str>   <!-- The field used to store your latitude -->
    <str name="lngField">longitude</str>  <!-- The field used to store your longitude -->
    <int name="startTier">9</int>
    <int name="endTier">17</int>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
</updateRequestProcessorChain>

What do I need to do to fix this problem?


-- 
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/


Re: inconsistency between analysis.jsp and actual search

2010-02-12 Thread Lukas Kahwe Smith

On 12.02.2010, at 11:17, Ahmet Arslan wrote:
 analysis.jsp does not do actual query parsing. It just shows the produced tokens
 step by step in the analysis (charfilter, tokenizer, tokenfilter) phase.
 The admin/analysis.jsp page will show you how your field is processed while
 indexing and while querying, and if a particular query matches. [1]
 
 [1] http://wiki.apache.org/solr/FAQ#My_search_returns_too_many_.2BAC8_too_little_.2BAC8_unexpected_results.2C_how_to_debug.3F


I see, that's good to know. Maybe even something that should be noted on the
analysis.jsp page itself.

Anyway, how can I get "st.gallen" split into two terms at query time?

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    ...
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
  </analyzer>
</fieldType>

It seems I should probably use solr.StandardTokenizerFactory anyway, but
for this case it wouldn't help either.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





optimize is taking too much time

2010-02-12 Thread mklprasad

hi,
in my Solr I have 1,42,45,223 records taking some 50 GB.
Now when I am loading a new record and it tries to optimize the docs, it is
taking too much memory and time.

Can anybody please tell me whether we have any property in Solr to get rid of
this?

Thanks in advance




Re: EmbeddedSolrServer vs CommonsHttpSolrServer

2010-02-12 Thread dcdmailbox-info
Yes, you are right.
[code.2] works fine after commenting out the following lines in solrconfig.xml:

<!--
Used to specify an alternate directory to hold all index data
other than the default ./data under the Solr home.
If replication is in use, this should match the replication configuration.
-->
<!--
<dataDir>${solr.data.dir:./solr/data}</dataDir>
-->

Is this different behaviour of EmbeddedSolrServer correct,
or can it be considered a low-priority bug?
Thanks for your prompt reply!
Dino.
--






Re: EmbeddedSolrServer vs CommonsHttpSolrServer

2010-02-12 Thread Erik Hatcher
When using EmbeddedSolrServer, you could simply set the solr.data.dir
system property or launch your process from the same working directory
where you are launching the HTTP version of Solr. Either of those
should also work to alleviate this issue.
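
For instance (an illustrative one-liner, reusing the data directory path from
earlier in this thread):

System.setProperty("solr.data.dir", "/WORKSPACE/bin/apache-solr-1.4.0/example/solr/data");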


Erik

Re: EmbeddedSolrServer vs CommonsHttpSolrServer

2010-02-12 Thread Ron Chan
I don't think this is a bug; the default behaviour is for /data to sit under the
Solr home.

There should be no need to use this parameter unless it is a special case.

Not sure why it is like this in the example.


Good literature on search basics

2010-02-12 Thread javaxmlsoapdev

Does anyone know good literature (web resources, books, etc.) on the basics of
search? I do have the Solr 1.4 and Lucene books but wanted to go into more
detail on the basics.

Thanks,



persistent cache

2010-02-12 Thread Tim Terlegård
Does Solr use some sort of a persistent cache?

I do this 10 times in a loop:
  * start solr
  * create a core
  * execute warmup query
  * execute query with sort fields
  * stop solr

Executing the query with sort fields takes 5-20 times longer the first
iteration than the other 9 iterations. For instance I have a query
'hockey' with one date sort field. That takes 768 ms in the first
iteration of the loop. The next 9 iterations the query takes 52 ms.
The Solr and Jetty server really do stop in each iteration, so the RAM
must be emptied. So the only explanation I can think of is that there is
some persistent cache that survives the Solr restarts. Is this the case?
Or why could this be?

/Tim


Re: persistent cache

2010-02-12 Thread Shalin Shekhar Mangar
2010/2/12 Tim Terlegård tim.terleg...@gmail.com

 Does Solr use some sort of a persistent cache?

 I do this 10 times in a loop:
  * start solr
  * create a core
  * execute warmup query
  * execute query with sort fields
  * stop solr

 Executing the query with sort fields takes 5-20 times longer the first
 iteration than the other 9 iterations. For instance I have a query
 'hockey' with one date sort field. That takes 768 ms in the first
 iteration of the loop. The next 9 iterations the query takes 52 ms.
 The solr and jetty server really stops in each iteration so the RAM
 must be emptied. So the only way I can think of why this happens is
 because there is some persistent cache that survives the solr
 restarts. Is this the case? Or why could this be?


Solr does not have a persistent cache. That is the operating system's file
cache at work.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Dismax phrase queries

2010-02-12 Thread Shalin Shekhar Mangar
On Fri, Feb 12, 2010 at 6:06 AM, Jason Rutherglen 
jason.rutherg...@gmail.com wrote:

 I'd like to boost an exact phrase match such as q="video poker" over
 q=video poker.  How would I do this using dismax?

 I tried pre-processing video poker into, video poker "video poker",
 however that just gets munged by dismax into video poker video
 poker... Which is wrong.


Have you tried the pf parameter?
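
For example (a sketch; the qf/pf field name is assumed):

q=video poker&defType=dismax&qf=text&pf=text^10

pf (phrase fields) re-runs the query terms as a phrase against the listed
fields, boosting documents where the terms appear adjacent.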

-- 
Regards,
Shalin Shekhar Mangar.


Re: spellcheck

2010-02-12 Thread michaelnazaruk

I tried to configure spellcheck, but I still have this problem:
Config:
<lst name="spellchecker">
  <str name="classname">solr.FileBasedSpellChecker</str>
  <str name="name">file</str>
  <str name="sourceLocation">spellings.txt</str>
  <str name="characterEncoding">UTF-8</str>
  <str name="spellcheckIndexDir">./spellcheckerFile</str>
</lst>

</searchComponent>


<requestHandler name="/spell" class="solr.SearchHandler" lazy="true">
  <lst name="file">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">file</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Maybe I get this result because I work with a dictionary? For the request
'popular' I still get 'populars', but in the dictionary I have both 'popular'
and 'populars'!
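
(A hedged aside, not from the original message: a FileBasedSpellChecker
dictionary must be built before it is used, e.g. by issuing a request such as
/spell?q=popular&spellcheck=true&spellcheck.build=true&spellcheck.dictionary=file
once after startup.)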



Re: Local Solr Inconsistent results for radius

2010-02-12 Thread Mauricio Scheffer
Hi Emad,

I had the same issue (
http://old.nabble.com/Spatial---Local-Solr-radius-td26943608.html ); it
seems that this happens only in eastern areas of the world. Try inverting
the sign of all your longitudes, or translating all your longitudes to the
west.

Cheers,
Mauricio



Re: inconsistency between analysis.jsp and actual search

2010-02-12 Thread Ahmet Arslan
 Anyway, how can I get st.gallen split into two terms
 at query time?

As you mentioned in your first mail, the query "st.gallen" is already broken into
two terms/words, but the query parser constructs a phrase query from them.

There was a discussion about this behaviour earlier:
http://www.lucidimagination.com/search/document/d41bc0ef422b9238/understanding_the_query_parser#85db37e69ef29dba
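
A common client-side workaround (a sketch, not from this thread) is to normalize
punctuation to whitespace before handing the string to dismax, so that each
chunk is analyzed as a separate term:

String normalized = rawQuery.replace('.', ' ');  // "fc st.gallen" -> "fc st gallen"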





  


Fwd: indexing: issue with default values

2010-02-12 Thread nabil rabhi
In schema.xml I have fields of type int with a default value,
e.g.: <field name="postal_code" type="int" indexed="true" stored="true"
default="0"/>
But when a document has no value for the field postal_code,
I get the following error at indexing:

Posting file Immo.xml to http://localhost:8983/solr/update/
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body><h2>HTTP ERROR: 500</h2><pre>For input string: ""

java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:470)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.solr.schema.TrieField.createField(TrieField.java:416)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
</pre>

</body>
</html>

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">4</int></lst>
</response>

any help? thx


Re: persistent cache

2010-02-12 Thread Tim Terlegård
2010/2/12 Shalin Shekhar Mangar shalinman...@gmail.com:
 2010/2/12 Tim Terlegård tim.terleg...@gmail.com

 Does Solr use some sort of a persistent cache?

 Solr does not have a persistent cache. That is the operating system's file
 cache at work.

Aha, that's very interesting and seems to make sense.

So is the primary goal of warmup queries to allow the operating system
to cache all the files in the data/index directory? Because I think
the difference (768ms vs 52ms) is pretty big. I just do one warmup
query and get 52 ms response on a 40 million documents index. I think
that's pretty nice performance without tinkering with the caches at
all. The only tinkering that seems to be needed is this operating
system file caching. What's the best way to make sure that my warmup
queries have cached all the files? And does a file cache have the
complete file in memory? I guess it can get tough to get my 100GB
index into the 16GB memory.
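
One way to script such warmup queries (a minimal sketch of the standard
firstSearcher listener in solrconfig.xml, reusing the example query from this
thread; the sort field name is assumed):

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">hockey</str><str name="sort">date asc</str></lst>
  </arr>
</listener>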

/Tim


Re: Good literature on search basics

2010-02-12 Thread Jaco
See http://markmail.org/thread/z5sq2jr2a6eayth4


On 12 February 2010 12:14, javaxmlsoapdev vika...@yahoo.com wrote:


 Does anyone know good literature(web resources, books etc) on basics of
 search? I do have Solr 1.4 and Lucene books but wanted to go in more
 details
 on basics.

 Thanks,




Re: indexing: issue with default values

2010-02-12 Thread Erik Hatcher
When a document has no value, are you still sending a postal_code field in
your post to Solr?  Seems like you are.
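
For illustration (a hypothetical posted document, not from the thread): an
update like

<add><doc>
  <field name="id">1</field>
  <field name="postal_code"></field>  <!-- empty value: Solr tries to parse "" as an int -->
</doc></add>

fails with the NumberFormatException above, whereas omitting the postal_code
field entirely lets default="0" apply.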


Erik





Re: Dismax phrase queries

2010-02-12 Thread Jason Rutherglen
Was going to post that I more or less figured it out.  Dismax handles
this automatically with the ps parameter, which is different from the
bs parameter...
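
For reference (a sketch with an assumed field name): combining pf with ps=0
requires the terms to be strictly adjacent for the phrase boost to apply:

q=video poker&defType=dismax&qf=text&pf=text^10&ps=0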

On Fri, Feb 12, 2010 at 3:48 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Fri, Feb 12, 2010 at 6:06 AM, Jason Rutherglen 
 jason.rutherg...@gmail.com wrote:

  I'd like to boost an exact phrase match such as q="video poker" over
  q=video poker.  How would I do this using dismax?
 
  I tried pre-processing video poker into, video poker "video poker",
  however that just gets munged by dismax into video poker video
  poker... Which is wrong.


 Have you tried the pf parameter?

 --
 Regards,
 Shalin Shekhar Mangar.



Re: indexing: issue with default values

2010-02-12 Thread nabil rabhi
Yes, sometimes the document has postal_code with no value; I still post it
to Solr.
2010/2/12 Erik Hatcher erik.hatc...@gmail.com

 When a document has no value, are you still sending a postal_code field in
 your post to Solr?  Seems like you are.

Erik







Re: Collating results from multiple indexes

2010-02-12 Thread Jan Høydahl / Cominvent
Really? The last time I looked at AIE, I am pretty sure there were Solr core
messages in the logs, so I assumed it used EmbeddedSolr or something. But I may
be mistaken. Anyone from Attivio here who can elaborate? Is the join stuff at
the Lucene level, or on top of multiple Solr cores, or what?

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote:

 Minor correction re Attivio - their stuff runs on top of Lucene, not Solr.  I 
 *think* they are trying to patent this.
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Hadoop ecosystem search :: http://search-hadoop.com/
 
 
 
 - Original Message 
 From: Jan Høydahl / Cominvent jan@cominvent.com
 To: solr-user@lucene.apache.org
 Sent: Mon, February 8, 2010 3:33:41 PM
 Subject: Re: Collating results from multiple indexes
 
 Hi,
 
 There is no JOIN functionality in Solr. The common solution is either to 
 accept 
 the high volume update churn, or to add client side code to build a join 
 layer 
 on top of the two indices. I know that Attivio (www.attivio.com) have built 
 some 
 kind of JOIN functionality on top of Solr in their AIE product, but do not 
 know 
 the details or the actual performance.
 
 Why not open a JIRA issue, if there is no such already, to request this as a 
 feature?
 
 --
 Jan Høydahl  - search architect
 Cominvent AS - www.cominvent.com
 
 On 25. jan. 2010, at 22.01, Aaron McKee wrote:
 
 
 Is there any somewhat convenient way to collate/integrate fields from 
 separate 
 indices during result writing, if the indices use the same unique keys? 
 Basically, some sort of cross-index JOIN?
 
 As a bit of background, I have a rather heavyweight dataset of every US 
 business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours 
 to 
 fully index on a decent box). Given the size and relative stability of the 
 dataset, I generally only update this monthly. However, I have separate 
 advertising-related datasets that need to be updated either hourly or daily 
 (e.g. today's coupon, click revenue remaining, etc.) . These advertiser 
 feeds 
 reference the same keyspace that I use in the main index, but are otherwise 
 significantly lighter weight. Importing and indexing them discretely only 
 takes 
 a couple minutes. Given that Solr/Lucene doesn't support field updating, 
 without 
 having to drop and re-add an entire document, it doesn't seem practical to 
 integrate this data into the main index (the system would be under a 
 constant 
 state of churn, if we did document re-inserts, and the performance impact 
 would 
 probably be debilitating). It may be nice if this data could participate in 
 filtering (e.g. only show advertisers), but it doesn't need to participate 
 in 
 scoring/ranking.
 
 I'm guessing that someone else has had a similar need, at some point?  I 
 can 
 have our front-end query the smaller indices separately, using the keys 
 returned 
 by the primary index, but would prefer to avoid the extra sequential 
 roundtrips. 
 I'm hoping to also avoid a coding solution, if only to avoid the maintenance 
 overhead as we drop in new builds of Solr, but that's also feasible.
 
 Thank you for your insight,
 Aaron
 
 



Re: indexing: issue with default values

2010-02-12 Thread nabil rabhi
thanx Erik, that was very helpful

2010/2/12 Erik Hatcher erik.hatc...@gmail.com

 That would be the problem then, I believe.  Simply don't post a value to
 get the default value to work.

Erik







Re: persistent cache

2010-02-12 Thread Tommy Chheng
 One solution is to add a persistent cache with memcached at the
application layer.


--
Tommy Chheng

Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chheng.com






Re: Text search within facets?

2010-02-12 Thread Ahmet Arslan
 For example, if I have the following field:
 
 field name=dog type=string indexed=true
 stored=true/
 
 and it has docs that contain something like
 
 str name=dogenglish bulldog/str
 str name=dogfrench bulldog/str
 str name=dogbichon frise/str
 
 If I search for english bulldog and facet on dog, I
 will get the
 following:
 
 int name=english bulldog135/int
 int name=french bulldog23/int
 int name=bichon frise12/int

That's strange. The query "english bulldog" should return only 
<str name="dog">english bulldog</str>, since the type of dog is string, which is not 
tokenized. 
What is your default search field defined in schema.xml? Can you try 
q=dog:"english bulldog"&facet=true&facet.field=dog&facet.mincount=1



  


expire/delete documents

2010-02-12 Thread Matthieu Labour
Hi,
Is there a way for solr or lucene to expire documents based on a field in a 
document? Let's say that I have a createTime field whose type is date; can I 
set a policy in schema.xml for solr to delete the documents older than X 
days? Thank you


  

Re: Local Solr Inconsistent results for radius

2010-02-12 Thread Emad Mushtaq
Hello Mauricio,

Do you know why such a problem occurs? Does it have to do with certain latitudes
and longitudes? If so, why is it happening? Is it a bug in local solr?

On Fri, Feb 12, 2010 at 5:50 PM, Mauricio Scheffer 
mauricioschef...@gmail.com wrote:

 Hi Emad,

 I had the same issue (
 http://old.nabble.com/Spatial---Local-Solr-radius-td26943608.html ), it
 seems that this happens only on eastern areas of the world. Try inverting
 the sign of all your longitudes, or translate all your longitudes to the
 west.

 Cheers,
 Mauricio

 On Fri, Feb 12, 2010 at 7:22 AM, Emad Mushtaq
 emad.mush...@sigmatec.com.pkwrote:

  Hello,
 
  I have a question related to local solr. For certain locations (latitude,
  longitude), the spatial search does not work. Here is the query I try to
  make which gives me no results:
 
  q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450
 
  However if I make the same query with radius=449, it gives me results.
 
  Here is part of my solrconfig.xml containing startTier and endTier:
 
  <updateRequestProcessorChain>
    <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
      <str name="latField">latitude</str> <!-- The field used to store your latitude -->
      <str name="lngField">longitude</str> <!-- The field used to store your longitude -->
      <int name="startTier">9</int>
      <int name="endTier">17</int>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
    <processor class="solr.LogUpdateProcessorFactory" />
  </updateRequestProcessorChain>
 
  What do I need to do to fix this problem?
 
 
  --
  Muhammad Emad Mushtaq
  http://www.emadmushtaq.com/
 




-- 
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/


Re: expire/delete documents

2010-02-12 Thread Mat Brown
You could easily have a scheduled job that ran delete by query to
remove posts older than a certain date...

On Fri, Feb 12, 2010 at 13:00, Matthieu Labour
matthieu_lab...@yahoo.com wrote:
 Hi, Is there a way for solr or lucene to expire documents based on a field in a 
 document? Let's say that I have a createTime field whose type is date; can I 
 set a policy in schema.xml for solr to delete the documents older than X 
 days? Thank you
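For example, a nightly scheduled job could POST a delete-by-query to the
/update handler, followed by a commit. A sketch, assuming a seven-day window
on the createTime date field mentioned above:

<delete><query>createTime:[* TO NOW-7DAYS]</query></delete>
<commit/>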





Re: Deleting spelll checker index

2010-02-12 Thread darniz

Hi guys, 
Opening this thread again.
I need to get around this issue.
I have a spellcheck field defined and I am copying two fields, make and model,
to this field:
<copyField source="make" dest="spellText"/>
<copyField source="model" dest="spellText"/>
I have buildOnCommit and buildOnOptimize set to true, hence when I index data
and try to search for a word "accod" I get back the suggestion "accord", since model
is also being copied.
I stopped the solr server, removed the copy field for model (so now I only copy
make to the spellText field), and started the solr server. 
I refreshed the dictionary by issuing the following command:
spellcheck.build=true&spellcheck.dictionary=default
So I hoped it would rebuild my dictionary, but the strange thing is that it
still gives a suggestion for "accrd".
I have to reindex the data again and then it won't offer me the suggestion, which is
the correct behaviour.

How can I recreate the dictionary by changing my schema and issuing the
command 
spellcheck.build=true&spellcheck.dictionary=default

I can't afford to reindex data every time.

Any answer ASAP will be appreciated

Thanks
darniz









darniz wrote:
 
 Then i assume the easiest way is to delete the directory itself.
 
 darniz
 
 
 hossman wrote:
 
 
 : We are using Index based spell checker.
 : i was wondering with the help of any url parameters can we delete the
 spell
 : check index directory.
 
 I don't think so.
 
  You might be able to configure two different spell check components that 
  point at the same directory -- one that builds off of a real field, and one 
  that builds off of an (empty) text field (using FileBasedSpellChecker) .. 
 then you could trigger a rebuild of an empty spell checking index using 
 the second component.
 
 But i've never tried it so i have no idea if it would work.
 
 
 -Hoss
 
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27567465.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Local Solr Inconsistent results for radius

2010-02-12 Thread Mauricio Scheffer
Yes, it seems to be a bug, at least with the code you and I are using. If
you don't need to search across the whole globe, try translating your
longitudes as I suggested.

On Fri, Feb 12, 2010 at 3:04 PM, Emad Mushtaq
emad.mush...@sigmatec.com.pkwrote:

 Hello Mauricio,

 Do you know why such a problem occurs? Does it have to do with certain latitudes
 and longitudes? If so, why is it happening? Is it a bug in local solr?

 On Fri, Feb 12, 2010 at 5:50 PM, Mauricio Scheffer 
 mauricioschef...@gmail.com wrote:

  Hi Emad,
 
  I had the same issue (
  http://old.nabble.com/Spatial---Local-Solr-radius-td26943608.html ), it
  seems that this happens only on eastern areas of the world. Try inverting
  the sign of all your longitudes, or translate all your longitudes to the
  west.
 
  Cheers,
  Mauricio
 
  On Fri, Feb 12, 2010 at 7:22 AM, Emad Mushtaq
  emad.mush...@sigmatec.com.pkwrote:
 
   Hello,
  
   I have a question related to local solr. For certain locations
 (latitude,
   longitude), the spatial search does not work. Here is the query I try
 to
   make which gives me no results:
  
   q=*qt=geosort=geo_distance asclat=33.718151long=73.
   060547radius=450
  
   However if I make the same query with radius=449, it gives me results.
  
   Here is part of my solrconfig.xml containing startTier and endTier:
  
   updateRequestProcessorChain
   processor
   class=com.pjaol.search.solr.update.LocalUpdateProcessorFactory
  str name=latFieldlatitude/str !-- The field used to store
   your latitude --
  str name=lngFieldlongitude/str !-- The field used to
 store
   your longitude --
  
  int name=startTier9/int
  int name=endTier17/int
 /processor
 processor class=solr.RunUpdateProcessorFactory /
 processor class=solr.LogUpdateProcessorFactory /
 /updateRequestProcessorChain
  
   What do I need to do to fix this problem?
  
  
   --
   Muhammad Emad Mushtaq
   http://www.emadmushtaq.com/
  
 



 --
 Muhammad Emad Mushtaq
 http://www.emadmushtaq.com/



Re: persistent cache

2010-02-12 Thread Tom Burton-West

Hi Tim,

We generally run about 1600 cache-warming queries to warm up the OS disk
cache and the Solr caches when we mount a new index.

Do you have/expect phrase queries?   If you don't, then you don't need to
get any position information into your OS disk cache.  Our position
information takes about 85% of the total index size (*prx files).  So with a
100GB index, your *frq files might only be 15-20GB and you could probably
get more than half of that in 16GB of memory.

If you have limited memory and a large index, then you need to choose cache
warming queries carefully as once the cache is full, further queries will
start evicting older data from the cache.  The tradeoff is to populate the
cache with data that would require the most disk access if the data was not
in the cache versus populating the cache based on your best guess of what
queries your users will execute.  A good overview of the issues is the paper
by Baeza-Yates ( http://doi.acm.org/10.1145/1277741.125 The Impact of
Caching on Search Engines )


Tom Burton-West
Digital Library Production Service
University of Michigan Library
-- 
View this message in context: 
http://old.nabble.com/persistent-cache-tp27562126p27567840.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Has anyone prepared a general purpose synonyms.txt for search engines

2010-02-12 Thread Julian Hille
Hi,

At openthesaurus.org (or .com) you can find a MySQL version of synonyms; you just 
have to join it to fit the synonym schema of solr yourself (see the sketch below).
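The target format is plain text, one rule per line: either a comma-separated
group of equivalent terms, or an explicit mapping with =>. A small sketch of
what the joined output should look like:

couch, sofa, divan
i-pod, i pod => ipod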


On 12.02.2010, at 20:03, Emad Mushtaq wrote:

 Hi,
 
 I was wondering if anyone has prepared a synonyms.txt for general purpose
 search engines,  that can be shared. If not could you refer me to places
 where such a synonym list or thesaurus can be found. Synonyms for search
 engines are different from the regular thesaurus. Any help would be highly
 appreciated. Thanks.
 
 -- 
 Muhammad Emad Mushtaq
 http://www.emadmushtaq.com/

Kind regards,
Julian Hille




Re: Has anyone prepared a general purpose synonyms.txt for search engines

2010-02-12 Thread Emad Mushtaq
Wow thanks!! You all are awesome! :D :D

On Sat, Feb 13, 2010 at 12:32 AM, Julian Hille jul...@netimpact.de wrote:

 Hi,

 at openthesaurus.org or .com you can find a mysql version of synonyms you
 just have to join it to fit the synonym schema of solr yourself.


 On 12.02.2010, at 20:03, Emad Mushtaq wrote:

  Hi,
 
  I was wondering if anyone has prepared a synonyms.txt for general purpose
  search engines,  that can be shared. If not could you refer me to places
  where such a synonym list or thesaurus can be found. Synonyms for search
  engines are different from the regular thesaurus. Any help would be
 highly
  appreciated. Thanks.
 
  --
  Muhammad Emad Mushtaq
  http://www.emadmushtaq.com/

 Kind regards,
 Julian Hille





-- 
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/


Re: Has anyone prepared a general purpose synonyms.txt for search engines

2010-02-12 Thread Julian Hille
Hi,

You're welcome. That's something Google came up with some weeks ago :)


On 12.02.2010, at 20:42, Emad Mushtaq wrote:

 Wow thanks!! You all are awesome! :D :D
 

Kind regards,
Julian Hille


---
NetImpact KG
Altonaer Straße 8
20357 Hamburg

Tel: 040 / 6738363 2
Mail: jul...@netimpact.de

Managing director: Tarek Müller



Re: implementing profanity detector

2010-02-12 Thread Mike Perham
On Thu, Feb 11, 2010 at 10:49 AM, Grant Ingersoll gsing...@apache.org wrote:

 Otherwise, I'd do it via copy fields.  Your first field is your main field 
 and is analyzed as before.  Your second field does the profanity detection 
 and simply outputs a single token at the end, safe/unsafe.

 How long are your documents?  The extra copy field is extra work, but in this 
 case it should be fast as you should be able to create a pretty streamlined 
 analyzer chain for the second task.


The documents are web page text, so they shouldn't be more than 10-20k
generally.  Would something like this do the trick?

  @Override
  public boolean incrementToken() throws IOException {
    if (done) return false; // the single flag token was already emitted
    done = true;            // assumes a 'private boolean done = false;' field on this TokenFilter
    while (input.incrementToken()) {
      if (profanities.contains(termAtt.termBuffer(), 0, termAtt.termLength())) {
        termAtt.setTermBuffer("y", 0, 1); // unsafe: a profanity was found
        return true;                      // return true so the flag token is actually consumed
      }
    }
    termAtt.setTermBuffer("n", 0, 1);     // safe: the whole stream was clean
    return true;
  }

mike
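For the copy-field wiring, schema.xml could expose the filter roughly like
this (ProfanityFlagFilterFactory, profanities.txt, and the field names are
hypothetical placeholders for a factory wrapping the TokenFilter above):

<fieldType name="profanity_flag" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="com.example.ProfanityFlagFilterFactory" words="profanities.txt"/>
  </analyzer>
</fieldType>
<field name="safe" type="profanity_flag" indexed="true" stored="false"/>
<copyField source="text" dest="safe"/>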


Re: For caches, any reason to not set initialSize and size to the same value?

2010-02-12 Thread Yonik Seeley
On Fri, Feb 12, 2010 at 5:23 PM, Jay Hill jayallenh...@gmail.com wrote:
 If I've done a lot of research and have a very good idea of where my cache
 sizes are having monitored the stats right before commits, is there any
 reason why I wouldn't just set the initialSize and size counts to the same
 values? Is there any reason to set a smaller initialSize if I know reliably
 that where my limit will almost always be?

Probably not much...
The only savings will be the 8 bytes (on a 64 bit proc) per unused
array slot (in the HashMap).
Maybe we should consider removing the initialSize param from the
example config to reduce the amount of stuff a user needs to think
about.

-Yonik
http://www.lucidimagination.com
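In solrconfig.xml that just means repeating the measured number in both
attributes; a sketch of one cache entry (the sizes are placeholders, not
recommendations):

<filterCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="4096"/>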


reloading sharedlib folder

2010-02-12 Thread Joe Calderon
when using solr.xml, you can specify a sharedlib directory to share
among cores, is it possible to reload the classes in this dir without
having to restart the servlet container? it would be useful to be able
to make changes to those classes on the fly or be able to drop in new
plugins


RE: For caches, any reason to not set initialSize and size to the same value?

2010-02-12 Thread Fuad Efendi
Funny, Arrays.copy() for HashMap... but something similar...

Anyway, I use the same values for initial size and max size, to be safe... and
to get any OOM at startup :) 



 -Original Message-
 From: Fuad Efendi [mailto:f...@efendi.ca]
 Sent: February-12-10 6:55 PM
 To: solr-user@lucene.apache.org; yo...@lucidimagination.com
 Subject: RE: For caches, any reason to not set initialSize and size to
 the same value?
 
 I always use initial size = max size,
 just to avoid Arrays.copyOf()...
 
 Initial (default) capacity for HashMap is 16, when it is not enough -
 array
 copy to new 32-element array, then to 64, ...
 - too much wasted space! (same for ConcurrentHashMap)
 
 Excuse me if I didn't understand the question...
 
 -Fuad
 http://www.tokenizer.ca
 
 
 





Re: Deleting spelll checker index

2010-02-12 Thread darniz

Any update on this?
Do you guys want me to rephrase my question, if it's not clear?

Thanks
darniz



-- 
View this message in context: 
http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27570613.html
Sent from the Solr - User mailing list archive at Nabble.com.



migrating from solr 1.3 to 1.4

2010-02-12 Thread Sachin Sebastian

Hi there,

   I'm trying to migrate from solr 1.3 to solr 1.4 and I have a few 
issues. Initially my localsolr was throwing a NullPointerException and I 
fixed it by changing the type of lat and lng to 'tdouble'. But now I'm not 
able to update the index. When I try to update the index it throws an error 
saying -


Feb 12, 2010 2:14:11 PM 
org.apache.solr.update.processor.LogUpdateProcessor finish

INFO: {} 0 0
Feb 12, 2010 2:14:11 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoSuchFieldError: log
at 
com.pjaol.search.solr.update.LocalUpdaterProcessor.processAdd(LocalUpdateProcessorFactory.java:138) 



I tried searching on the net, but none of the posts regarding this issue are 
answered. Has anyone come across this issue?


Thanks,
Sachin.


cannot match on phrase queries

2010-02-12 Thread Kevin Osborn
I am seeing this in several of my fields. I have something like "Samsung 
X150" or "Nokia BH-212", and my query will not match on X150 or BH-212.

So, my query is something like +model:("Samsung X150"). Through debugQuery, I see 
that this gets converted to +(model:samsung model:"x 150"). It 
matches on Samsung, but not X150. A simple query like model:"BH-212" 
simply fails. model:BH212 also fails. The only query that seems to work 
is model:("BH 212").

Here is the schema for that field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="model" type="text" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>

Any ideas? According to the analyzer, I would expect the phrase "BH-212" to 
match on "bh" and "212". Or am I missing something?

Also, is there any way to tell the parser not to convert X150 into a phrase 
query? I have some cases where it would be more useful to turn it into +(X 150).



  

Re: Solr 1.4: Full import FileNotFoundException

2010-02-12 Thread Chris Hostetter

: I have noticed that when I run concurrent full-imports using DIH in Solr
: 1.4, the index ends up getting corrupted. I see the following in the log

I'm fairly confident that concurrent imports won't work -- but it 
shouldn't corrupt your index -- even if the DIH didn't actively check for 
this type of situation, the underlying Lucene LockFactory should ensure 
that one of the imports wins ... you'll need to tell us what kind of 
filesystem you are using, and show us the relevant settings from your 
solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH, 
etc...)

At worst you should get a lock time out exception.

: But I looked at:
: 
http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html
: 
: and was under the impression that this issue was fixed in Solr 1.4.

...right, attempting to run two concurrent imports with DIH should cause 
the second one to abort immediately.




-Hoss



Re: Cannot get like exact searching to work

2010-02-12 Thread Chris Hostetter

:  Can your query consist of more than one words?
: 
: Yes, and I expect it almost always will (the query string is coming
: from a search box on a website).
...
: Actually it won't. The data I am indexing has extra spaces in front
: and is capitalized. I really need to be able to filter it through the
: lowercase and trim filter without tokenizing it.
...
:  The idea is that a phrase match would be boosted over the
:  normal
:  token matches and would show up first in the listing. Let

This is starting to smell like an XY Problem...
http://people.apache.org/~hossman/#xyproblem

...you mentioned wanting prefix type queries to work, but that seems to be 
based on your initial approach of using an exact (ie: untokenized) field 
for your matches -- all of your examples seem to want matching at a word 
level, not partial words.

If your ultimate goal is just that 'exact' matches score higher than 
documents containing all of the same words in a different order (which 
should score higher than docs only containing a few of the words) then i 
think you are just making things harder for yourself than you really need 
... defType=dismax should be able to solve all of your problems -- just 
specify the field(s) you want to search in the qf and pf params and 
documents with all the words in a phrase will appear first.
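A sketch of such a request, with hypothetical field names (title, body)
standing in for whatever fields are actually searched:

q=some search words&defType=dismax&qf=title^2 body&pf=title^10 body^5

qf scores individual word matches across the fields, while pf adds an implicit
phrase boost so documents with the words in order rank first.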



-Hoss



Interesting stuff; Solr as a syslog store.

2010-02-12 Thread Antonio Lobato
Hey everyone, I don't actually have a question, but I just thought I'd 
share something really cool that I did with Solr for our company.


We run a good number of servers, well into the several hundreds, and 
naturally we need a way to centralize all of the system logs.  For a 
while we used a commercial solution to centralize and search our logs, 
but they wanted to charge us tens of thousands of dollars for just one 
gigabyte/day more of indexed data.  So I said forget it, I'll write my 
own solution!


We already use Solr for some of our other backend searching systems, so 
I came up with an idea to index all of our logs to Solr.  I wrote a 
daemon in perl that listens on the syslog port, and pointed every single 
system's syslog to forward to this single server.  From there, this 
daemon will write to a Solr indexing server after parsing them into 
fields, such as date/time, host, program, pid, text, etc.  I then wrote 
a cool javascript/ajax web front end for Solr searching, and bam.  Real 
time searching of all of our syslogs from a web interface, for no cost!


Just thought this would be a neat story to share with you all.  I've 
really grown to love Solr, it's something else!


Thanks,
-Antonio


Re: sorting

2010-02-12 Thread Chris Hostetter

:str name=bftitle^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8/str
:str name=qftitle^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8/str

FWIW: I don't think you understand what the bf param is for ... it's not 
analogous to qf and pf, it's for expressing a list of boost functions -- a 
function can be a simple field name, but that typically only makes sense 
if it's numeric.

that *may* be causing your problem, if the function parser is attempting 
to generate the FieldCache for your content fields.

: now, solr is complaining about some sorting issues on content* as they

"solr is complaining" is really vague... please explain *exactly* what the 
error message is, where you see it, what the full stack trace looks like 
if there is one, and what you did to trigger the error (ie: did it happen 
on startup?  did it happen when you executed a query? what was the full 
URL of the query?)



-Hoss



Re: sorting

2010-02-12 Thread Chris Hostetter

: that *may* be causing your problem, if the function parser is attempting 
: to generate the FieldCache for your content fields.

Yep ... that's it ... if you use a bare field name as a function, and that 
field name is not numeric, the result is an OrdFieldSource which uses the 
FieldCache.

I opened a bug to improve the error message...

https://issues.apache.org/jira/browse/SOLR-1771


-Hoss



RE: expire/delete documents

2010-02-12 Thread Fuad Efendi
 or since you specifically asked about deleting anything older
 than X days (in this example i'm assuming x=7)...
 
   <delete><query>createTime:[NOW-7DAYS TO *]</query></delete>

createTime:[* TO NOW-7DAYS]






Re: How to reindex data without restarting server

2010-02-12 Thread Chris Hostetter

: if you use the core model via solr.xml you can reload a core without having to
: to restart the servlet container,
: http://wiki.apache.org/solr/CoreAdmin

For making a schema change, the steps would be:
  - create a new_core with the new schema
  - reindex all the docs into new_core
  - SWAP old_core and new_core so all the old URLs now point at the 
new core with the new schema.
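With cores defined in solr.xml, the create and swap steps map onto plain
CoreAdmin HTTP calls; a sketch, assuming the default port and the core names
used above:

http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core&instanceDir=new_core
http://localhost:8983/solr/admin/cores?action=SWAP&core=old_core&other=new_core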

-Hoss



Re: Deleting spelll checker index

2010-02-12 Thread Chris Hostetter

: Any update on this

Patience my friend ... 5 hours after you send an email isn't long enough 
to wait before asking for any update on this -- it's just increasing the 
volume of mail everyone gets and distracting people from actual 
bugs/issues.

FWIW: this doesn't really seem directly related to the thread you
initially started about Deleting the spell checker index -- what you're
asking about now is rebuilding the spellchecker index...

:  I stopped the solr server, removed the copy field for model; now I only copy
:  make to the spellText field, and started the solr server.
:  I refreshed the dictionary by issuing the following command:
:  spellcheck.build=true&spellcheck.dictionary=default
:  So I hoped it would rebuild my dictionary, but the strange thing is that it
:  still gives a suggestion for "accrd".

that's because removing the copyField declaration doesn't change anything
about the values that have already been copied to the spellText field
-- rebuilding your spellchecker index is just re-reading the same
indexed values from that field.

:  How can i create the dictionary again by changing my schema and issuing a
:  command 
:  spellcheck.build=true&spellcheck.dictionary=default

it's just not possible.  a schema change like that doesn't magically 
undo all of the values that were already copied.



-Hoss



Re: cannot match on phrase queries

2010-02-12 Thread Kevin Osborn
It appears that omitTermFreqAndPositions is indeed the culprit. I assume it has to 
do with the fact that the index parsing of "BH-212" puts multiple terms in the 
same position.
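Since omitTermFreqAndPositions="true" discards position data at index time,
phrase queries against the field have nothing to match on. A sketch of the
same field with positions kept (a full reindex is needed for the change to
take effect):

<field name="model" type="text" indexed="true" stored="true" omitNorms="true"/>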






Re: Solr 1.4: Full import FileNotFoundException

2010-02-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
concurrent imports are not allowed in DIH, unless u setup multiple DIH instances

On Sat, Feb 13, 2010 at 7:05 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : I have noticed that when I run concurrent full-imports using DIH in Solr
 : 1.4, the index ends up getting corrupted. I see the following in the log

 I'm fairly confident that concurrent imports won't work -- but it
 shouldn't corrupt your index -- even if the DIH didn't actively check for
 this type of situation, the underlying Lucene LockFactory should ensure
 that one of the imports wins ... you'll need to tell us what kind of
 filesystem you are using, and show us the relevant settings from your
 solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH,
 etc...)

 At worst you should get a lock time out exception.

 : But I looked at:
 : 
 http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html
 :
 : and was under the impression that this issue was fixed in Solr 1.4.

 ...right, attempting to run two concurrent imports with DIH should cause
 the second one to abort immediately.




 -Hoss





-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: Solr 1.4: Full import FileNotFoundException

2010-02-12 Thread Chris Hostetter

: concurrent imports are not allowed in DIH, unless u setup multiple DIH 
instances

Right, but that's not the issue -- the question is whether attempting 
to do so might be causing index corruption (either because of a bug or 
because of some possibly really odd config we currently know nothing about)





-Hoss



parsing strings into phrase queries

2010-02-12 Thread Kevin Osborn
Right now if I have the query model:(Nokia BH-212V), the parser turns this into 
+(model:nokia model:"bh 212 v"). The problem is that I might have a model 
called "Nokia BH-212", so this is completely missed. In my case, I would like my 
query to be +(model:nokia model:bh model:212 model:v).

This is my schema for the field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>



  

Re: Interesting stuff; Solr as a syslog store.

2010-02-12 Thread Olivier Dobberkau

Am 13.02.2010 um 03:02 schrieb Antonio Lobato:

 Just thought this would be a neat story to share with you all.  I've really 
 grown to love Solr, it's something else!

Hi Antonio,

Great.

Would you also share the source code somewhere! 
May the Source be with you. 

Thanks.

Olivier