Re: Configure Collection Distribution in Solr 1.3

2009-06-12 Thread Aleksander M. Stensby
As some people have mentioned here on this mailing list, the Solr 1.3  
distribution scripts (snappuller / snapshooter etc.) do not work on Windows.  
Some have indicated that it might be possible to use Cygwin, but I have my  
doubts. So unfortunately, Windows users suffer with regard to replication  
(although I would recommend that everyone use Unix for running servers ;) )


That being said, you can use Solr 1.4 (one of the nightly builds), where  
you get built-in replication that is easily configured through the Solr  
server configuration, and this works on Windows as well!
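For reference, a minimal sketch of that configuration (based on the Solr 1.4
ReplicationHandler; the host name and poll interval are placeholders). On the
master:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
  </requestHandler>

and on the slave:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>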


So, if you don't have any real reason not to upgrade, I suggest that you  
try out Solr 1.4 (which also brings lots of new features and major  
improvements!)


Cheers,
 Aleksander


On Tue, 09 Jun 2009 21:00:27 +0200, MaheshR mahesh.ray...@gmail.com  
wrote:




Hi Aleksander ,


I went through the links below and successfully configured rsync using
Cygwin on Windows XP. In the Solr documentation they mention many script
files, like rsyncd-enable, snapshooter, etc. These are all Unix-based shell
scripts. Where do I get these script files for Windows?

Any help on this would be greatly appreciated.

Thanks
MaheshR.



Aleksander M. Stensby wrote:


You'll find everything you need in the Wiki.
http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline

http://wiki.apache.org/solr/SolrCollectionDistributionScripts

If things are still unclear, I've written a guide from when we used the
Solr distribution scripts on our Lucene index earlier. You can read that
guide here:
http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53

Cheers,
  Aleksander


On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR mahesh.ray...@gmail.com
wrote:



Hi,

we configured a multi-core Solr 1.3 server in a Tomcat 6.0.18 servlet
container. It's working great. Now I need to configure Collection
Distribution to replicate index data between a master and 2 slaves.
Step-by-step instructions on configuring collection distribution between
master and slaves would be helpful.

Thanks in advance.

Thanks
Mahesh.




--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this  
e-mail









--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Query on date fields

2009-06-12 Thread Aleksander M. Stensby

Hello,
for this you can simply use the nifty date functions supplied by Solr  
(given that you have indexed your fields with the Solr date field type).


If I understand you correctly, you can achieve what you want with the  
following query:


displayStartDate:[* TO NOW] AND displayEndDate:[NOW TO *]
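For example, as a filter query against the field names from your message
(assuming they are indexed as Solr dates; the host is a placeholder):

http://localhost:8983/solr/select?q=*:*&fq=DisplayStartDate_dt:[* TO NOW] AND DisplayEndDate_dt:[NOW TO *]

(URL-encode the spaces and brackets when sending this programmatically.)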

Cheers,
 Aleksander



On Mon, 08 Jun 2009 09:17:26 +0200, prerna07 pkhandelw...@sapient.com  
wrote:





Hi,

I have two date attributes in my Indexes:

DisplayStartDate_dt
DisplayEndDate_dt

I need to fetch results where today's date lies between displayStartDate
and displayEndDate.

However, I cannot send hardcoded displayStartDate and displayEndDate values in
the query, as there are 1000 different dates in the indexes.

Please suggest the query.

Thanks,
Prerna








--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


highlighting on edgeGramTokenized field -- hightlighting incorrect bc. position not incremented..

2009-06-12 Thread Britske

Hi, 

I'm trying to highlight on a (multivalued) field (prefix2) that has
(among other things) an EdgeNGramFilterFactory defined.
Highlighting doesn't increment the start position of the highlighted
portion; in other words, the highlighted portion is always at the beginning
of the field.




for example:
for prefix2: Orlando Verenigde Staten
the query:
http://localhost:8983/solr/autocompleteCore/select?fl=prefix2,id&q=prefix2:%22ver%22&wt=xml&hl=true&hl.fl=prefix2

returns:
<em>Orl</em>ando Verenigde Staten
while it should be:
Orlando <em>Ver</em>enigde Staten

the field def:

<fieldType name="prefix_token" class="solr.TextField"
    positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
        maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I checked that removing the EdgeNGramFilterFactory results in correct
positioning of the highlighting. (But then I can't search for ngrams...)

What am I missing? 
Thanks in advance, 
Britske



-- 
View this message in context: 
http://www.nabble.com/highlighting-on-edgeGramTokenized-field---%3E-hightlighting-incorrect-bc.-position-not-incremented..-tp23996196p23996196.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: change data dir location

2009-06-12 Thread sandeeptagore

- It should be possible to specify dataDir directly for a core in solr.xml
(over and above specifying it as a variable). It should also be possible to
pass the dataDir as a request parameter while creating a core through the
REST API.

- A simple scenario which requires this feature is when the location of the
data directory depends on runtime parameters (such as free disk space or
number of directories inside a directory).

- You could accomplish this by using symlinks if you are running Solr under
Unix.
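For example, a minimal sketch of the symlink approach (paths are
placeholders):

  ln -s /mnt/bigdisk/solr-data /usr/local/solr/core0/data

Solr then reads and writes the core's index through the linked directory.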


-- 
View this message in context: 
http://www.nabble.com/change-data-dir-location-tp23992946p23996245.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Search Phrase Wildcard?

2009-06-12 Thread Sandeep Tagore

Yes, you can search for phrases with wildcards!
There is no direct support for it, but you can achieve it like the
following...

User input:  Solr we
Query should be: (name:Solr AND (name:we* OR name:we)) OR name:"Solr we"

The query builder parses the original input and builds one that simulates a
wildcard phrase query. It looks for all the words the user entered and adds
a wildcard (*) to the last word. It also searches for the whole phrase the
user entered using a phrase query in case the whole phrase is found in the
index. This should work!
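A minimal sketch of such a query builder in Java (the field name "name"
comes from the example above; the rest is illustrative):

  // Simulate a wildcard phrase query: AND all the words together,
  // add a prefix wildcard to the last word, and OR in the exact phrase.
  String input = "Solr we".trim();
  String[] words = input.split("\\s+");
  StringBuilder q = new StringBuilder("(");
  for (int i = 0; i < words.length; i++) {
      if (i > 0) q.append(" AND ");
      if (i == words.length - 1) {
          // last word: match it exactly or as a prefix
          q.append("(name:").append(words[i]).append("* OR name:")
           .append(words[i]).append(")");
      } else {
          q.append("name:").append(words[i]);
      }
  }
  q.append(") OR name:\"").append(input).append("\"");
  // q.toString() -> (name:Solr AND (name:we* OR name:we)) OR name:"Solr we"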

let me know if you have any issues...
-- 
View this message in context: 
http://www.nabble.com/Search-Phrase-Wildcard--tp23978330p23996409.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Getting details from delete

2009-06-12 Thread Sandeep Tagore

Anything sent with a delete query will be deleted. It doesn't give you the
details of the deleted records.
For example, if you send a command like
<delete><id>20070424150841</id></delete>, it will delete the record with id
20070424150841, but it will not give you the record details, or tell you if
it was already deleted.
We would need to be able to send Solr some query like
"http://localhost:8080/solr/delete?id:20070424150841&name:deleted_record".
But I don't think that we have this option now.
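One workaround sketch: before deleting, query for the id with rows=0 and
check numFound, e.g.

  http://localhost:8080/solr/select?q=id:20070424150841&rows=0

numFound will be 1 if the record still exists and 0 if it was already
deleted (two requests, and not atomic, but it at least tells you whether
the delete will remove anything).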
-- 
View this message in context: 
http://www.nabble.com/Getting-details-from-%3Cdelete%3E-tp23982798p23996672.html
Sent from the Solr - User mailing list archive at Nabble.com.



Custom Request handler Error:

2009-06-12 Thread Noor

hi,
I am new to Apache Solr.
I need to create a custom request handler class, so I created a new one
and changed the solrconfig.xml file as follows:

  <requestHandler name="/select" class="solr.my.MyCustomHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="q">tandem</str>
      <str name="debugQuery">true</str>
    </lst>
  </requestHandler>

And in my Java class, the code is:

public class MyCustomHandler extends RequestHandlerBase {
  public CoreContainer coreContainer; // note: this field is never initialized

  public void handleRequestBody(SolrQueryRequest request,
      SolrQueryResponse response) throws Exception {
    SolrCore coreToRequest = coreContainer.getCore("core2");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("echoParams", "explicit");
    params.set("q", "text");
    params.set("debugQuery", "true");
    request = new LocalSolrQueryRequest(coreToRequest, params);
    SolrRequestHandler reqHandler =
        coreToRequest.getRequestHandler("/select");
    coreToRequest.execute(reqHandler, request, response);
    coreToRequest.close();
    request.close();
  }

  // the abstract methods - getDescription(), getSourceId(), getSource(),
  // getVersion() - are overridden, but these methods don't have any
  // implementations.
}


But if I search for any text in my webapp from the browser, I get an HTTP 500
error.

I don't know how the CoreContainer is initialized.
Please, anyone, give me the solution...

thanks and regards,
Mohamed


Re: DataImportHandler backwards compatibility

2009-06-12 Thread Kevin Lloyd
Thanks for the info. Just FYI, I've decided to retrofit the 1.3  
DataImportHandler with the JDBC driver params functionality to get us  
around the OOM error problem with as few changes as possible.


kevin


On 11 Jun 2009, at 14:42, Shalin Shekhar Mangar wrote:


On Thu, Jun 11, 2009 at 6:42 PM, Kevin Lloyd kll...@lulu.com wrote:



I'm in the process of implementing a DataImportHandler config for  
Solr 1.3
and I've run into the PostgreSQL/JDBC Out Of Memory problem.  
Whilst the

solution is documented on the wiki FAQ page:

http://wiki.apache.org/solr/DataImportHandlerFaq

it appears that the JDBC driver parameters were implemented in
DataImportHandler after the 1.3 release.



Yes, those parameters are new in 1.4 (we should note that on the  
wiki).



I was wondering if it would be safe to take a nightly build of just  
the

DataImportHandler contrib and run it against a Solr 1.3 installation?



Solr 1.4 has a rollback command which 1.3 did not have. So, you'd  
need to
hack the DataImportHandler code to remove references to  
RollBackCommand. You

can use the 1.4 dih jar with 1.3 if you comment out the code in
SolrWriter.rollback method, remove the import of  
RollbackUpdateCommand and

recompile.

--
Regards,
Shalin Shekhar Mangar.




Re: DataImportHandler backwards compatibility

2009-06-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
you can just drop the new JdbcDataSource.java into the 1.3 release
(and build it) and it should be just fine.



On Fri, Jun 12, 2009 at 5:55 PM, Kevin Lloydkll...@lulu.com wrote:
 Thanks for the info. Just FYI, I've decided to retrofit the 1.3
 DataImportHandler with the JDBC driver params functionality to get us around
 the OOM error problem with as few changes as possible.

 kevin


 On 11 Jun 2009, at 14:42, Shalin Shekhar Mangar wrote:

 On Thu, Jun 11, 2009 at 6:42 PM, Kevin Lloyd kll...@lulu.com wrote:


 I'm in the process of implementing a DataImportHandler config for Solr
 1.3
 and I've run into the PostgreSQL/JDBC Out Of Memory problem. Whilst the
 solution is documented on the wiki FAQ page:

 http://wiki.apache.org/solr/DataImportHandlerFaq

 it appears that the JDBC driver parameters were implemented in
 DataImportHandler after the 1.3 release.


 Yes, those parameters are new in 1.4 (we should note that on the wiki).


 I was wondering if it would be safe to take a nightly build of just the
 DataImportHandler contrib and run it against a Solr 1.3 installation?


 Solr 1.4 has a rollback command which 1.3 did not have. So, you'd need to
 hack the DataImportHandler code to remove references to RollBackCommand.
 You
 can use the 1.4 dih jar with 1.3 if you comment out the code in
 SolrWriter.rollback method, remove the import of RollbackUpdateCommand and
 recompile.

 --
 Regards,
 Shalin Shekhar Mangar.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Custom Request handler Error:

2009-06-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
is there any error on the console?

On Fri, Jun 12, 2009 at 4:26 PM, Noornoo...@opentechindia.com wrote:
 hi,
 I am new to Apache Solr.
 I need to create a custom request handler class, so I created a new one and
 changed the solrconfig.xml file as follows:

 <requestHandler name="/select" class="solr.my.MyCustomHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <str name="q">tandem</str>
     <str name="debugQuery">true</str>
   </lst>
 </requestHandler>

 And in my Java class, the code is:

 public class MyCustomHandler extends RequestHandlerBase {
   public CoreContainer coreContainer; // note: never initialized

   public void handleRequestBody(SolrQueryRequest request,
       SolrQueryResponse response) throws Exception {
     SolrCore coreToRequest = coreContainer.getCore("core2");
     ModifiableSolrParams params = new ModifiableSolrParams();
     params.set("echoParams", "explicit");
     params.set("q", "text");
     params.set("debugQuery", "true");
     request = new LocalSolrQueryRequest(coreToRequest, params);
     SolrRequestHandler reqHandler =
         coreToRequest.getRequestHandler("/select");
     coreToRequest.execute(reqHandler, request, response);
     coreToRequest.close();
     request.close();
   }

   // the abstract methods - getDescription(), getSourceId(), getSource(),
   // getVersion() - are overridden, but have no implementations.
 }

 But if I search for any text in my webapp from the browser, I get an HTTP 500
 error.
 I don't know how the CoreContainer is initialized.
 Please, anyone, give me the solution...

 thanks and regards,
 Mohamed




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Custom Request handler Error:

2009-06-12 Thread noor

Yes,

a NullPointerException, on the line:

SolrCore coreToRequest = coreContainer.getCore("core2");


Noble Paul നോബിള്‍ नोब्ळ् wrote:

is there any error on the console?

On Fri, Jun 12, 2009 at 4:26 PM, Noornoo...@opentechindia.com wrote:
  

hi,
I am new to Apache Solr.
I need to create a custom request handler class, so I created a new one and
changed the solrconfig.xml file as follows:

<requestHandler name="/select" class="solr.my.MyCustomHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="q">tandem</str>
    <str name="debugQuery">true</str>
  </lst>
</requestHandler>

And in my Java class, the code is:

public class MyCustomHandler extends RequestHandlerBase {
  public CoreContainer coreContainer; // note: never initialized

  public void handleRequestBody(SolrQueryRequest request,
      SolrQueryResponse response) throws Exception {
    SolrCore coreToRequest = coreContainer.getCore("core2");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("echoParams", "explicit");
    params.set("q", "text");
    params.set("debugQuery", "true");
    request = new LocalSolrQueryRequest(coreToRequest, params);
    SolrRequestHandler reqHandler =
        coreToRequest.getRequestHandler("/select");
    coreToRequest.execute(reqHandler, request, response);
    coreToRequest.close();
    request.close();
  }

  // the abstract methods - getDescription(), getSourceId(), getSource(),
  // getVersion() - are overridden, but have no implementations.
}

But if I search for any text in my webapp from the browser, I get an HTTP 500
error.
I don't know how the CoreContainer is initialized.
Please, anyone, give me the solution...

thanks and regards,
Mohamed






  




Identification of matching by field

2009-06-12 Thread hpn1975 nasc
Hi,

  Is it possible to identify the docId of a document where a match occurred
on a specific Term or term query?

   For example: I have a document with several fields, and my query contains a
subquery for each field. I need to know the docIds for which a given subquery
(QueryTermX) finds a value. I know that I can verify the match in the method
below, but I think it will not perform well.

Searcher searcher = new IndexSearcher(indexReader);
final BitSet bits = new BitSet(indexReader.maxDoc());
searcher.search(query, new HitCollector() {
    public void collect(int doc, float score) {
        try {
            // load the stored document and compare its Name field
            if (indexReader.document(doc).get("Name").equals(search_word)) {
                bits.set(doc);
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
});
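A faster alternative sketch, assuming a Lucene 2.x IndexReader and that the
Name field is indexed: walk the postings for the term directly instead of
loading each stored document inside collect():

  TermDocs td = indexReader.termDocs(new Term("Name", search_word));
  while (td.next()) {
      bits.set(td.doc()); // docId of each document containing the term
  }
  td.close();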
  Thanks


Re: Faceting on text fields

2009-06-12 Thread Stanislaw Osinski
Hi,

Sorry for being late to the party, let me try to clear some doubts about
Carrot2.

Do you know under what circumstances or for what applications we should
 cluster the whole corpus of documents vs. just the search results?


I think it depends on what you're trying to achieve. If you'd like to give
the users some alternative way of exploring the search results by organizing
them into semantically related groups (search results clustering), Carrot2
would be the appropriate tool. Its algorithms are designed to work with
small input (up to ~1000 results) and try to provide meaningful labels for
each cluster. Currently, Carrot2 has two algorithms: an implementation of
Suffix Tree Clustering (STC, a classic in search results clustering
research, designed by O. Zamir, implemented by Dawid Weiss) and Lingo
(designed and implemented by myself). STC is very fast compared to Lingo,
but the latter will usually get you better clusters. Some comparison of the
algorithms is here: http://project.carrot2.org/algorithms.html, but
ultimately, I'd encourage you to experiment (e.g. using Clustering
Workbench). For best results, I'd recommend feeding the algorithms with
contextual snippets generated based on the user's query. If the summary
could consist of complete sentence(s) containing the query (as opposed to
individual words delimited by "..."), you should be getting even nicer
labels.

One important thing for search results clustering is that it is done
on-line, so it will add extra time to each search query your server handles.
Plus, to get reasonable clusters, you'd need to fetch at least 50 documents
from your index, which may put more load on the disks as well (sometimes
clustering time may only be a fraction of the time required to get the
documents from the index).

Finally, to compare search results clustering with facets: UI-wise they may
look similar, but I'd say they're two different things that complement each
other. While the list of facets and their values is fairly static (brand
names etc.), clusters are less stable -- they're generated dynamically for
each search and will vary across queries. Plus, as for any other
unsupervised machine learning technique, your clusters will never be 100%
correct (as opposed to facets). Almost always you'll be getting one or two
clusters that don't make much sense.

When it comes to clustering the whole collection, it might be useful in a
couple of scenarios: a) if you wanted to get some high level overview of
what's in your collection, b) if you'd wanted to e.g. use clusters to
re-rank the search results presented to the user (implicit clustering:
showing a few documents from each cluster), c) if you wanted to distribute
your index based on the semantics of the documents (wild guess, I'm not sure
if anyone tried that in practice). In general, I feel clustering the whole
index is much harder than search results clustering not only because of the
different scale, but also because you'd need to tune the algorithm for your
specific needs and data. For example, in scenario a) and a collection of 1M
documents: how many top-level clusters do you generate? 10? 1000? If it's
10, the clusters may end up too general / meaningless, and it might be hard to
describe them concisely. If it's 1000, clusters are likely to be more
focused, but hard to browse... I must admit I haven't followed Mahout too
closely, maybe there is some nice way of resolving these problems.

If you have any other questions about Carrot2, I'll try to answer them here.
Alternatively, feel free to join Carrot2 mailing lists.

Thanks,

Staszek

--
http://www.carrot2.org


Re: fq vs. q

2009-06-12 Thread Michael Ludwig

Michael Ludwig schrieb:

Martin Davidsson schrieb:

I've tried to read up on how to decide, when writing a query, what
criteria goes in the q parameter and what goes in the fq parameter,
to achieve optimal performance. Is there [...] some kind of rule of
thumb to help me decide how to split things up when querying against
one or more fields.


This is a good question. I don't know if there is any such rule. I'm
going to sum up my understanding of filter queries hoping that the
pros will point out any flaws in my assumptions.


I've summarized what I've learnt about filter queries on this page:

http://wiki.apache.org/solr/FilterQueryGuidance

Michael Ludwig


Re: Configure Collection Distribution in Solr 1.3

2009-06-12 Thread MaheshR

Thank you very much. I will try using a Solr nightly build.

Thanks
Mahesh R


Aleksander M. Stensby wrote:
 
 As some people have mentioned here on this mailing list, the Solr 1.3  
 distribution scripts (snappuller / snapshooter etc.) do not work on Windows.  
 Some have indicated that it might be possible to use Cygwin, but I have my  
 doubts. So unfortunately, Windows users suffer with regard to replication  
 (although I would recommend that everyone use Unix for running servers ;) )
 
 That being said, you can use Solr 1.4 (one of the nightly builds), where  
 you get built-in replication that is easily configured through the Solr  
 server configuration, and this works on Windows as well!
 
 So, if you don't have any real reason not to upgrade, I suggest that you  
 try out Solr 1.4 (which also brings lots of new features and major  
 improvements!)
 
 Cheers,
   Aleksander
 
 
 On Tue, 09 Jun 2009 21:00:27 +0200, MaheshR mahesh.ray...@gmail.com  
 wrote:
 

 Hi Aleksander ,


 I went through the links below and successfully configured rsync using
 Cygwin on Windows XP. In the Solr documentation they mention many script
 files, like rsyncd-enable, snapshooter, etc. These are all Unix-based shell
 scripts. Where do I get these script files for Windows?

 Any help on this would be greatly appreciated.

 Thanks
 MaheshR.



 Aleksander M. Stensby wrote:

 You'll find everything you need in the Wiki.
 http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline

 http://wiki.apache.org/solr/SolrCollectionDistributionScripts

 If things are still unclear, I've written a guide from when we used the
 Solr distribution scripts on our Lucene index earlier. You can read that
 guide here:
 http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53

 Cheers,
   Aleksander


 On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR mahesh.ray...@gmail.com
 wrote:


 Hi,

 we configured a multi-core Solr 1.3 server in a Tomcat 6.0.18 servlet
 container. It's working great. Now I need to configure Collection
 Distribution to replicate index data between a master and 2 slaves.
 Step-by-step instructions on configuring collection distribution between
 master and slaves would be helpful.

 Thanks in advance.

 Thanks
 Mahesh.



 --
 Aleksander M. Stensby
 Lead software developer and system architect
 Integrasco A/S
 www.integrasco.no
 http://twitter.com/Integrasco

 Please consider the environment before printing all or any of this  
 e-mail



 
 
 
 -- 
 Aleksander M. Stensby
 Lead software developer and system architect
 Integrasco A/S
 www.integrasco.no
 http://twitter.com/Integrasco
 
 Please consider the environment before printing all or any of this e-mail
 
 

-- 
View this message in context: 
http://www.nabble.com/Configure-Collection-Distribution-in-Solr-1.3-tp23927332p23999342.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Custom Request handler Error:

2009-06-12 Thread noor

I solved this NullPointerException by making the following changes.

In the Java code:

public void handleRequestBody(SolrQueryRequest request,
    SolrQueryResponse response) throws Exception {
  SolrCore coreToRequest = request.getCore(); // was: coreContainer.getCore("core2")
  ...
}

and in solrconfig.xml:

<requestHandler name="/select" class="solr.my.MyCustomHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="q">tandem</str>
    <str name="debugQuery">true</str>
  </lst>
</requestHandler>

Now my webapp runs fine at
http://localhost:8983/mysearch
and searching is also working fine.
But these requests are not going through my custom handler, so I suspect
the searching is being done by the wrong handler.

On the Solr admin statistics page, my custom handler's request count
under QueryHandler remains 0; it doesn't get incremented when I search
for something. Instead, the StandardRequestHandler's request count is
incremented.

And another thing: how do we debug Solr?
Please, anybody, help me solve this...

Thanks in advance.

Noble Paul നോബിള്‍ नोब्ळ् wrote:

is there any error on the console?

On Fri, Jun 12, 2009 at 4:26 PM, Noornoo...@opentechindia.com wrote:
  

hi,
I am new to Apache Solr.
I need to create a custom request handler class, so I created a new one and
changed the solrconfig.xml file as follows:

<requestHandler name="/select" class="solr.my.MyCustomHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="q">tandem</str>
    <str name="debugQuery">true</str>
  </lst>
</requestHandler>

And in my Java class, the code is:

public class MyCustomHandler extends RequestHandlerBase {
  public CoreContainer coreContainer; // note: never initialized

  public void handleRequestBody(SolrQueryRequest request,
      SolrQueryResponse response) throws Exception {
    SolrCore coreToRequest = coreContainer.getCore("core2");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("echoParams", "explicit");
    params.set("q", "text");
    params.set("debugQuery", "true");
    request = new LocalSolrQueryRequest(coreToRequest, params);
    SolrRequestHandler reqHandler =
        coreToRequest.getRequestHandler("/select");
    coreToRequest.execute(reqHandler, request, response);
    coreToRequest.close();
    request.close();
  }

  // the abstract methods - getDescription(), getSourceId(), getSource(),
  // getVersion() - are overridden, but have no implementations.
}

But if I search for any text in my webapp from the browser, I get an HTTP 500
error.
I don't know how the CoreContainer is initialized.
Please, anyone, give me the solution...

thanks and regards,
Mohamed






  




Re: Getting details from delete

2009-06-12 Thread Yonik Seeley
On Thu, Jun 11, 2009 at 10:46 AM, Jacob Elderjel...@locamoda.com wrote:
 Is there any way to get the number of deleted records from a delete request?

Nope.  I avoided adding it initially because I thought it might get
difficult to calculate that data in the future.
That's now come true - Lucene now handles the delete and even buffers it
until later.  So it's not really possible to get the number at
the time you send in the delete.

-Yonik
http://www.lucidimagination.com


Re: fq vs. q

2009-06-12 Thread Shalin Shekhar Mangar
On Fri, Jun 12, 2009 at 7:09 PM, Michael Ludwig m...@as-guides.com wrote:

 I've summarized what I've learnt about filter queries on this page:

 http://wiki.apache.org/solr/FilterQueryGuidance


Wow! This is great! Thanks for taking the time to write this up Michael.

I've added a section on analysis, scoring and faceting aspects.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Custom Request handler Error:

2009-06-12 Thread Shalin Shekhar Mangar
On Fri, Jun 12, 2009 at 8:07 PM, noor noo...@opentechindia.com wrote:


  <requestHandler name="/select" class="solr.my.MyCustomHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="q">tandem</str>
      <str name="debugQuery">true</str>
    </lst>
  </requestHandler>

  Now my webapp runs fine at
  http://localhost:8983/mysearch
  and searching is also working fine.
  But these requests are not going through my custom handler.


Specify the full package of your handler class. Packages starting with
"solr." are loaded in a special way.
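For example (the package name here is just a placeholder):

  <requestHandler name="/select" class="com.mycompany.search.MyCustomHandler">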

-- 
Regards,
Shalin Shekhar Mangar.


Stats for all documents and not current search

2009-06-12 Thread Vincent Pérès

Hello,

I need to retrieve the stats of my index (using StatsComponent). It's not a
problem when my query is empty, but the stats are updated according to the
current search... and I need the stats of the whole index every time.
I'm currently doing two requests (one with an empty keyword to get the stats,
one to get the results). Any idea which could save me one request?
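For what it's worth, the stats-only request can at least be made cheap by
asking for zero rows, e.g. (the field name "price" is a placeholder for
your stats field):

http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=price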

Thanks !
Vincent
-- 
View this message in context: 
http://www.nabble.com/Stats-for-all-documents-and-not-current-search-tp24001883p24001883.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: highlighting on edgeGramTokenized field -- hightlighting incorrect bc. position not incremented..

2009-06-12 Thread Otis Gospodnetic

Britske,

I'd have to dig, but there are a couple of issues in Lucene's JIRA (the 
actual ngram code is part of Lucene) that have to do with ngram positions.  I 
have a feeling that may be the problem.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Britske gbr...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Friday, June 12, 2009 6:15:36 AM
 Subject: highlighting on edgeGramTokenized field -- hightlighting incorrect 
 bc. position not incremented..
 
 
 Hi, 
 
 I'm trying to highlight on a (multivalued) field (prefix2) that has
 (among other things) an EdgeNGramFilterFactory defined.
 Highlighting doesn't increment the start position of the highlighted
 portion; in other words, the highlighted portion is always at the beginning
 of the field.
 
 
 
 
 for example:
 for prefix2: Orlando Verenigde Staten
 the query:
 http://localhost:8983/solr/autocompleteCore/select?fl=prefix2,id&q=prefix2:%22ver%22&wt=xml&hl=true&hl.fl=prefix2

 returns:
 <em>Orl</em>ando Verenigde Staten
 while it should be:
 Orlando <em>Ver</em>enigde Staten
 
 the field def:

 <fieldType name="prefix_token" class="solr.TextField"
     positionIncrementGap="1">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
         maxGramSize="20"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 
 I checked that removing the EdgeNGramFilterFactory results in correct
 positioning of the highlighting. (But then I can't search for ngrams...)
 
 What am I missing? 
 Thanks in advance, 
 Britske
 
 
 
 -- 
 View this message in context: 
 http://www.nabble.com/highlighting-on-edgeGramTokenized-field---%3E-hightlighting-incorrect-bc.-position-not-incremented..-tp23996196p23996196.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Efficient Sharding with date sorted queries

2009-06-12 Thread Garafola Timothy
I have a solr index which is going to grow 3x in the near future.  I'm
considering using distributed search and was contemplating what would
be the best approach to splitting the index.  Since most of the
searches performed on the index are sorted by date descending, I'm
considering splitting the index based on the created date of the
documents.

From Yonik Seeley's blog post,
http://yonik.wordpress.com/2008/02/27/distributed-search-for-solr/,
I've read that there are two phases to sharding.  The first phase
collects matching ids and documents across the shards.  Then the
second phase collects the stored fields for the documents.  I'm
assuming that this second phase's execution is limited by the number
of rows requested and the number of results.

So let's say I have 2 shards.  The first shard has docs with creation
dates from this year.  The second shard contains documents from the
previous year.  I run a Solr query requesting 10 rows sorted by date
and get 11 from the first shard and 3 from the second.  Will the
initial query only execute the first phase on the second shard?  If
so, that should result in better performance, right?


Thanks,
-Tim


Strange missing docs when reindexing with threads.

2009-06-12 Thread Alexander Wallace

Hi all!

I'm using Solr 1.3 and currently testing reindexing...

In my client app, I am sending 17494 requests to add documents... in 3 
different scenarios:


a) not using threads
b) using 1 thread
c) using 2 threads

In scenario a), everything seems to work fine... In my client log, I see
17494 requests sent to Solr; in Solr's log, I see the same number of
'add' requests received; and if I search the index, I can see the same
number of documents.

However, if I use 1 thread, I see the right number of requests in the logs,
but I only find 15k or so documents (this varies a bit every time I run
this scenario).

It gets way worse if I use 2 threads... I can see the right number of
requests in both logs, but I end up with ~600 docs in the index!

In all scenarios, I don't see any errors in the logs...

As you can imagine, I need to be able to use multiple threads to speed
up the process... It is also very concerning that I don't get any
errors anywhere...

Looking at Solr's admin stats, I also see 17494 cumulative adds, but
only a tiny fraction of the actual documents can be found...

Any clues?

BTW, these indexers work fine if I use Lucene directly...

Thanks in advance for all your help!


Stable release, trunk release - same Tomcat instance

2009-06-12 Thread Jeff Rodenburg
If I want to run the stable 1.3 release and the nightly build under the same
Tomcat instance, should that be configured as multiple solr applications, or
is there a different configuration to follow?


Re: Efficient Sharding with date sorted queries

2009-06-12 Thread Shalin Shekhar Mangar
On Fri, Jun 12, 2009 at 10:28 PM, Garafola Timothy timgaraf...@gmail.comwrote:


 So let's say I have 2 shards.  The first shard has docs with creation
 dates of this year.  The Second shard contains documents from the
 previous year.  I run a solr query requesting 10 rows sorted by date
 and get 11 from the first shard and 3 from the second.


No, you cannot request a specific number of results from a shard. That is
something that Solr manages itself. It requests start+rows documents from
each shard to find the rows documents to be returned. If you really want
to get a specific number of results from a shard, make a query to that
shard alone.
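For example, with shards=shard1,shard2&start=0&rows=10, Solr asks each shard
for its top 10 documents (by the requested sort), merges them, and returns
the overall top 10; you cannot tell one shard to return 11 and the other 3.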

-- 
Regards,
Shalin Shekhar Mangar.


Re: Strange missing docs when reindexing with threads.

2009-06-12 Thread Shalin Shekhar Mangar
On Fri, Jun 12, 2009 at 11:40 PM, Alexander Wallace a...@rwmotloc.com wrote:

 Hi all!

 I'm using Solr 1.3 and currently testing reindexing...

 In my client app, I am sending 17494 requests to add documents... in 3
 different scenarios:

 a) not using threads
 b) using 1 thread
 c) using 2 threads

 In scenario a), everything seems to work fine... In my client log, I see
 17494 requests sent to Solr; in Solr's log, I see the same number of 'add'
 requests received; and if I search the index, I can see the same number of
 documents.

 However, if I use 1 thread, I see the right number of requests in the logs, but
 I only find 15k or so documents (this varies a bit every time I run this
 scenario).

 It gets way worse if I use 2 threads... I can see the right number of
 requests in both logs, but I end up with ~600 docs in the index!

 In all scenarios, I don't see any errors in the logs...

 As you can imagine, I need to be able to use multiple threads to speed up
 the process... It is also very concerning that I don't get any errors
 anywhere...

 Looking at Solr's admin stats, I also see 17494 cumulative adds, but only a
 tiny fraction of the actual documents can be found...

 Any clues?


What is the uniqueKey in your schema.xml? Is it possible that those 17494
documents have a common uniqueKey and are therefore getting overwritten?

-- 
Regards,
Shalin Shekhar Mangar.


Replication problems on 1.4

2009-06-12 Thread Phil Hagelberg

I'm trying out the replication features on 1.4 (trunk) with multiple
indices using a setup based on the example multicore config.

The first time I tried it, (replicating through the admin web
interface), it worked fine. I was a little surprised that telling one
core to replicate caused both to replicate since the docs seem to imply
that replication is done on a per-core basis, but I was happy to see
that it worked.

I wanted to replay my steps, so on the slave machine I deleted
core0/data/* and core1/data/* and restarted the server. I restarted the
server on master just to be sure. Now replication doesn't work at
all. I've tried it both through the admin interface and by curl:

  curl http://localhost:8983/solr/core0/replication?command=snappull

The response from curl indicates that the replication was successful,
but nothing happened; my slave index is still empty.

My only guess as to what's going wrong here is that deleting the
coreN/data directory is not a good way to reset a core back to its
initial condition. Maybe there's a bit of state somewhere that's making
the slave think that it's already up-to-date with this master and so it
doesn't need to do any replicating? But this is a wild conjecture; I'd
appreciate any tips on where to look for what's going wrong.

As to why the replication claims to be successful, I've no idea. Am I
missing some crucial log file that explains what's going wrong?

It's also possible that this stuff is still in a heavy state of
development such that it shouldn't be expected to work by casual users,
if that is the case I can go back to the external-script-based
replication features of 1.3.

thanks,
Phil Hagelberg
http://technomancy.us


Re: fq vs. q

2009-06-12 Thread Fergus McMenemie
On Fri, Jun 12, 2009 at 7:09 PM, Michael Ludwig m...@as-guides.com wrote:

 I've summarized what I've learnt about filter queries on this page:

 http://wiki.apache.org/solr/FilterQueryGuidance


Wow! This is great! Thanks for taking the time to write this up Michael.

I've added a section on analysis, scoring and faceting aspects.

-- 
Regards,
Shalin Shekhar Mangar.

A very useful article.

If I could chip in with another stupid but related issue. 

The article could explain the difference between fq= and
facet.query= and when you should use one in preference to
the other.

Regards Fergus.

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Replication problems on 1.4

2009-06-12 Thread Phil Hagelberg
Phil Hagelberg p...@hagelb.org writes:

 My only guess as to what's going wrong here is that deleting the
 coreN/data directory is not a good way to reset a core back to its
 initial condition. Maybe there's a bit of state somewhere that's making
 the slave think that it's already up-to-date with this master and so it
 doesn't need to do any replicating? But this is a wild conjecture; I'd
 appreciate any tips on where to look for what's going wrong.

OK, so I inserted some more documents into the master, and now
replication works. I get the feeling it may be due to this line in the
master's solrconfig.xml:

  <str name="replicateAfter">commit</str>

Now this is confusing since it seems that the timing of replication is
not up to the master, it's up to the slave. The slave's config has
settings for the interval at which to replicate, and you POST to the
slave to force a replication. So why is there a setting on the master to
control when replication happens?

My only interpretation from the config files is that the master has some sort
of "you may not replicate from me unless..." condition. This seems pretty
undesirable since you may have a slave that needs to get replicated from
the master immediately; it shouldn't have to wait for a commit on the
master. Am I misunderstanding what's going on here? It certainly isn't
clear from the documents on the wiki, so I'm kind of grasping in the
dark. Perhaps I'm missing something.

thanks,
Phil Hagelberg
http://technomancy.us


Re: Stable release, trunk release - same Tomcat instance

2009-06-12 Thread Jeff Rodenburg
Um, yes this works.

On Fri, Jun 12, 2009 at 11:12 AM, Jeff Rodenburg
jeff.rodenb...@gmail.comwrote:

 If I want to run the stable 1.3 release and the nightly build under the
 same Tomcat instance, should that be configured as multiple solr
 applications, or is there a different configuration to follow?



RE: fq vs. q

2009-06-12 Thread Ensdorf Ken


 -Original Message-
 From: Fergus McMenemie [mailto:fer...@twig.me.uk]
 Sent: Friday, June 12, 2009 3:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: fq vs. q

 On Fri, Jun 12, 2009 at 7:09 PM, Michael Ludwig m...@as-guides.com
 wrote:
 
  I've summarized what I've learnt about filter queries on this page:
 
  http://wiki.apache.org/solr/FilterQueryGuidance
 
 
 Wow! This is great! Thanks for taking the time to write this up
 Michael.
 
 I've added a section on analysis, scoring and faceting aspects.
 

+1 definitely a great article

I ran into this very issue recently as we are using a freshness filter for 
our data that can be 6/12/18 months etc.  I discovered that even though we 
were only indexing with day-level granularity, we were specifying the query by 
computing a date down to the second, and thus virtually every filter was unique. 
 It's amazing how something this simple could bring Solr to its knees on a 
large data set.  By simply changing the filter to date:[NOW-18MONTHS TO NOW] 
or equivalent, the problem vanishes.

It does bring up an interesting question though - how is NOW treated with 
respect to the cache key?  Does Solr translate it to a date first?  If so, how 
does it determine the granularity?  If not, is there any mechanism to flush the 
cache when the corresponding result set changes?

-Ken


Re: Strange missing docs when reindexing with threads.

2009-06-12 Thread Alexander Wallace
Right after I sent the email, I went on and checked for uniqueness of the
documents...

In theory they were all supposed to be unique... But I've realized that
the platform I'm using to reindex is delaying sending the requests;
this, in combination with my reindexers reusing document fields (instead
of creating new instances, to save on GC), led to the same document being
sent many times with invalid data...

I am fairly sure now that this is the source of my problem... My
reindexers originally used the Lucene writer directly, which blocks thread
execution until the document is added to the index, and the new
framework I'm using uses messaging, which releases control back to the
thread before the documents are actually sent to be indexed; my threads
update the document fields in the meantime, so the data written to the
index is in transition and invalid...


I've made an adjustment to my reindexing threads to ensure new instances
of everything are used... I will test it shortly...
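For illustration, a minimal SolrJ sketch of the safe pattern (exception
handling omitted; Item and the field names are placeholders):

  CommonsHttpSolrServer server =
      new CommonsHttpSolrServer("http://localhost:8983/solr");
  for (Item item : items) {
      SolrInputDocument doc = new SolrInputDocument(); // fresh instance per add
      doc.addField("id", item.getId());
      doc.addField("text", item.getText());
      server.add(doc); // safe even if the add is queued before being sent
  }
  server.commit();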


But you point out exactly why I have fewer documents than 'add' requests...

Thanks!

Shalin Shekhar Mangar wrote:

On Fri, Jun 12, 2009 at 11:40 PM, Alexander Wallace a...@rwmotloc.com wrote:

  

Hi all!

I'm using Solr 1.3 and currently testing reindexing...

In my client app, I am sending 17494 requests to add documents... in 3
different scenarios:

a) not using threads
b) using 1 thread
c) using 2 threads

In scenario a), everything seems to work fine... In my client log, I see
17494 requests sent to Solr; in Solr's log, I see the same number of 'add'
requests received; and if I search the index, I can see the same number of
documents.

However, if I use 1 thread, I see the right number of requests in the logs, but
I only find 15k or so documents (this varies a bit every time I run this
scenario).

It gets way worse if I use 2 threads... I can see the right number of
requests in both logs, but I end up with ~600 docs in the index!

In all scenarios, I don't see any errors in the logs...

As you can imagine, I need to be able to use multiple threads to speed up
the process... It is also very concerning that I don't get any errors
anywhere...

Looking at Solr's admin stats, I also see 17494 cumulative adds, but only a
tiny fraction of the actual documents can be found...

Any clues?




What is the uniqueKey in your schema.xml? Is it possible that those 17494
documents have a common uniqueKey and are therefore getting overwritten?

  


localsolr and collapse in Solr 1.4

2009-06-12 Thread Nirkhe, Chandra
Hi,

Has anyone successfully used localsolr and collapse together in Solr
1.4? I am getting two result sets, one from localsolr and the other from
collapse. I need a merged result set.

Any pointers?



Using The Tomcat Container

2009-06-12 Thread Mukerjee, Neiloy (Neil)
I am installing Solr 1.3.0 and currently have been trying to use Tomcat 5.5. 
This hasn't been working for me so far, and I have been told (unofficially) 
that my installation would go more smoothly if I were to use Tomcat 6. Does 
anyone have experience with Solr 1.3 and Tomcat 5.5?


Re: Replication problems on 1.4

2009-06-12 Thread Shalin Shekhar Mangar
On Sat, Jun 13, 2009 at 1:25 AM, Phil Hagelberg p...@hagelb.org wrote:


 OK, so I inserted some more documents into the master, and now
 replication works. I get the feeling it may be due to this line in the
 master's solrconfig.xml:

  <str name="replicateAfter">commit</str>

 Now this is confusing since it seems that the timing of replication is
 not up to the master, it's up to the slave. The slave's config has
 settings for the interval at which to replicate, and you POST to the
 slave to force a replication. So why is there a setting on the master to
 control when replication happens?

 My only interpretation from the config files is the master has some sort
 of you may not replicate from me unless conditions. This seems pretty
 undesirable since you may have a slave that needs to get replicated from
 the master immediately; it shouldn't have to wait for a commit on the
 master. Am I misunderstanding what's going on here? It certainly isn't
 clear from the documents on the wiki, so I'm kind of grasping in the
 dark. Perhaps I'm missing something.


You are right. In Solr/Lucene, a commit exposes updates to searchers. So you
need to call commit on the master for the slave to pick up the changes.
Replicating changes from the master and then not exposing new documents to
searchers does not make sense. However, there is a lot of work going on in
Lucene to enable near real-time search (exposing documents to searchers as
soon as possible). Once those features are mature enough, Solr's replication
will follow suit.
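For example, a commit on the master followed by a forced pull on the slave
(host names are placeholders):

  curl http://master-host:8983/solr/update --data-binary '<commit/>' -H 'Content-type:text/xml'
  curl http://slave-host:8983/solr/replication?command=snappull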

-- 
Regards,
Shalin Shekhar Mangar.


Re: Replication problems on 1.4

2009-06-12 Thread Phil Hagelberg
Shalin Shekhar Mangar shalinman...@gmail.com writes:

 You are right. In Solr/Lucene, a commit exposes updates to searchers. So you
 need to call commit on the master for the slave to pick up the changes.
 Replicating changes from the master and then not exposing new documents to
 searchers does not make sense. However, there is a lot of work going on in
 Lucene to enable near real-time search (exposing documents to searchers as
 soon as possible). Once those features are mature enough, Solr's replication
 will follow suit.

I understand that; it's totally reasonable.

What it doesn't explain is what happened in my case: the master added a
bunch of docs, committed, and then the slave replicated fine. Then the
slave lost all its data (due to me issuing an rm -rf of the data
directory, but let's say it happened due to a disk failure or something)
and tried to replicate again, but got zero docs. Once the master had
another commit issued, the slave could now replicate properly.

I would expect in this case the slave should be able to replicate after
losing its data but before the second commit. I can see why the master
would not expose uncommitted documents, but I can't see why it would
refuse to allow _any_ of its index to be replicated from.

I feel like I'm missing a piece of the picture here.

-Phil


Re: Using The Tomcat Container

2009-06-12 Thread Shalin Shekhar Mangar
On Sat, Jun 13, 2009 at 2:25 AM, Mukerjee, Neiloy (Neil) 
neil.muker...@alcatel-lucent.com wrote:

 I am installing Solr 1.3.0, and currently have been trying to use Tomcat
 5.5. This hasn't been working so far for me, and I have been told
 (unofficially) that my installation would go more smoothly if I were to use
 Tomcat 6. Does anyone have experiencing with Solr 1.3 and Tomcat 5.5?


Can you elaborate on what is not working for you? We use Solr with Tomcat
5.5 and it works fine.

-- 
Regards,
Shalin Shekhar Mangar.


Re: highlighting on edgeGramTokenized field -- hightlighting incorrect bc. position not incremented..

2009-06-12 Thread Britske

Thanks, I'll check it out. 


Otis Gospodnetic wrote:
 
 
 Britske,
 
  I'd have to dig, but there are a couple of issues in Lucene's JIRA
  (the actual ngram code is part of Lucene) that have to do with ngram
  positions.  I have a feeling that may be the problem.
 
  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
 From: Britske gbr...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Friday, June 12, 2009 6:15:36 AM
 Subject: highlighting on edgeGramTokenized field -- hightlighting
 incorrect bc. position not incremented..
 
 
 Hi, 
 
 I'm trying to highlight on a (multivalued) field (prefix2) that has
 (among other things) an EdgeNGramFilterFactory defined.
 Highlighting doesn't increment the start position of the highlighted
 portion; in other words, the highlighted portion is always at the
 beginning
 of the field.
 
 
 
 
 for example:
 for prefix2: Orlando Verenigde Staten
 the query:
 http://localhost:8983/solr/autocompleteCore/select?fl=prefix2,id&q=prefix2:%22ver%22&wt=xml&hl=true&hl.fl=prefix2

 returns:
 <em>Orl</em>ando Verenigde Staten
 while it should be:
 Orlando <em>Ver</em>enigde Staten
 
 the field def:

 <fieldType name="prefix_token" class="solr.TextField"
     positionIncrementGap="1">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
         maxGramSize="20"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 
 
 I checked that removing the EdgeNGramFilterFactory results in correct
 positioning of the highlighting. (But then I can't search for ngrams...)
 
 What am I missing? 
 Thanks in advance, 
 Britske
 
 
 
 -- 
 View this message in context: 
 http://www.nabble.com/highlighting-on-edgeGramTokenized-field---%3E-hightlighting-incorrect-bc.-position-not-incremented..-tp23996196p23996196.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

-- 
View this message in context: 
http://www.nabble.com/highlighting-on-edgeGramTokenized-field---%3E-hightlighting-incorrect-bc.-position-not-incremented..-tp23996196p24006375.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: fq vs. q

2009-06-12 Thread Shalin Shekhar Mangar
On Sat, Jun 13, 2009 at 1:36 AM, Ensdorf Ken ensd...@zoominfo.com wrote:

 I ran into this very issue recently as we are using a freshness filter
 for our data that can be 6/12/18 months etc.  I discovered that even though
 we were only indexing with day-level granularity, we were specifying the
 query by computing a date down to the second, and thus virtually every filter
 was unique.  It's amazing how something this simple could bring Solr to its
 knees on a large data set.  By simply changing the filter to
 date:[NOW-18MONTHS TO NOW] or equivalent, the problem vanishes.


Since you are indexing with day-level granularity, you should query too with
the same granularity. For example, date:[NOW/DAY-18MONTHS TO NOW/DAY]. The
'/' operator is used for rounding off in DateMath syntax (
http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html).
Perhaps this is something we should document more clearly, we recently had
high CPU issues with one of our webapps due to the same issue.



 It does bring up an interesting question though - how is NOW treated with
 respect to the cache key?  Does Solr translate it to a date first?  If so,
 how does it determine the granularity?  If not, is there any mechanism to
 flush the cache when the corresponding result set changes?


The date math syntax is translated to a date before a search is performed.
NOW is always granular up to seconds (maybe milliseconds, not sure).

-- 
Regards,
Shalin Shekhar Mangar.


Re: Using The Tomcat Container

2009-06-12 Thread Yonik Seeley
On Fri, Jun 12, 2009 at 4:55 PM, Mukerjee, Neiloy
(Neil)neil.muker...@alcatel-lucent.com wrote:
 I am installing Solr 1.3.0 and currently have been trying to use Tomcat 5.5. 
 This hasn't been working for me so far, and I have been told (unofficially) 
 that my installation would go more smoothly if I were to use Tomcat 6. Does 
 anyone have experience with Solr 1.3 and Tomcat 5.5?

5.5 should work OK.

For nightlies, I just updated the Simple Example Install
at http://wiki.apache.org/solr/SolrTomcat

You might try stepping through those steps with your versions.

-Yonik
http://www.lucidimagination.com


Re: Strange missing docs when reindexing with threads.

2009-06-12 Thread Alexander Wallace
That was exactly my issue... I changed my code to not reuse
documents/fields and it is all good now!


Thanks for your support!

Shalin Shekhar Mangar wrote:

On Fri, Jun 12, 2009 at 11:40 PM, Alexander Wallace a...@rwmotloc.com wrote:

  

Hi all!

I'm using Solr 1.3 and currently testing reindexing...

In my client app, I am sending 17494 requests to add documents... in 3
different scenarios:

a) not using threads
b) using 1 thread
c) using 2 threads

In scenario a), everything seems to work fine... In my client log, I see
17494 requests sent to Solr; in Solr's log, I see the same number of 'add'
requests received; and if I search the index, I can see the same number of
documents.

However, if I use 1 thread, I see the right number of requests in the logs, but
I only find 15k or so documents (this varies a bit every time I run this
scenario).

It gets way worse if I use 2 threads... I can see the right number of
requests in both logs, but I end up with ~600 docs in the index!

In all scenarios, I don't see any errors in the logs...

As you can imagine, I need to be able to use multiple threads to speed up
the process... It is also very concerning that I don't get any errors
anywhere...

Looking at Solr's admin stats, I also see 17494 cumulative adds, but only a
tiny fraction of the actual documents can be found...

Any clues?




What is the uniqueKey in your schema.xml? Is it possible that those 17494
documents have a common uniqueKey and are therefore getting overwritten?

  


Joins or subselects in solr

2009-06-12 Thread Nasseam Elkarra

Hello,

I am storing items in an index. Each item has a comma-separated list  
of related items. Is it possible to bring back an item and all of its  
related items in one query? If so, how, and how would you distinguish  
between which one is the main item and which are the related items?


Any help is much appreciated.

Thanks!
Nasseam

Solr-powered Ajax search+nav:
http://factbook.bodukai.com/

Powered by Boutique:
http://bodukai.com/boutique/

Re: Replication problems on 1.4

2009-06-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Sat, Jun 13, 2009 at 2:44 AM, Phil Hagelbergp...@hagelb.org wrote:
 Shalin Shekhar Mangar shalinman...@gmail.com writes:

 You are right. In Solr/Lucene, a commit exposes updates to searchers. So you
 need to call commit on the master for the slave to pick up the changes.
 Replicating changes from the master and then not exposing new documents to
 searchers does not make sense. However, there is a lot of work going on in
 Lucene to enable near real-time search (exposing documents to searchrs as
 soon as possible). Once those features are mature enough, Solr's replication
 will follow suit.

 I understand that; it's totally reasonable.

 What it doesn't explain is what happened in my case: the master added a
 bunch of docs, committed, and then the slave replicated fine. Then the
 slave lost all its data (due to me issuing an rm -rf of the data
 directory, but let's say it happened due to a disk failure or something)
 and tried to replicate again, but got zero docs. Once the master had
 another commit issued, the slave could now replicate properly.

if you removed the files while the slave was running, then the slave
will not know that you removed the files (assuming it is a *nix box)
and it will keep serving search requests. But if you restart the slave,
it should automatically pick up the current index.

if it doesn't, it is a bug

 I would expect in this case the slave should be able to replicate after
 losing its data but before the second commit. I can see why the master
 would not expose uncommitted documents, but I can't see why it would
 refuse to allow _any_ of its index to be replicated from.

 I feel like I'm missing a piece of the picture here.

 -Phil




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Custom Request handler Error:

2009-06-12 Thread noor

Shalin Shekhar Mangar wrote:

On Fri, Jun 12, 2009 at 8:07 PM, noor noo...@opentechindia.com wrote:

  

<requestHandler name="/select" class="solr.my.MyCustomHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="q">tandem</str>
    <str name="debugQuery">true</str>
  </lst>
</requestHandler>

Now my webapp runs fine at
http://localhost:8983/mysearch
and searching is also working fine.
But these requests are not going through my custom handler.




Specify the full package of your handler class. Packages starting with
"solr." are loaded in a special way.

  
I specified it like:
<requestHandler name="/select" class="org.apache.solr.my.MyCustomHandler">
...
But still the same error.