RE: Combination of EmbeddedSolrServer and CommonHttpSolrServer

2009-03-12 Thread Kulkarni, Ajit Kamalakar
Hi Shalin Shekhar Mangar,

Thanks for your inputs.

Please see my comments below.

 

 

I would like to know whether anyone has used EmbeddedSolrServer for
indexing and CommonsHttpSolrServer for search.

I have found that this combination offers better indexing performance.
Searching also becomes more flexible, since many HTTP clients can
search simultaneously.

Does anyone have any related performance data?

 

 

Thanks,

Ajit

 

 

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Wednesday, March 11, 2009 7:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Combination of EmbeddedSolrServer and CommonHttpSolrServer

 

On Wed, Mar 11, 2009 at 6:37 PM, Kulkarni, Ajit Kamalakar 

ajkulka...@ptc.com wrote:

 

 

 If we index the documents using CommonsHttpSolrServer and search using
 the same, we get the updated results

 That means we can search the latest added document as well even if it is
 not committed to the file system

 

 

That is not possible. Without calling commit, new documents will not be
visible to a searcher.

 

 

Ajit: When I tested using CommonsHttpSolrServer for indexing as well as
searching, I could search the latest added document through the solr admin
page.

I could also search the document through CommonsHttpSolrServer without
explicitly calling commit.

I was even more surprised to see the same result when using
EmbeddedSolrServer for indexing and CommonsHttpSolrServer for searching.

I used embeddedSolrServer = new
EmbeddedSolrServer(SolrCore.getSolrCore()); which is a deprecated API.

In this case I did not need to call commit on CommonsHttpSolrServer to
find the latest document, either on the solr admin page or
programmatically through CommonsHttpSolrServer.

 

However, if I use

  CoreContainer multicore = new CoreContainer();
  File home = new File( getSolrHome() );
  File f = new File( home, "solr.xml" );
  multicore.load( getSolrHome(), f );
  embeddedSolrServer = new EmbeddedSolrServer( multicore,
      SolrIndexConstants.DEFAULT_CORE );

 

I had to call commit on CommonsHttpSolrServer to search the latest added
documents, and the document was available through the solr admin page only
once I had programmatically searched after calling commit on
CommonsHttpSolrServer.

This is consistent with what you mentioned above.

 

 

 

 So it looks like there is some kind of cache that is used by both index
 and search logic inside solr for a given SolrServer components (e. g.
 CommonsHttpSolrServer, EmbeddedSolrServer)

 

 

Indexing does not create any cache. The caching is done only by the
searcher. The old searcher/cache is discarded and a new searcher/cache is
created when you call commit. Setting autowarmCount on the caches in
solrconfig.xml makes the new searcher run some of the most recently used
queries on the old searcher to warm up the new cache.

 

 Calling commit on the SolrServer to synch with the index data may not be
 good option as I suppose it to be expensive operation.

 

 

It is the only option. But you may be able to make the operation cheaper by
tweaking the autowarmCount on the caches (this is specified in
solrconfig.xml). However, caches are important for good search performance.
Depending on your search traffic, you'll need to find a sweet spot.

 

 

 The cache and hard disk data synchronization should be independent of
 the SolrServer instances managed by Solr Web Application inside tomcat.

 

 

SolrServer is not really a server in itself. It is (a pointer to?) a server
being used by a solrj client. The CommonsHttpSolrServer refers to a remote
server url and makes calls through HTTP. SolrCore is the internal class
which manages the state of the server.

A SolrCore is created by the solr webapp. When you create another SolrCore
for use by EmbeddedSolrServer, they do not know about each other. Therefore
you need to notify it if you change the index through another core.

 

Ajit: If the same JVM manages the searchers that respond to both
EmbeddedSolrServer and CommonsHttpSolrServer, why can't they share one
searcher? I understand that the EmbeddedSolrServer and
CommonsHttpSolrServer clients are separate, but if the searchers are
managed in the same JVM, we should in theory be able to attach a singleton
searcher to every kind of SolrServer. This searcher should be a listener
for the indexer.

Since searching is a read operation, there won't be any threading or
scalability issue, but there should be only one indexer.

I don't have enough knowledge of solr and lucene, so I may be
totally wrong!

 

 The issue still will be that EmbeddedSolrServer may directly access hard
 index data as it may bypass the Solr web app totally

 I am embedding tomcat in my RMI server.

 The RMI Server is going to use EmbeddedSolrServer and it also hosts the
 Solr WebApp inside its tomcat instance

 

 So I guess I should be able to manage a singleton cache  that is given

 to both

RE: Combination of EmbeddedSolrServer and CommonHttpSolrServer

2009-03-11 Thread Kulkarni, Ajit Kamalakar
Ryan,

If we index the documents using CommonsHttpSolrServer and search using
the same, we get the updated results

That means we can search the latest added document as well even if it is
not committed to the file system 

 

So it looks like there is some kind of cache that is used by both index
and search logic inside solr for a given SolrServer components (e. g.
CommonsHttpSolrServer, EmbeddedSolrServer)

 

Is there any way to configure that same cache  will be used by the
component that respond to HTTP request through CommonsHttpSolrServer and
the component used by EmbeddedSolrServer?

 

I don't see any reason why searcher and/or indexer for a given
SolrServer need to maintain exclusive cache

 

Calling commit on the SolrServer to synch with the index data may not be
good option as I suppose it to be expensive operation.

 

The cache and hard disk data synchronization should be independent of
the SolrServer instances managed by Solr Web Application inside tomcat.

 

The issue still will be that EmbeddedSolrServer may directly access hard
index data as it may bypass the Solr web app totally

 

I am embedding tomcat in my RMI server. 

The RMI Server is going to use EmbeddedSolrServer and it also hosts the
Solr WebApp inside its tomcat instance

 

So I guess I should be able to manage a singleton cache  that is given
to both, CommonsHttpSolrServer related components managed inside Solr
WebApp and EmbeddedSolrServer components

 

Please comment.

 

Thanks,

Ajit

 

-Original Message-
From: Ryan McKinley [mailto:ryan...@gmail.com] 
Sent: Monday, February 09, 2009 9:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Combination of EmbeddedSolrServer and CommonHttpSolrServer

 

 

 

Keep in mind that the way lucene/solr work is that the results are
constant from when you open the searcher.  If new documents are added
(without re-opening the searcher) they will not be seen.

<commit/> tells solr to re-open the index and see the changes.

 

 

 1. Does this mean that committing on the indexing (Embedded) server does
 not reflect the document changes when we fire a search through another
 (HTTP) server?

correct.  The HTTP server would still be open from before the indexing
happened.

 

 

 2. What happens to the commit fired on the indexing server? Can I remove
 that and just commit on the read only server?

Call commit on the indexing server, then the read only server then you
can delete the Embedded server

 

 

 

 3. Do we have to fire a Commit (on the HTTP server) before we try to
 search for a document?

Yes -- calling commit will re-open the index and reflect any changes
to it

 

 

 

 4. Can we make any setting (perhaps using auto-commit) on the HTTP
 server to avoid this scenario?

Not really -- the HTTP core has no idea what is happening on the other
core.

 

 

ryan



Re: Combination of EmbeddedSolrServer and CommonHttpSolrServer

2009-03-11 Thread Shalin Shekhar Mangar
On Wed, Mar 11, 2009 at 6:37 PM, Kulkarni, Ajit Kamalakar 
ajkulka...@ptc.com wrote:


 If we index the documents using CommonsHttpSolrServer and search using
 the same, we get the updated results

 That means we can search the latest added document as well even if it is
 not committed to the file system


That is not possible. Without calling commit, new documents will not be
visible to a searcher.


 So it looks like there is some kind of cache that is used by both index
 and search logic inside solr for a given SolrServer components (e. g.
 CommonsHttpSolrServer, EmbeddedSolrServer)


Indexing does not create any cache. The caching is done only by the
searcher. The old searcher/cache is discarded and a new searcher/cache is
created when you call commit. Setting autowarmCount on the caches in
solrconfig.xml makes the new searcher run some of the most recently used
queries on the old searcher to warm up the new cache.

 Calling commit on the SolrServer to synch with the index data may not be
 good option as I suppose it to be expensive operation.


It is the only option. But you may be able to make the operation cheaper by
tweaking the autowarmCount on the caches (this is specified in
solrconfig.xml). However, caches are important for good search performance.
Depending on your search traffic, you'll need to find a sweet spot.
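
The autowarming trade-off described above can be pictured with a toy model in plain Java. This is only a sketch and not Solr's cache code: the class AutowarmDemo, the methods newLruCache and warm, and the "new searcher" function are all made up for illustration. It assumes autowarming means re-running the N most recently used queries from the old cache against the new searcher, as described in the paragraph above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class AutowarmDemo {
    /** Creates an empty cache that iterates from least to most recently used. */
    static <V> LinkedHashMap<String, V> newLruCache() {
        return new LinkedHashMap<>(16, 0.75f, true); // true = access order
    }

    /**
     * Builds the cache for a new searcher by re-running the autowarmCount
     * most recently used queries of the old cache against the new searcher.
     * (Hypothetical helper; Solr's real autowarming is more involved.)
     */
    static <V> Map<String, V> warm(LinkedHashMap<String, V> oldCache,
                                   int autowarmCount,
                                   Function<String, V> newSearcher) {
        // Copy the keys first so iteration is independent of the old cache.
        List<String> queries = new ArrayList<>(oldCache.keySet());
        int n = Math.min(Math.max(autowarmCount, 0), queries.size());
        LinkedHashMap<String, V> fresh = newLruCache();
        for (String q : queries.subList(queries.size() - n, queries.size())) {
            fresh.put(q, newSearcher.apply(q)); // each warmed entry costs one query
        }
        return fresh;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> old = newLruCache();
        old.put("q1", 10);
        old.put("q2", 20);
        old.put("q3", 30);
        old.get("q1"); // q1 is now the most recently used entry

        // Stand-in for the new searcher: returns a made-up hit count.
        Map<String, Integer> warmed = warm(old, 2, q -> q.length());
        System.out.println(warmed.keySet()); // prints [q3, q1]
    }
}
```

A larger count means more queries are re-run on every commit (slower commits), while a smaller count leaves the new searcher with a colder cache; that is the sweet spot to look for.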


 The cache and hard disk data synchronization should be independent of
 the SolrServer instances managed by Solr Web Application inside tomcat.


SolrServer is not really a server in itself. It is (a pointer to?) a server
being used by a solrj client. The CommonsHttpSolrServer refers to a remote
server url and makes calls through HTTP. SolrCore is the internal class
which manages the state of the server.

A SolrCore is created by the solr webapp. When you create another SolrCore
for use by EmbeddedSolrServer, they do not know about each other. Therefore
you need to notify it if you change the index through another core.
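
To make the "two cores that do not know about each other" point concrete, here is a small toy model in plain Java. It does not use any Solr classes: SharedIndex and ToyCore are hypothetical stand-ins for the index directory and for a SolrCore, and the only assumption is the behaviour described above, namely that each core keeps its own searcher view until commit is called on that core.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoCoresDemo {
    /** Stand-in for the index directory on disk that both cores point at. */
    static class SharedIndex {
        final List<String> docs = new ArrayList<>();
    }

    /** Toy stand-in for a SolrCore; each instance keeps its own searcher view. */
    static class ToyCore {
        private final SharedIndex index;
        private final List<String> pending = new ArrayList<>();
        private List<String> searcherView;

        ToyCore(SharedIndex index) {
            this.index = index;
            this.searcherView = List.copyOf(index.docs); // snapshot at open time
        }

        void add(String doc) { pending.add(doc); }

        /** Flushes this core's pending documents, then re-opens its own searcher. */
        void commit() {
            index.docs.addAll(pending);
            pending.clear();
            searcherView = List.copyOf(index.docs);
        }

        boolean canFind(String doc) { return searcherView.contains(doc); }
    }

    public static void main(String[] args) {
        SharedIndex disk = new SharedIndex();
        ToyCore embedded = new ToyCore(disk); // used for indexing
        ToyCore webapp = new ToyCore(disk);   // queried over HTTP

        embedded.add("doc1");
        embedded.commit();
        System.out.println("embedded sees doc1: " + embedded.canFind("doc1")); // true
        System.out.println("webapp sees doc1: " + webapp.canFind("doc1"));     // false

        webapp.commit(); // the "notification": re-opens the webapp's searcher
        System.out.println("webapp after commit: " + webapp.canFind("doc1")); // true
    }
}
```

In this model the second commit writes nothing; it only refreshes the view of the core that serves searches.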


 The issue still will be that EmbeddedSolrServer may directly access hard
 index data as it may bypass the Solr web app totally

 I am embedding tomcat in my RMI server.

 The RMI Server is going to use EmbeddedSolrServer and it also hosts the
 Solr WebApp inside its tomcat instance

 So I guess I should be able to manage a singleton cache  that is given
 to both, CommonsHttpSolrServer related components managed inside Solr
 WebApp and EmbeddedSolrServer components


Why have two of them at all? Does the solr deployed inside tomcat serve HTTP
requests from external clients without going through your RMI server? You
can simplify things by keeping it either in tomcat or in embedded mode.

Hope that helps.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Combination of EmbeddedSolrServer and CommonHttpSolrServer

2009-02-09 Thread Ryan McKinley

yes.  This works fine.

But make sure only one SolrServer is writing to the index at a time.
Also note that if you use the EmbeddedSolrServer to index and another
one to read, you will need to call <commit/> on the 'read only' server
to refresh the index view (the word commit is a bit misleading)


ryan


On Feb 9, 2009, at 9:43 AM, Bapat, Mayur wrote:


Hi,

Has anybody tried the combination of EmbeddedSolrServer only for
indexing and CommonHttpSolrServer only for searching?
So in my architecture with the EmbeddedSolrServer I want to use the
advantage of direct API calls for indexing purpose and for searching I
would rely on HTTP requests.
I tried some basic stuff but found that since the Solr core is being
initialized twice, I am unable to find the documents indexed through the
embedded server using the HTTP requests unless I restart the tomcat. Any
idea how it could be achieved?

Mayur Bapat
PTC-Pune India
Ext - 3523




RE: Combination of EmbeddedSolrServer and CommonHttpSolrServer

2009-02-09 Thread Jana, Kumar Raja
Hi,

I have a few queries regarding this:

1. Does this mean that committing on the indexing (Embedded) server does
not reflect the document changes when we fire a search through another
(HTTP) server?
2. What happens to the commit fired on the indexing server? Can I remove
that and just commit on the read only server?
3. Do we have to fire a Commit (on the HTTP server) before we try to
search for a document? 
4. Can we make any setting (perhaps using auto-commit) on the HTTP
server to avoid this scenario?

Thanks,
Kumar

-Original Message-
From: Ryan McKinley [mailto:ryan...@gmail.com] 
Sent: Monday, February 09, 2009 8:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Combination of EmbeddedSolrServer and CommonHttpSolrServer

yes.  This works fine.

But make sure only one SolrServer is writing to the index at a time.
Also note that if you use the EmbeddedSolrServer to index and another
one to read, you will need to call <commit/> on the 'read only' server
to refresh the index view (the word commit is a bit misleading)

ryan


On Feb 9, 2009, at 9:43 AM, Bapat, Mayur wrote:

 Hi,

 Has anybody tried the combination of EmbeddedSolrServer only for
 indexing and CommonHttpSolrServer only for searching?
 So in my architecture with the EmbeddedSolrServer I want to use the
 advantage of direct API calls for indexing purpose and for searching I
 would rely on HTTP requests.
 I tried some basic stuff but found that since the Solr core is being
 initialized twice, I am unable to find the documents indexed through the
 embedded server using the HTTP requests unless I restart the tomcat. Any
 idea how it could be achieved?

 Mayur Bapat
 PTC-Pune India
 Ext - 3523



Re: Combination of EmbeddedSolrServer and CommonHttpSolrServer

2009-02-09 Thread Ryan McKinley




Keep in mind that the way lucene/solr work is that the results are  
constant from when you open the searcher.  If new documents are added  
(without re-opening the searcher) they will not be seen.


<commit/> tells solr to re-open the index and see the changes.
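
The snapshot behaviour described above can be sketched with a toy model in plain Java. None of this is Solr or Lucene API: the class ToyIndex and its methods are invented for illustration, and the sketch assumes only what is stated above, that search results stay constant from the moment the searcher is opened until it is re-opened by a commit.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotSearcherDemo {
    /** Toy index: added documents become searchable only after commit(). */
    static class ToyIndex {
        private final List<String> docs = new ArrayList<>(); // everything added so far
        private List<String> visible = List.of();            // view held by the open searcher

        void add(String doc) { docs.add(doc); }

        /** Re-opens the searcher by taking a fresh immutable snapshot. */
        void commit() { visible = List.copyOf(docs); }

        boolean search(String doc) { return visible.contains(doc); }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("doc1");
        System.out.println("before commit: " + index.search("doc1")); // false
        index.commit();
        System.out.println("after commit: " + index.search("doc1"));  // true
    }
}
```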


 1. Does this mean that committing on the indexing (Embedded) server does
 not reflect the document changes when we fire a search through another
 (HTTP) server?


correct.  The HTTP server would still be open from before the indexing  
happened.




 2. What happens to the commit fired on the indexing server? Can I remove
 that and just commit on the read only server?


Call commit on the indexing server, then on the read only server; then you
can delete the Embedded server





3. Do we have to fire a Commit (on the HTTP server) before we try to
search for a document?


Yes -- calling commit will re-open the index and reflect any changes  
to it





4. Can we make any setting (perhaps using auto-commit) on the HTTP
server to avoid this scenario?



Not really -- the HTTP core has no idea what is happening on the other  
core.



ryan