Re: abt Multicore

2008-11-17 Thread Shalin Shekhar Mangar
On Mon, Nov 17, 2008 at 2:17 PM, Raghunandan Rao 
[EMAIL PROTECTED] wrote:


 I have an app running on WebLogic and Oracle. The Oracle DB is quite large,
 around 10 million records. I need to integrate Solr with it, and I
 am planning to use multicore. How can the multicore feature best be
 used?


To index records from a database, you can take a look at DataImportHandler.

It would help if you were a bit more specific. What exactly do you
want to know? It also helps to tell us why you are asking about one
particular feature, so that we can suggest better alternatives.
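
As a rough illustration of the DataImportHandler pointer above, a full import
is normally triggered with an HTTP request to the handler. The sketch below
only builds that URL. The host, port, and the /dataimport handler path are
assumptions (the path is whatever name the handler is registered under in
solrconfig.xml), and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def dataimport_url(solr_base, command="full-import", **params):
    """Build a DataImportHandler request URL.

    Assumes the handler is registered at /dataimport in solrconfig.xml.
    """
    query = urlencode({"command": command, **params})
    return f"{solr_base.rstrip('/')}/dataimport?{query}"


# Hypothetical local instance; clean and commit are optional DIH parameters.
url = dataimport_url("http://localhost:8983/solr", clean="true", commit="true")
print(url)
```

The database connection and SQL queries themselves live in the handler's
data configuration file, which this sketch does not cover.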

-- 
Regards,
Shalin Shekhar Mangar.


Re: abt Multicore

2008-11-17 Thread Ryan McKinley
Are all the documents in the same search space?  That is, for a given  
query, could any of the 10MM docs be returned?


If so, I don't think you need to worry about multicore.  You may  
however need to put part of the index on various machines:

http://wiki.apache.org/solr/DistributedSearch

ryan



RE: abt Multicore

2008-11-17 Thread Nguyen, Joe
 
Any suggestions?
-----Original Message-----
From: Nguyen, Joe
Sent: Monday, November 17, 2008 9:40 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: abt Multicore

Are all the documents in the same search space?  That is, for a given
query, could any of the 10MM docs be returned?

If so, I don't think you need to worry about multicore.  You may however
need to put part of the index on various machines:
http://wiki.apache.org/solr/DistributedSearch 

I am also trying to decide between multicore and distributed search. My
concerns are as follows:

Does that mean having a single big schema with a lot of fields?
Distributed Search requires that each document have a unique key.
In this case, the unique key cannot be the primary key of a table.

I wonder how Solr performs in each case (distributed search vs.
multicore):

1.  Distributed Search
    a.  All documents are in a single index.  Would indexing a single
        document lock the index and affect query performance?
    b.  If multiple machines are used, Solr will need to query each
        machine and merge the results.  This could also impact
        performance.
    c.  Supports a MoreLikeThis query given a document id.
2.  Multicore
    a.  Each table will be associated with a single core.  Indexing a
        single document would lock only that core's index, so querying
        documents on other cores won't be impacted.
    b.  Querying documents across multiple cores must be handled by the
        caller.
    c.  Can't support a MoreLikeThis query, since a document id from one
        core has no meaning on other cores.


Re: abt Multicore

2008-11-17 Thread Shalin Shekhar Mangar
Some high level thoughts:

On Mon, Nov 17, 2008 at 11:10 PM, Nguyen, Joe [EMAIL PROTECTED] wrote:

 Are all the documents in the same search space?  That is, for a given
 query, could any of the 10MM docs be returned?

 If so, I don't think you need to worry about multicore.  You may however
 need to put part of the index on various machines:
 http://wiki.apache.org/solr/DistributedSearch 

 I am also trying to decide between multicore and distributed
 search. My concerns are as follows:

 Does that mean having a single big schema with a lot of fields?


Yes, and that's the use case behind multi-valued fields. Denormalizing and
avoiding joins helps Solr scale.
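
To make the denormalization idea concrete, here is a minimal sketch of
flattening a parent row and its child rows into one document with a
multi-valued field. The table and field names are made up for illustration,
and prefixing the id with the table name is only one possible way (an
assumption, not something Solr prescribes) to keep keys unique when several
tables share one index.

```python
def to_solr_doc(table, row, children=()):
    """Flatten one parent row plus its child values into a single flat document."""
    # Prefix the primary key with the table name so ids from different
    # tables cannot collide in a shared index (illustrative convention).
    doc = {"id": f"{table}-{row['id']}", "type": table}
    for key, value in row.items():
        if key != "id":
            doc[f"{table}_{key}"] = value
    # Child rows become multi-valued fields instead of a join at query time.
    for field, values in children:
        doc[field] = list(values)
    return doc


# Hypothetical customer row with two related order items.
doc = to_solr_doc(
    "customer",
    {"id": 42, "name": "Acme"},
    children=[("order_item", ["widget", "gear"])],
)
print(doc)
```

Each multi-valued field would need to be declared with multiValued="true"
in the schema.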


 Distributed Search requires that each document must have a unique key.
 In this case, the unique key cannot be a primary key of a table.

 I wonder how Solr performs in this case (distributed search vs.
 multicore)
 1.  Distributed Search
a.  All documents are in a single index.  Indexing a single document
 would lock the index and affect query performance?


Indexing does not lock out searchers. Solr is designed to be queried
regardless of indexing. However, depending on your machine's performance and
your configuration, you may see slow queries during commits/auto-warming.

Also, in distributed search, you have different Solr instances handling
disjoint sets of data. Indexing on one instance does not affect the rest.


b.  If multi machines are used, Solr will need to query each machine
 and merge the result.  This also could impact performance.


Yes, but in most scenarios where distributed search is required, it is simply
not possible to fit the whole index on a single box. If you set out to
write similar fan-out-and-merge querying across multiple cores yourself, it
will be difficult to optimize it as well as Solr's implementation.
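
For reference, a distributed query is an ordinary query plus a shards
parameter listing the instances to fan out to; the instance that receives the
request merges the results. The sketch below only builds such a URL. The host
names, ports, and helper name are placeholders, not real servers or an
official client API.

```python
from urllib.parse import urlencode


def distributed_query_url(coordinator, shards, q, rows=10):
    """Build a select URL that asks one Solr instance to query all shards."""
    params = {"q": q, "rows": rows, "shards": ",".join(shards)}
    # Keep ':', '/' and ',' unescaped so the shard list stays readable.
    return f"{coordinator.rstrip('/')}/select?{urlencode(params, safe=':/,')}"


# Placeholder hosts; each one would hold a disjoint part of the index.
shard_list = ["host1:8983/solr", "host2:8983/solr"]
query_url = distributed_query_url("http://host1:8983/solr", shard_list, "solr")
print(query_url)
```

Every shard must use the same unique key field, and a given key should exist
on only one shard.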



C.  Support MoreLikeThis query given a document id.


MoreLikeThis is not implemented for distributed environments (yet).



 2.  Multicore
a.  Each table will be associated with a single core.  Indexing a
 single document would lock only a specific core index.  Thus, querying
 documents on other cores won't be impacted.


With multi-core, all cores are on a single box, so you may see slow queries on
other cores too (again, it depends on your box's strength).



B.  Querying documents across multicore must be handled by the
 caller.


That is not a use case for which Lucene/Solr were designed. Joins are
discouraged most of the time.
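
If the caller does end up querying several cores, one cautious option is to
keep each core's hits separate rather than merging them into one ranked list,
since relevance scores computed against different schemas are not directly
comparable. A minimal sketch, assuming the caller has already fetched hits
from each core as dicts with "id" and "score" keys (the core names and data
below are invented):

```python
def group_by_core(results_by_core, per_core=3):
    """Return the top ids of each core side by side, without cross-core merging."""
    grouped = {}
    for core, hits in results_by_core.items():
        # Rank only within a core; scores are not compared across cores.
        ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
        grouped[core] = [h["id"] for h in ranked[:per_core]]
    return grouped


# Invented sample hits from two hypothetical cores.
results = {
    "customers": [{"id": "c1", "score": 0.4}, {"id": "c2", "score": 0.9}],
    "orders": [{"id": "o7", "score": 2.5}],
}
grouped = group_by_core(results, per_core=1)
print(grouped)
```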



C.  Can't support MoreLikeThis query since document id from one core
 has no meaning on other cores.


MoreLikeThis makes no sense in this case because the document structure
(schema) is totally different.





-- 
Regards,
Shalin Shekhar Mangar.