Re: abt Multicore
On Mon, Nov 17, 2008 at 2:17 PM, Raghunandan Rao [EMAIL PROTECTED] wrote:

> I have an app running on WebLogic and Oracle. The Oracle DB is quite huge; say, some 10 million records. I need to integrate Solr with it and I am planning to use multicore. How can the multicore feature best be used?

To index records from a database, you can take a look at DataImportHandler.

It would help if you could be a bit more specific than that. What exactly do you want to know? It also helps if you tell us why you want to know about one particular thing, so that we may advise on better alternative solutions.

--
Regards,
Shalin Shekhar Mangar.
Re: abt Multicore
Are all the documents in the same search space? That is, for a given query, could any of the 10MM docs be returned?

If so, I don't think you need to worry about multicore. You may, however, need to split the index across several machines:
http://wiki.apache.org/solr/DistributedSearch

ryan

On Nov 17, 2008, at 3:47 AM, Raghunandan Rao wrote:

> Hi,
>
> I have an app running on WebLogic and Oracle. The Oracle DB is quite huge; say, some 10 million records. I need to integrate Solr with it and I am planning to use multicore. How can the multicore feature best be used?
>
> -Raghu
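For readers unfamiliar with the DistributedSearch page linked above: a distributed query is an ordinary select request sent to one Solr instance, with a `shards` parameter listing the instances that hold the pieces of the index. A minimal sketch of building such a request URL follows. The host names, ports, and query are hypothetical placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def distributed_query_url(base_url, shards, query, rows=10):
    """Build a select URL asking one Solr instance to fan out to `shards`."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated host:port/path entries, given without "http://".
        "shards": ",".join(shards),
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical two-machine setup; replace with your own instances.
url = distributed_query_url(
    "http://localhost:8983/solr",
    ["localhost:8983/solr", "localhost:7574/solr"],
    "title:solr",
)
print(url)
```

The instance that receives the request queries every listed shard and merges the results, so the caller still sees a single ranked result list.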
RE: abt Multicore
Any suggestions?

-Original Message-
From: Nguyen, Joe
Sent: Monday, November 17, 2008 9:40 Joe
To: 'solr-user@lucene.apache.org'
Subject: RE: abt Multicore

> Are all the documents in the same search space? That is, for a given query, could any of the 10MM docs be returned?
>
> If so, I don't think you need to worry about multicore. You may however need to put part of the index on various machines:
> http://wiki.apache.org/solr/DistributedSearch

I am also trying to decide between multicore and distributed search. My concerns are as follows:

Does that mean having a single big schema with a lot of fields? Distributed search requires that each document have a unique key. In this case, the unique key cannot be the primary key of a table.

I wonder how Solr performs in each case (distributed search vs. multicore):

1. Distributed Search
   a. All documents are in a single index. Would indexing a single document lock the index and affect query performance?
   b. If multiple machines are used, Solr needs to query each machine and merge the results. This could also impact performance.
   c. Supports MoreLikeThis queries given a document id.

2. Multicore
   a. Each table is associated with a single core. Indexing a single document would lock only that core's index, so querying documents on other cores won't be impacted.
   b. Querying documents across multiple cores must be handled by the caller.
   c. Can't support MoreLikeThis queries, since a document id from one core has no meaning on other cores.
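On the unique-key concern raised above: when rows from several tables share one index, primary keys from different tables can collide. One common workaround is to prefix the primary key with the table name. The sketch below assumes rows arrive as plain dicts; the `table:pk` key format and the `id` and `source_table` field names are illustrative choices, not anything Solr prescribes.

```python
def to_solr_doc(table, row, pk_column="id"):
    """Turn one database row (a dict) into a document with an index-wide unique key."""
    # Copy every column except the table-local primary key.
    doc = {name: value for name, value in row.items() if name != pk_column}
    # Hypothetical key format: "<table>:<primary key>".
    doc["id"] = f"{table}:{row[pk_column]}"
    # Keeping the table name as a field allows filtering by record type later.
    doc["source_table"] = table
    return doc


# Two rows with the same primary key in different tables no longer collide.
customer = to_solr_doc("customer", {"id": 42, "name": "Acme"})
order = to_solr_doc("orders", {"id": 42, "total": 99.5})
print(customer["id"], order["id"])
```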
Re: abt Multicore
Some high level thoughts:

On Mon, Nov 17, 2008 at 11:10 PM, Nguyen, Joe [EMAIL PROTECTED] wrote:

>> Are all the documents in the same search space? That is, for a given query, could any of the 10MM docs be returned?
>>
>> If so, I don't think you need to worry about multicore. You may however need to put part of the index on various machines:
>> http://wiki.apache.org/solr/DistributedSearch
>
> I am also trying to decide between multicore and distributed search. My concerns are as follows:
>
> Does that mean having a single big schema with a lot of fields?

Yes, and that's the use-case behind multi-valued fields. De-normalizing and avoiding joins helps to scale.

> Distributed search requires that each document have a unique key. In this case, the unique key cannot be the primary key of a table.
>
> I wonder how Solr performs in each case (distributed search vs. multicore):
>
> 1. Distributed Search
> a. All documents are in a single index. Would indexing a single document lock the index and affect query performance?

Indexing does not lock out searchers. Solr is designed to be queried regardless of indexing. However, depending on your machine's performance and your configuration, you may see slow queries during commits/auto-warming. Also, in distributed search, you have different Solr instances handling disjoint sets of data. Indexing on one instance does not affect the rest.

> b. If multiple machines are used, Solr needs to query each machine and merge the results. This could also impact performance.

Yes, but in most scenarios where distributed search is required, it is just not possible to use a single box for the whole index. If you set out to write a similar kind of querying across multiple cores yourself, it will be difficult to optimize it as well as Solr's implementation.

> c. Supports MoreLikeThis queries given a document id.

MoreLikeThis is not implemented for distributed environments (yet).

> 2. Multicore
> a. Each table is associated with a single core. Indexing a single document would lock only that core's index, so querying documents on other cores won't be impacted.

With multi-core, all cores are on a single box, so you may see slow queries on other cores too (again, it depends on your box's strength).

> b. Querying documents across multiple cores must be handled by the caller.

That is not a use-case for which Lucene/Solr were designed. Joins are discouraged most of the time.

> c. Can't support MoreLikeThis queries, since a document id from one core has no meaning on other cores.

MoreLikeThis makes no sense in this case because the document structure (schema) is totally different.

--
Regards,
Shalin Shekhar Mangar.
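To illustrate the de-normalization advice above: instead of keeping one core per table and joining at query time, related rows can be folded into a single document, with the one-to-many side stored in a multi-valued field. The sketch below is only an illustration; the book/author tables, the column names, and the `book:` key prefix are all hypothetical, and the `author` field would need to be declared multi-valued in the schema.

```python
from collections import defaultdict


def denormalize(books, book_authors):
    """Fold a one-to-many join into one document per book."""
    # Group the child rows by their foreign key.
    authors_by_book = defaultdict(list)
    for link in book_authors:
        authors_by_book[link["book_id"]].append(link["author"])

    # Each document carries its authors as a list (a multi-valued field).
    return [
        {
            "id": f"book:{book['id']}",
            "title": book["title"],
            "author": authors_by_book[book["id"]],
        }
        for book in books
    ]


# Hypothetical rows as they might come back from two database tables.
books = [{"id": 1, "title": "Search Basics"}, {"id": 2, "title": "Untitled"}]
book_authors = [
    {"book_id": 1, "author": "A. Writer"},
    {"book_id": 1, "author": "B. Editor"},
]
docs = denormalize(books, book_authors)
print(docs)
```

A query on `author` then matches the book document directly, with no join across cores.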