Re: Supporting multiple indexes in one collection
Sharding always adds overhead, which balances against splitting the work up amongst several machines. Sharding works like this for queries:

1> a node receives the query
2> a sub-query is sent to one replica of each shard
3> each replica sends back its top N (the rows parameter) with ID and sort data
4> the node in <1> sorts the candidate lists to get the overall top N
5> the node in <1> sends out another query to each replica to get the data associated with the final sorted list
6> the node in <1> assembles the results from <5> and returns the true top N to the client

All that takes time. OTOH, in this scenario each replica is only searching a subset of the data, so each sub-query can be faster. At some point, when your index gets past a certain size, that overhead is more than made up for by, basically, throwing more hardware at the problem (assuming the shards can make use of more hardware, CPUs, threads, or whatever). Until you reach that point, querying a single replica is faster. "A certain size" depends on your data, hardware, and query patterns; there's no hard and fast rule.

But you haven't really told us much. You say you've read that SolrCloud performance degrades when the number of collections rises. True. But the "number of collections" can be in the thousands. Are you talking about 5 collections? 10? 1,000,000? Details matter. And how many documents per collection? Or in total? What are your performance criteria? Do you expect to handle 5 queries/second? 50? 5,000,000?

When performance differs "by a few milliseconds", unless you're dealing with a very high total QPS it's usually a waste of time to worry about it. Almost certainly there are much better things to spend your time on that the end users will actually notice ;) Plus, performance measurements are very tricky to get right. Are you measuring with a realistic data set and queries?
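The six steps above can be sketched as a small simulation (illustrative only: the shard contents and rows value are made up, and real Solr does this over HTTP between nodes):

```python
# Simplified simulation of SolrCloud's two-phase distributed query.
# Phase 1: each shard returns only (doc_id, score) for its top N.
# Phase 2: the coordinating node fetches stored fields for the merged top N.

def distributed_query(shards, rows):
    # Phase 1: gather per-shard top-N candidate lists (id + sort data only).
    candidates = []
    for shard in shards:
        top_n = sorted(shard["scores"].items(), key=lambda kv: -kv[1])[:rows]
        candidates.extend(top_n)

    # Merge the candidate lists to get the overall top N.
    winners = sorted(candidates, key=lambda kv: -kv[1])[:rows]

    # Phase 2: fetch stored fields only for the final winners.
    results = []
    for doc_id, score in winners:
        for shard in shards:
            if doc_id in shard["docs"]:
                results.append({"id": doc_id, "score": score, **shard["docs"][doc_id]})
                break
    return results

shards = [
    {"scores": {"a": 0.9, "b": 0.5}, "docs": {"a": {"title": "A"}, "b": {"title": "B"}}},
    {"scores": {"c": 0.7, "d": 0.1}, "docs": {"c": {"title": "C"}, "d": {"title": "D"}}},
]
print([d["id"] for d in distributed_query(shards, rows=2)])  # ['a', 'c']
```

The two round trips per query are the overhead being described: even when every sub-query is fast, the coordinating node pays for the merge and the second fetch.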
Are you measuring with enough different queries to hit the various caches in a realistic manner? Are you indexing at the same time, in a manner that reflects your real world?

What I'm suggesting is that before making these kinds of decisions (and some of the ideas, like composite routing, will require significant engineering effort) you be very, very sure they're necessary. For instance, you'll have to monitor every replica to see if it gets overloaded. Imagine your routing puts 300,000,000 documents for some very large client on a single shard (which, again, we have no idea whether that's something you have to worry about, since you haven't told us). Now you'll have to go in and fix that problem.

Best,
Erick
Re: Supporting multiple indexes in one collection
Did the test a while back; revisiting this again. But in standalone Solr we have seen queries take more time if the data exists in 2 shards. That's the main reason this test was done. If anyone has experience with this, I'd like to hear it.
Re: Supporting multiple indexes in one collection
How many documents? And the real difference was only a couple of ms?
Re: Supporting multiple indexes in one collection
Had 2 indexes in 2 separate shards of one collection, and had the exact same data published with the composite router using a prefix. Disabled all caches. Issued the same query, a small query with a q parameter and an fq parameter. The number of queries executed (with the same threads, run for the same time) was higher in the case of 2 indexes on 2 separate shards. The 90th-percentile response time was also a few ms better.

Thanks,
Raji
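Since percentile figures come up throughout this comparison, here is a minimal sketch of a nearest-rank percentile over recorded latencies (the sample values are invented; a p90 over few samples is quite noisy):

```python
# Nearest-rank percentile of recorded query latencies (in ms).
def percentile(samples, pct):
    ordered = sorted(samples)
    # Nearest-rank method: the ceil(pct/100 * n)-th value, 1-indexed.
    rank = max(1, -(-len(ordered) * pct // 100))  # ceil via negated floor division
    return ordered[rank - 1]

latencies_ms = [12, 15, 11, 40, 13, 14, 95, 12, 16, 13]
print(percentile(latencies_ms, 90))  # 40
```

With only ten samples the p90 is decided by a single observation, which is one reason small benchmark runs can differ by "a few ms" without meaning much.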
Re: Supporting multiple indexes in one collection
What did you test? Which queries? What were the exact results in terms of time?
Supporting multiple indexes in one collection
Hi,

We are trying to place multiple smaller indexes in one collection (as we read that SolrCloud performance degrades as the number of collections increases). We are exploring two ways:

1) Placing each index on a single shard of a collection. In this case placing documents for a single index is manual, and automatic rebalancing is not done by Solr.

2) Solr routing: the composite router with a prefix. In this case Solr doesn't place all the docs with the same prefix in one shard, so searches become distributed, but shard rebalancing is taken care of by Solr.

We did a small perf test with both these setups. We saw that performance in the first case (placing an index explicitly on a shard) is better.

Has anyone done anything similar? Can you please share your experience?

Thanks,
Raji
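For background on option 2: as I understand it, the compositeId router hashes the part of the ID before the "!" (e.g. tenantA!doc1) to choose a shard, so documents sharing a prefix are co-located, while a query without a _route_ parameter still fans out to all shards. A rough sketch of the routing idea, using MD5 as a stand-in for Solr's actual MurmurHash3 and a made-up shard count:

```python
import hashlib

NUM_SHARDS = 4  # made-up shard count for illustration

def shard_for(doc_id):
    # compositeId-style routing: hash only the part before '!' so that
    # all docs with the same prefix are co-located on one shard.
    prefix = doc_id.split("!", 1)[0]
    digest = hashlib.md5(prefix.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

print(shard_for("tenantA!doc1") == shard_for("tenantA!doc2"))  # True
```

Co-location matters for option 2 because a search restricted to one prefix can in principle be sent to a single shard, whereas an unrestricted search stays distributed.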
Solr Cloud and Multiple Indexes
Hi,

I am using Solr Cloud and I have created a single index that hosts around 70M documents, distributed into 2 shards (each having 35M documents) with 2 replicas. The queries are very slow to run, so I was thinking of distributing the index into multiple indexes and, consequently, distributed search. Can anyone guide me to some sources (articles) that discuss this in Solr Cloud?

Appreciate your feedback regarding this.

Regards,
Salman
Re: Solr Cloud and Multiple Indexes
What is your index size? How much memory is used? What type of queries are slow?
Are there GC pauses, as they can be a cause of slowness?
Are document updates/additions happening in parallel?

> The queries are very slow to run so I was thinking to distribute the indexes into multiple indexes and consequently distributed search.
This is what you are already doing. Did you mean that you want to add more shards?

Regards,
Modassar
Re: Solr Cloud and Multiple Indexes
Here is the current info.

> How much memory is used?
Physical memory consumption: 5.48 GB out of 14 GB.
Swap space consumption: 5.83 GB out of 15.94 GB.
JVM-Memory consumption: 1.58 GB out of 3.83 GB.

> What is your index size?
I have around 70M documents distributed on 2 shards (so each shard has 35M documents).

> What type of queries are slow?
I am running normal queries (queries on a field); no faceting or highlighting is requested. Currently I am facing a delay of 2-3 seconds, but previously I had delays of around 28 seconds.

> Are there GC pauses as they can be a cause of slowness?
I doubt this, as the slowness was happening over a long period of time.

> Are document updates/additions happening in parallel?
No, I have stopped adding/updating documents and am doing queries only.

> This is what you are already doing. Did you mean that you want to add more shards?
No, what I meant is that I read that previously there was a way to chunk a large index into multiple indexes and then do a distributed search on them, as in this article: https://wiki.apache.org/solr/DistributedSearch. What I was looking for is how this is handled in Solr Cloud.

Regards,
Salman
Re: Solr Cloud and Multiple Indexes
Thanks for your response. I have already gone through those documents before. My point was: if I am using Solr Cloud, is the only way to distribute my indexes by adding shards, so that I don't have to do anything manually (because all the distributed search is handled by Solr Cloud)?

> What is the Xms and Xmx you are allocating to Solr and how much max is used by your solr?
Xms and Xmx are both 4G. My current JVM-Memory consumption is 1.58 GB.

> How many segments are there in the index? The more the segment the slower is the search.
How do I check how many segments there are in the index?

> Is this after you moved to solrcloud?
I have been using SolrCloud from the beginning.

Regards,
Salman
Re: Solr Cloud and Multiple Indexes
SolrCloud makes distributed search easier. You can find details about it under the following link:
https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works

You can also refer to:
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

By size of your index I meant the index size on disk, not the total document count alone.
How many segments are there in the index? The more segments, the slower the search.
What are the Xms and Xmx you are allocating to Solr, and how much is actually used by your Solr?

> I doubt this as the slowness was happening for a long period of time.
I mentioned this point because I have seen GC pauses of 30 seconds and more on some complex queries.

> I am facing delay of 2-3 seconds but previously I had delays of around 28 seconds.
Is this after you moved to SolrCloud?

Regards,
Modassar
Re: Solr Cloud and Multiple Indexes
> Thanks for your response. I have already gone through those documents before. My point was that if I am using Solr Cloud the only way to distribute my indexes is by adding shards? and I don't have to do anything manually (because all the distributed search is handled by Solr Cloud).
Yes, as per my knowledge.

> How do I check how many segments there are in the index?
You can look into the index folder manually. Which version of Solr are you using? I don't remember exactly the starting version, but in the latest releases, and in Solr 5.2.1, there is a "Segments info" link available where you can see the number of segments.

Regards,
Modassar
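If you'd rather script the manual folder check: in recent Lucene versions each live segment writes a .si (segment info) file that sits outside any compound (.cfs) file, so counting those files approximates the segment count. A rough sketch (the path shown is a made-up example, and this assumes you can read the data directory directly on the Solr host):

```python
import pathlib

def count_segments(index_dir):
    # Each live Lucene segment has one .si (segment info) file, kept
    # outside the compound (.cfs) file, so counting .si files gives a
    # reasonable approximation of the number of segments in the index.
    return sum(1 for _ in pathlib.Path(index_dir).glob("*.si"))

# Example (hypothetical replica path):
# print(count_segments("/var/solr/data/collection1_shard1_replica1/data/index"))
```

Files belonging to segments that have been merged away but not yet deleted could inflate the count slightly, so treat it as an estimate; the admin UI's "Segments info" view is authoritative.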
The more the segment the slower > > is the search. > > What is the Xms and Xmx you are allocating to Solr and how much max is > used > > by your solr? > > > > I doubt this as the slowness was happening for a long period of time. > > I mentioned this point as I have seen gc pauses of 30 seconds and more in > > some complex queries. > > > > I am facing delay of 2-3 seconds but previously I > > had delays of around 28 seconds. > > Is this after you moved to solrcloud? > > > > Regards, > > Modassar > > > > > > On Thu, Nov 5, 2015 at 3:09 PM, Salman Ansari <salman.rah...@gmail.com> > > wrote: > > > > > Here is the current info > > > > > > How much memory is used? > > > Physical memory consumption: 5.48 GB out of 14 GB. > > > Swap space consumption: 5.83 GB out of 15.94 GB. > > > JVM-Memory consumption: 1.58 GB out of 3.83 GB. > > > > > > What is your index size? > > > I have around 70M documents distributed on 2 shards (so each shard has > > 35M > > > document) > > > > > > What type of queries are slow? > > > I am running normal queries (queries on a field) no faceting or > > highlights > > > are requested. Currently, I am facing delay of 2-3 seconds but > > previously I > > > had delays of around 28 seconds. > > > > > > Are there GC pauses as they can be a cause of slowness? > > > I doubt this as the slowness was happening for a long period of time. > > > > > > Are document updates/additions happening in parallel? > > > No, I have stopped adding/updating documents and doing queries only. > > > > > > This is what you are already doing. Did you mean that you want to add > > more > > > shards? > > > No, what I meant is that I read that previously there was a way to > chunk > > a > > > large index into multiple and then do distributed search on that as in > > this > > > article https://wiki.apache.org/solr/DistributedSearch. What I was > > looking > > > for how this is handled in Solr Cloud? 
> > > > > > > > > > > > Regards, > > > Salman > > > > > > > > > > > > > > > > On Thu, Nov 5, 2015 at 12:06 PM, Modassar Ather < modather1...@gmail.com> > > > wrote: > > > > What is your index size? How much memory is used? What type of > queries > > > are > > > > slow? > > > > Are there GC pauses as they can be a cause of slowness?
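Regarding the "how do I check how many segments" question in this thread: besides the "Segments info" page in the admin UI, the count can be read off the index directory itself, since Lucene 4.x default codecs keep one .si (segment info) file per segment. This is a minimal sketch under that assumption, not something from the thread; the demo runs against a temporary directory with fake files rather than a real core's data/index directory:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class SegmentCount {
    // Counts Lucene segments by counting .si files (one per segment in
    // Lucene 4.x+ default codecs). For a real core you would pass its
    // data/index directory (path is an assumption for illustration).
    static long countSegments(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.list(indexDir)) {
            return files.filter(p -> p.toString().endsWith(".si")).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary directory with two fake segment-info files.
        Path dir = Files.createTempDirectory("index");
        Files.createFile(dir.resolve("_0.si"));
        Files.createFile(dir.resolve("_1.si"));
        Files.createFile(dir.resolve("segments_2"));
        System.out.println(countSegments(dir)); // prints 2
    }
}
```

Note this only works with direct filesystem access to the node; the admin UI or Luke remain the friendlier options.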
RE: using SolrJ with SolrCloud, searching multiple indexes.
Yeah, sorry, didn't explain myself there: one of the three ZooKeepers will return me one of the SolrCloud machines for me to access the index. I either need to know which machine it returned (is this feasible? I can't seem to find a way to access that information in CloudSolrServer) and then add the extra indexes as shards, e.g. String shards = solrCloudMachine + ":8080/indexB," + solrCloudMachine + ":8080/indexC"; (solrQuery.add("shards", shards);) or do it in a new way within SolrCloud. FYI my returned index is one of seven indexes under one webapp (solr_search); I want to stitch on the other six indexes so I can search all of the data (each index is updated from separate feeds). Thanks for your quick reply. Russ. From: Furkan KAMACI [furkankam...@gmail.com] Sent: 21 March 2014 22:55 To: solr-user@lucene.apache.org Subject: Re: using SolrJ with SolrCloud, searching multiple indexes. Hi Russell; You say that: | CloudSolrServer server = new CloudSolrServer("solrServer1:2111,solrServer2:2111,solrServer2:2111"); but I should mention that they are not Solr servers that are passed into a CloudSolrServer. They are ZooKeeper host:port pairs, optionally including a chroot parameter at the end. Thanks; Furkan KAMACI 2014-03-21 18:11 GMT+02:00 Russell Taylor russell.tay...@interactivedata.com: Hi, just started to move my SolrJ queries over to our SolrCloud environment and I want to know how to do a query where you combine multiple indexes. Previously I had a string called shards which links all the indexes together and adds them to the query: String shards = "server:8080/solr_search/bonds,server:8080/solr_search/equities", etc., which I add to my SolrQuery via solrQuery.add("shards", shards); I can then search across many indexes. In SolrCloud we do this: CloudSolrServer server = new CloudSolrServer("solrServer1:2111,solrServer2:2111,solrServer2:2111"); and add the default collection: server.setDefaultCollection("bonds"); How do I add the other indexes to my query in CloudSolrServer? 
If it's as before (solrQuery.add("shards", shards);), how do I find out the address of the machine CloudSolrServer has chosen? Thanks Russ. *** This message (including any files transmitted with it) may contain confidential and/or proprietary information, is the property of Interactive Data Corporation and/or its subsidiaries, and is directed only to the addressee(s). If you are not the designated recipient or have reason to believe you received this message in error, please delete this message from your system and notify the sender immediately. An unintended recipient's disclosure, copying, distribution, or use of this message or any attachments is prohibited and may be unlawful. ***
Re: using SolrJ with SolrCloud, searching multiple indexes.
On 3/22/2014 7:34 AM, Russell Taylor wrote: Yeah, sorry, didn't explain myself there: one of the three ZooKeepers will return me one of the SolrCloud machines for me to access the index. I either need to know which machine it returned (is this feasible? I can't seem to find a way to access that information in CloudSolrServer) and then add the extra indexes as shards, e.g. String shards = solrCloudMachine + ":8080/indexB," + solrCloudMachine + ":8080/indexC"; (solrQuery.add("shards", shards);) or do it in a new way within SolrCloud. FYI my returned index is one of seven indexes under one webapp (solr_search); I want to stitch on the other six indexes so I can search all of the data (each index is updated from separate feeds). SolrCloud eliminates the need to use the shards parameter, so CloudSolrServer does not expose the actual Solr instances. You *can* use the shards parameter, but typically it is done differently than in traditional Solr. CloudSolrServer thinks mostly in terms of collections, not cores. There is a setDefaultCollection method on the server object, or you can do solrQuery.set("collection", name). You can query certain shards of a collection or multiple collections, without ever knowing the host/port/core combinations: http://wiki.apache.org/solr/SolrCloud#Distributed_Requests There are also collection aliases on the server side, which let you access one or more real collections with a virtual collection name. Thanks, Shawn
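As a sketch of the collection-parameter approach Shawn describes (the collection names bonds/equities and the localhost URL are assumptions carried over from Russ's example, not something the thread prescribes), the equivalent HTTP request could be assembled like this:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CollectionParamDemo {
    // Builds a query URL that searches several collections in one request
    // via the "collection" parameter, instead of the legacy "shards" list
    // of host:port/core addresses.
    static String queryUrl(String baseUrl, String collections, String q) {
        return baseUrl + "/select?q="
                + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&collection=" + collections;
    }

    public static void main(String[] args) {
        System.out.println(queryUrl("http://localhost:8983/solr/bonds",
                "bonds,equities", "*:*"));
        // prints http://localhost:8983/solr/bonds/select?q=*%3A*&collection=bonds,equities
    }
}
```

In SolrJ the same parameter would be set with solrQuery.set("collection", "bonds,equities") against a CloudSolrServer pointed at ZooKeeper, so no individual host ever needs to be known.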
using SolrJ with SolrCloud, searching multiple indexes.
Hi, just started to move my SolrJ queries over to our SolrCloud environment and I want to know how to do a query where you combine multiple indexes. Previously I had a string called shards which links all the indexes together and adds them to the query: String shards = "server:8080/solr_search/bonds,server:8080/solr_search/equities", etc., which I add to my SolrQuery via solrQuery.add("shards", shards); I can then search across many indexes. In SolrCloud we do this: CloudSolrServer server = new CloudSolrServer("solrServer1:2111,solrServer2:2111,solrServer2:2111"); and add the default collection: server.setDefaultCollection("bonds"); How do I add the other indexes to my query in CloudSolrServer? If it's as before (solrQuery.add("shards", shards);), how do I find out the address of the machine CloudSolrServer has chosen? Thanks Russ.
Re: using SolrJ with SolrCloud, searching multiple indexes.
Hi Russell; You say that: | CloudSolrServer server = new CloudSolrServer("solrServer1:2111,solrServer2:2111,solrServer2:2111"); but I should mention that they are not Solr servers that are passed into a CloudSolrServer. They are ZooKeeper host:port pairs, optionally including a chroot parameter at the end. Thanks; Furkan KAMACI 2014-03-21 18:11 GMT+02:00 Russell Taylor russell.tay...@interactivedata.com: Hi, just started to move my SolrJ queries over to our SolrCloud environment and I want to know how to do a query where you combine multiple indexes. Previously I had a string called shards which links all the indexes together and adds them to the query: String shards = "server:8080/solr_search/bonds,server:8080/solr_search/equities", etc., which I add to my SolrQuery via solrQuery.add("shards", shards); I can then search across many indexes. In SolrCloud we do this: CloudSolrServer server = new CloudSolrServer("solrServer1:2111,solrServer2:2111,solrServer2:2111"); and add the default collection: server.setDefaultCollection("bonds"); How do I add the other indexes to my query in CloudSolrServer? If it's as before (solrQuery.add("shards", shards);), how do I find out the address of the machine CloudSolrServer has chosen? Thanks Russ.
Solr 4.4 with log4j and multiple indexes on tomcat 6
Hi, my problem is that all my indexes log to one log file, but I want each index to log to its own log file. I'm using Solr 4.4 and I've copied jcl-over-slf4j-1.6.6.jar, jul-to-slf4j-1.6.6.jar, log4j-1.2.16.jar, slf4j-api-1.6.6.jar and slf4j-log4j12-1.6.6.jar into my Tomcat's lib/ directory. I've added a logging.properties to each of my Solr webapps in tomcat/webapps/solr_webapp/WEB-INF/classes/logging.properties, but when Tomcat starts up it picks up the first logging.properties file (I presume) and then all indexes log to this file. My only solution is to add the 4 jar files to each of the Solr webapps in their tomcat/webapps/solr_webapp/WEB-INF/lib directory, and then each webapp will pick up its logging.properties file. Does anyone have a solution where I don't need to add the 4 jars to each of the Solr webapps but still get a log file per index? Thanks Russ.
Re: Solr 4.4 with log4j and multiple indexes on tomcat 6
Hi Russ, It's not really the indexes that log, but Solr running in Tomcat, so I don't think there's a way... Otis -- Solr & ElasticSearch Support -- http://sematext.com/ On Oct 15, 2013 7:14 AM, Russell Taylor russell.tay...@interactivedata.com wrote: Hi, my problem is that all my indexes log to one log file, but I want each index to log to its own log file. I'm using Solr 4.4 and I've copied jcl-over-slf4j-1.6.6.jar, jul-to-slf4j-1.6.6.jar, log4j-1.2.16.jar, slf4j-api-1.6.6.jar and slf4j-log4j12-1.6.6.jar into my Tomcat's lib/ directory. I've added a logging.properties to each of my Solr webapps in tomcat/webapps/solr_webapp/WEB-INF/classes/logging.properties, but when Tomcat starts up it picks up the first logging.properties file (I presume) and then all indexes log to this file. My only solution is to add the 4 jar files to each of the Solr webapps in their tomcat/webapps/solr_webapp/WEB-INF/lib directory, and then each webapp will pick up its logging.properties file. Does anyone have a solution where I don't need to add the 4 jars to each of the Solr webapps but still get a log file per index? Thanks Russ.
Re: Solr 4.4 with log4j and multiple indexes on tomcat 6
On 10/15/2013 5:13 AM, Russell Taylor wrote: My problem is that all my indexes log to one log file, but I want each index to log to its own log file. I'm using Solr 4.4 and I've copied jcl-over-slf4j-1.6.6.jar, jul-to-slf4j-1.6.6.jar, log4j-1.2.16.jar, slf4j-api-1.6.6.jar and slf4j-log4j12-1.6.6.jar into my Tomcat's lib/ directory. I've added a logging.properties to each of my Solr webapps in tomcat/webapps/solr_webapp/WEB-INF/classes/logging.properties, but when Tomcat starts up it picks up the first logging.properties file (I presume) and then all indexes log to this file. My only solution is to add the 4 jar files to each of the Solr webapps in their tomcat/webapps/solr_webapp/WEB-INF/lib directory, and then each webapp will pick up its logging.properties file. Although your solution might let you log each webapp to its own file, you are incurring extra overhead by running each index in its own full Solr install. One Solr install can handle thousands of separate indexes - we call them cores. Most of the important parts of Solr will log the core name with the request. I'm in the process of trying to improve that so *everything* will include the core name in the log. See SOLR-5277. When that work is complete, it may even be possible to get those logs in separate files by switching logging implementations or writing a custom log4j appender. Side issue: logging.properties is the configuration file used by java.util.logging ... but the jar files you have mentioned will set Solr up to use log4j. The config file for that is typically log4j.properties or log4j.xml. Thanks, Shawn
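For reference, a per-webapp log4j 1.2 configuration along the lines discussed here might look like the following; the file location and log path are assumptions for illustration, not tested against Solr 4.4:

```properties
# WEB-INF/classes/log4j.properties for one Solr webapp (hypothetical paths).
# Only takes effect per-webapp if the slf4j/log4j jars live in that webapp's
# WEB-INF/lib, which is exactly the jar duplication Russ wants to avoid.
log4j.rootLogger=INFO, file

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/solr/bonds.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c: %m%n
```

As Shawn notes, running one Solr install with multiple cores and letting the core name appear in each log line avoids this per-webapp setup entirely.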
Re: Need help with search in multiple indexes
On Wed, 2013-06-12 at 23:05 +0200, smanad wrote: Is this a limitation of solr/lucene, should I be considering using other option like using Elasticsearch (which is also based on lucene)? But I am sure search in multiple indexes is kind of a common problem. You try to treat separate sources as a single index and that is tricky. Assuming you need relevance ranking, the sources need to be homogeneous in order for the scores to be somewhat comparable. That seems not to be the case for you, so even if you align your schemas to get formal compatibility, your ranking will be shot with Solr. ElasticSearch has elaborate handling of this problem http://www.elasticsearch.org/guide/reference/api/search/search-type/ and seems to be a better fit for you in this regard. - Toke Eskildsen, State and University Library, Denmark
Need help with search in multiple indexes
Hi, I am thinking of using Solr to implement search on our site. Here is my use case: 1. We will have multiple (4-5) indexes based on different data types/structures, and data will be indexed into these by several processes, like cron, on demand, through message queue applications, etc. 2. A single web service needs to search across all these indexes and return results. I am thinking of using Solr 4.2.1 or maybe 4.3 with a single instance - multicore setup. I read about distributed search and I believe I should be able to search across multiple indices using the shards parameter. However, in my case all shards will be on the same host/port but with different core names. Is my understanding correct? Or is there any better alternative to this approach? Please suggest. Thanks, -Manasi -- View this message in context: http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040.html Sent from the Solr - User mailing list archive at Nabble.com.
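To make the same-host multicore idea concrete, here is a small sketch (host, port, and core names are assumptions, not from the thread) of building the shards parameter value for several cores living on one host:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ShardsParamDemo {
    // Builds the value of the legacy "shards" parameter: a comma-separated
    // list of host:port/path entries, one per core to search. For cores on
    // the same host, only the core name at the end differs.
    static String buildShards(String hostPort, List<String> cores) {
        return cores.stream()
                .map(core -> hostPort + "/solr/" + core)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String shards = buildShards("localhost:8983",
                List.of("products", "articles", "events"));
        System.out.println(shards);
        // prints localhost:8983/solr/products,localhost:8983/solr/articles,localhost:8983/solr/events
    }
}
```

The resulting value would then be sent as the shards request parameter (via SolrJ or appended to the query URL); as the replies below stress, this only makes sense if the schemas of the cores are compatible.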
Re: Need help with search in multiple indexes
Manasi, Everything hinges on these indexes having similar enough schema that they can be represented as a union of all the fields from each type, where most of the searched data is common to all types. If so, you have a few options for querying them all together... distributed search, creating one large index and adding a type field, etc. If, however, your data is heterogeneous enough that the schemas are not really comparable, you're probably stuck coordinating the results externally. Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. "The Science of Influence Marketing" 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com On Wed, Jun 12, 2013 at 3:55 PM, smanad sma...@gmail.com wrote: Hi, I am thinking of using Solr to implement search on our site. Here is my use case: 1. We will have multiple (4-5) indexes based on different data types/structures, and data will be indexed into these by several processes, like cron, on demand, through message queue applications, etc. 2. A single web service needs to search across all these indexes and return results. I am thinking of using Solr 4.2.1 or maybe 4.3 with a single instance - multicore setup. I read about distributed search and I believe I should be able to search across multiple indices using the shards parameter. However, in my case all shards will be on the same host/port but with different core names. Is my understanding correct? Or is there any better alternative to this approach? Please suggest. Thanks, -Manasi -- View this message in context: http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need help with search in multiple indexes
Thanks for the reply, Michael. In some cases the schema is similar, but not in all of them, so let's go with the assumption that the schemas are NOT similar. I am not quite sure what you mean by "you're probably stuck coordinating the results externally". Do you mean searching in each index and then somehow merging results manually? Will I still be able to use the shards parameter, or not? Also, I was planning to use the PHP library SolrClient. Do you see any downside? -- View this message in context: http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070049.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need help with search in multiple indexes
I am not quite sure what you mean by you're probably stuck coordinating the results externally. Do you mean, searching in each index and then somehow merge results manually? will I still be able to use shards parameters? or no? If your schemas don't match up, you can't use distributed search, so yes, manual merging. You can't use the shards parameter across indexes with incompatible schema. I'd strongly consider just including all the fields in a single schema and leaving them blank if they don't apply to a given type of data. Also, I was planning to use php library SolrClient. Do you see any downside? No, this works fine!
Re: Need help with search in multiple indexes
Is this a limitation of Solr/Lucene? Should I be considering other options like Elasticsearch (which is also based on Lucene)? But I am sure search in multiple indexes is a common problem. Also, I was reading this post http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set and in one of the comments it says: So if I have Core0 with fields documentId,fieldA,fieldB and Core1 with fields documentId,fieldC,fieldD. Then I create another core, lets say Core3 with fields documentId,fieldA,fieldB,fieldC,fieldD. I will never be importing data into this core? And then create a query handler, that includes the shard parameter. So when I query Core3, it will never really contain indexed data, but because of the shard searching it will fetch the results from the other two cores, and present it on the 3rd core? Thanks for the help! Is that what I should be doing? So all the indexing still happens in separate cores but searching happens in one single core? -- View this message in context: http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070055.html Sent from the Solr - User mailing list archive at Nabble.com.
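In configuration terms, the pattern quoted above amounts to an empty "aggregator" core whose search handler carries the shards list as defaults. A hypothetical solrconfig.xml fragment for Core3 might look like this (host, port, and core names are assumptions taken from the quoted comment):

```xml
<!-- Hypothetical handler on the aggregator core (Core3): the shards
     default fans the query out to the cores that actually hold data. -->
<requestHandler name="/aggregate" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">localhost:8983/solr/Core0,localhost:8983/solr/Core1</str>
  </lst>
</requestHandler>
```

Using a non-default handler name avoids the recursion problem that comes up when shards defaults sit on the handler that also receives the fanned-out sub-requests.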
Re: Need help with search in multiple indexes
I had not heard of that technique before. Interesting! But couldn't you do the same thing with a unified schema spread among your cores? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. "The Science of Influence Marketing" 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com On Wed, Jun 12, 2013 at 5:05 PM, smanad sma...@gmail.com wrote: Is this a limitation of Solr/Lucene? Should I be considering other options like Elasticsearch (which is also based on Lucene)? But I am sure search in multiple indexes is a common problem. Also, I was reading this post http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set and in one of the comments it says: So if I have Core0 with fields documentId,fieldA,fieldB and Core1 with fields documentId,fieldC,fieldD. Then I create another core, lets say Core3 with fields documentId,fieldA,fieldB,fieldC,fieldD. I will never be importing data into this core? And then create a query handler, that includes the shard parameter. So when I query Core3, it will never really contain indexed data, but because of the shard searching it will fetch the results from the other two cores, and present it on the 3rd core? Thanks for the help! Is that what I should be doing? So all the indexing still happens in separate cores but searching happens in one single core? -- View this message in context: http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070055.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need help with search in multiple indexes
In my case, different teams will be updating indexes at different intervals, so having separate cores gives more control. However, I can still update (add/edit/delete) data with conditions like checking for doc type. It's just that using shards sounds much cleaner and more readable. However, I am not yet sure if there might be any performance issues. -- View this message in context: http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070061.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need help with search in multiple indexes
Michael's point was that the schemas need to be compatible. I mean, if you query fields A, B, C, and D, and index1 has fields A and B, while index2 has fields C and D, and index3 has fields E and F, what kind of results do you think you will get back?! Whether the schemas must be identical is not absolutely clear, but they at least have to include all the fields that queries will use. And... key values need to be unique across indexes. Yes, Solr CAN do it. But to imagine that it would give reasonable query results with no coordination between the developers of the separate indexes is a little too much. The bottom line: somebody needs to coordinate the development of the schemas for the separate indexes so that they will be compatible from a query term and key value perspective, at a minimum. -- Jack Krupansky -Original Message- From: smanad Sent: Wednesday, June 12, 2013 5:05 PM To: solr-user@lucene.apache.org Subject: Re: Need help with search in multiple indexes
Re: multiple indexes?
This is very helpful. Thanks a lot, Shawn and Dikchant! So in the default single-core situation, the index would live in data/index, correct? On Fri, Nov 30, 2012 at 11:02 PM, Shawn Heisey s...@elyograg.org wrote: On 11/30/2012 10:11 PM, Joe Zhang wrote: May I ask: how to set up multiple indexes, and specify which index to send the docs to at indexing time, and later on, how to specify which index to work with? A related question: what is the storage location and structure of solr indexes? When you index or query data, you'll use a base URL specific to the index (core). Everything goes through that base URL, which includes the name of the core: http://server:port/solr/corename The file called solr.xml tells Solr about multiple cores. Each core has an instanceDir and a dataDir. http://wiki.apache.org/solr/CoreAdmin In the dataDir, Solr will create an index dir, which contains the Lucene index. Here are the file formats for recent versions: http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html http://lucene.apache.org/core/3_6_1/fileformats.html http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html Thanks, Shawn
Re: multiple indexes?
Multiple indexes can be set up using the multi-core feature of Solr. Below are the steps: 1. Add the core name and storage location of the core to the $SOLR_HOME/solr.xml file:
<cores adminPath="/admin/cores" defaultCoreName="core-name1">
  <core name="core-name1" instanceDir="core-dir1" />
  <core name="core-name2" instanceDir="core-dir2" />
</cores>
2. Create the core directories specified, with the following sub-directories in each: - conf: contains the configs and schema definition - lib: contains the required libraries - data: will be created automatically on the first run; this contains the actual index. While indexing docs, you specify the core name in the URL as follows: http://host:port/solr/core-name/update?parameters Similarly while querying. Please refer to the Solr Wiki; it has the complete details. Hope this helps! - Dikchant On Sat, Dec 1, 2012 at 10:41 AM, Joe Zhang smartag...@gmail.com wrote: May I ask: how to set up multiple indexes, and specify which index to send the docs to at indexing time, and later on, how to specify which index to work with? A related question: what is the storage location and structure of solr indexes? Thanks in advance, guys! Joe.
Re: multiple indexes?
On 11/30/2012 10:11 PM, Joe Zhang wrote: May I ask: how to set up multiple indexes, and specify which index to send the docs to at indexing time, and later on, how to specify which index to work with? A related question: what is the storage location and structure of solr indexes? When you index or query data, you'll use a base URL specific to the index (core). Everything goes through that base URL, which includes the name of the core: http://server:port/solr/corename The file called solr.xml tells Solr about multiple cores. Each core has an instanceDir and a dataDir. http://wiki.apache.org/solr/CoreAdmin In the dataDir, Solr will create an index dir, which contains the Lucene index. Here are the file formats for recent versions: http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html http://lucene.apache.org/core/3_6_1/fileformats.html http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html Thanks, Shawn
Re: SOLR - To point multiple indexes in different folder
That should be OK. The recursive bit happens when you define the shard locations in your config files in the default search handler. On Fri, Nov 2, 2012 at 6:42 AM, ravi.n rav...@ornext.com wrote: Erick, We are forming the request something like below for the default /select request handler, will this cause an issue? So far we are not facing any recursion issues. http://94.101.147.150:8080/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&shards=localhost:8080/solr/coll1,localhost:8080/solr/coll2,localhost:8080/solr/coll3,localhost:8080/solr/coll4,localhost:8080/solr/coll5,localhost:8080/solr/coll6,localhost:8080/solr/coll7 Below is the solrconfig for /select:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">recordid</str>
  </lst>
</requestHandler>
recordid is the unique field in the document. Regards, Ravi -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-To-point-multiple-indexes-in-different-folder-tp4016640p4017783.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR - To point multiple indexes in different folder
Erick, We are forming the request something like below for the default /select request handler, will this cause an issue? So far we are not facing any recursion issues. http://94.101.147.150:8080/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&shards=localhost:8080/solr/coll1,localhost:8080/solr/coll2,localhost:8080/solr/coll3,localhost:8080/solr/coll4,localhost:8080/solr/coll5,localhost:8080/solr/coll6,localhost:8080/solr/coll7 Below is the solrconfig for /select:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">recordid</str>
  </lst>
</requestHandler>
recordid is the unique field in the document. Regards, Ravi -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-To-point-multiple-indexes-in-different-folder-tp4016640p4017783.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR - To point multiple indexes in different folder
Erick, Thanks for your response. All the 7 folders have the same schema, I mean the document structure is the same. I am not very sure how the customer got this data dump into different folders. Now we have configured Solr with multiple cores, each core pointing to one directory, and we use shards to get a single search response. Please suggest if this is the right approach.
<solr>
  <cores adminPath="/admin/cores" sharedLib="lib" defaultCoreName="coll1">
    <core name="coll1" instanceDir="1" />
    <core name="coll2" instanceDir="2" />
    <core name="coll3" instanceDir="3" />
    <core name="coll4" instanceDir="4" />
    <core name="coll5" instanceDir="5" />
    <core name="coll6" instanceDir="6" />
    <core name="coll7" instanceDir="7" />
  </cores>
</solr>
And now we should also configure Solr for indexing new data from a CSV file; I am not sure how to configure this. Regards, Ravi -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-To-point-multiple-indexes-in-different-folder-tp4016640p4016946.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR - To point multiple indexes in different folder
Until you nail down what the user did, this may cause problems. A sharded system assumes that each unique ID (uniqueKey in your schema) exists on one and only one shard; otherwise you'll get multiple copies of the docs. And you've only shown a multi-core setup, NOT a sharded setup. You need to define a search handler (a requestHandler) in solrconfig.xml and provide the shards as defaults. I don't have the reference close to hand, but you should be able to find it with some searching. Beware the recursion problem you'll see referenced: last I knew, you can't configure your shards in the default search handler, since that's the handler that receives the sub-requests on all your nodes.

Best, Erick

On Tue, Oct 30, 2012 at 5:01 AM, ravi.n rav...@ornext.com wrote: Erick, thanks for your response. All 7 folders have the same schema; I mean the document structure is the same.
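What Erick describes might look like the sketch below: a second handler (the name /distrib is arbitrary) carrying the shards list as a default, while the default /select handler stays shard-free so it can serve the sub-requests. The hosts and core names are copied from Ravi's mail; treat the rest as an untested sketch, not a verified config:

```xml
<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="df">recordid</str>
    <str name="shards">localhost:8080/solr/coll1,localhost:8080/solr/coll2,localhost:8080/solr/coll3,localhost:8080/solr/coll4,localhost:8080/solr/coll5,localhost:8080/solr/coll6,localhost:8080/solr/coll7</str>
  </lst>
</requestHandler>
```

Clients would then query .../solr/coll1/select?qt=/distrib&q=... (or hit /distrib directly, depending on Solr version), and the plain /select handler handles the per-shard sub-requests without recursing.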
SOLR - To point multiple indexes in different folder
Hello Solr gurus, I am a newbie to Solr; below are my requirements: 1. We have 7 folders of index files at which the Solr application is to be pointed. I understand the shards feature can be used for searching; is there any other alternative? Each folder has around 24 million documents. 2. We should configure Solr to index new incoming data from a database/CSV file; what configuration is required in Solr to achieve this? Any quick response will be appreciated. Thanks Regards, Ravi
Re: SOLR - To point multiple indexes in different folder
How did you get the 7 directories anyway? From your message, they sound like they are _Solr_ indexes, in which case you somehow created them with Solr. But I don't really understand the setup in that case. If these are Solr/Lucene indexes, you can use the multicore features. This treats them as separate indexes, and you have to address each one specifically, something like ...localhost/solr/collection2/select etc. Sharding, on the other hand, _assumes_ that all the indexes together make up one logical index and handles the distribution/collation automatically. If this makes no sense, could you explain your setup a little more?

Best, Erick

On Mon, Oct 29, 2012 at 7:34 AM, ravi.n rav...@ornext.com wrote: Hello Solr gurus, I am a newbie to Solr; below are my requirements...
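To make Erick's distinction concrete (host, port, and core names are assumed; the URLs are echoed rather than executed, since they need a live server):

```shell
# Multicore: each core is addressed explicitly; the two result lists
# are NOT merged for you.
echo "http://localhost:8080/solr/coll1/select?q=foo"
echo "http://localhost:8080/solr/coll2/select?q=foo"

# Sharding: one request to any core, with shards= listing all of them;
# Solr fans the query out and collates a single merged result list.
echo "http://localhost:8080/solr/coll1/select?q=foo&shards=localhost:8080/solr/coll1,localhost:8080/solr/coll2"
```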
Search over multiple indexes
Hello, I'm trying to implement automatic document classification and store the classified attributes as an additional field in Solr document. Then the search goes against that field like q=classified_category:xyz. The document classification is currently implemented as an UpdateRequestProcessor and works quite well. The only problem: for each change in the classification algorithm every document has to be re-indexed which, of course, makes tests and experimentation difficult and binds resources (other than Solr) for several hours. So, my idea would be to store classified attributes in a meta-index and search over the main and meta indexes simultaneously. For example: main index has got fields like color and meta index has got classified_category. The query q=classified_category:xyz AND color:black should be then split over the main and meta index. This way, the classification could run on Solr over the main index and store classified fields in the meta index so that only Solr resources are bound. Has anybody already done something like that? It's a little bit like sharding but different in that each shard would process its part of the query and live in the same Solr instance. Regards, Valeriy
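For readers wondering how Valeriy's classification step is wired in: an UpdateRequestProcessor is registered as a chain in solrconfig.xml and attached to the update handler. The sketch below shows the usual shape; the classifier factory class name is hypothetical (Valeriy's actual class isn't given), and the handler class matches Solr versions of that era:

```xml
<updateRequestProcessorChain name="classify">
  <!-- hypothetical factory that adds e.g. a classified_category field -->
  <processor class="com.example.ClassifierUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">classify</str>
  </lst>
</requestHandler>
```

Note this runs at index time, which is exactly why every algorithm change forces a full re-index: the classified field is baked into the stored documents.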
Re: Three questions about: Commit, single index vs multiple indexes and implementation advice
Let's see...

1) Committing every second, even with commitWithin, is probably going to be a problem. I think 1-second latency is usually overkill, but that's up to your product manager. Look at the NRT (Near Real Time) stuff if you really need this. I thought that NRT was only on trunk, but it *might* be in the 3.4 code base.

2) I don't understand what "a single index per entity" is. How many cores do you have in total? For not very many records, I'd put everything in a single index and use filter queries to restrict views.

3) I guess this relates to 2), and I'd use a single core. If, for some reason, you decide that you need multiple indexes, use several cores within ONE Solr rather than starting a new Solr per core; it's more resource-expensive to have multiple JVMs around.

Best, Erick

On Thu, Nov 3, 2011 at 2:03 PM, Gustavo Falco comfortablynum...@gmail.com wrote: Hi guys! I have a couple of questions that I hope someone can help me with...
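For reference, the autoCommit being discussed lives in solrconfig.xml. A sketch using the 1000 ms maxTime from the original question (the value is illustrative, not a recommendation; per the discussion, something in the 5-15 second range is a safer starting point):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- fires a commit at most this often while documents are pending -->
  <autoCommit>
    <maxTime>1000</maxTime>
  </autoCommit>
</updateHandler>
```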
Re: Three questions about: Commit, single index vs multiple indexes and implementation advice
First of all, thanks a lot for your answer.

1) I could use 5 to 15 seconds between commits and give it a try. Is this an acceptable configuration? I'll take a look at NRT.

2) Currently I'm using a single core, the simplest setup. I don't expect an overwhelming quantity of records, but I do have lots of classes to persist, and I need to search across all of them at the same time, not per class (entity). For now it's working well. By "multiple indexes" I mean using an index for each entity: say, one index for Articles, another for Users, etc. The thing is that I don't know when I should divide things up and use one index per entity (or whether it's possible to make a UNION-like search across the indexes). I've read that when an entity reaches one million records it's best to give it a dedicated index, though I don't expect that size even with all my entities together. But I wanted to check with you just to be sure.

3) Great! For now I'll stick with one index, but it's good to know, in case I need to change later for some reason.

Again, thanks a lot for your help!

2011/11/4 Erick Erickson erickerick...@gmail.com wrote: Let's see...
RE: Three questions about: Commit, single index vs multiple indexes and implementation advice
Gustavo - Even with the most basic requirements, I'd recommend setting up a multi-core configuration so you can RELOAD the main core you are using when you make simple changes to config files. This is much cleaner than bouncing Solr each time. There are other benefits to doing it, but this is the main reason I do it.

Brian

Date: Fri, 4 Nov 2011 15:34:27 -0300 Subject: Re: Three questions about: Commit, single index vs multiple indexes and implementation advice From: comfortablynum...@gmail.com To: solr-user@lucene.apache.org
First of all, thanks a lot for your answer...
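The RELOAD Brian mentions is a CoreAdmin action; its URL shape is shown below (host, port, and core name are assumed; the URL is echoed rather than executed since it needs a running Solr):

```shell
# CoreAdmin RELOAD for a core named "coll1" (name assumed): rereads
# solrconfig.xml/schema.xml without restarting the servlet container.
CORE=coll1
URL="http://localhost:8080/solr/admin/cores?action=RELOAD&core=${CORE}"
echo "$URL"
```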
Re: Three questions about: Commit, single index vs multiple indexes and implementation advice
Hi Brian, I'll take a look at what you mentioned; I didn't think about that. I'll finish the implementation at the app level and then read a little more about multi-core setups; maybe I don't yet know all the benefits it has. Thanks a lot for your advice.

2011/11/4 Brian Gerby briange...@hotmail.com wrote: Gustavo - Even with the most basic requirements, I'd recommend setting up a multi-core configuration...
Three questions about: Commit, single index vs multiple indexes and implementation advice
Hi guys! I have a couple of questions that I hope someone can help me with: 1) I recently implemented Solr in my app. My use case is not complicated: suppose there will be 50 concurrent users, tops. The app is something like a CRM. I tell you this so you have an idea of how many read and write operations will be needed. What I do need is for data that is added/updated to be available right after it's added/updated (a second later is OK). I know the commit operation is expensive, so committing right after each write is probably not a good idea. I'm trying the autoCommit feature with a maxTime of 1000 ms, but then the question arose: is this the best way to handle this type of situation? And if not, what should I do? 2) I'm using a single index per entity type because I've read that if the app is not handling lots of data (say, a million records) then it's safe to use a single index. Is this true? If not, why? 3) Is it a problem if I use a simple setup of Solr with a single core for this use case? If not, what do you recommend? Any help on any of these topics would be greatly appreciated. Thanks in advance!
Re: Multiple indexes
Your data is being used to build an inverted index rather than being stored as a set of records; de-normalising is fine in most cases. What is your use case that requires a normalised set of indices?

2011/6/18 François Schiettecatte fschietteca...@gmail.com: You would need to run two independent searches and then 'join' the results...
Re: Multiple indexes
2011/6/15 Edoardo Tosca e.to...@sourcesense.com: Try to use multiple cores: http://wiki.apache.org/solr/CoreAdmin Can I do concurrent searches on multiple cores?
Re: Multiple indexes
Sure. François On Jun 18, 2011, at 2:25 PM, shacky wrote: 2011/6/15 Edoardo Tosca e.to...@sourcesense.com: Try to use multiple cores: http://wiki.apache.org/solr/CoreAdmin Can I do concurrent searches on multiple cores?
Re: Multiple indexes
On 18 June 2011 at 20:27, François Schiettecatte fschietteca...@gmail.com wrote: Sure. So I can have searches similar to a JOIN in MySQL? The problem is that I need at least two tables in which to search data..
Re: Multiple indexes
You would need to run two independent searches and then 'join' the results. It is best not to apply a 'sql' mindset to SOLR when it comes to (de)normalization: whereas you strive for normalization in SQL, that is usually counter-productive in SOLR. For example, I am working on a project with 30+ normalized tables but only 4 cores. Perhaps describing what you are trying to achieve would give us greater insight, so we could make more concrete recommendations? Cheers François On Jun 18, 2011, at 2:36 PM, shacky wrote: So I can have searches similar to a JOIN in MySQL? The problem is that I need at least two tables in which to search data..
RE: Multiple indexes
I think there are reasons to use separate indexes for each document type but do combined searches on these indexes (for example if you need separate TFs for each document type). I wonder if in this precise case it wouldn't be pertinent to have a single index with the various document types, each having its own field set. Isn't TF calculated field by field?
RE: Multiple indexes
(for example if you need separate TFs for each document type). I wonder if in this precise case it wouldn't be pertinent to have a single index with the various document types, each having its own field set. Isn't TF calculated field by field? Oh, you are right :) So I will start testing with one mixed-type index and perhaps try IndexReaderFactory afterwards for comparison. Thanks, Kai Gülzau
RE: Multiple indexes
Are there any plans to support a kind of federated search in a future Solr version? I think there are reasons to use separate indexes for each document type but to do combined searches across these indexes (for example if you need separate TFs for each document type). I am aware of http://wiki.apache.org/solr/DistributedSearch and of a workaround that does federated search with sharding, http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set, but this seems to be too much network and maintenance overhead. Perhaps it is worth a try to use an IndexReaderFactory which returns a Lucene MultiReader!? Is IndexReaderFactory still experimental? https://issues.apache.org/jira/browse/SOLR-1366 Regards, Kai Gülzau

-----Original Message----- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, June 15, 2011 8:43 PM To: solr-user@lucene.apache.org Subject: Re: Multiple indexes
Next, however, I predict you're going to ask how you do a 'join' or otherwise query across both these cores at once...
Multiple indexes
Hi. How to have multiple indexes in SOLR, with different fields and different types of data? Thank you very much! Bye.
Re: Multiple indexes
Try to use multiple cores: http://wiki.apache.org/solr/CoreAdmin On Wed, Jun 15, 2011 at 5:55 PM, shacky shack...@gmail.com wrote: Hi. How to have multiple indexes in SOLR, with different fields and different types of data? Thank you very much! Bye. -- Edoardo Tosca Sourcesense - making sense of Open Source: http://www.sourcesense.com
Re: Multiple indexes
You'll configure multiple cores: http://wiki.apache.org/solr/CoreAdmin Hi. How to have multiple indexes in SOLR, with different fields and different types of data? Thank you very much! Bye.

--
mit freundlichem Gruß,
Frank Wesemann
Fotofinder GmbH                 USt-IdNr. DE812854514
Software Entwicklung            Web: http://www.fotofinder.com/
Potsdamer Str. 96               Tel: +49 30 25 79 28 90
10785 Berlin                    Fax: +49 30 25 79 28 999
Sitz: Berlin Amtsgericht Berlin Charlottenburg (HRB 73099)
Geschäftsführer: Ali Paczensky
Re: Multiple indexes
Next, however, I predict you're going to ask how you do a 'join' or otherwise query across both these cores at once. You can't do that in Solr. On 6/15/2011 1:00 PM, Frank Wesemann wrote: You'll configure multiple cores: http://wiki.apache.org/solr/CoreAdmin Hi. How to have multiple indexes in SOLR, with different fields and different types of data? Thank you very much! Bye.
Re: Multiple indexes inside a single core
Here's the Jira issue for the distributed search problem: https://issues.apache.org/jira/browse/SOLR-1632 I tried applying this patch but get the same error that is posted in the discussion section of that issue. I will be glad to help on this one too.

On Sat, Oct 23, 2010 at 2:35 PM, Erick Erickson erickerick...@gmail.com wrote: Ah, I should have read more carefully... I remember this being discussed on the dev list, and I thought there might be a Jira attached, but I sure can't find it.
Re: Multiple indexes inside a single core
Ah, I should have read more carefully... I remember this being discussed on the dev list, and I thought there might be a Jira attached, but I sure can't find it. If you're willing to work on it, you might hop over to the solr dev list and start a discussion, maybe ask for a place to start. I'm sure some of the devs have thought about this... If nobody on the dev list says "there's already a JIRA on it", then you should open one. The Jira issues are generally preferred once you start getting into design, because the comments are preserved for the next person who tries the idea or makes changes, etc.

Best, Erick

On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess ben.bogg...@gmail.com wrote: Thanks Erick. The problem with multiple cores is that the documents are scored independently in each core. I would like to be able to search across both cores and have the scores 'normalized' in a way that's similar to what Lucene's MultiSearcher would do. As far as I understand, multiple cores would likely result in seriously skewed scores in my case, since the documents are not distributed evenly or randomly. I could have one core/index with 20 million docs and another with 200. I've poked around in the code and this feature doesn't seem to exist. I would be happy with finding a decent place to try to add it; I'm not sure if there is a clean place for it. Ben

On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com wrote: It seems to me that multiple cores are along the lines of what you need...
Multiple indexes inside a single core
We are trying to convert a Lucene-based search solution to a Solr/Lucene-based solution. The problem we have is that we currently have our data split into many indexes, and Solr expects things to be in a single index unless you're sharding. In addition to this, our indexes wouldn't work well using the distributed search functionality in Solr because the documents are not evenly or randomly distributed. We are currently using Lucene's MultiSearcher to search over subsets of these indexes. I know this has been brought up a number of times in previous posts, and the typical response is that the best thing to do is to convert everything into a single index. One of the major reasons for having the indexes split up the way we do is that different types of data need to be indexed at different intervals. You may need one index to be updated every 20 minutes while another is only updated every week. If we move to a single index, then we will constantly be warming and replacing searchers for the entire dataset, and will essentially render the searcher caches useless. If we were able to have multiple indexes, they would each have a searcher, and updates would be isolated to a subset of the data. The other problem is that we will likely need to shard this large single index, and there isn't a clean way to shard randomly and evenly across all of the data. We would, however, like to shard a single data type. If we could use multiple indexes, we would likely also be sharding a small subset of them. Thanks in advance, Ben
Re: Multiple indexes inside a single core
It seems to me that multiple cores are along the lines of what you need: a single instance of Solr that can search across multiple sub-indexes that do not necessarily share schemas and are independently maintainable. This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin HTH, Erick On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote: We are trying to convert a Lucene-based search solution to a Solr/Lucene-based solution. The problem we have is that we currently have our data split into many indexes, and Solr expects things to be in a single index unless you're sharding. In addition to this, our indexes wouldn't work well using the distributed search functionality in Solr because the documents are not evenly or randomly distributed. We are currently using Lucene's MultiSearcher to search over subsets of these indexes. I know this has been brought up a number of times in previous posts, and the typical response is that the best thing to do is to convert everything into a single index. One of the major reasons for having the indexes split up the way we do is that different types of data need to be indexed at different intervals. You may need one index to be updated every 20 minutes while another is only updated every week. If we move to a single index, then we will constantly be warming and replacing searchers for the entire dataset, and will essentially render the searcher caches useless. If we were able to have multiple indexes, they would each have a searcher, and updates would be isolated to a subset of the data. The other problem is that we will likely need to shard this large single index, and there isn't a clean way to shard randomly and evenly across all of the data. We would, however, like to shard a single data type. If we could use multiple indexes, we would likely also be sharding a small subset of them. Thanks in advance, Ben
Re: Multiple indexes inside a single core
Thanks Erick. The problem with multiple cores is that the documents are scored independently in each core. I would like to be able to search across both cores and have the scores 'normalized' in a way that's similar to what Lucene's MultiSearcher would do. As far as I understand, multiple cores would likely result in seriously skewed scores in my case, since the documents are not distributed evenly or randomly. I could have one core/index with 20 million docs and another with 200. I've poked around in the code and this feature doesn't seem to exist. I would be happy with finding a decent place to try to add it. I'm not sure if there is a clean place for it. Ben On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com wrote: It seems to me that multiple cores are along the lines of what you need: a single instance of Solr that can search across multiple sub-indexes that do not necessarily share schemas and are independently maintainable. This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin HTH, Erick On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote: We are trying to convert a Lucene-based search solution to a Solr/Lucene-based solution. The problem we have is that we currently have our data split into many indexes, and Solr expects things to be in a single index unless you're sharding. In addition to this, our indexes wouldn't work well using the distributed search functionality in Solr because the documents are not evenly or randomly distributed. We are currently using Lucene's MultiSearcher to search over subsets of these indexes. I know this has been brought up a number of times in previous posts, and the typical response is that the best thing to do is to convert everything into a single index. One of the major reasons for having the indexes split up the way we do is that different types of data need to be indexed at different intervals.
You may need one index to be updated every 20 minutes while another is only updated every week. If we move to a single index, then we will constantly be warming and replacing searchers for the entire dataset, and will essentially render the searcher caches useless. If we were able to have multiple indexes, they would each have a searcher, and updates would be isolated to a subset of the data. The other problem is that we will likely need to shard this large single index, and there isn't a clean way to shard randomly and evenly across all of the data. We would, however, like to shard a single data type. If we could use multiple indexes, we would likely also be sharding a small subset of them. Thanks in advance, Ben
Re: Multiple Indexes and relevance ranking question
The score of a document has no scale: it only has meaning relative to other scores in the same query. Solr does not rank these documents correctly. Without sharing the TF/DF information across the shards, it cannot. If the shards each have a lot of the same kind of document, this problem averages out. That is, the statistical fingerprint across the shards is similar enough that each index gives the same numerical range. Yes, this is hand-wavy, and we don't have a measuring tool that verifies this assertion. Lance Valli Indraganti wrote: I am new to Solr and search technologies. I am playing around with multiple indexes. I configured Solr for Tomcat and created two Tomcat fragments so that two Solr webapps listen on port 8080 in Tomcat. I have created two separate indexes using each webapp successfully. My documents are very primitive. Below is the structure. I have four such documents with different doc ids and an increasing number of the word Hello corresponding to the name of the document (this is only to make my analysis of the results easier). Documents one and two are in shard 1 and three and four are in shard 2. Obviously, document two is ranked higher when queried against that index (for the word Hello), and document four is ranked higher when queried against the second index. When using the shards parameter, the scores remain unaltered. My question is: if the distributed search does not consider IDF, how is it able to rank these documents correctly? Or do I not have the indexes truly distributed? Is something wrong with my term distribution? <add> <doc> <field name="id">Valli1</field> <field name="name">One</field> <field name="text">Hello! This is a test document testing relevancy scores.</field> </doc> </add>
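The un-shared IDF problem is easy to see with toy numbers. A small sketch using the plain log(N/df) form of IDF — Lucene's actual formula is a smoothed variant, and the document counts here are made up for illustration:

```python
import math

def idf(total_docs, doc_freq):
    """Inverse document frequency, plain log(N/df) form.

    Each shard computes this from its OWN total_docs and doc_freq,
    so the same term can score very differently per shard.
    """
    return math.log(total_docs / doc_freq)

# "hello" appears in 2 of 2 docs on shard 1, but 2 of 1000 docs on shard 2:
print(idf(2, 2))     # 0.0   -> term looks worthless on shard 1
print(idf(1000, 2))  # ~6.21 -> term looks rare and valuable on shard 2
# The global view (4 occurrences in 1002 docs) is a number
# that NEITHER shard computes:
print(idf(1002, 4))  # ~5.52
```

When the shards hold statistically similar data, the per-shard values converge toward the global one, which is the "averages out" behavior described above.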
Multiple Indexes and relevance ranking question
I am new to Solr and search technologies. I am playing around with multiple indexes. I configured Solr for Tomcat and created two Tomcat fragments so that two Solr webapps listen on port 8080 in Tomcat. I have created two separate indexes using each webapp successfully. My documents are very primitive. Below is the structure. I have four such documents with different doc ids and an increasing number of the word Hello corresponding to the name of the document (this is only to make my analysis of the results easier). Documents one and two are in shard 1 and three and four are in shard 2. Obviously, document two is ranked higher when queried against that index (for the word Hello), and document four is ranked higher when queried against the second index. When using the shards parameter, the scores remain unaltered. My question is: if the distributed search does not consider IDF, how is it able to rank these documents correctly? Or do I not have the indexes truly distributed? Is something wrong with my term distribution? <add> <doc> <field name="id">Valli1</field> <field name="name">One</field> <field name="text">Hello! This is a test document testing relevancy scores.</field> </doc> </add>
How to set up multiple indexes?
I installed Solr according to the tutorial. My schema.xml and solrconfig.xml are in ~/apache-solr-1.4.1/example/solr/conf Everything so far is just like in the tutorial. But I want to set up a 2nd index (separate from the main index) just for the purpose of auto-complete. I understand that I need to set up multicore for this, but I'm not sure how to do that. I read the doc (http://wiki.apache.org/solr/CoreAdmin) but am still pretty confused. - Where do I put the 2nd index? - Do I need separate schema.xml and solrconfig.xml files for the 2nd index? Where do I put them? - How do I tell Solr which index I want a document to go to? - How do I tell Solr which index I want to query against? - Any step-by-step instructions on setting up multicore? Thanks. Andy
Re: How to set up multiple indexes?
Hi Andy! I configured this a few days ago, and found a good resource -- http://wiki.apache.org/solr/MultipleIndexes That page has links that will give you the instructions for setting up Tomcat, Jetty and Resin. I used the Tomcat ones the other day, and it gave me everything that I needed to get it up and running. You basically just need to create a new directory to contain the second instance, then create a context file for it in the TOMCAT_HOME/conf/Catalina/localhost directory. Good luck! -- Chris On Wed, Sep 29, 2010 at 10:41 AM, Andy angelf...@yahoo.com wrote: I installed Solr according to the tutorial. My schema.xml and solrconfig.xml are in ~/apache-solr-1.4.1/example/solr/conf Everything so far is just like in the tutorial. But I want to set up a 2nd index (separate from the main index) just for the purpose of auto-complete. I understand that I need to set up multicore for this, but I'm not sure how to do that. I read the doc (http://wiki.apache.org/solr/CoreAdmin) but am still pretty confused. - Where do I put the 2nd index? - Do I need separate schema.xml and solrconfig.xml files for the 2nd index? Where do I put them? - How do I tell Solr which index I want a document to go to? - How do I tell Solr which index I want to query against? - Any step-by-step instructions on setting up multicore? Thanks. Andy
Re: How to set up multiple indexes?
Check http://doc.ez.no/Extensions/eZ-Find/2.2/Advanced-Configuration/Using-multi-core-features It's for eZ Find, but it's the basic setup for multiple cores in any environment. We have cores laid out like so: solr/sfx/ solr/forum/ solr/mail/ solr/news/ solr/tracker/ Each of those core directories has its own conf/ with schema.xml and solrconfig.xml. Then solr/solr.xml looks like: <cores adminPath="/admin/cores"> <core name="sfx" instanceDir="sfx"/> <core name="tracker" instanceDir="tracker"/> etc. </cores> After that you add the core name into the URL for all requests to the core: http://<host>/solr/sfx/select?... http://<host>/solr/sfx/update... http://<host>/solr/tracker/select?... http://<host>/solr/tracker/update... On Wed, Sep 29, 2010 at 9:41 AM, Andy angelf...@yahoo.com wrote: I installed Solr according to the tutorial. My schema.xml and solrconfig.xml are in ~/apache-solr-1.4.1/example/solr/conf Everything so far is just like in the tutorial. But I want to set up a 2nd index (separate from the main index) just for the purpose of auto-complete. I understand that I need to set up multicore for this, but I'm not sure how to do that. I read the doc (http://wiki.apache.org/solr/CoreAdmin) but am still pretty confused. - Where do I put the 2nd index? - Do I need separate schema.xml and solrconfig.xml files for the 2nd index? Where do I put them? - How do I tell Solr which index I want a document to go to? - How do I tell Solr which index I want to query against? - Any step-by-step instructions on setting up multicore? Thanks. Andy
Re: Collating results from multiple indexes
Thanks for your clarification and link, Will. Back to Aaron's question. There is some ongoing work to try to support updating single fields within documents (http://issues.apache.org/jira/browse/SOLR-139 and http://issues.apache.org/jira/browse/SOLR-828) which could perhaps be part of a future solution. Is it an option for you to write a smart join component which can live on top of multiple cores and do multiple sub-queries in an efficient way and transparently return the final result? Forking the shards query code could be a starting point? Donating this component back to Solr may free you of the maintenance burden, as I'm sure it will be useful to a larger audience? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 17. feb. 2010, at 03.27, Will Johnson wrote: Jan Hoydal / Otis, First off, thanks for mentioning us. We do use some utility functions from SOLR, but our index engine is built on top of Lucene only; there are no Solr cores involved. We do have a JOIN operator that allows us to perform relational searches while still acting like a search engine in terms of performance, ranking, faceting, etc. Our CTO wrote a blog article about it a month ago that does a pretty good job of explaining how it’s used: http://www.attivio.com/blog/55-industry-insights/507-can-a-search-engine-replace-a-relational-database.html The join functionality and most of our other higher level features use separate data structures and don’t really have much to do with Lucene/SOLR except where they integrate with the query execution. If you want to learn more feel free to check out www.attivio.com. - w...@attivio.com On Fri, Feb 12, 2010 at 10:35 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Really? The last time I looked at AIE, I am pretty sure there was Solr core msgs in the logs, so I assumed it used EmbeddedSolr or something. But I may be mistaken. Anyone from Attivio here who can elaborate?
Is the join stuff at Lucene level or on top of multiple Solr cores or what? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote: Minor correction re Attivio - their stuff runs on top of Lucene, not Solr. I *think* they are trying to patent this. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Jan Høydahl / Cominvent jan@cominvent.com To: solr-user@lucene.apache.org Sent: Mon, February 8, 2010 3:33:41 PM Subject: Re: Collating results from multiple indexes Hi, There is no JOIN functionality in Solr. The common solution is either to accept the high volume update churn, or to add client side code to build a join layer on top of the two indices. I know that Attivio (www.attivio.com) have built some kind of JOIN functionality on top of Solr in their AIE product, but do not know the details or the actual performance. Why not open a JIRA issue, if there is no such already, to request this as a feature? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 25. jan. 2010, at 22.01, Aaron McKee wrote: Is there any somewhat convenient way to collate/integrate fields from separate indices during result writing, if the indices use the same unique keys? Basically, some sort of cross-index JOIN? As a bit of background, I have a rather heavyweight dataset of every US business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours to fully index on a decent box). Given the size and relative stability of the dataset, I generally only update this monthly. However, I have separate advertising-related datasets that need to be updated either hourly or daily (e.g. today's coupon, click revenue remaining, etc.). These advertiser feeds reference the same keyspace that I use in the main index, but are otherwise significantly lighter weight.
Importing and indexing them discretely only takes a couple minutes. Given that Solr/Lucene doesn't support field updating, without having to drop and re-add an entire document, it doesn't seem practical to integrate this data into the main index (the system would be under a constant state of churn, if we did document re-inserts, and the performance impact would probably be debilitating). It may be nice if this data could participate in filtering (e.g. only show advertisers), but it doesn't need to participate in scoring/ranking. I'm guessing that someone else has had a similar need, at some point? I can have our front-end query the smaller indices separately, using the keys returned by the primary index, but would prefer to avoid the extra sequential roundtrips. I'm hoping to also avoid a coding solution, if only to avoid the maintenance overhead as we drop in new builds of Solr, but that's also feasible. Thank you
Re: Collating results from multiple indexes
Jan Hoydal / Otis, First off, thanks for mentioning us. We do use some utility functions from SOLR, but our index engine is built on top of Lucene only; there are no Solr cores involved. We do have a JOIN operator that allows us to perform relational searches while still acting like a search engine in terms of performance, ranking, faceting, etc. Our CTO wrote a blog article about it a month ago that does a pretty good job of explaining how it’s used: http://www.attivio.com/blog/55-industry-insights/507-can-a-search-engine-replace-a-relational-database.html The join functionality and most of our other higher level features use separate data structures and don’t really have much to do with Lucene/SOLR except where they integrate with the query execution. If you want to learn more feel free to check out www.attivio.com. - w...@attivio.com On Fri, Feb 12, 2010 at 10:35 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Really? The last time I looked at AIE, I am pretty sure there was Solr core msgs in the logs, so I assumed it used EmbeddedSolr or something. But I may be mistaken. Anyone from Attivio here who can elaborate? Is the join stuff at Lucene level or on top of multiple Solr cores or what? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote: Minor correction re Attivio - their stuff runs on top of Lucene, not Solr. I *think* they are trying to patent this. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Jan Høydahl / Cominvent jan@cominvent.com To: solr-user@lucene.apache.org Sent: Mon, February 8, 2010 3:33:41 PM Subject: Re: Collating results from multiple indexes Hi, There is no JOIN functionality in Solr. The common solution is either to accept the high volume update churn, or to add client side code to build a join layer on top of the two indices.
I know that Attivio (www.attivio.com) have built some kind of JOIN functionality on top of Solr in their AIE product, but do not know the details or the actual performance. Why not open a JIRA issue, if there is no such already, to request this as a feature? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 25. jan. 2010, at 22.01, Aaron McKee wrote: Is there any somewhat convenient way to collate/integrate fields from separate indices during result writing, if the indices use the same unique keys? Basically, some sort of cross-index JOIN? As a bit of background, I have a rather heavyweight dataset of every US business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours to fully index on a decent box). Given the size and relative stability of the dataset, I generally only update this monthly. However, I have separate advertising-related datasets that need to be updated either hourly or daily (e.g. today's coupon, click revenue remaining, etc.). These advertiser feeds reference the same keyspace that I use in the main index, but are otherwise significantly lighter weight. Importing and indexing them discretely only takes a couple minutes. Given that Solr/Lucene doesn't support field updating, without having to drop and re-add an entire document, it doesn't seem practical to integrate this data into the main index (the system would be under a constant state of churn, if we did document re-inserts, and the performance impact would probably be debilitating). It may be nice if this data could participate in filtering (e.g. only show advertisers), but it doesn't need to participate in scoring/ranking. I'm guessing that someone else has had a similar need, at some point? I can have our front-end query the smaller indices separately, using the keys returned by the primary index, but would prefer to avoid the extra sequential roundtrips.
I'm hoping to also avoid a coding solution, if only to avoid the maintenance overhead as we drop in new builds of Solr, but that's also feasible. Thank you for your insight, Aaron
Re: Collating results from multiple indexes
Really? The last time I looked at AIE, I am pretty sure there was Solr core msgs in the logs, so I assumed it used EmbeddedSolr or something. But I may be mistaken. Anyone from Attivio here who can elaborate? Is the join stuff at Lucene level or on top of multiple Solr cores or what? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote: Minor correction re Attivio - their stuff runs on top of Lucene, not Solr. I *think* they are trying to patent this. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Jan Høydahl / Cominvent jan@cominvent.com To: solr-user@lucene.apache.org Sent: Mon, February 8, 2010 3:33:41 PM Subject: Re: Collating results from multiple indexes Hi, There is no JOIN functionality in Solr. The common solution is either to accept the high volume update churn, or to add client side code to build a join layer on top of the two indices. I know that Attivio (www.attivio.com) have built some kind of JOIN functionality on top of Solr in their AIE product, but do not know the details or the actual performance. Why not open a JIRA issue, if there is no such already, to request this as a feature? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 25. jan. 2010, at 22.01, Aaron McKee wrote: Is there any somewhat convenient way to collate/integrate fields from separate indices during result writing, if the indices use the same unique keys? Basically, some sort of cross-index JOIN? As a bit of background, I have a rather heavyweight dataset of every US business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours to fully index on a decent box). Given the size and relative stability of the dataset, I generally only update this monthly. However, I have separate advertising-related datasets that need to be updated either hourly or daily (e.g.
today's coupon, click revenue remaining, etc.). These advertiser feeds reference the same keyspace that I use in the main index, but are otherwise significantly lighter weight. Importing and indexing them discretely only takes a couple minutes. Given that Solr/Lucene doesn't support field updating, without having to drop and re-add an entire document, it doesn't seem practical to integrate this data into the main index (the system would be under a constant state of churn, if we did document re-inserts, and the performance impact would probably be debilitating). It may be nice if this data could participate in filtering (e.g. only show advertisers), but it doesn't need to participate in scoring/ranking. I'm guessing that someone else has had a similar need, at some point? I can have our front-end query the smaller indices separately, using the keys returned by the primary index, but would prefer to avoid the extra sequential roundtrips. I'm hoping to also avoid a coding solution, if only to avoid the maintenance overhead as we drop in new builds of Solr, but that's also feasible. Thank you for your insight, Aaron
Re: Collating results from multiple indexes
Minor correction re Attivio - their stuff runs on top of Lucene, not Solr. I *think* they are trying to patent this. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Jan Høydahl / Cominvent jan@cominvent.com To: solr-user@lucene.apache.org Sent: Mon, February 8, 2010 3:33:41 PM Subject: Re: Collating results from multiple indexes Hi, There is no JOIN functionality in Solr. The common solution is either to accept the high volume update churn, or to add client side code to build a join layer on top of the two indices. I know that Attivio (www.attivio.com) have built some kind of JOIN functionality on top of Solr in their AIE product, but do not know the details or the actual performance. Why not open a JIRA issue, if there is no such already, to request this as a feature? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 25. jan. 2010, at 22.01, Aaron McKee wrote: Is there any somewhat convenient way to collate/integrate fields from separate indices during result writing, if the indices use the same unique keys? Basically, some sort of cross-index JOIN? As a bit of background, I have a rather heavyweight dataset of every US business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours to fully index on a decent box). Given the size and relative stability of the dataset, I generally only update this monthly. However, I have separate advertising-related datasets that need to be updated either hourly or daily (e.g. today's coupon, click revenue remaining, etc.). These advertiser feeds reference the same keyspace that I use in the main index, but are otherwise significantly lighter weight. Importing and indexing them discretely only takes a couple minutes.
Given that Solr/Lucene doesn't support field updating, without having to drop and re-add an entire document, it doesn't seem practical to integrate this data into the main index (the system would be under a constant state of churn, if we did document re-inserts, and the performance impact would probably be debilitating). It may be nice if this data could participate in filtering (e.g. only show advertisers), but it doesn't need to participate in scoring/ranking. I'm guessing that someone else has had a similar need, at some point? I can have our front-end query the smaller indices separately, using the keys returned by the primary index, but would prefer to avoid the extra sequential roundtrips. I'm hoping to also avoid a coding solution, if only to avoid the maintenance overhead as we drop in new builds of Solr, but that's also feasible. Thank you for your insight, Aaron
Re: Collating results from multiple indexes
Hi, There is no JOIN functionality in Solr. The common solution is either to accept the high volume update churn, or to add client side code to build a join layer on top of the two indices. I know that Attivio (www.attivio.com) have built some kind of JOIN functionality on top of Solr in their AIE product, but do not know the details or the actual performance. Why not open a JIRA issue, if there is no such already, to request this as a feature? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 25. jan. 2010, at 22.01, Aaron McKee wrote: Is there any somewhat convenient way to collate/integrate fields from separate indices during result writing, if the indices use the same unique keys? Basically, some sort of cross-index JOIN? As a bit of background, I have a rather heavyweight dataset of every US business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours to fully index on a decent box). Given the size and relative stability of the dataset, I generally only update this monthly. However, I have separate advertising-related datasets that need to be updated either hourly or daily (e.g. today's coupon, click revenue remaining, etc.). These advertiser feeds reference the same keyspace that I use in the main index, but are otherwise significantly lighter weight. Importing and indexing them discretely only takes a couple minutes. Given that Solr/Lucene doesn't support field updating, without having to drop and re-add an entire document, it doesn't seem practical to integrate this data into the main index (the system would be under a constant state of churn, if we did document re-inserts, and the performance impact would probably be debilitating). It may be nice if this data could participate in filtering (e.g. only show advertisers), but it doesn't need to participate in scoring/ranking. I'm guessing that someone else has had a similar need, at some point?
I can have our front-end query the smaller indices separately, using the keys returned by the primary index, but would prefer to avoid the extra sequential roundtrips. I'm hoping to also avoid a coding solution, if only to avoid the maintenance overhead as we drop in new builds of Solr, but that's also feasible. Thank you for your insight, Aaron
Collating results from multiple indexes
Is there any somewhat convenient way to collate/integrate fields from separate indices during result writing, if the indices use the same unique keys? Basically, some sort of cross-index JOIN? As a bit of background, I have a rather heavyweight dataset of every US business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours to fully index on a decent box). Given the size and relative stability of the dataset, I generally only update this monthly. However, I have separate advertising-related datasets that need to be updated either hourly or daily (e.g. today's coupon, click revenue remaining, etc.). These advertiser feeds reference the same keyspace that I use in the main index, but are otherwise significantly lighter weight. Importing and indexing them discretely only takes a couple minutes. Given that Solr/Lucene doesn't support field updating, without having to drop and re-add an entire document, it doesn't seem practical to integrate this data into the main index (the system would be under a constant state of churn, if we did document re-inserts, and the performance impact would probably be debilitating). It may be nice if this data could participate in filtering (e.g. only show advertisers), but it doesn't need to participate in scoring/ranking. I'm guessing that someone else has had a similar need, at some point? I can have our front-end query the smaller indices separately, using the keys returned by the primary index, but would prefer to avoid the extra sequential roundtrips. I'm hoping to also avoid a coding solution, if only to avoid the maintenance overhead as we drop in new builds of Solr, but that's also feasible. Thank you for your insight, Aaron
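The "query the smaller indices separately" fallback Aaron mentions amounts to a client-side join on the shared unique key. A rough sketch of that collation step — the field names and the dicts standing in for per-index query results are hypothetical:

```python
def collate(primary_hits, ad_index):
    """Attach fields from a secondary index onto primary results,
    joined on the shared unique key ("id" here).

    primary_hits: list of result dicts from the main index.
    ad_index: mapping of unique key -> dict of extra fields,
              built from a query against the lightweight index.
    Documents with no advertiser data pass through unchanged.
    """
    collated = []
    for doc in primary_hits:
        extra = ad_index.get(doc["id"], {})
        collated.append({**doc, **extra})
    return collated

# Hypothetical data keyed the same way in both indexes:
primary = [{"id": "biz1", "name": "Acme"}, {"id": "biz2", "name": "Bolt"}]
ads = {"biz1": {"coupon": "10% off", "click_budget": 42.0}}
print(collate(primary, ads))
# biz1 gains the coupon fields; biz2 is returned as-is.
```

In practice the second lookup would be one batched query against the smaller index (filtering on the keys from the primary page of results), so the cost is one extra round trip per page rather than one per document.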
Re: All in one index, or multiple indexes?
Keep in mind that every time a commit is done, all the caches are thrown away. If updates for each of these indexes happen at different times, then the caches get invalidated each time you commit, so in that case a smaller index helps. On Wed, Jul 8, 2009 at 4:55 PM, Tim Sell trs...@gmail.com wrote: Hi, I am wondering if it is common to have just one very large index, or multiple smaller indexes specialized for different content types. We currently have multiple smaller indexes, although one of them is much larger than the others. We are considering merging them, to allow the convenience of searching across multiple types at once and getting them back in one list. The largest of the current indexes has a couple of types that belong together; it has just one text field, which is usually quite short and is similar to product names (words like The matter). Another index I would merge with this one has multiple text fields (also quite short). We of course would still like to be able to get specific types. Is doing filtering on just one type a big performance hit compared to just querying it from its own index? Bear in mind all these indexes run on the same machine. (We replicate them all to three machines and do load balancing.) There are a number of considerations. From an application standpoint, when querying across all types we may split the results out into the separate types anyway once we have the list back. If we always do this, is it silly to have them in one index rather than query multiple indexes at once? Are multiple HTTP requests less significant than the time to split the results afterwards? In some ways it is easier to maintain a single index, although it has felt easier to optimize the results for the type of content if they are in separate indexes. My main concern with putting it all in one index is that we'll make it harder to work with.
We will definitely want to do filtering on types sometimes, and if we go with a mashed up index I'd prefer not to maintain separate specialized indexes as well. Any thoughts? ~Tim. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: All in one index, or multiple indexes?
It will depend on how much total volume you have. If you are discussing millions and millions of records, I'd say use multicore and shards. On Wed, Jul 8, 2009 at 5:25 AM, Tim Sell trs...@gmail.com wrote: Hi, I am wondering if it is common to have just one very large index, or multiple smaller indexes specialized for different content types. We currently have multiple smaller indexes, although one of them is much larger then the others. We are considering merging them, to allow the convenience of searching across multiple types at once and get them back in one list. The largest of the current indexes has a couple of types that belong together, it has just one text field, and it is usually quite short and is similar to product names (words like The matter). Another index I would merge with this one, has multiple text fields (also quite short). We of course would still like to be able to get specific types. Is doing filtering on just one type a big performance hit compared to just querying it from it's own index? Bare in mind all these indexes run on the same machine. (we replicate them all to three machines and do load balancing). There are a number of considerations. From an application standpoint when querying across all types we may split the results out into the separate types anyway once we have the list back. If we always do this, is it silly to have them in one index, rather then query multiple indexes at once? Is multiple http requests less significant then the time to post split the results? In some ways it is easier to maintain a single index, although it has felt easier to optimize the results for the type of content if they are in separate indexes. My main concern of putting it all in one index is that we'll make it harder to work with. We will definitely want to do filtering on types sometimes, and if we go with a mashed up index I'd prefer not to maintain separate specialized indexes as well. Any thoughts? ~Tim.
All in one index, or multiple indexes?
Hi, I am wondering if it is common to have just one very large index, or multiple smaller indexes specialized for different content types. We currently have multiple smaller indexes, although one of them is much larger than the others. We are considering merging them, to allow the convenience of searching across multiple types at once and getting them back in one list. The largest of the current indexes has a couple of types that belong together; it has just one text field, which is usually quite short and similar to product names (words like "The matter"). Another index I would merge with this one has multiple text fields (also quite short). We of course would still like to be able to get specific types. Is doing filtering on just one type a big performance hit compared to just querying it from its own index? Bear in mind all these indexes run on the same machine (we replicate them all to three machines and do load balancing). There are a number of considerations. From an application standpoint, when querying across all types we may split the results out into the separate types anyway once we have the list back. If we always do this, is it silly to have them in one index, rather than querying multiple indexes at once? Are multiple HTTP requests less significant than the time to post-split the results? In some ways it is easier to maintain a single index, although it has felt easier to optimize the results for the type of content if they are in separate indexes. My main concern with putting it all in one index is that we'll make it harder to work with. We will definitely want to do filtering on types sometimes, and if we go with a mashed-up index I'd prefer not to maintain separate specialized indexes as well. Any thoughts? ~Tim.
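With a single merged index, the per-type restriction is usually expressed as a filter query (fq) rather than folded into q, since fq results are cached separately from the main query and reused across requests. A minimal sketch of building the /select parameters; the type field name is an assumption about the schema:

```python
def build_params(q, content_type=None, rows=10):
    """Build Solr /select parameters for a merged index that carries a
    'type' discriminator field. The fq clause is cached independently
    of q, so repeating the same type filter across queries is cheap."""
    params = {"q": q, "rows": rows}
    if content_type is not None:
        # restrict to one content type without changing scoring
        params["fq"] = f"type:{content_type}"
    return params

print(build_params("the matter", content_type="product"))
```

Omitting content_type gives the cross-type search from the same index, which is the main convenience of merging.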
Re: Solr multiple indexes
Hello Otis, thank you for your reply. What I am trying to achieve is to index different tables with different primary keys and different fields (basically different documents/entities). Is it possible to create a data-config with different root entities/documents and index/search everything transparently? Is the attached data-config.xml valid? If it is, is the attached schema.xml valid? The issue there is that I don't know how to specify the corresponding uniqueKey. Thanks a lot for your help. Giovanni On 3/18/09, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Giovanni, It sounds like you are after a JOIN between two indices a la RDBMS JOIN? It's not possible with Solr, unless you want to do separate queries and manually join. If you are talking about merging multiple indices of the same type into a single index, that's a different story and doable, although not yet via Solr. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Giovanni De Stefano giovanni.destef...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, March 18, 2009 12:56:36 PM Subject: Solr multiple indexes Hello all, here I am with another question :-) I have to index the content of two different tables on an Oracle DB. When it comes to only one table, everything is fine: one datasource, one document, one entity in data-config, one uniqueKey in schema.xml etc. It works great. But now I have on the same DB (but it might be irrelevant), another table with a different structure from the first one: I might merge the two table to have a huge document but I don't like this solution (delta imports would be a nightmare/impossible, I might have to index data from other sources etc). I believe I should create MULTIPLE INDEXES and then merge them. I have found very little documentations about this: any idea? The Multiple Solr Webapps solution seems nice, but how could I search globally within all index at the same time? 
The current architecture already expects Multicore Solr (to serve different countries) so I would rather not prefer to have multicore multicore Solr... :-( Any help/link is very much appreciated! Cheers, Giovanni

<dataConfig>
  <dataSource driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@localhost:1521/XE" user="TEST" password="TEST"/>
  <document name="TEST">
    <entity name="book" pk="TITLE" query="select TITLE,AUTHOR,PRICE,TIMESTAMP from BOOKS" deltaQuery="select TITLE from BOOKS where TIMESTAMP &gt; '${dataimporter.last_index_time}'" root="true">
      <field column="TITLE" name="book_title"/>
      <field column="AUTHOR" name="book_author"/>
      <field column="PRICE" name="book_price"/>
      <field column="TIMESTAMP" name="book_timestamp"/>
    </entity>
    <entity name="furniture" pk="ID" query="select ID,COLOR,SIZE,TIMESTAMP from FURNITURE" deltaQuery="select ID from FURNITURE where TIMESTAMP &gt; '${dataimporter.last_index_time}'" root="true">
      <field column="ID" name="furniture_id"/>
      <field column="COLOR" name="furniture_color"/>
      <field column="SIZE" name="furniture_size"/>
      <field column="TIMESTAMP" name="furniture_timestamp"/>
    </entity>
  </document>
</dataConfig>

<?xml version="1.0" ?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->
<schema name="DIT" version="1.1">
  <types>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100"/>
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
  </types>
  <fields>
    <!-- BOOKS -->
    <field name="book_title" type="text" indexed="true" stored="true" multivalued="false" required="true"/>
    <field name="book_author" type="text" indexed="true" stored="true" multivalued="false"/>
    <field name="book_price" type="text" indexed="true" stored="true" multivalued="false"/>
    <field name="book_timestamp" type="date" indexed="true" stored="true" multivalued="false"/>
    <field name="all_book" type="text" indexed="true" stored="true" multivalued="true"/>
    <!-- FURNITURE -->
    <field name="furniture_id" type="text" indexed="true" stored="true" multivalued="false" required="true"/>
    <field name="furniture_color" type="text" indexed="true" stored="true" multivalued="false"/>
    <field name="furniture_size" type="text" indexed
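One note on the uniqueKey question in the attached configs: Solr requires a single uniqueKey field shared by every document in the index, so two entities with different primary keys need a synthetic key. A common workaround, sketched here with hypothetical field names using DataImportHandler's TemplateTransformer, is to build a shared id by prefixing each entity's primary key with the entity name, plus a discriminator field for filtering:

```xml
<!-- Hypothetical sketch, not the attached config: give every document
     a shared "id" uniqueKey by prefixing the per-entity primary key. -->
<entity name="book" pk="TITLE" transformer="TemplateTransformer"
        query="select TITLE,AUTHOR,PRICE,TIMESTAMP from BOOKS">
  <field column="id" template="book-${book.TITLE}"/>
  <field column="doctype" template="book"/>
  <field column="TITLE" name="book_title"/>
</entity>

<!-- and in schema.xml -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="doctype" type="string" indexed="true" stored="true"/>
<uniqueKey>id</uniqueKey>
```

Queries can then use fq=doctype:book or fq=doctype:furniture to search one entity at a time, or no filter to search both transparently.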
Solr multiple indexes
Hello all, here I am with another question :-) I have to index the content of two different tables on an Oracle DB. When it comes to only one table, everything is fine: one datasource, one document, one entity in data-config, one uniqueKey in schema.xml, etc. It works great. But now I have on the same DB (though that might be irrelevant) another table with a different structure from the first one. I might merge the two tables to have one huge document, but I don't like this solution (delta imports would be a nightmare/impossible, I might have to index data from other sources, etc.). I believe I should create MULTIPLE INDEXES and then merge them. I have found very little documentation about this: any ideas? The Multiple Solr Webapps solution seems nice, but how could I search globally within all indexes at the same time? The current architecture already expects Multicore Solr (to serve different countries), so I would prefer not to have multicore-within-multicore Solr... :-( Any help/link is very much appreciated! Cheers, Giovanni
Re: Solr multiple indexes
Giovanni, It sounds like you are after a JOIN between two indices a la RDBMS JOIN? It's not possible with Solr, unless you want to do separate queries and manually join. If you are talking about merging multiple indices of the same type into a single index, that's a different story and doable, although not yet via Solr. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Giovanni De Stefano giovanni.destef...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, March 18, 2009 12:56:36 PM Subject: Solr multiple indexes Hello all, here I am with another question :-) I have to index the content of two different tables on an Oracle DB. When it comes to only one table, everything is fine: one datasource, one document, one entity in data-config, one uniqueKey in schema.xml etc. It works great. But now I have on the same DB (but it might be irrelevant), another table with a different structure from the first one: I might merge the two table to have a huge document but I don't like this solution (delta imports would be a nightmare/impossible, I might have to index data from other sources etc). I believe I should create MULTIPLE INDEXES and then merge them. I have found very little documentations about this: any idea? The Multiple Solr Webapps solution seems nice, but how could I search globally within all index at the same time? The current architecture already expects Multicore Solr (to serve different countries) so I would rather not prefer to have multicore multicore Solr... :-( Any help/link is very much appreciated! Cheers, Giovanni
multiple indexes
Hi, I would like to know how this can be implemented. Index1 has fields id,1,2,3 and index2 has fields id,5,6,7. The ID in both indexes is the unique key. Can I use a kind of distributed search and/or multicore to search, sort, and facet across the two indexes (index1 and index2)? Thanks, Jae Joo
RE: Multiple Indexes
Not sure if this will work for you, but you can have 3 cores (using multicore) and have your Solr server or the client decide which core it should hit. With this approach you can have a separate schema.xml and solrconfig.xml for each of the cores, and obviously a separate index in each core. -Raghu -Original Message- From: anshuljohri [mailto:[EMAIL PROTECTED] Sent: Thursday, August 07, 2008 5:19 PM To: solr-user@lucene.apache.org Subject: Re: Multiple Indexes Both the cases are there. As i said i need to index 3 indexes. So 2 indexes have same schema but other one has different. More specification is like this -- I have 3 indexes. In which 2 indexes have same data model but the way these are indexed is different. So i need to fire query from backend on individual indexes based on input. But the 3rd index has diff schema also. Again the query will be fired on this index based on input. So my question is how can i handle this situation. Thru configuring multiple instances of Solr/Tomcat if ya than how? else what are the other ways on Solr 1.2 -Anshul zayhen wrote: Oh, Sorry! Can you be a little more specific? Do these indexes have different schemas, or do they represent the same data model? 2008/8/7 anshuljohri [EMAIL PROTECTED] Thanks zayhen for such a quick response but am not talking about sharding. I have requirement of indexing 3 indexes. Need to do query on diff indexes based on input. -Anshul zayhen wrote: 2008/8/7 anshuljohri [EMAIL PROTECTED] Hi everybody! I need to create multiple indexes lets say 3 due to project requirement. And the query will be fired from backend on different indexes based on input. I can't do it in one index with the help of fq parameter. As i have already thought on it but thats of no use. I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my environment! So i searched a lot in this forum but couldn't get satisfactory answer. I found that there are 3 ways to do it. 
In which 2 ways are not applicable in 1.2 version. So i have to go with Multiple Tomcat instances option as in multiple webapps config. But still am not clear whether I need 3 diff solrConfig.xml schema.xml or I can do it with symlinks. Is there any tutorial or some reading material for this. Can anybody plz help me out? Thanks is advance -Anshul Johri -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880973.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple Indexes
Try putting them all in one index. Your fields can be s1_name for schema 1, s2_name for schema 2, and so on. The only reason to have separate indexes is if each group of content has a different update schedule and if you have high traffic (over 1M queries/day). wunder On 8/8/08 8:19 AM, Kashyap, Raghu [EMAIL PROTECTED] wrote: Not sure if this will work for you but you can have 3 cores (using multicore) and have your solr server or the client decide on to which core it should be hitting. With this approach your can have separate schema.xml solrconfig.xml for each of the cores obviously separate index in each core. -Raghu -Original Message- From: anshuljohri [mailto:[EMAIL PROTECTED] Sent: Thursday, August 07, 2008 5:19 PM To: solr-user@lucene.apache.org Subject: Re: Multiple Indexes Both the cases are there. As i said i need to index 3 indexes. So 2 indexes have same schema but other one has different. More specification is like this -- I have 3 indexes. In which 2 indexes have same data model but the way these are indexed is different. So i need to fire query from backend on individual indexes based on input. But the 3rd index has diff schema also. Again the query will be fired on this index based on input. So my question is how can i handle this situation. Thru configuring multiple instances of Solr/Tomcat if ya than how? else what are the other ways on Solr 1.2 -Anshul zayhen wrote: Oh, Sorry! Can you be a little more specific? Do these indexes have different schemas, or do they represent the same data model? 2008/8/7 anshuljohri [EMAIL PROTECTED] Thanks zayhen for such a quick response but am not talking about sharding. I have requirement of indexing 3 indexes. Need to do query on diff indexes based on input. -Anshul zayhen wrote: 2008/8/7 anshuljohri [EMAIL PROTECTED] Hi everybody! I need to create multiple indexes lets say 3 due to project requirement. And the query will be fired from backend on different indexes based on input. 
I can't do it in one index with the help of fq parameter. As i have already thought on it but thats of no use. I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my environment! So i searched a lot in this forum but couldn't get satisfactory answer. I found that there are 3 ways to do it. In which 2 ways are not applicable in 1.2 version. So i have to go with Multiple Tomcat instances option as in multiple webapps config. But still am not clear whether I need 3 diff solrConfig.xml schema.xml or I can do it with symlinks. Is there any tutorial or some reading material for this. Can anybody plz help me out? Thanks is advance -Anshul Johri -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha
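The per-schema field-prefixing scheme suggested above (s1_name for schema 1, s2_name for schema 2) can be applied mechanically when feeding documents into the single shared index. A minimal sketch, assuming each document carries an id and adding a hypothetical source discriminator field for routing queries:

```python
def to_shared_schema(doc, source):
    """Map a document from one of several per-source schemas into the
    single shared index by prefixing each field name with its source
    tag (s1_, s2_, ...). 'id' and the added 'source' discriminator
    stay unprefixed so they can be queried uniformly."""
    out = {"id": doc["id"], "source": source}
    for field, value in doc.items():
        if field != "id":
            # s1's 'name' becomes 's1_name', s2's 'name' becomes 's2_name'
            out[f"{source}_{field}"] = value
    return out

print(to_shared_schema({"id": "42", "name": "kittens"}, "s1"))
```

Queries against one logical "index" then become ordinary filtered queries, e.g. restricting on source:s1 and searching s1_name.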
Re: Multiple Indexes
I meant update frequency more than schedule. If one group of content is updated once per day and the another every ten minutes, and most of the traffic is going to the slow collection, splitting them could help. wunder On 8/8/08 8:25 AM, Walter Underwood [EMAIL PROTECTED] wrote: Try putting them all in one index. Your fields can be s1_name for schema 1, s2_name for schema 2, and so on. The only reason to have separate indexes is if each group of content has a different update schedule and if you have high traffic (over 1M queries/day). wunder On 8/8/08 8:19 AM, Kashyap, Raghu [EMAIL PROTECTED] wrote: Not sure if this will work for you but you can have 3 cores (using multicore) and have your solr server or the client decide on to which core it should be hitting. With this approach your can have separate schema.xml solrconfig.xml for each of the cores obviously separate index in each core. -Raghu -Original Message- From: anshuljohri [mailto:[EMAIL PROTECTED] Sent: Thursday, August 07, 2008 5:19 PM To: solr-user@lucene.apache.org Subject: Re: Multiple Indexes Both the cases are there. As i said i need to index 3 indexes. So 2 indexes have same schema but other one has different. More specification is like this -- I have 3 indexes. In which 2 indexes have same data model but the way these are indexed is different. So i need to fire query from backend on individual indexes based on input. But the 3rd index has diff schema also. Again the query will be fired on this index based on input. So my question is how can i handle this situation. Thru configuring multiple instances of Solr/Tomcat if ya than how? else what are the other ways on Solr 1.2 -Anshul zayhen wrote: Oh, Sorry! Can you be a little more specific? Do these indexes have different schemas, or do they represent the same data model? 2008/8/7 anshuljohri [EMAIL PROTECTED] Thanks zayhen for such a quick response but am not talking about sharding. I have requirement of indexing 3 indexes. 
Need to do query on diff indexes based on input. -Anshul zayhen wrote: 2008/8/7 anshuljohri [EMAIL PROTECTED] Hi everybody! I need to create multiple indexes lets say 3 due to project requirement. And the query will be fired from backend on different indexes based on input. I can't do it in one index with the help of fq parameter. As i have already thought on it but thats of no use. I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my environment! So i searched a lot in this forum but couldn't get satisfactory answer. I found that there are 3 ways to do it. In which 2 ways are not applicable in 1.2 version. So i have to go with Multiple Tomcat instances option as in multiple webapps config. But still am not clear whether I need 3 diff solrConfig.xml schema.xml or I can do it with symlinks. Is there any tutorial or some reading material for this. Can anybody plz help me out? Thanks is advance -Anshul Johri -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha
Multiple Indexes
Hi everybody! I need to create multiple indexes, let's say 3, due to project requirements. Queries will be fired from the backend on different indexes based on the input. I can't do it in one index with the help of the fq parameter; I have already thought about that, but it's of no use. I searched a lot in this forum but couldn't get a satisfactory answer. I found that there are 3 ways to do it, of which 2 are not applicable in version 1.2. So I have to go with the multiple Tomcat instances option, as in the multiple webapps config. But I am still not clear on whether I need 3 different solrConfig.xml/schema.xml files, or whether I can do it with symlinks. Is there any tutorial or reading material for this? Can anybody please help me out? Thanks in advance -Anshul Johri -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple Indexes
Thanks zayhen for such a quick response but am not talking about sharding. I have requirement of indexing 3 indexes. Need to do query on diff indexes based on input. -Anshul zayhen wrote: 2008/8/7 anshuljohri [EMAIL PROTECTED] Hi everybody! I need to create multiple indexes lets say 3 due to project requirement. And the query will be fired from backend on different indexes based on input. I can't do it in one index with the help of fq parameter. As i have already thought on it but thats of no use. I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my environment! So i searched a lot in this forum but couldn't get satisfactory answer. I found that there are 3 ways to do it. In which 2 ways are not applicable in 1.2 version. So i have to go with Multiple Tomcat instances option as in multiple webapps config. But still am not clear whether I need 3 diff solrConfig.xml schema.xml or I can do it with symlinks. Is there any tutorial or some reading material for this. Can anybody plz help me out? Thanks is advance -Anshul Johri -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple Indexes
Oh, Sorry! Can you be a little more specific? Do these indexes have different schemas, or do they represent the same data model? 2008/8/7 anshuljohri [EMAIL PROTECTED] Thanks zayhen for such a quick response but am not talking about sharding. I have requirement of indexing 3 indexes. Need to do query on diff indexes based on input. -Anshul zayhen wrote: 2008/8/7 anshuljohri [EMAIL PROTECTED] Hi everybody! I need to create multiple indexes lets say 3 due to project requirement. And the query will be fired from backend on different indexes based on input. I can't do it in one index with the help of fq parameter. As i have already thought on it but thats of no use. I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my environment! So i searched a lot in this forum but couldn't get satisfactory answer. I found that there are 3 ways to do it. In which 2 ways are not applicable in 1.2 version. So i have to go with Multiple Tomcat instances option as in multiple webapps config. But still am not clear whether I need 3 diff solrConfig.xml schema.xml or I can do it with symlinks. Is there any tutorial or some reading material for this. Can anybody plz help me out? Thanks is advance -Anshul Johri -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim
Re: Multiple Indexes
Both cases are there. As I said, I need to build 3 indexes: 2 of them have the same schema, but the other one is different. More specifically: I have 3 indexes, of which 2 have the same data model but are indexed differently, so I need to fire queries from the backend on individual indexes based on the input. The 3rd index has a different schema as well; again, queries will be fired on it based on the input. So my question is how I can handle this situation. Through configuring multiple instances of Solr/Tomcat? If yes, then how? Else, what are the other ways on Solr 1.2? -Anshul zayhen wrote: Oh, Sorry! Can you be a little more specific? Do these indexes have different schemas, or do they represent the same data model? 2008/8/7 anshuljohri [EMAIL PROTECTED] Thanks zayhen for such a quick response but am not talking about sharding. I have requirement of indexing 3 indexes. Need to do query on diff indexes based on input. -Anshul zayhen wrote: 2008/8/7 anshuljohri [EMAIL PROTECTED] Hi everybody! I need to create multiple indexes lets say 3 due to project requirement. And the query will be fired from backend on different indexes based on input. I can't do it in one index with the help of fq parameter. As i have already thought on it but thats of no use. I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my environment! So i searched a lot in this forum but couldn't get satisfactory answer. I found that there are 3 ways to do it. In which 2 ways are not applicable in 1.2 version. So i have to go with Multiple Tomcat instances option as in multiple webapps config. But still am not clear whether I need 3 diff solrConfig.xml schema.xml or I can do it with symlinks. Is there any tutorial or some reading material for this. Can anybody plz help me out? Thanks is advance -Anshul Johri -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html Sent from the Solr - User mailing list archive at Nabble.com. 
-- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880973.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Multiple indexes
Hello, Until now I've used two instances of Solr, one for each of my collections; it works fine, but I wonder if there is an advantage to using multiple indexes in one instance over several instances with one index each? Note that the two indexes have different schema.xml files. Thanks. PL Date: Thu, 8 Nov 2007 18:05:43 -0500 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Multiple indexes Hi, I am looking for the way to utilize multiple indexes for a single Solr instance. I saw that there is the patch 215 available and would like to ask someone who knows how to use multiple indexes. Thanks, Jae Joo
Re: Multiple indexes
The advantages of a multi-core setup are configuration flexibility and dynamically changing available options (without a full restart). For high-performance production solr servers, I don't think there is much reason for it. You may want to split the two indexes on to two machines. You may want to run each index in a separate JVM (so if one crashes, the other does not) Maintaining 2 indexes is pretty easy, if that was a larger number or you need to create indexes for each user in a system then it would be worth investigating the multi-core setup (it is still in development) ryan Pierre-Yves LANDRON wrote: Hello, Until now, i've used two instance of solr, one for each of my collections ; it works fine, but i wonder if there is an advantage to use multiple indexes in one instance over several instances with one index each ? Note that the two indexes have different schema.xml. Thanks. PL Date: Thu, 8 Nov 2007 18:05:43 -0500 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Multiple indexes Hi, I am looking for the way to utilize the multiple indexes for signle sole instance. I saw that there is the patch 215 available and would like to ask someone who knows how to use multiple indexes. Thanks, Jae Joo _ Discover the new Windows Vista http://search.msn.com/results.aspx?q=windows+vistamkt=en-USform=QBRE
Re: Multiple indexes
Here is my situation. I have 6 millions articles indexed and adding about 10k articles everyday. If I maintain only one index, whenever the daily feeding is running, it consumes the heap area and causes FGC. I am thinking the way to have multiple indexes - one is for ongoing querying service and one is for update. Once update is done, switch the index by automatically and/or my application. Thanks, Jae joo On Nov 12, 2007 8:48 AM, Ryan McKinley [EMAIL PROTECTED] wrote: The advantages of a multi-core setup are configuration flexibility and dynamically changing available options (without a full restart). For high-performance production solr servers, I don't think there is much reason for it. You may want to split the two indexes on to two machines. You may want to run each index in a separate JVM (so if one crashes, the other does not) Maintaining 2 indexes is pretty easy, if that was a larger number or you need to create indexes for each user in a system then it would be worth investigating the multi-core setup (it is still in development) ryan Pierre-Yves LANDRON wrote: Hello, Until now, i've used two instance of solr, one for each of my collections ; it works fine, but i wonder if there is an advantage to use multiple indexes in one instance over several instances with one index each ? Note that the two indexes have different schema.xml. Thanks. PL Date: Thu, 8 Nov 2007 18:05:43 -0500 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Multiple indexes Hi, I am looking for the way to utilize the multiple indexes for signle sole instance. I saw that there is the patch 215 available and would like to ask someone who knows how to use multiple indexes. Thanks, Jae Joo _ Discover the new Windows Vista http://search.msn.com/results.aspx?q=windows+vistamkt=en-USform=QBRE
Re: Multiple indexes
Just use the standard collection distribution stuff. That is what it is made for! http://wiki.apache.org/solr/CollectionDistribution Alternatively, open up two indexes using the same config/dir -- do your indexing on one and the searching on the other. When indexing is done (or finishes a big chunk), send a <commit/> to the 'searching' one and it will see the new stuff. ryan Jae Joo wrote: Here is my situation. I have 6 millions articles indexed and adding about 10k articles everyday. If I maintain only one index, whenever the daily feeding is running, it consumes the heap area and causes FGC. I am thinking the way to have multiple indexes - one is for ongoing querying service and one is for update. Once update is done, switch the index by automatically and/or my application. Thanks, Jae joo On Nov 12, 2007 8:48 AM, Ryan McKinley [EMAIL PROTECTED] wrote: The advantages of a multi-core setup are configuration flexibility and dynamically changing available options (without a full restart). For high-performance production solr servers, I don't think there is much reason for it. You may want to split the two indexes on to two machines. You may want to run each index in a separate JVM (so if one crashes, the other does not) Maintaining 2 indexes is pretty easy, if that was a larger number or you need to create indexes for each user in a system then it would be worth investigating the multi-core setup (it is still in development) ryan Pierre-Yves LANDRON wrote: Hello, Until now, i've used two instance of solr, one for each of my collections ; it works fine, but i wonder if there is an advantage to use multiple indexes in one instance over several instances with one index each ? Note that the two indexes have different schema.xml. Thanks. PL Date: Thu, 8 Nov 2007 18:05:43 -0500 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Multiple indexes Hi, I am looking for the way to utilize the multiple indexes for signle sole instance. 
I saw that there is the patch 215 available and would like to ask someone who knows how to use multiple indexes. Thanks, Jae Joo _ Discover the new Windows Vista http://search.msn.com/results.aspx?q=windows+vistamkt=en-USform=QBRE
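Ryan's suggestion above - index on one core, search on the other, then tell the 'searching' side to pick up the new documents - comes down to POSTing a commit/ to Solr's update handler. A minimal sketch (the localhost URL is an assumed default install, not something from the thread):

```python
import urllib.request

def build_commit_request(base_url="http://localhost:8983/solr"):
    # Build the HTTP request that asks the 'searching' index to open a
    # new searcher and see documents written by the 'indexing' side.
    # base_url is a hypothetical default; adjust for your deployment.
    return urllib.request.Request(
        base_url + "/update",
        data=b"<commit/>",
        headers={"Content-Type": "text/xml"},
    )

# To actually send it (requires a running Solr instance):
# urllib.request.urlopen(build_commit_request())
```

The request is built separately from being sent so the swap logic can be scheduled (e.g. after each bulk feed) without hard-coding the endpoint.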
Re: Best way to create multiple indexes
For starters, do you need to be able to search across groups or sub-groups (in one query)? If so, then you have to stick everything in one index. You can add a field to each document saying what 'group' or 'sub-group' it is in and then limit it at query time: q=kittens +group:A. The advantage of splitting it into multiple indexes is that you could put each index on independent hardware. Depending on your queries and index size, that may make a big difference. ryan Rishabh Joshi wrote: Hi, I have a requirement and was wondering if someone could help me with how to go about it. We have to index about 8-9 million documents, and their size can be anywhere from a few KBs to a couple of MBs. These documents are categorized into many 'groups' and 'sub-groups'. I wanted to know if we can create multiple indexes based on 'groups' and then on 'sub-groups' in Solr? If yes, then how do we go about it? I tried going through the section on 'Collections' in the Solr Wiki, but could not make much use of it. Regards, Rishabh Joshi
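The single-index approach Ryan describes - a 'group' field plus a query-time restriction - amounts to building request parameters like the following. A sketch only; the field name and group value are the examples from the mail, and using fq for the restriction is one common way to express it:

```python
from urllib.parse import urlencode

def group_query(q, group):
    # Restrict a search to one logical group via a filter query (fq);
    # the filter is applied on top of the main query q.
    return "/solr/select?" + urlencode({"q": q, "fq": "group:" + group})
```

For example, group_query("kittens", "A") yields a select URL whose fq parameter limits results to documents indexed with group:A.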
Re: Best way to create multiple indexes
Hi Guys, How do we add Word / PDF / text / etc. documents to solr? How is the content of the files stored or indexed? Are the documents stored as XML in the filesystem? Regards Dwarak R - Original Message - From: Ryan McKinley [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Monday, November 12, 2007 7:43 PM Subject: Re: Best way to create multiple indexes For starters, do you need to be able to search across groups or sub-groups (in one query)? If so, then you have to stick everything in one index. You can add a field to each document saying what 'group' or 'sub-group' it is in and then limit it at query time: q=kittens +group:A. The advantage of splitting it into multiple indexes is that you could put each index on independent hardware. Depending on your queries and index size, that may make a big difference. ryan Rishabh Joshi wrote: Hi, I have a requirement and was wondering if someone could help me with how to go about it. We have to index about 8-9 million documents, and their size can be anywhere from a few KBs to a couple of MBs. These documents are categorized into many 'groups' and 'sub-groups'. I wanted to know if we can create multiple indexes based on 'groups' and then on 'sub-groups' in Solr? If yes, then how do we go about it? I tried going through the section on 'Collections' in the Solr Wiki, but could not make much use of it. Regards, Rishabh Joshi
RE: Best way to create multiple indexes
Ryan, We currently have 8-9 million documents to index, and this number will grow in the future. Also, we will never have a query that searches across groups, but we will certainly have queries that search across sub-groups. Keeping this in mind, we were thinking we could have multiple indexes at the 'group' level at least. Also, can multiple indexes be created dynamically? For example: in my application, if I create a 'logical group', then an index should be created for that group. Rishabh -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, November 12, 2007 7:44 PM To: solr-user@lucene.apache.org Subject: Re: Best way to create multiple indexes For starters, do you need to be able to search across groups or sub-groups (in one query)? If so, then you have to stick everything in one index. You can add a field to each document saying what 'group' or 'sub-group' it is in and then limit it at query time: q=kittens +group:A. The advantage of splitting it into multiple indexes is that you could put each index on independent hardware. Depending on your queries and index size, that may make a big difference. ryan Rishabh Joshi wrote: Hi, I have a requirement and was wondering if someone could help me with how to go about it. We have to index about 8-9 million documents, and their size can be anywhere from a few KBs to a couple of MBs. These documents are categorized into many 'groups' and 'sub-groups'. I wanted to know if we can create multiple indexes based on 'groups' and then on 'sub-groups' in Solr? If yes, then how do we go about it? I tried going through the section on 'Collections' in the Solr Wiki, but could not make much use of it. Regards, Rishabh Joshi
Re: Multiple indexes
I have built the master solr instance and indexed some files. When I run snapshooter, it complains with the error below. - snapshooter -d data/index (in the solr/bin directory) Did I miss something? ++ date '+%Y/%m/%d %H:%M:%S' + echo 2007/11/12 12:38:40 taking snapshot /solr/master/solr/data/index/snapshot.20071112123840 + [[ -n '' ]] + mv /solr/master/solr/data/index/temp-snapshot.20071112123840 /solr/master/solr/data/index/snapshot.20071112123840 mv: cannot access /solr/master/solr/data/index/temp-snapshot.20071112123840 Jae On Nov 12, 2007 9:09 AM, Ryan McKinley [EMAIL PROTECTED] wrote: just use the standard collection distribution stuff. That is what it is made for! http://wiki.apache.org/solr/CollectionDistribution Alternatively, open up two indexes using the same config/dir -- do your indexing on one and the searching on the other. When indexing is done (or finishes a big chunk), send a commit/ to the 'searching' one and it will see the new stuff. ryan Jae Joo wrote: Here is my situation. I have 6 million articles indexed and am adding about 10k articles every day. If I maintain only one index, whenever the daily feed is running it consumes the heap area and causes full GCs. I am thinking of a way to have multiple indexes - one for the ongoing query service and one for updates. Once an update is done, the indexes would be switched automatically and/or by my application. Thanks, Jae joo On Nov 12, 2007 8:48 AM, Ryan McKinley [EMAIL PROTECTED] wrote: The advantages of a multi-core setup are configuration flexibility and dynamically changing the available options (without a full restart). For high-performance production solr servers, I don't think there is much reason for it. You may want to split the two indexes onto two machines.
You may want to run each index in a separate JVM (so if one crashes, the other does not). Maintaining 2 indexes is pretty easy; if that were a larger number, or you needed to create indexes for each user in a system, then it would be worth investigating the multi-core setup (it is still in development). ryan Pierre-Yves LANDRON wrote: Hello, Until now, I've used two instances of solr, one for each of my collections; it works fine, but I wonder if there is an advantage to using multiple indexes in one instance over several instances with one index each? Note that the two indexes have different schema.xml. Thanks. PL Date: Thu, 8 Nov 2007 18:05:43 -0500 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Multiple indexes Hi, I am looking for a way to utilize multiple indexes from a single solr instance. I saw that there is the patch 215 available and would like to ask someone who knows how to use multiple indexes. Thanks, Jae Joo
Re: Search Multiple indexes In Solr
It is said that this new feature will be added in solr 1.3, but I am not sure about that. I think the following may be useful for you: https://issues.apache.org/jira/browse/SOLR-303 https://issues.apache.org/jira/browse/SOLR-255 2007/11/8, j 90 [EMAIL PROTECTED]: Hi, I'm new to Solr but very familiar with Lucene. Is there a way to have Solr search in more than one index, much like the MultiSearcher in Lucene? If so, how do I configure the location of the indexes?
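SOLR-303, linked above, is the distributed-search work: one node fans a query out to every index listed in a shards parameter and merges the results, which is the closest Solr analogue to Lucene's MultiSearcher. A sketch of building such a request (the host names are hypothetical, and the exact parameter shape is an assumption based on that issue, not something confirmed in this thread):

```python
from urllib.parse import urlencode

def distributed_query(q, shard_urls):
    # The receiving node queries every shard in 'shards' and merges the
    # per-shard top results into one ranked list.
    params = {"q": q, "shards": ",".join(shard_urls)}
    return "/solr/select?" + urlencode(params)
```

For example, passing ["host1:8983/solr", "host2:8983/solr"] produces one select URL whose shards parameter names both indexes.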
Re: Multiple indexes
I've had good luck with MultiCore, but you have to sync trunk from svn and apply the most recent patch in SOLR-350. https://issues.apache.org/jira/browse/SOLR-350 -jrr Jae Joo wrote: Hi, I am looking for a way to utilize multiple indexes from a single solr instance. I saw that there is the patch 215 available and would like to ask someone who knows how to use multiple indexes. Thanks, Jae Joo
Search Multiple indexes In Solr
Hi, I'm new to Solr but very familiar with Lucene. Is there a way to have Solr search in more than one index, much like the MultiSearcher in Lucene? If so, how do I configure the location of the indexes?
Manage multiple indexes with Solr
Hi guys! Is it possible to configure Solr to manage different indexes depending on the added documents? For example: * document 1, with unique ID ui1, will be indexed in indexA * document 2, with unique ID ui2, will be indexed in indexB * document 3, with unique ID ui1, will be indexed in indexA Thus documents 1 and 3 are stored in index indexA and document 2 in index indexB. In this case indexA and indexB are completely separate indexes on disk. Thanks in advance, cheers Y.
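Solr at this point has no built-in per-document routing across separate on-disk indexes, so the behaviour asked about above would have to live in the client: the application decides which index receives each document before posting it. A minimal sketch, using the IDs and index names from the question (the routing table itself is an assumed application-side structure):

```python
# Map each unique document ID to its target index; unmapped IDs fall
# back to a default. The indexing client posts each document to
# whichever index this lookup returns, so re-adding an already-routed
# ID (like ui1) lands in the same index as before.
ROUTING = {"ui1": "indexA", "ui2": "indexB"}

def target_index(doc_id, default="indexA"):
    return ROUTING.get(doc_id, default)
```

Keeping the mapping keyed by unique ID is what guarantees that documents 1 and 3 (both ui1) end up together in indexA while ui2 stays in indexB.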
Re: Manage multiple indexes with Solr
I would be interested to know about both cases: Case 1: * document 1, with unique ID ui1, will be indexed in indexA * document 2, with unique ID ui2, will be indexed in indexB * document 3, with unique ID ui3, will be indexed in indexA Case 2: * document 1, with unique ID ui1, will be indexed in indexA * document 2, with unique ID ui2, will be indexed in indexB * document 3, with unique ID ui1, will be indexed in indexA -vEnKAt