Re: Supporting multiple indexes in one collection

2020-07-01 Thread Erick Erickson
Sharding always adds overhead, which must be balanced against the benefit of 
splitting the work up amongst several machines. 

Sharding works like this for queries:

1> node receives query

2> a sub-query is sent to one replica of each shard

3> each replica sends back its top N (rows parameter) with ID and sort data

4> the node in <1> sorts the candidate lists to get the overall top N

5> the node in <1> sends out another query to each replica to get the data 
associated with the final sorted list

6> the node in <1> assembles the results from <5> and returns the true top N 
to the client.
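The scatter-gather in steps 3-4 amounts to a k-way merge of per-shard top-N candidate lists. A minimal sketch of that merge (not Solr's actual code; numeric scores stand in for arbitrary sort data):

```python
import heapq
from itertools import islice

def merge_top_n(shard_results, n):
    """Merge per-shard top-N candidate lists into one overall top-N.

    Each shard list is already sorted by score descending, as
    (score, doc_id) pairs. The coordinator only needs a k-way merge
    of the candidate lists, not a full re-sort of every hit.
    """
    # heapq.merge combines ascending-sorted iterables lazily,
    # so negate the scores to merge descending.
    merged = heapq.merge(
        *([(-score, doc) for score, doc in shard] for shard in shard_results)
    )
    return [(-neg, doc) for neg, doc in islice(merged, n)]

shard_a = [(9.1, "a1"), (3.0, "a2")]   # shard A's top 2, score descending
shard_b = [(7.5, "b1"), (6.2, "b2")]   # shard B's top 2
print(merge_top_n([shard_a, shard_b], 2))  # → [(9.1, 'a1'), (7.5, 'b1')]
```

Step 5 then fetches the stored fields only for the n surviving document IDs, which is why two round trips are cheaper than returning full documents from every shard up front.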


All that takes time. OTOH, in this scenario each replica is only searching 
a subset of the data, so each sub-query can be faster. Below a certain index 
size, querying a single replica is faster. Past that size, the coordination 
overhead is more than made up for by, basically, throwing more hardware at the 
problem (assuming the shards can make use of more hardware, CPUs, threads, or 
whatever). “A certain size” depends on your data, hardware, and query patterns; 
there’s no hard and fast rule.

But you haven’t really told us much. You say you’ve read that SolrCloud 
performance degrades when the number of collections rises. True. But the 
“number of collections” can be in the thousands. Are you talking about 5 
collections? 10 collections? 1,000,000 collections? Details matter.

And how many documents are you talking about per collection? Or in total? 

What are your performance criteria? Do you expect to handle 5 queries/second? 
50? 5,000,000?

When performance differs “by a few milliseconds”, unless you’re dealing with a 
very high total QPS it’s usually a waste of time to worry about it. Almost 
certainly there are much better things to spend your time on that the end users 
will actually notice ;) Plus, performance measurements are very tricky to 
actually get right. Are you measuring with a realistic data set and queries? 
Are you measuring with enough different queries to be hitting the various 
caches in a realistic manner? Are you indexing at the same time in a manner 
that reflects your real world? 
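A hedged sketch of a measurement harness along those lines: vary the queries so caches are hit realistically, and report tail latencies rather than averages (`run_query` is a placeholder for whatever client call is being tested):

```python
import random
import time

def percentile(sorted_vals, p):
    """Nearest-rank percentile of an ascending-sorted list (p in 0..100)."""
    k = max(0, min(len(sorted_vals) - 1, int(round(p / 100 * len(sorted_vals))) - 1))
    return sorted_vals[k]

def bench(run_query, queries, iterations=1000):
    """Time a varied query mix and report p50/p90/p99 latencies in ms."""
    latencies = []
    for _ in range(iterations):
        q = random.choice(queries)          # vary queries; don't hammer one
        start = time.perf_counter()
        run_query(q)
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    return {p: percentile(latencies, p) for p in (50, 90, 99)}
```

A single repeated query mostly measures the caches; a mix drawn from real logs measures the system.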

What I’m suggesting is that before making these kinds of decisions (and some of 
the ideas, like composite routing, will require significant engineering 
effort), you should be very, very sure they’re necessary. For instance, 
you’ll have to monitor every replica to see if it gets overloaded. Imagine your 
routing puts 300,000,000 documents for some very large client on a single shard 
(which, again, we have no idea whether that’s something you have to worry about, 
since you haven’t told us). Now you’ll have to go in and fix that problem.
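The monitoring described above could start as simply as a skew check over per-shard document counts (the shard names and the 2x-mean threshold here are illustrative, not a Solr API):

```python
def skewed_shards(doc_counts, factor=2.0):
    """Flag shards whose doc count exceeds `factor` times the mean.

    `doc_counts` maps shard name -> numDocs (e.g. gathered from each
    core's stats). A manual routing scheme needs a check like this,
    because Solr won't rebalance explicitly routed data for you.
    """
    mean = sum(doc_counts.values()) / len(doc_counts)
    return sorted(name for name, n in doc_counts.items() if n > factor * mean)

counts = {"shard1": 300_000_000, "shard2": 5_000_000, "shard3": 4_000_000}
print(skewed_shards(counts))  # → ['shard1']
```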

Best,
Erick




Re: Supporting multiple indexes in one collection

2020-07-01 Thread Raji N
We did the test a while back and are revisiting it now. In standalone Solr we
have seen queries take more time when the data exists in 2 shards; that's the
main reason this test was done. If anyone has experience with this, we would
like to hear it.



Re: Supporting multiple indexes in one collection

2020-07-01 Thread Jörn Franke
How many documents?
The real difference was only a couple of ms?



Re: Supporting multiple indexes in one collection

2020-06-30 Thread Raji N
We had 2 indexes in 2 separate shards in one collection, and the exact same
data published with the composite router using a prefix. We disabled all
caches and issued the same query, a small query with just q and fq parameters.
The number of queries that executed (with the same threads, run for the same
time) was higher in the 2-indexes-in-2-separate-shards case, and the
90th-percentile response time was also a few ms better.

Thanks,
Raji



Re: Supporting multiple indexes in one collection

2020-06-30 Thread Jörn Franke
What did you test? Which queries? What were the exact results in terms of time?



Supporting multiple indexes in one collection

2020-06-30 Thread Raji N
Hi,


We are trying to place multiple smaller indexes in one collection (as we have
read that SolrCloud performance degrades as the number of collections
increases). We are exploring two ways:


1) Placing each index on a single shard of a collection

   In this case, placing documents for a single index is manual, and
automatic rebalancing is not done by Solr.


2) Solr routing: the composite router with a prefix.

  In this case Solr doesn’t place all the docs with the same prefix in one
shard, so searches become distributed. But shard rebalancing is taken care
of by Solr.
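For context, composite-ID routing builds document IDs of the form `prefix!id`, and the `_route_` request parameter can target a prefix at query time. A minimal sketch (the field names and documents here are illustrative, not from this thread):

```python
def composite_id(prefix, doc_id):
    """Build a compositeId-style document ID of the form 'prefix!id'."""
    return f"{prefix}!{doc_id}"

# Hypothetical documents for two tenants sharing one collection:
docs = [
    {"id": composite_id("tenantA", "1"), "title_t": "first doc"},
    {"id": composite_id("tenantB", "1"), "title_t": "other tenant"},
]
# At query time, the _route_ parameter (e.g. _route_=tenantA!) asks Solr
# to restrict the search to the shard(s) holding that prefix.
print(docs[0]["id"])  # → tenantA!1
```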


We did a small perf test with both of these setups. We saw that the performance
for the first case (placing an index explicitly on a shard) is better.


Has anyone done anything similar? Can you please share your experience?


Thanks,

Raji


Re: Solr Cloud and Multiple Indexes

2015-11-08 Thread Salman Ansari


Re: Solr Cloud and Multiple Indexes

2015-11-08 Thread Modassar Ather
>


Solr Cloud and Multiple Indexes

2015-11-05 Thread Salman Ansari
Hi,

I am using Solr Cloud and I have created a single index that hosts around
70M documents distributed into 2 shards (each having 35M documents) and 2
replicas. The queries are very slow to run, so I was thinking of distributing
the index into multiple indexes and consequently doing distributed search. Can
anyone guide me to some sources (articles) that discuss this in Solr Cloud?

Appreciate your feedback regarding this.

Regards,
Salman


Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Modassar Ather
What is your index size? How much memory is used? What type of queries are
slow?
Are there GC pauses as they can be a cause of slowness?
Are document updates/additions happening in parallel?

The queries are very slow to run so I was thinking to distribute
the indexes into multiple indexes and consequently distributed search. Can
anyone guide me to some sources (articles) that discuss this in Solr Cloud?

This is what you are already doing. Did you mean that you want to add more
shards?

Regards,
Modassar



Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Salman Ansari
Here is the current info

How much memory is used?
Physical memory consumption: 5.48 GB out of 14 GB.
Swap space consumption: 5.83 GB out of 15.94 GB.
JVM-Memory consumption: 1.58 GB out of 3.83 GB.

What is your index size?
I have around 70M documents distributed on 2 shards (so each shard has 35M
documents).

What type of queries are slow?
I am running normal queries (queries on a field); no faceting or highlighting
is requested. Currently I am facing delays of 2-3 seconds, but previously I
had delays of around 28 seconds.

Are there GC pauses as they can be a cause of slowness?
I doubt this as the slowness was happening for a long period of time.

Are document updates/additions happening in parallel?
No, I have stopped adding/updating documents and doing queries only.

This is what you are already doing. Did you mean that you want to add more
shards?
No, what I meant is that I read that previously there was a way to chunk a
large index into multiple pieces and then do distributed search over them, as
in this article: https://wiki.apache.org/solr/DistributedSearch. What I was
looking for is how this is handled in Solr Cloud.
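For reference, the pre-SolrCloud distributed search described in that wiki article worked by passing an explicit `shards` parameter listing the cores to fan the query out to; SolrCloud does this routing automatically. A sketch of building such a request (the host names are illustrative):

```python
from urllib.parse import urlencode

def distributed_query_url(base, cores, q, rows=10):
    """Build a legacy distributed-search URL.

    The 'shards' parameter enumerates host:port/core entries; the node
    receiving the request fans the query out to each of them and merges
    the results, exactly the manual setup SolrCloud replaced.
    """
    params = {"q": q, "rows": rows, "shards": ",".join(cores)}
    return f"{base}/select?{urlencode(params)}"

url = distributed_query_url(
    "http://host1:8983/solr/core1",
    ["host1:8983/solr/core1", "host2:8983/solr/core2"],
    "field:value",
)
print(url)
```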


Regards,
Salman







Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Salman Ansari
Thanks for your response. I have already gone through those documents
before. My point was: if I am using Solr Cloud, is the only way to distribute
my indexes by adding shards, with nothing to do manually (because all the
distributed search is handled by Solr Cloud)?

What is the Xms and Xmx you are allocating to Solr and how much max is used by
your solr?
Xms and Xmx are both 4G. My current JVM-Memory consumption is 1.58 GB

How many segments are there in the index? The more segments there are, the
slower the search.
How do I check how many segments are there in the index?

Is this after you moved to solrcloud?
I have been using SolrCloud from the beginning.

Regards,
Salman




Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Modassar Ather
SolrCloud makes the distributed search easier. You can find details about
it under following link.
https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works

You can also refer to following link:
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

From the size of your index I meant the index size, not the total document
count alone.
How many segments are there in the index? The more segments there are, the
slower the search.
What are the Xms and Xmx you are allocating to Solr, and how much is actually
used by your Solr?

I doubt this as the slowness was happening for a long period of time.
I mentioned this point because I have seen GC pauses of 30 seconds and more in
some complex queries.

I am facing delay of 2-3 seconds but previously I
had delays of around 28 seconds.
Is this after you moved to solrcloud?

Regards,
Modassar




Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Modassar Ather
Thanks for your response. I have already gone through those documents
before. My point was: if I am using Solr Cloud, is the only way to distribute
my indexes by adding shards, with nothing to do manually (because all the
distributed search is handled by Solr Cloud)?

Yes, as per my knowledge.

How do I check how many segments are there in the index?
You can look into the index folder manually. Which version of Solr are you
using? I don't remember exactly which version introduced it, but in recent
versions, including Solr 5.2.1, there is a "Segments info" link available
where you can see the number of segments.

Regards,
Modassar
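As a sketch of checking this programmatically: Solr's Luke request handler (`/admin/luke`) reports index-level stats. The response shape can vary by version, so the parsing below assumes a `segmentCount` field under `index`, and the URL is illustrative:

```python
import json
from urllib.request import urlopen

def segment_count(luke_response):
    """Extract the segment count from a parsed Luke handler response."""
    return luke_response["index"]["segmentCount"]

# Hypothetical usage against a local core (URL is illustrative):
#   with urlopen("http://localhost:8983/solr/mycore/admin/luke?numTerms=0&wt=json") as r:
#       print(segment_count(json.load(r)))

sample = {"index": {"numDocs": 70_000_000, "segmentCount": 23}}
print(segment_count(sample))  # → 23
```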

> > > I have around 70M documents distributed on 2 shards (so each shard has
> > 35M
> > > document)
> > >
> > > What type of queries are slow?
> > > I am running normal queries (queries on a field) no faceting or
> > highlights
> > > are requested. Currently, I am facing delay of 2-3 seconds but
> > previously I
> > > had delays of around 28 seconds.
> > >
> > > Are there GC pauses as they can be a cause of slowness?
> > > I doubt this as the slowness was happening for a long period of time.
> > >
> > > Are document updates/additions happening in parallel?
> > > No, I have stopped adding/updating documents and doing queries only.
> > >
> > > This is what you are already doing. Did you mean that you want to add
> > more
> > > shards?
> > > No, what I meant is that I read that previously there was a way to
> chunk
> > a
> > > large index into multiple and then do distributed search on that as in
> > this
> > > article https://wiki.apache.org/solr/DistributedSearch. What I was
> > looking
> > > for how this is handled in Solr Cloud?
> > >
> > >
> > > Regards,
> > > Salman
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Nov 5, 2015 at 12:06 PM, Modassar Ather <
> modather1...@gmail.com>
> > > wrote:
> > >
> > > > What is your index size? How much memory is used? What type of
> queries
> > > are
> > > > slow?
> > > > Are there GC pauses as they can be a cause of slowness?

RE: using SolrJ with SolrCloud, searching multiple indexes.

2014-03-22 Thread Russell Taylor
Yeah, sorry, I didn't explain myself there: one of the three zookeepers will 
return me one of the solrcloud machines for me to access the index. I either 
need to know which machine it returned (is this feasible? I can't seem to find 
a way to access that information in CloudSolrServer) and then add the extra 
indexes as shards:
String shards = solrCloudMachine + ":8080/indexB," + solrCloudMachine + ":8080/indexC";
solrQuery.add("shards", shards);

or do it in a new way within solrcloud.

FYI
My returned index is one of seven indexes under one webapp (solr_search). I want 
to stitch on the other six indexes so I can search all of the data (each index 
is updated from separate feeds).



Thanks for your quick reply.

Russ.
 

From: Furkan KAMACI [furkankam...@gmail.com]
Sent: 21 March 2014 22:55
To: solr-user@lucene.apache.org
Subject: Re: using SolrJ with SolrCloud, searching multiple indexes.

Hi Russell;

You say that:

  | CloudSolrServer server = new CloudSolrServer("solrServer1:2111,solrServer2:2111,solrServer3:2111");

but I should mention that those are not Solr servers being passed into
CloudSolrServer. They are ZooKeeper host:port pairs, optionally with a
chroot parameter at the end.

Thanks;
Furkan KAMACI



2014-03-21 18:11 GMT+02:00 Russell Taylor 
russell.tay...@interactivedata.com:

 Hi,
 just started to move my SolrJ queries over to our SolrCloud  environment
 and I  want to know how to do a query  where you combine multiple indexes.

 Previously I had a string called shards which links all the indexes
 together and adds them to the query.
 String shards =
 "server:8080/solr_search/bonds,server:8080/solr_search/equities", etc.
 which I add to my SolrQuery
 solrQuery.add("shards", shards);
 I can then search across many indexes.

 In SolrCloud we do this
 CloudSolrServer server = new
 CloudSolrServer("solrServer1:2111,solrServer2:2111,solrServer3:2111");
 and add the default collection
 server.setDefaultCollection("bonds");

 How do I add the other indexes to my query in CloudSolrServer? If it's as
 before solrQuery.add("shards", shards); how do I find out the address of the
 machine CloudSolrServer has chosen?



 Thanks


 Russ.


 ***
 This message (including any files transmitted with it) may contain
 confidential and/or proprietary information, is the property of Interactive
 Data Corporation and/or its subsidiaries, and is directed only to the
 addressee(s). If you are not the designated recipient or have reason to
 believe you received this message in error, please delete this message from
 your system and notify the sender immediately. An unintended recipient's
 disclosure, copying, distribution, or use of this message or any
 attachments is prohibited and may be unlawful.
 ***






Re: using SolrJ with SolrCloud, searching multiple indexes.

2014-03-22 Thread Shawn Heisey
On 3/22/2014 7:34 AM, Russell Taylor wrote:
 Yeah sorry didn't explain myself there, one of the three zookeepers will 
 return me one of the solrcloud machines for me to access the index. I either 
 need to know which machine it returned(is this feasible I can't seem to find 
 a way to access information in SolrCloudServer)  and then add the extra 
 indexes as shards 
 String shards = solrCloudMachine + ":8080/indexB," + solrCloudMachine + ":8080/indexC";
 solrQuery.add("shards", shards);
 
 or do it in a new way within solrcloud.
 
 FYI
 My returned index is one of seven indexes under one webapp (solr_search) I 
 want to stitch on the other six indexes so I can search all of the data (each 
 index is updated from separate feeds).

SolrCloud eliminates the need to use the shards parameter, so
CloudSolrServer does not expose the actual Solr instances.  You *can*
use the shards parameter, but typically it is done differently than in
traditional Solr.
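The legacy shards value is just a comma-separated list of host:port/core entries, so building it is simple string joining. A minimal sketch (the core paths are made-up examples):

```java
import java.util.Arrays;
import java.util.List;

public class ShardsParamBuilder {

    // Joins bare host:port/core entries (no http:// prefix) into the
    // comma-separated value expected by the legacy "shards" parameter.
    public static String buildShards(List<String> cores) {
        return String.join(",", cores);
    }

    public static void main(String[] args) {
        String shards = buildShards(Arrays.asList(
                "server:8080/solr_search/bonds",
                "server:8080/solr_search/equities"));
        // solrQuery.add("shards", shards);  // as in the messages above
        System.out.println(shards);
    }
}
```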

CloudSolrServer thinks mostly in terms of collections, not cores.  There
is a setDefaultCollection method on the server object, or you can do
solrQuery.set("collection", name).

You can query certain shards of a collection or multiple collections,
without ever knowing the host/port/core combinations:

http://wiki.apache.org/solr/SolrCloud#Distributed_Requests

There are also collection aliases on the server side, which let you
access one or more real collections with a virtual collection name.
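Put together, a SolrJ sketch of querying several collections in one request might look like this. It assumes the SolrJ 4.x API discussed in this thread; the ZooKeeper hosts and collection names are placeholders:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MultiCollectionQuery {
    public static void main(String[] args) throws Exception {
        // Pass ZooKeeper host:port pairs, not Solr hosts.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("bonds");

        SolrQuery query = new SolrQuery("*:*");
        // Query several collections in one request; SolrCloud routes it,
        // so no host/port/core addresses are needed.
        query.set("collection", "bonds,equities");

        QueryResponse rsp = server.query(query);
        System.out.println("matches: " + rsp.getResults().getNumFound());
        server.shutdown();
    }
}
```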

Thanks,
Shawn



using SolrJ with SolrCloud, searching multiple indexes.

2014-03-21 Thread Russell Taylor
Hi,
just started to move my SolrJ queries over to our SolrCloud  environment and I  
want to know how to do a query  where you combine multiple indexes.

Previously I had a string called shards which links all the indexes together 
and adds them to the query.
String shards = 
"server:8080/solr_search/bonds,server:8080/solr_search/equities", etc.
which I add to my SolrQuery
solrQuery.add("shards", shards);
I can then search across many indexes.

In SolrCloud we do this
CloudSolrServer server = new 
CloudSolrServer("solrServer1:2111,solrServer2:2111,solrServer3:2111");
and add the default collection
server.setDefaultCollection("bonds");

How do I add the other indexes to my query in CloudSolrServer? If it's as 
before solrQuery.add("shards", shards); how do I find out the address of the 
machine CloudSolrServer has chosen?



Thanks


Russ.




Re: using SolrJ with SolrCloud, searching multiple indexes.

2014-03-21 Thread Furkan KAMACI
Hi Russell;

You say that:

  | CloudSolrServer server = new CloudSolrServer("solrServer1:2111,solrServer2:2111,solrServer3:2111");

but I should mention that those are not Solr servers being passed into
CloudSolrServer. They are ZooKeeper host:port pairs, optionally with a
chroot parameter at the end.

Thanks;
Furkan KAMACI



2014-03-21 18:11 GMT+02:00 Russell Taylor 
russell.tay...@interactivedata.com:

 Hi,
 just started to move my SolrJ queries over to our SolrCloud  environment
 and I  want to know how to do a query  where you combine multiple indexes.

 Previously I had a string called shards which links all the indexes
 together and adds them to the query.
 String shards =
 "server:8080/solr_search/bonds,server:8080/solr_search/equities", etc.
 which I add to my SolrQuery
 solrQuery.add("shards", shards);
 I can then search across many indexes.

 In SolrCloud we do this
 CloudSolrServer server = new
 CloudSolrServer("solrServer1:2111,solrServer2:2111,solrServer3:2111");
 and add the default collection
 server.setDefaultCollection("bonds");

 How do I add the other indexes to my query in CloudSolrServer? If it's as
 before solrQuery.add("shards", shards); how do I find out the address of the
 machine CloudSolrServer has chosen?



 Thanks


 Russ.





Solr 4.4 with log4j and multiple indexes on tomcat 6

2013-10-15 Thread Russell Taylor
Hi,
My problem is that all my indexes log to one log file but I want each index to 
log to their own log file.

I'm using solr 4.4 and I've copied  jcl-over-slf4j-1.6.6.jar, 
jul-to-slf4j-1.6.6.jar, log4j-1.2.16.jar, slf4j-api-1.6.6.jar and 
slf4j-log4j12-1.6.6.jar
into my tomcats lib/ directory.

I've added a logging.properties to each of my solr webapps in 
tomcat/webapps/solr_webapp/WEB-INF/classes/logging.properties
but when tomcat starts up it picks the first logging.properties (I presume) 
file and then all indexes log to this file.


My only solution is to add the 4 jar files to each of the solr webapps in their 
tomcat/webapps/solr_webapp/WEB-INF/lib directory
and then the webapp will pick up its logging.properties file.

Does anyone have a solution where I don't need to add the 4 jars to each of the 
solr webapps but still get a log file per index?


Thanks

Russ.





Re: Solr 4.4 with log4j and multiple indexes on tomcat 6

2013-10-15 Thread Otis Gospodnetic
Hi Russ,

It's not really the indexes that log, but Solr running in Tomcat, so I don't
think there's a way...

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Oct 15, 2013 7:14 AM, Russell Taylor 
russell.tay...@interactivedata.com wrote:

 Hi,
 My problem is that all my indexes log to one log file but I want each
 index to log to their own log file.

 I'm using solr 4.4 and I've copied  jcl-over-slf4j-1.6.6.jar,
 jul-to-slf4j-1.6.6.jar, log4j-1.2.16.jar, slf4j-api-1.6.6.jar and
 slf4j-log4j12-1.6.6.jar
 into my tomcats lib/ directory.

 I've added a logging.properties to each of my solr webapps in
 tomcat/webapps/solr_webapp/WEB-INF/classes/logging.properties
 but when tomcat starts up it picks the first logging.properties (I
 presume) file and then all indexes log to this file.


 My only solution is to add the 4 jar files to each of the solr webapps in
 their tomcat/webapps/solr_webapp/WEB-INF/lib directory
 and then the webapp will pick up it's logging.properties file.

 Does anyone have a solution where I don't need to add the 4 jars to each
 of the solr webapps but still get a log file per index.


 Thanks

 Russ.






Re: Solr 4.4 with log4j and multiple indexes on tomcat 6

2013-10-15 Thread Shawn Heisey
On 10/15/2013 5:13 AM, Russell Taylor wrote:
 My problem is that all my indexes log to one log file but I want each index 
 to log to their own log file.
 
 I'm using solr 4.4 and I've copied  jcl-over-slf4j-1.6.6.jar, 
 jul-to-slf4j-1.6.6.jar, log4j-1.2.16.jar, slf4j-api-1.6.6.jar and 
 slf4j-log4j12-1.6.6.jar
 into my tomcats lib/ directory.
 
 I've added a logging.properties to each of my solr webapps in 
 tomcat/webapps/solr_webapp/WEB-INF/classes/logging.properties
 but when tomcat starts up it picks the first logging.properties (I presume) 
 file and then all indexes log to this file.
 
 
 My only solution is to add the 4 jar files to each of the solr webapps in 
 their tomcat/webapps/solr_webapp/WEB-INF/lib directory
 and then the webapp will pick up it's logging.properties file.

Although your solution might let you log each webapp to its own file,
you are incurring extra overhead by running each index in its own full
Solr install.  One Solr install can handle thousands of separate indexes
- we call them cores.

Most of the important parts of Solr will log the core name with the
request.  I'm in the process of trying to improve that so *everything*
will include the core name in the log.  See SOLR-5277.  When that work
is complete, it may even be possible to get those logs in separate files
by switching logging implementations or writing a custom log4j
appender.

Side issue: logging.properties is the configuration file used by
java.util.logging ... but the jar files you have mentioned will set Solr
up to use log4j.  The config file for that is typically log4j.properties
or log4j.xml.
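For reference, a minimal log4j.properties of the kind log4j 1.2 picks up from the classpath could look like the following; the appender name, file path, and sizes are only illustrative:

```properties
# Root logger writes INFO and above to a rolling file.
log4j.rootLogger=INFO, file

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%c] %m%n
```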

Thanks,
Shawn



Re: Need help with search in multiple indexes

2013-06-13 Thread Toke Eskildsen
On Wed, 2013-06-12 at 23:05 +0200, smanad wrote:
 Is this a limitation of solr/lucene, should I be considering using other
 option like using Elasticsearch (which is also based on lucene)? 
 But I am sure search in multiple indexes is kind of a common problem.

You try to treat separate sources as a single index and that is tricky.
Assuming you need relevance ranking, the sources need to be homogeneous
in order for the scores to be somewhat comparable. That seems not to be
the case for you, so even if you align your schemas to get formal
compatibility, your ranking will be shot with Solr.

ElasticSearch has elaborate handling of this problem
http://www.elasticsearch.org/guide/reference/api/search/search-type/
and seems to be a better fit for you in this regard.

- Toke Eskildsen, State and University Library, Denmark



Need help with search in multiple indexes

2013-06-12 Thread smanad
Hi, 

I am thinking of using Solr to implement Search on our site. Here is my use
case, 
1. We will have 4-5 indexes based on different data
types/structures and data will be indexed into these by several processes,
like cron, on demand, thru message queue applications, etc. 
2. A single web service needs to search across all these indexes and return
results. 

I am thinking of using Solr 4.2.1 or may be 4.3 with single instance -
multicore setup. 
I read about distributed search and I believe I should be able to search
across multiple indices using shards parameters. However in my case, all
shards will be on same host/port but with different core name. 

Is my understanding correct? Or is there any better alternative to this
approach?

Please suggest. 
Thanks, 
-Manasi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with search in multiple indexes

2013-06-12 Thread Michael Della Bitta
Manasi,

Everything hinges on these indexes having similar enough schema that they
can be represented as a union of all the fields from each type, where most
of the searched data is common to all types. If so, you have a few options
for querying them all together... distributed search, creating one large
index and adding a type field, etc.

If, however, your data is heterogeneous enough that the schemas are not
really comparable, you're probably stuck coordinating the results
externally.


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
w: appinions.com http://www.appinions.com/


On Wed, Jun 12, 2013 at 3:55 PM, smanad sma...@gmail.com wrote:

 Hi,

 I am thinking of using Solr to implement Search on our site. Here is my use
 case,
 1. We will have multiple 4-5 indexes based on different data
 types/structures and data will be indexed into these by several processes,
 like cron, on demand, thru message queue applications, etc.
 2. A single web service needs to search across all these indexes and return
 results.

 I am thinking of using Solr 4.2.1 or may be 4.3 with single instance -
 multicore setup.
 I read about distributed search and I believe I should be able to search
 across multiple indices using shards parameters. However in my case, all
 shards will be on same host/port but with different core name.

 Is my understanding correct? Or is there any better alternative to this
 approach?

 Please suggest.
 Thanks,
 -Manasi



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Need help with search in multiple indexes

2013-06-12 Thread smanad
Thanks for the reply Michael. 

In some cases the schema is similar, but not in all of them. So let's go with
the assumption that the schemas are NOT similar.

I am not quite sure what you mean by "you're probably stuck coordinating the
results externally." Do you mean searching in each index and then somehow
merging results manually? Will I still be able to use the shards parameter, or
not?

Also, I was planning to use php library SolrClient. Do you see any downside?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070049.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with search in multiple indexes

2013-06-12 Thread Michael Della Bitta
 I am not quite sure what you mean by you're probably stuck coordinating
 the
 results externally.  Do you mean, searching in each index and then somehow
 merge results manually? will I still be able to use shards parameters? or
 no?


If your schemas don't match up, you can't use distributed search, so yes,
manual merging. You can't use the shards parameter across indexes with
incompatible schema.

I'd strongly consider just including all the fields in a single schema and
leaving them blank if they don't apply to a given type of data.
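The single-schema approach can be sketched in SolrJ roughly as follows; the discriminator field name ("type"), the field names, and the values are all hypothetical, not something prescribed by Solr:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrInputDocument;

public class TypeFieldSketch {
    public static void main(String[] args) {
        // Index: every document carries a discriminator field.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("documentId", "prod-42");
        doc.addField("type", "products");   // could be "articles", "users", ...
        doc.addField("title", "Example product");
        // Fields that don't apply to this type are simply left out.

        // Query: restrict to one type with a filter query, or search
        // every type at once by omitting the filter.
        SolrQuery query = new SolrQuery("title:example");
        query.addFilterQuery("type:products");
    }
}
```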



 Also, I was planning to use php library SolrClient. Do you see any
 downside?


No, this works fine!


Re: Need help with search in multiple indexes

2013-06-12 Thread smanad
Is this a limitation of solr/lucene? Should I be considering another option
like Elasticsearch (which is also based on lucene)? 
But I am sure search in multiple indexes is kind of a common problem.

Also, I was reading this post
http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
in one of the comments it says, 
So if I have Core0 with fields documentId,fieldA,fieldB and Core1 with
fields documentId,fieldC,fieldD. Then I create another core, lets say Core3
with fields documentId,fieldA,fieldB,fieldC,fieldD. I will never be
importing data into this core? And then create a query handler, that
includes the shard parameter. So when I query Core3, it will never really
contain indexed data, but because of the shard searching it will fetch the
results from the other to cores, and present it on the 3rd core? Thanks
for the help! 

Is that what I should be doing? So all the indexing still happens in
separate cores but searching happens in a one single core?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070055.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with search in multiple indexes

2013-06-12 Thread Michael Della Bitta
I had not heard of that technique before. Interesting!

But couldn't you do the same thing with a unified schema spread among your
cores?

Michael Della Bitta



On Wed, Jun 12, 2013 at 5:05 PM, smanad sma...@gmail.com wrote:

 Is this a limitation of solr/lucene, should I be considering using other
 option like using Elasticsearch (which is also based on lucene)?
 But I am sure search in multiple indexes is kind of a common problem.

 Also, I was reading this post

 http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
 in one of the comments it says,
 So if I have Core0 with fields documentId,fieldA,fieldB and Core1 with
 fields documentId,fieldC,fieldD. Then I create another core, lets say Core3
 with fields documentId,fieldA,fieldB,fieldC,fieldD. I will never be
 importing data into this core? And then create a query handler, that
 includes the shard parameter. So when I query Core3, it will never really
 contain indexed data, but because of the shard searching it will fetch the
 results from the other to cores, and present it on the 3rd core? Thanks
 for the help! 

 Is that what I should be doing? So all the indexing still happens in
 separate cores but searching happens in a one single core?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070055.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Need help with search in multiple indexes

2013-06-12 Thread smanad
In my case, different teams will be updating indexes at different intervals
so having separate cores gives more control. However, I can still
update(add/edit/delete) data with conditions like check for doc type.

It's just that using shards sounds much cleaner and more readable.

However, I am not yet sure if there might be any performance issues.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070061.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with search in multiple indexes

2013-06-12 Thread Jack Krupansky
Michael's point was that the schemas need to be compatible. I mean, if you 
query fields A, B, C, and D, and index1 has fields A and B, while index2 has 
fields C and D, and index3 has fields E and F, what kind of results do you 
think you will get back?!


Whether the schemas must be identical is not absolutely clear, but they at 
least have to include all the fields that queries will use. And... key 
values need to be unique across indexes.


Yes, Solr CAN do it. But to imagine that it would give reasonable query 
results with no coordination between the developers of the separate indexes 
is a little too much.


The bottom line: Somebody needs to coordinate the development of the schemas 
for the separate indexes so that they will be compatible from a query term 
and key value perspective, at a minimum.


-- Jack Krupansky

-Original Message- 
From: smanad

Sent: Wednesday, June 12, 2013 5:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Need help with search in multiple indexes

Is this a limitation of solr/lucene, should I be considering using other
option like using Elasticsearch (which is also based on lucene)?
But I am sure search in multiple indexes is kind of a common problem.

Also, I was reading this post
http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
in one of the comments it says,
So if I have Core0 with fields documentId,fieldA,fieldB and Core1 with
fields documentId,fieldC,fieldD. Then I create another core, lets say Core3
with fields documentId,fieldA,fieldB,fieldC,fieldD. I will never be
importing data into this core? And then create a query handler, that
includes the shard parameter. So when I query Core3, it will never really
contain indexed data, but because of the shard searching it will fetch the
results from the other to cores, and present it on the 3rd core? Thanks
for the help! 

Is that what I should be doing? So all the indexing still happens in
separate cores but searching happens in a one single core?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070055.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: multiple indexes?

2012-12-02 Thread Joe Zhang
This is very helpful. Thanks a lot, Shawn and Dikchant!

So in the default single-core situation, the index would live in data/index,
correct?

On Fri, Nov 30, 2012 at 11:02 PM, Shawn Heisey s...@elyograg.org wrote:

 On 11/30/2012 10:11 PM, Joe Zhang wrote:

 May I ask: how to set up multiple indexes, and specify which index to send
 the docs to at indexing time, and later on, how to specify which index to
 work with?

 A related question: what is the storage location and structure of solr
 indexes?

 When you index or query data, you'll use a base URL specific to the index
 (core).  Everything goes through that base URL, which includes the name of
 the core:

 http://server:port/solr/corename

 The file called solr.xml tells Solr about multiple cores. Each core has an
 instanceDir and a dataDir.

 http://wiki.apache.org/solr/CoreAdmin

 In the dataDir, Solr will create an index dir, which contains the Lucene
 index.  Here are the file formats for recent versions:

 http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html
 http://lucene.apache.org/core/3_6_1/fileformats.html
 http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html

 Thanks,
 Shawn




Re: multiple indexes?

2012-11-30 Thread Dikchant Sahi
Multiple indexes can be set up using the multi-core feature of Solr.

Below are the steps:
1. Add the core name and storage location of the core to
the $SOLR_HOME/solr.xml file.
  <cores adminPath="/admin/cores" defaultCoreName="core-name1">
    <core name="core-name1" instanceDir="core-dir1" />
    <core name="core-name2" instanceDir="core-dir2" />
  </cores>

2. Create the core-directories specified and following sub-directories in
it:
- conf: Contains the configs and schema definition
- lib: Contains the required libraries
- data: Will be created automatically on first run. This would contain
the actual index.

While indexing the docs, you specify the core name in the url as follows:
  http://host:port/solr/core-name/update?parameters

Similarly you do while querying.

Please refer to Solr Wiki, it has the complete details.
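In SolrJ terms, each core then gets its own client pointed at its base URL. This sketch assumes the Solr 4.x HttpSolrServer API; the host, port, and core names are only examples:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PerCoreClients {
    public static void main(String[] args) throws Exception {
        // One client per core; the core name is part of the base URL.
        HttpSolrServer core1 = new HttpSolrServer("http://localhost:8983/solr/core-name1");
        HttpSolrServer core2 = new HttpSolrServer("http://localhost:8983/solr/core-name2");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        core1.add(doc);      // goes into core-name1's index only
        core1.commit();

        core2.query(new SolrQuery("*:*"));  // searches core-name2 only
    }
}
```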

Hope this helps!

- Dikchant

On Sat, Dec 1, 2012 at 10:41 AM, Joe Zhang smartag...@gmail.com wrote:

 May I ask: how to set up multiple indexes, and specify which index to send
 the docs to at indexing time, and later on, how to specify which index to
 work with?

 A related question: what is the storage location and structure of solr
 indexes?

 Thanks in advance, guys!

 Joe.



Re: multiple indexes?

2012-11-30 Thread Shawn Heisey

On 11/30/2012 10:11 PM, Joe Zhang wrote:

May I ask: how to set up multiple indexes, and specify which index to send
the docs to at indexing time, and later on, how to specify which index to
work with?

A related question: what is the storage location and structure of solr
indexes?
When you index or query data, you'll use a base URL specific to the 
index (core).  Everything goes through that base URL, which includes the 
name of the core:


http://server:port/solr/corename

The file called solr.xml tells Solr about multiple cores. Each core has 
an instanceDir and a dataDir.


http://wiki.apache.org/solr/CoreAdmin
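A minimal legacy-style solr.xml illustrating this might look like the following; the core names and the dataDir path are examples only:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="core0">
    <!-- instanceDir holds conf/; dataDir is optional and defaults
         to instanceDir/data -->
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" dataDir="/var/solr/core1/data" />
  </cores>
</solr>
```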

In the dataDir, Solr will create an index dir, which contains the Lucene 
index.  Here are the file formats for recent versions:


http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html
http://lucene.apache.org/core/3_6_1/fileformats.html
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html

Thanks,
Shawn



Re: SOLR - To point multiple indexes in different folder

2012-11-02 Thread Erick Erickson
That should be OK. The recursive bit happens when you define the shard
locations in your config files in the default search handler.
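A shards request like the one quoted below can be assembled programmatically; the hostnames and core names are illustrative, and note that each shards entry is host:port/solr/corename with no http:// prefix:

```python
from urllib.parse import urlencode

def sharded_select(host, port, cores, **params):
    """Build a distributed request: the node that receives it fans the
    query out to every shard listed in the shards parameter and collates
    the results into one response."""
    shards = ",".join("{}:{}/solr/{}".format(host, port, c) for c in cores)
    return "http://{}:{}/solr/select/?{}".format(
        host, port, urlencode({**params, "shards": shards}))

url = sharded_select("localhost", 8080, ["coll1", "coll2", "coll3"],
                     q="*:*", rows=10, indent="on")
```

urlencode takes care of joining parameters with `&` and escaping the reserved characters inside the shards value.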


On Fri, Nov 2, 2012 at 6:42 AM, ravi.n rav...@ornext.com wrote:

 Erick,

 We are forming request something like below for default /select request
 handler, will this cause an issue?
 So far we are not facing any recursive issues.


 http://94.101.147.150:8080/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&shards=localhost:8080/solr/coll1,localhost:8080/solr/coll2,localhost:8080/solr/coll3,localhost:8080/solr/coll4,localhost:8080/solr/coll5,localhost:8080/solr/coll6,localhost:8080/solr/coll7

 Below is the solrconfig for /select

   <requestHandler name="/select" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
       <str name="df">recordid</str>
     </lst>
   </requestHandler>

 recordid - is the unique field in the document.

 Regards,
 Ravi



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-To-point-multiple-indexes-in-different-folder-tp4016640p4017783.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR - To point multiple indexes in different folder

2012-11-02 Thread ravi.n
Erick,

We are forming request something like below for default /select request
handler, will this cause an issue?
So far we are not facing any recursive issues.

http://94.101.147.150:8080/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&shards=localhost:8080/solr/coll1,localhost:8080/solr/coll2,localhost:8080/solr/coll3,localhost:8080/solr/coll4,localhost:8080/solr/coll5,localhost:8080/solr/coll6,localhost:8080/solr/coll7

Below is the solrconfig for /select

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">recordid</str>
    </lst>
  </requestHandler>

recordid - is the unique field in the document.

Regards,
Ravi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-To-point-multiple-indexes-in-different-folder-tp4016640p4017783.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR - To point multiple indexes in different folder

2012-10-30 Thread ravi.n
Erick,

Thanks for your response.
All 7 folders have the same schema, i.e. the document structure is the same. I
am not sure how the customer produced this data dump into different folders.
Now we have configured Solr with multiple cores, each core pointing to one
directory, and we use shards to get a single search response. Please advise
whether this is the right approach.

<solr>
  <cores adminPath="/admin/cores" sharedLib="lib" defaultCoreName="coll1">
    <core name="coll1" instanceDir="1" />
    <core name="coll2" instanceDir="2" />
    <core name="coll3" instanceDir="3" />
    <core name="coll4" instanceDir="4" />
    <core name="coll5" instanceDir="5" />
    <core name="coll6" instanceDir="6" />
    <core name="coll7" instanceDir="7" />
  </cores>
</solr>

We also need to configure Solr to index new data from a CSV file; I am not
sure how to configure this.

Regards,
Ravi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-To-point-multiple-indexes-in-different-folder-tp4016640p4016946.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR - To point multiple indexes in different folder

2012-10-30 Thread Erick Erickson
Until you nail down what the user did, this may cause
problems. A sharded system assumes that each unique ID
(the uniqueKey in your schema) exists on one and only one shard;
otherwise you'll be getting multiple copies of the docs.
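The invariant Erick describes can be checked offline; a rough sketch, with made-up shard names and IDs (in practice you would export the uniqueKey values from each core first):

```python
from collections import Counter

def duplicated_ids(shard_ids):
    """Return uniqueKey values that appear on more than one shard.

    shard_ids maps shard name -> set of uniqueKey values on that shard.
    Any ID that lives on two shards will show up as duplicate documents
    in distributed search results.
    """
    counts = Counter(i for ids in shard_ids.values() for i in ids)
    return sorted(i for i, n in counts.items() if n > 1)

shards = {"coll1": {"doc-a", "doc-b"}, "coll2": {"doc-b", "doc-c"}}
print(duplicated_ids(shards))  # doc-b violates the one-shard invariant
```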

And you've only shown a multi-core setup, NOT a sharded
setup. You need to define a search handler in solrconfig.xml
(similar to a requestHandler) and provide the shards as
defaults.

I don't have the reference close to hand, but you should be able
to find it with some searching. Beware the recursion problem
that you'll see referenced. Last I knew, you can't configure your
shards in the default search handler, since that's the one that
gets the sub-requests for all your nodes.

Best
Erick


On Tue, Oct 30, 2012 at 5:01 AM, ravi.n rav...@ornext.com wrote:
 Erick,

 Thanks for your response.
 All 7 folders have the same schema, i.e. the document structure is the same. I
 am not sure how the customer produced this data dump into different folders.
 Now we have configured Solr with multiple cores, each core pointing to one
 directory, and we use shards to get a single search response. Please advise
 whether this is the right approach.

 <solr>
   <cores adminPath="/admin/cores" sharedLib="lib" defaultCoreName="coll1">
     <core name="coll1" instanceDir="1" />
     <core name="coll2" instanceDir="2" />
     <core name="coll3" instanceDir="3" />
     <core name="coll4" instanceDir="4" />
     <core name="coll5" instanceDir="5" />
     <core name="coll6" instanceDir="6" />
     <core name="coll7" instanceDir="7" />
   </cores>
 </solr>

 We also need to configure Solr to index new data from a CSV file; I am not
 sure how to configure this.

 Regards,
 Ravi



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SOLR-To-point-multiple-indexes-in-different-folder-tp4016640p4016946.html
 Sent from the Solr - User mailing list archive at Nabble.com.


SOLR - To point multiple indexes in different folder

2012-10-29 Thread ravi.n
Hello Solr Gurus,

I am newbie to solr application, below are my requirements:

1. We have 7 folders containing indexed files, which the Solr application
should point to. I understand the shards feature can be used for searching;
is there any other alternative? Each folder has around 24 million documents.
2. We need to configure Solr to index new incoming data from a database/CSV
file; what configuration is required in Solr to achieve this?

Any quick response on this will be appreciated.
Thanks

Regards,
Ravi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-To-point-multiple-indexes-in-different-folder-tp4016640.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR - To point multiple indexes in different folder

2012-10-29 Thread Erick Erickson
How did you get the 7 directories anyway? From your message,
they sound like they are _solr_ indexes, in which case you
somehow created them with Solr. But I don't really understand
the setup in that case.

If these are Solr/Lucene indexes, you can use the multicore
features. This treats them like separate indexes and you have
to address each specifically, something like ...localhost/solr/collection2/select
etc.

Sharding, on the other hand, _assumes_ that all the indexes
really make up one logical index and handles the distribution/collation
automatically.
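The collation step that sharding performs can be sketched: each shard returns its own top-N sorted by score, and the coordinating node merge-sorts the candidates into a global top-N. The document IDs and scores below are made up:

```python
import heapq

def merge_top_n(shard_results, n):
    """Merge per-shard result lists (each already sorted by score
    descending) into the global top-n, the way a coordinating node
    collates its sub-query responses."""
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[1])
    return list(merged)[:n]

shard1 = [("d3", 9.1), ("d1", 4.2)]
shard2 = [("d7", 7.5), ("d2", 6.0)]
print(merge_top_n([shard1, shard2], 3))
```

heapq.merge is a good fit because it only requires each input to be sorted already, which is exactly what each shard guarantees for its own list.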

If this makes no sense, could you explain your setup a little more?

Best
Erick


On Mon, Oct 29, 2012 at 7:34 AM, ravi.n rav...@ornext.com wrote:
 Hello Solr Gurus,

 I am newbie to solr application, below are my requirements:

 1. We have 7 folders containing indexed files, which the Solr application
 should point to. I understand the shards feature can be used for searching;
 is there any other alternative? Each folder has around 24 million documents.
 2. We need to configure Solr to index new incoming data from a database/CSV
 file; what configuration is required in Solr to achieve this?

 Any quick response on this will be appreciated.
 Thanks

 Regards,
 Ravi



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SOLR-To-point-multiple-indexes-in-different-folder-tp4016640.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Search over multiple indexes

2011-11-28 Thread Valeriy Felberg
Hello,

I'm trying to implement automatic document classification and store
the classified attributes as an additional field in Solr document.
Then the search goes against that field like
q=classified_category:xyz. The document classification is currently
implemented as an UpdateRequestProcessor and works quite well. The
only problem: for each change in the classification algorithm every
document has to be re-indexed which, of course, makes tests and
experimentation difficult and binds resources (other than Solr) for
several hours.

So, my idea would be to store classified attributes in a meta-index
and search over the main and meta indexes simultaneously. For example:
main index has got fields like color and meta index has got
classified_category. The query q=classified_category:xyz AND
color:black should be then split over the main and meta index. This
way, the classification could run on Solr over the main index and
store classified fields in the meta index so that only Solr resources
are bound.

Has anybody already done something like that? It's a little bit like
sharding but different in that each shard would process its part of
the query and live in the same Solr instance.

Regards,
Valeriy


Re: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Erick Erickson
Let's see...
1> Committing every second, even with commitWithin, is probably going
to be a problem. I usually think that 1-second latency is overkill, but
that's up to your product manager. Look at the NRT (Near Real Time)
stuff if you really need this. I thought that NRT was only on trunk,
but it *might* be in the 3.4 code base.
2> Don't understand what a "single index per entity" is. How many cores
do you have total? For not very many records, I'd put everything in a
single index and use filter queries to restrict views.
3> I guess this relates to <2>. And I'd use a single core. If, for
some reason, you decide that you need multiple indexes, use several
cores with ONE Solr rather than starting a new Solr per core; it's more
resource-expensive to have multiple JVMs around.

Best
Erick

On Thu, Nov 3, 2011 at 2:03 PM, Gustavo Falco
comfortablynum...@gmail.com wrote:
 Hi guys!

 I have a couple of questions that I hope someone could help me with:

 1) Recently I've implemented Solr in my app. My use case is not
 complicated. Suppose that there will be 50 concurrent users tops. This is
 an app like, let's say, a CRM. I tell you this so you have an idea in terms
 of how many read and write operations will be needed. What I do need is
 that the data that is added / updated be available right after it's added /
 updated (maybe a second later it's ok). I know that the commit operation is
 expensive, so maybe doing a commit right after each write operation is not
 a good idea. I'm trying to use the autoCommit feature with a maxTime of
 1000ms, but then the question arose: is this the best way to handle this
 type of situation? and if not, what should I do?

 2) I'm using a single index per entity type because I've read that if the
 app is not handling lots of data (let's say, 1 million of records) then
 it's safe to use a single index. Is this true? if not, why?

 3) Is it a problem if I use a simple setup of Solr using a single core for
 this use case? if not, what do you recommend?



 Any help in any of these topics would be greatly appreciated.

 Thanks in advance!



Re: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Gustavo Falco
First of all, thanks a lot for your answer.

1) I could use 5 to 15 seconds between each commit and give it a try. Is
this an acceptable configuration? I'll take a look at NRT.
2) Currently I'm using a single core, the simplest setup. I don't expect to
have an overwhelming quantity of records, but I do have lots of classes to
persist, and I need to search all of them at the same time, and not per
class (entity). For now it is working well. By multiple indexes I mean using
an index for each entity. Let's say, an index for Articles, another for
Users, etc. The thing is that I don't know when I should divide it and
use one index for each entity (or if it's possible to make a UNION like
search between every index). I've read that when an entity reaches the size
of one million records then it's best to give it a dedicated index, even
though I don't expect to have that size even with all my entities. But I
wanted to know from you just to be sure.
3) Great! for now I think I'll stick with one index, but it's good to know
that in case I need to change later for some reason.



Again, thanks a lot for your help!

2011/11/4 Erick Erickson erickerick...@gmail.com

 Let's see...
 1> Committing every second, even with commitWithin, is probably going
 to be a problem. I usually think that 1-second latency is overkill,
 but that's up to your product manager. Look at the NRT (Near Real
 Time) stuff if you really need this. I thought that NRT was only on
 trunk, but it *might* be in the 3.4 code base.
 2> Don't understand what a "single index per entity" is. How many cores
 do you have total? For not very many records, I'd put everything in a
 single index and use filter queries to restrict views.
 3> I guess this relates to <2>. And I'd use a single core. If, for
 some reason, you decide that you need multiple indexes, use several
 cores with ONE Solr rather than starting a new Solr per core; it's
 more resource-expensive to have multiple JVMs around.

 Best
 Erick

 On Thu, Nov 3, 2011 at 2:03 PM, Gustavo Falco
 comfortablynum...@gmail.com wrote:
  Hi guys!
 
  I have a couple of questions that I hope someone could help me with:
 
  1) Recently I've implemented Solr in my app. My use case is not
  complicated. Suppose that there will be 50 concurrent users tops. This is
  an app like, let's say, a CRM. I tell you this so you have an idea in
 terms
  of how many read and write operations will be needed. What I do need is
  that the data that is added / updated be available right after it's
 added /
  updated (maybe a second later it's ok). I know that the commit operation
 is
  expensive, so maybe doing a commit right after each write operation is
 not
  a good idea. I'm trying to use the autoCommit feature with a maxTime of
  1000ms, but then the question arose: is this the best way to handle this
  type of situation? and if not, what should I do?
 
  2) I'm using a single index per entity type because I've read that if the
  app is not handling lots of data (let's say, 1 million of records) then
  it's safe to use a single index. Is this true? if not, why?
 
  3) Is it a problem if I use a simple setup of Solr using a single core
 for
  this use case? if not, what do you recommend?
 
 
 
  Any help in any of these topics would be greatly appreciated.
 
  Thanks in advance!
 



RE: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Brian Gerby

Gustavo - 

Even with the most basic requirements, I'd recommend setting up a multi-core 
configuration so you can RELOAD the main core you will be using when you make 
simple changes to config files. This is much cleaner than bouncing solr each 
time. There are other benefits to doing it, but this is the main reason I do 
it.  

Brian 

 Date: Fri, 4 Nov 2011 15:34:27 -0300
 Subject: Re: Three questions about: Commit, single index vs multiple indexes 
 and implementation advice
 From: comfortablynum...@gmail.com
 To: solr-user@lucene.apache.org
 
 First of all, thanks a lot for your answer.
 
 1) I could use 5 to 15 seconds between each commit and give it a try. Is
 this an acceptable configuration? I'll take a look at NRT.
 2) Currently I'm using a single core, the simplest setup. I don't expect to
 have an overwhelming quantity of records, but I do have lots of classes to
 persist, and I need to search all of them at the same time, and not per
 class (entity). For now is working good. With multiple indexes I mean using
 an index for each entity. Let's say, an index for Articles, another for
 Users, etc. The thing is that I don't know when I should divide it and
 use one index for each entity (or if it's possible to make a UNION like
 search between every index). I've read that when an entity reaches the size
 of one million records then it's best to give it a dedicated index, even
 though I don't expect to have that size even with all my entities. But I
 wanted to know from you just to be sure.
 3) Great! for now I think I'll stick with one index, but it's good to know
 that in case I need to change later for some reason.
 
 
 
 Again, thanks a lot for your help!
 
 2011/11/4 Erick Erickson erickerick...@gmail.com
 
  Let's see...
  1> Committing every second, even with commitWithin, is probably going
  to be a problem. I usually think that 1-second latency is overkill,
  but that's up to your product manager. Look at the NRT (Near Real
  Time) stuff if you really need this. I thought that NRT was only on
  trunk, but it *might* be in the 3.4 code base.
  2> Don't understand what a "single index per entity" is. How many cores
  do you have total? For not very many records, I'd put everything in a
  single index and use filter queries to restrict views.
  3> I guess this relates to <2>. And I'd use a single core. If, for
  some reason, you decide that you need multiple indexes, use several
  cores with ONE Solr rather than starting a new Solr per core; it's
  more resource-expensive to have multiple JVMs around.
 
  Best
  Erick
 
  On Thu, Nov 3, 2011 at 2:03 PM, Gustavo Falco
  comfortablynum...@gmail.com wrote:
   Hi guys!
  
   I have a couple of questions that I hope someone could help me with:
  
   1) Recently I've implemented Solr in my app. My use case is not
   complicated. Suppose that there will be 50 concurrent users tops. This is
   an app like, let's say, a CRM. I tell you this so you have an idea in
  terms
   of how many read and write operations will be needed. What I do need is
   that the data that is added / updated be available right after it's
  added /
   updated (maybe a second later it's ok). I know that the commit operation
  is
   expensive, so maybe doing a commit right after each write operation is
  not
   a good idea. I'm trying to use the autoCommit feature with a maxTime of
   1000ms, but then the question arised: Is this the best way to handle this
   type of situation? and if not, what should I do?
  
   2) I'm using a single index per entity type because I've read that if the
   app is not handling lots of data (let's say, 1 million of records) then
   it's safe to use a single index. Is this true? if not, why?
  
   3) Is it a problem if I use a simple setup of Solr using a single core
  for
   this use case? if not, what do you recommend?
  
  
  
   Any help in any of these topics would be greatly appreciated.
  
   Thanks in advance!
  
 
  

Re: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Gustavo Falco
Hi Brian,

I'll take a look at what you mentioned. I didn't think about that. I'll
finish the implementation at the app level and then I'll read a little more
about multi-core setups. Maybe I don't know yet all the benefits it has.


Thanks a lot for your advice.

2011/11/4 Brian Gerby briange...@hotmail.com


 Gustavo -

 Even with the most basic requirements, I'd recommend setting up a
 multi-core configuration so you can RELOAD the main core you will be using
 when you make simple changes to config files. This is much cleaner than
 bouncing solr each time. There are other benefits to doing it, but this is
 the main reason I do it.

 Brian

  Date: Fri, 4 Nov 2011 15:34:27 -0300
  Subject: Re: Three questions about: Commit, single index vs multiple
 indexes and implementation advice
  From: comfortablynum...@gmail.com
  To: solr-user@lucene.apache.org
 
  First of all, thanks a lot for your answer.
 
  1) I could use 5 to 15 seconds between each commit and give it a try. Is
  this an acceptable configuration? I'll take a look at NRT.
  2) Currently I'm using a single core, the simplest setup. I don't expect
 to
  have an overwhelming quantity of records, but I do have lots of classes
 to
  persist, and I need to search all of them at the same time, and not per
  class (entity). For now is working good. With multiple indexes I mean
 using
  an index for each entity. Let's say, an index for Articles, another for
  Users, etc. The thing is that I don't know when I should divide it and
  use one index for each entity (or if it's possible to make a UNION like
  search between every index). I've read that when an entity reaches the
 size
  of one million records then it's best to give it a dedicated index, even
  though I don't expect to have that size even with all my entities. But I
  wanted to know from you just to be sure.
  3) Great! for now I think I'll stick with one index, but it's good to
 know
  that in case I need to change later for some reason.
 
 
 
  Again, thanks a lot for your help!
 
  2011/11/4 Erick Erickson erickerick...@gmail.com
 
   Let's see...
   1> Committing every second, even with commitWithin, is probably going
   to be a problem. I usually think that 1-second latency is overkill,
   but that's up to your product manager. Look at the NRT (Near Real
   Time) stuff if you really need this. I thought that NRT was only on
   trunk, but it *might* be in the 3.4 code base.
   2> Don't understand what a "single index per entity" is. How many
   cores do you have total? For not very many records, I'd put everything
   in a single index and use filter queries to restrict views.
   3> I guess this relates to <2>. And I'd use a single core. If, for
   some reason, you decide that you need multiple indexes, use several
   cores with ONE Solr rather than starting a new Solr per core; it's
   more resource-expensive to have multiple JVMs around.
  
   Best
   Erick
  
   On Thu, Nov 3, 2011 at 2:03 PM, Gustavo Falco
   comfortablynum...@gmail.com wrote:
Hi guys!
   
I have a couple of questions that I hope someone could help me with:
   
1) Recently I've implemented Solr in my app. My use case is not
complicated. Suppose that there will be 50 concurrent users tops.
 This is
an app like, let's say, a CRM. I tell you this so you have an idea in
   terms
of how many read and write operations will be needed. What I do need
 is
that the data that is added / updated be available right after it's
   added /
updated (maybe a second later it's ok). I know that the commit
 operation
   is
expensive, so maybe doing a commit right after each write operation
 is
   not
a good idea. I'm trying to use the autoCommit feature with a maxTime
 of
1000ms, but then the question arised: Is this the best way to handle
 this
type of situation? and if not, what should I do?
   
2) I'm using a single index per entity type because I've read that
 if the
app is not handling lots of data (let's say, 1 million of records)
 then
it's safe to use a single index. Is this true? if not, why?
   
3) Is it a problem if I use a simple setup of Solr using a single
 core
   for
this use case? if not, what do you recommend?
   
   
   
Any help in any of these topics would be greatly appreciated.
   
Thanks in advance!
   
  




Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-03 Thread Gustavo Falco
Hi guys!

I have a couple of questions that I hope someone could help me with:

1) Recently I've implemented Solr in my app. My use case is not
complicated. Suppose that there will be 50 concurrent users tops. This is
an app like, let's say, a CRM. I tell you this so you have an idea in terms
of how many read and write operations will be needed. What I do need is
that the data that is added / updated be available right after it's added /
updated (maybe a second later it's ok). I know that the commit operation is
expensive, so maybe doing a commit right after each write operation is not
a good idea. I'm trying to use the autoCommit feature with a maxTime of
1000ms, but then the question arose: is this the best way to handle this
type of situation? and if not, what should I do?

2) I'm using a single index per entity type because I've read that if the
app is not handling lots of data (let's say, 1 million records) then
it's safe to use a single index. Is this true? if not, why?

3) Is it a problem if I use a simple setup of Solr using a single core for
this use case? if not, what do you recommend?



Any help in any of these topics would be greatly appreciated.

Thanks in advance!


Re: Multiple indexes

2011-06-19 Thread lee carroll
Your data is being used to build an inverted index rather than being
stored as a set of records. De-normalising is fine in most cases. What
is your use case that requires a normalised set of indices?

2011/6/18 François Schiettecatte fschietteca...@gmail.com:
 You would need to run two independent searches and then 'join' the results.

 It is best not to apply a 'SQL' mindset to Solr when it comes to 
 (de)normalization: whereas you strive for normalization in SQL, that is 
 usually counter-productive in Solr. For example, I am working on a project 
 with 30+ normalized tables, but only 4 cores.

 Perhaps describing what you are trying to achieve would give us greater 
 insight and thus allow us to make a more concrete recommendation?

 Cheers

 François

 On Jun 18, 2011, at 2:36 PM, shacky wrote:

 Il 18 giugno 2011 20:27, François Schiettecatte
 fschietteca...@gmail.com ha scritto:
 Sure.

 So I can have some searches similar to JOIN on MySQL?
 The problem is that I need at least two tables in which search data..




Re: Multiple indexes

2011-06-18 Thread shacky
2011/6/15 Edoardo Tosca e.to...@sourcesense.com:
 Try to use multiple cores:
 http://wiki.apache.org/solr/CoreAdmin

Can I do concurrent searches on multiple cores?


Re: Multiple indexes

2011-06-18 Thread François Schiettecatte
Sure.

François

On Jun 18, 2011, at 2:25 PM, shacky wrote:

 2011/6/15 Edoardo Tosca e.to...@sourcesense.com:
 Try to use multiple cores:
 http://wiki.apache.org/solr/CoreAdmin
 
 Can I do concurrent searches on multiple cores?



Re: Multiple indexes

2011-06-18 Thread shacky
Il 18 giugno 2011 20:27, François Schiettecatte
fschietteca...@gmail.com ha scritto:
 Sure.

So I can have some searches similar to JOIN on MySQL?
The problem is that I need at least two tables in which search data..


Re: Multiple indexes

2011-06-18 Thread François Schiettecatte
You would need to run two independent searches and then 'join' the results.

It is best not to apply a 'SQL' mindset to Solr when it comes to 
(de)normalization: whereas you strive for normalization in SQL, that is usually 
counter-productive in Solr. For example, I am working on a project with 30+ 
normalized tables, but only 4 cores.

Perhaps describing what you are trying to achieve would give us greater insight 
and thus allow us to make a more concrete recommendation?

Cheers

François 

On Jun 18, 2011, at 2:36 PM, shacky wrote:

 Il 18 giugno 2011 20:27, François Schiettecatte
 fschietteca...@gmail.com ha scritto:
 Sure.
 
 So I can have some searches similar to JOIN on MySQL?
 The problem is that I need at least two tables in which search data..



RE: Multiple indexes

2011-06-17 Thread Pierre GOSSE
 I think there are reasons to use separate indexes for each document type
 but do combined searches on these indexes
 (for example if you need separate TFs for each document type).

I wonder if in this precise case it wouldn't be pertinent to have a single 
index with the various document types, each having their own field set. 
Isn't TF calculated field by field?
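Pierre's point can be illustrated with a toy per-field term-frequency count (the field names and text are made up): Lucene tracks TF within each field, so document types that use their own fields keep their term statistics apart even in one index:

```python
def per_field_tf(doc, term):
    """Toy term-frequency count computed separately for each field,
    mirroring how Lucene keeps TF per field rather than per document."""
    return {field: text.lower().split().count(term)
            for field, text in doc.items()}

doc = {"article_body": "solr search with solr", "user_bio": "solr user"}
print(per_field_tf(doc, "solr"))  # a separate count for each field
```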


RE: Multiple indexes

2011-06-17 Thread Kai Gülzau
  (for example if you need separate TFs for each document type).
 
  I wonder if in this precise case it wouldn't be pertinent to 
  have a single index with the various document types each 
  having their own field set. Isn't TF calculated field by field?

Oh, you are right :)
So i will start testing with one mixed type index and
perhaps use IndexReaderFactory afterwards in comparison.

Thanks,

Kai Gülzau

RE: Multiple indexes

2011-06-16 Thread Kai Gülzau
Are there any plans to support a kind of federated search
in a future solr version?

I think there are reasons to use separate indexes for each document type
but do combined searches on these indexes
(for example if you need separate TFs for each document type).

I am aware of http://wiki.apache.org/solr/DistributedSearch
and a workaround to do federated search with sharding
http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
but this seems to be too much network- and maintenance overhead.

Perhaps it is worth a try to use an IndexReaderFactory which
returns a Lucene MultiReader!?
Is the IndexReaderFactory still Experimental?
https://issues.apache.org/jira/browse/SOLR-1366


Regards,

Kai Gülzau

 

 -Original Message-
 From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
 Sent: Wednesday, June 15, 2011 8:43 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Multiple indexes
 
 Next, however, I predict you're going to ask how you do a 'join' or 
 otherwise query across both these cores at once. You can't do 
 that in Solr.
 
 On 6/15/2011 1:00 PM, Frank Wesemann wrote:
  You'll configure multiple cores:
  http://wiki.apache.org/solr/CoreAdmin
  Hi.
 
  How to have multiple indexes in SOLR, with different fields and
  different types of data?
 
  Thank you very much!
  Bye.
 
 
 

Multiple indexes

2011-06-15 Thread shacky
Hi.

How to have multiple indexes in SOLR, with different fields and
different types of data?

Thank you very much!
Bye.


Re: Multiple indexes

2011-06-15 Thread Edoardo Tosca
Try to use multiple cores:
http://wiki.apache.org/solr/CoreAdmin

On Wed, Jun 15, 2011 at 5:55 PM, shacky shack...@gmail.com wrote:

 Hi.

 How to have multiple indexes in SOLR, with different fields and
 different types of data?

 Thank you very much!
 Bye.




-- 
Edoardo Tosca
Sourcesense - making sense of Open Source: http://www.sourcesense.com


Re: Multiple indexes

2011-06-15 Thread Frank Wesemann

You'll configure multiple cores:
http://wiki.apache.org/solr/CoreAdmin

Hi.

How to have multiple indexes in SOLR, with different fields and
different types of data?

Thank you very much!
Bye.
  



--
mit freundlichem Gruß,

Frank Wesemann
Fotofinder GmbH             USt-IdNr. DE812854514
Software Entwicklung        Web: http://www.fotofinder.com/
Potsdamer Str. 96           Tel: +49 30 25 79 28 90
10785 Berlin                Fax: +49 30 25 79 28 999

Sitz: Berlin
Amtsgericht Berlin Charlottenburg (HRB 73099)
Geschäftsführer: Ali Paczensky





Re: Multiple indexes

2011-06-15 Thread Jonathan Rochkind
Next, however, I predict you're going to ask how you do a 'join' or 
otherwise query across both these cores at once. You can't do 
that in Solr.


On 6/15/2011 1:00 PM, Frank Wesemann wrote:

You'll configure multiple cores:
http://wiki.apache.org/solr/CoreAdmin

Hi.

How to have multiple indexes in SOLR, with different fields and
different types of data?

Thank you very much!
Bye.





Re: Multiple indexes inside a single core

2010-10-29 Thread Valli Indraganti
Here's the Jira issue for the distributed search issue.
https://issues.apache.org/jira/browse/SOLR-1632

I tried applying this patch but, get the same error that is posted in the
discussion section for that issue. I will be glad to help too on this one.
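The score-skew problem discussed in this thread (a 20-million-doc core and a 200-doc core scoring independently) can be sketched with a crude max-score rescaling before merging. This is only a toy illustration of the idea, not what Solr or Lucene's MultiSearcher actually does:

```python
def normalize_and_merge(per_core_hits, n):
    """Rescale each core's scores by that core's own maximum before
    merging, so a small core's inflated raw scores cannot dominate a
    large one. A crude stand-in for real cross-index score blending."""
    merged = []
    for hits in per_core_hits:
        if not hits:
            continue
        top = max(score for _, score in hits)
        merged.extend((doc, score / top) for doc, score in hits)
    return sorted(merged, key=lambda hit: -hit[1])[:n]

small_core = [("s1", 12.0), ("s2", 3.0)]   # 200 docs, inflated scores
large_core = [("l1", 2.0), ("l2", 1.5)]    # 20 million docs
print(normalize_and_merge([small_core, large_core], 3))
```

Real normalization would also have to reconcile collection statistics such as IDF, which is exactly why this feature is hard to bolt on from outside.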

On Sat, Oct 23, 2010 at 2:35 PM, Erick Erickson erickerick...@gmail.comwrote:

 Ah, I should have read more carefully...

 I remember this being discussed on the dev list, and I thought there might
 be
 a Jira attached but I sure can't find it.

 If you're willing to work on it, you might hop over to the solr dev list
 and
 start
 a discussion, maybe ask for a place to start. I'm sure some of the devs
 have
 thought about this...

 If nobody on the dev list says There's already a JIRA on it, then you
 should
 open one. The Jira issues are generally preferred when you start getting
 into
 design because the comments are preserved for the next person who tries
 the idea or makes changes, etc

 Best
 Erick

 On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess ben.bogg...@gmail.com
 wrote:

  Thanks Erick.  The problem with multiple cores is that the documents are
  scored independently in each core.  I would like to be able to search
 across
  both cores and have the scores 'normalized' in a way that's similar to
 what
  Lucene's MultiSearcher would do.  As far as I understand, multiple cores
  would likely result in seriously skewed scores in my case since the
  documents are not distributed evenly or randomly.  I could have one
  core/index with 20 million docs and another with 200.
 
  I've poked around in the code and this feature doesn't seem to exist.  I
  would be happy with finding a decent place to try to add it.  I'm not
 sure
  if there is a clean place for it.
 
  Ben
 
  On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   It seems to me that multiple cores are along the lines you
   need, a single instance of Solr that can search across multiple
   sub-indexes that do not necessarily share schemas, and are
   independently maintainable.
  
   This might be a good place to start:
  http://wiki.apache.org/solr/CoreAdmin
  
   HTH
   Erick
  
   On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com
  wrote:
  
   We are trying to convert a Lucene-based search solution to a
   Solr/Lucene-based solution.  The problem we have is that we currently
  have
   our data split into many indexes and Solr expects things to be in a
  single
   index unless you're sharding.  In addition to this, our indexes
 wouldn't
   work well using the distributed search functionality in Solr because
 the
   documents are not evenly or randomly distributed.  We are currently
  using
   Lucene's MultiSearcher to search over subsets of these indexes.
  
   I know this has been brought up a number of times in previous posts
 and
  the
   typical response is that the best thing to do is to convert everything
  into
   a single index.  One of the major reasons for having the indexes split
  up
   the way we do is because different types of data need to be indexed at
   different intervals.  You may need one index to be updated every 20
  minutes
   and another is only updated every week.  If we move to a single index,
  then
   we will constantly be warming and replacing searchers for the entire
   dataset, and will essentially render the searcher caches useless.  If
 we
   were able to have multiple indexes, they would each have a searcher
 and
   updates would be isolated to a subset of the data.
  
   The other problem is that we will likely need to shard this large
 single
   index and there isn't a clean way to shard randomly and evenly across
   all of
   the data.  We would, however, like to shard a single data type.  If we
  could
   use multiple indexes, we would likely be also sharding a small sub-set
  of
   them.
  
   Thanks in advance,
  
   Ben
  
 



Re: Multiple indexes inside a single core

2010-10-23 Thread Erick Erickson
Ah, I should have read more carefully...

I remember this being discussed on the dev list, and I thought there might
be
a Jira attached but I sure can't find it.

If you're willing to work on it, you might hop over to the solr dev list and
start
a discussion, maybe ask for a place to start. I'm sure some of the devs have
thought about this...

If nobody on the dev list says "There's already a JIRA on it," then you
should
open one. The Jira issues are generally preferred when you start getting
into
design because the comments are preserved for the next person who tries
the idea or makes changes, etc

Best
Erick

On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess ben.bogg...@gmail.com wrote:

 Thanks Erick.  The problem with multiple cores is that the documents are
 scored independently in each core.  I would like to be able to search across
 both cores and have the scores 'normalized' in a way that's similar to what
 Lucene's MultiSearcher would do.  As far as I understand, multiple cores
 would likely result in seriously skewed scores in my case since the
 documents are not distributed evenly or randomly.  I could have one
 core/index with 20 million docs and another with 200.

 I've poked around in the code and this feature doesn't seem to exist.  I
 would be happy with finding a decent place to try to add it.  I'm not sure
 if there is a clean place for it.

 Ben

 On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  It seems to me that multiple cores are along the lines you
  need, a single instance of Solr that can search across multiple
  sub-indexes that do not necessarily share schemas, and are
  independently maintainable.
 
  This might be a good place to start:
 http://wiki.apache.org/solr/CoreAdmin
 
  HTH
  Erick
 
  On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com
 wrote:
 
  We are trying to convert a Lucene-based search solution to a
  Solr/Lucene-based solution.  The problem we have is that we currently
 have
  our data split into many indexes and Solr expects things to be in a
 single
  index unless you're sharding.  In addition to this, our indexes wouldn't
  work well using the distributed search functionality in Solr because the
  documents are not evenly or randomly distributed.  We are currently
 using
  Lucene's MultiSearcher to search over subsets of these indexes.
 
  I know this has been brought up a number of times in previous posts and
 the
  typical response is that the best thing to do is to convert everything
 into
  a single index.  One of the major reasons for having the indexes split
 up
  the way we do is because different types of data need to be indexed at
  different intervals.  You may need one index to be updated every 20
 minutes
  and another is only updated every week.  If we move to a single index,
 then
  we will constantly be warming and replacing searchers for the entire
  dataset, and will essentially render the searcher caches useless.  If we
  were able to have multiple indexes, they would each have a searcher and
  updates would be isolated to a subset of the data.
 
  The other problem is that we will likely need to shard this large single
  index and there isn't a clean way to shard randomly and evenly across
  all of
  the data.  We would, however, like to shard a single data type.  If we
 could
  use multiple indexes, we would likely be also sharding a small sub-set
 of
  them.
 
  Thanks in advance,
 
  Ben
 



Multiple indexes inside a single core

2010-10-20 Thread ben boggess
We are trying to convert a Lucene-based search solution to a
Solr/Lucene-based solution.  The problem we have is that we currently have
our data split into many indexes and Solr expects things to be in a single
index unless you're sharding.  In addition to this, our indexes wouldn't
work well using the distributed search functionality in Solr because the
documents are not evenly or randomly distributed.  We are currently using
Lucene's MultiSearcher to search over subsets of these indexes.

I know this has been brought up a number of times in previous posts and the
typical response is that the best thing to do is to convert everything into
a single index.  One of the major reasons for having the indexes split up
the way we do is because different types of data need to be indexed at
different intervals.  You may need one index to be updated every 20 minutes
and another is only updated every week.  If we move to a single index, then
we will constantly be warming and replacing searchers for the entire
dataset, and will essentially render the searcher caches useless.  If we
were able to have multiple indexes, they would each have a searcher and
updates would be isolated to a subset of the data.

The other problem is that we will likely need to shard this large single
index and there isn't a clean way to shard randomly and evenly across all of
the data.  We would, however, like to shard a single data type.  If we could
use multiple indexes, we would likely be also sharding a small sub-set of
them.

Thanks in advance,

Ben


Re: Multiple indexes inside a single core

2010-10-20 Thread Erick Erickson
It seems to me that multiple cores are along the lines you
need, a single instance of Solr that can search across multiple
sub-indexes that do not necessarily share schemas, and are
independently maintainable.

This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin

HTH
Erick

On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote:

 We are trying to convert a Lucene-based search solution to a
 Solr/Lucene-based solution.  The problem we have is that we currently have
 our data split into many indexes and Solr expects things to be in a single
 index unless you're sharding.  In addition to this, our indexes wouldn't
 work well using the distributed search functionality in Solr because the
 documents are not evenly or randomly distributed.  We are currently using
 Lucene's MultiSearcher to search over subsets of these indexes.

 I know this has been brought up a number of times in previous posts and the
 typical response is that the best thing to do is to convert everything into
 a single index.  One of the major reasons for having the indexes split up
 the way we do is because different types of data need to be indexed at
 different intervals.  You may need one index to be updated every 20 minutes
 and another is only updated every week.  If we move to a single index, then
 we will constantly be warming and replacing searchers for the entire
 dataset, and will essentially render the searcher caches useless.  If we
 were able to have multiple indexes, they would each have a searcher and
 updates would be isolated to a subset of the data.

 The other problem is that we will likely need to shard this large single
 index and there isn't a clean way to shard randomly and evenly across all
 of
 the data.  We would, however, like to shard a single data type.  If we could
 use multiple indexes, we would likely be also sharding a small sub-set of
 them.

 Thanks in advance,

 Ben



Re: Multiple indexes inside a single core

2010-10-20 Thread Ben Boggess
Thanks Erick.  The problem with multiple cores is that the documents are scored 
independently in each core.  I would like to be able to search across both 
cores and have the scores 'normalized' in a way that's similar to what Lucene's 
MultiSearcher would do.  As far as I understand, multiple cores would likely 
result in seriously skewed scores in my case since the documents are not 
distributed evenly or randomly.  I could have one core/index with 20 million 
docs and another with 200.

I've poked around in the code and this feature doesn't seem to exist.  I would 
be happy with finding a decent place to try to add it.  I'm not sure if there 
is a clean place for it.

Ben

On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com wrote:

 It seems to me that multiple cores are along the lines you
 need, a single instance of Solr that can search across multiple
 sub-indexes that do not necessarily share schemas, and are
 independently maintainable.
 
 This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
 
 HTH
 Erick
 
 On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote:
 
 We are trying to convert a Lucene-based search solution to a
 Solr/Lucene-based solution.  The problem we have is that we currently have
 our data split into many indexes and Solr expects things to be in a single
 index unless you're sharding.  In addition to this, our indexes wouldn't
 work well using the distributed search functionality in Solr because the
 documents are not evenly or randomly distributed.  We are currently using
 Lucene's MultiSearcher to search over subsets of these indexes.
 
 I know this has been brought up a number of times in previous posts and the
 typical response is that the best thing to do is to convert everything into
 a single index.  One of the major reasons for having the indexes split up
 the way we do is because different types of data need to be indexed at
 different intervals.  You may need one index to be updated every 20 minutes
 and another is only updated every week.  If we move to a single index, then
 we will constantly be warming and replacing searchers for the entire
 dataset, and will essentially render the searcher caches useless.  If we
 were able to have multiple indexes, they would each have a searcher and
 updates would be isolated to a subset of the data.
 
 The other problem is that we will likely need to shard this large single
 index and there isn't a clean way to shard randomly and evenly across all
 of
 the data.  We would, however, like to shard a single data type.  If we could
 use multiple indexes, we would likely be also sharding a small sub-set of
 them.
 
 Thanks in advance,
 
 Ben
 


Re: Multiple Indexes and relevance ranking question

2010-10-01 Thread Lance Norskog
The score of a document has no scale: it only has meaning against other 
scores in the same query.


Solr does not rank these documents correctly. Without sharing the TF/DF 
information across the shards, it cannot.


If the shards each have a lot of the same kind of document, this 
problem averages out. That is, the statistical fingerprint across the 
shards is similar enough that each index gives the same numerical range. 
Yes, this is hand-wavey, and we don't have a measuring tool that 
verifies this assertion.
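
Lance's point can be made concrete with a small sketch. The numbers and the simplified Lucene-style IDF formula below are illustrative inventions, not from this thread; they show how the same term gets a different weight on each shard when document frequency is not shared:

```python
import math

def idf(total_docs, doc_freq):
    # Simplified Lucene-style IDF: 1 + ln(N / (df + 1)).
    return 1.0 + math.log(total_docs / (doc_freq + 1))

# Shard A: "hello" is rare. Shard B: "hello" is common.
shard_a = {"docs": 1000, "df_hello": 5}
shard_b = {"docs": 1000, "df_hello": 500}

idf_a = idf(shard_a["docs"], shard_a["df_hello"])
idf_b = idf(shard_b["docs"], shard_b["df_hello"])

# Global IDF, as a MultiSearcher (or true distributed IDF) would compute it
# from the combined corpus statistics.
idf_global = idf(shard_a["docs"] + shard_b["docs"],
                 shard_a["df_hello"] + shard_b["df_hello"])

# Identical documents on the two shards get different term weights, so
# their scores are not directly comparable when merged naively.
print(idf_a, idf_b, idf_global)
```

When the shards have similar term statistics, idf_a, idf_b, and idf_global converge, which is the "averages out" case Lance describes.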


Lance

Valli Indraganti wrote:

I am new to Solr and search technologies. I am playing around with
multiple indexes. I configured Solr for Tomcat, created two tomcat fragments
so that two solr webapps listen on port 8080 in tomcat. I have created two
separate indexes using each webapp successfully.

My documents are very primitive. Below is the structure. I have four such
documents with different doc id and increased number of the word Hello
corresponding to the name of the document (this is only to make my analysis
of the results easier). Documents one and two are in shard 1, and three and
four are in shard 2. Obviously, document two is ranked higher when queried
against that index (for the word Hello), and document four is ranked higher
when queried against the second index. When using the shards parameter, the
scores remain unaltered.
My question is, if the distributed search does not consider IDF, how is it
able to rank these documents correctly? Or do I not have the indexes truly
distributed? Is something wrong with my term distribution?

<add>
  <doc>
    <field name="id">Valli1</field>
    <field name="name">One</field>
    <field name="text">Hello! This is a test document testing relevancy scores.</field>
  </doc>
</add>


Multiple Indexes and relevance ranking question

2010-09-30 Thread Valli Indraganti
I am new to Solr and search technologies. I am playing around with
multiple indexes. I configured Solr for Tomcat, created two tomcat fragments
so that two solr webapps listen on port 8080 in tomcat. I have created two
separate indexes using each webapp successfully.

My documents are very primitive. Below is the structure. I have four such
documents with different doc id and increased number of the word Hello
corresponding to the name of the document (this is only to make my analysis
of the results easier). Documents one and two are in shard 1, and three and
four are in shard 2. Obviously, document two is ranked higher when queried
against that index (for the word Hello), and document four is ranked higher
when queried against the second index. When using the shards parameter, the
scores remain unaltered.
My question is, if the distributed search does not consider IDF, how is it
able to rank these documents correctly? Or do I not have the indexes truly
distributed? Is something wrong with my term distribution?

<add>
  <doc>
    <field name="id">Valli1</field>
    <field name="name">One</field>
    <field name="text">Hello! This is a test document testing relevancy scores.</field>
  </doc>
</add>


How to set up multiple indexes?

2010-09-29 Thread Andy
I installed Solr according to the tutorial. My schema.xml & solrconfig.xml are 
in 
~/apache-solr-1.4.1/example/solr/conf

Everything so far is just like that in the tutorial. But I want to set up a 2nd 
index (separate from the main index) just for the purpose of auto-complete.

I understand that I need to set up multicore for this. But I'm not sure how to 
do that. I read the doc (http://wiki.apache.org/solr/CoreAdmin) but am still 
pretty confused.

- where do I put the 2nd index?
- do I need separate schema.xml & solrconfig.xml for the 2nd index? Where do I 
put them?
- how do I tell solr which index do I want a document to go to?
- how do I tell solr which index do I want to query against?
- any step-by-step instruction on setting up multicore?

Thanks.
Andy



  


Re: How to set up multiple indexes?

2010-09-29 Thread Christopher Gross
Hi Andy!

I configured this a few days ago, and found a good resource --
http://wiki.apache.org/solr/MultipleIndexes

That page has links that will give you the instructions for setting up
Tomcat, Jetty and Resin.  I used the Tomcat ones the other day, and it gave
me everything that I needed to get it up and running.  You basically just
need to create a new directory to contain the second instance, then create a
context file for it in the TOMCAT_HOME/conf/Catalina/localhost directory.
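
A context file in that directory, following the old SolrTomcat wiki pattern, might look like the sketch below. The file name (solr2.xml, which also becomes the URL path /solr2), the war path, and the solr home directory are all invented examples:

```xml
<!-- Hypothetical TOMCAT_HOME/conf/Catalina/localhost/solr2.xml -->
<Context docBase="/opt/solr/dist/apache-solr-1.4.1.war" debug="0" crossContext="true">
  <!-- Point this instance at its own solr home, with its own conf/ directory -->
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr2" override="true" />
</Context>
```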

Good luck!

-- Chris


On Wed, Sep 29, 2010 at 10:41 AM, Andy angelf...@yahoo.com wrote:

 I installed Solr according to the tutorial. My schema.xml & solrconfig.xml
 is in
 ~/apache-solr-1.4.1/example/solr/conf

 Everything so far is just like that in the tutorial. But I want to set up a
 2nd index (separate from the main index) just for the purpose of
 auto-complete.

 I understand that I need to set up multicore for this. But I'm not sure how
 to do that. I read the doc (http://wiki.apache.org/solr/CoreAdmin) but am
 still pretty confused.

 - where do I put the 2nd index?
 - do I need separate schema.xml & solrconfig.xml for the 2nd index? Where
 do I put them?
 - how do I tell solr which index do I want a document to go to?
 - how do I tell solr which index do I want to query against?
 - any step-by-step instruction on setting up multicore?

 Thanks.
 Andy







Re: How to set up multiple indexes?

2010-09-29 Thread Luke Crouch
Check
http://doc.ez.no/Extensions/eZ-Find/2.2/Advanced-Configuration/Using-multi-core-features

It's for eZ-Find, but it's the basic setup for multiple cores in any
environment.

We have cores designed like so:

solr/sfx/
solr/forum/
solr/mail/
solr/news/
solr/tracker/

Each of those core directories has its own conf/ with schema.xml and
solrconfig.xml. Then solr/solr.xml looks like:

<cores adminPath="/admin/cores">
  <core name="sfx" instanceDir="sfx" />
  <core name="tracker" instanceDir="tracker" />

etc.

After that you add the core name into the URL for all requests to the core:

http:///solr/sfx/select?...
http:///solr/sfx/update...
http:///solr/tracker/select?...
http:///solr/tracker/update...

On Wed, Sep 29, 2010 at 9:41 AM, Andy angelf...@yahoo.com wrote:

 I installed Solr according to the tutorial. My schema.xml & solrconfig.xml
 is in
 ~/apache-solr-1.4.1/example/solr/conf

 Everything so far is just like that in the tutorial. But I want to set up a
 2nd index (separate from the main index) just for the purpose of
 auto-complete.

 I understand that I need to set up multicore for this. But I'm not sure how
 to do that. I read the doc (http://wiki.apache.org/solr/CoreAdmin) but am
 still pretty confused.

 - where do I put the 2nd index?
 - do I need separate schema.xml & solrconfig.xml for the 2nd index? Where
 do I put them?
 - how do I tell solr which index do I want a document to go to?
 - how do I tell solr which index do I want to query against?
 - any step-by-step instruction on setting up multicore?

 Thanks.
 Andy







Re: Collating results from multiple indexes

2010-02-17 Thread Jan Høydahl / Cominvent
Thanks for your clarification and link, Will.

Back to Aaron's question. There is some ongoing work to try to support updating 
single fields within documents (http://issues.apache.org/jira/browse/SOLR-139 
and http://issues.apache.org/jira/browse/SOLR-828) which could perhaps be part 
of a future solution.

Is it an option for you to write a smart join component which can live on top 
of multiple cores and do multiple sub queries in an efficient way and 
transparently return the final result? Forking the shards query code could be a 
starting point. Donating this component back to Solr may free you of 
maintenance burden, as I'm sure it will be useful to a larger audience?
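
A join layer of the kind described here can be prototyped client-side before anything is donated back to Solr. A minimal illustration of the collation logic follows; the data, field names, and helper functions are all invented, and the two functions stand in for separate /select requests against the big monthly index and the small advertiser index:

```python
def search_main(query):
    # Stand-in for a query against the large, monthly-updated business
    # index. A real implementation would issue an HTTP /select request;
    # this stub ignores `query` and returns canned hits.
    return [
        {"id": "biz-1", "name": "Acme Hardware", "score": 2.4},
        {"id": "biz-2", "name": "Best Bagels", "score": 1.9},
    ]

def fetch_ads(ids):
    # Stand-in for the small, hourly-updated advertiser index, queried
    # by the keys returned from the main index.
    ads = {"biz-2": {"coupon": "10% off", "click_budget_remaining": 42}}
    return {k: ads[k] for k in ids if k in ads}

def collated_search(query):
    hits = search_main(query)
    ads = fetch_ads([h["id"] for h in hits])
    # Merge advertiser fields into the main-index documents by shared key;
    # documents without an advertiser record pass through unchanged.
    return [{**h, **ads.get(h["id"], {})} for h in hits]

results = collated_search("bagels")
```

Ranking still comes entirely from the main index; the second lookup only decorates the results, which matches the requirement that the advertiser data need not participate in scoring.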

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 17. feb. 2010, at 03.27, Will Johnson wrote:

 Jan Hoydal / Otis,
 
 
 
 First off, Thanks for mentioning us.  We do use some utility functions from
 SOLR but our index engine is built on top of Lucene only, there are no Solr
 cores involved.  We do have a JOIN operator that allows us to perform
 relational searches while still acting like a search engine in terms of
 performance, ranking, faceting, etc.  Our CTO wrote a blog article about it
 a month ago that does a pretty good job of explaining how it’s used:
 http://www.attivio.com/blog/55-industry-insights/507-can-a-search-engine-replace-a-relational-database.html
 
 
 
 The join functionality and most of our other higher level features use
 separate data structures and don’t really have much to do with Lucene/SOLR
 except where they integrate with the query execution.  If you want to learn
 more feel free to check out www.attivio.com.
 
 
 
 -  w...@attivio.com
 
 
 On Fri, Feb 12, 2010 at 10:35 AM, Jan Høydahl / Cominvent 
 jan@cominvent.com wrote:
 
 Really? The last time I looked at AIE, I am pretty sure there was Solr core
 msgs in the logs, so I assumed it used EmbeddedSolr or something. But I may
 be mistaken. Anyone from Attivio here who can elaborate? Is the join stuff
 at Lucene level or on top of multiple Solr cores or what?
 
 --
 Jan Høydahl  - search architect
 Cominvent AS - www.cominvent.com
 
 On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote:
 
 Minor correction re Attivio - their stuff runs on top of Lucene, not
 Solr.  I *think* they are trying to patent this.
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Hadoop ecosystem search :: http://search-hadoop.com/
 
 
 
 - Original Message 
 From: Jan Høydahl / Cominvent jan@cominvent.com
 To: solr-user@lucene.apache.org
 Sent: Mon, February 8, 2010 3:33:41 PM
 Subject: Re: Collating results from multiple indexes
 
 Hi,
 
 There is no JOIN functionality in Solr. The common solution is either to
 accept
 the high volume update churn, or to add client side code to build a
 join layer
 on top of the two indices. I know that Attivio (www.attivio.com) have
 built some
 kind of JOIN functionality on top of Solr in their AIE product, but do
 not know
 the details or the actual performance.
 
 Why not open a JIRA issue, if there is no such already, to request this
 as a
 feature?
 
 --
 Jan Høydahl  - search architect
 Cominvent AS - www.cominvent.com
 
 On 25. jan. 2010, at 22.01, Aaron McKee wrote:
 
 
 Is there any somewhat convenient way to collate/integrate fields from
 separate
 indices during result writing, if the indices use the same unique keys?
 Basically, some sort of cross-index JOIN?
 
 As a bit of background, I have a rather heavyweight dataset of every US
 business (~25m records, an on-disk index footprint of ~30g, and 5-10
 hours to
 fully index on a decent box). Given the size and relative stability of
 the
 dataset, I generally only update this monthly. However, I have separate
 advertising-related datasets that need to be updated either hourly or
 daily
 (e.g. today's coupon, click revenue remaining, etc.) . These advertiser
 feeds
 reference the same keyspace that I use in the main index, but are
 otherwise
 significantly lighter weight. Importing and indexing them discretely
 only takes
 a couple minutes. Given that Solr/Lucene doesn't support field updating,
 without
 having to drop and re-add an entire document, it doesn't seem practical
 to
 integrate this data into the main index (the system would be under a
 constant
 state of churn, if we did document re-inserts, and the performance
 impact would
 probably be debilitating). It may be nice if this data could participate
 in
 filtering (e.g. only show advertisers), but it doesn't need to
 participate in
 scoring/ranking.
 
 I'm guessing that someone else has had a similar need, at some point?
 I can
 have our front-end query the smaller indices separately, using the keys
 returned
 by the primary index, but would prefer to avoid the extra sequential
 roundtrips.
 I'm hoping to also avoid a coding solution, if only to avoid the
 maintenance
 overhead as we drop in new builds of Solr, but that's also feasible.
 
 Thank you

Re: Collating results from multiple indexes

2010-02-16 Thread Will Johnson
Jan Hoydal / Otis,



First off, Thanks for mentioning us.  We do use some utility functions from
SOLR but our index engine is built on top of Lucene only, there are no Solr
cores involved.  We do have a JOIN operator that allows us to perform
relational searches while still acting like a search engine in terms of
performance, ranking, faceting, etc.  Our CTO wrote a blog article about it
a month ago that does a pretty good job of explaining how it’s used:
http://www.attivio.com/blog/55-industry-insights/507-can-a-search-engine-replace-a-relational-database.html



The join functionality and most of our other higher level features use
separate data structures and don’t really have much to do with Lucene/SOLR
except where they integrate with the query execution.  If you want to learn
more feel free to check out www.attivio.com.



-  w...@attivio.com


On Fri, Feb 12, 2010 at 10:35 AM, Jan Høydahl / Cominvent 
jan@cominvent.com wrote:

 Really? The last time I looked at AIE, I am pretty sure there was Solr core
 msgs in the logs, so I assumed it used EmbeddedSolr or something. But I may
 be mistaken. Anyone from Attivio here who can elaborate? Is the join stuff
 at Lucene level or on top of multiple Solr cores or what?

 --
 Jan Høydahl  - search architect
 Cominvent AS - www.cominvent.com

  On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote:

  Minor correction re Attivio - their stuff runs on top of Lucene, not
 Solr.  I *think* they are trying to patent this.
 
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Hadoop ecosystem search :: http://search-hadoop.com/
 
 
 
  - Original Message 
  From: Jan Høydahl / Cominvent jan@cominvent.com
  To: solr-user@lucene.apache.org
  Sent: Mon, February 8, 2010 3:33:41 PM
  Subject: Re: Collating results from multiple indexes
 
  Hi,
 
  There is no JOIN functionality in Solr. The common solution is either to
 accept
  the high volume update churn, or to add client side code to build a
 join layer
  on top of the two indices. I know that Attivio (www.attivio.com) have
 built some
  kind of JOIN functionality on top of Solr in their AIE product, but do
 not know
  the details or the actual performance.
 
  Why not open a JIRA issue, if there is no such already, to request this
 as a
  feature?
 
  --
  Jan Høydahl  - search architect
  Cominvent AS - www.cominvent.com
 
  On 25. jan. 2010, at 22.01, Aaron McKee wrote:
 
 
  Is there any somewhat convenient way to collate/integrate fields from
 separate
  indices during result writing, if the indices use the same unique keys?
  Basically, some sort of cross-index JOIN?
 
  As a bit of background, I have a rather heavyweight dataset of every US
  business (~25m records, an on-disk index footprint of ~30g, and 5-10
 hours to
  fully index on a decent box). Given the size and relative stability of
 the
  dataset, I generally only update this monthly. However, I have separate
  advertising-related datasets that need to be updated either hourly or
 daily
  (e.g. today's coupon, click revenue remaining, etc.) . These advertiser
 feeds
  reference the same keyspace that I use in the main index, but are
 otherwise
  significantly lighter weight. Importing and indexing them discretely
 only takes
  a couple minutes. Given that Solr/Lucene doesn't support field updating,
 without
  having to drop and re-add an entire document, it doesn't seem practical
 to
  integrate this data into the main index (the system would be under a
 constant
  state of churn, if we did document re-inserts, and the performance
 impact would
  probably be debilitating). It may be nice if this data could participate
 in
  filtering (e.g. only show advertisers), but it doesn't need to
 participate in
  scoring/ranking.
 
  I'm guessing that someone else has had a similar need, at some point?
  I can
  have our front-end query the smaller indices separately, using the keys
 returned
  by the primary index, but would prefer to avoid the extra sequential
 roundtrips.
  I'm hoping to also avoid a coding solution, if only to avoid the
 maintenance
  overhead as we drop in new builds of Solr, but that's also feasible.
 
  Thank you for your insight,
  Aaron
 
 




Re: Collating results from multiple indexes

2010-02-12 Thread Jan Høydahl / Cominvent
Really? The last time I looked at AIE, I am pretty sure there was Solr core 
msgs in the logs, so I assumed it used EmbeddedSolr or something. But I may be 
mistaken. Anyone from Attivio here who can elaborate? Is the join stuff at 
Lucene level or on top of multiple Solr cores or what?

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote:

 Minor correction re Attivio - their stuff runs on top of Lucene, not Solr.  I 
 *think* they are trying to patent this.
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Hadoop ecosystem search :: http://search-hadoop.com/
 
 
 
 - Original Message 
 From: Jan Høydahl / Cominvent jan@cominvent.com
 To: solr-user@lucene.apache.org
 Sent: Mon, February 8, 2010 3:33:41 PM
 Subject: Re: Collating results from multiple indexes
 
 Hi,
 
 There is no JOIN functionality in Solr. The common solution is either to 
 accept 
 the high volume update churn, or to add client side code to build a join 
 layer 
 on top of the two indices. I know that Attivio (www.attivio.com) have built 
 some 
 kind of JOIN functionality on top of Solr in their AIE product, but do not 
 know 
 the details or the actual performance.
 
 Why not open a JIRA issue, if there is no such already, to request this as a 
 feature?
 
 --
 Jan Høydahl  - search architect
 Cominvent AS - www.cominvent.com
 
 On 25. jan. 2010, at 22.01, Aaron McKee wrote:
 
 
 Is there any somewhat convenient way to collate/integrate fields from 
 separate 
 indices during result writing, if the indices use the same unique keys? 
 Basically, some sort of cross-index JOIN?
 
 As a bit of background, I have a rather heavyweight dataset of every US 
 business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours 
 to 
 fully index on a decent box). Given the size and relative stability of the 
 dataset, I generally only update this monthly. However, I have separate 
 advertising-related datasets that need to be updated either hourly or daily 
 (e.g. today's coupon, click revenue remaining, etc.) . These advertiser 
 feeds 
 reference the same keyspace that I use in the main index, but are otherwise 
 significantly lighter weight. Importing and indexing them discretely only 
 takes 
 a couple minutes. Given that Solr/Lucene doesn't support field updating, 
 without 
 having to drop and re-add an entire document, it doesn't seem practical to 
 integrate this data into the main index (the system would be under a 
 constant 
 state of churn, if we did document re-inserts, and the performance impact 
 would 
 probably be debilitating). It may be nice if this data could participate in 
 filtering (e.g. only show advertisers), but it doesn't need to participate 
 in 
 scoring/ranking.
 
 I'm guessing that someone else has had a similar need, at some point?  I 
 can 
 have our front-end query the smaller indices separately, using the keys 
 returned 
 by the primary index, but would prefer to avoid the extra sequential 
 roundtrips. 
 I'm hoping to also avoid a coding solution, if only to avoid the maintenance 
 overhead as we drop in new builds of Solr, but that's also feasible.
 
 Thank you for your insight,
 Aaron
 
 



Re: Collating results from multiple indexes

2010-02-11 Thread Otis Gospodnetic
Minor correction re Attivio - their stuff runs on top of Lucene, not Solr.  I 
*think* they are trying to patent this.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/






Re: Collating results from multiple indexes

2010-02-08 Thread Jan Høydahl / Cominvent
Hi,

There is no JOIN functionality in Solr. The common solution is either to accept 
the high-volume update churn, or to add client-side code to build a join 
layer on top of the two indices. I know that Attivio (www.attivio.com) has 
built some kind of JOIN functionality on top of Solr in its AIE product, but I 
do not know the details or the actual performance.

Why not open a JIRA issue, if one does not already exist, to request this as a 
feature?

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com




Collating results from multiple indexes

2010-01-25 Thread Aaron McKee


Is there any somewhat convenient way to collate/integrate fields from 
separate indices during result writing, if the indices use the same 
unique keys? Basically, some sort of cross-index JOIN?


As a bit of background, I have a rather heavyweight dataset of every US 
business (~25m records, an on-disk index footprint of ~30g, and 5-10 
hours to fully index on a decent box). Given the size and relative 
stability of the dataset, I generally only update it monthly. However, 
I have separate advertising-related datasets that need to be updated 
either hourly or daily (e.g., today's coupon, click revenue remaining, 
etc.). These advertiser feeds reference the same keyspace that I use in 
the main index, but are otherwise significantly lighter weight; 
importing and indexing them discretely only takes a couple of minutes. 
Given that Solr/Lucene doesn't support updating a field without 
dropping and re-adding the entire document, it doesn't seem practical 
to integrate this data into the main index (the system would be under a 
constant state of churn if we did document re-inserts, and the 
performance impact would probably be debilitating). It would be nice if 
this data could participate in filtering (e.g., only show advertisers), 
but it doesn't need to participate in scoring/ranking.


I'm guessing that someone else has had a similar need, at some point?  I 
can have our front-end query the smaller indices separately, using the 
keys returned by the primary index, but would prefer to avoid the extra 
sequential roundtrips. I'm hoping to also avoid a coding solution, if 
only to avoid the maintenance overhead as we drop in new builds of Solr, 
but that's also feasible.


Thank you for your insight,
Aaron
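For what it's worth, the "join on the client" approach the replies suggest can be sketched roughly like this. This is only a sketch with hypothetical field names (`coupon`, `click_budget`, etc.); it assumes both indexes return documents that share the same unique key:

```python
def collate(primary_docs, secondary_docs, key="id"):
    """Merge fields from a secondary index's docs into primary results,
    matching on a shared unique key (a client-side cross-index JOIN)."""
    by_key = {d[key]: d for d in secondary_docs}
    merged = []
    for doc in primary_docs:
        extra = by_key.get(doc[key], {})
        # primary fields win on collision; secondary fills in the rest
        merged.append({**extra, **doc})
    return merged

# Hypothetical results from the two indexes
businesses = [{"id": "b1", "name": "Acme Diner"},
              {"id": "b2", "name": "Bolt Hardware"}]
ads = [{"id": "b1", "coupon": "10% off", "click_budget": 42.0}]

merged = collate(businesses, ads)
```

This still costs a second round trip to fetch the secondary docs, but the merge itself is trivial once both responses are in hand.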



Re: All in one index, or multiple indexes?

2009-07-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
Keep in mind that every time a commit is done, all the caches are thrown
away. If updates for each of these indexes happen at different times,
then the caches get invalidated each time you commit, so in that case a
smaller index helps.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: All in one index, or multiple indexes?

2009-07-21 Thread Jim Adams
It will depend on how much total volume you have.  If you are discussing
millions and millions of records, I'd say use multicore and shards.




All in one index, or multiple indexes?

2009-07-08 Thread Tim Sell
Hi,
I am wondering if it is common to have just one very large index, or
multiple smaller indexes specialized for different content types.

We currently have multiple smaller indexes, although one of them is
much larger than the others. We are considering merging them, to allow
the convenience of searching across multiple types at once and getting
them back in one list. The largest of the current indexes has a couple
of types that belong together; it has just one text field, which is
usually quite short and is similar to product names (words like "The
matter"). Another index I would merge with this one has multiple text
fields (also quite short).

We would of course still like to be able to get specific types. Is
filtering on just one type a big performance hit compared to querying
it from its own index? Bear in mind all these indexes run on the same
machine (we replicate them all to three machines and do load
balancing).

There are a number of considerations. From an application standpoint,
when querying across all types we may split the results out into the
separate types anyway once we have the list back. If we always do
this, is it silly to have them in one index rather than querying
multiple indexes at once? Are multiple HTTP requests less significant
than the time to post-split the results?

In some ways it is easier to maintain a single index, although it has
felt easier to optimize the results for the type of content if they
are in separate indexes. My main concern with putting it all in one
index is that we'll make it harder to work with. We will definitely
want to do filtering on types sometimes, and if we go with a mashed-up
index I'd prefer not to maintain separate specialized indexes as well.

Any thoughts?

~Tim.
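The "post-split" step Tim mentions, regrouping one merged result list back into per-type lists, could look roughly like this. A sketch only, assuming every document carries a type field:

```python
from collections import defaultdict

def split_by_type(docs, type_field="type"):
    """Regroup a single merged result list into per-type lists,
    preserving the overall ranking order within each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[type_field]].append(doc)
    return dict(groups)

# Hypothetical merged results from one combined index
results = [
    {"id": 1, "type": "product", "title": "The Matter"},
    {"id": 2, "type": "article", "title": "On Matter"},
    {"id": 3, "type": "product", "title": "Dark Matter"},
]
grouped = split_by_type(results)
```

The split itself is cheap; the real trade-off is the one HTTP request versus several, as the thread discusses.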


Re: Solr multiple indexes

2009-03-19 Thread Giovanni De Stefano
Hello Otis,

thank you for your reply.

What I am trying to achieve is to index different tables with different
primary keys and different fields (basically different documents/entities).

Is it possible to create a data-config with different root
entities/documents and index/search everything transparently? Is the
attached data-config.xml valid?

If it is, is the attached schema.xml valid? The issue there is that I don't
know how to specify the corresponding uniqueKey.

Thanks a lot for your help.

Giovanni




<dataConfig>
  <dataSource
      driver="oracle.jdbc.driver.OracleDriver"
      url="jdbc:oracle:thin:@localhost:1521/XE"
      user="TEST"
      password="TEST"/>
  <document name="TEST">
    <entity
        name="book"
        pk="TITLE"
        query="select TITLE,AUTHOR,PRICE,TIMESTAMP from BOOKS"
        deltaQuery="select TITLE from BOOKS where TIMESTAMP &gt; '${dataimporter.last_index_time}'"
        rootEntity="true">
      <field column="TITLE" name="book_title"/>
      <field column="AUTHOR" name="book_author"/>
      <field column="PRICE" name="book_price"/>
      <field column="TIMESTAMP" name="book_timestamp"/>
    </entity>

    <entity
        name="furniture"
        pk="ID"
        query="select ID,COLOR,SIZE,TIMESTAMP from FURNITURE"
        deltaQuery="select ID from FURNITURE where TIMESTAMP &gt; '${dataimporter.last_index_time}'"
        rootEntity="true">
      <field column="ID" name="furniture_id"/>
      <field column="COLOR" name="furniture_color"/>
      <field column="SIZE" name="furniture_size"/>
      <field column="TIMESTAMP" name="furniture_timestamp"/>
    </entity>
  </document>
</dataConfig>
<?xml version="1.0" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<schema name="DIT" version="1.1">
  <types>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100"/>
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
  </types>

  <fields>
    <!-- BOOKS -->
    <field name="book_title" type="text" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="book_author" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="book_price" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="book_timestamp" type="date" indexed="true" stored="true" multiValued="false"/>
    <field name="all_book" type="text" indexed="true" stored="true" multiValued="true"/>

    <!-- FURNITURE -->
    <field name="furniture_id" type="text" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="furniture_color" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="furniture_size" type="text" indexed

Solr multiple indexes

2009-03-18 Thread Giovanni De Stefano
Hello all,

here I am with another question :-)

I have to index the content of two different tables on an Oracle DB.

When it comes to only one table, everything is fine: one datasource, one
document, one entity in data-config, one uniqueKey in schema.xml etc. It
works great.

But now I have, on the same DB (but that might be irrelevant), another table
with a different structure from the first one. I might merge the two tables
to get one huge document, but I don't like this solution (delta imports would
be a nightmare/impossible, I might have to index data from other sources,
etc.).

I believe I should create MULTIPLE INDEXES and then merge them. I have found
very little documentation about this: any ideas?

The Multiple Solr Webapps solution seems nice, but how could I search
globally across all indexes at the same time?

The current architecture already uses Multicore Solr (to serve different
countries), so I would rather not have multicore-within-multicore Solr...
:-(

Any help/link is very much appreciated!

Cheers,
Giovanni


Re: Solr multiple indexes

2009-03-18 Thread Otis Gospodnetic

Giovanni,

It sounds like you are after a JOIN between two indices a la RDBMS JOIN?  It's 
not possible with Solr, unless you want to do separate queries and manually 
join.  If you are talking about merging multiple indices of the same type into 
a single index, that's a different story and doable, although not yet via Solr.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






multiple indexes

2009-01-27 Thread Jae Joo
Hi,

I would like to know how it can be implemented.

Index1 has fields id,1,2,3 and index2 has fields id,5,6,7.
The ID in both indexes is a unique key.

Can I use some kind of distributed search and/or multicore to search, sort,
and facet across the two indexes (index1 and index2)?

Thanks,

Jae joo


RE: Multiple Indexes

2008-08-08 Thread Kashyap, Raghu
Not sure if this will work for you, but you can have 3 cores (using
multicore) and have your Solr server or the client decide which core each
query should hit. With this approach you can have a separate schema.xml &
solrconfig.xml for each of the cores, and obviously a separate index in
each core.

-Raghu
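The client-side routing Raghu describes boils down to picking a per-core URL before issuing the query. A rough sketch (hypothetical host, core names, and parameters; only the URL construction is shown, not the HTTP call):

```python
from urllib.parse import urlencode

def core_select_url(base_url, core, params):
    """Build a per-core select URL, letting the client route each
    query to the core whose schema matches the input type."""
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"

# Route a book query to the hypothetical 'books' core
url = core_select_url("http://localhost:8983", "books",
                      {"q": "title:matter", "rows": 10})
```

The same function serves all three cores; only the `core` argument changes based on the input type.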




Re: Multiple Indexes

2008-08-08 Thread Walter Underwood
Try putting them all in one index. Your fields can be s1_name for
schema 1, s2_name for schema 2, and so on.

The only reason to have separate indexes is if each group of
content has a different update schedule and if you have high
traffic (over 1M queries/day).

wunder
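Walter's prefixing scheme amounts to a simple field rename at indexing time. A rough sketch with hypothetical field names, assuming documents share an `id` key:

```python
def to_combined_doc(doc, prefix, key_field="id"):
    """Map a per-schema document into the combined index's namespace:
    every field except the shared key gets a schema prefix."""
    out = {key_field: doc[key_field]}
    for field, value in doc.items():
        if field != key_field:
            out[f"{prefix}_{field}"] = value
    return out

# A schema-1 document becomes s1_-prefixed fields in the combined index
combined = to_combined_doc({"id": "7", "name": "Acme"}, "s1")
```

Queries against the combined index then target `s1_name`, `s2_name`, and so on, so the groups never collide.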




Re: Multiple Indexes

2008-08-08 Thread Walter Underwood
I meant update frequency more than schedule. If one group of content
is updated once per day and another every ten minutes, and most of
the traffic is going to the slow collection, splitting them could help.

wunder




Multiple Indexes

2008-08-07 Thread anshuljohri

Hi everybody!

I need to create multiple indexes, let's say 3, due to a project requirement. And
the query will be fired from the backend on different indexes based on the input. I
can't do it in one index with the help of the fq parameter; I have already
considered that, but it's of no use here.

So I searched a lot in this forum but couldn't get a satisfactory answer. I
found that there are 3 ways to do it, of which 2 are not applicable in
version 1.2. So I have to go with the multiple Tomcat instances option, as in
a multiple-webapps config.
But I am still not clear whether I need 3 different solrConfig.xml & schema.xml
files, or whether I can do it with symlinks.
Is there any tutorial or other reading material for this? Can anybody please
help me out?

Thanks in advance
-Anshul Johri   
-- 
View this message in context: 
http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple Indexes

2008-08-07 Thread anshuljohri

Thanks zayhen for such a quick response, but I am not talking about sharding. I
have a requirement of maintaining 3 indexes, and need to query different indexes
based on the input.

-Anshul

zayhen wrote:
 
 2008/8/7 anshuljohri [EMAIL PROTECTED]
 

 Hi everybody!

 I need to create multiple indexes lets say 3 due to project requirement.
 And
 the query will be fired from backend on different indexes based on input.
 I
 can't do it in one index with the help of fq parameter. As i have
 already
 thought on it but thats of no use.
 
 
 I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my
 environment!
 
 


 So i searched a lot in this forum but couldn't get satisfactory answer. I
 found that there are 3 ways to do it. In which 2 ways are not applicable
 in
 1.2 version. So i have to go with Multiple Tomcat instances option as in
 multiple webapps config.
 But still am not clear whether I need 3 diff solrConfig.xml  schema.xml
 or
 I can do it with symlinks.
 Is there any tutorial or some reading material for this. Can anybody plz
 help me out?

 
 

 Thanks is advance
 -Anshul Johri
 --
 View this message in context:
 http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Alexander Ramos Jardim
 
 
 -
 RPG da Ilha 
 

-- 
View this message in context: 
http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple Indexes

2008-08-07 Thread Alexander Ramos Jardim
Oh,

Sorry!

Can you be a little more specific? Do these indexes have different schemas,
or do they represent the same data model?

2008/8/7 anshuljohri [EMAIL PROTECTED]


 Thanks zayhen for such a quick response but am not talking about sharding.
 I
 have requirement of indexing 3 indexes. Need to do query on diff indexes
 based on input.

 -Anshul

 zayhen wrote:
 
  2008/8/7 anshuljohri [EMAIL PROTECTED]
 
 
  Hi everybody!
 
  I need to create multiple indexes lets say 3 due to project requirement.
  And
  the query will be fired from backend on different indexes based on
 input.
  I
  can't do it in one index with the help of fq parameter. As i have
  already
  thought on it but thats of no use.
 
 
  I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my
  environment!
 
 
 
 
  So i searched a lot in this forum but couldn't get satisfactory answer.
 I
  found that there are 3 ways to do it. In which 2 ways are not applicable
  in
  1.2 version. So i have to go with Multiple Tomcat instances option as in
  multiple webapps config.
  But still am not clear whether I need 3 diff solrConfig.xml  schema.xml
  or
  I can do it with symlinks.
  Is there any tutorial or some reading material for this. Can anybody plz
  help me out?
 
 
 
 
  Thanks is advance
  -Anshul Johri
  --
  View this message in context:
  http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
  --
  Alexander Ramos Jardim
 
 
  -
  RPG da Ilha
 

 --
 View this message in context:
 http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Alexander Ramos Jardim


Re: Multiple Indexes

2008-08-07 Thread anshuljohri

Both cases apply. As I said, I need to maintain 3 indexes: 2 of the indexes
have the same schema but the other one is different. More specifically --

I have 3 indexes. 2 of them have the same data model, but the way they
are indexed is different, so I need to fire queries from the backend on individual
indexes based on the input. The 3rd index has a different schema as well; again,
the query fired on it depends on the input.

So my question is: how can I handle this situation? Through configuring multiple
instances of Solr/Tomcat (if yes, then how)? Or what are the other ways on
Solr 1.2?

-Anshul


zayhen wrote:
 
 Oh,
 
 Sorry!
 
 Can you be a little more specific? Do these indexes have different
 schemas,
 or do they represent the same data model?
 
 2008/8/7 anshuljohri [EMAIL PROTECTED]
 

 Thanks zayhen for such a quick response but am not talking about
 sharding.
 I
 have requirement of indexing 3 indexes. Need to do query on diff indexes
 based on input.

 -Anshul

 zayhen wrote:
 
  2008/8/7 anshuljohri [EMAIL PROTECTED]
 
 
  Hi everybody!
 
  I need to create multiple indexes lets say 3 due to project
 requirement.
  And
  the query will be fired from backend on different indexes based on
 input.
  I
  can't do it in one index with the help of fq parameter. As i have
  already
  thought on it but thats of no use.
 
 
  I assume you are talking about sharding. Go 1.3-dev. It runs smooth in
 my
  environment!
 
 
 
 
  So i searched a lot in this forum but couldn't get satisfactory
 answer.
 I
  found that there are 3 ways to do it. In which 2 ways are not
 applicable
  in
  1.2 version. So i have to go with Multiple Tomcat instances option as
 in
  multiple webapps config.
  But still am not clear whether I need 3 diff solrConfig.xml 
 schema.xml
  or
  I can do it with symlinks.
  Is there any tutorial or some reading material for this. Can anybody
 plz
  help me out?
 
 
 
 
  Thanks is advance
  -Anshul Johri
  --
  View this message in context:
  http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
  --
  Alexander Ramos Jardim
 
 
  -
  RPG da Ilha
 

 --
 View this message in context:
 http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Alexander Ramos Jardim
 
 
 -
 RPG da Ilha 
 

-- 
View this message in context: 
http://www.nabble.com/Multiple-Indexes-tp18880284p18880973.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Multiple indexes

2007-11-12 Thread Pierre-Yves LANDRON

Hello,

Until now, I've used two instances of Solr, one for each of my collections; it 
works fine, but I wonder
if there is an advantage to using multiple indexes in one instance over several 
instances with one index each?
Note that the two indexes have different schema.xml files.

Thanks.
PL

 Date: Thu, 8 Nov 2007 18:05:43 -0500
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: Multiple indexes
 
 Hi,
 
 I am looking for a way to utilize multiple indexes in a single Solr
 instance.
 I saw that there is patch 215 available, and would like to ask someone
 who knows how to use multiple indexes.
 
 Thanks,
 
 Jae Joo

_
Discover the new Windows Vista
http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE

Re: Multiple indexes

2007-11-12 Thread Ryan McKinley
The advantages of a multi-core setup are configuration flexibility and 
dynamically changing available options (without a full restart).


For high-performance production Solr servers, I don't think there is 
much reason for it.  You may want to split the two indexes onto two 
machines.  You may want to run each index in a separate JVM (so if one 
crashes, the other does not).


Maintaining 2 indexes is pretty easy; if that were a larger number, or you 
need to create indexes for each user in a system, then it would be worth 
investigating the multi-core setup (it is still in development).


ryan


Pierre-Yves LANDRON wrote:

Hello,

Until now, i've used two instance of solr, one for each of my collections ; it 
works fine, but i wonder
if there is an advantage to use multiple indexes in one instance over several 
instances with one index each ?
Note that the two indexes have different schema.xml.

Thanks.
PL


Date: Thu, 8 Nov 2007 18:05:43 -0500
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: Multiple indexes

Hi,

I am looking for the way to utilize the multiple indexes for signle sole
instance.
I saw that there is the patch 215  available  and would like to ask someone
who knows how to use multiple indexes.

Thanks,

Jae Joo






Re: Multiple indexes

2007-11-12 Thread Jae Joo
Here is my situation.

I have 6 million articles indexed and I add about 10k articles every day.
If I maintain only one index, whenever the daily feed is running it
consumes the heap and causes full GCs.
I am thinking of a way to have multiple indexes - one for the ongoing query
service and one for updates. Once an update is done, the index is switched
automatically and/or by my application.

Thanks,

Jae joo
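[Editorial sketch] Jae's switch-over idea can be shown in miniature. The class name and swap bookkeeping below are illustrative, not Solr APIs; only the role-swap logic is the point:

```python
# Two indexes trade roles: one serves queries while the other takes the
# daily feed; after the feed commits, the fresh index is promoted.
class IndexPair:
    def __init__(self):
        self.live = "indexA"      # currently serving queries
        self.updating = "indexB"  # currently receiving the daily feed

    def finish_update(self):
        """After the feed is done, promote the freshly built index."""
        self.live, self.updating = self.updating, self.live
        return self.live

pair = IndexPair()
pair.finish_update()
assert pair.live == "indexB"      # queries now hit the fresh index
assert pair.updating == "indexA"  # the next feed rebuilds the stale one
```

In practice the switch is done by the application (or by collection distribution, as Ryan suggests below in the thread).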


On Nov 12, 2007 8:48 AM, Ryan McKinley [EMAIL PROTECTED] wrote:

 The advantages of a multi-core setup are configuration flexibility and
 dynamically changing available options (without a full restart).

 For high-performance production solr servers, I don't think there is
 much reason for it.  You may want to split the two indexes on to two
 machines.  You may want to run each index in a separate JVM (so if one
 crashes, the other does not)

 Maintaining 2 indexes is pretty easy, if that was a larger number or you
 need to create indexes for each user in a system then it would be worth
 investigating the multi-core setup (it is still in development)

 ryan


 Pierre-Yves LANDRON wrote:
  Hello,
 
  Until now, i've used two instance of solr, one for each of my
 collections ; it works fine, but i wonder
  if there is an advantage to use multiple indexes in one instance over
 several instances with one index each ?
  Note that the two indexes have different schema.xml.
 
  Thanks.
  PL
 
  Date: Thu, 8 Nov 2007 18:05:43 -0500
  From: [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Subject: Multiple indexes
 
  Hi,
 
  I am looking for the way to utilize the multiple indexes for signle
 sole
  instance.
  I saw that there is the patch 215  available  and would like to ask
 someone
  who knows how to use multiple indexes.
 
  Thanks,
 
  Jae Joo
 




Re: Multiple indexes

2007-11-12 Thread Ryan McKinley


just use the standard collection distribution stuff.  That is what it is 
made for! http://wiki.apache.org/solr/CollectionDistribution


Alternatively, open up two indexes using the same config/dir -- do your 
indexing on one and the searching on the other.  When indexing is done 
(or finishes a big chunk), send a <commit/> to the 'searching' one and it 
will see the new stuff.


ryan
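[Editorial sketch] Ryan's second option hinges on sending a <commit/> to the searching instance. A minimal sketch of building that request, assuming the default localhost URL (the helper name is hypothetical, and nothing is actually sent here):

```python
# Build the Solr 1.x-style commit call: POST the XML update message
# <commit/> to the /update handler of the searching instance.
import urllib.request

def commit_request(base_url="http://localhost:8983/solr"):
    return urllib.request.Request(
        base_url + "/update",
        data=b"<commit/>",                      # Solr XML update message
        headers={"Content-Type": "text/xml"},
        method="POST",
    )

req = commit_request()
assert req.full_url.endswith("/update")
assert req.data == b"<commit/>"
# urllib.request.urlopen(req) would actually fire the commit
```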



Jae Joo wrote:

Here is my situation.

I have 6 millions articles indexed and adding about 10k articles everyday.
If I maintain only one index, whenever the daily feeding is running, it
consumes the heap area and causes FGC.
I am thinking the way to have multiple indexes - one is for ongoing querying
service and one is for update. Once update is done, switch the index by
automatically and/or my application.

Thanks,

Jae joo


On Nov 12, 2007 8:48 AM, Ryan McKinley [EMAIL PROTECTED] wrote:


The advantages of a multi-core setup are configuration flexibility and
dynamically changing available options (without a full restart).

For high-performance production solr servers, I don't think there is
much reason for it.  You may want to split the two indexes on to two
machines.  You may want to run each index in a separate JVM (so if one
crashes, the other does not)

Maintaining 2 indexes is pretty easy, if that was a larger number or you
need to create indexes for each user in a system then it would be worth
investigating the multi-core setup (it is still in development)

ryan


Pierre-Yves LANDRON wrote:

Hello,

Until now, i've used two instance of solr, one for each of my

collections ; it works fine, but i wonder

if there is an advantage to use multiple indexes in one instance over

several instances with one index each ?

Note that the two indexes have different schema.xml.

Thanks.
PL


Date: Thu, 8 Nov 2007 18:05:43 -0500
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: Multiple indexes

Hi,

I am looking for the way to utilize the multiple indexes for signle

sole

instance.
I saw that there is the patch 215  available  and would like to ask

someone

who knows how to use multiple indexes.

Thanks,

Jae Joo









Re: Best way to create multiple indexes

2007-11-12 Thread Ryan McKinley
For starters, do you need to be able to search across groups or 
sub-groups (in one query?)


If so, then you have to stick everything in one index.

You can add a field to each document saying what 'group' or 'sub-group' 
it is in and then limit it at query time


 q=kittens +group:A

The advantage to splitting it into multiple indexes is that you could 
put each index on independent hardware.  Depending on your queries and 
index size that may make a big difference.


ryan
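[Editorial sketch] The single-index approach Ryan describes can be sketched as a query builder. The 'group' field name comes from his example; the core URL and helper name are assumptions:

```python
# Restrict a query to one group with fq, as in q=kittens +group:A,
# keeping the group restriction out of relevance scoring.
from urllib.parse import urlencode

def group_query(q, group, base="http://localhost:8983/solr/select"):
    return base + "?" + urlencode({"q": q, "fq": "group:" + group})

url = group_query("kittens", "A")
assert "q=kittens" in url
assert "fq=group%3AA" in url  # 'group:A' URL-encoded
```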


Rishabh Joshi wrote:

Hi,

I have a requirement and was wondering if someone could help me in how to go 
about it. We have to index about 8-9 million documents and their size can be 
anywhere from a few KBs to a couple of MBs. These documents are categorized 
into many 'groups' and 'sub-groups'. I wanted to know if we can create multiple 
indexes based on 'groups' and then on 'sub-groups' in Solr? If yes, then how do 
we go about it? I tried going through the section on 'Collections' in the Solr 
Wiki, but could not make much use of it.

Regards,
Rishabh Joshi









Re: Best way to create multiple indexes

2007-11-12 Thread Dwarak R

Hi Guys

How do we add Word documents / PDFs / text / etc. to Solr? How is the 
content of the files stored or indexed? Are the documents stored 
as XML in the filesystem?


Regards

Dwarak R
- Original Message - 
From: Ryan McKinley [EMAIL PROTECTED]

To: solr-user@lucene.apache.org
Sent: Monday, November 12, 2007 7:43 PM
Subject: Re: Best way to create multiple indexes


For starters, do you need to be able to search across groups or sub-groups 
(in one query?)


If so, then you have to stick everything in one index.

You can add a field to each document saying what 'group' or 'sub-group' it 
is in and then limit it at query time


 q=kittens +group:A

The advantage to splitting it into multiple indexes is that you could put 
each index on independent hardware.  Depending on your queries and index 
size that may make a big difference.


ryan


Rishabh Joshi wrote:

Hi,

I have a requirement and was wondering if someone could help me in how to 
go about it. We have to index about 8-9 million documents and their size 
can be anywhere from a few KBs to a couple of MBs. These documents are 
categorized into many 'groups' and 'sub-groups'. I wanted to know if we 
can create multiple indexes based on 'groups' and then on 'sub-groups' in 
Solr? If yes, then how do we go about it? I tried going through the 
section on 'Collections' in the Solr Wiki, but could not make much use of 
it.


Regards,
Rishabh Joshi











This message is for the designated recipient only and may contain privileged, 
proprietary, or otherwise private information. If you have received it in error, 
please notify the sender [EMAIL PROTECTED] immediately and delete the 
original. Any other use of the email by you is prohibited.


RE: Best way to create multiple indexes

2007-11-12 Thread Rishabh Joshi

Ryan,

We currently have 8-9 million documents to index, and this number will grow in 
the future. Also, we will never have a query that searches across groups, 
but we will have queries that search across sub-groups for sure.
Keeping this in mind, we were thinking we could have multiple indexes at 
the 'group' level at least.
Also, can multiple indexes be created dynamically? For example: if my 
application creates a 'logical group', then an index should be created for 
that group.

Rishabh
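[Editorial sketch] On the dynamic-creation question: the multi-core work mentioned elsewhere in this thread grew into a CoreAdmin API whose CREATE action can add a core at runtime. A hedged sketch of building such a call — the base URL and per-group instanceDir layout are assumptions, and nothing is sent:

```python
# Build a CoreAdmin CREATE call for a new 'logical group' core.
from urllib.parse import urlencode

def create_core_url(group, base="http://localhost:8983/solr/admin/cores"):
    params = {
        "action": "CREATE",
        "name": group,
        "instanceDir": "cores/" + group,  # assumed directory layout
    }
    return base + "?" + urlencode(params)

url = create_core_url("groupA")
assert "action=CREATE" in url
assert "name=groupA" in url
```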

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Monday, November 12, 2007 7:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Best way to create multiple indexes

For starters, do you need to be able to search across groups or
sub-groups (in one query?)

If so, then you have to stick everything in one index.

You can add a field to each document saying what 'group' or 'sub-group'
it is in and then limit it at query time

  q=kittens +group:A

The advantage to splitting it into multiple indexes is that you could
put each index on independent hardware.  Depending on your queries and
index size that may make a big difference.

ryan


Rishabh Joshi wrote:
 Hi,

 I have a requirement and was wondering if someone could help me in how to go 
 about it. We have to index about 8-9 million documents and their size can be 
 anywhere from a few KBs to a couple of MBs. These documents are categorized 
 into many 'groups' and 'sub-groups'. I wanted to know if we can create 
 multiple indexes based on 'groups' and then on 'sub-groups' in Solr? If yes, 
 then how do we go about it? I tried going through the section on 
 'Collections' in the Solr Wiki, but could not make much use of it.


 Regards,
 Rishabh Joshi








Re: Multiple indexes

2007-11-12 Thread Jae Joo
I have built the master Solr instance and indexed some files. When I run
snapshooter, it complains with the error below -- snapshooter -d data/index (in
the solr/bin directory).
Did I miss something?

++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2007/11/12 12:38:40 taking snapshot /solr/master/solr/data/index/snapshot.20071112123840
+ [[ -n '' ]]
+ mv /solr/master/solr/data/index/temp-snapshot.20071112123840 /solr/master/solr/data/index/snapshot.20071112123840
mv: cannot access /solr/master/solr/data/index/temp-snapshot.20071112123840
Jae

On Nov 12, 2007 9:09 AM, Ryan McKinley [EMAIL PROTECTED] wrote:


 just use the standard collection distribution stuff.  That is what it is
 made for! http://wiki.apache.org/solr/CollectionDistribution

 Alternatively, open up two indexes using the same config/dir -- do your
 indexing on one and the searching on the other.  when indexing is done
 (or finishes a big chunk) send commit/ to the 'searching' one and it
 will see the new stuff.

 ryan



 Jae Joo wrote:
  Here is my situation.
 
  I have 6 millions articles indexed and adding about 10k articles
 everyday.
  If I maintain only one index, whenever the daily feeding is running, it
  consumes the heap area and causes FGC.
  I am thinking the way to have multiple indexes - one is for ongoing
 querying
  service and one is for update. Once update is done, switch the index by
  automatically and/or my application.
 
  Thanks,
 
  Jae joo
 
 
  On Nov 12, 2007 8:48 AM, Ryan McKinley [EMAIL PROTECTED] wrote:
 
  The advantages of a multi-core setup are configuration flexibility and
  dynamically changing available options (without a full restart).
 
  For high-performance production solr servers, I don't think there is
  much reason for it.  You may want to split the two indexes on to two
  machines.  You may want to run each index in a separate JVM (so if one
  crashes, the other does not)
 
  Maintaining 2 indexes is pretty easy, if that was a larger number or
 you
  need to create indexes for each user in a system then it would be worth
  investigating the multi-core setup (it is still in development)
 
  ryan
 
 
  Pierre-Yves LANDRON wrote:
  Hello,
 
  Until now, i've used two instance of solr, one for each of my
  collections ; it works fine, but i wonder
  if there is an advantage to use multiple indexes in one instance over
  several instances with one index each ?
  Note that the two indexes have different schema.xml.
 
  Thanks.
  PL
 
  Date: Thu, 8 Nov 2007 18:05:43 -0500
  From: [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Subject: Multiple indexes
 
  Hi,
 
  I am looking for the way to utilize the multiple indexes for signle
  sole
  instance.
  I saw that there is the patch 215  available  and would like to ask
  someone
  who knows how to use multiple indexes.
 
  Thanks,
 
  Jae Joo
 
 




Re: Search Multiple indexes In Solr

2007-11-08 Thread zx zhang
It is said that this new feature will be added in Solr 1.3, but I am not sure
about that.

I think the following may be useful for you:
https://issues.apache.org/jira/browse/SOLR-303
https://issues.apache.org/jira/browse/SOLR-255


2007/11/8, j 90 [EMAIL PROTECTED]:

 Hi, I'm new to Solr but very familiar with Lucene.

 Is there a way to have Solr search in more than one index, much like the
 MultiSearcher in Lucene?

 If so, how do I configure the location of the indexes?



Re: Multiple indexes

2007-11-08 Thread John Reuning
I've had good luck with MultiCore, but you have to sync trunk from svn 
and apply the most recent patch in SOLR-350.


https://issues.apache.org/jira/browse/SOLR-350

-jrr

Jae Joo wrote:

Hi,

I am looking for the way to utilize the multiple indexes for signle sole
instance.
I saw that there is the patch 215  available  and would like to ask someone
who knows how to use multiple indexes.

Thanks,

Jae Joo





Search Multiple indexes In Solr

2007-11-07 Thread j 90
Hi, I'm new to Solr but very familiar with Lucene.

Is there a way to have Solr search in more than one index, much like the
MultiSearcher in Lucene?

If so, how do I configure the location of the indexes?


Manage multiple indexes with Solr

2007-10-10 Thread ycrux
Hi guys !

Is it possible to configure Solr to manage different indexes depending on the 
added documents ?

For example:
* document 1, with uniq ID ui1 will be indexed in the indexA
* document 2, with uniq ID ui2 will be indexed in the indexB
* document 3, with uniq ID ui1 will be indexed in the indexA

Thus documents 1 and 3 are stored in index indexA and document 2 in 
index indexB.
In this case indexA and indexB are completely separate indexes on disk.

Thanks in advance

cheers
Y.
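[Editorial sketch] The routing Y. asks about can be shown as a sticky mapping from unique ID to index: the same ID always lands in the same index. The assignment rule below is an assumption; any deterministic mapping would do:

```python
# Route documents to indexA/indexB by unique ID, remembering the
# first assignment so repeats of an ID go to the same index.
def route(doc_id, assignments, indexes=("indexA", "indexB")):
    if doc_id not in assignments:
        assignments[doc_id] = indexes[len(assignments) % len(indexes)]
    return assignments[doc_id]

seen = {}
assert route("ui1", seen) == "indexA"  # document 1
assert route("ui2", seen) == "indexB"  # document 2
assert route("ui1", seen) == "indexA"  # document 3: ui1 again -> indexA
```

This reproduces the example above: documents 1 and 3 (both ui1) end up in indexA, document 2 in indexB.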



Re: Manage multiple indexes with Solr

2007-10-10 Thread Venkatraman S
I would be interested to know about both cases:

Case 1 :
* document 1, with uniq ID ui1 will be indexed in the indexA
* document 2, with uniq ID ui2 will be indexed in the indexB
* document 3, with uniq ID ui3 will be indexed in the indexA

Case 2 :
* document 1, with uniq ID ui1 will be indexed in the indexA
* document 2, with uniq ID ui2 will be indexed in the indexB
* document 3, with uniq ID ui1 will be indexed in the indexA

-vEnKAt

