Just to give you some context: I am collecting data from different sources
(articles, videos, etc.) and will also be enriching it, for example with
entity extraction. In my previous experiment with Solr, I dumped all article
and video metadata into a single index (distributed across multiple shards),
which made querying very slow. For entity extraction, I therefore created
another index on the same shards and pushed the entities there. Querying
entities on that index was very quick, since it held very little data (even
though it resided on the same machine as the main index).

Based on that quick experiment, I am wondering whether I need a different
approach for my data. For example, instead of relying only on SolrCloud to
distribute my data across shards, why not create a separate index for each
type of data I have (articles, videos, and so on) and then perform some sort
of distributed search over them? Would that be better in some sense, such
as performance?
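
To make the idea concrete, here is a minimal sketch of what a search over
several per-type indexes could look like. It assumes each data type lives in
its own SolrCloud collection and uses the `collection` request parameter to
query them together. The collection names (`articles`, `videos`), the host
and the field name are hypothetical, and the function only builds the request
URL; it does not contact Solr.

```python
from urllib.parse import urlencode


def multi_collection_query_url(base_url, collections, query, rows=10):
    """Build a select URL that searches several SolrCloud collections at once.

    The request is sent to the first collection, and the `collection`
    parameter lists every collection that should take part in the search.
    """
    if not collections:
        raise ValueError("at least one collection is required")
    params = {
        "q": query,
        # Comma-separated list of collections to search (names are hypothetical).
        "collection": ",".join(collections),
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{collections[0]}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, collections and field, for illustration only.
    print(multi_collection_query_url(
        "http://localhost:8983/solr", ["articles", "videos"], "title:solr"))
```

Whether this is actually faster than one large sharded collection would
still have to be measured on the real data.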

Which version of solr are you using?
Currently, I am using Solr 5.3. By the way, I could not find the "Segments
info" link. Is it under Core Admin?
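
In the meantime, the "look into the index folder manually" suggestion quoted
below could be sketched as follows. This assumes a Lucene 4.0 or later index,
where each segment writes one `.si` file, and that the index directory is
readable from where the script runs (by default `data/index` under the core's
directory). The function name is my own, and leftover files from segments
that have not been cleaned up yet may inflate the count.

```python
from pathlib import Path


def count_segments(index_dir):
    """Count segments in a Lucene index directory by counting `.si` files.

    Assumption: one `.si` (segment info) file exists per segment, which
    holds for Lucene 4.0 and later.
    """
    index_path = Path(index_dir)
    if not index_path.is_dir():
        raise NotADirectoryError(f"not a directory: {index_dir}")
    return sum(
        1 for entry in index_path.iterdir()
        if entry.is_file() and entry.suffix == ".si"
    )
```

It would be called with the path to one core's index folder, once per shard
replica.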

Regards,
Salman


On Fri, Nov 6, 2015 at 7:26 AM, Modassar Ather <modather1...@gmail.com>
wrote:

> Thanks for your response. I have already gone through those documents
> before. My point was that if I am using Solr Cloud the only way to
> distribute my indexes is by adding shards? and I don't have to do anything
> manually (because all the distributed search is handled by Solr Cloud).
>
> Yes as per my knowledge.
>
> How do I check how many segments are there in the index?
> You can see into the index folder manually. Which version of solr are you
> using? I don't remember exactly the start version but in the latest and
> Solr-5.2.1 there is a "Segments info" link available where you can see
> number of segments.
>
> Regards,
> Modassar
>
> On Thu, Nov 5, 2015 at 5:41 PM, Salman Ansari <salman.rah...@gmail.com>
> wrote:
>
> > Thanks for your response. I have already gone through those documents
> > before. My point was that if I am using Solr Cloud the only way to
> > distribute my indexes is by adding shards? and I don't have to do
> anything
> > manually (because all the distributed search is handled by Solr Cloud).
> >
> > What is the Xms and Xmx you are allocating to Solr and how much max is
> > used by
> > your solr?
> > Xms and Xmx are both 4G. My current JVM-Memory consumption is 1.58 GB
> >
> > How many segments are there in the index? The more the segment the slower
> > is
> > the search.
> > How do I check how many segments are there in the index?
> >
> > Is this after you moved to solrcloud?
> > I have been using SolrCloud from the beginning.
> >
> > Regards,
> > Salman
> >
> >
> > On Thu, Nov 5, 2015 at 1:21 PM, Modassar Ather <modather1...@gmail.com>
> > wrote:
> >
> > > SolrCloud makes the distributed search easier. You can find details
> about
> > > it under following link.
> > > https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works
> > >
> > > You can also refer to following link:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
> > >
> > > From size of your index I meant index size and not the total document
> > > alone.
> > > How many segments are there in the index? The more the segment the
> slower
> > > is the search.
> > > What is the Xms and Xmx you are allocating to Solr and how much max is
> > used
> > > by your solr?
> > >
> > > I doubt this as the slowness was happening for a long period of time.
> > > I mentioned this point as I have seen gc pauses of 30 seconds and more
> in
> > > some complex queries.
> > >
> > > I am facing delay of 2-3 seconds but previously I
> > > had delays of around 28 seconds.
> > > Is this after you moved to solrcloud?
> > >
> > > Regards,
> > > Modassar
> > >
> > >
> > > On Thu, Nov 5, 2015 at 3:09 PM, Salman Ansari <salman.rah...@gmail.com
> >
> > > wrote:
> > >
> > > > Here is the current info
> > > >
> > > > How much memory is used?
> > > > Physical memory consumption: 5.48 GB out of 14 GB.
> > > > Swap space consumption: 5.83 GB out of 15.94 GB.
> > > > JVM-Memory consumption: 1.58 GB out of 3.83 GB.
> > > >
> > > > What is your index size?
> > > > I have around 70M documents distributed on 2 shards (so each shard
> has
> > > 35M
> > > > document)
> > > >
> > > > What type of queries are slow?
> > > > I am running normal queries (queries on a field) no faceting or
> > > highlights
> > > > are requested. Currently, I am facing delay of 2-3 seconds but
> > > previously I
> > > > had delays of around 28 seconds.
> > > >
> > > > Are there GC pauses as they can be a cause of slowness?
> > > > I doubt this as the slowness was happening for a long period of time.
> > > >
> > > > Are document updates/additions happening in parallel?
> > > > No, I have stopped adding/updating documents and doing queries only.
> > > >
> > > > This is what you are already doing. Did you mean that you want to add
> > > more
> > > > shards?
> > > > No, what I meant is that I read that previously there was a way to
> > chunk
> > > a
> > > > large index into multiple and then do distributed search on that as
> in
> > > this
> > > > article https://wiki.apache.org/solr/DistributedSearch. What I was
> > > looking
> > > > for how this is handled in Solr Cloud?
> > > >
> > > >
> > > > Regards,
> > > > Salman
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Nov 5, 2015 at 12:06 PM, Modassar Ather <
> > modather1...@gmail.com>
> > > > wrote:
> > > >
> > > > > What is your index size? How much memory is used? What type of
> > queries
> > > > are
> > > > > slow?
> > > > > Are there GC pauses as they can be a cause of slowness?
> > > > > Are document updates/additions happening in parallel?
> > > > >
> > > > > The queries are very slow to run so I was thinking to distribute
> > > > > the indexes into multiple indexes and consequently distributed
> > search.
> > > > Can
> > > > > anyone guide me to some sources (articles) that discuss this in
> Solr
> > > > Cloud?
> > > > >
> > > > > This is what you are already doing. Did you mean that you want to
> add
> > > > more
> > > > > shards?
> > > > >
> > > > > Regards,
> > > > > Modassar
> > > > >
> > > > > On Thu, Nov 5, 2015 at 1:51 PM, Salman Ansari <
> > salman.rah...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am using Solr cloud and I have created a single index that host
> > > > around
> > > > > > 70M documents distributed into 2 shards (each having 35M
> documents)
> > > > and 2
> > > > > > replicas. The queries are very slow to run so I was thinking to
> > > > > distribute
> > > > > > the indexes into multiple indexes and consequently distributed
> > > search.
> > > > > Can
> > > > > > anyone guide me to some sources (articles) that discuss this in
> > Solr
> > > > > Cloud?
> > > > > >
> > > > > > Appreciate your feedback regarding this.
> > > > > >
> > > > > > Regards,
> > > > > > Salman
> > > > > >
> > > > >
> > > >
> > >
> >
>
