Filtering group query results

2018-10-04 Thread Greenhorn Techie
Hi, We have a requirement where we need to perform a group query in Solr, with results grouped by user-name (which is a field in our indexes). We then need to filter the results based on the numFound response parameter present under each group. In essence, we want to return results only where
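
One way to apply such a filter is on the client side, after Solr has returned the grouped response. The sketch below assumes the usual grouped JSON layout (grouped, then the field name, then groups, each with a doclist carrying numFound). The field name user_name, the threshold min_found, and the sample data are hypothetical, since the actual condition is cut off in the snippet above.

```python
from typing import Any, Dict, List


def filter_groups_by_num_found(
    response: Dict[str, Any], field: str, min_found: int
) -> List[Dict[str, Any]]:
    """Return only the groups whose doclist.numFound is >= min_found."""
    groups = response.get("grouped", {}).get(field, {}).get("groups", [])
    return [
        g for g in groups
        if g.get("doclist", {}).get("numFound", 0) >= min_found
    ]


# Hypothetical parsed response for a query with group=true&group.field=user_name
sample_response = {
    "grouped": {
        "user_name": {
            "matches": 6,
            "groups": [
                {"groupValue": "alice", "doclist": {"numFound": 5, "docs": []}},
                {"groupValue": "bob", "doclist": {"numFound": 1, "docs": []}},
            ],
        }
    }
}

kept = filter_groups_by_num_found(sample_response, "user_name", min_found=2)
kept_names = [g["groupValue"] for g in kept]
print(kept_names)
```

Filtering after the fact means Solr still computes every group, so this only suits result sets small enough to fetch in full.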

Re: Multiple Queries per request

2018-10-02 Thread Greenhorn Techie
wrote: Solr uses REST-based calls made over HTTP or HTTPS, which cannot handle multiple requests in one shot. However, what you can do is return all the necessary data in one shot and group it according to your needs. Thanks and regards, Shamik On 02-Oct-2018 8:11 PM, "Greenh

Multiple Queries per request

2018-10-02 Thread Greenhorn Techie
Hi, We are building a mobile app which would display results from Solr. At the moment, the idea is to have multiple widgets / areas on the mobile screen, with each area being served by a distinct Solr query. For example, the first widget would display the customer’s aggregated product usage, second

Metrics for a healthy Solr cluster

2018-08-16 Thread Greenhorn Techie
Hi, Solr provides numerous JMX metrics for monitoring the health of the cluster. We are setting up a SolrCloud cluster and hence wondering which parameters / metrics are important to look into, to ascertain that the cluster health is good. The obvious things that come to my mind are CPU utilisation

Re: Calculating maxShardsPerNode

2018-08-13 Thread Greenhorn Techie
at 8:37 AM, Greenhorn Techie wrote: > Hi, > > Our cluster is a 20 node with numShards expected to be set to 10 and > replication expected to be 4. Wondering what is the best value to > set maxShardsPerNode to? Should I consider only numShards while calculating > the value i.e. beca

Calculating maxShardsPerNode

2018-08-13 Thread Greenhorn Techie
Hi, Our cluster has 20 nodes, with numShards expected to be set to 10 and replication expected to be 4. Wondering what is the best value to set maxShardsPerNode to? Should I consider only numShards while calculating the value i.e. because I have only 10 shards, should I set maxShardsPerNode to 1 or
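
The arithmetic behind this question can be sketched as follows. It assumes that maxShardsPerNode limits the number of replicas (cores) of a collection placed on one node, and that replicas are spread evenly; the function name is made up for illustration. The result is the smallest value that still lets every replica be placed, not a tuned recommendation.

```python
import math


def min_max_shards_per_node(num_shards: int, replication_factor: int, num_nodes: int) -> int:
    """Smallest per-node limit that can hold all replicas of one collection."""
    total_replicas = num_shards * replication_factor  # every shard has this many copies
    return math.ceil(total_replicas / num_nodes)


# Figures from the question: 10 shards, replication 4, 20 nodes
value = min_max_shards_per_node(num_shards=10, replication_factor=4, num_nodes=20)
print(value)  # 10 * 4 = 40 replicas over 20 nodes -> 2
```

Setting the limit somewhat higher than this minimum leaves room to re-create replicas on the remaining nodes if a node is lost.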

Re: CDCR traffic

2018-07-09 Thread Greenhorn Techie
Amrit, Further to the below conversation: As I understand, Solr supports SSL encryption between nodes within a Solr cluster as well as for communications to and from clients. In the case of CDCR, assuming both the source and target clusters are SSL enabled, can we say that the source clusters’

Re: Solr Kerberos Authentication

2018-07-09 Thread Greenhorn Techie
Hi, Any thoughts on this please? Thanks On 5 July 2018 at 15:06:26, Greenhorn Techie (greenhorntec...@gmail.com) wrote: Hi, In the solr documentation, it is mentioned that blockUnknown property for Authentication plugin has the default value of false, which means any authenticated users

Re: Unable to Create a Core

2018-07-06 Thread Greenhorn Techie
Erick, Good Evening!! A further question on the below. If schema-oriented design is recommended for production systems, then how should we design them so that they cater for inevitable schema changes? Should we reindex the data and rebuild the collections again? Thanks On

Solr Kerberos Authentication

2018-07-05 Thread Greenhorn Techie
Hi, In the Solr documentation, it is mentioned that the blockUnknown property for the Authentication plugin has the default value of false, which means any authenticated users will be allowed to use Solr. However, wondering whether this parameter makes sense for Basic Authentication only or does it

CloudSolrClient - setDefaultCollection

2018-06-21 Thread Greenhorn Techie
Hi, While indexing, is there going to be any performance benefit to set the collection name first using setDefaultCollection (String

Re: Solr start script

2018-06-07 Thread Greenhorn Techie
? Thanks On 7 June 2018 at 15:08:02, Shawn Heisey (apa...@elyograg.org) wrote: On 6/7/2018 7:37 AM, Greenhorn Techie wrote: > When the above settings are passed as part of start script, does that mean > whenever a new collection is created, Solr is going to store the indexes in > HDFS?

Re: HDP Search - Configuration & Data Directories

2018-06-07 Thread Greenhorn Techie
Thanks Shawn. Will check with Hortonworks! On 7 June 2018 at 14:19:43, Shawn Heisey (apa...@elyograg.org) wrote: On 6/7/2018 6:35 AM, Greenhorn Techie wrote: > A quick question on configuring Solr with Hortonworks HDP. I have installed > HDP and then installed HDP Search using the

Solr start script

2018-06-07 Thread Greenhorn Techie
Hi, For our project, we need to store Solr collections on HDFS. While exploring the documentation for this, I found the Lucidworks documentation (https://doc.lucidworks.com/lucidworks-hdpsearch/3.0.0/Guide-Install-Manual.html#hdfs-specific-changes), where it has been mentioned

Running Solr on HDFS - Disk space

2018-06-07 Thread Greenhorn Techie
Hi, As HDFS has its own replication mechanism, with an HDFS replication factor of 3 and a SolrCloud replication factor of 3, does that mean each document will probably have around 9 copies stored in HDFS? If so, is there a way to configure HDFS or Solr such that only three
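
The figure of 9 comes from multiplying the two factors, on the assumption that each Solr replica writes its own index files to HDFS and HDFS then replicates every block independently. A small sketch of that arithmetic, with an illustrative function name and a made-up index size:

```python
def physical_copies(hdfs_replication: int, solr_replication: int) -> int:
    """Copies of each document on disk when every Solr replica stores its own index in HDFS."""
    return hdfs_replication * solr_replication


copies = physical_copies(hdfs_replication=3, solr_replication=3)
print(copies)  # 3 * 3 = 9

# Raw disk needed for a hypothetical 100 GB index under the same assumption
raw_gb = 100 * copies
print(raw_gb)  # 900
```

Lowering either factor to 1 brings the total down to 3 under the same assumption.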

HDP Search - Configuration & Data Directories

2018-06-07 Thread Greenhorn Techie
Hi, A quick question on configuring Solr with Hortonworks HDP. I have installed HDP and then installed HDP Search using the steps described under the link - https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_solr-search-installation/content/hdp-search30-install-mpack.html I have used

Re: SolrCloud Collection Backup - Solr 5.5.4

2018-06-04 Thread Greenhorn Techie
Thanks Shawn for your detailed reply. It has helped to improve my understanding. Below is my summarised understanding. In a SolrCloud setup with a version older than 6.1, there is no ‘elegant’ way of handling collection backups and restore. Instead, we have to use the manual backup and restore APIs

SolrCloud Collection Backup - Solr 5.5.4

2018-06-01 Thread Greenhorn Techie
Hi, We are running SolrCloud with version 5.5.4. As I understand, the Solr Collection Backup and Restore APIs are only supported from version 6 onwards. So wondering what is the best mechanism to get our collections backed up on an older Solr version. When I ran the backup command on a particular node (curl

Impact of timeAllowed parameter

2018-05-31 Thread Greenhorn Techie
Hi, Wondering how the calling application would be informed that a search request was cut short by the time-out rather than completing normally? Is there something sent to the client as part of the response to indicate that the time-out has been invoked? Thanks
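
As far as I know, Solr marks a search cut short by timeAllowed with partialResults set to true in the responseHeader. A minimal client-side check, assuming the JSON response has already been parsed into a dict (the helper name and both sample responses are made up for illustration):

```python
from typing import Any, Dict


def hit_time_allowed(response: Dict[str, Any]) -> bool:
    """True if the response header carries the partialResults flag."""
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))


# Hypothetical parsed responses
complete = {"responseHeader": {"status": 0, "QTime": 12}, "response": {"numFound": 40}}
cut_short = {
    "responseHeader": {"status": 0, "QTime": 1003, "partialResults": True},
    "response": {"numFound": 7},
}

print(hit_time_allowed(complete))
print(hit_time_allowed(cut_short))
```

When the flag is present, counts such as numFound should be treated as lower bounds rather than totals.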

Re: Navigating through Solr Source Code

2018-05-21 Thread Greenhorn Techie
Thanks for your responses. Best Regards! On 21 May 2018 at 16:40:10, Shawn Heisey (apa...@elyograg.org) wrote: On 5/21/2018 4:35 AM, Greenhorn Techie wrote: > As the documentation around Solr is limited, I am thinking to go through > the source code and understand the various bits and

Navigating through Solr Source Code

2018-05-21 Thread Greenhorn Techie
Hi, As the documentation around Solr is limited, I am thinking of going through the source code to understand the various bits and pieces. However, I am a bit confused about where to start, as my development skills are a bit limited. Any thoughts on how best to start / where to start looking into

Re: SolrCloud replication

2018-05-03 Thread Greenhorn Techie
ple experiment, just take one replica of a two-replica system down and specify min_rf of 2. Best, Erick On Wed, May 2, 2018 at 9:20 PM, Greenhorn Techie <greenhorntec...@gmail.com> wrote: > Shalin, > > Given the earlier response by Erick, wondering when this scenario occurs

Re: SolrCloud replication

2018-05-02 Thread Greenhorn Techie
, Greenhorn Techie <greenhorntec...@gmail.com> wrote: > Hi, > > Good Morning!! > > In the case of a SolrCloud setup with sharding and replication in place, > when a document is sent for indexing, what happens when only the shard > leader has indexed the document, but the re

Re: Solr Heap usage

2018-05-02 Thread Greenhorn Techie
Thanks Shawn for the inputs, which will definitely help us to scale our cluster better. Regards On 2 May 2018 at 18:15:12, Shawn Heisey (apa...@elyograg.org) wrote: On 5/1/2018 5:33 PM, Greenhorn Techie wrote: > Wondering what are the considerations to be aware to arrive at an optimal >

Re: Indexing throughput

2018-05-02 Thread Greenhorn Techie
should speed up linearly with the number of shards, > because those are indexing in parallel. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > >> On May 2, 2018, at 9:58 AM, Greenhorn Techie <greenhorntec...@gmail.com>

Indexing throughput

2018-05-02 Thread Greenhorn Techie
Hi, The current hardware profile for our production cluster is 20 nodes, each with 24 cores and 256GB memory. The data being indexed is very structured in nature and has about 30 columns or so, of which half are categorical with a defined list of values. The expected peak indexing

SolrCloud Sharding

2018-05-02 Thread Greenhorn Techie
Hi, I have a few questions on sharding in a SolrCloud setup: 1. How to know the optimal number of shards required for a SolrCloud setup? What are the factors to consider when deciding on the value for the *numShards* parameter? 2. If over-sharding has been done i.e. if numShards has been set to a

SolrCloud replication

2018-05-02 Thread Greenhorn Techie
Hi, Good Morning!! In the case of a SolrCloud setup with sharding and replication in place, when a document is sent for indexing, what happens when only the shard leader has indexed the document, but the replicas failed, for whatever reason? Will the document be resent by the leader to the

Solr Heap usage

2018-05-01 Thread Greenhorn Techie
Hi, Wondering what considerations to be aware of in order to arrive at an optimal heap size for the Solr JVM? Though I did discuss this on the IRC, I am still unclear on how Solr uses the JVM heap space. Are there any pointers to understand this aspect better? Given that Solr requires an optimally

Query Regarding Solr Garbage Collection

2018-05-01 Thread Greenhorn Techie
Hi, Following the https://wiki.apache.org/solr/SolrPerformanceFactors article, I understand that Garbage Collection might be triggered due to a significant increase in JVM heap usage unless a commit is performed. Given this background, I am curious to understand the reasons / factors that

Re: SolrCloud Heterogeneous Hardware setup

2018-05-01 Thread Greenhorn Techie
heck CHANGES.txt. Best, Erick On Tue, May 1, 2018 at 7:59 AM, Greenhorn Techie <greenhorntec...@gmail.com> wrote: > Hi, > > We are building a SolrCloud setup, which will index time-series data. Being > time-series data with write-once semantics, we are planning to have >

SolrCloud Heterogeneous Hardware setup

2018-05-01 Thread Greenhorn Techie
Hi, We are building a SolrCloud setup, which will index time-series data. As this is time-series data with write-once semantics, we are planning to have multiple collections i.e. one collection per month. As per our use case, end users should be able to query across the last 12 months' worth of data, which

Re: Solr DR Replication

2017-12-07 Thread Greenhorn Techie
Any thoughts / help on this please. Thanks in advance. On Wed, 6 Dec 2017 at 16:21 Greenhorn Techie <greenhorntec...@gmail.com> wrote: > Hi, > > We are on Solr 5.5.2 and wondering what is the best mechanism for > replicating Solr indexes from a Disaster Recovery perspective.

Time-Series data indexing into Solr

2017-12-07 Thread Greenhorn Techie
Hi, Is there any recommended approach to index and search time-series data in Solr? Thanks in Advance.

Solr DR Replication

2017-12-06 Thread Greenhorn Techie
Hi, We are on Solr 5.5.2 and wondering what is the best mechanism for replicating Solr indexes from a Disaster Recovery perspective. As I understand, CDCR is only available from Solr 6 onwards. However, I couldn't find much content around index replication management for older versions. Wondering if

Re: Solr on HDFS vs local storage - Benchmarking

2017-11-22 Thread Greenhorn Techie
> thus it is possible to move the replica to a different instance on a > different host. > > regards, > Hendrik > > On 22.11.2017 14:59, Greenhorn Techie wrote: > > Hi, > > > > Good Afternoon!! > > > > While the discussion around issues related t

Solr on HDFS vs local storage - Benchmarking

2017-11-22 Thread Greenhorn Techie
Hi, Good Afternoon!! While the discussion around issues related to "Solr on HDFS" is live, I would like to understand if anyone has done any performance benchmarking for both Solr indexing and search between HDFS vs local file system. Also, from experience, what would the community folks

Solr / HDPSearch related

2017-11-10 Thread Greenhorn Techie
Hi, We have a HDP product cluster and are now planning to build a search solution for some of our business requirements. In this regard, I have the following questions. Can you please answer the below questions with respect to Solr? - As I understand, it is more performant to have SolrCloud

Solr Capacity Planning

2017-06-17 Thread Greenhorn Techie
Hi, We are planning to set up a SolrCloud cluster for building a search application on huge volumes of data points (~hundreds of billions of Solr documents). I would like to understand if there is any recommendation on how to size the infrastructure and hardware requirements for Solr clusters. Also, what