Re: Approach for Merge Database and Files

2018-06-26 Thread Angel Addati
Thanks both. *"From your problem description, it looks like you want to gather the data from the DB and filesystem and combine them into a Solr document at index time, then index that document."* Exactly. I don't know if the best approach is to combine at index time or at query time. But I need

Re: Total Collection Size in Solr 7

2018-06-26 Thread Erick Erickson
Some work is being done on the admin UI, there are several JIRAs. Perhaps you'd like to join that conversation? We need to have input, especially in terms of what kinds of information would be useful from a practitioner's standpoint. Best, Erick On Mon, Jun 25, 2018 at 11:26 PM, Aroop Ganguly

AW: Adding tag to fq makes query return child docs instead of parent docs

2018-06-26 Thread Florian Fankhauser
_type_s:book} > acquisition_date_i:20180626 According to the documentation: https://lucene.apache.org/solr/guide/6_6/local-parameters-in-queries.html#LocalParametersinQueries-BasicSyntaxofLocalParameters You can't specify multiple localparams like that - it says "You may spe

Re: Approach for Merge Database and Files

2018-06-26 Thread Erick Erickson
From your problem description, it looks like you want to gather the data from the DB and filesystem and combine them into a Solr document at index time, then index that document. Put enough information in Solr to fetch the document as necessary; often people don't put the entire file in Solr
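The index-time combination Erick describes could be sketched like this, assuming hypothetical field names and a plain dict standing in for the Solr document; the `db_`/`fm_` prefixes and the shared file URL key are illustrative, not part of any schema in the thread:

```python
def merge_sources(db_row, file_meta):
    """Combine a database row and file metadata into one Solr document,
    joined on the file URL they share."""
    doc = {"id": db_row["file_url"]}
    # Prefix each source's metadata so the combined document keeps both.
    doc.update({f"db_{k}": v for k, v in db_row.items() if k != "file_url"})
    doc.update({f"fm_{k}": v for k, v in file_meta.items() if k != "file_url"})
    return doc

doc = merge_sources(
    {"file_url": "/data/a.pdf", "metadata1": "x"},
    {"file_url": "/data/a.pdf", "metadata2": "y"},
)
```

The merged document then carries both sets of metadata, so one query can match on either source.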

Configuring load balancer for Kerberised Solr cluster

2018-06-26 Thread mosheB
We are trying to enable authentication mechanism in our Solr cluster using Kerberos authentication plugin. We use Active Directory as our KDC, each Solr node has its own SPN in the form of HTTP/@ and things are working as expected. Things are getting complicated while trying to configure our load

Re: Indexing Approach

2018-06-26 Thread Shawn Heisey
On 6/26/2018 8:24 AM, solrnoobie wrote: > - Each SP call will return 15 result sets. > - Each document can contain 300-1000 child documents. > - If the batch size is 1000, the child documents for each can contain > 300-1000 documents so that will eat up the 4g's allocated to the > application. If

Re: Total Collection Size in Solr 7

2018-06-26 Thread Aroop Ganguly
Hi Erick Sure I will look those jiras up. In the interim, is what Susmit suggested the only way to get the size info? Or is there something else you can recommend? Thanks Aroop > On Jun 26, 2018, at 6:53 AM, Erick Erickson wrote: > > Some work is being done on the admin UI, there are

Re: Indexing Approach

2018-06-26 Thread solrnoobie
Thanks for the tip. Although we have increased our application's heap to 4g and it is still not enough. I guess here are the things we think we did wrong: - Each SP call will return 15 result sets. - Each document can contain 300-1000 child documents. - If the batch size is 1000, the child

Re: Approach for Merge Database and Files

2018-06-26 Thread Erick Erickson
bq. I don't know if the best approach is combine in index time or in query time It Depends (tm). What is your goal? Let's say you have db_f1 and fm_f2 (db == from the database and fm = file data). If you want to form a Solr query like db_f1:something fm_f2:something_else you don't have much
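The query form Erick mentions (both fields in the same document, one clause each) could be built like this; the field names `db_f1`/`fm_f2` come from his example, the helper itself is just a sketch:

```python
def combined_query(db_value, fm_value):
    # One clause per field; both fields live in the same merged document,
    # so no query-time join is needed.
    return f"db_f1:{db_value} AND fm_f2:{fm_value}"

q = combined_query("something", "something_else")
```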

Re: Total Collection Size in Solr 7

2018-06-26 Thread Erick Erickson
Aroop: Not that I know of. You could do a reasonable approximation by 1> check the index size (manually) with, say, 10M docs 2> check it again with 20M docs 3> use a match all docs query and do the math. That's clumsy but do-able. The reason I start with 10M and 20M is that index size does not
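The two-sample approach Erick outlines can be turned into simple arithmetic: the growth between the 10M and 20M measurements gives the marginal per-document cost, which discounts the fixed overhead baked into the first sample. A minimal sketch (sizes in GB are made-up numbers):

```python
def estimate_total_size(size_10m_gb, size_20m_gb, target_docs,
                        sample_step=10_000_000):
    # Marginal cost per doc from the growth between the two samples.
    per_doc_gb = (size_20m_gb - size_10m_gb) / sample_step
    # Fixed overhead is whatever the first sample holds beyond its docs.
    overhead_gb = size_10m_gb - per_doc_gb * sample_step
    return overhead_gb + per_doc_gb * target_docs

# e.g. 12 GB at 10M docs, 22 GB at 20M docs, extrapolated to 100M docs
est = estimate_total_size(12, 22, 100_000_000)
```

With those sample numbers the estimate comes out to 102 GB; as Erick says, this is clumsy but do-able, and only as good as the assumption that growth stays roughly linear.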

Re: Create an index field of type dictionary

2018-06-26 Thread Erick Erickson
Well, there's a multiValued field that's just a list of whatever (string, date, numeric, etc). What's the use-case? This feels like an "XY" problem. A "dictionary" type is usually some kind of structure that you want to have operate in a specific manner. Solr doesn't really deal at that level, it
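One common workaround for "a list of objects" in a flat schema is to spread the objects across parallel multiValued fields. A sketch, with entirely hypothetical field names:

```python
def flatten_objects(objs, prefix):
    """Flatten a list of objects into parallel multiValued fields,
    since Solr fields hold flat lists, not nested dicts."""
    fields = {}
    for obj in objs:
        for k, v in obj.items():
            fields.setdefault(f"{prefix}_{k}", []).append(v)
    return fields

doc = {"id": "1",
       **flatten_objects([{"name": "a", "qty": 2},
                          {"name": "b", "qty": 3}], "item")}
```

Position `i` in each field then belongs to object `i`; nested (child) documents are the other option when the objects need to be queried as units.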

Create an index field of type dictionary

2018-06-26 Thread Ritesh Kumar (Avanade)
Hello, Is it possible to create an index field of type dictionary? I have seen string array, datetime, bool etc. but I am looking for a field type like a list of objects. Thanks Ritesh Avanade Infrastructure Team +1 (425) 588-7853 v-kur...@micrsoft.com

Re: Indexing Approach

2018-06-26 Thread Aroop Ganguly
Would you mind sharing details on 1. the Solr Cloud setup: how many nodes do you have at your disposal and how many shards do you have set up? 2. The indexing technology: what are you using? Core Java/.NET threads? Or a system like Spark? 3. Where do you see the exceptions? The indexer process

Indexing Approach

2018-06-26 Thread solrnoobie
We are currently having problems in our current production setup in Solr. What we currently have is something like this: - Solr 6.6.3 (cloud mode) - 10 threads for indexing - 900k total documents - 500 documents per batch So in each thread, the process will call a stored procedure with a lot
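The batching described above (900k docs, 500 per request) is typically just fixed-size chunking before each update call; a minimal sketch:

```python
def batches(docs, size=500):
    # Yield fixed-size slices so each update request stays bounded.
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

# 900 stand-in docs in batches of 500 -> one full batch plus a remainder
chunks = list(batches(list(range(900)), 500))
```

Each chunk would then be sent as one update request (e.g. via SolrJ's `add`); the per-batch memory cost is what the child-document discussion later in the thread is about.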

Re: Total Collection Size in Solr 7

2018-06-26 Thread Susmit
Hi Aroop, i created a utility using solrzkclient api to read state.json, enumerated (one) replica for each shard and used /replication handler for size and added them up.. Sent from my iPhone > On Jun 25, 2018, at 7:24 PM, Aroop Ganguly wrote: > > Hi Team > > I am not sure how to ascertain
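Once one replica per shard has been enumerated and its size fetched from the /replication handler, the aggregation step Susmit describes is just a sum. A sketch of that last step, with made-up shard names and sizes:

```python
def collection_size_bytes(shard_replica_sizes):
    """Sum one measured replica size per shard (e.g. the index size
    reported by each replica's /replication handler) into a
    collection-level total."""
    return sum(shard_replica_sizes.values())

total = collection_size_bytes({"shard1": 5_000_000, "shard2": 7_500_000})
```

The fetching itself (reading state.json via SolrZkClient, calling /replication on each chosen replica) is the bulk of Susmit's utility and is omitted here.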

Re: Total Collection Size in Solr 7

2018-06-26 Thread Aroop Ganguly
I see, Thanks Susmit. I hoped there was something simpler, that could just be part of the collections view we now have in the Solr 7 admin UI. Or at least a one-stop API call. I guess this will be added in a later release. > On Jun 25, 2018, at 11:20 PM, Susmit wrote: > > Hi Aroop, > i created

Re: Indexing Approach

2018-06-26 Thread solrnoobie
1. We have 5 nodes and 3 zookeepers (will autoscale if needed) 2. We use java with the help of solrj / spring data for indexing. 3. We see the exception in our application so this is probably our fault and not solr's so I'm asking what is the best approach for documents with a lot of child

Re: Approach for Merge Database and Files

2018-06-26 Thread Peter Gylling Jørgensen
Hi, I would create a search alias, that contains the latest versions of the different collections. See: https://lucene.apache.org/solr/guide/7_3/collections-api.html#collections-api Then you use this alias to search for results You get better results if you define the same schema for all
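Creating the alias Peter suggests is a Collections API CREATEALIAS call; a sketch that builds the request URL (the base URL and collection names are placeholders):

```python
from urllib.parse import urlencode

def create_alias_url(base, alias, collections):
    # Collections API CREATEALIAS: point one alias at the collections
    # holding the latest versions, then query the alias.
    params = {"action": "CREATEALIAS",
              "name": alias,
              "collections": ",".join(collections)}
    return f"{base}/admin/collections?{urlencode(params)}"

url = create_alias_url("http://localhost:8983/solr", "latest",
                       ["coll_v2", "coll_v3"])
```

Re-running CREATEALIAS with a new collection list repoints the alias, so searches against the alias name pick up new versions without client changes.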

Re: Solr Default query parser

2018-06-26 Thread Jason Gerlowski
The "Standard Query Parser" _is_ the lucene query parser. They're the same parser. As Shawn pointed out above, they're also the default, so if you don't specify any defType, they will be used. Though if you want to be explicit and specify it anyway, the value is defType=lucene Jason On Mon,

RE: Create an index field of type dictionary

2018-06-26 Thread Ritesh Kumar (Avanade)
Hey Erick, Thanks for the response; it was a Sitecore-related modification we had to do to make it work. Thanks Ritesh -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, June 26, 2018 10:52 AM To: solr-user Subject: Re: Create an index field of type

Linux command to print top slow performing query (/get) from solr logs

2018-06-26 Thread Ganesh Sethuraman
Is there a way, using Linux commands, to print the top slow-performing queries from Solr 7 logs (/get handler or /select handler)? Reverse-sorted order across log files would be very useful and handy to troubleshoot Regards Ganesh
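Solr's request log lines end with a `QTime=<millis>` field, so the slowest queries can be pulled out by extracting that number and sorting descending. A small sketch (the sample log lines are made up but follow the standard request-log shape):

```python
import re

QTIME = re.compile(r"QTime=(\d+)")

def slowest(lines, top_n=10):
    """Return (qtime_ms, line) pairs for the slowest requests,
    assuming standard Solr request log lines with QTime=<millis>."""
    timed = [(int(m.group(1)), ln) for ln in lines
             if (m := QTIME.search(ln))]
    return sorted(timed, key=lambda t: t[0], reverse=True)[:top_n]

rows = slowest([
    "path=/select params={q=*:*} hits=10 status=0 QTime=5",
    "path=/get params={id=1} status=0 QTime=250",
])
```

A pure-shell equivalent along the lines of `grep -h 'QTime=' solr.log* | sort -t= -k99 -rn` is possible but fragile, since the sort key position depends on the exact log format; anchoring on the `QTime=` token as above is more robust.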

Change/Override Solrconfig.xml across collections

2018-06-26 Thread Ganesh Sethuraman
I would like to implement the Slow Query logging feature ( https://lucene.apache.org/solr/guide/6_6/configuring-logging.html#ConfiguringLogging-LoggingSlowQueries) across multiple collection without changing solrconfig.xml in each and every collection. Is that possible? I am using solr 7.2.1 If

Re: Indexing Approach

2018-06-26 Thread Shawn Heisey
On 6/26/2018 12:06 AM, solrnoobie wrote: We are having errors such as heap space error in our indexing so we decided to lower the batch size to 50. The problem with this is that sometimes it really does not help since 1 document can contain 1000 child documents and it will still have the heap
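Shawn's point can be made concrete: with nested documents, a "batch" of N parents can expand to N times (children + 1) actual documents in memory, so the parent batch size has to be derived from the worst-case child count. A sketch with illustrative numbers:

```python
def batch_size_for_children(max_docs_in_flight, max_children_per_doc):
    # Each parent brings up to max_children_per_doc children with it,
    # so divide the total document budget by (children + 1) per parent.
    return max(1, max_docs_in_flight // (max_children_per_doc + 1))

# Budget of 50k documents in flight, up to 1000 children per parent
size = batch_size_for_children(50_000, 1000)
```

This is why a batch of 1000 parents with up to 1000 children each blows a 4g heap: it is effectively a million-document batch, not a thousand-document one.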

Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread neotorand
Thanks Erick, Though I saw this article in several places, I never went through it seriously. Don't you think the below method is very expensive? autoParser.parse(input, textHandler, metadata, context); If the document size is bigger, then it will need enough memory to hold the document (ie

Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread neotorand
Thanks Shawn, Yes, I agree ERH is never suggested in production. I am writing my custom ones. Any pointer with this? What exactly I am looking for is a custom indexing program to compile precisely the information that you need and send that to Solr. On the other hand, I see the below method is very

Adding tag to fq makes query return child docs instead of parent docs

2018-06-26 Thread Florian Fankhauser
} acquisition_date_i:20180626 This works as expected. Now for some reason I want to exclude the above filter-query from a facet-query. Therefore I need to add a tag to the filter-query: q={!tag=datefilter}{!parent which=doc_type_s:book} acquisition_date_i:20180626 And now the error occurs: Just

Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread Shawn Heisey
On 6/26/2018 7:13 AM, neotorand wrote: Don't you think the below method is very expensive autoParser.parse(input, textHandler, metadata, context); If the document size is bigger then it will need enough memory to hold the document (ie ContentHandler). Any other alternative? I did find this:

Re: Adding tag to fq makes query return child docs instead of parent docs

2018-06-26 Thread Shawn Heisey
On 6/26/2018 7:22 AM, Florian Fankhauser wrote: Now for some reason I want to exclude the above filter-query from a facet-query. Therefore I need to add a tag to the filter-query: q={!tag=datefilter}{!parent which=doc_type_s:book} acquisition_date_i:20180626 According
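Per the local-params syntax Shawn points to, the fix is to put both parameters inside a single `{!...}` block rather than stacking two blocks. A sketch that builds the corrected filter query from the thread's own field names:

```python
def parent_filter(tag, which, clause):
    # tag and parent go in ONE local-params block; two adjacent
    # {!...} blocks are not combined, which is what broke the query.
    return f"{{!parent tag={tag} which={which}}}{clause}"

fq = parent_filter("datefilter", "doc_type_s:book",
                   "acquisition_date_i:20180626")
```

With the query written this way, the `{!parent}` parser still selects parent docs and the `tag=datefilter` remains available for facet exclusion.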

Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread Erick Erickson
Well, if you were using ERH you'd have the same problem as it uses Tika. At least if you run Tika on some client somewhere, if you do have a document that blows out memory or has some other problem, your client can crash without taking Solr with it. That's one of the reasons, in fact, that we

Re: Total Collection Size in Solr 7

2018-06-26 Thread Aroop Ganguly
Hi Erick Thanks for the advice. One open question still, about point 1 below: how to get that magic number of size in GBs :) ? As I am mostly using streaming expressions, most of my fields are DocValues and not stored. I will look at the health endpoint to see what it gives me in connection

Approach for Merge Database and Files

2018-06-26 Thread angeladdati
Hi: I have two sources to index: Database: MetadataDB1, MetadataDB2, File Url... Files: MetadataF1, MetadataF2, File Url, Content... I index the database and the files. When I search, I need to search and show the merged result: Database + Files (MetadataDb1, MetadataDB2, MetadataF1, MetadataF2,