Re: Joining across collections with Nested documents

2017-03-02 Thread Mikhail Khludnev
Related docs can be retrieved with https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents#TransformingResultDocuments-[subquery] but searching related docs is less ready. Here is a patch for query time join across collections https://issues.apache.org/jira/browse/SOLR-8297.

Re: Joining across collections with Nested documents

2017-03-02 Thread Walter Underwood
Make one collection with denormalized data. This looks like a relational, multi-table schema in Solr. That will be slow and painful. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 2, 2017, at 9:55 PM, Preeti Bhat

Joining across collections with Nested documents

2017-03-02 Thread Preeti Bhat
Hi All, I have two collections in SolrCloud, namely contact and company; they are in the same Solr instance. Company is a relatively simple document with id, Name, address, etc. Contact, on the other hand, has a nested document like the one below. I would like to get the Company details using the

Re: How to update index after document expired.

2017-03-02 Thread XuQing Tan
Solr gets the updated content from an external source (by calling a REST API which returns XML content), so my question is: how can I plug this logic into DocExpirationUpdateProcessorFactory, i.e. poll from the external source and update the index? For now I'm thinking to use a custom

Re: How to update index after document expired.

2017-03-02 Thread Alexandre Rafalovitch
Where would Solr get the updated content? Do you mean would it poll from external source to refresh? Then, no. And if it is pushed from external sources to Solr, then you just replace it as normal. Not sure if I understand your use-case exactly. Regards, Alex. http://www.solr-start.com/

How to update index after document expired.

2017-03-02 Thread XuQing Tan
Hi folks, in our case we have content that needs to be refreshed periodically according to the TTL of each document. DocExpirationUpdateProcessorFactory looks like quite a good fit, except that it only deletes the document; there is no way to update the index with the new document. I don't see

Re: Delta Import JDBC connection frame size larger than max length

2017-03-02 Thread Shawn Heisey
On 3/1/2017 8:48 AM, Liu, Daphne wrote: > Hello Solr experts, Is there a place in Solr (Delta Import > Datasource?) where I can adjust the JDBC connection frame size to 256 > mb ? I have adjusted the settings in Cassandra but I'm still getting > this error. NonTransientConnectionException: >

Re: What is the bottleneck for an optimise operation?

2017-03-02 Thread Shawn Heisey
On 3/2/2017 8:04 AM, Caruana, Matthew wrote: > I’m currently performing an optimise operation on a ~190GB index with about 4 > million documents. The process has been running for hours. > > This is surprising, because the machine is an EC2 r4.xlarge with four cores > and 30GB of RAM, 24GB of

Re: Setting up to index multiple datastores

2017-03-02 Thread Shawn Heisey
On 3/2/2017 6:44 PM, Alexandre Rafalovitch wrote: > And if you are not using SolrCloud, you can have > collection=shard=core, so the terminology gets confused. But you can > definitely have many cores on one mail server. You can also make them > lazy, so not all cores have to be loaded. That would

Re: OR condition between !frange and normal query

2017-03-02 Thread Zheng Lin Edwin Yeo
Hi Emir, Thanks for your reply. For the query: q=_query_:"({!frange l=1}ms(startDate_dt,endDate_dt)" OR _query_:"startDate:[2000-01-01T00:00:00Z TO *] AND endDate:[2016-12-31T23:59:59Z]" Must the _query_ be one of the field in the index? I do not have any fields in the index that relates to

Re: Setting up to index multiple datastores

2017-03-02 Thread Alexandre Rafalovitch
And if you are not using SolrCloud, you can have collection=shard=core, so the terminology gets confused. But you can definitely have many cores on one mail server. You can also make them lazy, so not all cores have to be loaded. That would definitely allow you to have a core per user and only

Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-02 Thread Alexandre Rafalovitch
What do you have for merge configuration in solrconfig.xml? You should be able to tune it to - approximately - whatever you want without doing the grand optimize: https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-MergingIndexSegments Regards,

Re: Problem with facet and multivalued field

2017-03-02 Thread Sales
Yes, so the terms component will of course show me the same thing as the facet query, I am sure the facet query is not wrong. It shows ` in the values, no matter for which unique product key since there should be 0 of them since there is a splitby, was there something else you wanted me to look

Re: Setting up to index multiple datastores

2017-03-02 Thread Shawn Heisey
On 3/2/2017 2:58 PM, Daniel Miller wrote: > One of the many features of the Dovecot IMAP server is Solr support. > This obviously provides full-text-searching of stored mails - and it > works great. But...the focus of the Dovecot team and mailing list is > Dovecot configuration. I'm asking for

Re: Problem with facet and multivalued field

2017-03-02 Thread Erick Erickson
"should" is the operative term here. My guess is that the data you're putting in the index isn't what you think it is. I'd suggest you use the TermsComponent to examine the data actually in your index. Best, Erick On Thu, Mar 2, 2017 at 3:18 PM, Sales

Problem with facet and multivalued field

2017-03-02 Thread Sales
We are using Solr 4.10.4. I have a few Solr fields in schema.xml defined as follows: Both of them are loaded in via data-config.xml import handler, and they are defined there as: This has been working for years, but, lately, we have noticed

Setting up to index multiple datastores

2017-03-02 Thread Daniel Miller
One of the many features of the Dovecot IMAP server is Solr support. This obviously provides full-text-searching of stored mails - and it works great. But...the focus of the Dovecot team and mailing list is Dovecot configuration. I'm asking for some guidance on how I might optimize Solr.

Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-02 Thread Caruana, Matthew
Yes, we already do it outside Solr. See https://github.com/ICIJ/extract which we developed for this purpose. My guess is that the documents are very large, as you say. Optimising was always an attempt to bring down the number of segments from 60+. Not sure how else to do that. > On 2 Mar

Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-02 Thread Caruana, Matthew
I typically end up with about 60-70 segments after indexing. What configuration do you use to bring it down to 16? > On 2 Mar 2017, at 7:42 pm, Michael Joyner wrote: > > You can solve the disk space and time issues by specifying multiple segments > to optimize down to

Re: Does {!child} query support nested Queries ("v=")

2017-03-02 Thread Mikhail Khludnev
Hello, Frank! The closest equivalent would be q=+type:userAccount +givenName:test* And make sure please that it's parsed correctly with debugQuery=true. Can you also narrow the query to troubleshoot the difference? ahhh I probably understood.. shards results are merged by uniqueKey, can you
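
A minimal sketch of the request parameters suggested above, assuming the query is sent over HTTP with standard URL encoding. The helper name `build_debug_params` is hypothetical; only the query string and the `debugQuery=true` parameter come from the message.

```python
from urllib.parse import urlencode, parse_qs


def build_debug_params(query: str) -> str:
    """Return a URL-encoded query string that asks Solr to echo the parsed query.

    Hypothetical helper: q and debugQuery are standard Solr request
    parameters, everything else here is illustrative.
    """
    return urlencode({"q": query, "debugQuery": "true"})


encoded = build_debug_params("+type:userAccount +givenName:test*")

# urlencode percent-encodes the leading '+' operators, so they are not
# decoded as spaces on the server side.
decoded = parse_qs(encoded)
print(encoded)
print(decoded["q"][0])
```

Encoding matters here because a raw `+` in a URL is read as a space, which would silently turn the two required clauses into optional ones.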

Re: Excessive Wire logging while indexing.

2017-03-02 Thread Erick Erickson
Glad to hear it's working. The trick (as you've probably discovered) is to properly map the meta-data to Solr fields. The extracting request handler does this, but the real underlying issue is that there's no real standard. Word docs might have "last_editor", PDFs might have just "author". And on
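
The mapping problem described above can be sketched as a small lookup table. This is an illustration only: the target field name `author` and the decision to fold `last_editor` into it are assumptions, not something stated in the thread.

```python
# Hypothetical alias table: source metadata keys on the left, the Solr
# field they are mapped to on the right. Real documents expose many more
# variants than these two.
FIELD_ALIASES = {
    "last_editor": "author",
    "author": "author",
}


def map_metadata(metadata: dict) -> dict:
    """Rename known metadata keys to Solr field names and drop the rest."""
    doc = {}
    for key, value in metadata.items():
        target = FIELD_ALIASES.get(key.lower())
        # Keep the first value seen for a target field, ignore later ones.
        if target is not None and target not in doc:
            doc[target] = value
    return doc


word_doc = map_metadata({"last_editor": "alice", "pages": 3})
pdf_doc = map_metadata({"Author": "bob"})
print(word_doc, pdf_doc)
```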

RE: Excessive Wire logging while indexing.

2017-03-02 Thread Phil Scadden
Got it all working with Tika and SolrJ. (Got the correct artifacts). Much faster now too which is good. Thanks very much for your help.

Re: Using SOLR to search for Names from RDBMS

2017-03-02 Thread Alexandre Rafalovitch
You would absolutely want to read "Relevant Search" book first. It is based on Elasticsearch examples, but the concepts map to Solr (and there is an appendix). (The following is mostly for names, phone numbers, don't know about addresses) The core issue is that you will want to setup a bunch of

Re: OOM

2017-03-02 Thread Erick Erickson
When you restart, there are a bunch of threads that start up that can chew up stack space. If the message says something about "unable to start native thread" then it's not raw memory but the stack space. Doesn't really sound like this is your error, but thought I'd mention it. On Wed, Mar 1,

Using SOLR to search for Names from RDBMS

2017-03-02 Thread Bijesh EB
Hi All, First off, what a fabulous job you all are doing creating and supporting an open source solution! Great Work and many thanks for that. I am reasonably new to SOLR and our team is trying to integrate SOLR to a structured database to help with Searching Person Records (first name, last

Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-02 Thread Michael Joyner
You can solve the disk space and time issues by specifying multiple segments to optimize down to instead of a single segment. When we reindex we have to optimize or we end up with hundreds of segments and very horrible performance. We optimize down to like 16 segments or so and it doesn't do
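
A rough sketch of what "optimizing down to 16 segments" can look like as an update request, assuming the standard `optimize` and `maxSegments` parameters of Solr's update handler. The host, port, and collection name are placeholders, and `optimize_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def optimize_url(base_url: str, collection: str, max_segments: int) -> str:
    """Build an update URL that merges the index down to max_segments segments."""
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


# Placeholder host and collection name, not taken from the thread.
url = optimize_url("http://localhost:8983/solr", "mycollection", 16)
print(url)
```

Stopping at a moderate segment count avoids rewriting the whole index into a single segment, which is where most of the time and temporary disk space goes.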

Re: What is the bottleneck for an optimise operation?

2017-03-02 Thread Erick Erickson
It's _very_ unlikely that optimize will help with OOMs, so that's very probably a red herring. Likely the document that's causing the issue is very large or, perhaps, you're using the extracting processor and it might be a Tika issue, consider doing the Tika processing outside Solr if so, see:

Re: Excessive Wire logging while indexing.

2017-03-02 Thread Shawn Heisey
On 3/1/2017 6:59 PM, Phil Scadden wrote: > Exceptions never triggered but metadata was essentially empty except > for contentType, and content was always an empty string. I don’t know > what parser was doing, but I gave up and went with the extractHandler route > instead which did at least build a full

Re: What is the bottleneck for an optimise operation?

2017-03-02 Thread Caruana, Matthew
Thank you, these are useful tips. We were previously working with a 4GB heap and getting OOMs in Solr while updating (probably from the analysers) that would cause the index writer to close with what’s called a “tragic” error in the writer code. Only a hard restart of the service could bring

Re: What is the bottleneck for an optimise operation?

2017-03-02 Thread Walter Underwood
6.4.0 added a lot of metrics to low-level calls. That makes many operations slow. Go back to 6.3.0 or wait for 6.4.2. Meanwhile, stop running optimize. You almost certainly don’t need it. 24 GB is a huge heap. Do you really need that? We run a 15 million doc index with an 8 GB heap (Java

Re: Distinguish exact match from wildcard match

2017-03-02 Thread Ahmet Arslan
Hi, how about q=code_text:bolt*=code_text:bolt Ahmet On Thursday, March 2, 2017 4:41 PM, Сергей Твердохлеб wrote: Hi, is there way to separate exact match from wildcard match in solr response? e.g. there are two documents: {code_text:bolt} and {code_text:bolter}. When

Solr 5.3.1: child query must only match non-parent docs

2017-03-02 Thread Kelly, Frank
Our customers are running this query where they have a filter on the parent objects (givenName, familyName etc) and then request the child objects ({!parent which etc) q=+(givenName:(+UserSearchControllerUTFN +1180460672*) familyName:(+UserSearchControllerUTFN +1180460672*)) +{!parent

Re: Boolean expression for spatial query

2017-03-02 Thread David Smiley
I recommend the MULTIPOINT approach. BTW if you go the route of multiple OR'ed sub-clauses, I recommend avoiding the _query_ syntax which predates Solr 4.x's (4.2?) ability to embed fully the sub-clauses more naturally; though you need to beware of the gotcha of needing to add a leading space.

Re: What is the bottleneck for an optimise operation?

2017-03-02 Thread Otis Gospodnetić
Hi, It's simply expensive. You are rewriting your whole index. Why are you running optimize? Are you seeing performance problems you are trying to fix with optimize? Otis -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training -

Re: What is the bottleneck for an optimise operation?

2017-03-02 Thread Caruana, Matthew
Thank you. The question remains however, if this is such a hefty operation then why is it walking to the destination instead of running, so to speak? Is the process throttled in some way? > On 2 Mar 2017, at 16:20, David Hastings wrote: > > Agreed, and since it

Re: What is the bottleneck for an optimise operation?

2017-03-02 Thread Otis Gospodnetić
Hi Matthew, I'm guessing it's the EBS. With EBS we've seen: * cpu.system going up in some kernels * low read/write speeds and maxed out IO at times Otis

Re: What is the bottleneck for an optimise operation?

2017-03-02 Thread David Hastings
Agreed, and the fact that it takes three times the space is part of the reason it takes so long: that 190 GB index ends up writing another 380 GB until it compresses down and deletes the two leftover files. It's a pretty hefty operation. On Thu, Mar 2, 2017 at 10:13 AM, Alexandre Rafalovitch

Re: What is the bottleneck for an optimise operation?

2017-03-02 Thread Alexandre Rafalovitch
The optimize operation is no longer recommended for Solr, as the background merges got a lot smarter. It is an extremely expensive operation that can require up to three times the amount of disk during processing. This is not to say yours is not a valid question, which I am leaving to others to answer.

What is the bottleneck for an optimise operation?

2017-03-02 Thread Caruana, Matthew
I’m currently performing an optimise operation on a ~190GB index with about 4 million documents. The process has been running for hours. This is surprising, because the machine is an EC2 r4.xlarge with four cores and 30GB of RAM, 24GB of which is allocated to the JVM. The load average has been

Re: Distinguish exact match from wildcard match

2017-03-02 Thread Emir Arnautovic
Again, depending on your case, you can use functions in fl to return an additional indicator of whether a doc is an exact match or not: q=code_text:bolt OR whatever&fl=*,isExact:tf('code_text_exact', 'bolt') It will return an isExact field with values >0 for any doc that has the term 'bolt' in the code_text_exact field.
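
A minimal client-side sketch of this idea, assuming the tf() value is returned through fl as a pseudo-field named isExact. The helper names and the wildcard clause in q are illustrative assumptions, and the sample documents are made up to mirror the bolt/bolter example from the thread.

```python
def exact_flag_params(field: str, exact_field: str, term: str) -> dict:
    """Build request parameters that return a tf()-based isExact pseudo-field."""
    return {
        # Assumed query shape: exact term OR prefix wildcard on the same field.
        "q": f"{field}:{term} OR {field}:{term}*",
        "fl": f"*,isExact:tf('{exact_field}', '{term}')",
    }


def split_by_exact(docs: list) -> tuple:
    """Separate documents whose isExact value is > 0 from all the others."""
    exact = [d for d in docs if d.get("isExact", 0) > 0]
    other = [d for d in docs if d.get("isExact", 0) <= 0]
    return exact, other


params = exact_flag_params("code_text", "code_text_exact", "bolt")

# Hypothetical response documents, for illustration only.
sample_docs = [
    {"code_text": "bolt", "isExact": 1.0},
    {"code_text": "bolter", "isExact": 0.0},
]
exact, other = split_by_exact(sample_docs)
print(params["fl"])
print(exact, other)
```

Splitting on the client keeps a single request while still giving two result sets that can be treated differently.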

Re: Arabic words search in solr

2017-03-02 Thread Steve Rowe
Hi Mohan, > On Feb 26, 2017, at 1:37 AM, mohanmca01 wrote: > > i searched with (bizNameAr: شرطة ازكي), and am getting: > […] > > the expected result is: "id": "82", > "bizNameAr": "شرطة عمان السلطانية - قيادة > شرطة محافظة الداخلية - -

Does {!child} query support nested Queries ("v=")

2017-03-02 Thread Kelly, Frank
This is Solr Cloud 5.3.1. I have a query like the following: q={!child of="type:userAccount" v="givenName:test*"} Intent: Show me all children of the type:userAccount where userAccount.givenName:test* If I run the query multiple times I get a very different numFound difference 186,560 to

Re: Distinguish exact match from wildcard match

2017-03-02 Thread Alexandre Rafalovitch
You could still use scoring with distinct bands of values and include score field to see the assigned score. Then, on the client, you do rough grouping. You could try looking at highlighting, but that's probably computationally irrational for this purpose. You could try enabling debugging and

Re: minimal solrconfig example

2017-03-02 Thread Alexandre Rafalovitch
If you liked my minimal config, you may also appreciate the last presentation I did at the Lucene/Solr Revolution on deconstructing the examples. The slides are https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution-2016 (the video is embedded at the

Re: Distinguish exact match from wildcard match

2017-03-02 Thread Сергей Твердохлеб
Hi Emir, Thanks for your answer. However in my case I really need to separate results, because I need to treat those resultsets differently. Thanks. 2017-03-02 15:57 GMT+02:00 Emir Arnautovic : > Hi Sergei, > > Usually you don't want to know which is which, but

Re: Distinguish exact match from wildcard match

2017-03-02 Thread Emir Arnautovic
Hi Sergei, Usually you don't want to know which is which, but you do want to have exact matches first. In case of simple queries and depending on your usecase, you can use score to make distinction. If "bolter" matches "bolt" because of some filters, you will need to index it in two fields

Distinguish exact match from wildcard match

2017-03-02 Thread Сергей Твердохлеб
Hi, is there a way to separate exact matches from wildcard matches in the Solr response? E.g. there are two documents: {code_text:bolt} and {code_text:bolter}. When I search for "bolt" I want to get both results, but somehow grouped, so I can determine whether each was found with an exact or a non-exact match.

messages in gc log not connected to gcs in indexing time

2017-03-02 Thread David Michael Gang
Hi all, When indexing data i get in the gc log messages like: 2017-03-02T10:43:17.872+: 1088.957: Total time for which application threads were stopped: 0.0002071 seconds, Stopping threads took: 0.888 seconds 2017-03-02T10:43:17.885+: 1088.970: Total time for which application

How to expose new Lucene field type to Solr

2017-03-02 Thread Mike Thomsen
Found this project and I'd like to know what would be involved with exposing its RestrictedField type through Solr for indexing and querying as a Solr field type. https://github.com/roshanp/lucure-core Thanks, Mike

Re: Arabic words search in solr

2017-03-02 Thread mohanmca01
Hi Steve, any update on this?

RE: bin/solr -a doesn't work?

2017-03-02 Thread Markus Jelsma
Hi - don't bother anymore, it seems to work fine now. I don't know why, but it kept hanging without error message. Thanks, Markus -Original message- > From:Zheng Lin Edwin Yeo > Sent: Thursday 2nd March 2017 4:55 > To: solr-user@lucene.apache.org > Subject: Re:

Re: minimal solrconfig example

2017-03-02 Thread David Michael Gang
Thanks Charlie. This is what I was looking for. On Thu, Mar 2, 2017 at 11:07 AM David Michael Gang wrote: I use the latest version. Solr 6.4.1 On Thu, Mar 2, 2017 at 9:15 AM Aravind Durvasula wrote: Hi David, What is the solr version you are using?

Re: minimal solrconfig example

2017-03-02 Thread David Michael Gang
I use the latest version. Solr 6.4.1 On Thu, Mar 2, 2017 at 9:15 AM Aravind Durvasula wrote: > Hi David, > > What is the solr version you are using? > To get started, it's better to use the config file that comes out of the > box. > > Thanks, > Aravind > > > > -- > View

Re: OR condition between !frange and normal query

2017-03-02 Thread Emir Arnautovic
Hi Edwin, You can use subqueries: q=_query_:"({!frange l=1}ms(startDate_dt,endDate_dt)" OR _query_:"startDate:[2000-01-01T00:00:00Z TO *] AND endDate:[2016-12-31T23:59:59Z]" HTH, Emir On 02.03.2017 04:51, Zheng Lin Edwin Yeo wrote: Hi, Would like to check, how can we do an OR condition
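
The nested-query pattern above can be assembled programmatically so that quoting stays balanced. This is a sketch under assumptions: `nested` and `any_of` are hypothetical helpers, and the two sub-clauses are shortened versions of the ones in the message.

```python
def nested(subquery: str) -> str:
    """Wrap a sub-query in the _query_:"..." syntax, escaping inner quotes."""
    escaped = subquery.replace("\\", "\\\\").replace('"', '\\"')
    return f'_query_:"{escaped}"'


def any_of(*subqueries: str) -> str:
    """OR together several independently parsed sub-queries."""
    return " OR ".join(nested(s) for s in subqueries)


q = any_of(
    "{!frange l=1}ms(startDate_dt,endDate_dt)",
    "startDate:[2000-01-01T00:00:00Z TO *]",
)
print(q)
```

Note that `_query_` is a magic field name understood by the query parser, not a field that has to exist in the index.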

Re: minimal solrconfig example

2017-03-02 Thread Charlie Hull
On 02/03/2017 06:58, David Michael Gang wrote: Hi all, I want to create my first solr collection I found an example of solrconfig here. https://github.com/apache/lucene-solr/blob/master/solr/example/files/conf/solrconfig.xml This is a file of more than thousand lines. As i understand this file