Re: Rename solrconfig.xml

2018-02-26 Thread Zheng Lin Edwin Yeo
Regarding the core.properties, I understand from the Solr guide that we need to define the "config" property first. However, my core.properties will only be created when I create the collection from the command http://localhost:8983/solr/admin/collections?action=CREATE=collection The

Question on "other language" than english stemmers and using both

2018-02-26 Thread TG Servers
Hi, I have adapted this schema.xml for Dovecot and Solr 7.2.1. At the moment it stems only English words. What do I have to do to use it for English AND German? Can I just append the corresponding German filter factories to it, or does that not work? E.g. ... ... Thanks, Thomas Original

Re: Spark-Solr connector version to be used

2018-02-26 Thread Uday Jami
Thanks Shawn. Sure, will check with Lucidworks. Thanks, Uday On Tue, Feb 27, 2018 at 12:09 PM, Shawn Heisey wrote: > On 2/26/2018 10:51 PM, Uday Jami wrote: > >> I am having solr 5.5.2 in my HDP cluster. To integrate it with spark >> version 2.2.0 can i use the latest

Re: Spark-Solr connector version to be used

2018-02-26 Thread Shawn Heisey
On 2/26/2018 10:51 PM, Uday Jami wrote: I am having solr 5.5.2 in my HDP cluster. To integrate it with spark version 2.2.0 can i use the latest spark-solr connector version 3.3.4 . As per the

Spark-Solr connector version to be used

2018-02-26 Thread Uday Jami
Hello All, I am having solr 5.5.2 in my HDP cluster. To integrate it with spark version 2.2.0 can i use the latest spark-solr connector version 3.3.4 . As per the

is it appropriate to use external cache for whole shards

2018-02-26 Thread park
I'm indexing and searching documents using solr 6.x. It is quite efficient when there are fewer shards and fewer cluster units. However, when the number of shards exceeds 30 and the size of each shard is 30G, the search performance is significantly reduced. Currently, usercache in solr is actively

Re: Rename solrconfig.xml

2018-02-26 Thread @Nandan@
You can change it in the core config file and then use any name. I used table_solrconfig.xml. The same concept applies to the schema.xml file too. On Feb 27, 2018 11:11 AM, "Zheng Lin Edwin Yeo" wrote: Hi Alexandre, Thanks for your reply. Will this cause other

Re: Rename solrconfig.xml

2018-02-26 Thread Zheng Lin Edwin Yeo
Hi Alexandre, Thanks for your reply. Will this cause other issues with the functionality if it is renamed? Regards, Edwin On 27 February 2018 at 07:15, Alexandre Rafalovitch wrote: > I believe this can be set with "config" property in the > core.properties file: >

Re: Rename solrconfig.xml

2018-02-26 Thread Alexandre Rafalovitch
I believe this can be set with "config" property in the core.properties file: https://lucene.apache.org/solr/guide/7_2/defining-core-properties.html#defining-core-properties Whether it is a good idea longer term, is a different question. Regards, Alex. On 23 February 2018 at 18:06, Zheng Lin
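A minimal sketch of what such a core.properties file could look like, assuming a standalone (non-SolrCloud) core. The "name", "config" and "schema" property names come from the Solr guide linked above; the core name "mycore", the helper function and the file name table_solrconfig.xml (mentioned elsewhere in this thread) are only illustrative.

```python
def render_core_properties(name, config="solrconfig.xml", schema=None):
    """Build the text of a core.properties file (illustrative helper, not part of Solr)."""
    lines = [f"name={name}", f"config={config}"]
    if schema is not None:
        # Optional: only written when a non-default schema file name is wanted.
        lines.append(f"schema={schema}")
    return "\n".join(lines) + "\n"


# Hypothetical core name; the config file name is the one used in this thread.
text = render_core_properties("mycore", config="table_solrconfig.xml")
print(text)
```

Whether renaming the file is wise in the long run is, as noted above, a separate question; the sketch only shows where the setting would live.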

SOLR Similarity Difference

2018-02-26 Thread Hodder, Rick
I'm converting SOLR 4.10.2 to SOLR 7.1. I have the following three strings in both SOLR cores Action Technical Temporaries t/a CTR Corporation Action Technical Temporaries Action Technical Temporar If I search IDX_CompanyName: (Action AND Technical AND Temporaries AND t/a AND CTR AND

Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Vincenzo D'Amore
On Mon, Feb 26, 2018 at 7:14 PM, Erick Erickson wrote: > > Faceting works on multivalued fields, perhaps you can do something with > that? > > The main difference I see in this case between facets and groups is that groups are sorted by score, so most relevant group

Re: NRT replicas miss hits and return duplicate hits when paging solrcloud searches

2018-02-26 Thread Webster Homer
Erick, No we didn't look at that. I will add it to the list. We have not seen performance issues with solr. We have much slower technologies in our stack. This project was to replace a system that was too slow. Thank you, I will look into it Webster On Mon, Feb 26, 2018 at 1:13 PM, Erick

Re: NRT replicas miss hits and return duplicate hits when paging solrcloud searches

2018-02-26 Thread Erick Erickson
Did you try enabling distributed IDF (statsCache)? See: https://lucene.apache.org/solr/guide/6_6/distributed-requests.html It may not totally fix the issue, but it's worth trying. It does come with a performance penalty of course. Best, Erick On Mon, Feb 26, 2018 at 11:00 AM, Webster Homer

Re: NRT replicas miss hits and return duplicate hits when paging solrcloud searches

2018-02-26 Thread Webster Homer
Thanks Shawn, I had settled on this as a solution. All our use cases for Solr are to return results in order of relevancy to the query, so having a deterministic sort would defeat that purpose. Since we wanted to be able to return all the results for a query, I originally looked at using the

Re: configure jetty to use both http1.1 and H2

2018-02-26 Thread Jeff Dyke
Thanks for the reply Shawn, i certainly would not want to request a change to solr code to support H2, for my needs. I'll leave that up to the Solr team. :) as i don't see H/2 being a real improvement for something like solr. On Mon, Feb 26, 2018 at 10:29 AM, Shawn Heisey

Re: NRT replicas miss hits and return duplicate hits when paging solrcloud searches

2018-02-26 Thread Shawn Heisey
On 2/26/2018 10:26 AM, Webster Homer wrote: > We need the results by relevancy so the application sorts the results by > score desc, and the unique id ascending as the tie breaker This is the reason for the discrepancy, and why the different replica types don't have the same issue. Each NRT

Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Erick Erickson
Of course, and in that use-case you'd want a particular document to appear in all three categories. Another client may want the doc to appear in only the "most important" category, however that's defined. Another client may want the doc to appear in "the more recent" day (assuming we're grouping

NRT replicas miss hits and return duplicate hits when paging solrcloud searches

2018-02-26 Thread Webster Homer
I have an application which implements several different searches against a solrcloud collection. We are using Solr 7.2 and Solr 6.1. The collection b2b-catalog-material is created with the default Near Real Time (NRT) replicas. The collection has 2 shards each with 2 replicas. The application

Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Vincenzo D'Amore
Hi Erick, please consider this case where there is a group of products that are televisions. Now I have only one category per product, but in some cases like the television I could have more than one. Some products should be available simultaneously in more categories, that's why the field I was

Re: Challenges of Indexing Email

2018-02-26 Thread Erick Erickson
That's what the "Fix Version" field in the JIRA is for. For any "fixed" JIRA that field contains the Solr release it will be in, in this case 7.3. The release process for 7.3 will start in a couple of weeks, with the official release a week or so after that unless there are problems. Best, Erick

Re: Solr 7. Why UpdateRequestProcessorChain does not take any effect?

2018-02-26 Thread FiMko
Hi Shawn, Adding the default="true" helped! I keep this in the update chain because I'm doing some more transformations later on: Thanks a lot! -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Challenges of Indexing Email

2018-02-26 Thread Terry Steichen
Thanks Karthik. (1) I thought the fix would be in 7.2.1, but it is not.  Any idea when it will be available? (2) Is there any way to force Solr indexing to treat an email message (or thread) as plain text? Terry On 02/26/2018 10:37 AM, Karthik Ramachandran wrote: > There is bug report for

Re: FileDictionaryFactory:- pick source file from solr instead of zk config.

2018-02-26 Thread Erick Erickson
You can also change this limitation for ZooKeeper by setting jute.maxBuffer with a system variable, see: https://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html. The 1M limit was chosen since in the "usual" case, ZooKeeper should only have relatively small files as far as Solr is concerned. If

Re: StandardTokenizer and splitting on mixedcase strings

2018-02-26 Thread Erick Erickson
Dan: The admin UI analysis page is invaluable for understanding exactly what element of your analysis chain does what. So when you restructure your analysis chain you can use it to see if the input transforms the way you want it to. Best, Erick On Mon, Feb 26, 2018 at 7:21 AM, Shawn Heisey

Re: Solr 7. Why UpdateRequestProcessorChain does not take any effect?

2018-02-26 Thread Shawn Heisey
On 2/26/2018 5:57 AM, FiMko wrote: > Can you say how to download schema in Solr 7 world? Because the schema.xml > approach has been obsoleted. The filename has changed in recent examples.  The file will most likely be called managed-schema, not schema.xml.  The file may include a warning

Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Erick Erickson
What does "group by" mean on a field with more than one value? Say I have "A" and "B" in the field in a single document. What group does it go in, one labeled "A" or one labeled "B"? So IIUC, rather than do something which will be wrong it throws an error if the field is defined as multiValued.
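The ambiguity can be made concrete with a small sketch. This is not how Solr groups documents; it is a toy grouping function that takes one possible policy (put the document in every group it has a value for) and shows the consequence: group memberships no longer add up to the number of documents. The field name "cat" and the sample documents are made up for illustration.

```python
from collections import defaultdict


def group_by_field(docs, field):
    """Toy grouping: a document joins one group per value of the field."""
    groups = defaultdict(list)
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # treat a single value as a one-element list
        for value in values:
            groups[value].append(doc["id"])
    return dict(groups)


docs = [
    {"id": "1", "cat": ["A", "B"]},  # multivalued: which group is "the" group?
    {"id": "2", "cat": ["A"]},
]
grouped = group_by_field(docs, "cat")
memberships = sum(len(ids) for ids in grouped.values())
print(grouped, memberships, len(docs))
```

Any other policy (first value only, "most important" value, and so on) would be just as defensible, which is the point being made above.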

Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Vincenzo D'Amore
Hi Amrit, thanks for your help. I know that only 5/10% of documents in the collection have more than one value for the field I was trying to group by. So there isn't a particular memory usage in this case. Do you know if there is any other counter-indication I have to be aware of? I was

Re: Challenges of Indexing Email

2018-02-26 Thread Karthik Ramachandran
There is a bug report for this https://issues.apache.org/jira/browse/SOLR-11622 which is fixed for a future release. Before running into this issue we were running 6.4.2 which did not have this bug. On Mon, Feb 26, 2018 at 9:59 AM, Terry Steichen wrote: > I am using Solr 7.2.1

Re: Indexing timeout issues with SolrCloud 7.1

2018-02-26 Thread Shawn Heisey
On 2/23/2018 2:40 PM, Tom Peters wrote: > I included the last 25 lines from the logs from each of the five nodes during > that time period. Did you perhaps grep for "ERROR" and include the last 25 lines of that?  This excludes the vast majority of the information from the error, information that

Re: SOLR Score Range Changed

2018-02-26 Thread Shawn Heisey
On 2/23/2018 2:28 PM, Hodder, Rick wrote: > Combining everything into one query is what I'd prefer because as you said, > one would think that with everything in the same query, the score would > organize everything nicely. I don't recall writing anything like that.  How did you infer that from

Re: configure jetty to use both http1.1 and H2

2018-02-26 Thread Shawn Heisey
On 2/23/2018 1:28 PM, Jeff Dyke wrote: > Answering a bit of my own question, the underlying jetty would have to be > built with it, and get pushed into its jar directory. > > I think i'll put nginx in front of this, do a quick proxy forcing 1.1 and > move on, but if anyone knows any tricks, it'll

Re: StandardTokenizer and splitting on mixedcase strings

2018-02-26 Thread Shawn Heisey
On 2/23/2018 10:55 AM, Rick Leir wrote: > Lowercase filter before the tokenizer? Unless somebody invents a lowercasing CharFilter, which I don't think exists currently, that's not possible. Groups of Solr analysis components always run in the following order: First CharFilter entries are run.
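The ordering described here (CharFilters, then the tokenizer, then token filters) can be sketched as a toy pipeline. This is only an illustration of the ordering, not Solr or Lucene code: the case-splitting tokenizer is a hypothetical stand-in (it is not StandardTokenizer), and the lowercasing char filter is exactly the component the message says does not currently exist.

```python
import re


def analyze(text, char_filters, tokenizer, token_filters):
    """Run the stages in a fixed order: char filters, tokenizer, token filters."""
    for char_filter in char_filters:
        text = char_filter(text)        # operates on the raw character stream
    tokens = tokenizer(text)            # turns text into tokens
    for token_filter in token_filters:
        tokens = token_filter(tokens)   # operates on tokens only
    return tokens


def split_on_case_change(text):
    """Hypothetical tokenizer that breaks words where the case changes."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+|\d+", text)


def lowercase_tokens(tokens):
    return [token.lower() for token in tokens]


# Lowercasing as a token filter runs after the split has already happened.
after = analyze("WiFi router", [], split_on_case_change, [lowercase_tokens])
# Lowercasing as an (imaginary) char filter runs before the tokenizer sees the text.
before = analyze("WiFi router", [str.lower], split_on_case_change, [])
print(after, before)
```

The two results differ, which is why the position of lowercasing in the chain matters and why the admin UI analysis page is useful for checking each stage.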

Re: At which solr version was "Managed-schema" set as default?

2018-02-26 Thread Shawn Heisey
On 2/23/2018 10:40 AM, BlackIce wrote: > My idea was, in order to avoid confusion, which has been arising when > people start to modify schema.xml,... is to set Solr back into > "schema-mode" by having the user issue a command like SED or AWK in order > to add the corresponding command line into

Re: Upgrading from 4.6 to 7.2.1 SolrJ ContentStreamUpdateRequest problem

2018-02-26 Thread Shawn Heisey
On 2/26/2018 5:05 AM, Gabi wrote: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at http://127.0.0.1:8983/solr/collection1: ERROR: [doc=...] > unknown field provincecodeuser > > or same error with unknown field 'iddoc' > > Error is because it's

Challenges of Indexing Email

2018-02-26 Thread Terry Steichen
I am using Solr 7.2.1 and trying to index (among other documents) individual emails and collected email threads.  Ideally, the indexing would parse the email messages into their constituent fields.  But, for my purposes, an acceptable alternative is to merely index the messages as unstructured

Re:FileDictionaryFactory:- pick source file from solr instead of zk config.

2018-02-26 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
A similar problem came up with learning to rank models, and was fixed by https://issues.apache.org/jira/browse/SOLR-11250 Maybe it can be useful. From: solr-user@lucene.apache.org At: 02/26/18 13:13:28 To: solr-user@lucene.apache.org Subject: FileDictionaryFactory:- pick source file from

Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Amrit Sarkar
Vincenzo, As I read the source code; SchemaField.java /** * Sanity checks that the properties of this field type are plausible * for a field that may be used to get a FieldCacheSource, throwing * an appropriate exception (including the field name) if it is not. * FieldType subclasses can

Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Vincenzo D'Amore
Hi, while trying to run a group query on a multivalue field I received this error: can not use FieldCache on multivalued field: true 400 4 org.apache.solr.common.SolrException org.apache.solr.common.SolrException can not use FieldCache on multivalued field:

FileDictionaryFactory:- pick source file from solr instead of zk config.

2018-02-26 Thread parthu
Hello there, I want to use Solr as a spelling server so I created one collection for spell check. My source txt file size is greater than 2MB, which is not supported by ZooKeeper (max znode size is 1MB), and the sourceLocation property of FileBasedSpellChecker is trying to find that file inside config

Re: Solr 7. Why UpdateRequestProcessorChain does not take any effect?

2018-02-26 Thread FiMko
Hi Shawn, Appreciate your help! Can you say how to download schema in Solr 7 world? Because the schema.xml approach has been obsoleted. BTW I have just commented out another chain "add-unknown-fields-to-the-schema", then restarted Solr. If I create a document updateRequestProcessorChain

Re: security authentication API via solrj?

2018-02-26 Thread Peter Sturge
Hi, Thanks for your response. I've done this using the 'raw' rest style, as I'm not familiar enough with the new solrj client. It would be quite nice to have a native solrj class for handling security mgt operations (add/delete users, roles etc.)..kind of like the

Upgrading from 4.6 to 7.2.1 SolrJ ContentStreamUpdateRequest problem

2018-02-26 Thread Gabi
We're trying to upgrade from an old standalone 4.6.0 installation to a new 7.2.1 one. After some work (of course it's not as easy as installing and changing the schema.xml and data-config.xml files) it seems to be working with the database but not with documents. Database information is loaded

Re: LTR and 'searching' a streaming expression result

2018-02-26 Thread Gintautas Sulskus
Thanks very much, Joel. On Fri, Feb 23, 2018 at 4:36 PM, Joel Bernstein wrote: > In the scenario you describe above the answer is no. That's because the > joins rely on the sort order of the result set and require exporting of the > entire result set. Both those requirements

Re: Solr Phrase Count : How to get count of a phrase in a text field solr

2018-02-26 Thread Emir Arnautović
For a start, you don’t have to store it. Also, is a 10-word shingle really needed? Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 24 Feb 2018, at 16:58, aneeshkappu wrote: >