RE: How to Apply 'implicit' routing in exist collection in solr 6.1.0

2017-04-04 Thread Ketan Thanki
Thanks Anshum, I have got some understanding of it, and I need to implement implicit routing to insert and retrieve documents from a specific shard based on the id, which I have used as the router field. I have tried making changes in core.properties, but it doesn't work. So can you please let me

Re: Number of shards - Best practice

2017-04-04 Thread Walter Underwood
> On Apr 4, 2017, at 7:38 PM, Muhammad Imad Qureshi > wrote: > > Hi > I was recently told that ideally the number of shards in a SOLR cluster > should be equal to a power of 2. If this is indeed a best practice, then what > is the rationale behind this

Solrj HttpSolrServer retryHandler

2017-04-04 Thread Lasitha Wattaladeniya
Hi folks, Is there an API to implement a retryHandler in HttpSolrServer? I'm using SolrJ 4.10.4. Lasitha Wattaladeniya Software Engineer Mobile : +6593896893 Blog : techreadme.blogspot.com
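SolrJ's own retry options are not shown here. As a fallback that does not depend on any particular SolrJ API, a hypothetical application-level wrapper can retry any call (for example, a query made through the SolrJ client) a fixed number of times with a growing pause. The class and method names are illustrative only:

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical application-level retry helper; not part of SolrJ.
 * Wraps any call and retries it with a simple linear backoff.
 */
final class RetryingCall {

    private RetryingCall() {}

    static <T> T withRetries(Callable<T> call, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
                if (attempt < maxAttempts && backoffMillis > 0) {
                    Thread.sleep(backoffMillis * attempt); // wait longer after each failure
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = withRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new java.io.IOException("simulated transient failure");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying blindly is only safe for idempotent requests; re-sending a document add with the same id simply overwrites it, but other operations may need more care.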

Number of shards - Best practice

2017-04-04 Thread Muhammad Imad Qureshi
Hi, I was recently told that ideally the number of shards in a SOLR cluster should be equal to a power of 2. If this is indeed a best practice, then what is the rationale behind this recommendation? Thanks, Imad
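One way to reason about the question: with hash-based routing, the 32-bit hash space is cut into one contiguous range per shard, and any shard count yields ranges whose sizes differ by at most one hash value. The sketch below assumes that model; it illustrates the idea and is not Solr's own partitioning code:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: split the signed 32-bit hash space into N contiguous, near-equal ranges. */
final class HashRangeSketch {

    /** One contiguous, inclusive range of hash values. */
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        long size() {
            return (long) max - (long) min + 1L;
        }

        @Override
        public String toString() {
            return Integer.toHexString(min) + "-" + Integer.toHexString(max);
        }
    }

    /** Divide [Integer.MIN_VALUE, Integer.MAX_VALUE] into numShards ranges. */
    static List<Range> partition(int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        long total = 1L << 32;              // number of distinct 32-bit hash values
        long base = total / numShards;      // every range gets at least this many values
        long remainder = total % numShards; // the first 'remainder' ranges get one extra
        List<Range> ranges = new ArrayList<>(numShards);
        long start = Integer.MIN_VALUE;
        for (int i = 0; i < numShards; i++) {
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size - 1;
            ranges.add(new Range((int) start, (int) end));
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 3, 4, 6}) {
            List<Range> ranges = partition(n);
            long smallest = Long.MAX_VALUE;
            long largest = 0;
            for (Range r : ranges) {
                smallest = Math.min(smallest, r.size());
                largest = Math.max(largest, r.size());
            }
            System.out.println(n + " shards: " + ranges + " (sizes differ by " + (largest - smallest) + ")");
        }
    }
}
```

One plausible origin of the power-of-two advice is shard splitting: splitting a shard halves its range, so repeatedly doubling from a power of two keeps every range the same size.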

RE: Solr performance issue on indexing

2017-04-04 Thread Allison, Timothy B.
> Also we will try to decouple tika to solr. +1 -Original Message- From: tstusr [mailto:ulfrhe...@gmail.com] Sent: Friday, March 31, 2017 4:31 PM To: solr-user@lucene.apache.org Subject: Re: Solr performance issue on indexing Hi, thanks for the feedback. Yes, it is about OOM, indeed

RE: JSON facet bucket list not correct with sharded query

2017-04-04 Thread Karthik Ramachandran
Since the attachment was removed sending the code. import java.util.List; import java.util.Random; import java.util.UUID; import org.apache.solr.client.solrj.SolrRequest.METHOD; import org.apache.solr.client.solrj.impl.HttpSolrClient; import org.apache.solr.client.solrj.response.QueryResponse;

JSON facet bucket list not correct with sharded query

2017-04-04 Thread Karthik Ramachandran
We are using JSON facet to list files that are duplicates (mincount: 2) in pages; after 2-3 pages we don't get any results even though there are more results. Schema: Query:

Re: edismax parsing confusion

2017-04-04 Thread Greg Pendlebury
Try declaring your mm as 1 then and see if that assumption is correct. Default 'mm' values are complicated to describe and depend on a variety of factors. Generally if you want it to be a certain value, just declare it. On 5 April 2017 at 02:07, Abhishek Mishra wrote: >
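Greg's advice is to declare mm explicitly rather than rely on a default. To make the common spec forms concrete, here is a simplified sketch of how a spec turns into a required-clause count for a given number of optional clauses. It covers only plain integers and percentages (positive or negative), truncating percentages toward zero; conditional specs such as "2<-25%" are deliberately left out, and this is an illustration rather than Solr's parser:

```java
/** Simplified sketch of minimum-should-match ("mm") arithmetic for simple specs. */
final class MinShouldMatchSketch {

    static int requiredClauses(int optionalClauses, String spec) {
        String s = spec.trim();
        int result;
        if (s.endsWith("%")) {
            int percent = Integer.parseInt(s.substring(0, s.length() - 1));
            // Apply the percentage to the clause count and truncate toward zero.
            int portion = (int) (optionalClauses * percent / 100.0);
            // A negative percentage is the share of clauses that may be missing.
            result = percent < 0 ? optionalClauses + portion : portion;
        } else {
            int n = Integer.parseInt(s);
            // A negative integer is the number of clauses that may be missing.
            result = n < 0 ? optionalClauses + n : n;
        }
        // Never negative, never more than the clauses that exist.
        return Math.max(0, Math.min(result, optionalClauses));
    }

    public static void main(String[] args) {
        for (String spec : new String[] {"1", "-1", "75%", "-25%", "100%"}) {
            System.out.println("mm=" + spec + " with 3 optional clauses -> "
                    + requiredClauses(3, spec) + " required");
        }
    }
}
```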

Re: Solr Shingle is not working properly in solr 6.5.0

2017-04-04 Thread Steve Rowe
Hi Aman, I’ve created for this problem. -- Steve www.lucidworks.com > On Mar 31, 2017, at 7:34 AM, Aman Deep Singh > wrote: > > Hi Rich, > Query creation is correct only thing what causing the problem is that >

Problems creating index for suggestions

2017-04-04 Thread Alexis Aravena Silva
Hi, I'm creating an index for suggestions. When I rebuild the index with 8 documents, Solr creates a temp file that consumes over 20GB in the process, and it takes more than 10 minutes to reindex. What is the problem? It's illogical that Solr takes so long and consumes so much of my disk:

Re: Problem starting solr 6.5

2017-04-04 Thread wlee
Thanks. I ran chmod 777 on the solr directory and I can start solr 6.5 now.

Re: Fq and termfrequency are not showing the correct results

2017-04-04 Thread Erick Erickson
Functions like termfreq operate on single terms post-analysis. Since it's an analyzed field, you have no _term_ "bachelor's degree" or even "bachelor degree" in the field. You have two terms, "bachelor" and "degree". This also assumes that by "zero results" you mean you get no frequency information
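To see Erick's point, here is a toy analyzer that lowercases, splits on non-letters, and strips a trailing possessive "'s". Those steps are an assumption for illustration; the real terms depend on the field type's analysis chain in the schema. It shows that the indexed terms are "bachelor" and "degree", so a single-term function can never match "bachelor's degree":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy analysis chain (assumed steps, not a real Solr field type). */
final class ToyAnalysis {

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on anything that is not a letter or an apostrophe.
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            String term = raw;
            if (term.endsWith("'s")) {
                term = term.substring(0, term.length() - 2); // strip the possessive
            }
            term = term.replace("'", "");
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    /** Count how often one already-analyzed term occurs. */
    static int termFreq(List<String> terms, String term) {
        int count = 0;
        for (String t : terms) {
            if (t.equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> terms = analyze("Bachelor's degree is easier to get");
        System.out.println(terms);
        System.out.println(termFreq(terms, "bachelor"));          // a real indexed term
        System.out.println(termFreq(terms, "bachelor's degree")); // never a single term
    }
}
```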

Re:solr learning_to_rank (normalizer) unmatched argument type issue

2017-04-04 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Jianxiong, Thanks for reporting this. I think this is a bug and have filed https://issues.apache.org/jira/browse/SOLR-10421 ticket for fixing it. Regards, Christine - Original Message - From: solr-user@lucene.apache.org To: solr-user@lucene.apache.org At: 03/31/17 23:19:27 Hi,

Problem with multi-valued field using Solr CEL

2017-04-04 Thread Charlie Hubbard
So I'm trying to index documents using Solr CEL and Tika on Solr 5.4.1. I'm using the default configuration, but when I import my docs I'm getting this error: 125973 INFO (qtp840863278-17) [ x:fusearchiver] o.a.s.c.PluginBag Going to create a new requestHandler with {type = requestHandler,name

Re: edismax parsing confusion

2017-04-04 Thread Abhishek Mishra
Hello guys, sorry for the late response. @steve I am using Solr 5.2. @greg I am using the default mm from the config file (as I understand it, the default mm is 1). Regards, Abhishek On Tue, Apr 4, 2017 at 5:27 AM, Greg Pendlebury wrote: > eDismax uses 'mm', so knowing what that

Re: Phrase Fields performance

2017-04-04 Thread David Hastings
FYI, I think I managed to get back the results and the speeds I wanted by reducing the number of fields in the qf/pf values from 6 to 4, making sure not to boost the default field, and reducing the boost values to much smaller numbers that are still significant enough to boost properly, so

Re: How to Apply 'implicit' routing in exist collection in solr 6.1.0

2017-04-04 Thread Anshum Gupta
Hi Ketan, I just want to be sure about your understanding of the 'implicit' router. Implicit router in Solr puts the onus of correctly routing the documents on the user, instead of 'implicitly' or automatically routing them. -Anshum On Tue, Apr 4, 2017 at 2:01 AM Ketan Thanki
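Since the implicit router leaves the shard choice to the client, the application needs its own deterministic rule for picking a shard. A minimal sketch, assuming shards named shard1..shard4 and a hash-of-id rule; both the naming and the rule are hypothetical choices, not something Solr prescribes:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical client-side routing rule for a collection using the implicit router. */
final class ClientSideRouting {

    /** Pick a shard name deterministically from a document id. */
    static String shardFor(String id, List<String> shardNames) {
        if (shardNames.isEmpty()) {
            throw new IllegalArgumentException("no shards configured");
        }
        // String.hashCode() is specified by the JDK, so this mapping is stable across JVMs.
        int index = Math.floorMod(id.hashCode(), shardNames.size());
        return shardNames.get(index);
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("shard1", "shard2", "shard3", "shard4");
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            System.out.println(id + " -> " + shardFor(id, shards));
        }
    }
}
```

The same rule has to be applied when reading a document back, and changing the shard list remaps existing ids, which is exactly the bookkeeping the implicit router hands to the user.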

Re: Using function queries for faceting

2017-04-04 Thread Mikhail Khludnev
Exclude users' products, calculate the default price facet, then facet only the user's products (in a main query) and sum the facet counts. It can probably be done by switching domains in JSON facets. On Tue, Apr 4, 2017 at 5:43 PM, Georg Sorst wrote: > Hi Mikhail, > > copying
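The last step of Mikhail's suggestion is summing facet counts from two requests. Assuming each request's price buckets have already been parsed into a map from bucket label to count (the parsing is not shown, and the bucket labels are hypothetical), the merge itself is small:

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: add two facet result maps together bucket by bucket. */
final class FacetCountMerge {

    static Map<String, Long> sum(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> merged = new TreeMap<>(a);
        for (Map.Entry<String, Long> e : b.entrySet()) {
            // A bucket missing from one side simply counts as zero there.
            merged.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> defaultPrices = new TreeMap<>();
        defaultPrices.put("[0,10)", 5L);
        defaultPrices.put("[10,20)", 3L);
        Map<String, Long> userPrices = new TreeMap<>();
        userPrices.put("[0,10)", 1L);
        userPrices.put("[20,30)", 2L);
        System.out.println(sum(defaultPrices, userPrices));
    }
}
```

The sum is only correct if the two requests cover disjoint sets of documents, which is why the first request excludes the user's products.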

Re: Using function queries for faceting

2017-04-04 Thread Georg Sorst
Hi Mikhail, copying the default field was my first attempt as well - however, the system in total has over 50.000 users which may have an individual price on every product (even though they usually don't). Still, with the copying approach this results in every document having 50.000 price fields.

RE: Solr 6.x leaking one SolrZkClient instance per second?

2017-04-04 Thread Markus Jelsma
Opened: https://issues.apache.org/jira/browse/SOLR-10420 Thanks, Markus -Original message- > From:Shalin Shekhar Mangar > Sent: Tuesday 4th April 2017 16:11 > To: solr-user@lucene.apache.org > Subject: Re: Solr 6.x leaking one SolrZkClient instance per

Re: Solr 6.x leaking one SolrZkClient instance per second?

2017-04-04 Thread Shalin Shekhar Mangar
Please open a Jira issue. Thanks! On Tue, Apr 4, 2017 at 7:16 PM, Markus Jelsma wrote: > Hi, > > One of our nodes went berserk after a restart, Solr went completely nuts! > So I opened VisualVM to keep an eye on it and spotted a different problem > that occurs in

Re: Solr Cloud 6.5.0 Replicas go down while indexing

2017-04-04 Thread Shawn Heisey
On 4/3/2017 7:52 AM, Salih Sen wrote: > We have a three-server setup with each server having 756G RAM, 48 > cores, 4 SSDs (each having three Solr instances on them) and a dedicated > mechanical disk for ZooKeeper (3 zk instances total). Each Solr > instance has 31G of heap space allocated to

Re: Solr Cloud 6.5.0 Replicas go down while indexing

2017-04-04 Thread Michael Joyner
Try increasing the number of connections your ZooKeeper allows to a very large number. On 04/04/2017 09:02 AM, Salih Sen wrote: Hi, One of the replicas went down again today somehow disabling all updates to cluster with error message "Cannot talk to ZooKeeper - Updates are disabled." half

Re: How to Apply 'implicit' routing in exist collection in solr 6.1.0

2017-04-04 Thread Shawn Heisey
On 4/4/2017 3:00 AM, Ketan Thanki wrote: > Need the help for how to apply 'implicit' routing in existing > collections. e.g : I have configure the 2 collections with each has 4 > shard and 4 replica so what changes should i do for apply ' implicit' > routing. Make a new collection. Or delete the

Solr 6.x leaking one SolrZkClient instance per second?

2017-04-04 Thread Markus Jelsma
Hi, One of our nodes went berserk after a restart, Solr went completely nuts! So I opened VisualVM to keep an eye on it and spotted a different problem that occurs in all our Solr 6.4.2 and 6.5.0 nodes. It appears Solr is leaking one SolrZkClient instance per second via

Implementing DIH - Using a non-datetime change tracking column to Identify delta

2017-04-04 Thread subinalex
Hi Experts, Can we use a non-datetime column to identify delta rows in the deltaQuery of a DIH configuration? For example, in the deltaQuery below, deltaQuery="select ID from category where last_modified > '${dih.last_index_time}'" the delta rows are picked when the last_modified datetime is
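The general idea behind a non-datetime delta is a watermark: keep the highest value seen of a monotonically increasing change-tracking column and select rows above it on the next run. A minimal sketch in plain Java (not DIH configuration); the changeVersion column and the in-memory rows are hypothetical stand-ins for the database table:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical watermark-based delta detection on a numeric column. */
final class NumericDelta {

    /** A row id plus a change counter assumed to grow on every insert/update. */
    static final class Row {
        final String id;
        final long changeVersion;

        Row(String id, long changeVersion) {
            this.id = id;
            this.changeVersion = changeVersion;
        }
    }

    /** Ids changed since the last run, plus the watermark to persist for the next run. */
    static final class DeltaResult {
        final List<String> changedIds;
        final long newWatermark;

        DeltaResult(List<String> changedIds, long newWatermark) {
            this.changedIds = changedIds;
            this.newWatermark = newWatermark;
        }
    }

    /** In-memory equivalent of: select ID from category where change_version > :lastWatermark */
    static DeltaResult findDelta(List<Row> rows, long lastWatermark) {
        List<String> changed = new ArrayList<>();
        long highest = lastWatermark;
        for (Row row : rows) {
            if (row.changeVersion > lastWatermark) {
                changed.add(row.id);
                highest = Math.max(highest, row.changeVersion);
            }
        }
        return new DeltaResult(changed, highest);
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row("a", 5), new Row("b", 12), new Row("c", 9));
        DeltaResult first = findDelta(rows, 8);
        System.out.println(first.changedIds + " next watermark " + first.newWatermark);
        DeltaResult second = findDelta(rows, first.newWatermark);
        System.out.println(second.changedIds + " next watermark " + second.newWatermark);
    }
}
```

This relies on the column only ever increasing; a counter that can be reset or reused would silently skip changed rows.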

Re: Using function queries for faceting

2017-04-04 Thread Mikhail Khludnev
Hello Georg, You can probably use {!frange} and a few facet.query params enumerating price ranges, but it's probably easier to just copy the default price across all empty price groups at index time. On Tue, Apr 4, 2017 at 1:14 PM, Georg Sorst wrote: > Hi list! > > My
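A sketch of the {!frange} idea: one facet.query per price bucket, each filtering on the group price with a fallback to the default price through the def() function. Field names come from Georg's example; the bucket edges are made up, and each generated string would be sent as a separate facet.query parameter:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: build facet.query strings that bucket a group price with a default fallback. */
final class PriceRangeFacetQueries {

    static List<String> facetQueries(String groupField, String defaultField, double[] edges) {
        // def(a,b) uses field a when the document has a value for it, otherwise b.
        String effectivePrice = "def(" + groupField + "," + defaultField + ")";
        List<String> queries = new ArrayList<>();
        for (int i = 0; i + 1 < edges.length; i++) {
            // incu=false makes each bucket [lower, upper), so adjacent buckets don't overlap.
            queries.add("{!frange l=" + edges[i] + " u=" + edges[i + 1] + " incu=false}"
                    + effectivePrice);
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String q : facetQueries("price_group1", "price_default", new double[] {0, 10, 20})) {
            System.out.println("facet.query=" + q);
        }
    }
}
```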

Re: Solr Cloud 6.5.0 Replicas go down while indexing

2017-04-04 Thread Salih Sen
Hi, One of the replicas went down again today, somehow disabling all updates to the cluster for half an hour with the error message "Cannot talk to ZooKeeper - Updates are disabled." The ZK leader was on the same server as the Solr instance, so I doubt it has anything to do with the network (at least between Solr and

Re: Problem starting solr 6.5

2017-04-04 Thread Rick Leir
Looks like a file permissions problem to me. On April 3, 2017 10:42:15 PM EDT, wlee wrote: >Try to start solr and get this error message. What is the problem ? > > >$ bin/solr start > >Exception in thread "main" java.nio.file.AccessDeniedException:

Getting counts for JSON facet percentiles

2017-04-04 Thread Georg Sorst
Hi list! Is it possible to get counts for the JSON facet percentiles? Of course I could trivially calculate them myself, they are percentiles after all, but there are cases where these may be off by one such as calculating the 50th percentile / median over 3 results. Thanks and best, Georg
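Georg's off-by-one concern is easy to see with three values: the median is the middle value itself, so a count for the 50th percentile depends on whether values equal to it are included. A minimal sketch that counts values below, equal to, and above a given percentile value (the input values are made up):

```java
/** Sketch: count values below, at, and above a percentile value. */
final class PercentileCounts {

    /** Returns {below, equal, above}. */
    static long[] countsAround(double[] values, double percentileValue) {
        long below = 0;
        long equal = 0;
        long above = 0;
        for (double v : values) {
            if (v < percentileValue) {
                below++;
            } else if (v == percentileValue) {
                equal++;
            } else {
                above++;
            }
        }
        return new long[] {below, equal, above};
    }

    public static void main(String[] args) {
        double[] values = {4.0, 7.0, 9.0};
        double median = 7.0; // the middle of three values
        long[] c = countsAround(values, median);
        System.out.println("below=" + c[0] + " equal=" + c[1] + " above=" + c[2]);
    }
}
```

Half of three documents is 1.5, so "count up to the median" is either 1 (strictly below) or 2 (at or below); neither is exactly half.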

Re: Do streaming expressions support range facets?

2017-04-04 Thread Joel Bernstein
The facet expression, which uses the json facet API, currently does not support range facets. So currently you would have to use the json facet API directly to do range facets. The facet expression will support range facets in the near future though. There is a ticket open which adds date
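Until the facet expression supports ranges, the request goes to the JSON Facet API directly. A minimal sketch of building the json.facet parameter for a range facet; the facet name, the field name "price", and the bounds are hypothetical:

```java
/** Sketch: build a json.facet value for a range facet. */
final class RangeFacetJson {

    static String rangeFacet(String facetName, String field, long start, long end, long gap) {
        return "{\"" + facetName + "\":{"
                + "\"type\":\"range\","
                + "\"field\":\"" + field + "\","
                + "\"start\":" + start + ","
                + "\"end\":" + end + ","
                + "\"gap\":" + gap
                + "}}";
    }

    public static void main(String[] args) {
        System.out.println("json.facet=" + rangeFacet("prices", "price", 0, 100, 20));
    }
}
```

The names are not escaped, so this only suits simple identifiers; a JSON library is the safer choice for anything user-supplied.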

Using function queries for faceting

2017-04-04 Thread Georg Sorst
Hi list! My documents are eCommerce items. They may have a special price for a certain group of users, but not for other groups of users; in that case the default price should be used. So the documents look something like this: item: id: 1 price_default: 11.5 price_group1: 11.2 item:

Fwd: Fq and termfrequency are not showing the correct results

2017-04-04 Thread Ayush Gupta
Hi Everyone, I have a document that contains data like "Bachelor's degree is easier to get" in the 'body' field, and I am querying this field for the phrase 'Bachelor's degree' like this: query?fq=body:"bachelor%27s%

How to Apply 'implicit' routing in exist collection in solr 6.1.0

2017-04-04 Thread Ketan Thanki
Hi, I need help with how to apply 'implicit' routing to existing collections. E.g.: I have configured 2 collections, each with 4 shards and 4 replicas, so what changes should I make to apply 'implicit' routing? Please help with some examples. Regards, Ketan.

Re: Solr Cloud 6.5.0 Replicas go down while indexing

2017-04-04 Thread Salih Sen
Hi, Sorry for the hurried initial mail; here are some corrections and further explanation: The problem I described previously was happening before we set the zkClientTimeout value, so it was 3 when it happened. autoCommit maxTime is 15000 and autoSoftCommit maxTime is 6. We recently

Re: Problem starting solr 6.5

2017-04-04 Thread Erick Erickson
Looks like a permissions issue. Best, Erick On Mon, Apr 3, 2017 at 7:42 PM, wlee wrote: > Try to start solr and get this error message. What is the problem ? > > > $ bin/solr start > > Exception in thread "main" java.nio.file.AccessDeniedException: >

Re: How to Insert and retrieve data from specific shard in Solr 6.1.0

2017-04-04 Thread Erick Erickson
Why? By its nature, SolrCloud usually doesn't care about what shard a document came from. Unless you use "implicit" routing, even _you_ don't know what shard the doc landed on. But if you insist, address the request to a particular _replica_ that happens to belong to the shard and add =false to
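For comparison, a sketch of two request shapes that target a shard by name: the shards parameter restricts a query to the listed shards, and on a collection using the implicit router the _route_ parameter names the shard an update should go to. Host, collection, and shard names are placeholders:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Sketch: build request URLs that target one shard by name. */
final class ShardTargetedUrls {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /** Query only the named shard(s) through the "shards" parameter. */
    static String queryUrl(String baseUrl, String collection, String q, String shardNames) {
        return baseUrl + "/" + collection + "/select?q=" + enc(q) + "&shards=" + enc(shardNames);
    }

    /** Update URL whose documents the implicit router places on the named shard. */
    static String updateUrl(String baseUrl, String collection, String shardName) {
        return baseUrl + "/" + collection + "/update?_route_=" + enc(shardName);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        System.out.println(queryUrl(base, "mycollection", "*:*", "shard1"));
        System.out.println(updateUrl(base, "mycollection", "shard1"));
    }
}
```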

Re: Problem starting solr 6.5

2017-04-04 Thread Yasufumi Mizoguchi
Hi, I think you should check the permission of /usr/local/solr-6/solr-6.5.0/server/log (maybe, you do not have write permission on the directory) regards, Yasufumi On 2017/04/04 11:42, wlee wrote: Try to start solr and get this error message. What is the problem ? $ bin/solr start
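Following Yasufumi's suggestion, a quick way to confirm the diagnosis before changing permissions is to test whether the current user can write to the directory in question. A minimal sketch using java.nio.file; the directory is passed on the command line:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: check whether the current user can write to a directory. */
final class WritableCheck {

    static boolean canWrite(Path dir) {
        return Files.isDirectory(dir) && Files.isWritable(dir);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java WritableCheck <directory>");
            return;
        }
        Path dir = Paths.get(args[0]);
        System.out.println(dir + (canWrite(dir) ? " is writable" : " is NOT writable by this user"));
    }
}
```

Run it as the same user that runs bin/solr start, since permissions are checked for the JVM's user.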

Re: Using Tesseract OCR to extract PDF files in EML file attachment

2017-04-04 Thread Rick Leir
Tesseract prolly knows nothing of the EML format. Your scripts could pull EML's apart. On April 4, 2017 2:00:19 AM EDT, Zheng Lin Edwin Yeo wrote: >Hi, > >Currently, I am able to extract scanned PDF images and index them to >Solr >using Tesseract OCR, although the speed

How to Insert and retrieve data from specific shard in Solr 6.1.0

2017-04-04 Thread Ketan Thanki
Hi, Please help with the query below. I need to insert/update data in a specific shard and also retrieve data from a specific shard in Solr 6.1.0. Please let me know the configuration/code changes required for the existing Solr collections. Regards,

Using Tesseract OCR to extract PDF files in EML file attachment

2017-04-04 Thread Zheng Lin Edwin Yeo
Hi, Currently, I am able to extract scanned PDF images and index them to Solr using Tesseract OCR, although the speed is very slow. However, for EML files with PDF attachments that consist of scanned images, the Tesseract OCR is not able to extract the text from those PDF attachments. Can we