Re: Solr Issue While indexing Data

2017-07-19 Thread rajat rastogi
Hi Shawn, the top output is as follows. Regards, Rajat

Re: Issues trying to boost phrase containing stop word

2017-07-19 Thread shamik
Hi Koji, I'm using a copy field to preserve the original term with stopword. It's mapped to titleExact. textExact definition:

Re: Issues trying to boost phrase containing stop word

2017-07-19 Thread Koji Sekiguchi
Hi Shamik, I'm sorry but I don't understand why you use KeywordRepeatFilter. I think it's normal to create separate fields to solve this kind of problem. Why don't you have another separate field which has ShingleFilter as I mentioned in the previous reply? Koji On 2017/07/20 12:13, shamik

Re: Issues trying to boost phrase containing stop word

2017-07-19 Thread shamik
Thanks Koji, I've tried KeywordRepeatFilterFactory, which keeps the original term, but the stopword filter in the analysis chain will remove it nonetheless. That's why I thought of creating a separate field devoid of stopwords/stemmers. Let me know if I'm missing something here.

Re: Highlighting words with special characters

2017-07-19 Thread Lasitha Wattaladeniya
Hi Ahmet, But I have NgramTokenizerFactory at the end of the indexing analyzer chain, so it should still tokenize the email address. How does this affect the highlighting? That's what I'm struggling to understand. Solr version: 4.10.4 Regards, Lasitha On 20 Jul 2017 08:28, "Ahmet Arslan"

Re: Returning unique values for suggestion

2017-07-19 Thread Zheng Lin Edwin Yeo
I am getting something similar to yours too, but I'm using Solr 6.5.1. "highlighting":{ "1":{ "content":["Incoming Call"]}, "2":{ "content":["Incoming Call"]}, "3":{ "content":["Outgoing Call"]}, "4":{ "content":["Outgoing Call"]},

Re: Get results in multiple orders (multiple boosts)

2017-07-19 Thread Susheel Kumar
Let me try to put an example for custom sort. On Wed, Jul 19, 2017 at 6:34 AM, Rick Leir wrote: > Luca, > You can pass a sort parameter in the query. User A could sort=date%20desc > and user b could sort=foofield%20asc. > > Maybe query functions can also help with this.

Re: Issues trying to boost phrase containing stop word

2017-07-19 Thread Koji Sekiguchi
Hi Shamik, How about using ShingleFilter which constructs token n-grams from a token stream? http://lucene.apache.org/core/6_6_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html As for "about dynamic block", ShingleFilter produces "about dynamic" and "dynamic block".
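The shingle idea in this reply can be illustrated with a minimal Python sketch. It is not Lucene's implementation: the function name `shingles` is hypothetical, whitespace splitting stands in for a real tokenizer, and the real ShingleFilter has further options (for example, it can also emit the single tokens).

```python
def shingles(tokens, size=2):
    """Return word n-grams ("shingles") of the given size from a token list."""
    # Slide a window of `size` tokens over the list and join each window.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


# Whitespace splitting is an assumption standing in for the analyzer's tokenizer.
tokens = "about dynamic block".split()
print(shingles(tokens))  # ['about dynamic', 'dynamic block']
```

Because the stop word "about" survives inside the shingle "about dynamic", a field built this way can match the phrase even when a stop filter would drop the single token.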

Re: Highlighting words with special characters

2017-07-19 Thread Ahmet Arslan
Hi, Maybe the name of UAX29URLEmailTokenizer is deceiving you? It does *not* tokenize URLs and emails. It recognises them and emits each one as a single token. Ahmet On Wednesday, July 19, 2017, 12:00:05 PM GMT+3, Lasitha Wattaladeniya wrote: Update, I changed the

DateRangeField and Timezone

2017-07-19 Thread Ulul
Hi everyone I'm trying to query on dates with time zone taken into account. I have the following document {"date" : "2016-12-31T04:15:00Z", "desc" : "winter time day before" } date being of type DateRangeField I would like to be able to perform a query based on local date. For instance the
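One way to approach a local-date query like the one described here is to convert the local day into a UTC range on the client before querying. The sketch below assumes a fixed UTC offset of +1 hour (a real application would use a proper time zone database to handle daylight saving), and the helper name `local_day_as_utc_range` is hypothetical. Whether the half-open range syntax is accepted for this field type should be checked against the Solr version in use.

```python
from datetime import date, datetime, time, timedelta, timezone


def local_day_as_utc_range(day, utc_offset_hours):
    """Return (start, end) UTC timestamps covering one local calendar day."""
    # Fixed offset is an assumption; it ignores daylight saving transitions.
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime.combine(day, time.min, tzinfo=tz)
    end = start + timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (
        start.astimezone(timezone.utc).strftime(fmt),
        end.astimezone(timezone.utc).strftime(fmt),
    )


start, end = local_day_as_utc_range(date(2016, 12, 31), 1)
# Inclusive start, exclusive end, so adjacent days do not overlap.
print("date:[%s TO %s}" % (start, end))
```

With an offset of +1, the document dated 2016-12-31T04:15:00Z falls inside the range computed for the local day 2016-12-31.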

Re: Copy field a source of copy field

2017-07-19 Thread Erick Erickson
OK, you'll need two fields pretty much for certain. The trick is getting _only_ genus names in the genus field. The simplest thing to do would be a straight copyField with a single keep word filter that contains a list of all the genera. That presupposes that the genera are disjoint sets from all

Issues trying to boost phrase containing stop word

2017-07-19 Thread Shamik Bandopadhyay
Hi, I'm trying to show titles with an exact query phrase match at the top of the results. That includes supporting stop words as part of the phrase. For example, if I'm using "about dynamic block", I expect the title "About Dynamic Blocks" to appear at the top. Since the title field uses

Re: 'ant test' gets stuck after aborting one run

2017-07-19 Thread Nawab Zada Asad Iqbal
Thanks Erick for the fix. Meanwhile, I had restarted the terminal, then the machine and cloned the repo again and then realized that the problematic status is somewhere else on the drive which I don't know. Nawab On Wed, Jul 19, 2017 at 12:57 PM, Erick Erickson wrote:

Re: 'ant test' gets stuck after aborting one run

2017-07-19 Thread Erick Erickson
This is often an issue with ivy, one of my least favorite "features" of Ivy. To cure it I delete all the *.lck files in my ivy cache. On my mac:

cd ~/.ivy2
find . -name "*.lck" | xargs rm

Best, Erick On Wed, Jul 19, 2017 at 11:21 AM, Nawab Zada Asad Iqbal wrote: > Hi > > > I
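For anyone who prefers a script to the shell pipeline, the same cleanup can be sketched in Python. The function name `remove_ivy_locks` is hypothetical, and the cache location `~/.ivy2` is taken from the message above; adjust it if your Ivy cache lives elsewhere.

```python
from pathlib import Path


def remove_ivy_locks(cache_dir):
    """Delete every *.lck file under cache_dir and return the removed paths."""
    root = Path(cache_dir).expanduser()
    # Collect first, then delete, so the directory walk is not disturbed.
    locks = [p for p in root.rglob("*.lck") if p.is_file()]
    for lock in locks:
        lock.unlink()
    return locks


# Usage (not run automatically, since it deletes files):
# for path in remove_ivy_locks("~/.ivy2"):
#     print("removed", path)
```

Only files ending in `.lck` are touched; downloaded artifacts in the cache are left alone.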

'ant test' gets stuck after aborting one run

2017-07-19 Thread Nawab Zada Asad Iqbal
Hi I stopped 'ant test' target before it finished, and now whenever I run it again, it is stuck at 'install-junit4-taskdef'. I have tried 'ant clean' but it didn't help. I guessed that it could be some locking thing in ivy or ant so I set ivy.sync to false in the common-build.xml "" I also

Re: Copy field a source of copy field

2017-07-19 Thread tstusr
Well, our documents consist of PDF files (between 20 and 200 pages). We capture the words of the whole file using the extract handler, which is why we have these fields: We capture species in all the PDF content (in the attr_content field). Species captured are used for ranking purposes. So,

Re: Solr Issue While indexing Data

2017-07-19 Thread Shawn Heisey
On 6/7/2017 5:10 AM, rajat.rast...@hindustantimes.com wrote: > My environment > > os :Ubuntu 14.04.1 LTS > java : Oracle HotSpot 1.8.0_121 > solr version :6.4.2 > cpu :16 cores > ram :124 gb Everybody seems to want different information from you. Here's my contribution: On the linux

Re: regarding cursorMark feature for deep pagination

2017-07-19 Thread suresh pendap
Erick, thanks for the link! -suresh On Wed, Jul 19, 2017 at 8:11 AM, Erick Erickson wrote: > Chris Hostetter has a writeup here that has a good explanation: > > https://lucidworks.com/2013/12/12/coming-soon-to-solr- >

Re: Getting IO Exception while Indexing

2017-07-19 Thread Walter Underwood
A 400 would not be a failure to connect. A 400 means that the client is sending a bad request. Look at the Solr logs. Most likely, the document is invalid. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 19, 2017, at 7:54 AM, Susheel Kumar

Re: regarding cursorMark feature for deep pagination

2017-07-19 Thread Erick Erickson
Chris Hostetter has a writeup here that has a good explanation: https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/ Best, Erick On Tue, Jul 18, 2017 at 10:00 PM, suresh pendap wrote: > Hi, > > This question is
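As a rough companion to the linked writeup, the client-side loop for cursor-based paging can be sketched as follows. The request starts with `cursorMark=*`, each response carries a `nextCursorMark`, and iteration ends when the returned mark equals the one that was sent. The `fetch_page` callable and the in-memory `fake_fetch` stand-in are hypothetical; a real client would issue the HTTP request (with a sort that includes the uniqueKey field) inside `fetch_page`.

```python
def iterate_with_cursor(fetch_page):
    """Yield all documents by following nextCursorMark until it stops changing.

    fetch_page(cursor_mark) is a caller-supplied function that runs the query
    with the given cursorMark and returns the parsed JSON response as a dict.
    """
    cursor = "*"  # the documented starting value for cursorMark
    while True:
        response = fetch_page(cursor)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            break  # an unchanged mark means there are no more results
        cursor = next_cursor


# In-memory stand-in for a Solr endpoint, for illustration only.
FAKE_PAGES = {
    "*": {"response": {"docs": [{"id": 1}, {"id": 2}]}, "nextCursorMark": "A"},
    "A": {"response": {"docs": [{"id": 3}]}, "nextCursorMark": "B"},
    "B": {"response": {"docs": []}, "nextCursorMark": "B"},
}


def fake_fetch(cursor_mark):
    return FAKE_PAGES[cursor_mark]


print([doc["id"] for doc in iterate_with_cursor(fake_fetch)])  # [1, 2, 3]
```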

Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Erick Erickson
Also, jstack can give you a full stack trace. On Wed, Jul 19, 2017 at 5:47 AM, Markus Jelsma wrote: > Oh of course, didn't think about it. Will do next time this happens (which > might take a few weeks since we purged the index). > > It could be merging indeed, but

Re: Getting IO Exception while Indexing

2017-07-19 Thread Susheel Kumar
Is that always a problem with the same documents, or is it random? If it is always the same documents, look at what is different in those docs. Usually I have seen these errors once in a while when SolrJ is unable to connect/communicate with Solr. On Wed, Jul 19, 2017 at 10:23 AM, subbarao

Getting IO Exception while Indexing

2017-07-19 Thread subbarao
Hi all, we have a SolrCloud setup with 2 shards. We are trying to index documents by taking JSON input, creating a SolrDocument, and pushing it to Solr through SolrJ. It then throws an exception saying *"SolrUpdate got error: IOException occured when talking to server"* and

Re: Returning unique values for suggestion

2017-07-19 Thread Walter Underwood
I was surprised to see duplicate suggestions coming from my 4.10.4 suggester. This is analyzing infix with terms loaded from the index. "titles_infix": { "chemistry": { "numFound": 10, "suggestions": [ { "term": "Chemistry", "weight": 5285, "payload": "" }, { "term": "Chemistry", "weight": 4548,
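If the duplicates cannot be avoided on the server side, one possible workaround is to collapse them in the client. The sketch below assumes the suggestion entries have the `term` and `weight` keys shown in the response above; the function name `unique_suggestions` and the third sample entry are hypothetical.

```python
def unique_suggestions(suggestions):
    """Keep one entry per term, preferring the highest weight."""
    best = {}
    for suggestion in suggestions:
        term = suggestion["term"]
        # Replace the stored entry only if this one has a higher weight.
        if term not in best or suggestion["weight"] > best[term]["weight"]:
            best[term] = suggestion
    # Dicts preserve insertion order, so terms stay in first-seen order.
    return list(best.values())


sample = [
    {"term": "Chemistry", "weight": 5285, "payload": ""},
    {"term": "Chemistry", "weight": 4548, "payload": ""},
    {"term": "Chemistry Lab", "weight": 100, "payload": ""},  # hypothetical entry
]
print([s["term"] for s in unique_suggestions(sample)])  # ['Chemistry', 'Chemistry Lab']
```

This only hides the duplicates from the user; it does not change how many entries the suggester returns, so the requested count may need to be larger than the number of suggestions actually displayed.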

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
Oh of course, didn't think about it. Will do next time this happens (which might take a few weeks since we purged the index). It could be merging indeed, but I don't understand why the scheduler would wait so long. Should it not schedule the same way when running for a long time as after a fresh start?

Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Mikhail Khludnev
You can get the stack from kill -3, jstack, or even from the Solr admin UI. Overall, this behavior looks like a typical heavy merge kicking off from time to time. On Wed, Jul 19, 2017 at 3:31 PM, Markus Jelsma wrote: > Hello, > > No i cannot expose the stack, VisualVM samples won't

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
Hello, No, I cannot expose the stack; VisualVM samples won't show it to me. I am not sure if they're about to sync all the time, but every 15 minutes some documents are indexed (3 - 4k). For some reason, index time does increase with latency / CPU usage. This situation runs fine for many

Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Mikhail Khludnev
> > The real distinction between busy and calm nodes is that busy nodes all > have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() as > second to fillBuffer(), what are they doing? Can you expose the stack deeper? Can they start to sync shards due to some reason? On Wed, Jul

Re: Solr Issue While indexing Data

2017-07-19 Thread Susheel Kumar
What are your current a) soft commit and hard commit settings? You can share them as they are in the config. And how are you committing? b) How much of the 124 GB is heap? c) How many documents are you adding that take so long, and approximately how many fields, including copy fields? Thnx On Wed, Jul 19, 2017 at

Re: Solr Issue While indexing Data

2017-07-19 Thread rajat rastogi
Hi Erik, Some Logs of solr 2017-07-19 08:14:09.104 INFO (qtp434091818-6937) [ x:cda] o.a.s.u.p.LogUpdateProcessorFactory [cda] webapp=/solr path=/update params={wt=javabin=2}{add=[54945918f81f4e218b994b75]} 0 24362730 2017-07-19 08:14:09.181 DEBUG (qtp434091818-10997) [ x:cda6m]

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
Hello, Not too much actually:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.55    0.00    0.25    0.03    0.95   88.22

Device:    tps  kB_read/s  kB_wrtn/s    kB_read    kB_wrtn
sda       3.26      78.34     218.67  188942841  527408404

These are

Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Rick Leir
Markus, What does iostat(1) tell you? Cheers -- Rick On July 19, 2017 5:35:32 AM EDT, Markus Jelsma wrote: >Hello, > >Another peculiarity here, our six node (2 shards / 3 replica's) cluster >is going crazy after a good part of the day has passed. It starts >eating

Re: Get results in multiple orders (multiple boosts)

2017-07-19 Thread Rick Leir
Luca, You can pass a sort parameter in the query. User A could sort=date%20desc and user b could sort=foofield%20asc. Maybe query functions can also help with this. Cheers -- Rick On July 19, 2017 4:39:59 AM EDT, Luca Dall'Osto wrote: >Hello,The problem of
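The per-user sort suggested in this reply amounts to building a different `sort` parameter for each request. A minimal sketch, assuming a hypothetical Solr base URL and collection name, and using only the Python standard library:

```python
from urllib.parse import quote, urlencode


def select_url(base, query, sort):
    """Build a /select URL with the given query and sort clause."""
    params = {"q": query, "sort": sort}
    # quote_via=quote encodes spaces as %20, matching the examples in the reply.
    return base + "/select?" + urlencode(params, quote_via=quote)


# The host and collection name below are placeholders, not from the thread.
BASE = "http://localhost:8983/solr/mycollection"
print(select_url(BASE, "*:*", "date desc"))     # user A
print(select_url(BASE, "*:*", "foofield asc"))  # user B
```

The sort clause is the only thing that changes between users, so the index itself stays the same.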

Returning unique values for suggestion

2017-07-19 Thread Zheng Lin Edwin Yeo
Hi, Is there any configuration that we can set for the /suggest handler, so that the suggestion output will only return unique records, and not duplicated? Below is my /suggest handler. all json true content 100 id, score on content true false html 100 204800 true

6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
Hello, Another peculiarity here: our six-node (2 shards / 3 replicas) cluster is going crazy after a good part of the day has passed. It starts eating CPU for no good reason and its latency goes up. Grafana graphs show the problem really well. After restarting 2/6 nodes, there is also quite a

Re: Solr Issue While indexing Data

2017-07-19 Thread rajat rastogi
Hi Erick, Thanks for your reply. I tried the solution given, but it did not work. Please help me narrow down the problem. Please let me know if any more inputs are required from my end, viz. schema, configs, etc. Can this problem be related to GC? Regards, Rajat

Re: Highlighting words with special characters

2017-07-19 Thread Lasitha Wattaladeniya
Update, I changed the UAX29URLEmailTokenizerFactory to StandardTokenizerFactory and now it shows highlighted text fragments in the indexed email text. But I don't understand this behavior. Can someone shed some light on this, please? On 18 Jul 2017 14:18, "Lasitha Wattaladeniya"

Re: Get results in multiple orders (multiple boosts)

2017-07-19 Thread Luca Dall'Osto
Hello, The problem with building an index is that each user has a custom source order and category order: these are not static orders (for example, user X could have category:5 as the most important category but user Y could have category:9 as the most important). Has anyone ever written a custom sort function in