Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Erick, as per your advice I used cursorMarks (see code below). It was slightly better, but Solr throws exceptions randomly. Please look at the code and stack trace below. 2015-09-26 01:00:45 INFO [a.b.c.AdhocCorrectUUID] - Indexed 500/1453133 2015-09-26 01:00:49 INFO [a.b.c.AdhocCorrectUUID] -

Re: firstSearcher cache warming with own QuerySenderListener

2015-09-25 Thread Erick Erickson
That's what the firstSearcher event in solrconfig.xml is for, exactly the case of autowarming Solr when it's just been started. The queries you put in that event are fired only when the server starts. So I'd just put my queries there. And you do not have to put a zillion queries here. Start with

Expensive GC Remark Phase for JNI Weak Reference

2015-09-25 Thread Keith L
Using: - JDK 1.8u40 - UseG1GC, ParallelRefProcEnabled, Xmx12g, Xms12g - Solr 4.10.4 When using G1GC we are seeing very high processing times in the GC Remark phase during reference processing. Originally we saw high times during WeakReference processing but adding "-XX:+ParallelRefProcEnabled"

Re: How to know index file in OS Cache

2015-09-25 Thread Gili Nachum
Gonna try Mikhail's suggestion, but just for fun you can also empirically "test" for how much of a file is in the OS cache with: time cat <file> > /dev/null The faster it completes, the more blocks are cached. You can take a baseline after manually purging the cache - don't recall the
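
The timing trick above can be approximated in Python. This is only a rough sketch: the function name and chunk size are made up for illustration, and the elapsed time is an indirect hint about cached blocks, not a measurement of them.

```python
import time


def time_sequential_read(path, chunk_size=1 << 20):
    """Read `path` from start to end, discard the data, and return
    (bytes_read, elapsed_seconds). Roughly what timing `cat` into
    /dev/null does."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Running it twice on the same index file and comparing the two timings gives a crude before/after picture; a much faster second run suggests the blocks were served from the page cache.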

Re: Help for Highlights

2015-09-25 Thread Erick Erickson
You're only returning the "submissaoid" and "tituloprojeto" fields (along with score), and dismax is probably searching across other fields (I can't tell from the fragment, it'll be the parameters set up in solrconfig.xml, the select handler). Add &debug=all to the query and you'll see all the fields

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-25 Thread Alessandro Benedetti
Clear ! Now I understand the current situation. Hope the issue will be fixed soon and the conference is recorded, good luck! Cheers 2015-09-25 15:22 GMT+01:00 Yonik Seeley : > On Fri, Sep 25, 2015 at 5:07 AM, Alessandro Benedetti > wrote: > >

Re: Help on autocomplete / suggester

2015-09-25 Thread Alessandro Benedetti
Hi Andrea, really curious, I found the province where I was born in the Solr mailing list :) Apart from that, based on your requirements, it's not possible to use any suggester. You should definitely design a new Solr collection (core) for your requirements. Would be quite easy to provide those

firstSearcher cache warming with own QuerySenderListener

2015-09-25 Thread Christian Reuschling
Hey all, we want to avoid cold start performance issues when the caches are cleared after a server restart. For this, we have written a SearchComponent that saves least recently used queries. These are written to a file inside a closeHook of a SolrCoreAware at server shutdown. The plan is to

Re: Different ports for search and upload request

2015-09-25 Thread Uwe Reh
On 25.09.2015 at 00:05, Siddhartha Singh Sandhu wrote: *Never did this. *But how about this crazy idea: Take an Amazon EFS and share it between 2 EC2. I think you are on the right track. IMHO this requirement should be solved externally. Option 1: Hide your Solr node behind an HTTP proxy

Using a plugin to filter in schema.xml

2015-09-25 Thread Siddhartha Singh Sandhu
Hi, I wanted to use the twitter-text library's GitHub implementation to filter the tokens (hashtags) in my text. I know I can use the Pattern Matching tokenizer also, but would trust Twitter's library more than my own regex to do the job for me. I wanted to use it in unison with the

Re: recovering mode loop

2015-09-25 Thread Erick Erickson
On a quick look at the replica jstack (the leader didn't come through in text form) there's nothing that jumps out. I _have_ seen lots and lots of updates coming through one at a time do some weird things with replicas going in and out of recovery, so that's a good intuition to follow up on.

Re: Help for Highlights

2015-09-25 Thread Erick Erickson
Glad to help! Erick 2015-09-25 10:05 GMT-07:00 Leandro Henrique : > Hello Erick, > > Thank you very, very much! The "null" highlights were for fields with the stored > parameter set to "false". > > Thanks again! > > Leandro. > > > Date: Fri, 25 Sep 2015 09:14:16 -0700 > >

Re: How to know index file in OS Cache

2015-09-25 Thread Jeff Wartes
I’ve been relying on this: https://code.google.com/archive/p/linux-ftools/ fincore will tell you what percentage of a given file is in cache, and fadvise can suggest to the OS that a file be cached. All of the solr start scripts at my company first call fadvise (FADV_WILLNEED) on all the
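
A start script could issue the same FADV_WILLNEED hint from Python's standard library instead of the fadvise tool. A minimal sketch, assuming a platform where `os.posix_fadvise` exists (Linux does, some others do not); the function name is invented for this example.

```python
import os


def advise_willneed(path):
    """Ask the kernel to pre-load `path` into the page cache.

    Returns True if the hint was issued, False if the platform lacks
    posix_fadvise or the call was rejected. It is only a hint; the OS
    may ignore it.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset 0 with length 0 covers the whole file
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

Calling this on every file in an index directory before starting Solr would mirror the approach described, though which files to warm is a decision the message does not spell out.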

Re: [Open source] SolrCloud High Availability (HAFT) Library - Bloomreach

2015-09-25 Thread Shawn Heisey
On 9/25/2015 11:30 AM, Nitin Sharma wrote: > It would be great if we can link this in the solrcloud contributions wiki. > Can you give us access to that? Just create an account on the wiki, tell us the username, and you'll be added quickly to the group that allows editing.

Re: Help on autocomplete / suggester

2015-09-25 Thread Andrea Gazzarini
Hi Alessandro, Yes, I read a lot of posts from you about the Suggester component, including your blog, so the province name was just to catch your attention :D...just kidding, I'm living there. Many many thanks Best, Andrea On 25 Sep 2015 17:12, "Alessandro Benedetti"

bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
I have been trying to re-index the docs (about 1.5 million) as one of the fields needed part of its string value removed (accidentally introduced). I was issuing a query for 100 docs getting 4 fields and updating the doc (atomic update with "set") via the CloudSolrClient in batches. However from time
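
For readers unfamiliar with atomic updates, the "set" modifier looks roughly like this as a JSON update body. The field names, ids and the stray "-BAD" suffix below are invented for illustration; the poster's actual code used CloudSolrClient in Java, which is not shown here.

```python
import json


def atomic_set(doc_id, field, value, id_field="id"):
    """Build one atomic-update document that replaces `field` with
    `value` and leaves the other stored fields untouched."""
    return {id_field: doc_id, field: {"set": value}}


def batch_body(docs, field, unwanted, id_field="id"):
    """Build the JSON body for one batch: each doc's `field` gets the
    `unwanted` substring removed. `docs` are plain dicts as returned
    by a query."""
    updates = [
        atomic_set(doc[id_field], field, doc[field].replace(unwanted, ""), id_field)
        for doc in docs
    ]
    return json.dumps(updates)
```

The resulting body would be posted to the collection's /update handler. Atomic updates generally require the other fields to be stored, since Solr rebuilds the document from them.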

Re: [Open source] SolrCloud High Availability (HAFT) Library - Bloomreach

2015-09-25 Thread Shawn Heisey
On 9/25/2015 12:00 PM, Nitin Sharma wrote: > My user name is nitin.sharma. Does this give edit access to the > confluence page as well? You are added as a contributor on the Solr wiki. Only Apache committers for the Solr project have access to edit the confluence wiki. This is because the

Re: Using a plugin to filter in schema.xml

2015-09-25 Thread Siddhartha Singh Sandhu
I need a go-to for writing the custom tokenizer. Any suggestions? On Fri, Sep 25, 2015 at 2:36 PM, Siddhartha Singh Sandhu < sandhus...@gmail.com> wrote: > For sure. > > On Fri, Sep 25, 2015 at 1:13 PM, Alexandre Rafalovitch > wrote: > >> I think (I lost the library link)

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Walter Underwood
Sure. 1. Delete all the docs (no commit). 2. Add all the docs (no commit). 3. Commit. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 25, 2015, at 2:17 PM, Ravi Solr wrote: > > I have been trying to re-index the docs
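
The three steps above can be sketched as a sequence of JSON update bodies. This assumes the full documents are available outside Solr (which turned out not to be the case for the original poster); the function name and batch size are illustrative only.

```python
import json


def full_reindex_bodies(docs, batch_size=1000):
    """Yield JSON bodies for the /update handler, in order:
    delete everything, add all docs in batches, then one commit.
    Nothing is committed until the last body, so searchers keep
    seeing the old index until then (assuming autoCommit does not
    open a new searcher in between)."""
    yield json.dumps({"delete": {"query": "*:*"}})
    for start in range(0, len(docs), batch_size):
        yield json.dumps(docs[start:start + batch_size])
    yield json.dumps({"commit": {}})
```

Each body would be sent as a separate POST with a JSON content type.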

Re: firstSearcher cache warming with own QuerySenderListener

2015-09-25 Thread Walter Underwood
Right. I chose the twenty most frequent terms from our documents and use those for cache warming. The list of most frequent terms is pretty stable in most collections. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 25, 2015, at 8:38 AM,
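
Picking the most frequent terms for warming queries can be sketched as below. The tokenization here is a naive lowercase word split, not Solr's analysis chain, so treat the output as a starting list rather than what the index actually contains.

```python
import re
from collections import Counter


def most_frequent_terms(texts, n=20):
    """Return the `n` most frequent lowercase word tokens across
    `texts`, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return [term for term, _ in counts.most_common(n)]
```

Each returned term could then become one warming query in the firstSearcher listener.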

Re: [Open source] SolrCloud High Availability (HAFT) Library - Bloomreach

2015-09-25 Thread Nitin Sharma
Hi Shawn, My user name is nitin.sharma. Does this give edit access to the confluence page as well? Thanks, Nitin On Fri, Sep 25, 2015 at 10:44 AM, Shawn Heisey wrote: > On 9/25/2015 11:30 AM, Nitin Sharma wrote: > > It would be great if we can link this in the solrcloud

Re: Using a plugin to filter in schema.xml

2015-09-25 Thread Siddhartha Singh Sandhu
For sure. On Fri, Sep 25, 2015 at 1:13 PM, Alexandre Rafalovitch wrote: > I think (I lost the library link) you would need to build a bridge by > doing a custom Analyzer or Tokenizer and then using the library under > the covers. Would be a nice contribution to open-source

RE: Help for Highlights

2015-09-25 Thread Leandro Henrique
Hello Erick, Thank you very, very much! The "null" highlights were for fields with the stored parameter set to "false". Thanks again! Leandro. > Date: Fri, 25 Sep 2015 09:14:16 -0700 > Subject: Re: Help for Highlights > From: erickerick...@gmail.com > To: solr-user@lucene.apache.org > > You're only

Re: Expensive GC Remark Phase for JNI Weak Reference

2015-09-25 Thread Shawn Heisey
On 9/25/2015 8:53 AM, Keith L wrote: > Using: > - JDK 1.8u40 > - UseG1GC, ParallelRefProcEnabled, Xmx12g,Xms12g > - Solr 4.10.4 My own testing has not been extremely rigorous, and I have not spent a lot of time looking at the fine details in the GC logs. The details of your message that I

Dataimporthandler sql query don't run

2015-09-25 Thread Jens Mayer
Hello everyone, I need to run the following query to import my index from a H2 database: but if I start to full-import nothing happens. The last information from my log file is the following: [25.09.2015 20:06:24.418 INFO  commitScheduler-11-thread-1 o.a.s.u.DirectUpdateHandler2.commit:548]

Re: Using a plugin to filter in schema.xml

2015-09-25 Thread Alexandre Rafalovitch
I think (I lost the library link) you would need to build a bridge by doing a custom Analyzer or Tokenizer and then using the library under the covers. Would be a nice contribution to open-source if you managed to achieve that. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and

[Open source] SolrCloud High Availability (HAFT) Library - Bloomreach

2015-09-25 Thread Nitin Sharma
Hi all, We are glad to announce that we have open sourced the SolrCloud HAFT library under the Apache License Version 2.0. HAFT is a High Availability and Fault Tolerant framework for solrcloud. It was built from the ground up at Bloomreach to

Re: How to know index file in OS Cache

2015-09-25 Thread Edward Ribeiro
You can use pcstat ( https://github.com/tobert/pcstat ) to get page cache statistics for files. I have used this app in the past to see which Lucene index files, and how much of them, were in the Linux page cache. Edward On Fri, Sep 25, 2015 at 2:22 PM, Jeff Wartes wrote: > > > I’ve

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Walter Underwood
Sorry, I did not mean to be rude. The original question did not say that you don’t have the docs outside of Solr. Some people jump to the advanced features and miss the simple ones. It might be faster to fetch all the docs from Solr and save them in files. Then modify them. Then reload all of

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
No problem Walter, it's all fun. Was just wondering if there was some other good way that I did not know of, that's all. Thanks Ravi Kiran Bhaskar On Friday, September 25, 2015, Walter Underwood wrote: > Sorry, I did not mean to be rude. The original question did not

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Erick Erickson
How are you querying Solr? You say you query for 100 docs, update then get the next set. What are you using for a marker? If you're using the start parameter, and somehow a commit is creeping in things might be weird, especially if you're using any of the internal Lucene doc IDs. If you're

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Walter, not in a mood for banter right now. It's 6:00pm on a Friday and I am stuck here trying to figure out reindexing issues :-) I don't have the source of the docs, so I have to query Solr, modify and put it back, and that is seeming to be quite a task in 5.3.0. I did reindex several times with 4.7.2 in

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Thanks for responding Erick. I set the "start" to zero and "rows" always to 100. I create CloudSolrClient instance and use it to both query as well as index. But I do sleep for 5 secs just to allow for any auto commits. So query --> client.add(100 docs) --> wait --> query again But the weird

Re: [Open source] SolrCloud High Availability (HAFT) Library - Bloomreach

2015-09-25 Thread Nitin Sharma
Thanks. On Fri, Sep 25, 2015 at 2:35 PM, Shawn Heisey wrote: > On 9/25/2015 12:00 PM, Nitin Sharma wrote: > > My user name is nitin.sharma. Does this give edit access to the > > confluence page as well? > > You are added as a contributor on the Solr wiki. > > Only Apache

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Erick Erickson
Wait, query again how? You've got to have something that keeps you from getting the same 100 docs back so you have to be sorting somehow. Or you have a high water mark. Or something. Waiting 5 seconds for any commit also doesn't really make sense to me. I mean how do you know 1> that you're going
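
One way to get the stable "marker" asked about here is Solr's cursorMark paging: sort on the uniqueKey, start with cursorMark=*, and stop when the returned nextCursorMark equals the one that was sent. A minimal sketch; `fetch_page` is a hypothetical callable standing in for whatever client issues the query and returns (docs, nextCursorMark), and the "id" sort field is an assumption about the schema.

```python
def iterate_with_cursor(fetch_page, rows=100, sort="id asc"):
    """Yield every matching doc exactly once using cursorMark paging.

    `fetch_page(params)` must run the query and return a tuple
    (list_of_docs, next_cursor_mark).
    """
    cursor = "*"
    while True:
        params = {"q": "*:*", "rows": rows, "sort": sort, "cursorMark": cursor}
        docs, next_cursor = fetch_page(params)
        for doc in docs:
            yield doc
        if next_cursor == cursor:
            # Solr hands back the same mark once results are exhausted
            break
        cursor = next_cursor
```

Because the position is tracked by sort value rather than by offset, updating and committing documents between pages should not cause the same 100 docs to come back, as long as the sort field itself is not being modified.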

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Thank you for taking time to help me out. Yes, I was not using cursorMark; I will try that next. This is what I was doing. It's a bit shabby coding, but what can I say, my brain was fried :-) FYI this is a side process just to correct a messed up string. The actual indexing process was working all the

RE: How to know index file in OS Cache

2015-09-25 Thread Markus Jelsma
Hello - as far as I remember, you don't. A file itself is not the unit to cache, but blocks are. Markus -Original message- > From:Aman Tandon > Sent: Friday 25th September 2015 5:56 > To: solr-user@lucene.apache.org > Subject: How to know index file in OS

Re: Can StandardTokenizerFactory works well for Chinese and English (Bilingual)?

2015-09-25 Thread Charlie Hull
On 23/09/2015 16:23, Alexandre Rafalovitch wrote: You may find the following articles interesting: http://discovery-grindstone.blogspot.ca/2014/01/searching-in-solr-analyzing-results-and.html ( a whole epic journey) https://dzone.com/articles/indexing-chinese-solr The latter article is great

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-25 Thread Alessandro Benedetti
There is an undocumented "method" parameter - I need to enable that to > allow switching between the docvalues approach and the UnInvertedField > approach. > Only to clarify, please correct me Yonik if my understanding is wrong or outdated : To calculate facets, without going into the

Re: How to know index file in OS Cache

2015-09-25 Thread Aman Tandon
okay thanks Markus :) With Regards Aman Tandon On Fri, Sep 25, 2015 at 12:27 PM, Markus Jelsma wrote: > Hello - as far as i remember, you don't. A file itself is not the unit to > cache, but blocks are. > Markus > > > -Original message- > > From:Aman Tandon

Re: recovering mode loop

2015-09-25 Thread Lorenzo Fundaró
I think the attachment was stripped off from the mail :( Here's a public link. https://drive.google.com/file/d/0B_z8xmsby0uxRDZEeWpLcnR2b3M/view?usp=sharing On 25 September 2015 at 09:59, Lorenzo Fundaró < lorenzo.fund...@dawandamail.com> wrote: > This is the last logs i've got, even with a

Re: More Like This on numeric fields - BF accepted by MLT handler

2015-09-25 Thread Upayavira
Alessandro, I'd suggest you review the code of the MoreLikeThisHandler. It is a little knotty, but it would be worth your while understanding what is going on there. Basically, there are three phases: phase #1: parse the source document into a list of terms (avoided if term vectors enabled and

Re: Help on autocomplete / suggester

2015-09-25 Thread Andrea Gazzarini
Sorry, in the first point I meant "prefix_search" Best, Andrea On 09/24/2015 11:18 AM, Andrea Gazzarini wrote: Hi guys, as part of a customer requirement, I need to provide an autocomplete / suggester feature. For that reason I started looking at the Suggester Component. The target Solr

Re: Autowarm and filtercache invalidation

2015-09-25 Thread Shawn Heisey
On 9/24/2015 3:11 PM, Jeff Wartes wrote: > Answering my own question: Looks like the default filterCache regenerator > uses the old cache to re-execute queries in the context of the new > searcher and does nothing with the old cache value. > > So, the new searcher’s cache contents will be

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-25 Thread Uwe Reh
On 25.09.2015 at 05:16, Yonik Seeley wrote: I did some performance benchmarks and opened an issue. It's bad. https://issues.apache.org/jira/browse/SOLR-8096 Hi Yonik, thanks a lot for your investigation. Using the JSON Facet API is fast and seems to be a usable workaround for new

Re: Can StandardTokenizerFactory works well for Chinese and English (Bilingual)?

2015-09-25 Thread Zheng Lin Edwin Yeo
Hi Charlie, Thanks for your comment. I faced compatibility issues with Paoding when I tried it in Solr 5.1.0 and Solr 5.2.1, and I found out that the code was optimised for Solr 3.6. Which version of Solr were you using when you tried Paoding? Regards, Edwin On 25 September 2015 at

Re: Can StandardTokenizerFactory works well for Chinese and English (Bilingual)?

2015-09-25 Thread Charlie Hull
On 25/09/2015 11:43, Zheng Lin Edwin Yeo wrote: Hi Charlie, Thanks for your comment. I faced the compatibility issues with Paoding when I tried it in Solr 5.1.0 and Solr 5.2.1, and I found out that the code was optimised for Solr 3.6. Which version of Solr are you using when you tried on the

Re: Different ports for search and upload request

2015-09-25 Thread Alexandre Rafalovitch
How about you do indexing on a completely different node and then swap the index into production using Solr aggregate aliases? https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection The problem here is that deleting existing content is
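
Swapping an alias to a freshly built collection is a single Collections API call. The sketch below only builds the URL; the base URL, alias and collection names are placeholders, not values from the thread.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a Collections API CREATEALIAS URL. Calling it for an
    existing alias name re-points that alias to `collections`."""
    query = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    })
    return f"{base_url}/admin/collections?{query}"
```

Searches keep going to the alias name throughout, so the indexing node and the serving collection can be swapped without clients changing their URLs.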

Re: How to know index file in OS Cache

2015-09-25 Thread Aman Tandon
Awesome, thank you Mikhail. This is what I was looking for. This was just a random question that popped up in my mind, so I just asked it on the group. With Regards Aman Tandon On Fri, Sep 25, 2015 at 2:49 PM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > What about Linux: > $less

More Like This on numeric fields - BF accepted by MLT handler

2015-09-25 Thread Alessandro Benedetti
Hi guys, I was just investigating a little how to include numeric fields in the MLT calculations. As we know, we currently build a smart Lucene query based on the input document (the one to search for similar ones) and run this query to obtain the similar docs. Because the MLT is

Re: How to know index file in OS Cache

2015-09-25 Thread Mikhail Khludnev
What about Linux: $ less /proc/<pid>/maps $ pmap <pid> On Fri, Sep 25, 2015 at 10:57 AM, Markus Jelsma wrote: > Hello - as far as I remember, you don't. A file itself is not the unit to > cache, but blocks are. > Markus > > > -Original message- > > From:Aman Tandon

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-25 Thread Yonik Seeley
On Fri, Sep 25, 2015 at 6:33 AM, Uwe Reh wrote: > On 25.09.2015 at 05:16, Yonik Seeley wrote: >> >> I did some performance benchmarks and opened an issue. It's bad. >> https://issues.apache.org/jira/browse/SOLR-8096 > > > Hi Yonik, > thanks a lot for your

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-25 Thread Yonik Seeley
On Fri, Sep 25, 2015 at 5:07 AM, Alessandro Benedetti wrote: >There is an undocumented "method" parameter - I need to enable that to > >> allow switching between the docvalues approach and the UnInvertedField >> approach. >> > > Only to clarify, please correct me

Help for Highlights

2015-09-25 Thread Leandro Henrique
Dear Colleagues of Solr-list, I am using Solr 5.0 at work to index a text collection of approximately 3500 documents. The documents are stored in XML files. Almost everything is working normally ... except the highlight functionality. This feature is not working well! After a