RE: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread Markus Jelsma
Hi - you need to use the CursorMark feature for larger sets: https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results M. -Original message- > From:Ajinkya Kale > Sent: Monday 28th September 2015 20:46 > To: solr-user@lucene.apache.org;
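For readers skimming the thread, here is a minimal sketch of the cursor loop that the linked Pagination of Results page describes. It is written in Python rather than SolrJ and is an illustration, not code from the original message. The `fetch` callable is hypothetical: it stands for whatever sends the parameters to a `/select` handler and returns the parsed JSON response. The sort field `id` is an assumption; cursors require the sort to include the collection's uniqueKey field.

```python
from typing import Callable, Dict, Iterator


def iter_all_docs(fetch: Callable[[Dict[str, str]], dict],
                  rows: int = 1000,
                  sort: str = "id asc") -> Iterator[dict]:
    """Yield every matching document by following Solr cursors.

    `fetch` is a hypothetical helper: it takes request parameters and
    returns the decoded JSON body of a Solr select response.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": "*:*",
            "rows": str(rows),
            "sort": sort,          # assumed to include the uniqueKey field
            "cursorMark": cursor,
            "wt": "json",
        }
        response = fetch(params)
        for doc in response["response"]["docs"]:
            yield doc
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            return
        cursor = next_cursor
```

Because only one page of `rows` documents is held at a time, memory use on the client stays bounded regardless of how deep into the result set the loop is.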

Re: Cost of having multiple search handlers?

2015-09-28 Thread Gili Nachum
A different solution to the same need: I measure response times for different collections, and for online and batch queries separately, using New Relic. I've added a servlet filter that analyses the request and makes this info available to New Relic over a request argument. The built-in New Relic Solr

Re: bulk reindexing 5.3.0 issue

2015-09-28 Thread Ravi Solr
Gili, I was constantly checking the cloud admin UI and it always stayed green, which is why I initially overlooked sync issues. Finally, when all options had dried up, I went to each node individually and queried it, and that is when I found the out-of-sync issue. The way I resolved my issue was shut down

Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread Ajinkya Kale
Hi, I am trying to retrieve all the documents from a Solr index in a batched manner. I have 100M documents. I am retrieving them using the method proposed here https://nowontap.wordpress.com/2014/04/04/solr-exporting-an-index-to-an-external-file/ I am dumping 10M document splits in each file. I

Re: Cost of having multiple search handlers?

2015-09-28 Thread Walter Underwood
We did the same thing, but reporting performance metrics to Graphite. But we won’t be able to add servlet filters in 6.x, because it won’t be a webapp. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 28, 2015, at 11:32 AM, Gili Nachum

error reporting during indexing

2015-09-28 Thread Matteo Grolla
Hi, if I need fine-grained error reporting I use HttpSolrServer and send 1 doc per request using the add method. I report errors on exceptions of the add method. I'm using autocommit, so I'm not seeing errors related to commit. Am I losing some errors? Is there a better way? Thanks

Re: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread Gili Nachum
If you can't use CursorMark, then I suggest not using the start parameter. Instead, sort asc by a unique field and range the query to records with a field value larger than the last doc you read. Then set rows to be whatever you found can fit in memory. On Mon, Sep 28, 2015 at 10:59 PM,
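The suggestion above can be sketched as a small parameter builder. This is a hedged Python illustration, not code from the thread: the function name `next_page_params` and the key field `id` are assumptions, and the exclusive lower bound is written with Solr's mixed-bracket range syntax `{last TO *]`.

```python
from typing import Dict, Optional


def next_page_params(last_id: Optional[str],
                     rows: int,
                     key_field: str = "id") -> Dict[str, str]:
    """Build select parameters for one page, without using `start`.

    `key_field` is assumed to be a unique, sortable field.
    """
    params = {"q": "*:*", "sort": f"{key_field} asc", "rows": str(rows)}
    if last_id is not None:
        # Quote the value so special characters are not parsed as syntax.
        escaped = last_id.replace("\\", "\\\\").replace('"', '\\"')
        # "{" excludes the last value already read; "*]" leaves the top open.
        params["fq"] = f'{key_field}:{{"{escaped}" TO *]'
    return params
```

Call it with `last_id=None` for the first page, then pass the key of the last document read on each following request, for example `next_page_params("doc42", 500)`.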

Re: More Like This on numeric fields - BF accepted by MLT handler

2015-09-28 Thread Upayavira
You could use the MLT query parser, and combine that with other queries, whether as filters or boosts. You can't use stream.body yet, so you would need to use the handler if you need that. Upayavira On Mon, Sep 28, 2015, at 09:53 AM, Alessandro Benedetti wrote: > Hi Upaya, > thanks for the

Re: Cost of having multiple search handlers?

2015-09-28 Thread Upayavira
I would expect this to be negligible. Upayavira On Mon, Sep 28, 2015, at 01:30 PM, Oliver Schrenk wrote: > Hi, > > I want to register multiple but identical search handlers to have multiple > buckets to measure performance for our different apis and consumers (and > to find out who is actually

Re: PathHierarchyTokenizerFactory and facet_count

2015-09-28 Thread Alessandro Benedetti
From the Solr wiki, the default facet.limit should be 100! Anyway, I find the way field facets are shown for path-hierarchy-tokenized fields not so user friendly. Ideally, for those fields we should show a facet representation similar to facet pivot. Should be nice to think an idea

entity processing order during updates

2015-09-28 Thread Roxana Danger
Hello, I am importing two entities into Solr, coming from two different tables, and I have defined an update request processor chain with two custom processor factories: - the first processor factory needs to be executed first for one type of entities and then for the other (I differentiate the

Re: Cost of having multiple search handlers?

2015-09-28 Thread Shawn Heisey
On 9/28/2015 6:30 AM, Oliver Schrenk wrote: > I want to register multiple but identical search handlers to have multiple > buckets to measure performance for our different apis and consumers (and to > find out who is actually using Solr). > > Are there some costs associated with having

Re: bulk reindexing 5.3.0 issue

2015-09-28 Thread Gili Nachum
Were all of the shard replicas in active state (green color in admin UI) before starting? Sounds like it, otherwise you wouldn't hit the replica that is out of sync. Replicas can get out of sync, and report being in sync, after a sequence of stop/start without a chance to complete sync. See if it might have

Keyword match distance rule issue

2015-09-28 Thread anil.vadhavane
Hello, I'm using Lucene Solr 4.10.4 for keyword match functionality. I found some issues with the distance rule. I have added a search keyword with distance 2, "Bridgewater~2". When I search, it does not return "bridwater" in the results, which it should. If I change placing of 'ge' at any other place

[ANNOUNCE] Luke 5.3.0 released

2015-09-28 Thread Dmitry Kan
This is a major release supporting Lucene / Solr 5.3.0. Download the zip here: https://github.com/DmitryKey/luke/releases/tag/luke-5.3.0 This release runs on Java 8 and does not run on Java 7. The release includes a number of pull requests and GitHub issues. Worth mentioning:

Re: PathHierarchyTokenizerFactory and facet_count

2015-09-28 Thread Upayavira
There is also facet.limit which says how many facet entries to return. Is that catching you? The document either matches your query, or doesn't. If it does, then all values of the Parameter field should be included in your faceting. But, perhaps not all facet buckets are being returned to you -

Re: What kind of nutch documents does Solr index?

2015-09-28 Thread Upayavira
I suspect you may be better off asking this on the Nutch user list. The decisions you are describing will be within the Nutch codebase, not Solr. Someone here may know (hopefully) but you may get more support over on the Nutch list. One suggestion: start with a clean, empty index. Run a crawl.

Cost of having multiple search handlers?

2015-09-28 Thread Oliver Schrenk
Hi, I want to register multiple but identical search handlers to have multiple buckets to measure performance for our different apis and consumers (and to find out who is actually using Solr). Are there some costs associated with having multiple search handlers? Are they negligible?

What kind of nutch documents does Solr index?

2015-09-28 Thread Daniel Holmes
Hi, I am using Apache Nutch 1.7 to crawl and Apache Solr 4.7.2 for indexing. In my tests there is a gap between the number of results fetched by Nutch and the number of documents indexed in Solr. For example, one of the crawls fetched 23343 pages and 1146 images successfully while in the Solr 19250 docs

PathHierarchyTokenizerFactory and facet_count

2015-09-28 Thread Moen Endre
How does facet_count work with a facet field that is defined as solr.PathHierarchyTokenizerFactory? I have multiple records that contain the field Parameter, which is of type PathHierarchyTokenizerFactory. E.g "Parameter": [ "EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE>WATER

Re: More Like This on numeric fields - BF accepted by MLT handler

2015-09-28 Thread Alessandro Benedetti
Hi Upaya, thanks for the explanation, I actually already did some investigations about it (my first foundation was: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ ) and then I took a look at the code. Was just wondering what the community was thinking about

Re: Keyword match distance rule issue

2015-09-28 Thread Alessandro Benedetti
Maybe it's a silly observation... But are you lowercasing at indexing/querying time? Can you show us the schema analysis config for the field type you use? Because, strictly speaking about Levenshtein distance, bridwater is 3 edits from Bridgewater. Cheers 2015-09-28 8:26 GMT+01:00 anil.vadhavane
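The arithmetic behind that observation can be checked with a plain Levenshtein implementation. The following is a generic textbook sketch in Python, not Lucene's fuzzy-query code: without lowercasing, the capital "B" costs one substitution on top of the two deletions ("g" and "e"), which puts the term outside a distance of 2.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("Bridgewater", "bridwater"))  # case-sensitive: 3
print(levenshtein("bridgewater", "bridwater"))  # lowercased: 2
```

So whether "Bridgewater~2" can match depends on whether both the indexed term and the query term pass through a lowercasing filter, which is why the field type's analysis config matters here.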

Re: position of the search term

2015-09-28 Thread Alessandro Benedetti
So, based on my knowledge, it is not possible (unless you customise the component). Read here: http://lucene.472066.n3.nabble.com/How-do-I-recover-the-position-and-offset-a-highlight-for-solr-4-1-4-2-td4051763.html Another data structure that you may find useful is to store the Term

Re: firstSearcher cache warming with own QuerySenderListener

2015-09-28 Thread Christian Reuschling
Erick, Walter and all, as I wrote, I am aware of the firstSearcher event; we tried it manually before we chose to enhance the QuerySenderListener. I think our usage scenario (I didn't write about it for simplicity) is a bit different from yours, which makes this necessary. We are implementing

Re: query parsing

2015-09-28 Thread Alessandro Benedetti
Happy to read that. Regarding the spellcheck, that is a different thing, so let us know for further details! Cheers 2015-09-27 18:59 GMT+01:00 Mark Fenbers : > I am delighted to announce that I have it all working again! Well, not > all, just the searching! > > I deleted my

RE: New Project setup too clunky

2015-09-28 Thread Duck Geraint (ext) GBJH
Huh, strange - I didn't even notice that you could create cores through the UI. I suppose it depends what order you read and infer from the documentation. See "Create a Core": https://cwiki.apache.org/confluence/display/solr/Running+Solr I followed the "solr create -help" option to work out how

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-28 Thread Toke Eskildsen
On Sun, 2015-09-27 at 14:47 +0200, Uwe Reh wrote: > Like Walter Underwood wrote, in technical sense faceting on authors > isn't a good idea. In a technical sense, there is no good or bad about faceting on high-cardinality fields in Solr. The faceting code is fairly efficient (modulo the newly

CloudSolrClient timeout settings

2015-09-28 Thread Arcadius Ahouansou
CloudSolrClient has zkClientTimeout/zkConnectTimeout for access to ZooKeeper. It would be handy to also have the possibility to set something like soTimeout/connectTimeout for accessing the Solr nodes, similarly to the old non-cloud client. Currently, in order to set a timeout for the client

Re: Cost of having multiple search handlers?

2015-09-28 Thread Jeff Wartes
One would hope that https://issues.apache.org/jira/browse/SOLR-4735 will be done by then. On 9/28/15, 11:39 AM, "Walter Underwood" wrote: >We did the same thing, but reporting performance metrics to Graphite. > >But we won’t be able to add servlet filters in 6.x,

Re: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread Ajinkya Kale
If I am not wrong this works only with Solr version > 4.7.0 ? On Mon, Sep 28, 2015 at 12:23 PM Markus Jelsma wrote: > Hi - you need to use the CursorMark feature for larger sets: > https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results > M. > > > >

Re: error reporting during indexing

2015-09-28 Thread Erick Erickson
You shouldn't be losing errors with HttpSolrServer. Are you seeing evidence that you are or is this mostly a curiosity question? Do note it's better to batch up docs, your throughput will increase a LOT. That said, when you do batch (e.g. send 500 docs per update or whatever) and you get an error

RE: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread will martin
http://opensourceconnections.com/blog/2014/07/13/reindexing-collections-with-solrs-cursor-support/ -Original Message- From: Ajinkya Kale [mailto:kaleajin...@gmail.com] Sent: Monday, September 28, 2015 2:46 PM To: solr-user@lucene.apache.org; java-u...@lucene.apache.org Subject: Solr

Re: CloudSolrClient timeout settings

2015-09-28 Thread Shawn Heisey
On 9/28/2015 4:04 PM, Arcadius Ahouansou wrote: > CloudSolrClient has zkClientTimeout/zkConnectTimeout for access to > zookeeper. > > It would be handy to also have the possibility to set something like > soTimeout/connectTimeout for accessing the solr nodes similarly to the old > non-cloud

Re: Cost of having multiple search handlers?

2015-09-28 Thread Walter Underwood
We built our own because there was no movement on that. Don’t hold your breath. Glad to contribute it. We’ve been running it in production for a year, but the config is pretty manual. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 28, 2015,

RE: String index out of range exception from Spell check

2015-09-28 Thread Dyer, James
This looks similar to SOLR-4489, which is marked fixed for version 4.5. If you're using an older version, the fix is to upgrade. Also see SOLR-3608, which is similar but here it seems as if the user's query is more than spellcheck was designed to handle. This should still be looked at and

highlighting

2015-09-28 Thread Mark Fenbers
Greetings! I have highlighting turned on in my Solr searches, but what I get back is tags surrounding the found term. Since I use an SWT StyledText widget to display my search results, what I really want is the offset and length of each found term, so that I can highlight it in my own way

Passing Basic Auth info to HttpSolrClient

2015-09-28 Thread Steven White
Hi, I'm using HttpSolrClient to connect to Solr. Everything worked until I enabled basic authentication in Jetty. My question is, how do I pass the basic auth info to SolrJ so that I don't get a 401 error? Thanks in advance Steve

RE: PathHierarchyTokenizerFactory and facet_count

2015-09-28 Thread Moen Endre
Yes, that solved my problem. There must be an implicit facet.limit set, because I tried the same url query with facet.limit=1 and got back records with "EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC" Cheers! Endre -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: 28.