Re: Partial Counts in SOLR

2014-03-17 Thread Salman Akram
Below is one of the sample slow queries, which takes minutes! ((stock or share*) w/10 (sale or sell* or sold or bought or buy* or purchase* or repurchase*)) w/10 (executive or director) If a filter is used it goes in fq, but what can be done about a plain keyword search? On Sun, Mar 16, 2014 at 4:37

SolrCloud - CompositeId Document Routing Problem

2014-03-17 Thread saurish
Hi, I am testing SolrCloud's compositeId routing but failing to get documents pertaining to a route. Please find the steps below. Please point out where I am making a mistake in the configuration, or let me know if I have to do something more. I'm using ZooKeeper 3.4.5 and two tomcat 7 server

Re: PROBLEM SOLRJ

2014-03-17 Thread Ángel Miralles
Thanks to both of you. There was a problem with the server URL, as Greg said. ;) On 14/03/14 16:20, Furkan KAMACI wrote: Hi; There is another issue. It seems like you are using SolrCloud. If so check here: https://wiki.apache.org/solr/Solrj#Using_with_SolrCloud Thanks; Furkan KAMACI

Does CachedSqlEntityProcessor works?

2014-03-17 Thread manju16832003
I tried to use *CachedSqlEntityProcessor* in DataImportHandler with a sub-entity query. It does not seem to be working. Here is my query entity name=listing dataSource=mysql query=SELECT id,make, model FROM LISTING entity name=account dataSource=mssql query=SELECT name,email FROM CUSTOMER WHERE

solr velocity not loading

2014-03-17 Thread Umapathy S
I am running solr 4.6.1. Trying to get velocity running, but it throws class not found error. solrconfig.xml already has the VelocityResponseWriter added. The Exception stack trace (snipped) {msg=lazy loading error,trace=org.apache.solr.common.SolrException: lazy loading error at

Re: solr velocity not loading

2014-03-17 Thread Ahmet Arslan
Hi, instanceDir is collection1 in your case. Putting jars under /solr/collection1/lib should work. What jar files did you put there? Did you include solr-velocity-4.6.1.jar commons-beanutils-1.7.0.jar commons-collections-3.2.1.jar velocity-1.7.jar velocity-tools-2.0.jar On Monday, March 17,

How to return more fields on Solr 4.5.1 Suggester?

2014-03-17 Thread omer sonmez
I am using Solr 4.5.1 to suggest movies for my system. I need Solr to return not only the move_title but also the movie_id that belongs to the movie. As an example, this is roughly what I need: response lst name=responseHeader int name=status0/int int name=QTime1/int

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-17 Thread adfel70
We currently have around 200 GB on a server. I'm aware of the RAM issue, but it somehow doesn't seem related. I would expect search latency problems, not strange EofExceptions. Regarding the http.timeout: I didn't change anything concerning this. Do I need to explicitly set something different

Re: solr velocity not loading

2014-03-17 Thread Umapathy S
Thanks Ahmet. I had everything except solr-velocity-4.6.1.jar (which I can understand now). I thought it was now part of solr-core itself. Copied it from dist/ to collection1/lib. It's working. Thanks On 17 March 2014 13:03, Ahmet Arslan iori...@yahoo.com wrote: Hi, instanceDir is collection1 in

Re: Does CachedSqlEntityProcessor works?

2014-03-17 Thread Gora Mohanty
On 17 March 2014 18:13, manju16832003 manju16832...@gmail.com wrote: I tried to use *CachedSqlEntityProcessor* in DataImportHandler with a sub-entity query. It does not seem to be working. Here is my query entity name=listing dataSource=mysql query=SELECT id,make, model FROM LISTING entity

Doing spatial search on multiple location points

2014-03-17 Thread Varun Gupta
Hi, I am trying to find out if solr supports doing a spatial search on multiple location points. Basically, while querying solr, I will be giving multiple lat-long points and solr will be returning documents which are closer to any of the given points. If this is not possible, is there any way

Re: How to return more fields on Solr 4.5.1 Suggester?

2014-03-17 Thread Lajos
Hi Omer, That's not how it's meant to work; the suggester is giving you potentially matching terms by looking at the set of terms for the given field across the index. Possibly you want to look at the MoreLikeThis component or handler? It will return matching documents, from which you have

Re: How to return more fields on Solr 4.5.1 Suggester?

2014-03-17 Thread Erick Erickson
Perhaps index the concatenation of the two fields, something like this: hard rain (1998)!14 Then have the app layer peel off the !14 for displaying the title to the user. Then use the 14 however you need to. Best, Erick On Mon, Mar 17, 2014 at 6:28 AM, Lajos la...@protulae.com wrote: Hi Omer,
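Erick's suggestion can be sketched in application code roughly as follows. This is only a minimal illustration, assuming the title and id are joined with a '!' delimiter as in his example; the class and method names (SuggestionCodec, encode, decodeTitle, decodeId) are hypothetical and are not part of Solr.

```java
// Hypothetical app-layer helper; nothing in this class is a Solr API.
final class SuggestionCodec {
    // Assumed delimiter, taken from the "hard rain (1998)!14" example.
    private static final char DELIMITER = '!';

    // Build the value to index, e.g. "hard rain (1998)!14".
    static String encode(String title, String id) {
        return title + DELIMITER + id;
    }

    // Split on the LAST delimiter so that titles containing '!' still work.
    static String decodeTitle(String indexed) {
        int pos = indexed.lastIndexOf(DELIMITER);
        return pos < 0 ? indexed : indexed.substring(0, pos);
    }

    // Everything after the last delimiter is treated as the id.
    static String decodeId(String indexed) {
        int pos = indexed.lastIndexOf(DELIMITER);
        return pos < 0 ? "" : indexed.substring(pos + 1);
    }

    public static void main(String[] args) {
        String indexed = encode("hard rain (1998)", "14");
        System.out.println(decodeTitle(indexed) + " -> id " + decodeId(indexed));
    }
}
```

Splitting on the last delimiter keeps titles that contain '!' intact, but it assumes the id itself never contains the delimiter.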

Re: Doing spatial search on multiple location points

2014-03-17 Thread Smiley, David W.
Absolutely. The most straightforward approach is to use the default query parser with OR clauses, each based on the geofilt query parser. Another way to do it in Solr 4.7 that is probably faster is to use WKT with the custom “buffer” extension: myLocationRptField:BUFFER(MULTIPOINT(x y, x
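The first approach (OR-ing geofilt clauses) could be assembled on the client side along these lines. This is a sketch under assumptions: the field name "location", the sample coordinates and the 5 km distance are made up for illustration, the MultiPointGeoQuery class is hypothetical, and the nested-query form _query_:"{!geofilt ...}" is one way to embed geofilt in the default parser, not necessarily the exact syntax David had in mind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that only builds a query string; it does not call Solr.
final class MultiPointGeoQuery {
    // Produces: _query_:"{!geofilt sfield=F pt=lat,lon d=KM}" OR _query_:"..." ...
    static String build(String field, double[][] points, double distanceKm) {
        List<String> clauses = new ArrayList<>();
        for (double[] p : points) {
            // p[0] is latitude, p[1] is longitude.
            clauses.add("_query_:\"{!geofilt sfield=" + field
                    + " pt=" + p[0] + "," + p[1]
                    + " d=" + distanceKm + "}\"");
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        double[][] points = {{45.15, -93.85}, {44.9, -93.2}};
        System.out.println(build("location", points, 5));
    }
}
```

The resulting string would be sent as the q or fq parameter and needs URL encoding when placed in a request URL.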

Solrcloud: DistribIndexing question

2014-03-17 Thread ku3ia
Hi all! I have a SolrCloud (v. 4.6.0) cluster of 5 shards. I need to move four of them to another server (configs and indexes). As I understand it, I need to clear the ZooKeeper data, and after a restart it will be updated. My question is, for example, in the future I need to update a specific document, is

Difference between addfield and setfield in SolrInputDocument

2014-03-17 Thread vit
Could someone please explain to me the difference between addfield and setfield in SolrInputDocument?

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-17 Thread Shawn Heisey
On 3/17/2014 7:07 AM, adfel70 wrote: We currently have around 200 GB on a server. I'm aware of the RAM issue, but it somehow doesn't seem related. I would expect search latency problems, not strange EofExceptions. Regarding the http.timeout: I didn't change anything concerning this. Do I

Re: Difference between addfield and setfield in SolrInputDocument

2014-03-17 Thread Yonik Seeley
On Mon, Mar 17, 2014 at 10:22 AM, vit bulgako...@yahoo.com wrote: Could someone please explain to me the difference between addfield and setfield in SolrInputDocument? addField will add another value to any existing values for the field. setField will just overwrite anything that is already

Re: Solrcloud: DistribIndexing question

2014-03-17 Thread Erick Erickson
The algorithm is only sensitive to the shard ID, you should be able to freely move the data to another node. BTW, perhaps the easiest way to do this would be to set up a replica for the shards you care about on the new hardware (assuming connectivity) and let Solr do the synchronization for you.

Re: Difference between addfield and setfield in SolrInputDocument

2014-03-17 Thread Furkan KAMACI
Hi; addField works like this: public void addField(String name, Object value, float boost ) { SolrInputField field = _fields.get( name ); if( field == null || field.value == null ) { setField(name, value, boost); } else { field.addValue( value, boost ); }
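The behaviour described in this thread can be reproduced with a small stand-alone model. This is a simplified sketch, not the SolrJ class: MiniInputDocument is a hypothetical stand-in for SolrInputDocument, boosts are left out, and values are kept in a plain list per field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for SolrInputDocument (hypothetical, boosts omitted).
final class MiniInputDocument {
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // setField: discard whatever the field held and start a fresh value list.
    void setField(String name, Object value) {
        List<Object> values = new ArrayList<>();
        values.add(value);
        fields.put(name, values);
    }

    // addField: create the field if it is missing, otherwise append the value.
    void addField(String name, Object value) {
        List<Object> existing = fields.get(name);
        if (existing == null) {
            setField(name, value);
        } else {
            existing.add(value);
        }
    }

    // Returns the current values, or null if the field was never set.
    List<Object> getFieldValues(String name) {
        return fields.get(name);
    }

    public static void main(String[] args) {
        MiniInputDocument doc = new MiniInputDocument();
        doc.addField("cat", "books");
        doc.addField("cat", "music");
        System.out.println(doc.getFieldValues("cat")); // two values
        doc.setField("cat", "films");
        System.out.println(doc.getFieldValues("cat")); // one value
    }
}
```

Calling addField twice leaves two values on the field, while a later setField replaces both with a single value.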

AutoSuggest like Google in Solr using Solarium Client.

2014-03-17 Thread Sohan Kalsariya
Can anyone suggest best practices for doing SpellCheck and AutoSuggest in Solarium? Can anyone give me an example? -- Regards, *Sohan Kalsariya*

Re: Solrcloud: DistribIndexing question

2014-03-17 Thread ku3ia
Erick Erickson wrote: The algorithm is only sensitive to the shard ID, you should be able to freely move the data to another node. BTW, perhaps the easiest way to do this would be to set up a replica for the shards you care about on the new hardware (assuming connectivity) and let Solr do

RE: AutoSuggest like Google in Solr using Solarium Client.

2014-03-17 Thread Suresh Soundararajan
Hi Sohan, The best approach for auto-suggest is to use a facet query. Please refer to this link: http://solr.pl/en/2010/10/18/solr-and-autocomplete-part-1/ Thanks, SureshKumar.S From: Sohan Kalsariya sohankalsar...@gmail.com Sent: Monday, March 17,

Re: AutoSuggest like Google in Solr using Solarium Client.

2014-03-17 Thread Michael McCandless
I think it's best to use one of the many autosuggesters Lucene/Solr provide? E.g. AnalyzingInfixSuggester is running here: http://jirasearch.mikemccandless.com But that's just one suggester... there are many more. Mike McCandless http://blog.mikemccandless.com On Mon, Mar 17, 2014 at 10:44

RE: Empty string in tfloat type field

2014-03-17 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi Greg, I added the processor below (RemoveBlankFieldUpdateProcessorFactory) but I am still getting the same problem. My XML looks like filed=Price/field E.g Pri -Original Message- From: Greg Walters [mailto:greg.walt...@answers.com] Sent: Friday, March 14, 2014 9:32 AM To:

Re: SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-17 Thread Steve Rowe
Martin, You’re right, a bug was introduced by SOLR-5354. I’ve opened an issue https://issues.apache.org/jira/browse/SOLR-5875 and will commit the fix shortly. I hope to include this fix in a 4.7.1 release. Steve On Mar 8, 2014, at 1:32 AM, Martin de Vries mar...@downnotifier.com wrote:

Spatial maxDistErr changes

2014-03-17 Thread Steven Bower
If I am only indexing point shapes and I want to change the maxDistErr from 0.09 (1m res) to 0.00045, will this break things, as in searches stop working, or will search work but any performance gain won't be seen until all docs are reindexed? Or will I have to reindex right away? thanks, steve

Re: Empty string in tfloat type field

2014-03-17 Thread Ahmet Arslan
Hi, This config works for me: <updateRequestProcessorChain name="remove"> <processor class="solr.TrimFieldUpdateProcessorFactory" /> <processor class="solr.RemoveBlankFieldUpdateProcessorFactory" /> <processor class="solr.RunUpdateProcessorFactory" /> </updateRequestProcessorChain>

Solr faceted search not working for a certain request handler

2014-03-17 Thread vit
We have a big Solr search application where I need to add a faceted search for a certain request handler. And it does not work whereas for select handler it does. I tried to find something in the configuration but could not. If possible, please let me know where I should look at to find the

Re: Solr faceted search not working for a certain request handler

2014-03-17 Thread Shawn Heisey
On 3/17/2014 9:25 AM, vit wrote: We have a big Solr search application where I need to add a faceted search for a certain request handler. And it does not work whereas for select handler it does. I tried to find something in the configuration but could not. If possible, please let me know where

Re: SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-17 Thread Steve Rowe
Martin, I’ve committed the SOLR-5875 fix, including to the lucene_solr_4_7 branch. Any chance you could test the fix? Thanks, Steve On Mar 17, 2014, at 11:16 AM, Steve Rowe sar...@gmail.com wrote: Martin, You’re right, a bug was introduced by SOLR-5354. I’ve opened an issue

Re: SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-17 Thread Walter Underwood
Does this bug happen in the non-sharded case? --wunder On Mar 17, 2014, at 9:15 AM, Steve Rowe sar...@gmail.com wrote: Martin, I’ve committed the SOLR-5875 fix, including to the lucene_solr_4_7 branch. Any chance you could test the fix? Thanks, Steve On Mar 17, 2014, at 11:16 AM,

Re: SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-17 Thread Steve Rowe
No, QueryComponent.mergeIds() is only called for distributed queries. Steve On Mar 17, 2014, at 12:18 PM, Walter Underwood wun...@wunderwood.org wrote: Does this bug happen in the non-sharded case? --wunder On Mar 17, 2014, at 9:15 AM, Steve Rowe sar...@gmail.com wrote: Martin, I’ve

Re: SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-17 Thread Walter Underwood
Thanks. That is probably worth a mention in the bug and release notes. --wunder On Mar 17, 2014, at 9:33 AM, Steve Rowe sar...@gmail.com wrote: No, QueryComponent.mergeIds() is only called for distributed queries. Steve On Mar 17, 2014, at 12:18 PM, Walter Underwood

Re: AutoSuggest like Google in Solr using Solarium Client.

2014-03-17 Thread bbi123
Not sure if you have already seen this one: http://www.solarium-project.org/2012/01/suggester-query-support/ You can also use an edge n-gram filter to implement typeahead auto-suggest.

Re: Difference between addfield and setfield in SolrInputDocument

2014-03-17 Thread Jack Krupansky
For multivalued fields on atomic update, add will append a value to the existing list of values, while set will discard the existing values and start a fresh list of values. So, you could do a set followed by a sequence of add's to set a new list of values for a multivalued field. -- Jack

Re: example schema now stores most field values

2014-03-17 Thread Michael Sokolov
That's a good point -- we may not really need to bother with all that. I guess I tend to do it partly as a way to become aware of new features. Well, sometimes there are required additions to the schema. For example, the _version_ field was added at some time, and you really do need it. I

any project for record linkage, fuzzy grouping, and deduplication based on Solr/Lucene?

2014-03-17 Thread Mobius ReX
For example, given a new big department merged from three departments. A few employees worked for two or three departments before merging. That means, the attributes of one person might be listed under different departments' databases. One additional problem is that one person can have different

Re: Help me understand these newrelic graphs

2014-03-17 Thread Software Dev
Otis, I want to get those spikes down lower if possible. As mentioned in the posts above, the 25ms timing you are seeing is not really accurate because that's the average response time for ALL requests including the bulk add operations which are generally super fast. Our true response time is

More heap usage in Solr during indexing

2014-03-17 Thread solr2020
Hi, we have 80 million records in the index now and we are indexing 800k records every day. We have one shard and 4 replicas on 4 servers under SolrCloud. Currently we have a 16GB heap, but during indexing it sometimes reaches 16GB and sometimes it's normal. What is the reason to use the max heap at

Re: More heap usage in Solr during indexing

2014-03-17 Thread Greg Walters
Is your JVM running out of RAM (actual exceptions) or is the used heap just reaching 16G prior to a garbage collection? If it's the latter then that is expected behavior and is how Java's garbage collection works. Thanks, Greg On Mar 17, 2014, at 1:26 PM, solr2020 psgoms...@gmail.com wrote:

Re: any project for record linkage, fuzzy grouping, and deduplication based on Solr/Lucene?

2014-03-17 Thread Jack Krupansky
See: https://cwiki.apache.org/confluence/display/solr/De-Duplication -- Jack Krupansky -Original Message- From: Mobius ReX Sent: Monday, March 17, 2014 1:59 PM To: solr-user@lucene.apache.org Subject: any project for record linkage, fuzzy grouping, and deduplication based on

Re: More heap usage in Solr during indexing

2014-03-17 Thread Greg Walters
It's entirely possible that you're seeing higher memory usage while indexing due to more objects being created and abandoned. Another thing to consider could be your commit settings. Perhaps http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html can answer some of your

Re: More heap usage in Solr during indexing

2014-03-17 Thread solr2020
Previously we faced OOM when we tried to index 1.2M records at the same time. Now we have divided that into two chunks and index twice. So now we are not getting OOM, but heap usage is higher. So we are analyzing and trying to find the cause to make sure we don't get OOM again.

Re: More heap usage in Solr during indexing

2014-03-17 Thread Shawn Heisey
On 3/17/2014 12:39 PM, solr2020 wrote: Previously we faced OOM when we tried to index 1.2M records at the same time. Now we have divided that into two chunks and index twice. So now we are not getting OOM, but heap usage is higher. So we are analyzing and trying to find the cause to make sure we

Re: More heap usage in Solr during indexing

2014-03-17 Thread solr2020
Yes Shawn. Our data source is an Oracle DB. Here is the datasource section config. dataSource name=jdbc driver=oracle.jdbc.OracleDriver url=jdbc:oracle:thin:@dbname:port:dbname user=user password=password batchSize=5000 autoCommit=false

Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Mike Hugo
Hello, We recently upgraded to Solr Cloud 4.7 (went from a single node Solr 4.0 instance to a 3 node Solr 4.7 cluster). Part of our application does an automated traversal of all documents that match a specific query. It does this by iterating through results by setting the start and rows

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Mike Hugo
I should add each node has 16G of ram, 8GB of which is allocated to the JVM. Each node has about 200k docs and happily uses only about 3 or 4gb of ram during normal operation. It's only during this deep pagination that we have seen OOM errors. On Mon, Mar 17, 2014 at 3:14 PM, Mike Hugo

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Steve Rowe
Hi Mike, The OOM you’re seeing is likely a result of the bug described in (and fixed by a commit under) SOLR-5875: https://issues.apache.org/jira/browse/SOLR-5875. If you can build from source, it would be great if you could confirm the fix addresses the issue you’re facing. This fix will be

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Mike Hugo
Thanks Steve, That certainly looks like it could be the culprit. Any word on a release date for 4.7.1? Days? Weeks? Months? Mike On Mon, Mar 17, 2014 at 3:31 PM, Steve Rowe sar...@gmail.com wrote: Hi Mike, The OOM you're seeing is likely a result of the bug described in (and fixed by

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Steve Rowe
Mike, Days. I plan on making a 4.7.1 release candidate a week from today, and assuming nobody finds any problems with the RC, it will be released roughly four days thereafter (three days for voting + one day for release propagation to the Apache mirrors): i.e., next Friday-ish. Steve On Mar

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Mike Hugo
Thanks! On Mon, Mar 17, 2014 at 3:47 PM, Steve Rowe sar...@gmail.com wrote: Mike, Days. I plan on making a 4.7.1 release candidate a week from today, and assuming nobody finds any problems with the RC, it will be released roughly four days thereafter (three days for voting + one day for

[ANN] sadat: generate fake docs for your Solr index

2014-03-17 Thread xavier jmlucjav
Hi, A couple of times I found myself in the following situation: I had to work on a Solr schema, but had no docs to index yet (the db was not ready etc). In order to start learning js, I needed some small project to practice, so I thought of this small utility. It allows you to generate fake

/suggest

2014-03-17 Thread Steve Huckle
Hi, The Suggest Search Component that comes preconfigured in Solr 4.7.0 solrconfig.xml seems to thread dump when I call it: http://localhost:8983/solr/suggest?spellcheck=on&q=ac&wt=json&indent=true msg:No suggester named default was configured, Can someone tell me what's going on there?

Re: CollapsingQParserPlugin returning different result set

2014-03-17 Thread shamik
Hi Joel, Thanks for taking a look into this. Here's the information you had requested. *ADSKDedup:* I've attached separate files for debug information for each query. Let me know if you need any information. Regards, Shamik CollapsingQParserPlugin_Query_Debug.txt

Re: /suggest

2014-03-17 Thread Lajos
Hi Steve, I've posted previously about a nice Stackoverflow exception I got when using this component ... can you post what you see? I've used it successfully with a custom dictionary like this: searchComponent name=newsuggester class=solr.SuggestComponent lst name=suggester

Re: Result merging takes too long

2014-03-17 Thread Jeff Wartes
This is highly anecdotal, but I tried SOLR-1880 with 4.7 for some tests I was running, and saw almost a 30% improvement in latency. If you're only doing document selection, it's definitely worth having. I'm reasonably certain that the patch would work in 4.6 too, but the test file relies on some

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Greg Pendlebury
Shouldn't all deep pagination against a cluster use the new cursor mark feature instead of 'start' and 'rows'? 4 or 5 requests still seems a very low limit to be running into OOM issues though, so perhaps it is both issues combined? Ta, Greg On 18 March 2014 07:49, Mike Hugo

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Mike Hugo
Cursor mark definitely seems like the way to go. If I can get it to work in parallel then that's an additional bonus. On Mon, Mar 17, 2014 at 5:41 PM, Greg Pendlebury greg.pendleb...@gmail.com wrote: Shouldn't all deep pagination against a cluster use the new cursor mark feature instead of

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Greg Pendlebury
My suspicion is that it won't work in parallel, but we've only just asked the ops team to start our upgrade to look into it, so I don't have a server yet to test. The bug identified in SOLR-5875 has put them off though :( If things pan out as I think they will I suspect we are going to end up

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Yonik Seeley
On Mon, Mar 17, 2014 at 7:14 PM, Greg Pendlebury greg.pendleb...@gmail.com wrote: My suspicion is that it won't work in parallel Deep paging with cursorMark does work with distributed search (assuming that's what you meant by parallel... querying sub-shards in parallel?). -Yonik

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Greg Pendlebury
Sorry, I meant one thread requesting records 1 - 1000, whilst the next thread requests 1001 - 2000 from the same ordered result set. We've observed several of our customers trying to harvest our data with multi-threaded scripts that work like this. I thought it would not work using cursor marks...
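The difference between the two paging styles discussed in this thread can be illustrated with a small in-memory model. This is only a sketch: it does not talk to Solr, the CursorPagingSketch class and its methods are hypothetical, and the "cursor" here is simply the last id seen, standing in for Solr's opaque cursorMark value. It shows why start/rows windows can be computed up front and handed to separate threads, while cursor-style pages have to be fetched one after another, since each request needs the token returned by the previous one.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory model of two paging styles (hypothetical, not SolrJ).
final class CursorPagingSketch {
    // Sorted ids standing in for an ordered result set.
    private final List<Integer> sortedIds = new ArrayList<>();

    CursorPagingSketch(int totalDocs) {
        for (int i = 1; i <= totalDocs; i++) {
            sortedIds.add(i);
        }
    }

    // start/rows style: any window can be requested independently of the others.
    List<Integer> pageByOffset(int start, int rows) {
        int from = Math.min(start, sortedIds.size());
        int to = Math.min(start + rows, sortedIds.size());
        return new ArrayList<>(sortedIds.subList(from, to));
    }

    // Cursor style: return up to `rows` ids strictly greater than the cursor.
    List<Integer> pageAfter(int cursor, int rows) {
        List<Integer> page = new ArrayList<>();
        for (int id : sortedIds) {
            if (id > cursor && page.size() < rows) {
                page.add(id);
            }
        }
        return page;
    }

    // Sequential walk: each request depends on the previous page's last id.
    int countViaCursor(int rows) {
        int cursor = 0; // assumed "start" token for this model
        int seen = 0;
        while (true) {
            List<Integer> page = pageAfter(cursor, rows);
            if (page.isEmpty()) {
                break;
            }
            seen += page.size();
            cursor = page.get(page.size() - 1);
        }
        return seen;
    }

    public static void main(String[] args) {
        CursorPagingSketch results = new CursorPagingSketch(2500);
        System.out.println(results.pageByOffset(1000, 1000).get(0));
        System.out.println(results.countViaCursor(1000));
    }
}
```

In this model a worker can ask for the window starting at offset 1000 without knowing anything about the first window, whereas the cursor walk cannot start its second request until the first one has returned.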

SolrCloud - inconsistent result for the same query

2014-03-17 Thread shamik
Hi, I'm using SolrCloud version 4.4 with 2 shards having 2 replicas each. Lately, I'm observing issues where an obsolete document will suddenly show up in search results. I'm crawling a bunch of source systems on a daily basis and updating the Solr index. Now, when I'm searching for a specific

Re: [ANN] sadat: generate fake docs for your Solr index

2014-03-17 Thread Alexandre Rafalovitch
Looks interesting. I like the admin-page-extra integration and point-at-local-solr aspects. I looked at both approaches before as well, but nothing in the public code. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch -

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Mike Hugo
Greg and I are talking about the same type of parallel. We do the same thing - if I know there are 10,000 results, we can chunk that up across multiple worker threads up front without having to page through the results. We know there are 10 chunks of 1,000, so we can have one thread process

Re: Result merging takes too long

2014-03-17 Thread Shalin Shekhar Mangar
That's great Jeff! Thanks for sharing your experience. SOLR-5768 will make it even better. https://issues.apache.org/jira/browse/SOLR-5768 On Tue, Mar 18, 2014 at 3:35 AM, Jeff Wartes jwar...@whitepages.com wrote: This is highly anecdotal, but I tried SOLR-1880 with 4.7 for some tests I was

Send many files to update/extract

2014-03-17 Thread Александр Вандышев
Who knows how to index a lot of files with ExtractingRequestHandler using a single query?