Re: Merge tool based on mergefactor

2013-06-19 Thread Cosimo Streppone
On 06/19/2013 03:21 AM, Otis Gospodnetic wrote: You could call the optimize command directly on slaves, but specify the target number of segments, e.g. /solr/update?optimize=true&maxSegments=10 Not sure I recommend doing this on slaves, but you could - maybe you have spare capacity.
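A minimal sketch of how the optimize request mentioned above could be assembled, with the two parameters joined by `&`. The host, port and the helper name `optimize_url` are assumptions made for illustration; only the `optimize` and `maxSegments` parameters come from the message.

```python
from urllib.parse import urlencode


def optimize_url(base_url, max_segments):
    """Build an update URL asking Solr to optimize down to max_segments segments."""
    # optimize and maxSegments are the two parameters named in the message.
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url}/update?{params}"


# Hypothetical base URL; adjust to the slave being optimized.
print(optimize_url("http://localhost:8983/solr", 10))
```

Building the query string with `urlencode` avoids hand-joining parameters, which is where a missing `&` is easy to introduce.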

Adding documents in Solr plugin

2013-06-19 Thread Avner Levy
I have a core with millions of records. I want to add a custom handler which scans the existing documents and updates one of the fields (delete and add document) based on a condition (age12 for example). All fields are stored so there is no problem recreating the document from the search result.

Re: Adding documents in Solr plugin

2013-06-19 Thread Upayavira
This could be a very useful feature. To do it properly, you'd want some new update syntax, extending that of the atomic updates. That is, a new custom request handler could do it, but might not be the best way. If I were to try this, I'd look into the atomic update tickets in JIRA and see what

Disable Replication for all Cores in a single Command

2013-06-19 Thread Ralf Heyde
Hello Folks, is it possible to disable the replication for ALL cores using one command? We currently use Solr 3.6. Currently we have a curl operation, which fires: http://slave_host:port/solr/core/admin/replication/index.jsp?poll=disable In the documentation there is a URL-Command which

UnInverted multi-valued field

2013-06-19 Thread Jochen Lienhard
Hi @all. We have the problem that after an update the index takes too much time to 'warm up'. We have some multivalued facet fields and during startup Solr logs the messages: INFO: UnInverted multi-valued field

Re: Solr string field stripping new lines line breaks

2013-06-19 Thread sodoo
Dear all, my English is not good, but I will try to explain. I have indexed databases and files. The files include docx, pdf and txt. I have indexed all of the data, but the text of the indexed PDF documents runs together without line breaks. I am trying to get the line breaks to appear. Document files text line breaks to

getting different search results for words with same meaning in Japanese language

2013-06-19 Thread Yash Sharma
Hi, we have two Japanese words with the same meaning, ソフトウェア and ソフトウエア (notice the difference in the character that looks like a capital I; the word means 'software' in English). When ソフトウェア is searched, it gives around 8 search results but when ソフトウエア is searched, it gives only 2 search

Solr Suggest does not work in solrcloud environment

2013-06-19 Thread Sharp
Hi Guys, I am having difficulties running a suggest Search Handler in a SolrCloud environment. The configuration was tested on a standalone machine and works fine there. Here is my configuration: *Schema.xml* <field name="suggest" type="suggest_text" indexed="true" stored="false" multiValued="true" />

how to retrieve all results from lucene searcher.search() method

2013-06-19 Thread neeraj shah
Hello, is there any way to get all the search results? In Lucene we get the top documents by giving a limit like top 100, 1000, etc., but what if I want to get all results? How can I achieve that? Query qu = new QueryParser(Version.LUCENE_36, field, analyzer).parse(query); TopDocs hits =

Re: SOLR Cloud - Disable Transaction Logs

2013-06-19 Thread Erick Erickson
Right, NRT is not tied to cloud, but it is tied to the update log. And you bring up an interesting issue when you talk about availability zones. SolrCloud is fairly chatty in that all of the nodes need to talk to all the other nodes in the network and they will. If the nodes are separated by an

Re: Solr cloud: zkHost in solr.xml gets wiped out

2013-06-19 Thread Erick Erickson
Thanks for the confirmation! I was wondering where these bits came from: wt=javabin&version=2, since I wasn't seeing them, but you mentioned SolrCloud, so that explains things. It'll be tonight before I commit the fix I'm afraid, I'm traveling and need to put one more test in. Best Erick On Tue,

Re: How to define my data in schema.xml

2013-06-19 Thread Mysurf Mail
Well, avoiding flattening the db to a flat table sounds like a great plan. I found this solution http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example import.a join. not handling a flat table. On Tue, Jun 18, 2013 at 5:53 PM, Jack Krupansky j...@basetechnology.com wrote: You can

Re: PostingsSolrHighlighter not working on Multivalue field

2013-06-19 Thread Erick Erickson
Well, _how_ does it fail? Unless it's a typo, it should be multiValued (note the capital 'V'). This probably isn't the problem, but just in case. Anything in the logs? What is the field definition? Did you re-index after changing to multiValued? Best Erick On Tue, Jun 18, 2013 at 11:01 PM, Floyd

Re: Solr string field stripping new lines line breaks

2013-06-19 Thread Erick Erickson
First, please start a new thread when you change the topic, doing so makes the threads easier to track. But what is your evidence that line breaks are stripped? The stored data is a verbatim copy of the data that went in to the field, nothing at all is changed. So one of several things is

Sharding and Replication

2013-06-19 Thread Asif
Hi, I had questions on implementation of Sharding and Replication features of Solr/Cloud. 1. I noticed that when sharding is enabled for a collection - individual requests are sent to each node serving as a shard. 2. Replication too follows above strategy of sending individual documents to the

Re: Solr Suggest does not work in solrcloud environment

2013-06-19 Thread Aloke Ghoshal
Hi, Check the obvious first, that you have rebuilt and reloaded the suggest dictionary individually on all nodes. Also the other checks here: http://stackoverflow.com/questions/6653186/solr-suggester-not-returning-any-results Then, try with one of query component OR distrib=false setting:

Re: UnInverted multi-valued field

2013-06-19 Thread Jack Krupansky
Take a look at using DocValues for faceted fields. -- Jack Krupansky -Original Message- From: Jochen Lienhard Sent: Wednesday, June 19, 2013 5:30 AM To: solr-user@lucene.apache.org Subject: UnInverted multi-valued field Hi @all. We have the problem that after an update the index

Re: Disable Replication for all Cores in a single Command

2013-06-19 Thread Shawn Heisey
On 6/19/2013 2:18 AM, Ralf Heyde wrote: Hello Folks, is it possible to disable the replication for ALL cores using one command? We currently use Solr 3.6. Currently we have a curl operation, which fires: http://slave_host:port/solr/core/admin/replication/index.jsp?poll=disable In the
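One way to approximate a single command is to loop over the cores from a script. The sketch below only builds the per-core URLs. It assumes each core exposes the ReplicationHandler at `/replication` and that the `disablepoll` command is available in the Solr version in use; the port and the core names `core0` and `core1` are hypothetical.

```python
from urllib.parse import urlencode


def disable_poll_urls(base_url, cores):
    """Return one 'disable polling' URL per core on a slave."""
    # Assumption: the replication handler is mounted at /<core>/replication.
    query = urlencode({"command": "disablepoll"})
    return [f"{base_url}/{core}/replication?{query}" for core in cores]


# Hypothetical slave address and core names.
for url in disable_poll_urls("http://slave_host:8983/solr", ["core0", "core1"]):
    print(url)
```

Each URL could then be fetched with curl or `urllib.request.urlopen`; the list of cores has to come from your own configuration.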

Re: Solr Cloud Hangs consistently .

2013-06-19 Thread Rishi Easwaran
Update!! Got SOLR cloud working, was able to do 90k document inserts with replicationFactor=2, with my jmeter script, previously was getting stuck with 3k inserts or less. After some investigation, figured out that ulimits for my process were not being set properly, OS defaults were kicking

Highlighting using hl.q without a df field

2013-06-19 Thread AdamP
Is it possible to use the hl.q field if you’re using the extended dismax query parser and have defined the “qf” field, but not a “df” field? Here’s a sample query: q=drive&fq=cat:electronics&hl=true&hl.fl=cat,name&hl.q=drive cat:electronics. In this case I want to highlight the facet

Re: UnInverted multi-valued field

2013-06-19 Thread Toke Eskildsen
On Wed, 2013-06-19 at 11:30 +0200, Jochen Lienhard wrote: INFO: UnInverted multi-valued field {field=mt_facet,memSize=18753256,tindexSize=54,time=170,phase1=156,nTerms=17,bigTerms=3,termInstances=903276,uses=0} 170ms does not sound like much to me. What are you hoping for? We know, that the

Re: yet another optimize question

2013-06-19 Thread Andre Bois-Crettez
indeed the actual syntax for per field facet is : f.mysparefieldname.facet.method=enum André On 06/18/2013 09:00 PM, Petersen, Robert wrote: Hi Andre, Wow that is astonishing! I will definitely also try that out! Just set the facet method on a per field basis for the less used sparse
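A small sketch of how the per-field syntax quoted above fits into a facet request. The field names and the helper `facet_params` are made up for illustration; the `f.<field>.facet.method=enum` pattern is the one given in the message.

```python
from urllib.parse import urlencode


def facet_params(fields, enum_fields):
    """Build facet parameters, switching selected fields to facet.method=enum."""
    params = [("facet", "true")]
    for name in fields:
        params.append(("facet.field", name))
    for name in enum_fields:
        # Per-field override: f.<fieldname>.facet.method=enum
        params.append((f"f.{name}.facet.method", "enum"))
    return urlencode(params)


# Hypothetical field names: one commonly used facet, one sparse facet.
print(facet_params(["brand", "mysparsefieldname"], ["mysparsefieldname"]))
```

Fields not listed in `enum_fields` keep whatever facet method the handler uses by default.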

Re: Solr Suggest does not work in solrcloud environment

2013-06-19 Thread Sharp
Hi Aloke, Thanks for your reply. It works with the http://url.com:8983/solr/mycore/suggest?q=bar&wt=json&distrib=true parameter or when inserted into the defaults <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest"> <lst name="defaults"> <str

How to dynamically add geo fields to a query using a request handler

2013-06-19 Thread ade-b
Hi, We have a request handler defined in solrconfig.xml that specifies a list of fields to return for the request using the fl name. E.g. <str name="fl">createdDate</str> When constructing a query using solrj that uses this request handler, we want to conditionally add the geo spatial fields that will

another transaction log + commit question

2013-06-19 Thread Joshi, Shital
Hi, We hard committed (/update/csv?commit=true) about 20,000 documents to SolrCloud (5 shards, 1 replicas = 10 jvm instances). We have commented out both autoCommit and autoSoftCommit settings from solrconfig.xml. What we noticed is that the transaction log size never goes down to 0. We thought

Re: UnInverted multi-valued field

2013-06-19 Thread Roman Chyla
On Wed, Jun 19, 2013 at 5:30 AM, Jochen Lienhard lienh...@ub.uni-freiburg.de wrote: Hi @all. We have the problem that after an update the index takes too much time to 'warm up'. We have some multivalued facet fields and during startup Solr logs the messages: INFO: UnInverted

Re: TieredMergePolicy reclaimDeletesWeight

2013-06-19 Thread Michael McCandless
The default is 2.0, and higher values will more strongly favor merging segments with deletes. I think 20.0 is likely way too high ... maybe try 3-5? Mike McCandless http://blog.mikemccandless.com On Tue, Jun 18, 2013 at 6:46 PM, Petersen, Robert robert.peter...@mail.rakuten.com wrote: Hi

Re: Question about SOLR search relevance score

2013-06-19 Thread Gora Mohanty
On 19 June 2013 21:15, sérgio Alves sd_t_al...@hotmail.com wrote: [...] Right now we're having problems with some common search terms. They return varied results on the search results, and the products which should appear first in the results, are scored lower than other, seemingly unrelated,

Question about SOLR search relevance score

2013-06-19 Thread sérgio Alves
Hi. My name is Sérgio Alves and I'm a developer in a project that uses solr as its search engine. Right now we're having problems with some common search terms. They return varied results on the search results, and the products which should appear first in the results, are scored

RE: Question about SOLR search relevance score

2013-06-19 Thread Swati Swoboda
Hi Sergio, Append 'debugQuery=on' to your queries to learn more about how your queries are being evaluated/ranked. i.e. qf=attributes_name^15+attributes_brand^10+attributes_category^8&debugQuery=on You'll get an XML section that is dedicated to debug information. I've found
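A hedged sketch of assembling such a debug request. The boosts mirror the `qf` value quoted above; the search term, the `defType=edismax` setting and the helper name `debug_query` are assumptions for illustration, since the message does not say which query parser is in use.

```python
from urllib.parse import urlencode, parse_qs


def debug_query(q, qf_boosts):
    """Build a query string with boosted qf fields and debugQuery=on."""
    qf = " ".join(f"{field}^{boost}" for field, boost in qf_boosts.items())
    # Assumption: a dismax-style parser is used, since qf only applies there.
    return urlencode({"q": q, "defType": "edismax", "qf": qf, "debugQuery": "on"})


query_string = debug_query(
    "shoes",  # hypothetical search term
    {"attributes_name": 15, "attributes_brand": 10, "attributes_category": 8},
)
print(query_string)
print(parse_qs(query_string)["qf"][0])  # the decoded qf value
```

The debug section of the response then explains, per document, how each boosted field contributed to the score.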

Apparent odd interaction between autoCommit values and indexing ram buffer

2013-06-19 Thread Shawn Heisey
I've run into something a little odd that's been happening for a while. The apparent symptoms: Two index segments are created every time an autoCommit (hard, not soft) happens during a DIH full-import. Here's the directory listing from the first few minutes of importing, and a related

Update by query?

2013-06-19 Thread Timothy Potter
Quick check to see if Solr supports an update-by-query feature or if anyone has thought about something like this ... similar to delete-by-query My specific use case is a metadata field needs to be updated for N docs where N 1 and the set can easily be identified by a query. Currently, I have to

SOLR : ArrayIndexOutOfBoundsException from SolrDispatchFilter

2013-06-19 Thread Rohit Kumar
Need help to figure out the error below. *Code Snippet*: public class ConnectionComponent extends SearchComponent { @Override public void process(ResponseBuilder rb) throws IOException { NamedList nList = new SimpleOrderedMap(); NamedList nl= new SimpleOrderedMap();

Re: Apparent odd interaction between autoCommit values and indexing ram buffer

2013-06-19 Thread Shawn Heisey
On 6/19/2013 10:38 AM, Shawn Heisey wrote: Looking at the numDocs for each segment, here's what I think is happening: The autoCommit kicks in after the first 25000 docs (25002 to be precise), but the ram buffer isn't emptied. The next 3339 documents get indexed, at which point the ram buffer

Re: Update by query?

2013-06-19 Thread Jack Krupansky
It has come up before as a nice feature to have, but isn't in Solr right now. I'd say go ahead and file a Jira for a new feature. -- Jack Krupansky -Original Message- From: Timothy Potter Sent: Wednesday, June 19, 2013 12:57 PM To: solr-user@lucene.apache.org Subject: Update by

Wildcards and Phrase queries

2013-06-19 Thread Isaac Hebsh
Hi, I'm trying to understand what is the status of enabling wildcards on phrase queries? Lucene JIRA issue: https://issues.apache.org/jira/browse/LUCENE-1486 Solr JIRA issue: https://issues.apache.org/jira/browse/SOLR-1604 It looks like these issues are not going to be solved in the close

RE: yet another optimize question

2013-06-19 Thread Petersen, Robert
Hi Walter, I used to have larger settings on our caches but it seemed like I had to make the caches that small to reduce memory usage to keep from getting the dreaded OOM exceptions. Also our search is behind Akamai with a one hour TTL. Our slave farm has a load balancer in front of twelve

RE: TieredMergePolicy reclaimDeletesWeight

2013-06-19 Thread Petersen, Robert
OK thanks, will do. Just out of curiosity, what would having that set way too high do? Would the index become fragmented or what? -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, June 19, 2013 9:33 AM To: solr-user@lucene.apache.org

Sharding and Replication clarification

2013-06-19 Thread Asif
Hi, I had questions on implementation of Sharding and Replication features of Solr/Cloud. 1. I noticed that when sharding is enabled for a collection - individual requests are sent to each node serving as a shard. 2. Replication too follows above strategy of sending individual documents to the

Re: TieredMergePolicy reclaimDeletesWeight

2013-06-19 Thread Michael McCandless
Way too high would cause it to pick highly lopsided merges just because a few deletes were removed. Highly lopsided merges (e.g. one big segment and N tiny segments) can be horrible because it can lead to O(N^2) merge cost over time. Mike McCandless http://blog.mikemccandless.com On Wed, Jun
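To make the O(N^2) remark concrete, here is a toy cost model, not TieredMergePolicy's actual selection logic. It counts how many documents get rewritten when every tiny segment is merged into one ever-growing big segment, versus merging equal-sized segments pairwise. The function names and the one-document segment size are assumptions for illustration.

```python
def lopsided_cost(n):
    """Documents rewritten when each new 1-doc segment is merged into one big segment."""
    big, cost = 1, 0
    for _ in range(n - 1):
        big += 1     # the tiny segment joins the big one
        cost += big  # the whole big segment is rewritten each time
    return cost


def balanced_cost(n):
    """Documents rewritten when equal-sized segments are merged pairwise (n a power of two)."""
    cost, size, count = 0, 1, n
    while count > 1:
        size *= 2
        count //= 2
        cost += size * count  # every document is rewritten once per level
    return cost


for n in (16, 1024):
    print(n, lopsided_cost(n), balanced_cost(n))
```

In this model the lopsided strategy grows roughly as N^2/2 while the balanced one grows as N log2 N, which is the gap the message warns about.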

RE: TieredMergePolicy reclaimDeletesWeight

2013-06-19 Thread Petersen, Robert
Oh! Thanks for the info. I'll change that right away. -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, June 19, 2013 10:42 AM To: solr-user@lucene.apache.org Subject: Re: TieredMergePolicy reclaimDeletesWeight Way too high would cause it

Re: yet another optimize question

2013-06-19 Thread Walter Underwood
I generally run with an 8GB heap for a system that does no faceting. 32GB does seem rather large, but you really should have room for bigger caches. The Akamai cache will reduce your hit rate a lot. That is OK, because users are getting faster responses than they would from Solr. A 5% hit rate

RE: yet another optimize question

2013-06-19 Thread Petersen, Robert
We actually have hundreds of facet-able fields, but most are specialized and are only faceted upon if the user has drilled into the particular category to which they are applicable and so they are only indexed for products in those categories. I guess it is the facets that eat up so much of

solr spatial search with distance to search results

2013-06-19 Thread PeterKerk
I was reading this: http://wiki.apache.org/solr/SpatialSearch I have this Solr query:

fq vs q parameter

2013-06-19 Thread Learner
Hi, I am currently using the below configuration in one of my handlers and I was thinking of removing the values from the q parameter and including them as part of the fq parameter. Can someone let me know if there is any performance improvement when using the fq parameter compared to q? <str name="q"

Re: fq vs q parameter

2013-06-19 Thread Michael Della Bitta
Yes, definitely, fq parameters don't affect scoring and can be cached.

Re: fq vs q parameter

2013-06-19 Thread adityab
I see that your query has a boost value, so this means you need Solr to score each matching document. One of the key differences between q and fq is that fq will not have any impact on the score, whereas having it in q will score each document based on the Similarity score.

Re: fq vs q parameter

2013-06-19 Thread adityab
+1 q and fq both can be cached.
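A minimal sketch of the split discussed in this thread: clauses that should influence ranking stay in `q`, and pure restrictions move to one or more `fq` parameters. The field names, values and the helper `split_query` are hypothetical.

```python
from urllib.parse import urlencode, parse_qsl


def split_query(scored, filters):
    """Put the scored clause in q and each non-scoring restriction in its own fq."""
    params = [("q", scored)] + [("fq", f) for f in filters]
    return urlencode(params)


# Hypothetical example: the boosted clause is scored, the filters are not.
query_string = split_query("title:solr^2", ["category:books", "inStock:true"])
print(query_string)
print(parse_qsl(query_string))
```

Giving each restriction its own `fq` lets each one be cached and reused independently of the others.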

Informal poll on running Solr 4 on Java 7 with G1GC

2013-06-19 Thread Timothy Potter
I'm sure there's some site to do this but wanted to get a feel for who's running Solr 4 on Java 7 with G1 gc enabled? Cheers, Tim

Re: Adding documents in Solr plugin

2013-06-19 Thread Otis Gospodnetic
I think this makes sense. Timothy asked about update by query in the last 24 hours and this sounds like the same thing. Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Jun 19, 2013 at 3:52 AM, Avner Levy av...@checkpoint.com wrote: I have a core with millions of records.

Re: Merge tool based on mergefactor

2013-06-19 Thread Otis Gospodnetic
Hi, On Wed, Jun 19, 2013 at 3:52 AM, Cosimo Streppone cos...@streppone.it wrote: On 06/19/2013 03:21 AM, Otis Gospodnetic wrote: You could call the optimize command directly on slaves, but specify the target number of segments, e.g. /solr/update?optimize=true&maxSegments=10 Not sure I

update solr.xml dynamically to add new cores

2013-06-19 Thread smanad
Hi, Is there a way to edit solr.xml as part of a Debian package installation to add new cores? In my use case, there are 4 Solr indexes and they are managed/configured by different teams. The way I am thinking the packages will work is described below: 1. There will be a solr-base debian package

Partial update using solr 4.3 with csv input

2013-06-19 Thread smanad
I was going through this link http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ and one of the comments is about support for csv. Since the comment is almost a year old, just wondering if this is still true that, partial updates are possible only with xml and json input? Thanks,

Re: Partial update using solr 4.3 with csv input

2013-06-19 Thread Jack Krupansky
Correct, no atomic update for CSV format. There just isn't any place to put the atomic update options in such a simple text format. -- Jack Krupansky -Original Message- From: smanad Sent: Wednesday, June 19, 2013 8:30 PM To: solr-user@lucene.apache.org Subject: Partial update using
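For contrast with CSV, here is a sketch of what an atomic update looks like in JSON, where the update option (`set` here) has a natural place as a nested object. The document id, field name, value and the helper `atomic_set` are made up for illustration.

```python
import json


def atomic_set(doc_id, field, value, id_field="id"):
    """Build one atomic-update document that sets a single field."""
    # The modifier ("set") is nested under the field name; a flat CSV row
    # has no slot for it.
    return {id_field: doc_id, field: {"set": value}}


# Hypothetical document and field.
payload = json.dumps([atomic_set("book1", "price", 99)])
print(payload)
```

The resulting string would be posted to the update handler with a JSON content type.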

SolrCloud - Score calculation

2013-06-19 Thread Learner
Hi, Sorry if it's a very basic question but I am pretty new to SolrCloud and I am trying to understand the underlying mechanism for calculating relevancy. Currently we are using SOLR 3.6.X and we use shards to perform distributed searching. Our shards are not of equal size hence sometimes the

Re: SolrCloud - Score calculation

2013-06-19 Thread Upayavira
The reason for the issue you are seeing is the IDF component in the score. IDF = inverse document frequency. The document frequency is the number of documents in the index in which a term appears. The higher the document frequency, the more common the term and thus the less relevant it is. The document
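A small worked example of why unequal shards give different scores. It uses the idf formula of Lucene's classic TF-IDF similarity, 1 + ln(numDocs / (docFreq + 1)), applied per shard; the shard sizes and document frequencies are invented for illustration.

```python
import math


def idf(doc_freq, num_docs):
    """Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical shards: the same term matches 10 documents in each,
# but the shards differ greatly in size.
small_shard = idf(doc_freq=10, num_docs=1_000)
large_shard = idf(doc_freq=10, num_docs=1_000_000)
print(round(small_shard, 3), round(large_shard, 3))
```

Because each shard computes idf from its own local statistics, the same term is weighted more heavily on the large shard, so otherwise identical documents can score differently depending on where they live.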

Re: Informal poll on running Solr 4 on Java 7 with G1GC

2013-06-19 Thread Shawn Heisey
On 6/19/2013 4:18 PM, Timothy Potter wrote: I'm sure there's some site to do this but wanted to get a feel for who's running Solr 4 on Java 7 with G1 gc enabled? I have tried it, but found that G1 didn't give me any better GC pause characteristics than CMS without tuning, and may have actually

Re: PostingsSolrHighlighter not working on Multivalue field

2013-06-19 Thread Floyd Wu
Hi Erick, multivalue is my typo, thanks for the reminder. Nothing in the logs shows an error or exception. The field definition is as follows: <field name="summary" type="text" indexed="true" stored="true" omitNorms="false" termVectors="true" termPositions="true" termOffsets="true"

RE: Solr 4.2 in SolrCloud mode lost response for update but search is normal

2013-06-19 Thread Kevin Xiang
From the coredump information, it seems that the issue is the same as the JIRA: https://issues.apache.org/jira/browse/SOLR-4400: Rapidly opening and closing cores can lead to deadlock Mark Miller: Does the issue happen again? Thanks. From: Qun Wang Sent: June 20, 2013 11:24 To: