RE: Explain score is different from score

2016-03-19 Thread G, Rajesh
I don’t use boost at index time and query time.

Re: Document Cache

2016-03-19 Thread Shawn Heisey
On 3/18/2016 8:22 AM, Rallavagu wrote: > So, each soft commit would create a new searcher that would invalidate > the old cache? > > Here is the configuration for Document Cache > > initialSize="10" autowarmCount="0"/> > > true In an earlier message, you indicated you're running into OOM. I

Ping handler in SolrCloud mode

2016-03-19 Thread Tom Evans
Hi all I have a cloud setup with 8 nodes and 3 collections, products, items and skus. All collections have just one shard, products has 6 replicas, items has 2 replicas, skus has 8 replicas. No node has both products and items, all nodes have skus Some of our queries join from sku to either

Re: Solr5 Optimize

2016-03-19 Thread Erick Erickson
In general, don't bother with optimize unless the index is quite static, i.e. there are very few adds/updates or those updates are done in batches and rarely (i.e. once a day or less frequently). As far as space, this will require that you have at _least_ as much free space on your disks as your

Re: Ping handler in SolrCloud mode

2016-03-19 Thread Tom Evans
On Wed, Mar 16, 2016 at 2:14 PM, Tom Evans wrote: > Hi all > > [ .. ] > > The option I'm trying now is to make two ping handler for skus that > join to one of items/products, which should fail on the servers which > do not support it, but I am concerned that this is a

RE: Ping handler in SolrCloud mode

2016-03-19 Thread Davis, Daniel (NIH/NLM) [C]
Shawn Heisey wrote: > On 3/16/2016 10:11 AM, Tom Evans wrote: > > This worked, I would still be interested in a lighter-weight approach > > that doesn't involve joins to see if a given collection has a shard on > > this server. I suspect that might require a custom ping handler plugin > >

RE: Indexing both meta-data and full content of HTML

2016-03-19 Thread Davis, Daniel (NIH/NLM) [C]
So, I think I've solved my problem, it basically comes from having only done Data Import Handler with any depth. I'll simply use extract request processing handler with some literal fields. -Original Message- From: Davis, Daniel (NIH/NLM) [C] Sent: Wednesday, March 16, 2016 11:47 AM To:

Re: Why is multiplicative boost prefered over additive?

2016-03-19 Thread Upayavira
Yes. Boosting adjusts an existing score. That original score can vary, e.g. depending upon how many search terms there are. If you use additive boosting, when you add a boost to a search with one term, (e.g. between 0 and 1) you get a different effect compared to when you add the same boost to a
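
The point about score ranges can be illustrated with a little arithmetic. This is only a minimal sketch: the base scores (0.8 for a one-term query, 3.2 for a four-term query) and the boost of 0.5 are made-up numbers, not values from the thread, and the `(1 + boost)` scaling is just one assumed way to apply a multiplicative boost.

```python
def additive(score, boost):
    """Additive boosting: the boost is added to the base score."""
    return score + boost


def multiplicative(score, boost):
    """Multiplicative boosting: the base score is scaled by (1 + boost)."""
    return score * (1 + boost)


def relative_gain(boosted, base):
    """Fraction by which boosting raised the base score."""
    return (boosted - base) / base


# Hypothetical base scores: a one-term query vs. a four-term query.
one_term, four_terms = 0.8, 3.2
boost = 0.5  # hypothetical boost value

for base in (one_term, four_terms):
    add_gain = relative_gain(additive(base, boost), base)
    mul_gain = relative_gain(multiplicative(base, boost), base)
    print(f"base={base}: additive +{add_gain:.1%}, multiplicative +{mul_gain:.1%}")
```

With these assumed numbers, the same additive boost changes the low-scoring query far more than the high-scoring one, while the multiplicative boost changes both by the same proportion.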

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Robert Brown
Thanks for the added input. I'll certainly look into the machine learning aspect, will be good to put some basic knowledge I have into practice. I'd been led to believe the tie parameter didn't actually do a lot. :-/ On 03/18/2016 12:07 PM, Nick Vasilyev wrote: I work with a similar

Document Cache

2016-03-19 Thread Rallavagu
Solr 5.4 embedded Jetty Is it the right assumption that whenever a document that is returned as a response to a query is cached in "Document Cache"? Essentially, if I request for any entry like /select?q=id: will it be cached in "Document Cache"? If yes, what is the TTL? Thanks in advance

Re: how to update billions of docs

2016-03-19 Thread Jack Krupansky
It would be nice to have a wiki/doc for "Bulk Field Update" that listed all of these techniques and tricks. And, of course, it would be so much better to have an explicit Lucene feature for this. It could work in the background like merge and process one segment at a time as efficiently as

Re: Solr5 Optimize

2016-03-19 Thread Rallavagu
Erick, Thanks for the response. Comments in line... On 3/16/16 9:56 AM, Erick Erickson wrote: In general, don't bother with optimize unless the index is quite static, i.e. there are very few adds/updates or those updates are done in batches and rarely (i.e. once a day or less frequently). As

Re: how to update billions of docs

2016-03-19 Thread Jack Krupansky
That's another great example of a mode that Bulk Field Update (my mythical feature) needs - switch a list of fields from stored to docvalues. And maybe even the opposite since there are scenarios in which docValues is worse than stored and you would only find that out after indexing... billions

RE: Explain score is different from score

2016-03-19 Thread Rick Sullivan
Try adding the following to your schema just under the tag:   This seems to solve the problem for me. Well I at least haven't yet found any cases where I see the score discrepancy. Thanks, -Rick > From: r...@ricksullivan.net > To:

Re: SolrCloud App Unit Testing

2016-03-19 Thread Steve Davids
Naveen, The Solr codebase generally uses the base “SolrTestCaseJ4” class and sometimes mixes in the cloud cluster. I personally write a generic abstract base test class to fit my needs and have an abstract `getSolrServer` method with an EmbeddedSolrServer implementation along with a separate

Re: DIG issue with SolrEntityProcessor 5.4.1

2016-03-19 Thread William Bell
I will try to see if I can create a use case and fix it. On Wed, Mar 16, 2016 at 10:00 AM, William Bell wrote: > We are running this inside of another entity in DIH. There appears to be > an issue. We get 2 calls to the survey core if hits > 0. If hits = 0 we get > 1 call.

Re: RETRY: SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-19 Thread Kelly, Frank
Any thoughts on this? Hoping for just a quick 1) Yes - once ZooKeeper loses a Quorum you need to restart Solr and your SolrJ Client 2) No - that's not expected behavior - Solr and SolrJ should recover - please file a JIRA issue Cheers! Frank Kelly Principal Software Engineer Predictive

Re: Solr 5.5.0 ClassNotFoundException solr.MockTokenizerFactory after DIH setup

2016-03-19 Thread Victor D'agostino
Hi It is a new server on CentOS release 6.7 with java-1.6.0-openjdk.x86_64 and java-1.7.0-openjdk-devel.x86_64 installed. I can parse the logs to extract jar files which are loaded but which ones am I supposed to look for ? They are all located in /data/solr-5.5.0/ : [root@LXLYOSOL31

Re: No live SolrServers available to handle this request

2016-03-19 Thread Shawn Heisey
On 3/17/2016 11:29 PM, Anil wrote: > Thanks Shawn. we are using 4.10.3. > > I don't see any issues with replicas of all shards at the time of > exception. health of all shards is good in CDH. I do not know what CDH is. I'm guessing it's third-party software. As far as I'm aware, Solr doesn't

Boosts for relevancy (shopping products)

2016-03-19 Thread Robert Brown
Hi, I currently have an index of ~50m docs representing shopping products: name, description, brand, category, etc. Our "qf" is currently setup as: name^5 brand^2 category^3 merchant^2 description^1 mm: 100% ps: 5 I'm getting complaints from the business concerning relevancy, and was

Re: Document Cache

2016-03-19 Thread Rallavagu
Thanks for the recommendations Shawn. Those are the lines I am thinking as well. I am reviewing application also. Going with the note on cache invalidation for every two minutes due to soft commit, wonder how would it go OOM in simply two minutes or is it likely that a thread is holding the

RE: Why is multiplicative boost prefered over additive?

2016-03-19 Thread jimi.hullegard
On Friday, March 18, 2016 5:11 PM, wun...@wunderwood.org wrote: > > I used a popularity score based on the DVD being in people's queues and the > streaming views. > The Peter Jackson films were DVD only. They were in about 100 subscriber > queues. > The first Twilight film was in 1.25 million

Re: Stopping Solr JVM on OOM

2016-03-19 Thread Binoy Dalal
Hi Shawn, Your thoughts on this? On Mon, Mar 14, 2016 at 2:11 PM Binoy Dalal wrote: > I set the heap to 16 mb and tried to index about 350k records using a DIH. > This did throw an OOM for that particular thread in the console, but the > oom script wasn't called and solr

Re: Making managed schema unmutable correctly?

2016-03-19 Thread Yonik Seeley
On Wed, Mar 16, 2016 at 11:10 PM, Erick Erickson wrote: > Personally I prefer to hand-edit the files. Me too, I hand edit managed-schema all the time. IMO, the warning is a bit overkill. -Yonik

indexing Free-form text description

2016-03-19 Thread Vis Sw
Hi, I am trying to understand the best way to index and search "free text field" e.g. notes or description... Please suggest what will be the best field type, tokenizer, filter... to query Free-form text description of a field. Any example will be great... Regards

Re: Document Cache

2016-03-19 Thread Rallavagu
On 3/18/16 8:56 AM, Emir Arnautovic wrote: Problem starts with autowarmCount="5000" - that executes 5000 queries when new searcher is created and as queries are executed, document cache is filled. If you have large queryResultWindowSize and queries return big number of documents, that will eat

Re: Solr5 Optimize

2016-03-19 Thread Rallavagu
Thanks Erick. This helps. On 3/16/16 10:11 AM, Erick Erickson wrote: First of all, "optimize-like" does _not_ happen "every time a commit happens". What _does_ happen is the current state of the index is examined and if certain conditions are met _then_ segment merges happen. Think of these as

Solr5 Optimize

2016-03-19 Thread Rallavagu
All, Solr 5.4 with embedded Jetty (4G heap) Trying to understand behavior of "optimize" operation if not run explicitly. What is the frequency at which this operation is run, what are the storage requirements and how do we schedule it? Any comments/pointers would greatly help. Thanks in

Re: Making managed schema unmutable correctly?

2016-03-19 Thread Shawn Heisey
On 3/16/2016 1:14 AM, Alexandre Rafalovitch wrote: > So, I am looking at the Solr 5.5 examples with their all-in by-default > managed schemas. And I am scratching my head on the workflow users are > expected to follow. > > One example is straight from documentation: > "With the above

Would it be better to make my Schema changes within the renamed "/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf/schema.xml" instead of the way that I am doing it now via curl -X PO

2016-03-19 Thread John Mitchell
I noticed that within "/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf" it has a file called "managed-schema" and within this file it says "This is the Solr schema file. This file should be named "schema.xml" and should be in the conf directory". Currently I have not renamed

Re: stop words as blacklist

2016-03-19 Thread Binoy Dalal
Like Ahmet says, a custom update request processor is the best way to go, and it's pretty simple too. I have a ready to use example here: https://github.com/lttazz99/SolrPluginsExamples On Fri, Mar 18, 2016 at 9:21 PM Ahmet Arslan wrote: > Hi John, > > Do you want to

Re: High Cpu sys usage

2016-03-19 Thread Shawn Heisey
On 3/16/2016 8:59 AM, Patrick Plaatje wrote: > From the sar output you supplied, it looks like you might have a memory issue > on your hosts. The memory usage just before your crash seems to be *very* > close to 100%. Even the slightest increase (Solr itself, or possibly by a > system service)

Re: how to update billions of docs

2016-03-19 Thread sudsport s
I think there are no in-place updates in Solr, which means an update behaves like an insert plus marking the old version deleted, so the behavior should be the same as indexing billions of docs. On Wed, Mar 16, 2016 at 3:52 PM, Mohsin Beg Beg wrote: > Hi, > > I have a requirement to

Re: Document Cache

2016-03-19 Thread Rallavagu
On 3/18/16 9:27 AM, Emir Arnautovic wrote: Running single query that returns all docs and all fields will actually load as many document as queryResultWindowSize is. What you need to do is run multiple queries that will return different documents. In case your id is numeric, you can run

publish solr on galsshfish server

2016-03-19 Thread Adel Mohamed Khalifa
Hello All, What is the requirement for installing solr on glassfish server, and how can I do it? Regards, Adel Khalifa

RE: publish solr on galsshfish server

2016-03-19 Thread Adel Mohamed Khalifa
I built my webpage for searching and created a servlet for it, but it is not working. I am using this Ajax call to invoke the servlet: $.ajax({ url: contextPath + '/GetResults', data: { qu: $("#query").val() }, dataType:

Re: High Cpu sys usage

2016-03-19 Thread Otis Gospodnetić
Hi, I looked at those metrics outputs, but nothing jumps out at me as problematic. How full are your JVM heap memory pools? If you are using SPM to monitor your Solr/Tomcat/Jetty/... look for a chart that looks like this: https://apps.sematext.com/spm-reports/s/zB3JcdZyRn If some of these

Re: Why is multiplicative boost prefered over additive?

2016-03-19 Thread Shawn Heisey
On 3/18/2016 6:34 AM, jimi.hulleg...@svensktnaringsliv.se wrote: > I'm not sure I follow your logic now. If one can express the popularity as a > value between 0.0 and 1.0, why can't one use that, together with a weight > (indicating how much the popularity should influence the score, in

Re: No live SolrServers available to handle this request

2016-03-19 Thread Anil
and defType is edismax On 18 March 2016 at 10:40, Anil wrote: > HI Michael, > > i could not post the query. i know its difficult to find out the root > cause without query. sorry about that. > > query includes expand/collapse and query filter (fq) and 2 to 3 terms with >

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread John Smith
Hi, For once I might be of some help: I've had a similar configuration (large set of products from various sources). It's very difficult to find the right balance between all parameters and requires a lot of tweaking, most often in the dark unfortunately. What I've found is that omitNorms=true

Re: Query behavior.

2016-03-19 Thread Modassar Ather
What I understand by q.op is the default operator. If there is no AND/OR in-between the terms the default will be AND as per my setting of q.op=AND. But what if the query has AND/OR explicitly put in-between the query terms? I just think that if (A OR B) is the query then the result should be

Re: using solr AnalyticsQuery API vs facet API

2016-03-19 Thread sudsport s
Thanks Joel for responding, but I am still not sure when to use the Solr analytics API vs the JSON facet API (what is the difference between ValueSource and PostFilter?). I know that ValueSource is useful to implement functions. On Wed, Mar 16, 2016 at 9:49 AM, sudsport s wrote: >

Re: Query behavior.

2016-03-19 Thread Alessandro Benedetti
I think what he tried to explain was : " Input query : *fl:(java OR book)* Instead of having the query parser parsing : *+((fl:java fl:book)~2) *( which seems what is happening right now) He want the query parser to parse : +((fl:java fl:book)) ( without the mm expressed) More than the outer

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-19 Thread Zheng Lin Edwin Yeo
Thanks Shawn for your reply. Yes, I'm looking to see if we can implement a combination of tokenizers and filters. However, I tried before that we can only implement one tokenizer for each fieldType. So is it true that I can only stick to one tokenizer, and the rest of the implementation have to

Re: No live SolrServers available to handle this request

2016-03-19 Thread Anil
Hi Michael, I could not post the query. I know it's difficult to find the root cause without the query, sorry about that. The query includes expand/collapse, a filter query (fq), and 2 to 3 terms with AND. Please share your thoughts. Thanks. Regards, Anil On 17 March 2016 at 19:46, michael solomon

Re: RETRY: SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-19 Thread Kelly, Frank
Thanks for taking look I’m not sure https://issues.apache.org/jira/browse/SOLR-8326 is a match as we aren’t using PKIAuthPlugin -Frank

Re: FW: SolrCloud App Unit Testing

2016-03-19 Thread GW
I think the easiest way to write apps for Solr is with some kind of programming language and the REST API. Don't bother with the PHP or Perl modules. They are deprecated and beyond useless. just use the HTTP call that you see in Solr Admin. Mind the URL encoding when putting together your server
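
The advice about URL encoding can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the host and port, the collection name `products`, and the query string are all hypothetical, and `build_select_url` is a helper made up for this example rather than part of any Solr client.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, params):
    """Build a Solr /select URL with percent-encoded query parameters."""
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr",  # assumed host and port
    "products",                    # hypothetical collection name
    {"q": 'name:"black iphone" AND brand:apple', "wt": "json", "rows": 10},
)
print(url)
```

Letting `urlencode` handle the colons, quotes, and spaces avoids hand-escaping mistakes when a query copied from the admin UI is pasted into application code.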

Why is multiplicative boost prefered over additive?

2016-03-19 Thread jimi.hullegard
Hi, After reading a bit on various sites, and especially the blog post "Comparing boost methods in Solr", it seems that the preferred boosting type is the multiplicative one, over the additive one. But I can't really get my head around *why* that is so, since in most boosting problems I can

Re: indexing Free-form text description

2016-03-19 Thread Erick Erickson
This question is way too general to answer in any detail, so I'd just start with the text_general fieldType in any of the stock schema.xml files. It would be well for you to get familiar with the admin/analysis page, as you'll have a zillion questions about what each change you make to that

Re: Document Cache

2016-03-19 Thread Emir Arnautovic
Running single query that returns all docs and all fields will actually load as many document as queryResultWindowSize is. What you need to do is run multiple queries that will return different documents. In case your id is numeric, you can run something like id:[1 TO 100] and then id:[100 TO
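
The idea of running several range queries over a numeric id could be scripted roughly as follows. This is a hedged sketch: `id_range_queries` is a hypothetical helper, the bounds and step are example values, and it uses non-overlapping inclusive ranges, which differs slightly from the ranges as written in the message.

```python
def id_range_queries(first_id, last_id, step):
    """Yield Solr range queries covering first_id..last_id without overlap.

    Square-bracket ranges in Solr are inclusive at both ends, so each
    range here ends one below the start of the next.
    """
    for start in range(first_id, last_id + 1, step):
        end = min(start + step - 1, last_id)
        yield f"id:[{start} TO {end}]"


for query in id_range_queries(1, 300, 100):
    print(query)
```

Each generated string can be sent as the `q` parameter of a separate request so that every query returns a different set of documents.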

How is _rest_managed.json used?

2016-03-19 Thread Alexandre Rafalovitch
Hello, What is _rest_managed.json actually for? I can see the mechanics in the Ref Guide and even found where it is managed by source code. But I cannot figure out how it actually fits into a workflow. It seems to be a registry of REST managed components (e.g. synonyms) for when they are NOT

Re: No live SolrServers available to handle this request

2016-03-19 Thread Anil
Thanks Shawn. we are using 4.10.3. I don't see any issues with replicas of all shards at the time of exception. health of all shards is good in CDH. Regards, Anil On 18 March 2016 at 10:52, Shawn Heisey wrote: > On 3/17/2016 4:22 AM, Anil wrote: > > We are using

Re: High Cpu sys usage

2016-03-19 Thread Patrick Plaatje
Hi, From the sar output you supplied, it looks like you might have a memory issue on your hosts. The memory usage just before your crash seems to be *very* close to 100%. Even the slightest increase (Solr itself, or possibly by a system service) could cause the system to crash. What are the

Re: Document Cache

2016-03-19 Thread Erick Erickson
First, I want to make sure when you say "TTL", you're talking about documents being evicted from the documentCache and not the "Time To Live" option whereby documents are removed completely from the index. The time varies with the number of new documents fetched. This is an LRU cache whose size
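
To make the "no TTL, eviction depends on what gets fetched" behaviour concrete, here is a toy LRU cache. It is a generic sketch built on `collections.OrderedDict`, not Solr's actual cache implementation; the class name, the size of 2, and the document keys are all made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache: entries leave by eviction, not by age."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._entries = OrderedDict()

    def get(self, key):
        """Return the cached value (marking it recently used) or None."""
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        """Insert a value, evicting the least recently used entry if full."""
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)


cache = LRUCache(max_size=2)   # hypothetical size
cache.put("doc1", "first")
cache.put("doc2", "second")
cache.get("doc1")              # doc1 is now the most recently used
cache.put("doc3", "third")     # cache is full, so doc2 is evicted
print(cache.get("doc2"), cache.get("doc1"), cache.get("doc3"))
```

How long an entry survives in such a cache depends only on how many other entries are inserted or touched after it, which is why no fixed time can be quoted.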

Re: Would it be better to make my Schema changes within the renamed "/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf/schema.xml" instead of the way that I am doing it now via curl -

2016-03-19 Thread Shawn Heisey
On 3/18/2016 7:31 AM, John Mitchell wrote: > My question would it be better to make my Schema changes within the renamed > "/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf/schema.xml" > instead of the way that I am doing it now via curl -X POST -H >

Re: from zookeper embedded to standalone

2016-03-19 Thread Erick Erickson
Looking forward to finding out if it works as I haven't had to do this myself ;). As Upayavira mentions though, you might have to do some fancy dancing with the ZK quorum. I'm assuming that once all the data is moved around, shutting down _all_ the Zookeepers (and Solrs!) and reconfiguring the

RE: Why is multiplicative boost prefered over additive?

2016-03-19 Thread jimi.hullegard
On Thursday, March 17, 2016 11:21 PM, u...@odoko.co.uk wrote: > > If you use additive boosting, when you add a boost to a search with one term, > (e.g. between 0 and 1) > you get a different effect compared to when you add the same boost to a > search with four terms (e.g. between 0 and 4).

Re: High Cpu sys usage

2016-03-19 Thread YouPeng Yang
Hi To Patrick: Never mind. Thank you for your suggestion all the same. To Otis: We do not use SPM. We monitor the JVM using just jstat because my system ran well before, so we did not need other tools. But SPM is really awesome. Still looking for help. Best Regards 2016-03-18 6:01

Explain score is different from score

2016-03-19 Thread G, Rajesh
Mismatch in score displayed in debug and score field. Please refer attached xml. When I search for title_ws:(Microsoft Ofice 365). If the results are displayed by explain score order then we would have the expected result “Microsoft Office 365” then “Lync - Microsoft Office 365” Lync -

RE: Explain score is different from score

2016-03-19 Thread Rick Sullivan
Yes it seems to be something similar, but the normalization isn't applied to all retrieved documents, which messes with the document rankings. Some documents have the exact values from the 'explain' response, while others are normalized. -Rick > Date:

Re: stop words as blacklist

2016-03-19 Thread Ahmet Arslan
Hi John, Do you want to skip that document in the indexing process? Or, you want to index that document, but you don't want to retrieve it if it is queried with stop words? There is a KeepWordFilterFactory to detect if a document contains a black-list word. To skip a certain document that

Re: No live SolrServers available to handle this request

2016-03-19 Thread michael solomon
What query do you try? On Thu, Mar 17, 2016 at 12:22 PM, Anil wrote: > HI, > > We are using solrcloud with zookeeper and each collection has 5 shareds and > 2 replicas. > we are seeing "org.apache.solr.client.solrj.SolrServerException: No live > SolrServers available to

RE: how to update billions of docs

2016-03-19 Thread Ken Krugler
As others noted, currently updating a field means deleting and inserting the entire document. Depending on how you use the field, you might be able to create another core/container with that one field (plus the key field), and use join support. Note that

Re: High Cpu sys usage

2016-03-19 Thread YouPeng Yang
Hi Shawn Actually, there are three Solr instances (the top three PIDs are the three instances), and the data file sizes are 851G, 592G, and 49G respectively, and more and more data will be added as time goes on. I think a SolrCloud service at this scale may be rare, and it is now one

how to update billions of docs

2016-03-19 Thread Mohsin Beg Beg
Hi, I have a requirement to replace a value of a field in 100B's of docs in 100's of cores. The field is multiValued=false docValues=true type=StrField stored=true indexed=true. Atomic Updates performance is on the order of 5K docs per sec per core in solr 5.3 (other fields are quite big).

Re: Shard splitting for immediate performance boost?

2016-03-19 Thread Robert Brown
Thanks Erick, I have another index with the same infrastructure setup, but only 10m docs, and never see these slow-downs, that's why my first instinct was to look at creating more shards. I'll definitely make a point of investigating further tho with all the things you and Shawn mentioned,

Re: Making managed schema unmutable correctly?

2016-03-19 Thread Shawn Heisey
On 3/16/2016 7:51 PM, Jay Potharaju wrote: > Does using schema API mean that no upconfig to zookeeper and no reloading > of all the nodes in my solrcloud? In which scenario should I not use schema > API, if any? The documentation says that a reload occurs automatically after the schema

Solr 5.5.0 ClassNotFoundException solr.MockTokenizerFactory after DIH setup

2016-03-19 Thread Victor D'agostino
Hi guys I have a java.lang.ClassNotFoundException: solr.MockTokenizerFactory after a fresh 5.5.0 setup with DIH and a collection named "db". The tgz file is from http://apache.crihan.fr/dist/lucene/solr/5.5.0/solr-5.5.0.tgz Any idea why this class is missing at startup? Should I download

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Scott Stults
You're not going to be able to look at field boosts by themselves to judge relevancy because it's very much a data-driven optimization problem. For example, if you only sell iPhone cases but no iPhones, a search for "black iphone" should show a bunch of black iPhone cases at the top of the

Re: using solr AnalyticsQuery API vs facet API

2016-03-19 Thread Joel Bernstein
https://issues.apache.org/jira/browse/SOLR-8492 shows an example of the AnalyticsQuery where the merge is being handled by the Streaming API. I actually think this is nicer than using MergeStrategy. The Streaming API gives you full control over the merge from the shards. Joel Bernstein

Re: Regarding google maps polyline to use IsWithin(POLYGON(())) in solr

2016-03-19 Thread Pradeep Chandra
Hi Sir, I downloaded the file from http://search.maven.org/#artifactdetails%7Ccom.vividsolutions%7Cjts-core%7C1.14.0%7Cjar as you said in my previous post. Then I copied the .jar file into the server/lib directory...That is the thing only I did. At the first time I tried with small polygons.

Re: Explain score is different from score

2016-03-19 Thread Ahmet Arslan
Hi Rick and Rajesh, I wasn't able to reproduce this with either Lucene or Solr. What version of Solr is this? Are you using a sharded request? @BeforeClass public static void beforeClass() throws Exception { initCore("solrconfig.xml", "schema.xml"); assertU(adoc("id", "1722669", "title", "Lync

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Robert Brown
Thanks Scott and John, As luck would have it I've got a PhD graduate coming for an interview today, who just happened to do her research thesis on information retrieval with quantum theory and machine learning :) John, it sounds like you're describing my system! Shopping products from

Is there any JIRA changed the stored order of multivalued field?

2016-03-19 Thread forest_soup
We have a field named "attachmentnames": We do POST to add data to Solr v4.7 and Solr v5.3.2 respectively. The attachmentnames are in 789, 456, 123 sequence: { "add": { "overwrite": true, "doc": { "id":"1",

RE: Explain score is different from score

2016-03-19 Thread Rick Sullivan
Hi Rajesh, I've been seeing the same problem you have. My debug scores seem to be what I expect, but the actual scores applied by Solr are sometimes divided by an integer. I raised the same question in this email distribution about a week ago, but haven't yet found a solution. There's also a

Re: publish solr on galsshfish server

2016-03-19 Thread Upayavira
This is not recommended. It may work, and if it does, a future update to Solr may stop it working, without warning. Solr is to be considered its own app, to be run using its own embedded servlet container, as this allows the project to manage its own configuration and to test thoroughly that it

[ANNOUNCEMENT] Luke 5.5.0 released

2016-03-19 Thread Dmitry Kan
Download the release zip here: https://github.com/DmitryKey/luke/releases/tag/luke-5.5.0 Fixed in this release: #50 (Literally, the upgrade to Lucene 5.5.0) Enjoy! -- Dmitry Kan Luke

RE: Why is multiplicative boost prefered over additive?

2016-03-19 Thread jimi.hullegard
On Friday, March 18, 2016 4:25 PM, wun...@wunderwood.org wrote: > > That works fine if you have a query that matches things with a wide range of > popularities. But that is the easy case. > > What about the query "twilight", which matches all the Twilight movies, all > of which are popular

Re: Making managed schema unmutable correctly?

2016-03-19 Thread Erick Erickson
I think you're mixing up schema and config? The message about not hand-modifying is for schema.xml (well, managed-schema). To lock it down you need to modify solrconfig.xml... There shouldn't need to be any need to unload, just reload? And I just skipped the e-mail so maybe I'm way off base.

RE: Explain score is different from score

2016-03-19 Thread Rick Sullivan
I'm not. I only have query boosts. > Date: Fri, 18 Mar 2016 16:42:36 + > From: iori...@yahoo.com.INVALID > To: solr-user@lucene.apache.org > Subject: Re: Explain score is different from score > > Hi Rick, > > This could be a bug I think. Do you guys

Re: Ping handler in SolrCloud mode

2016-03-19 Thread Tom Evans
On Wed, Mar 16, 2016 at 4:10 PM, Shawn Heisey wrote: > On 3/16/2016 8:14 AM, Tom Evans wrote: >> The problem occurs when we attempt to query a node to see if products >> or items is active on that node. The balancer (haproxy) requests the >> ping handler for the appropriate

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
You still haven't explained what exactly you are trying to accomplish with that outer level AND/+/MUST. Please be specific - why you insist on "+((fl:java fl:book))" rather than "fl:java fl:book". -- Jack Krupansky On Fri, Mar 18, 2016 at 12:12 AM, Modassar Ather wrote:

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Alessandro Benedetti
In a relevancy problem I would repeat what my colleagues already pointed out : Data is key. We need to understand first of all our data before we can understand what is relevant and what is not. Once we specify a groundfloor which make sense ( and your basic approach + proper schema configuration

Re: Ping handler in SolrCloud mode

2016-03-19 Thread Shawn Heisey
On 3/16/2016 10:11 AM, Tom Evans wrote: > This worked, I would still be interested in a lighter-weight approach > that doesn't involve joins to see if a given collection has a shard on > this server. I suspect that might require a custom ping handler plugin > however. If you are doing joins, then

Re: High Cpu sys usage

2016-03-19 Thread Shawn Heisey
On 3/16/2016 8:27 PM, YouPeng Yang wrote: > Hi Shawn >Here is my top screenshot: > >https://www.dropbox.com/s/jaw10mkmipz943y/topscreen.jpg?dl=0 > >It is captured when my system is normal.And I have reduced the memory > size down to 48GB originating from 64GB. It looks like you have

Solrj , how to create collection

2016-03-19 Thread Iana Bondarska
Hi, Could you please tell me, is it possible to create new collection on solr server only using solrj,without manual creation of core folder on server. I'm using solrj v.5.5.0,standalone client. Thanks, Iana

Solr 4.10 Suggestor

2016-03-19 Thread Matt Kuiper
All, Using the Suggestor component and running Solr 4.10. I have read that on Solr startup (or commit, depending on config) the building of the Suggestor can be CPU intensive and take some time. Does anyone know how to determine that the Suggestor has completed its build? Something to look

Re: indexing Free-form text description

2016-03-19 Thread Alexandre Rafalovitch
Well, Solr ships with nearly 10 examples. So, if you go through them, you will know quite a lot. This article (mine) may help you to navigate them: http://blog.outerthoughts.com/2015/11/oh-solr-home-where-art-thou/ More specifically, as Erick said, your question is too generic. One step forward

Shard splitting for immediate performance boost?

2016-03-19 Thread Robert Brown
Hi, I have an index of 60m docs split across 2 shards (each with a replica). When load testing queries (picking random keywords I know exist), and randomly requesting facets too, 95% of my responses are under 0.5s. However, during some random manual tests, sometimes I see searches taking

RE: RETRY: SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-19 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
I am wondering whether this might be the bug SOLR-8326, which is fixed in Solr 5.4. That's my guess as a user who ran into the bug myself. -Original Message- From: Kelly, Frank [mailto:frank.ke...@here.com] Sent: Wednesday, March 16, 2016 3:09 PM To: solr-user@lucene.apache.org

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-19 Thread Zheng Lin Edwin Yeo
I found that in WordDelimiterFilterFactory, there is a parameter called splitOnNumerics, which does the same function as what HMMChineseTokenizer did. - *splitOnNumerics="1"* causes alphabet => number transitions to generate a new part [Solr 1.3]: - "j2se" => "j" "2" "se"
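The transition rule quoted above can be approximated in a few lines. This is a rough Python sketch of the splitting behavior only, not Lucene's implementation; the function name split_on_numerics is made up for illustration, and it only distinguishes digits from non-digits.

```python
import re

def split_on_numerics(token):
    """Split a token at every digit/non-digit transition (illustration only)."""
    # Each run of digits and each run of non-digits becomes its own part.
    return re.findall(r"\d+|\D+", token)

print(split_on_numerics("j2se"))  # ['j', '2', 'se']
```

A token with no digits, such as "solr", comes back as a single part.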

Re: Why is multiplicative boost prefered over additive?

2016-03-19 Thread Walter Underwood
Think about using popularity as a boost. If one movie has a million rentals and one has a hundred rentals, there is no additive formula that balances that with text relevance. Even with log(popularity), it doesn’t work. With multiplicative boost, we only care about the difference between the
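One way to see the scale problem is with made-up numbers. The Python sketch below is an illustration only and not necessarily the formula the message goes on to describe: the document names, text scores, rental counts and the use of log10 are all assumptions. It shows that the order produced by an additive boost can flip when the text scores are rescaled by a constant, while the order produced by a multiplicative boost stays the same.

```python
import math

# Hypothetical documents: (name, text relevance score, rental count).
docs = [("blockbuster", 1.0, 1_000_000), ("niche", 2.0, 100)]

def additive(text, boost):
    return text + boost

def multiplicative(text, boost):
    return text * boost

def rank(docs, combine, text_scale=1.0):
    """Return document names ordered by combined score, best first."""
    scored = [(combine(text * text_scale, math.log10(pop)), name)
              for name, text, pop in docs]
    return [name for _, name in sorted(scored, reverse=True)]

# Rescale the text scores by 1x and 10x and compare the resulting orders.
for scale in (1.0, 10.0):
    print(scale, rank(docs, additive, scale), rank(docs, multiplicative, scale))
```

With these numbers the additive order changes between the two scales, which matters because raw text relevance scores are not on a fixed scale from one query to the next.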

Error starting solr 5.5 - Cannot open solr.log:No such file or directory

2016-03-19 Thread Shamik Bandopadhyay
Hi, I'm trying to upgrade from Solr 5.0 to 5.5. I'm getting the following error: tail: cannot open `/mnt/ebs2/solrhome/logs/solr.log' for reading: No such file or directory I'm running on CentOS 6.7. The same startup script has been working fine for 5.0 till now. I'm executing as user "solr".

Re: High Cpu sys usage

2016-03-19 Thread YouPeng Yang
Hi Shawn Here is my top screenshot: https://www.dropbox.com/s/jaw10mkmipz943y/topscreen.jpg?dl=0 It was captured when my system was normal. And I have reduced the memory size from 64GB down to 48GB. We have two hardware clusters, each comprising 3 machines, and on one

[nested] how to specify a path for multiple nesting?

2016-03-19 Thread Alisa Z .
Hi all, I have a deeply multi-level data structure (up to 6-7 levels deep) where, due to the nature of the data, some nested documents can have the same type names at various levels. How do I form a proper query on a nested field that would contain "a path" that defines that field? I'll clarify

Re: Explain score is different from score

2016-03-19 Thread Ahmet Arslan
Hi Rick, This could be a bug I think. Do you guys use index time boosts? Ahmet On Friday, March 18, 2016 6:15 PM, Rick Sullivan wrote: Yes it seems to be something similar, but the normalization isn't applied to all retrieved documents, which messes with the document

Re: No live SolrServers available to handle this request

2016-03-19 Thread Shawn Heisey
On 3/18/2016 9:55 PM, Anil wrote: > Thanks for your response. > CDH is a Cloudera (third party) distribution. Is there any way to get the > notifications copy of it when cluster state changed ? in logs ? > > I can assume that the exception is result of no availability of replicas > only. Agree? Yes,

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Alessandro Benedetti
Actually, if you are able to collect past (or future) signals like clicks or purchases, I would rather focus on the features of your products than on the products themselves. What will happen is that you are going to be able to rank products in a better way based on how their feature should affect

Re: Solr:Skip document from indexing when it matches specific value

2016-03-19 Thread Shawn Heisey
On 3/16/2016 5:36 AM, solr2020 wrote: > How can we ignore a document from indexing into Solr when a field matches a > particular value? > E.g. we would like to ignore a document from indexing when the document's field > path matches the value "/content". Do we have any OOTB processors to accomplish > this in
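Separately from any server-side processor, a document can also be dropped on the client before it is ever sent to Solr. The Python sketch below shows that alternative, assuming you control the indexing code; the helper name should_index is hypothetical, and "matches" is assumed here to mean exact equality with "/content".

```python
def should_index(doc, field="path", blocked="/content"):
    """Return False for documents whose field equals the blocked value."""
    return doc.get(field) != blocked

# Hypothetical documents waiting to be indexed.
docs = [
    {"id": "1", "path": "/content"},
    {"id": "2", "path": "/content/news"},
    {"id": "3"},
]

# Keep only the documents that pass the check; these would then be sent to Solr.
to_index = [d for d in docs if should_index(d)]
print([d["id"] for d in to_index])  # ['2', '3']
```

Documents that lack the field entirely are kept under this rule; change the comparison if a prefix or pattern match is what you need.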

Re: indexing Free-form text description

2016-03-19 Thread Vis Sw
Thanks a lot Erick and Alex... I am going through the documents and blogs... thanks for the pointers. Here is what I tried starting with "text_general"... a) Looks like it breaks on whitespace, e.g. for project_collaborator values such as "myproject122_USC Dan Forrester ", "myproject123_USC
