using solr AnalyticsQuery API vs facet API

2016-03-18 Thread sudsport s
Hi, I am planning to write a custom aggregator in Solr which will use some probabilistic data structures per shard to accumulate results, and then after shard merging the results will be sent to the user as an integer. I explored 2 options to do this: 1. Solr analytics API

Re: Actual (specific) RT Search?

2016-03-18 Thread Erick Erickson
bq: My guess so far is that the filter has to fetch the unique key for all documents in results, which consumes a lot of resources. Guessing here and going from memory, but... If you have some code like reader.get(doc).get("id") it'll totally barf. Problem here is that to get the id field, it has

Re: Solr 5.5.0 ClassNotFoundException solr.MockTokenizerFactory after DIH setup

2016-03-18 Thread Shawn Heisey
On 3/17/2016 10:39 AM, Victor D'agostino wrote: > I have a java.lang.ClassNotFoundException: solr.MockTokenizerFactory > after a fresh 5.5.0 setup with DIH and a collection named "db". > > The tgz file is from > http://apache.crihan.fr/dist/lucene/solr/5.5.0/solr-5.5.0.tgz > > Any idea why this

Re: Solr Wiki - Request to add to contributors group

2016-03-18 Thread Alessandro Benedetti
Shawn, thank you very much! So, I didn't have an account in the old wiki; can you add me as a contributor? Just created. I will then proceed with adding the classification documentation. AlessandroBenedetti benedetti.ale...@gmail.com Cheers On Wed, Mar 16, 2016 at 1:01 AM, Shawn Heisey

Re: indexing pdf files using post tool

2016-03-18 Thread Binoy Dalal
Like Francisco said, use a custom update processor to map the fields the way you want and add it to your update chain. On Wed, 16 Mar 2016, 18:16 Francisco Andrés Fernández, wrote: > Vidya, I don't know if I'm understanding it very well but, I think that the > best way is to

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-18 Thread Shawn Heisey
On 3/16/2016 4:33 AM, Zheng Lin Edwin Yeo wrote: > I found that HMMChineseTokenizer will split a string that consists of > numbers and characters (alphanumeric). For example, if I have a code that > looks like "1a2b3c4d", it will be split to 1 | a | 2 | b | 3 | c | 4 | d > This has caused the

Re: Solr5 Optimize

2016-03-18 Thread Erick Erickson
First of all, "optimize-like" does _not_ happen "every time a commit happens". What _does_ happen is the current state of the index is examined and if certain conditions are met _then_ segment merges happen. Think of these as "partial optimizes". This is under control of the TieredMergePolicy by

Re: Making managed schema unmutable correctly?

2016-03-18 Thread Erick Erickson
Well, if using managed schema in SolrCloud, all the updates to the nodes are automatic, so it's easier from that perspective. To me, the sweet spot for managed schema is that it lends itself to some kind of front end that allows you to deal with the schema visually, one can envision widgets,

RE: Why is multiplicative boost prefered over additive?

2016-03-18 Thread jimi.hullegard
On Thursday, March 17, 2016 7:58 PM, wun...@wunderwood.org wrote: > > Think about using popularity as a boost. If one movie has a million rentals > and one has a hundred rentals, there is no additive formula that balances > that with text relevance. Even with log(popularity), it doesn't work.

Re: Solr 5.5 error at startup - ClassNotFoundException: org.simpleframework.xml.core.Persister

2016-03-18 Thread Shawn Heisey
On 3/17/2016 2:32 PM, Shamik Bandopadhyay wrote: > [2016-03-17 20:23:34,760]ERROR > 9350[coreLoadExecutor-7-thread-1-processing-n:54.176.219.134:8983_solr] - > org.apache.solr.core.CoreContainer.create(CoreContainer.java:827) - Error > creating core [knowledge]:

Re: Why is multiplicative boost prefered over additive?

2016-03-18 Thread Walter Underwood
That works fine if you have a query that matches things with a wide range of popularities. But that is the easy case. What about the query “twilight”, which matches all the Twilight movies, all of which are popular (millions of views). Or “Lord of the Rings” which only matches movies with

Re: Why is multiplicative boost prefered over additive?

2016-03-18 Thread Walter Underwood
Popularity has a very wide range. Try my example, scale 1 million and 100 into the same 1.0-0.0 range. Even with log popularity. As another poster pointed out, text relevance scores also have a wide range. In practice, I never could get additive boost to work right at Netflix at both ends of
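
A rough sketch of the scale problem raised in this thread, not taken from any of the posts. The two documents, their text scores, and the log10 popularity boost are all made-up assumptions. It only shows that an additive boost can change the ranking when the text relevance scores move to a different range, while a multiplicative boost keeps the same order.

```python
import math

# Hypothetical documents: name -> (text relevance score, popularity count).
DOCS = {"A": (1.0, 1_000_000), "B": (2.0, 100)}


def additive(text_score, popularity):
    """Additive boost: text score plus log-scaled popularity."""
    return text_score + math.log10(popularity)


def multiplicative(text_score, popularity):
    """Multiplicative boost: text score times log-scaled popularity."""
    return text_score * math.log10(popularity)


def ranking(docs, combine, scale=1.0):
    """Return document names, best first, after scaling every text score by `scale`."""
    scored = {name: combine(text * scale, pop) for name, (text, pop) in docs.items()}
    return sorted(scored, key=scored.get, reverse=True)


for scale in (1.0, 10.0):
    print(scale, "additive:", ranking(DOCS, additive, scale),
          "multiplicative:", ranking(DOCS, multiplicative, scale))
```

With these assumed numbers the additive ranking flips once the text scores are ten times larger, and the multiplicative ranking does not.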

Re: Boosts for relevancy (shopping products)

2016-03-18 Thread Nick Vasilyev
Tie does quite a bit; without it, only the highest-weighted field that has the term will be included in the relevance score. Tie lets you include the other fields that match as well. On Mar 18, 2016 10:40 AM, "Robert Brown" wrote: > Thanks for the added input. > > I'll
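
A small sketch of how the tie parameter combines per-field scores, assuming the usual disjunction-max formula (best field score plus tie times the sum of the other matching fields' scores). The function name and the example scores are hypothetical, and this is plain arithmetic rather than Solr code.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores: the best field, plus `tie` times all the others."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Hypothetical scores for one term matching in three fields.
scores = [2.0, 1.0, 0.5]
for tie in (0.0, 0.5, 1.0):
    print(tie, dismax_score(scores, tie))
```

With tie=0.0 only the best field counts; with tie=1.0 the result is the plain sum of all matching fields.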

Re: Solr:Skip document from indexing when it matches specific value

2016-03-18 Thread Jan Høydahl
Hi. Nothing OOTB as far as I know, but it would be 3 lines to create a custom one, which simply aborts the chain instead of calling super.processAdd(command) -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 16. mar. 2016 kl. 12.36 skrev solr2020 : > > Hi, >

FW: SolrCloud App Unit Testing

2016-03-18 Thread Madhire, Naveen
Hi, I am writing a Solr application; can anyone please let me know how to unit test the application? I see we have the MiniSolrCloudCluster class available in Solr, but I am confused about how to use that for unit testing. How should I create an embedded server for unit testing? Thanks, Naveen

RE: Making managed schema unmutable correctly?

2016-03-18 Thread Davis, Daniel (NIH/NLM) [C]
Thanks for saying. I thought as soon as I sent it that my motivation might just be to brag that I know something that long-time Solr folks like you might not. I actually know so very little, not just about how Lucene works, but how to make Solr solve concrete problems beyond the simple. I

Re: indexing pdf files using post tool

2016-03-18 Thread Jan Høydahl
Hi. You can look at the Apache Tika project or the PDFBox project to parse your files before sending them to Solr. Alternatively, if your processing is very simple, you can use the built-in Tika as you just did, and then deploy some UpdateRequestProcessors in order to modify the Tika output into

Re: Document Cache

2016-03-18 Thread Rallavagu
So, each soft commit would create a new searcher that would invalidate the old cache? Here is the configuration for Document Cache autowarmCount="0"/> true Thanks On 3/18/16 12:45 AM, Emir Arnautovic wrote: Hi, Your cache will be cleared on soft commits - every two minutes. It seems that

Re: Solr won't start -- java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration

2016-03-18 Thread Paul Hoffman
On Tue, Mar 15, 2016 at 07:58:21PM -0600, Shawn Heisey wrote: > On 3/15/2016 2:56 PM, Paul Hoffman wrote: > >> It sure looks like I started Solr from my blacklight project dir. > >> > >> Any ideas? Thanks, > >> > > You may need to get some help from the blacklight project. I've got > absolutely

BYOPW in security.json

2016-03-18 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
When using security.json (in Solr 5.4.1 for instance), is there a recommended method to allow users to change their own passwords? We certainly would not want to grant blanket security-edit to all users; but requiring users to divulge their intended passwords (in Email or by other means) to the

Re: No live SolrServers available to handle this request

2016-03-18 Thread Anil
Hi Shawn, Thanks for your response. CDH is a Cloudera (third party) distribution. Is there any way to get notified when the cluster state changes? In the logs? Can I assume that the exception is the result of replicas being unavailable only? Agree? Regards, Anil On 18 March 2016 at

stop words as blacklist

2016-03-18 Thread John Blythe
hey all, is there any out of the box way to use your stop words to completely skip a document? if something has X in its description when being indexed i just want to ignore it altogether / when something is searched with X then go ahead and automatically return 0 results. quick context: using

Re: [nested] how to specify a path for multiple nesting?

2016-03-18 Thread Mikhail Khludnev
Hello, Please find inline On Wed, Mar 16, 2016 at 10:10 PM, Alisa Z. wrote: > Hi all, > I have a deeply multi-level data structure (up to 6-7 levels deep) where > due to the nature of the data some nested documents can have same type > names at various levels. How to form a

RE: Explain score is different from score

2016-03-18 Thread G, Rajesh
Can someone help?

Re: Making managed schema unmutable correctly?

2016-03-18 Thread Jay Potharaju
Does using schema API mean that no upconfig to zookeeper and no reloading of all the nodes in my solrcloud? In which scenario should I not use schema API, if any? Thanks Jay On Wed, Mar 16, 2016 at 6:22 PM, Shawn Heisey wrote: > On 3/16/2016 1:14 AM, Alexandre Rafalovitch

No live SolrServers available to handle this request

2016-03-18 Thread Anil
Hi, We are using SolrCloud with ZooKeeper, and each collection has 5 shards and 2 replicas. We are seeing "org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request". I don't see any issues with the replicas. What would be the root cause of the exception?

Re: Solr 4.10 Suggestor

2016-03-18 Thread Alessandro Benedetti
Hi Matt, when you say "soon looking to move to a different approach (ngrams)", do you mean creating a specific core, with a specific analysis for the fields of interest? Is upgrading Solr not an option in your situation? Cheers On Wed, Mar 16, 2016 at 10:05 PM, Matt Kuiper