Re: How can I get a monotonically increasing field value for docs?

2015-09-30 Thread Chris Hostetter
: Small potato: I assume cursor mark breaks when the number of shards changes : while keeping the original values doesn't, since the relative position is : encoded per shard...But that's an edge case. I don't understand your question ... the encoded cursorMark values don't know about thing

Re: Find records with no values in solr.LatLongType fied type

2015-09-30 Thread Ishan Chattopadhyaya
There's also a function, exists(), which might work here, and result in a neater query. e.g. something like: q=*:* -exists(usrlatlong_0_coordinate) Haven't tried it, though. https://cwiki.apache.org/confluence/display/solr/Function+Queries#FunctionQueries-AvailableFunctions On Wed, Sep 30, 2015

Re: Passing Basic Auth info to HttpSolrClient

2015-09-30 Thread Ishan Chattopadhyaya
In the latest Solr release, you can use the basic auth plugins for authentication instead of doing something at the Jetty level. https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin Right at the end, there's a note on how to use SolrJ with this. Also, there exists:

Re: MoreLikeThisHandler with mltipli input documents

2015-09-30 Thread Szűcs Roland
Hi Alessandro, Exactly. The response time varies but let's have a concrete other example. This is my call: http://localhost:8983/solr/bandwpl/mlt?q=id:10812=id This is my result: { "responseHeader":{ "status":0, "QTime":6232}, "response":{"numFound":4564,"start":0,"docs":[ {

real tf-idf numbers for all the terms

2015-09-30 Thread Roland Szűcs
Hi all, Is there any out of the box way to get the tf-idf values for all the terms? The termVectorComponent is not good for this. Even if I set tv.tf_idf, tv.tf, tv.df to true and I get the tf and df values, the tf-idf calculation is a pure division. Where is the log transformation of the inverse
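
To make the distinction in the question concrete, here is a minimal sketch comparing the plain tf/df division described above with a log-scaled weight. The second function assumes the formulas of Lucene's classic TF-IDF similarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))); the function names are hypothetical, and a real Lucene score also involves norms and boosts that are left out here.

```python
import math


def plain_division_weight(tf, df):
    """Term weight as a plain ratio of term frequency to document frequency."""
    return tf / df


def log_scaled_weight(tf, df, num_docs):
    """Term weight with a log-transformed inverse document frequency.

    Assumption: follows the classic Lucene TF-IDF formulas,
    tf = sqrt(freq) and idf = 1 + ln(num_docs / (df + 1)).
    """
    idf = 1.0 + math.log(num_docs / (df + 1))
    return math.sqrt(tf) * idf


if __name__ == "__main__":
    # Illustrative numbers only: a term seen 4 times in a document,
    # present in 9 of 1000 documents.
    print(plain_division_weight(4, 9))
    print(log_scaled_weight(4, 9, 1000))
```

The tf and df values returned by the component could be fed into a function like `log_scaled_weight` on the client side, provided the total document count is known.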

Re: MoreLikeThisHandler with mltipli input documents

2015-09-30 Thread Szűcs Roland
Hi Alessandro, You are right. I forgot to mention one important factor. For 3000 Hungarian e-books the approach you mentioned is absolutely fine as the response time is some 0.7 sec. But when I use the same mlt for 5600 Polish e-books the response time is 7 sec which is definitely not acceptable

Re: Can StandardTokenizerFactory works well for Chinese and English (Bilingual)?

2015-09-30 Thread Charlie Hull
On 30/09/2015 04:09, Zheng Lin Edwin Yeo wrote: Hi Charlie, Hi, I've checked that Paoding's code is written for Solr 3 and Solr 4 versions. It is not written for Solr 5, thus I was unable to use it in my Solr 5.x version. I'm pretty sure we had to recompile it for v4.6 as well... it has

Re: Solr 4.8 - Updating zkhost list in solr.xml without requiring a restart

2015-09-30 Thread Upayavira
Why don't you create DNS names, or such, so that you can replace a zookeeper instance at the same hostname:port rather than having to edit solr.xml across your whole Solr farm? The idea is that your list of zookeeper hostnames is a virtual one, not a real one. Upayavira On Wed, Sep 30, 2015, at

Re: MoreLikeThisHandler with mltipli input documents

2015-09-30 Thread Alessandro Benedetti
I am still missing why you quote the number of the documents... If you have 5600 Polish books, but you use the MLT only when you land on the page of a specific book... I think I still miss the point! MLT on 1 Polish book takes 7 secs? 2015-09-30 9:10 GMT+01:00 Szűcs Roland

Re: MoreLikeThisHandler with mltipli input documents

2015-09-30 Thread Szűcs Roland
Hello Upayavira, We use the ajax call and it can work when it takes only some seconds (even the 7 sec can be acceptable in this case) as the customers first focus on the product page and if they are not satisfied with the e-book they will need the offer. I have just started to worry about what will

Re: Solr 4.8 - Updating zkhost list in solr.xml without requiring a restart

2015-09-30 Thread Shawn Heisey
On 9/29/2015 9:40 PM, pramodmm wrote: > In the meantime, please help me validate what we are doing is right. > Currently, our zookeeper instances are running on vmware machines and when > one of them dies and we get a new machine as a replacement - we install > zookeeper and make it a part of the

Re: MoreLikeThisHandler with mltipli input documents

2015-09-30 Thread Upayavira
Could you do the MLT as a separate (AJAX) request? They appear a little afterwards, whilst the user is already reading the page? Or, you could do offline clustering, in which case, overnight, you compare every document with every other, using a (likely non-solr) clustering algorithm, and store

Re: solrcloud not displaying store fields

2015-09-30 Thread Chris Hostetter
the results you've posted make no sense to me unless the documents are out of sync between multiple replicas of whatever shard is hosting the doc that you get in the first result -- both of your queries, even though you are sending them to a specific replica, are general requests so they are

Re: Can StandardTokenizerFactory works well for Chinese and English (Bilingual)?

2015-09-30 Thread Charlie Hull
On 30/09/2015 10:13, Zheng Lin Edwin Yeo wrote: Hi Charlie, Hi Edwin, Thanks for your reply. Seems like quite a number of the Chinese tokenizers are not really compatible with the newer versions of Solr. I'm also looking at HMMChineseTokenizer and JiebaTokenizer to see if they are suitable

Re: Can StandardTokenizerFactory works well for Chinese and English (Bilingual)?

2015-09-30 Thread Zheng Lin Edwin Yeo
Hi Charlie, Thanks for your reply. Seems like quite a number of the Chinese tokenizers are not really compatible with the newer versions of Solr. I'm also looking at HMMChineseTokenizer and JiebaTokenizer to see if they are suitable to be used for Solr 5.x too. Regards, Edwin On 30 September

[poll] virtualization platform for SOLR

2015-09-30 Thread Bernd Fehling
Dear solr users, while setting up some new servers (virtual machines) using XEN I was thinking about an alternative like KVM. My last tests with KVM were a while ago, and XEN performed much better in the area of I/O and CPU usage. This led me to the idea to start a poll about virtualization

Re: MoreLikeThisHandler with mltipli input documents

2015-09-30 Thread Alessandro Benedetti
This query time is still suspicious... Have you tried to play with the MLT params? Min term frequency? Min doc frequency? You can reduce the terms to query. Parameter: mlt.qf. Description: Query fields and their boosts using the same format as that used by the DisMaxRequestHandler. These fields must
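
The tuning suggested above can be sketched as a small URL builder. This is only an illustration: the host and core come from the request quoted earlier in the thread, the field name `content` is a hypothetical placeholder, and the numeric values are examples rather than recommendations. The parameter names (`mlt.fl`, `mlt.mintf`, `mlt.mindf`, `mlt.maxqt`) are the standard MoreLikeThis ones.

```python
from urllib.parse import urlencode


def mlt_url(base_url, doc_query, fields, min_tf=2, min_df=5, max_query_terms=25):
    """Build a MoreLikeThis handler URL with explicit term-selection limits.

    Raising min_tf / min_df or lowering max_query_terms reduces the number
    of terms in the generated query, which is the lever discussed above.
    """
    params = {
        "q": doc_query,                  # selects the source document
        "mlt.fl": ",".join(fields),      # fields to mine for interesting terms
        "mlt.mintf": min_tf,             # minimum term frequency in the source doc
        "mlt.mindf": min_df,             # minimum document frequency in the index
        "mlt.maxqt": max_query_terms,    # cap on the number of query terms
        "wt": "json",
    }
    return f"{base_url}/mlt?{urlencode(params)}"


if __name__ == "__main__":
    # "content" is an assumed field name, not taken from the thread.
    print(mlt_url("http://localhost:8983/solr/bandwpl", "id:10812", ["content"],
                  min_tf=3, min_df=10, max_query_terms=10))
```

Timing the same document with a few different limits should show whether the number of generated terms is what drives the response time.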

Advice for configuring solr 3.5.1 on Cent OS

2015-09-30 Thread Porky Pig
Hello. I managed to compile Solr 3.5.1 from source with the ant compiler. I am able to start solr but not much else. It appears that it can't find its java libraries. Also the solr-webapp subpath doesn't contain anything while other similar path does. I'm attaching two log files which I believe

Re: What kind of nutch documents does Solr index?

2015-09-30 Thread Daniel Holmes
Thank you Upayavira for your answer. In the case I described maxDoc is 19263. As far as I can see, the default indexing filter in Nutch is the basic indexing filter, and it also has a property to delete gone and permanently redirected pages, whose value was false for me. I think the problem is still

Re: What kind of nutch documents does Solr index?

2015-09-30 Thread NutchDev
What Nutch does is this: after fetching documents from the server, it passes them to the parser, which detects the document type and parses accordingly. One possibility is that the parser failed to parse some documents, and that's why you are getting a count mismatch.

Re: Keyword match distance rule issue

2015-09-30 Thread anil.vadhavane
Hi Benedetti, Yes, at first it looks like a user error and I am surprised by the case as well. We tested this on two different systems. We tried it with lower case input but it is not matching. We are using the standard title column to store the data. We even tried with 3, 4 and 5 edit distance

Re: entity processing order during updates

2015-09-30 Thread Alexandre Rafalovitch
Have you tried just having two separate endpoints each with its own definition of DIH and URP? Then, you just hit those end-points one at a time in whatever order you need. Seems easier than a custom switching logic. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a

Join with faceting and filtering

2015-09-30 Thread Troy Edwards
I am working with the following indices:
*Item*
ItemId - string
Description - text (query on this)
Categories - Multivalued text (query on this)
Sellers - Multivalued text (query on this)
SellersString - Multivalued string (Need to facet and filter on this)
*ContractItem*
ContractItemId -

Re: Regression tests and evaluate quality of results

2015-09-30 Thread Doug Turnbull
Sounds exactly like our tool Quepid http://quepid.com :) which is our test driven search toolbox. Whether or not Quepid is the right fit for your application, we advocate for a style of work called Test-Driven Relevancy.

Re: Find records with no values in solr.LatLongType fied type

2015-09-30 Thread Kamal Kishore Aggarwal
Thanks Erick..it worked.. On Wed, Sep 16, 2015 at 9:21 PM, Erick Erickson wrote: > Top level queries need a *:* in front, something like > q=*:* -usrlatlong_0_coordinate:[* TO *] > > I just took a quick check and just using usrlatlong:[* TO *] > encounters a parse
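
The query that worked can be assembled programmatically as follows. This is a minimal sketch: the base URL and the core name `mycore` are assumptions, while the field name and the query shape are the ones quoted above.

```python
from urllib.parse import urlencode


def missing_value_query_url(base_url, field):
    """Build a select URL matching documents that have no value in `field`.

    A purely negative top-level clause needs a positive clause to subtract
    from, hence the leading *:* (match all documents).
    """
    params = {"q": f"*:* -{field}:[* TO *]", "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # "mycore" is a hypothetical core name.
    print(missing_value_query_url("http://localhost:8983/solr/mycore",
                                  "usrlatlong_0_coordinate"))
```

Letting `urlencode` handle the escaping avoids hand-encoding the brackets, colons and spaces in the range clause.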

Re: entity processing order during updates

2015-09-30 Thread Roxana Danger
Do you mean creating 2 instances and then generating a third one (or updating one of them) for merging their data? Is it not guaranteed that the entities in the DIH are imported in the order described in the db-config file? Thank you very much, Roxana On 30 September 2015 at 14:48, Alexandre

Re: Keyword match distance rule issue

2015-09-30 Thread Jack Krupansky
This feature is known as fuzzy query, not keyword match. Unfortunately, the edit distance is limited to 2; 3 or more is not supported. Lucene itself still has the old "slow" fuzzy query that supports larger edit distances, but Solr has no syntax for selecting it. Actually, this limit of 2
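
For readers unsure what the limit of 2 means in practice, the sketch below computes a plain Levenshtein distance between two terms and checks it against that limit. It is an illustration, not Lucene's implementation: Lucene's fuzzy matching works on indexed terms after analysis and, by default, also treats a transposition of adjacent characters as a single edit, which this simple version does not. The helper names are hypothetical.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def within_fuzzy_limit(term, candidate, max_edits=2):
    """True if the two terms are within max_edits of each other,
    compared case-insensitively (assuming a lowercasing analyzer)."""
    return levenshtein(term.lower(), candidate.lower()) <= max_edits


if __name__ == "__main__":
    for candidate in ("Bridgewater", "Bridffwater"):
        print(candidate, levenshtein("bridwater", candidate.lower()))
```

Running a check like this on the exact indexed terms (as shown by the analysis screen) is a quick way to tell whether a missed match is really outside the supported distance.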

Regression tests and evaluate quality of results

2015-09-30 Thread marotosg
Hi, I have some doubts about how to define a process to evaluate the quality of search results. I have a solr collection with 4M documents with information about people. I search across several fields like first name, second name, email, address, phone etc. There is plenty of logic in the

RE: Passing Basic Auth info to HttpSolrClient

2015-09-30 Thread Davis, Daniel (NIH/NLM) [C]
HttpSolrClient can accept an Apache HttpComponents HttpClient in its constructor: https://lucene.apache.org/solr/5_3_1/solr-solrj/org/apache/solr/client/solrj/impl/HttpSolrClient.html You can use the HttpClientBuilder
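
Whatever client ends up carrying the credentials, HTTP Basic authentication comes down to one request header. The sketch below is not SolrJ; it is a language-neutral illustration using Python's standard library, with a hypothetical URL and placeholder credentials, showing the header a preconfigured HTTP client would attach to each request.

```python
import base64
import urllib.request


def basic_auth_request(url, user, password):
    """Return a Request object carrying an HTTP Basic Authorization header.

    The header value is "Basic " followed by base64("user:password").
    Nothing is sent over the network here.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Basic {token}")
    return req


if __name__ == "__main__":
    # URL and credentials are placeholders for illustration only.
    req = basic_auth_request("http://localhost:8983/solr/mycore/select?q=*:*",
                             "user", "pass")
    print(req.get_header("Authorization"))
```

Because the credentials are only base64-encoded, not encrypted, this scheme should be paired with TLS on any network that is not fully trusted.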

Re: Regression tests and evaluate quality of results

2015-09-30 Thread Ahmet Arslan
Hi, Testing quality requires "right answers" (query relevance judgments), which are expensive to create. Once you have qrels, you can evaluate the effectiveness of your system with metrics (MAP, ERR@20, NDCG@20, etc). Here is a presentation you might find relevant.
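
As an illustration of one of the metrics named above, here is a minimal NDCG@k sketch. It makes two simplifying assumptions that should be stated loudly: gains are used linearly (some formulations use 2^rel - 1 instead), and the ideal ranking is derived only from the judgments of the returned documents, whereas a full evaluation would build it from all judged documents for the query. The function names are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k graded judgments,
    with a log2(rank + 1) discount (rank is 1-based)."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """DCG normalised by the DCG of the same judgments in ideal order.

    Simplification: the ideal ordering is taken from the returned list only.
    """
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Graded relevance of the top results for one query, in ranked order
    # (made-up judgments for illustration).
    print(ndcg_at_k([3, 0, 2, 1], k=20))
```

Averaging this value over a fixed set of judged queries before and after a configuration change gives a simple regression signal.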

Re: Regression tests and evaluate quality of results

2015-09-30 Thread Toke Eskildsen
On Wed, 2015-09-30 at 06:58 -0700, marotosg wrote: > b) Based on full data. I would like to run queries and see if the results > are good enough. That's the part I am not sure if makes sense or how to do > it. Seems like an exact match for http://quepid.com/ (I am not affiliated) - Toke

Re: Cloud Deployment Strategy... In the Cloud

2015-09-30 Thread Steve Davids
Our project built a custom "admin" webapp that we use for various O activities so I went ahead and added the ability to upload a Zip distribution which then uses SolrJ to forward the extracted contents to ZK, this package is built and uploaded via a Gradle build task which makes life easy on us by

Re: Solr 4.8 - Updating zkhost list in solr.xml without requiring a restart

2015-09-30 Thread pramodEbay
> The idea is that your list of zookeeper hostnames is a virtual one, not > a real one. Thanks for the suggestion. Looks like I am not alone in thinking along the same lines. I am planning on doing that and was not sure if anyone else tried this approach and validated that it worked. --

MongoDB to Solr connector - anyone done it?

2015-09-30 Thread Gili Nachum
Hi, Looking to learn from the experience of others: what works best? Looking for a production grade solution to efficiently push data from a multi-sharded Mongo to a multi-sharded Solr in a continuous manner and in a one-off fashion. Not having to write any code would be a nice bonus. What I found so

Re: Keyword match distance rule issue

2015-09-30 Thread anil.vadhavane
Hi Jack, Thanks for a quick reply. I understood your point regarding the edit distance restriction in Solr. Yes, the query string does not contain actual quotes. The query should match with 2 edit distance. As I mentioned, if we try "Bridffwater~2", Solr matches it. We haven't noticed

Re: Can StandardTokenizerFactory works well for Chinese and English (Bilingual)?

2015-09-30 Thread Zheng Lin Edwin Yeo
Hi Charlie, Yes sure, I'm now finalising my testing with all the different tokenizers, and trying to understand how each tokenizer actually works. Hopefully I will be able to share something useful about my experience once I'm done with it. Regards, Edwin On 30 September 2015 at 17:25,

Re: Advice for configuring solr 3.5.1 on Cent OS

2015-09-30 Thread Shawn Heisey
On 9/30/2015 4:34 AM, Porky Pig wrote: > Hello. > > I managed to compile Solr 3.5.1 from source with the ant compiler. > > I am able to start solr but not much else. > It appears that it can't find its java libraries. Also the solr-webapp > subpath doesn't contain anything while other similar

Re: entity processing order during updates

2015-09-30 Thread Alexandre Rafalovitch
Hmm. It seems I misread " the second processor needs to be executed after complete the first one." In fact, I am still unsure what that is supposed to mean. Could you give a more concrete example of the sequence with say 2 items of each time and what you see vs. what you expect to see. And I

Re: entity processing order during updates

2015-09-30 Thread Roxana Danger
Of course, thank you! Hopefully, it will be more clear now. I have: - in db-config: ... - in config:

Re: Keyword match distance rule issue

2015-09-30 Thread Alessandro Benedetti
Hi, Solr does not support more than 2 as an edit distance! You need to customise this at code level if you want to. If in the index we have: bridwater Bridgewater (3) Bridffwater (3) This is really weird, but please, can you tell me what exactly you have indexed for that field? Can you check

Re: [poll] virtualization platform for SOLR

2015-09-30 Thread Shawn Heisey
On 9/30/2015 3:12 AM, Bernd Fehling wrote: > while setting up some new servers (virtual machines) using XEN I was > thinking about an alternative like KVM. My last tests with KVM were > a while ago, and XEN performed much better in the area of I/O and > CPU usage. > This led me to the idea to start

Way to determine (via analyzer) what fields/types will be created for a given field name?

2015-09-30 Thread Bill Dueber
Let’s say I have [I started thinking this sort of thing through a while back] If I index a field named lastname_st, I end up with: - field lastname_t of type text - field lastname of