Re: search design question

2016-04-05 Thread Midas A
Thanks Binoy for replying. I am giving you a few use cases: a) "shoes in nike" or "nike shoes". Here "nike" is the brand; in this case my query entity is shoe and the entity type is brand, and my result should only be pink nike shoes. b) "32 inch LCD TV sony": 32 inch is size, LCD is entity type

Re: search design question

2016-04-05 Thread Binoy Dalal
Could you describe your problem in more detail, with examples of your use cases? On Wed, 6 Apr 2016, 11:03 Midas A, wrote: > i have to do entity and entity type mapping with help of search query > while building solr query. > > how should i design with the solr for

search design question

2016-04-05 Thread Midas A
I have to do entity and entity type mapping with the help of the search query while building the Solr query. How should I design this with Solr for search? Please guide me.

Re: Multiple data-config.xml in one collection?

2016-04-05 Thread Yangrui Guo
Thanks man. I'd love to learn more about the Talend OpenStudio project you're working on. Is it based on Lucene/Solr or a different project? On Tuesday, April 5, 2016, Davis, Daniel (NIH/NLM) [C] wrote: > Yangrui, > > Let me clarify - to have multiple data imports run

Re: Update Speed: QTime 1,000 - 5,000

2016-04-05 Thread Erick Erickson
bq: Apart from the obvious delay, I'm also seeing QTime's of 1,000 to 5,000 QTimes for what? The update? Queries? If for queries, autowarming may help, especially as your soft commit is throwing away all the top-level caches (i.e. the ones configured in solrconfig.xml) every minute. It

MLT Query Parser

2016-04-05 Thread Shamik Bandopadhyay
Hi, I'm trying to use the new MLT query parser in a SolrCloud mode. As per the documentation, here's the syntax, {!mlt qf=name}1 where "1" is the id. What I'm trying to understand is whether "id" is a mandatory field in making this work? Right now, I'm getting mlt documents based on a
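
The syntax quoted above can be sent as an ordinary q parameter. The following is only a sketch of how such a request URL might be assembled; the base URL and the collection name "products" are assumptions made for illustration, not details from the thread.

```python
from urllib.parse import urlencode, parse_qs


def mlt_query_url(base_url, collection, doc_id, qf):
    """Build a /select URL whose q uses the MLT query parser syntax {!mlt qf=<field>}<id>."""
    # doc_id is the id of the source document, as in the quoted syntax.
    params = {"q": "{!mlt qf=%s}%s" % (qf, doc_id)}
    return "%s/%s/select?%s" % (base_url.rstrip("/"), collection, urlencode(params))


# Hypothetical host and collection; only the q syntax comes from the message.
url = mlt_query_url("http://localhost:8983/solr", "products", "1", "name")
print(url)
```

urlencode takes care of escaping the braces, the exclamation mark and the space, which are easy to get wrong when pasting the query into a URL by hand.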

Re: Multiple data-config.xml in one collection?

2016-04-05 Thread Shawn Heisey
On 4/5/2016 1:16 PM, Yangrui Guo wrote: > So if I implement multiple dataimporthandler and do a full import, does > Solr perform import of all handlers at once or can I just specify which > handler to import? Thank you Each handler has a name, which starts with a forward slash. Normally it's named
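
Since each handler is addressed by its own name, choosing which import to run comes down to which URL is requested. A minimal sketch, assuming a hypothetical collection "mycoll" and a hypothetical handler "/dataimport-db1"; command=full-import and clean are Data Import Handler request parameters, everything else here is made up for the example.

```python
from urllib.parse import urlencode


def dih_url(base_url, collection, handler, command="full-import", **extra):
    """Build the URL that runs one Data Import Handler command on one named handler."""
    if not handler.startswith("/"):
        raise ValueError("handler name must start with a forward slash")
    params = {"command": command}
    params.update(extra)
    return "%s/%s%s?%s" % (base_url.rstrip("/"), collection, handler, urlencode(params))


# Hypothetical names. clean=false is passed so that this import does not first
# delete documents that another import put into the same collection.
print(dih_url("http://localhost:8983/solr", "mycoll", "/dataimport-db1", clean="false"))
```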

Re: Update Speed: QTime 1,000 - 5,000

2016-04-05 Thread John Bickerstaff
A few thoughts... From a black-box testing perspective, you might try changing that softCommit time frame to something longer and see if it makes a difference. The size of your documents will make a difference too - so the comparison to 300 - 500 on other cloud setups may or may not be

RE: Multiple data-config.xml in one collection?

2016-04-05 Thread Davis, Daniel (NIH/NLM) [C]
Yangrui, Let me clarify - to have multiple data imports run concurrently, my impression is that you must have different requestHandlers declared in your solrconfig.xml. By default, Data Import Handler is not multi-threaded; having multiple requestHandlers for it is a workaround to this, not a

Re: Sort order for *:* query

2016-04-05 Thread Steven White
This is all good stuff. Thank you all for your insight. Steve On Mon, Apr 4, 2016 at 6:15 PM, Yonik Seeley wrote: > On Mon, Apr 4, 2016 at 6:06 PM, Chris Hostetter > wrote: > > : > > : Not sure I understand... _version_ is time based and hence

RE: Multiple data-config.xml in one collection?

2016-04-05 Thread Davis, Daniel (NIH/NLM) [C]
Yangrui, Solr will just do one data import. You can have a script invoke more than one, and they will run concurrently. There are some risks with that, depending on what you are doing. If it's just pulling from a database, I think you are all right. I've even had 4 run concurrently to
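
A script of the kind mentioned above could look roughly like the following. This is a sketch under assumptions, not the script actually used: the handler URLs are hypothetical, and the fetch function is injectable so the logic can be tried without a running Solr.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def http_get(url):
    """Issue a GET request and return the HTTP status code."""
    with urlopen(url, timeout=30) as resp:
        return resp.status


def run_imports(urls, fetch=http_get, max_workers=4):
    """Request every import URL from its own thread; return {url: result or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one import fails to start
                results[url] = exc
    return results


# Hypothetical usage against two differently named handlers:
# run_imports([
#     "http://localhost:8983/solr/mycoll/dataimport-db1?command=full-import&clean=false",
#     "http://localhost:8983/solr/mycoll/dataimport-db2?command=full-import&clean=false",
# ])
```

Each request only starts an import; progress can be checked afterwards with command=status on the same handler.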

Re: CompositId router

2016-04-05 Thread John Bickerstaff
In terms of #2, this might be of use... https://wiki.apache.org/solr/HowToReindex On Tue, Apr 5, 2016 at 3:08 PM, Anuj Lal wrote: > I am new to solr. Need some advice from more experienced solr team > members > > I am upgrading 4.4 solr cluster to 5.5 > > > One of the

CompositId router

2016-04-05 Thread Anuj Lal
I am new to solr. Need some advice from more experienced solr team members. I am upgrading a 4.4 solr cluster to 5.5. One of the steps I am doing for the upgrade is to bootstrap from the existing 4.4 solr home (after upgrading the solr installation to 5.5). All of the nodes come up correctly and I can

Update Speed: QTime 1,000 - 5,000

2016-04-05 Thread Robert Brown
Hi, I'm currently posting updates via cURL, in batches of 1,000 docs in JSON files. My setup consists of 2 shards, 1 replica each, 50m docs in total. These updates are hitting a node at random, from a server across the Internet. Apart from the obvious delay, I'm also seeing QTime's of

Re: Solr 5.5 Security feature is not working on it.

2016-04-05 Thread Anshum Gupta
Hi Vijay, Can you provide more information about what you were trying to do and why you think this isn't working? The more details you can provide, the better. * What's your SolrCloud setup? * How did you enable security? * What do you expect? * What do you see? On Tue, Apr 5, 2016 at 1:02

Solr 5.5 Security feature is not working on it.

2016-04-05 Thread Vijayakumar Ramdoss
Hi All, we recently started leveraging Solr 5.5 in cloud mode. Even after enabling security in SolrCloud, it's not working. Looking for your advice to debug the issue. cat security.json {"authentication":{ "class":"solr.BasicAuthPlugin", "blockUnknown": true,

Re: Multiple data-config.xml in one collection?

2016-04-05 Thread Yangrui Guo
Hi Daniel, So if I implement multiple dataimporthandler and do a full import, does Solr perform import of all handlers at once or can I just specify which handler to import? Thank you Yangrui On Tuesday, April 5, 2016, Davis, Daniel (NIH/NLM) [C] wrote: > If Shawn is

Re: How to Get info about clusterstate in solr 5.2.1 just like ping request handler with distrib=true

2016-04-05 Thread John Bickerstaff
From some docs I'm working on - this command (against one solr box) got me the entire cluster's state... Don't know if it'll work for you, but just in case... There may be an api command that is similar - not sure. I'm mostly operating on the command line right now. (statdx is the name of my

Re: Multiple data-config.xml in one collection?

2016-04-05 Thread John Bickerstaff
My own choices were driven mostly by the usage of the data - from a more architectural perspective. I have "appDocuments" and "appImages" for one of the applications I'm supporting. Because they are so closely connected (an appDocuments can have N number of appImages and appImages can belong to

RE: Can't get phrase field boosting to work using edismax

2016-04-05 Thread jimi.hullegard
Some more input, before I call it a day. Just for the heck of it, I tried changing minClauseSize to 0 using the Eclipse debugger, so that it didn't return null at line 1203, but instead returned the TermQuery on line 1205. Then everything worked exactly as it should. The matching document got

RE: Can't get phrase field boosting to work using edismax

2016-04-05 Thread jimi.hullegard
I now used the Eclipse debugger to try and see if I can understand what is happening, and it seems like the ExtendedDismaxQParser simply ignores my pf parameter, since it doesn't interpret it as a phrase query.

Re: SolrCloud backup/restore

2016-04-05 Thread Zisis Tachtsidis
Thank you both for the clarification and proposals! This solrcloud_manager looks very promising. I'll try it out, the shared filesystem requirement is no issue for me.

RE: Can't get phrase field boosting to work using edismax

2016-04-05 Thread jimi.hullegard
OK. Interesting. But... I added a solr.TrimFilterFactory at the end of my analyzer definition. Shouldn't that take care of the added space at the end? The admin analysis page indicates that it works as it should, but I still can't get edismax to boost. -----Original Message----- From: Jack

RE: Multiple data-config.xml in one collection?

2016-04-05 Thread Davis, Daniel (NIH/NLM) [C]
You have choices: - Use a separate collection for each data import - Use the same collection for each data import, differentiating them using a field you can query The choice depends on the objects and how they will be used, and I trust others on this list to have better advice on how to

Contrib module for Document Clustering

2016-04-05 Thread davidphilip cherian
Hi, Is there any contribution (open source contrib module) that routes documents to shards based on a document similarity technique? Or any suggestions for integrating mahout with solr for this use case? From what I know, currently there are two document route strategies as explained here

Re: Multiple data-config.xml in one collection?

2016-04-05 Thread Yangrui Guo
Hi thanks for the answer. Yes I will be using DIH to import data from different database connections. Do I have to create a collection for each connection? On Tuesday, April 5, 2016, Shawn Heisey wrote: > On 4/5/2016 8:12 AM, Yangrui Guo wrote: > > I'm using Solr Cloud to

Re: SolrCloud backup/restore

2016-04-05 Thread Jeff Wartes
There is some automation around this process in the backup commands here: https://github.com/whitepages/solrcloud_manager It’s been tested with 5.4, and will restore arbitrary replication factors. Ever assuming the shared filesystem for backups, of course. On 4/5/16, 3:18 AM, "Reth RM"

Re: SolrCloud no leader for collection

2016-04-05 Thread Jeff Wartes
I recall I had some luck fixing a leader-less shard (after a ZK quorum failure) by forcibly removing the records for the down-state replicas from the leader election list, and then forcing an election. The ZK path looks like collections//leader_elect/shardX/election. Usually you’ll find the

RE: Multiple data-config.xml in one collection?

2016-04-05 Thread Davis, Daniel (NIH/NLM) [C]
If Shawn is correct, and you are using DIH, then I have done this by implementing multiple requestHandlers each of them using Data Import Handler, and have each specify a different XML file for the data config. Instead of using data-config.xml, I've used a large number of files such as:

Re: Multiple data-config.xml in one collection?

2016-04-05 Thread Shawn Heisey
On 4/5/2016 8:12 AM, Yangrui Guo wrote: > I'm using Solr Cloud to index a number of databases. The problem is there > is an unknown number of databases and each database has its own configuration. > If I create a single collection for every database the query would > eventually become insanely long.

Re: Can't get phrase field boosting to work using edismax

2016-04-05 Thread Jack Krupansky
It looks like the code constructing the boost phrase for pf will always add a trailing blank, which is never a problem when a normal tokenizer is used that removes white space, but the keyword tokenizer will preserve that extra space, which prevents an exact match. See line 531:

RE: using tokens to influence boost and score rather than filtering

2016-04-05 Thread Markus Jelsma
Hello - i would certainly go for edismax's boost parameter, as it multiplies scores. You can always do a regular boost query via {!boost ..} but edismax makes it much easier. Markus -----Original message----- > From: John Blythe > Sent: Tuesday 5th April 2016 15:36 > To:
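
To make the multiplicative boost concrete, here is a rough sketch of the request parameters. The field name "category" and the value "x" are placeholders invented for the example; the function query uses Solr's if() and termfreq() functions so that matching documents get their score multiplied by the given factor and everything else is multiplied by 1.

```python
from urllib.parse import urlencode


def boosted_query_params(user_query, boost_field, boost_value, factor=2.0):
    """Return edismax parameters that favour documents containing boost_value in boost_field.

    Unlike fq, nothing is filtered out; matching documents are only scored higher.
    """
    boost = "if(termfreq(%s,'%s'),%s,1)" % (boost_field, boost_value, factor)
    return {"defType": "edismax", "q": user_query, "boost": boost}


# Hypothetical field and value.
params = boosted_query_params("shoes", "category", "x")
print(urlencode(params))
```

termfreq() compares against the indexed term, so this assumes the boosted field is indexed as a plain, untokenized string.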

Re: How to Get info about clusterstate in solr 5.2.1 just like ping request handler with distrib=true

2016-04-05 Thread Binoy Dalal
You could use the zkcli.sh script to directly query your zookeeper ensemble and get the cluster status. See if that works for you. On Tue, 5 Apr 2016, 17:28 preeti kumari, wrote: > Hi Reth, > > I had already checked this but issue is it gives me info about shards/cores >

Multiple data-config.xml in one collection?

2016-04-05 Thread Yangrui Guo
Hello, I'm using Solr Cloud to index a number of databases. The problem is there is an unknown number of databases and each database has its own configuration. If I create a single collection for every database the query would eventually become insanely long. Is it possible to upload different config

using tokens to influence boost and score rather than filtering

2016-04-05 Thread John Blythe
hi all, i'm trying to do something similar to a simple fq=x on my query. i'm using the regular ol' select handler. instead of blocking out all items not related to x via the filter query i'd like to simply give them priority. is there a way to do this? it seems like the dismax bq function may be

RE: Issue with updating existing document (Solr-6.x/7.0)

2016-04-05 Thread Rohana Rajapakse
Thanks for the reply. Yes, I did build the "master" branch. I will try branch_6_0. However, the same code worked in Solr-4.10 and the SolrInputDocument created out of a Lucene Document had the same stuff (stored/indexed/tokenized). It did not complain or break. I will first try this in Solr-6.0

Re: Issue with updating existing document (Solr-6.x/7.0)

2016-04-05 Thread Shawn Heisey
On 4/5/2016 6:07 AM, Rohana Rajapakse wrote: > I am trying to update the value of one field in an existing document, and it > throws me the exception given below. > For the update, I am using my own update handler which created a > SolrInputDocument from an existing Solr document. > > I am

Re: Solr 4 replication

2016-04-05 Thread Shawn Heisey
On 4/5/2016 4:09 AM, abhi Abhishek wrote: > Thanks Mikhail. > is there a way to have a push Replication. any Contributions or > Anything what could in this case? The master-slave replication in ALL versions of Solr (including 5.x) is pull, as already mentioned. This cannot be changed.

Issue with updating existing document (Solr-6.x/7.0)

2016-04-05 Thread Rohana Rajapakse
Hi, I am trying to update the value of one field in an existing document, and it throws me the exception given below. For the update, I am using my own update handler which created a SolrInputDocument from an existing Solr document. I am using Solr 6.x built from the source code obtained from

Re: How to Get info about clusterstate in solr 5.2.1 just like ping request handler with distrib=true

2016-04-05 Thread preeti kumari
Hi Reth, I had already checked this but issue is it gives me info about shards/cores hosted on one server where i am hitting the query not the whole cluster info hosted on different servers. What i need is whole info about all shards/cores hosted on different servers forming my collection.

Can't get phrase field boosting to work using edismax

2016-04-05 Thread jimi.hullegard
Hi, I'm trying to boost documents using a phrase field boosting (ie the pf parameter for edismax), but I can't get it to work (ie boosting documents where the pf field match the query as a phrase). As far as I can tell, solr, or more specifically the edismax handler, does *something* when I

Re: SolrCloud backup/restore

2016-04-05 Thread Reth RM
Yes. It should be backing up each shard leader of collection. For each collection, for each shard, find the leader and request a backup command on that. Further, restore this on new collection, in its respective shard and then go on adding new replica which will duly pull it from the newly added
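
The "for each shard, find the leader and request a backup" step can be sketched as below. This is an illustration under assumptions rather than a tested procedure: it takes a CLUSTERSTATUS response that has already been parsed from JSON, relies on the leader, base_url and core properties of each replica, and uses the replication handler's backup command. The host names, core names and backup location are made up.

```python
from urllib.parse import urlencode


def leader_backup_urls(status, collection, location):
    """Map each shard name to the replication-handler backup URL of its leader core."""
    urls = {}
    shards = status["cluster"]["collections"][collection]["shards"]
    for shard_name, shard in shards.items():
        for replica in shard["replicas"].values():
            if replica.get("leader") == "true":
                params = {"command": "backup", "location": location, "name": shard_name}
                urls[shard_name] = "%s/%s/replication?%s" % (
                    replica["base_url"].rstrip("/"), replica["core"], urlencode(params))
    return urls


# Made-up cluster state with one shard and two replicas.
sample_status = {"cluster": {"collections": {"c1": {"shards": {"shard1": {"replicas": {
    "core_node1": {"core": "c1_shard1_replica1", "leader": "true",
                   "base_url": "http://host1:8983/solr"},
    "core_node2": {"core": "c1_shard1_replica2",
                   "base_url": "http://host2:8983/solr"},
}}}}}}}

for shard, url in leader_backup_urls(sample_status, "c1", "/backups/c1").items():
    print(shard, url)
```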

SolrCloud no leader for collection

2016-04-05 Thread Tom Evans
Hi all, I have an 8 node SolrCloud 5.5 cluster with 11 collections, most of them in a 1 shard x 8 replicas configuration. We have 5 ZK nodes. During the night, we attempted to reindex one of the larger collections. We reindex by pushing json docs to the update handler from a number of processes.

Re: How to Get info about clusterstate in solr 5.2.1 just like ping request handler with distrib=true

2016-04-05 Thread Reth RM
Have you already looked at cluster status api? https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api18 On Tue, Apr 5, 2016 at 10:09 AM, preeti kumari wrote: > Hi, > > I am using solr 5.2.1 . We need to configure F5 load balancer with >
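
For a load-balancer style health check, the response of the cluster status API can be reduced to a list of replicas that are not serving. A minimal sketch: the host, collection and node names below are invented, and the response is assumed to be already parsed from JSON.

```python
from urllib.parse import urlencode


def cluster_status_url(base_url, collection=None):
    """Build the Collections API CLUSTERSTATUS URL, optionally limited to one collection."""
    params = {"action": "CLUSTERSTATUS", "wt": "json"}
    if collection:
        params["collection"] = collection
    return "%s/admin/collections?%s" % (base_url.rstrip("/"), urlencode(params))


def inactive_replicas(status):
    """Return (collection, shard, replica) for replicas that are not active or not on a live node."""
    cluster = status["cluster"]
    live = set(cluster.get("live_nodes", []))
    bad = []
    for coll_name, coll in cluster["collections"].items():
        for shard_name, shard in coll["shards"].items():
            for replica_name, replica in shard["replicas"].items():
                if replica.get("state") != "active" or replica.get("node_name") not in live:
                    bad.append((coll_name, shard_name, replica_name))
    return bad


# Made-up response fragment: the second replica sits on a node that is not live.
sample = {"cluster": {
    "live_nodes": ["host1:8983_solr"],
    "collections": {"c1": {"shards": {"shard1": {"replicas": {
        "core_node1": {"state": "active", "node_name": "host1:8983_solr"},
        "core_node2": {"state": "down", "node_name": "host2:8983_solr"},
    }}}}},
}}

print(cluster_status_url("http://localhost:8983/solr", "c1"))
print(inactive_replicas(sample))
```

Because the request can be sent to any node and still describes the whole collection, the check does not depend on which server the load balancer happens to hit.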

Re: Solr 4 replication

2016-04-05 Thread abhi Abhishek
Thanks Mikhail. is there a way to have a push Replication. any Contributions or Anything what could in this case? Thanks, Abhishek On Tue, Apr 5, 2016 at 1:29 AM, Mikhail Khludnev wrote: > It's pull, but you can trigger pulling. > > On Mon, Apr 4, 2016 at